Mirror of https://github.com/elicpeter/nyx.git, synced 2026-06-21 20:18:06 +02:00
Prerelease cleanup (#46)
* feat: Add const_bound_vars tracking to prevent false positives in ownership checks
* feat: Introduce field interner and typed bounded vars for enhanced type tracking
* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking
* feat: Centralize method name extraction with bare_method_name helper
* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch
* feat: Enhance C++ taint tracking with additional container operations and inline method resolution
* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking
* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis
* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations
* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details
* test: Add comprehensive tests for lattice algebra laws and SSA edge cases
* feat: Add destructured session user handling and safe user ID access patterns
* feat: Implement row-population reverse-walk for enhanced authorization checks
* feat: Enhance authorization checks with local alias chain for self-actor types
* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction
* feat: Implement chained method call inner-gate rebinding for SSRF prevention
* feat: Add observability and error modules, enhance debug functionality, and implement theme context
* feat: Remove Auth Analysis page and update navigation to redirect to Explorer
* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor
* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor
* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity
* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build
The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while
the unit test at the bottom of the file was gated only `#[cfg(test)]`.
Since `cfg(test)` is set in release builds with `--tests` but
`cfg(debug_assertions)` is not, `cargo build --release --tests` failed
with E0425. Removing the gate fixes the build; the body is `debug_assert!`
only, so the helper is free in release. Also drop the gate at the call
site to avoid a `dead_code` warning when the lib is built without
`--tests`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(closure-capture): flip JS/TS fixtures to required-finding
The JS and TS closure-capture fixtures pinned the old broken behaviour
via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now
correctly traces taint through the closure boundary (env source captured
by an arrow function, sunk via `child_process.exec` inside the body), so
the formerly-forbidden finding is a true positive.
Match the Python sibling's shape — `required_findings` with
`id_prefix` + `min_count` plus a small `noise_budget` — and rewrite the
companion READMEs and the phase8_fragility_tests doc-comments from
"known gap" to "regression guard".
Verified:
- cargo test --release --test phase8_fragility_tests → 8/8 pass
- cargo test --release --lib bfs_assertion → pass
- corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0) — unchanged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis
* feat: Introduce health module and enhance health score computation with calibration tests
* feat: Add expectations configuration and cleanup .gitignore for log files
* feat: Implement theme selection and enhance settings panel for triage sync
* feat: Suppress false positives for strcpy calls with literal sources in AST
* feat: Update analyse_function_ssa to return body CFG for accurate analysis
* feat: Add bug report and feature request templates for improved issue tracking
* feat: removed dev scripts
* feat: update README.md for clarity and consistency in fixture descriptions
* feat: removed dev docs
* feat: clean up error handling and UI elements for improved user experience
* feat: adjust button sizes in HeaderBar for better UI consistency
* feat: enhance taint analysis with additional context for sanitizer and taint findings
* cargo fmt
* prettier
* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts
* feat: add script to frame PNG screenshots with brand gradient
* feat: add fuzzing support with new targets and CI workflows
* refactor: streamline match expressions and improve formatting in CLI and output handling
* feat: enhance configuration display with detailed output options
* feat: stage demo configuration for improved CLI screenshot output
* feat: expose merge_configs function for user-configurable settings
* refactor: simplify code structure and improve readability in config handling
* refactor: improve descriptions for vulnerability patterns in various languages
* feat: update MIT License section with additional usage details and copyright information
* feat: update screenshots
* refactor: update build process and paths for frontend assets
* feat: add cross-file taint fuzzing target and supporting dictionary
* refactor: clean up formatting and comments in fuzz configuration and example files
* refactor: remove outdated comments and clean up CI configuration files
* chore: update changelog dates and improve formatting in documentation
* refactor: update Cargo.toml and CI configuration for improved packaging and build process
* refactor: enhance quote-stripping logic to prevent panics and add regression tests
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
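The `fix(ssa)` entry above depends on how `cfg(test)` and `cfg(debug_assertions)` differ. Below is a minimal sketch of the shape after the fix. It assumes a helper that checks BFS depths are non-decreasing. The real signature of `debug_assert_bfs_ordering` and its call site are not shown in this commit, so the parameters and the `lower` function here are hypothetical.

```rust
/// Hypothetical stand-in for the helper in src/ssa/lower.rs. Its body is only
/// `debug_assert!`, so it does nothing when `debug_assertions` is off and
/// needs no `#[cfg(debug_assertions)]` gate.
fn debug_assert_bfs_ordering(depths: &[u32]) {
    debug_assert!(
        depths.windows(2).all(|w| w[0] <= w[1]),
        "blocks not visited in BFS order: {:?}",
        depths
    );
}

/// Hypothetical call site, also ungated: the helper is always referenced,
/// so a lib build without `--tests` raises no `dead_code` warning.
fn lower(depths: &[u32]) -> usize {
    debug_assert_bfs_ordering(depths);
    depths.len()
}

#[cfg(test)]
mod lower_tests {
    use super::*;

    // `cargo build --release --tests` sets `cfg(test)` but not
    // `cfg(debug_assertions)`; this compiles only because the helper
    // carries no `#[cfg(debug_assertions)]` gate.
    #[test]
    fn bfs_assertion_accepts_sorted_depths() {
        debug_assert_bfs_ordering(&[0, 1, 1, 2]);
        assert_eq!(lower(&[0, 1, 1, 2]), 4);
    }
}
```

`debug_assert!` type-checks its condition in every profile but evaluates it only when debug assertions are enabled. Leaving the helper ungated therefore costs nothing at run time in release builds.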
parent 79c29b394d
commit 82f18184b1
348 changed files with 48731 additions and 2925 deletions
@@ -570,4 +570,158 @@ mod tests {
     fn is_non_negative_unknown() {
         assert!(!BitFact::top().is_non_negative());
     }
+
+    // ── Additional lattice algebra laws ──────────────────────────────
+
+    fn sample_bits() -> Vec<BitFact> {
+        vec![
+            BitFact::bottom(),
+            BitFact::top(),
+            BitFact::from_const(0),
+            BitFact::from_const(1),
+            BitFact::from_const(-1),
+            BitFact::from_const(0xFF),
+            BitFact::from_const(i64::MIN),
+            BitFact::from_const(i64::MAX),
+        ]
+    }
+
+    #[test]
+    fn join_associative_bit() {
+        let xs = sample_bits();
+        for a in &xs {
+            for b in &xs {
+                for c in &xs {
+                    let lhs = a.join(b).join(c);
+                    let rhs = a.join(&b.join(c));
+                    assert_eq!(
+                        lhs, rhs,
+                        "join not associative for {:?}, {:?}, {:?}",
+                        a, b, c
+                    );
+                }
+            }
+        }
+    }
+
+    #[test]
+    fn meet_idempotent_bit() {
+        for a in sample_bits() {
+            assert_eq!(a.meet(&a), a, "meet not idempotent for {:?}", a);
+        }
+    }
+
+    #[test]
+    fn meet_associative_bit() {
+        let xs = sample_bits();
+        for a in &xs {
+            for b in &xs {
+                for c in &xs {
+                    let lhs = a.meet(b).meet(c);
+                    let rhs = a.meet(&b.meet(c));
+                    assert_eq!(
+                        lhs, rhs,
+                        "meet not associative for {:?}, {:?}, {:?}",
+                        a, b, c
+                    );
+                }
+            }
+        }
+    }
+
+    #[test]
+    fn meet_top_identity_bit() {
+        for a in sample_bits() {
+            assert_eq!(a.meet(&BitFact::top()), a, "x ⊓ ⊤ failed for {:?}", a);
+        }
+    }
+
+    #[test]
+    fn meet_bottom_absorbing_bit() {
+        for a in sample_bits() {
+            assert_eq!(
+                a.meet(&BitFact::bottom()),
+                BitFact::bottom(),
+                "x ⊓ ⊥ failed for {:?}",
+                a
+            );
+        }
+    }
+
+    #[test]
+    fn join_top_absorbing_bit() {
+        for a in sample_bits() {
+            assert_eq!(
+                a.join(&BitFact::top()),
+                BitFact::top(),
+                "x ⊔ ⊤ failed for {:?}",
+                a
+            );
+        }
+    }
+
+    #[test]
+    fn widen_idempotent_bit() {
+        for a in sample_bits() {
+            assert_eq!(a.widen(&a), a, "widen(x, x) failed for {:?}", a);
+        }
+    }
+
+    /// **Soundness**: `widen(a, b) ⊒ join(a, b)` for the bit lattice.
+    #[test]
+    fn widen_over_approximates_join_bit() {
+        let xs = sample_bits();
+        for a in &xs {
+            for b in &xs {
+                let j = a.join(b);
+                let w = a.widen(b);
+                assert!(
+                    j.leq(&w),
+                    "widen({:?}, {:?}) = {:?} does not over-approx join = {:?}",
+                    a,
+                    b,
+                    w,
+                    j
+                );
+            }
+        }
+    }
+
+    /// `a ⊓ b ⊑ a` and `a ⊓ b ⊑ b`: meet is a lower bound of both operands.
+    #[test]
+    fn meet_is_lower_bound_bit() {
+        let xs = sample_bits();
+        for a in &xs {
+            for b in &xs {
+                let m = a.meet(b);
+                assert!(m.leq(a), "a ⊓ b ⊑ a failed for {:?}, {:?}", a, b);
+                assert!(m.leq(b), "a ⊓ b ⊑ b failed for {:?}, {:?}", a, b);
+            }
+        }
+    }
+
+    /// `a ⊑ a ⊔ b` and `b ⊑ a ⊔ b`: join is an upper bound of both operands.
+    #[test]
+    fn join_is_upper_bound_bit() {
+        let xs = sample_bits();
+        for a in &xs {
+            for b in &xs {
+                let j = a.join(b);
+                assert!(a.leq(&j), "a ⊑ a ⊔ b failed for {:?}, {:?}", a, b);
+                assert!(b.leq(&j), "b ⊑ a ⊔ b failed for {:?}, {:?}", a, b);
+            }
+        }
+    }
+
+    /// Combining `i64::MIN` and `i64::MAX` (constants that differ in every
+    /// bit, sign bit included) must not panic in `join`, which runs at path
+    /// merges, nor in `meet` or `widen`.
+    #[test]
+    fn join_min_max_signbit_safe() {
+        let a = BitFact::from_const(i64::MIN);
+        let b = BitFact::from_const(i64::MAX);
+        let _ = a.join(&b); // must not panic
+        let _ = a.meet(&b);
+        let _ = a.widen(&b);
+    }
 }
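The bit-lattice tests above exercise `join`, `meet`, and `leq` without showing `BitFact` itself. As a rough illustration, here is a known-bits lattice. It assumes a fact tracks the bits proven 0 and the bits proven 1. `Bits` and its methods are hypothetical and do not claim to match nyx's actual `BitFact` layout. The sketch shows why combining `i64::MIN` and `i64::MAX` is a useful edge case: the two constants disagree on every bit. Their join therefore knows nothing (Top), and their meet is contradictory (Bottom).

```rust
/// Hypothetical known-bits lattice. `Known { zeros, ones }` records bits
/// proven 0 and proven 1; `top()` knows nothing. Bottom is a separate
/// variant rather than a contradictory mask.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Bits {
    Bottom,
    Known { zeros: u64, ones: u64 },
}

impl Bits {
    fn top() -> Self {
        Bits::Known { zeros: 0, ones: 0 }
    }

    /// Every bit of a constant is known.
    fn from_const(c: i64) -> Self {
        let v = c as u64;
        Bits::Known { zeros: !v, ones: v }
    }

    /// Join keeps only the bits both sides agree on (AND of the masks);
    /// Bottom is the identity.
    fn join(self, other: Self) -> Self {
        match (self, other) {
            (Bits::Bottom, x) | (x, Bits::Bottom) => x,
            (Bits::Known { zeros: z1, ones: o1 }, Bits::Known { zeros: z2, ones: o2 }) => {
                Bits::Known { zeros: z1 & z2, ones: o1 & o2 }
            }
        }
    }

    /// Meet unions the known bits; a bit known to be both 0 and 1 is a
    /// contradiction, i.e. Bottom.
    fn meet(self, other: Self) -> Self {
        match (self, other) {
            (Bits::Bottom, _) | (_, Bits::Bottom) => Bits::Bottom,
            (Bits::Known { zeros: z1, ones: o1 }, Bits::Known { zeros: z2, ones: o2 }) => {
                let (zeros, ones) = (z1 | z2, o1 | o2);
                if zeros & ones != 0 {
                    Bits::Bottom
                } else {
                    Bits::Known { zeros, ones }
                }
            }
        }
    }

    /// Partial order derived from join: `a ⊑ b` iff `a ⊔ b == b`.
    fn leq(self, other: Self) -> bool {
        self.join(other) == other
    }
}
```

Each join can only forget known bits, so this lattice has finite height, and a widen could plausibly just reuse join. The sketch omits it for that reason.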
@@ -1032,4 +1032,360 @@ mod tests {
         let shift = IntervalFact::exact(1);
         assert!(x.right_shift(&shift).is_top());
     }
+
+    /// `a - b` overflows when `a.lo - b.hi` underflows or
+    /// `a.hi - b.lo` overflows. We expect the corresponding bound to
+    /// drop to `None`. Mirrors `overflow_add` / `overflow_mul`.
+    #[test]
+    fn overflow_sub() {
+        let a = IntervalFact::exact(i64::MIN);
+        let b = IntervalFact::exact(1);
+        let r = a.sub(&b);
+        assert_eq!(r.lo, None, "underflow on i64::MIN - 1 must drop lo to None");
+        // hi: i64::MIN - 1 also underflows, so hi must also be None.
+        assert_eq!(r.hi, None, "i64::MIN - 1 underflows on hi too");
+    }
+
+    /// Division of `i64::MIN` by `-1` overflows (the true result is `i64::MAX + 1`).
+    /// `checked_div` returns `None` for that case; the bound should
+    /// degrade gracefully, not panic.
+    #[test]
+    fn div_i64_min_by_minus_one_does_not_panic() {
+        let a = IntervalFact::exact(i64::MIN);
+        let b = IntervalFact::exact(-1);
+        let r = a.div(&b);
+        // Either bound may degrade to None; the exact representation
+        // depends on the impl. Beyond "no panic", require that a fully
+        // bounded, non-bottom result is well-formed (lo <= hi).
+        if let (Some(lo), Some(hi)) = (r.lo, r.hi) {
+            let msg = "div on i64::MIN / -1 produced an inverted interval";
+            assert!(r.is_bottom() || lo <= hi, "{}: {:?}", msg, r);
+        }
+    }
+
+    /// Modulo with a single-point negative divisor: `[0,10] % -3` must
+    /// be a valid interval (no panic, and the divisor's sign must not leak into the bounds).
+    #[test]
+    fn modulo_negative_divisor_singleton() {
+        let a = IntervalFact {
+            lo: Some(0),
+            hi: Some(10),
+        };
+        let b = IntervalFact::exact(-3);
+        let r = a.modulo(&b);
+        // |b| = 3 ⇒ result bounded by [0, 2] for non-negative dividend.
+        assert_eq!(r.lo, Some(0));
+        assert_eq!(r.hi, Some(2));
+    }
+
+    /// Modulo by an interval that *contains* zero must escape to Top:
+    /// modulo-by-zero is undefined, so the result cannot be narrowed precisely.
+    #[test]
+    fn modulo_divisor_spans_zero_is_top() {
+        let a = IntervalFact {
+            lo: Some(0),
+            hi: Some(100),
+        };
+        let b = IntervalFact {
+            lo: Some(-1),
+            hi: Some(1),
+        };
+        let r = a.modulo(&b);
+        assert!(r.is_top(), "modulo by zero-spanning divisor must be Top");
+    }
+
+    /// `[i64::MIN, i64::MAX]` is the maximal interval. Any join with
+    /// any other interval must remain `[i64::MIN, i64::MAX]` (or Top
+    /// equivalent) — this guards against accidental narrowing on join.
+    #[test]
+    fn full_range_is_join_absorbing() {
+        let full = IntervalFact {
+            lo: Some(i64::MIN),
+            hi: Some(i64::MAX),
+        };
+        let small = IntervalFact {
+            lo: Some(0),
+            hi: Some(10),
+        };
+        let j = full.join(&small);
+        assert_eq!(j.lo, Some(i64::MIN), "join must not narrow lo");
+        assert_eq!(j.hi, Some(i64::MAX), "join must not narrow hi");
+    }
+
+    // ── Additional lattice algebra laws ──────────────────────────────
+    // These guard the soundness of the dataflow framework: join/meet/widen
+    // must satisfy the standard lattice axioms or fixpoint convergence
+    // and abstract correctness break.
+
+    fn sample_intervals() -> Vec<IntervalFact> {
+        vec![
+            IntervalFact::bottom(),
+            IntervalFact::top(),
+            IntervalFact::exact(0),
+            IntervalFact::exact(-7),
+            IntervalFact {
+                lo: Some(2),
+                hi: Some(8),
+            },
+            IntervalFact {
+                lo: None,
+                hi: Some(10),
+            },
+            IntervalFact {
+                lo: Some(-5),
+                hi: None,
+            },
+        ]
+    }
+
+    #[test]
+    fn join_with_top_is_top() {
+        for a in sample_intervals() {
+            let j = a.join(&IntervalFact::top());
+            assert!(j.is_top(), "x ⊔ ⊤ = ⊤ failed for {:?}", a);
+            let j2 = IntervalFact::top().join(&a);
+            assert!(j2.is_top(), "⊤ ⊔ x = ⊤ failed for {:?}", a);
+        }
+    }
+
+    #[test]
+    fn meet_idempotent() {
+        for a in sample_intervals() {
+            assert_eq!(a.meet(&a), a, "x ⊓ x = x failed for {:?}", a);
+        }
+    }
+
+    #[test]
+    fn meet_commutative() {
+        let xs = sample_intervals();
+        for a in &xs {
+            for b in &xs {
+                assert_eq!(
+                    a.meet(b),
+                    b.meet(a),
+                    "meet not commutative for {:?} / {:?}",
+                    a,
+                    b
+                );
+            }
+        }
+    }
+
+    #[test]
+    fn meet_associative() {
+        let xs = sample_intervals();
+        for a in &xs {
+            for b in &xs {
+                for c in &xs {
+                    let lhs = a.meet(b).meet(c);
+                    let rhs = a.meet(&b.meet(c));
+                    assert_eq!(lhs, rhs, "meet not associative for {:?},{:?},{:?}", a, b, c);
+                }
+            }
+        }
+    }
+
+    #[test]
+    fn meet_top_identity() {
+        for a in sample_intervals() {
+            assert_eq!(
+                a.meet(&IntervalFact::top()),
+                a,
+                "x ⊓ ⊤ = x failed for {:?}",
+                a
+            );
+        }
+    }
+
+    #[test]
+    fn meet_bottom_absorbing() {
+        for a in sample_intervals() {
+            assert!(
+                a.meet(&IntervalFact::bottom()).is_bottom(),
+                "x ⊓ ⊥ = ⊥ failed for {:?}",
+                a
+            );
+        }
+    }
+
+    #[test]
+    fn widen_idempotent() {
+        for a in sample_intervals() {
+            assert_eq!(a.widen(&a), a, "widen(x, x) = x failed for {:?}", a);
+        }
+    }
+
+    /// **Soundness**: widening must over-approximate join.
+    /// `widen(a, b) ⊒ join(a, b)` for all a, b.
+    /// Without this, fixpoint iteration converges to an unsound result.
+    #[test]
+    fn widen_over_approximates_join() {
+        let xs = sample_intervals();
+        for a in &xs {
+            for b in &xs {
+                let j = a.join(b);
+                let w = a.widen(b);
+                assert!(
+                    j.leq(&w),
+                    "widen({:?}, {:?}) = {:?} does not over-approximate join = {:?}",
+                    a,
+                    b,
+                    w,
+                    j
+                );
+            }
+        }
+    }
+
+    #[test]
+    fn leq_reflexive() {
+        for a in sample_intervals() {
+            assert!(a.leq(&a), "x ⊑ x failed for {:?}", a);
+        }
+    }
+
+    #[test]
+    fn leq_transitive() {
+        // a ⊑ b ⊑ c ⇒ a ⊑ c
+        let a = IntervalFact::exact(5);
+        let b = IntervalFact {
+            lo: Some(0),
+            hi: Some(10),
+        };
+        let c = IntervalFact::top();
+        assert!(a.leq(&b));
+        assert!(b.leq(&c));
+        assert!(a.leq(&c), "leq must be transitive");
+    }
+
+    /// `x ⊔ y` is an upper bound of both operands: x ⊑ join(x, y) and y ⊑ join(x, y).
+    #[test]
+    fn join_is_upper_bound() {
+        let xs = sample_intervals();
+        for a in &xs {
+            for b in &xs {
+                let j = a.join(b);
+                assert!(a.leq(&j), "a ⊑ a ⊔ b failed for {:?}, {:?}", a, b);
+                assert!(b.leq(&j), "b ⊑ a ⊔ b failed for {:?}, {:?}", a, b);
+            }
+        }
+    }
+
+    /// `x ⊓ y` is a lower bound of both operands: meet(x, y) ⊑ x and meet(x, y) ⊑ y.
+    #[test]
+    fn meet_is_lower_bound() {
+        let xs = sample_intervals();
+        for a in &xs {
+            for b in &xs {
+                let m = a.meet(b);
+                assert!(m.leq(a), "a ⊓ b ⊑ a failed for {:?}, {:?}", a, b);
+                assert!(m.leq(b), "a ⊓ b ⊑ b failed for {:?}, {:?}", a, b);
+            }
+        }
+    }
+
+    // ── Arithmetic edge cases not previously covered ─────────────────
+
+    /// Multiplication by exact zero must yield exact zero, regardless
+    /// of the other operand. This is critical for taint suppression
+    /// (`x * 0` is provably bounded).
+    #[test]
+    fn mul_by_zero_singleton_is_zero() {
+        let zero = IntervalFact::exact(0);
+        let inputs = [
+            IntervalFact::exact(42),
+            IntervalFact {
+                lo: Some(-100),
+                hi: Some(100),
+            },
+            IntervalFact {
+                lo: Some(i64::MIN),
+                hi: Some(i64::MAX),
+            },
+            IntervalFact::top(),
+        ];
+        for a in inputs.iter() {
+            // Note: when a is Top, mul currently short-circuits to Top.
+            // The zero-singleton case is the precise one we care about
+            // for sink suppression; assert it for non-Top inputs.
+            if !a.is_top() {
+                let r = a.mul(&zero);
+                assert_eq!(r, IntervalFact::exact(0), "x * 0 should be 0 for {:?}", a);
+                let r2 = zero.mul(a);
+                assert_eq!(r2, IntervalFact::exact(0), "0 * x should be 0 for {:?}", a);
+            }
+        }
+    }
+
+    /// Bottom propagates through every arithmetic op.
+    #[test]
+    fn bottom_propagates_through_arith() {
+        let bot = IntervalFact::bottom();
+        let x = IntervalFact::exact(5);
+        assert!(bot.add(&x).is_bottom());
+        assert!(x.add(&bot).is_bottom());
+        assert!(bot.sub(&x).is_bottom());
+        assert!(bot.mul(&x).is_bottom());
+        assert!(bot.div(&x).is_bottom());
+        assert!(bot.modulo(&x).is_bottom());
+        assert!(bot.bit_and(&x).is_bottom());
+        assert!(bot.bit_or(&x).is_bottom());
+        assert!(bot.bit_xor(&x).is_bottom());
+        assert!(bot.left_shift(&x).is_bottom());
+        assert!(bot.right_shift(&x).is_bottom());
+    }
+
+    /// Division by exact zero must escape to Top (not crash, not produce
+    /// a bogus interval). Currently handled by the spans-zero check.
+    #[test]
+    fn div_by_exact_zero_is_top() {
+        let a = IntervalFact::exact(10);
+        let zero = IntervalFact::exact(0);
+        assert!(
+            a.div(&zero).is_top(),
+            "division by exact zero must escape to Top"
+        );
+    }
+
+    /// Modulo with exact-zero divisor — must escape to Top.
+    #[test]
+    fn modulo_by_exact_zero_is_top() {
+        let a = IntervalFact {
+            lo: Some(0),
+            hi: Some(100),
+        };
+        let zero = IntervalFact::exact(0);
+        assert!(a.modulo(&zero).is_top());
+    }
+
+    /// Add involving Top stays Top on the unbounded side.
+    #[test]
+    fn add_with_top_is_top() {
+        let r = IntervalFact::exact(5).add(&IntervalFact::top());
+        assert!(r.is_top(), "5 + Top should be Top, got {:?}", r);
+    }
+
+    /// Subtraction: i64::MAX - i64::MIN should overflow gracefully.
+    #[test]
+    fn sub_overflow_extreme() {
+        let a = IntervalFact::exact(i64::MAX);
+        let b = IntervalFact::exact(i64::MIN);
+        let r = a.sub(&b); // i64::MAX - i64::MIN overflows
+        assert!(
+            r.lo.is_none() || r.hi.is_none(),
+            "extreme subtraction must not panic and must drop a bound"
+        );
+    }
+
+    /// `bottom().widen(x)` must be defined and converge.
+    #[test]
+    fn widen_with_bottom() {
+        let x = IntervalFact::exact(5);
+        let bot = IntervalFact::bottom();
+        let w1 = bot.widen(&x);
+        // Bottom widens to the new value (no growth observed yet).
+        assert_eq!(w1, x);
+        let w2 = x.widen(&bot);
+        assert_eq!(w2, x);
+    }
 }
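The interval edge-case tests above rely on overflow degrading a bound instead of panicking. Below is a minimal sketch of that pattern using Rust's `checked_sub` and `checked_div`. It assumes an interval whose `None` bound means "unbounded on that side". `Interval` and its methods are hypothetical stand-ins, not the `IntervalFact` API, and the sketch has no Bottom state.

```rust
/// Hypothetical interval: `None` means unbounded on that side, so
/// `Interval { lo: None, hi: None }` plays the role of Top.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Interval {
    lo: Option<i64>,
    hi: Option<i64>,
}

impl Interval {
    fn top() -> Self {
        Interval { lo: None, hi: None }
    }

    fn exact(v: i64) -> Self {
        Interval { lo: Some(v), hi: Some(v) }
    }

    /// `[a.lo - b.hi, a.hi - b.lo]`; an unbounded input or an overflowing
    /// `checked_sub` drops that bound to `None` instead of panicking.
    fn sub(self, o: Self) -> Self {
        Interval {
            lo: self.lo.zip(o.hi).and_then(|(a, b)| a.checked_sub(b)),
            hi: self.hi.zip(o.lo).and_then(|(a, b)| a.checked_sub(b)),
        }
    }

    /// Unbounded operands, a divisor spanning zero, or any overflowing corner
    /// (`i64::MIN / -1`) escape to Top. Otherwise truncating division is
    /// monotone in each operand, so the bounds sit at the four corners.
    fn div(self, o: Self) -> Self {
        let (a_lo, a_hi, b_lo, b_hi) = match (self.lo, self.hi, o.lo, o.hi) {
            (Some(a), Some(b), Some(c), Some(d)) => (a, b, c, d),
            _ => return Self::top(),
        };
        if b_lo <= 0 && b_hi >= 0 {
            return Self::top();
        }
        let corners = [
            a_lo.checked_div(b_lo),
            a_lo.checked_div(b_hi),
            a_hi.checked_div(b_lo),
            a_hi.checked_div(b_hi),
        ];
        if corners.iter().any(Option::is_none) {
            return Self::top();
        }
        let vals: Vec<i64> = corners.iter().flatten().copied().collect();
        Interval {
            lo: vals.iter().min().copied(),
            hi: vals.iter().max().copied(),
        }
    }

    /// Only handles a non-negative dividend and a divisor excluding zero.
    /// Rust's `%` takes the dividend's sign and `|r| < |d|`, so the result
    /// lies in `[0, min(hi, max|d| - 1)]`. Every other shape is Top.
    fn modulo(self, o: Self) -> Self {
        match (self.lo, self.hi, o.lo, o.hi) {
            (Some(a_lo), Some(a_hi), Some(b_lo), Some(b_hi))
                if a_lo >= 0 && (b_hi < 0 || b_lo > 0) =>
            {
                let max_abs = b_lo.unsigned_abs().max(b_hi.unsigned_abs());
                let bound = (max_abs - 1) as i64;
                Interval { lo: Some(0), hi: Some(a_hi.min(bound)) }
            }
            _ => Self::top(),
        }
    }
}
```

In this sketch, `exact(i64::MIN).sub(exact(1))` loses both bounds, and `exact(i64::MIN).div(exact(-1))` escapes to Top. `[0, 10] % -3` narrows to `[0, 2]`, while a divisor of `[-1, 1]` gives Top. These outcomes line up with the expectations in `overflow_sub`, `div_i64_min_by_minus_one_does_not_panic`, and the two modulo tests.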
@@ -424,6 +424,23 @@ pub fn classify_path_rejection_axes(text: &str) -> smallvec::SmallVec<[PathRejec
     let mut out: smallvec::SmallVec<[PathRejection; 3]> = smallvec::SmallVec::new();
     for clause in split_top_level_or(text) {
         let clause = clause.trim();
+        // Multi-axis special case: `!filepath.IsLocal(p)` (Go).
+        // `filepath.IsLocal` returns true iff the path stays within the
+        // current directory — no leading `/`, no `..` segments, no Windows
+        // drive root. Idiomatic Go path-traversal guard:
+        //     `if !filepath.IsLocal(p) { return }`
+        // The TRUE branch terminates; the FALSE branch (where IsLocal is
+        // true) proves both `dotdot = No` and `absolute = No` on the
+        // argument simultaneously. Recognise it here so both axes flow
+        // into the surviving branch's PathFact narrowing.
+        if has_negated_filepath_is_local(clause) {
+            for axis in [PathRejection::DotDot, PathRejection::IsAbsolute] {
+                if !out.contains(&axis) {
+                    out.push(axis);
+                }
+            }
+            continue;
+        }
         let cls = classify_path_rejection_atom(clause);
         if !matches!(cls, PathRejection::None) && !out.contains(&cls) {
             out.push(cls);
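`classify_path_rejection_axes` walks `split_top_level_or(text)`, whose body is not part of this diff. Below is a hypothetical sketch of such a splitter. It assumes the splitter breaks only on `||` at parenthesis depth zero, so a guard like `(a || b) || c` yields two clauses, not three. String literals are ignored for brevity, so a `"||"` inside quotes would still split.

```rust
/// Hypothetical sketch: split `text` on `||` that is not nested inside
/// parentheses. Pieces are returned untrimmed; the caller trims them.
fn split_top_level_or(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut parts = Vec::new();
    let mut depth: i32 = 0;
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'(' => depth += 1,
            b')' => depth -= 1,
            // `|` is ASCII, so `i` and `i + 2` are always char boundaries.
            b'|' if depth == 0 && i + 1 < bytes.len() && bytes[i + 1] == b'|' => {
                parts.push(&text[start..i]);
                i += 2;
                start = i;
                continue;
            }
            _ => {}
        }
        i += 1;
    }
    parts.push(&text[start..]);
    parts
}
```

With this sketch, `!filepath.IsLocal(p) || strings.Contains(p, "..")` splits into the `IsLocal` clause and the `Contains` clause. Each clause is then classified on its own, which is what lets the multi-axis special case see `!filepath.IsLocal(p)` in isolation.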
@@ -432,6 +449,29 @@ pub fn classify_path_rejection_axes(text: &str) -> smallvec::SmallVec<[PathRejec
     out
 }
 
+/// Detect `!filepath.IsLocal(<expr>)` — Go's idiomatic path-traversal
+/// guard. Whitespace-tolerant: `! filepath.IsLocal(`, `!filepath . IsLocal(`,
+/// etc. Used by [`classify_path_rejection_axes`] to inject both
+/// [`PathRejection::DotDot`] and [`PathRejection::IsAbsolute`] on the false
+/// branch (which is the local-path branch by construction).
+fn has_negated_filepath_is_local(clause: &str) -> bool {
+    // Strip surrounding parens once to handle `(!filepath.IsLocal(p))`.
+    let trimmed = clause.trim();
+    let inner = trimmed
+        .strip_prefix('(')
+        .and_then(|s| s.strip_suffix(')'))
+        .unwrap_or(trimmed)
+        .trim();
+    // Remove the leading `!` and any whitespace after it.
+    let after_not = match inner.strip_prefix('!') {
+        Some(rest) => rest.trim_start(),
+        None => return false,
+    };
+    // Drop all whitespace so `filepath . IsLocal(` also matches.
+    let compact: String = after_not.chars().filter(|c| !c.is_whitespace()).collect();
+    compact.starts_with("filepath.IsLocal(")
+}
+
 fn classify_path_rejection_atom(clause: &str) -> PathRejection {
     // `.contains("..")` (Rust, Java) / `.includes("..")` (JS/TS) /
     // `.include?("..")` (Ruby) / `strings.Contains(s, "..")` (Go) /
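`has_negated_filepath_is_local` is self-contained, so its paren and whitespace handling can be checked directly. The body below is copied from the added lines of the hunk above, with comments condensed. The example inputs mentioned after the block are illustrative.

```rust
/// Copied from the hunk above so its behaviour can be checked in isolation.
fn has_negated_filepath_is_local(clause: &str) -> bool {
    // Strip one layer of surrounding parens.
    let trimmed = clause.trim();
    let inner = trimmed
        .strip_prefix('(')
        .and_then(|s| s.strip_suffix(')'))
        .unwrap_or(trimmed)
        .trim();
    // Require a leading `!`, then skip whitespace after it.
    let after_not = match inner.strip_prefix('!') {
        Some(rest) => rest.trim_start(),
        None => return false,
    };
    // Remove every whitespace char before the prefix test.
    let compact: String = after_not.chars().filter(|c| !c.is_whitespace()).collect();
    compact.starts_with("filepath.IsLocal(")
}
```

`(!filepath.IsLocal(p))` and `! filepath . IsLocal( p )` both match. `filepath.IsLocal(p)` (no negation) and `!filepath.IsLocalPath(p)` do not. The final check is a `starts_with` on the compacted clause, so a conjunction such as `!filepath.IsLocal(p) && ok` also matches. Whether callers split `&&` before this point is not visible in this diff.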
@@ -76,32 +76,54 @@ impl StringFact {
 
     /// Exact known string value: prefix and suffix are the full string, and
     /// the finite domain is `{s}`.
+    ///
+    /// Empty prefix/suffix are normalised to `None` because "starts/ends with
+    /// the empty string" carries no constraint — keeping `Some("")` would
+    /// break join idempotence (`Some("")` ⊔ `Some("")` collapses to `None`).
     pub fn exact(s: &str) -> Self {
         let prefix = truncate_prefix(s);
         let suffix = truncate_suffix(s);
         Self {
-            prefix: Some(prefix),
-            suffix: Some(suffix),
+            prefix: if prefix.is_empty() {
+                None
+            } else {
+                Some(prefix)
+            },
+            suffix: if suffix.is_empty() {
+                None
+            } else {
+                Some(suffix)
+            },
             domain: Some(vec![s.to_string()]),
             is_bottom: false,
         }
     }
 
-    /// Known prefix only.
+    /// Known prefix only. Empty `p` normalises to no-prefix-info (`None`).
     pub fn from_prefix(p: &str) -> Self {
+        let prefix = truncate_prefix(p);
         Self {
-            prefix: Some(truncate_prefix(p)),
+            prefix: if prefix.is_empty() {
+                None
+            } else {
+                Some(prefix)
+            },
             suffix: None,
             domain: None,
             is_bottom: false,
         }
     }
 
-    /// Known suffix only.
+    /// Known suffix only. Empty `s` normalises to no-suffix-info (`None`).
     pub fn from_suffix(s: &str) -> Self {
+        let suffix = truncate_suffix(s);
         Self {
             prefix: None,
-            suffix: Some(truncate_suffix(s)),
+            suffix: if suffix.is_empty() {
+                None
+            } else {
+                Some(suffix)
+            },
             domain: None,
             is_bottom: false,
         }
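The doc comment on `exact` argues that keeping `Some("")` would break join idempotence. A hypothetical model of only the prefix component makes that argument concrete. It assumes, as `join_distinct_urls_reduces_to_lcp` suggests, that two known prefixes join to their longest common prefix and that an empty LCP is reported as `None`. `join_prefix` and `normalise` are illustrative names, not nyx functions, and Bottom is not modelled.

```rust
/// Char-aligned longest common prefix, same shape as the new
/// `longest_common_prefix` in this file.
fn lcp(a: &str, b: &str) -> String {
    a.chars()
        .zip(b.chars())
        .take_while(|(x, y)| x == y)
        .map(|(x, _)| x)
        .collect()
}

/// Hypothetical prefix join: `None` means "no prefix known". Two known
/// prefixes join to their LCP; an empty LCP carries no constraint and is
/// reported as `None`.
fn join_prefix(a: Option<&str>, b: Option<&str>) -> Option<String> {
    match (a, b) {
        (Some(x), Some(y)) => {
            let p = lcp(x, y);
            if p.is_empty() {
                None
            } else {
                Some(p)
            }
        }
        _ => None,
    }
}

/// Hypothetical constructor-side normalisation, mirroring the change to
/// `exact` / `from_prefix`: an empty prefix becomes `None` up front.
fn normalise(p: &str) -> Option<String> {
    if p.is_empty() {
        None
    } else {
        Some(p.to_string())
    }
}
```

Here `join_prefix(Some(""), Some(""))` returns `None`, which is not its input, so `x ⊔ x = x` fails for an un-normalised empty prefix. After normalisation the value is already `None`, and `None ⊔ None = None` holds.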
@@ -386,25 +408,31 @@ fn truncate_suffix(s: &str) -> String {
     }
 }
 
-/// Longest common prefix of two strings.
+/// Longest common prefix of two strings, char-aligned.
+///
+/// Iterates by `char` rather than by byte, so multi-byte UTF-8 code points
+/// are either kept whole or dropped. A byte-wise comparison can stop in the
+/// middle of a code point, and `x as char` maps every non-ASCII byte to an
+/// unrelated Latin-1 character, producing mojibake (`é` becomes `Ã©`).
 pub fn longest_common_prefix(a: &str, b: &str) -> String {
-    a.bytes()
-        .zip(b.bytes())
+    a.chars()
+        .zip(b.chars())
         .take_while(|(x, y)| x == y)
-        .map(|(x, _)| x as char)
+        .map(|(x, _)| x)
         .collect()
 }
 
-/// Longest common suffix of two strings.
+/// Longest common suffix of two strings, char-aligned.
 pub fn longest_common_suffix(a: &str, b: &str) -> String {
-    let lcs: String = a
-        .bytes()
+    let mut lcs: Vec<char> = a
+        .chars()
         .rev()
-        .zip(b.bytes().rev())
+        .zip(b.chars().rev())
         .take_while(|(x, y)| x == y)
-        .map(|(x, _)| x as char)
+        .map(|(x, _)| x)
         .collect();
-    lcs.chars().rev().collect()
+    lcs.reverse();
+    lcs.into_iter().collect()
 }
 
 #[cfg(test)]
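To see the mojibake the new doc comment describes, the removed byte-wise loop and the added char-wise loops can be run side by side. The function names below are illustrative. The bodies follow the removed and added lines of this hunk.

```rust
/// Pre-change LCP (the removed lines): each shared byte is cast to `char`
/// on its own, so non-ASCII bytes become unrelated Latin-1 characters.
fn lcp_bytes(a: &str, b: &str) -> String {
    a.bytes()
        .zip(b.bytes())
        .take_while(|(x, y)| x == y)
        .map(|(x, _)| x as char)
        .collect()
}

/// Post-change LCP (the added lines): compares whole `char`s.
fn lcp_chars(a: &str, b: &str) -> String {
    a.chars()
        .zip(b.chars())
        .take_while(|(x, y)| x == y)
        .map(|(x, _)| x)
        .collect()
}

/// Post-change LCS: collect reversed chars, then restore the order.
fn lcs_chars(a: &str, b: &str) -> String {
    let mut lcs: Vec<char> = a
        .chars()
        .rev()
        .zip(b.chars().rev())
        .take_while(|(x, y)| x == y)
        .map(|(x, _)| x)
        .collect();
    lcs.reverse();
    lcs.into_iter().collect()
}
```

`é` is the two bytes `0xC3 0xA9`. `lcp_bytes("éclair", "éclat")` therefore yields `Ã©cla`, while `lcp_chars` yields `écla`. When two strings share only a lead byte, as `é` (`0xC3 0xA9`) and `è` (`0xC3 0xA8`) do, the byte-wise result is a lone `Ã` and the char-wise result is empty.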
@@ -675,4 +703,256 @@ mod tests {
             !StringFact::finite_set(vec!["ls".into(), "rm;reboot".into()]).is_finite_shell_safe()
         );
     }
+
+    /// `concat("", x)` and `concat(x, "")` must keep what the non-empty
+    /// operand contributes. `concat` keeps the LHS prefix and the RHS suffix
+    /// verbatim, and after empty-string normalisation `exact("")` carries no
+    /// prefix/suffix info. So `concat("", x)` keeps x's suffix with prefix
+    /// `None` (unknown), and `concat(x, "")` keeps x's prefix with suffix
+    /// `None`.
+    #[test]
+    fn concat_empty_string_lhs_preserves_rhs_suffix() {
+        let empty = StringFact::exact("");
+        let rhs = StringFact::exact("x");
+        let r = empty.concat(&rhs);
+        assert_eq!(r.prefix, None);
+        assert_eq!(r.suffix.as_deref(), Some("x"));
+    }
+
+    #[test]
+    fn concat_empty_string_rhs_preserves_lhs_prefix() {
+        let lhs = StringFact::exact("x");
+        let empty = StringFact::exact("");
+        let r = lhs.concat(&empty);
+        assert_eq!(r.prefix.as_deref(), Some("x"));
+        assert_eq!(r.suffix, None);
+    }
+
+    /// Bottom is concat-absorbing: concat with bottom in either
+    /// position yields bottom (no flow can reach the call site).
+    #[test]
+    fn concat_with_bottom_is_bottom() {
+        let bot = StringFact::bottom();
+        let any = StringFact::exact("anything");
+        assert!(bot.concat(&any).is_bottom());
+        assert!(any.concat(&bot).is_bottom());
+    }
+
+    /// Joining two distinct URL prefixes must reduce to their LCP, not
+    /// fall through to `None`. This is the property SSRF prefix-lock
+    /// suppression depends on at phi nodes.
+    #[test]
+    fn join_distinct_urls_reduces_to_lcp() {
+        let a = StringFact::from_prefix("https://api.example.com/");
+        let b = StringFact::from_prefix("https://db.example.com/");
+        let r = a.join(&b);
+        // Common prefix is "https://" — anything past that diverges.
+        assert_eq!(
+            r.prefix.as_deref(),
+            Some("https://"),
+            "join must compute LCP, not drop the prefix entirely"
+        );
+    }
+
+    /// Meet of two prefix-locks with no overlap must collapse to
+    /// bottom (it represents an unsatisfiable conjunction).
+    #[test]
+    fn meet_disjoint_prefixes_is_bottom() {
+        let a = StringFact::from_prefix("/var/");
+        let b = StringFact::from_prefix("/etc/");
+        let r = a.meet(&b);
+        assert!(
+            r.is_bottom(),
+            "meet of disjoint prefix-locks must be bottom"
+        );
+    }
+
+    // ── Additional lattice algebra laws ──────────────────────────────
+
+    fn sample_strings() -> Vec<StringFact> {
+        vec![
+            StringFact::bottom(),
+            StringFact::top(),
+            StringFact::exact(""),
+            StringFact::exact("hello"),
+            StringFact::from_prefix("https://"),
+            StringFact::from_suffix(".com"),
+            StringFact::finite_set(vec!["a".into(), "b".into()]),
+        ]
+    }
+
+    /// `x ⊔ x = x` — join is idempotent across all sample shapes.
+    #[test]
+    fn join_idempotent_string() {
+        for a in sample_strings() {
+            assert_eq!(a.join(&a), a, "join not idempotent for {:?}", a);
+        }
+    }
+
+    /// `x ⊔ y = y ⊔ x` — join is commutative.
+    #[test]
+    fn join_commutative_string() {
+        let xs = sample_strings();
+        for a in &xs {
+            for b in &xs {
+                assert_eq!(
+                    a.join(b),
+                    b.join(a),
+                    "join not commutative for {:?} / {:?}",
+                    a,
+                    b
+                );
+            }
+        }
+    }
+
+    /// `x ⊓ x = x` — meet is idempotent.
+    #[test]
+    fn meet_idempotent_string() {
+        for a in sample_strings() {
+            assert_eq!(a.meet(&a), a, "meet not idempotent for {:?}", a);
+        }
+    }
+
+    /// `x ⊓ y = y ⊓ x` — meet is commutative.
+    #[test]
+    fn meet_commutative_string() {
+        let xs = sample_strings();
+        for a in &xs {
+            for b in &xs {
+                assert_eq!(
+                    a.meet(b),
+                    b.meet(a),
+                    "meet not commutative for {:?} / {:?}",
+                    a,
+                    b
+                );
+            }
+        }
+    }
+
+    /// `x ⊓ ⊤ = x` and `x ⊓ ⊥ = ⊥`.
+    #[test]
+    fn meet_identity_string() {
+        for a in sample_strings() {
+            assert_eq!(a.meet(&StringFact::top()), a, "x ⊓ ⊤ failed for {:?}", a);
+            assert!(
+                a.meet(&StringFact::bottom()).is_bottom(),
+                "x ⊓ ⊥ failed for {:?}",
+                a
+            );
+        }
+    }
+
+    /// `x ⊑ x` — leq is reflexive.
+    #[test]
+    fn leq_reflexive_string() {
+        for a in sample_strings() {
+            assert!(a.leq(&a), "x ⊑ x failed for {:?}", a);
+        }
+    }
+
+    /// **Soundness**: `widen(a, b) ⊒ join(a, b)`. Widening must
+    /// over-approximate join, otherwise the dataflow fixpoint can be unsound.
+    #[test]
+    fn widen_over_approximates_join_string() {
+        let xs = sample_strings();
+        for a in &xs {
+            for b in &xs {
+                let j = a.join(b);
+                let w = a.widen(b);
+                assert!(
+                    j.leq(&w),
+                    "widen({:?}, {:?}) = {:?} does not over-approximate join = {:?}",
+                    a,
+                    b,
+                    w,
+                    j
+                );
+            }
+        }
+    }
+
#[test]
|
||||
fn widen_idempotent_string() {
|
||||
for a in sample_strings() {
|
||||
assert_eq!(a.widen(&a), a, "widen(x, x) failed for {:?}", a);
|
||||
}
|
||||
}
|
||||
|
||||
/// Join is upper bound: `a ⊑ a ⊔ b` and `b ⊑ a ⊔ b`.
|
||||
#[test]
|
||||
fn join_is_upper_bound_string() {
|
||||
let xs = sample_strings();
|
||||
for a in &xs {
|
||||
for b in &xs {
|
||||
let j = a.join(b);
|
||||
assert!(
|
||||
a.leq(&j),
|
||||
"a ⊑ a ⊔ b failed for {:?}, {:?} (join={:?})",
|
||||
a,
|
||||
b,
|
||||
j
|
||||
);
|
||||
assert!(
|
||||
b.leq(&j),
|
||||
"b ⊑ a ⊔ b failed for {:?}, {:?} (join={:?})",
|
||||
a,
|
||||
b,
|
||||
j
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Empty-string exact value must distinguish from Top — it is a
|
||||
/// singleton (`{""}`), not unconstrained. After the empty-prefix
|
||||
/// normalisation, prefix/suffix are `None` (carry no extra info)
|
||||
/// but the `domain` field still pins the value to exactly `""`.
|
||||
#[test]
|
||||
fn exact_empty_string_is_not_top() {
|
||||
let e = StringFact::exact("");
|
||||
assert!(!e.is_top(), "exact(\"\") must not be Top");
|
||||
assert!(!e.is_bottom(), "exact(\"\") must not be Bottom");
|
||||
assert_eq!(e.prefix, None, "empty prefix normalised to None");
|
||||
assert_eq!(e.suffix, None, "empty suffix normalised to None");
|
||||
assert_eq!(e.domain.as_deref(), Some(&[String::new()][..]));
|
||||
}
|
||||
|
||||
/// LCP/LCS with multi-byte UTF-8 chars must not split a code point
|
||||
/// (would produce invalid UTF-8 strings or panic).
|
||||
#[test]
|
||||
fn lcp_lcs_unicode_safe() {
|
||||
// Both start with é (2-byte char in UTF-8).
|
||||
let a = StringFact::exact("éclair");
|
||||
let b = StringFact::exact("éclat");
|
||||
let j = a.join(&b);
|
||||
// LCP should be "écla" (still valid UTF-8). At minimum it must
|
||||
// be a valid Rust string and not panic.
|
||||
let prefix = j.prefix.as_deref().unwrap_or("");
|
||||
assert!(prefix.is_char_boundary(prefix.len()));
|
||||
assert!(prefix.starts_with('é'));
|
||||
|
||||
// Suffix with multibyte: "café" vs "naïvé" share "é" suffix?
|
||||
// Simpler: both end with "好" (3-byte CJK).
|
||||
let a = StringFact::exact("你好");
|
||||
let b = StringFact::exact("您好");
|
||||
let j = a.join(&b);
|
||||
let suffix = j.suffix.as_deref().unwrap_or("");
|
||||
assert!(suffix.is_char_boundary(0) && suffix.is_char_boundary(suffix.len()));
|
||||
assert!(suffix.ends_with('好'));
|
||||
}
|
||||
|
||||
/// Concat with empty-string `exact("")` should preserve the other
|
||||
/// side's prefix/suffix knowledge (empty is the identity).
|
||||
#[test]
|
||||
fn concat_with_empty_exact_preserves_other() {
|
||||
let s = StringFact::exact("hello");
|
||||
let e = StringFact::exact("");
|
||||
let r = s.concat(&e);
|
||||
// Concat should preserve prefix from `s`.
|
||||
assert_eq!(r.prefix.as_deref(), Some("hello"));
|
||||
let r2 = e.concat(&s);
|
||||
assert_eq!(r2.suffix.as_deref(), Some("hello"));
|
||||
}
|
||||
}
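Each property test above uses the same pattern: enumerate a fixed sample set and assert a lattice law for every element or pair. The sketch below factors that pattern into one generic checker and runs it against a toy flat constant lattice (`Bottom ⊑ Const(n) ⊑ Top`). The `Lattice` trait, the `Flat` enum, and `check_laws` are hypothetical stand-ins, not the crate's `StringFact` API. The `x ⊓ ⊤ = x` identity is omitted because the toy trait has no `top()` constructor.

```rust
use std::fmt::Debug;

/// Minimal lattice interface for this sketch (hypothetical; not the crate's API).
trait Lattice: PartialEq + Debug {
    fn join(&self, other: &Self) -> Self;
    fn meet(&self, other: &Self) -> Self;
    fn leq(&self, other: &Self) -> bool;
    fn widen(&self, other: &Self) -> Self;
}

/// Toy flat constant lattice: Bottom ⊑ Const(n) ⊑ Top.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Flat {
    Bottom,
    Const(i64),
    Top,
}

impl Lattice for Flat {
    fn join(&self, other: &Self) -> Self {
        match (*self, *other) {
            (Flat::Bottom, x) | (x, Flat::Bottom) => x,
            (Flat::Const(a), Flat::Const(b)) if a == b => Flat::Const(a),
            _ => Flat::Top,
        }
    }

    fn meet(&self, other: &Self) -> Self {
        match (*self, *other) {
            (Flat::Top, x) | (x, Flat::Top) => x,
            (Flat::Const(a), Flat::Const(b)) if a == b => Flat::Const(a),
            _ => Flat::Bottom,
        }
    }

    // Order derived from join: a ⊑ b iff a ⊔ b = b.
    fn leq(&self, other: &Self) -> bool {
        self.join(other) == *other
    }

    // The lattice has finite height, so plain join already terminates.
    fn widen(&self, other: &Self) -> Self {
        self.join(other)
    }
}

/// Checks the laws exercised by the tests above over every sample pair.
fn check_laws<L: Lattice>(samples: &[L]) -> Result<(), String> {
    for a in samples {
        if a.join(a) != *a {
            return Err(format!("join not idempotent for {:?}", a));
        }
        if a.meet(a) != *a {
            return Err(format!("meet not idempotent for {:?}", a));
        }
        if !a.leq(a) {
            return Err(format!("leq not reflexive for {:?}", a));
        }
        for b in samples {
            let j = a.join(b);
            if j != b.join(a) {
                return Err(format!("join not commutative for {:?} / {:?}", a, b));
            }
            if a.meet(b) != b.meet(a) {
                return Err(format!("meet not commutative for {:?} / {:?}", a, b));
            }
            if !a.leq(&j) || !b.leq(&j) {
                return Err(format!("join not an upper bound for {:?} / {:?}", a, b));
            }
            // Soundness: widen must sit at or above join.
            if !j.leq(&a.widen(b)) {
                return Err(format!("widen under-approximates join for {:?} / {:?}", a, b));
            }
        }
    }
    Ok(())
}
```

Deriving `leq` from `join` makes the order and the join consistent by construction. A domain whose `leq` is implemented independently of `join` gets more value from these checks, because the two can drift apart.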
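`lcp_lcs_unicode_safe` only requires that the joined prefix and suffix end on char boundaries. One way to guarantee this is to compare `char`s rather than bytes, so every slice index lands between code points. Below is a minimal sketch. `common_prefix` and `common_suffix` are hypothetical helpers and are not taken from the `StringFact` implementation.

```rust
/// Longest common prefix of `a` and `b`, cut only on a char boundary of `a`.
fn common_prefix<'a>(a: &'a str, b: &str) -> &'a str {
    let mut end = 0;
    for ((i, ca), cb) in a.char_indices().zip(b.chars()) {
        if ca != cb {
            break;
        }
        // Byte offset just past the matched char.
        end = i + ca.len_utf8();
    }
    &a[..end]
}

/// Longest common suffix of `a` and `b`, cut only on a char boundary of `a`.
fn common_suffix<'a>(a: &'a str, b: &str) -> &'a str {
    let mut start = a.len();
    for ((i, ca), cb) in a.char_indices().rev().zip(b.chars().rev()) {
        if ca != cb {
            break;
        }
        // Byte offset where the matched char starts.
        start = i;
    }
    &a[start..]
}
```

Both loops advance one `char` at a time, so `end` and `start` are always valid boundaries. The final slice therefore cannot panic, even for 2-byte (`é`) or 3-byte (`好`) code points.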
1611
src/ast.rs
File diff suppressed because it is too large
@@ -1,7 +1,7 @@
 use super::config::AuthAnalysisRules;
 use super::model::{
-    AnalysisUnit, AuthCheck, AuthCheckKind, AuthorizationModel, OperationKind, SensitiveOperation,
-    ValueRef, ValueSourceKind,
+    AnalysisUnit, AnalysisUnitKind, AuthCheck, AuthCheckKind, AuthorizationModel, OperationKind,
+    SensitiveOperation, ValueRef, ValueSourceKind,
 };
 use crate::patterns::Severity;
 
@@ -67,6 +67,9 @@ fn check_ownership_gaps(model: &AuthorizationModel, rules: &AuthAnalysisRules) -
     let mut findings = Vec::new();
 
     for unit in &model.units {
+        if !unit_has_user_input_evidence(unit) {
+            continue;
+        }
         for op in &unit.operations {
             if op.kind == OperationKind::TokenLookup {
                 continue;
@@ -116,6 +119,9 @@ fn check_partial_batch_authorization(
     let mut findings = Vec::new();
 
     for unit in &model.units {
+        if !unit_has_user_input_evidence(unit) {
+            continue;
+        }
         for op in &unit.operations {
             // In-memory bookkeeping is never a batch sink.
             if op.sink_class.is_some_and(|c| !c.is_auth_relevant()) {
@@ -167,6 +173,9 @@ fn check_stale_authorization(
     let mut findings = Vec::new();
 
     for unit in &model.units {
+        if !unit_has_user_input_evidence(unit) {
+            continue;
+        }
         for op in unit.operations.iter().filter(|operation| {
             operation.kind == OperationKind::Mutation
                 && operation.sink_class.is_none_or(|c| c.is_auth_relevant())
@@ -211,6 +220,18 @@ fn check_token_override_without_validation(
     let mut findings = Vec::new();
 
     for unit in &model.units {
+        // The rule reasons about "Token acceptance flow" — by
+        // construction, that is a user-facing handler that receives a
+        // token from the client and writes through token-bound state.
+        // Internal helpers, Celery / cron tasks, Django migrations,
+        // pytest fixtures, and seed-data utilities have no user reach
+        // and cannot host a token-acceptance flow even when their
+        // call shape happens to look token-y (`account.token = …;
+        // account.save()`). Gate on positive user-input evidence so
+        // these pure backend units are never claimed as a token flow.
+        if !unit_has_user_input_evidence(unit) {
+            continue;
+        }
         let Some(token_lookup) = unit
             .operations
             .iter()
@@ -293,6 +314,10 @@ fn has_prior_subject_auth(
     op: &SensitiveOperation,
     subjects: &[&ValueRef],
 ) -> bool {
+    if has_row_fetch_exemption(unit, op) {
+        return true;
+    }
+
     let relevant_checks = unit.auth_checks.iter().filter(|check| {
         check.line <= op.line
             && !matches!(
@@ -310,6 +335,70 @@ fn has_prior_subject_auth(
     })
 }
 
+/// Phase A4 row-fetch exemption.
+///
+/// Recognises the canonical "fetch-then-authorize" idiom in row-level
+/// authz code: a route handler fetches a row by id (`let community =
+/// Community::read(pool, data.community_id)?`), then calls a named
+/// authorization function on the fetched row (`check_community_user_action(
+/// &user, &community, ...)`). The authorization check appears
+/// textually after the fetch, so the existing `check.line <= op.line`
+/// rule cannot cover the fetch.
+///
+/// The exemption fires only when:
+/// 1. `op` is the row-fetch operation itself (line == row let-line).
+/// 2. SOME auth check in the unit names the resulting row variable as
+///    a subject (directly or via `check.subjects[i].base`).
+///
+/// Coverage is intentionally narrow: only the row-fetch operation is
+/// exempted. Any sink that runs *between* the fetch and the check
+/// (e.g. `delete(community)` before `check_*`) still flags, because
+/// its subject is `community` itself — not a fetch arg — and we
+/// require the operation to be a row-fetch site to apply the
+/// exemption.
+fn has_row_fetch_exemption(unit: &AnalysisUnit, op: &SensitiveOperation) -> bool {
+    // Find the row var (if any) declared at this op's line.
+    let row_var: Option<&str> = unit
+        .row_population_data
+        .iter()
+        .find_map(|(var, (line, _))| {
+            if *line == op.line {
+                Some(var.as_str())
+            } else {
+                None
+            }
+        });
+    let Some(row_var) = row_var else {
+        return false;
+    };
+
+    // Look for any non-login auth check whose subjects mention the row.
+    // Match against the *root* of the subject's chain (`a.b.c` → `a`)
+    // so an auth check on a row's nested field — e.g.
+    // `is_mod_or_admin(pool, &user, comment_view.community.id)` —
+    // still names the row var.
+    unit.auth_checks.iter().any(|check| {
+        if matches!(
+            check.kind,
+            AuthCheckKind::LoginGuard | AuthCheckKind::TokenExpiry | AuthCheckKind::TokenRecipient
+        ) {
+            return false;
+        }
+        check
+            .subjects
+            .iter()
+            .any(|subj| chain_root(subj) == row_var)
+    })
+}
+
+/// Root segment of a subject's chain. Subjects produced from
+/// `a.b.c` carry `name = "a.b.c"` and `base = Some("a.b")`; the root
+/// is `a`. Bare identifiers carry `base = None` and use `name`.
+fn chain_root(subj: &ValueRef) -> &str {
+    let raw = subj.base.as_deref().unwrap_or(subj.name.as_str());
+    raw.split('.').next().unwrap_or(raw)
+}
+
 fn has_prior_collection_auth(
     unit: &AnalysisUnit,
     op: &SensitiveOperation,
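Without the crate's types, the exemption comes down to two lookups. First, find the row variable declared on the operation's own line. Second, ask whether any non-login check names that row as a chain root. The sketch below makes several simplifying assumptions. Subjects are plain dotted strings rather than `ValueRef`s. The `rows` map holds only each row's declaration line. `Kind` is a hypothetical two-variant stand-in for `AuthCheckKind`, whereas the real code also skips `TokenExpiry` and `TokenRecipient`.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the check kinds that matter here.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind {
    LoginGuard,
    Membership,
}

/// Simplified auth check; subjects are dotted paths such as "community.id".
struct Check {
    kind: Kind,
    subjects: Vec<&'static str>,
}

/// Root segment of a dotted path: "a.b.c" -> "a".
fn chain_root(path: &str) -> &str {
    path.split('.').next().unwrap_or(path)
}

/// `rows` maps each row variable to the line of its `let row = fetch(..)`.
fn row_fetch_exempt(rows: &HashMap<&str, usize>, checks: &[Check], op_line: usize) -> bool {
    // The op must be the fetch itself: some row var is declared on its line.
    let Some(row) = rows
        .iter()
        .find_map(|(var, line)| (*line == op_line).then_some(*var))
    else {
        return false;
    };
    // A non-login check must name that row (or one of its fields).
    checks.iter().any(|c| {
        c.kind != Kind::LoginGuard && c.subjects.iter().any(|s| chain_root(s) == row)
    })
}
```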
@@ -351,6 +440,56 @@ fn auth_check_covers_subject(check: &AuthCheck, subject: &ValueRef, unit: &Analy
         .iter()
         .any(|name| unit.authorized_sql_vars.contains(name));
 
+    // **Row-population reverse-walk** (lemmy fetch-then-check pattern).
+    //
+    // `row_population_data[R]` records the value-refs of every arg
+    // passed to a `let R = CALL(args)` row fetch. When a later auth
+    // check authorizes the resulting row (e.g. `check_community_user_action(
+    // &user, &community, ..)` after `let community = Community::read(
+    // pool, data.community_id)`), the check materially covers
+    // `data.community_id` too — it gated access to the row that was
+    // fetched using that id, so any subsequent operation re-using the
+    // same id (read of a related view, mutation on the row itself) is
+    // within the scope of that authorization.
+    //
+    // Match by canonical subject name so `data.community_id`,
+    // `community_id`, `data.comment_id`, etc. all resolve uniformly
+    // regardless of whether the route handler aliased the request
+    // field into a local before passing it on.
+    //
+    // **Local-alias chain.** When the subject is a plain identifier
+    // (no base/field), also consult `unit.var_alias_chain`: a sink
+    // that uses `community_id` after `let community_id =
+    // req.community_id` should see the population args recorded as
+    // `req.community_id` matched, not just the bare name.
+    let subject_alias_chain: Option<&str> = if subject.base.is_none() && subject.field.is_none() {
+        unit.var_alias_chain.get(&subject.name).map(|s| s.as_str())
+    } else {
+        None
+    };
+    let subject_populates: Vec<&str> = unit
+        .row_population_data
+        .iter()
+        .filter_map(|(row_var, (_line, args))| {
+            let matches_arg = args.iter().any(|arg| {
+                if canonical_subject_name(arg) == subject_key {
+                    return true;
+                }
+                if let Some(chain) = subject_alias_chain
+                    && arg.name == chain
+                {
+                    return true;
+                }
+                false
+            });
+            if matches_arg {
+                Some(row_var.as_str())
+            } else {
+                None
+            }
+        })
+        .collect();
+
     check.subjects.iter().any(|check_subject| {
         let check_key = canonical_subject_name(check_subject);
         let check_related_base = related_subject_base(check_subject);
@@ -366,6 +505,14 @@ fn auth_check_covers_subject(check: &AuthCheck, subject: &ValueRef, unit: &Analy
                 return true;
             }
         }
+        // Row-population reverse-walk: subject was passed to a row
+        // fetch, and the check covers that row (chain root match on
+        // the row var).
+        for row in &subject_populates {
+            if chain_root(check_subject) == *row {
+                return true;
+            }
+        }
         // B3: SQL synth checks name the auth-gated row var directly.
         // If our subject's row chain leads into the same authorized
         // var family this check anchors to, accept the coverage.
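The reverse-walk answers two questions in turn: which rows did this subject help fetch, and does the check name one of those rows? Below is a sketch of the same logic using plain strings. It assumes that argument and subject keys are already canonicalised dotted paths, standing in for `canonical_subject_name`. `alias_source` plays the role of the `var_alias_chain` lookup, and all function names are hypothetical.

```rust
use std::collections::HashMap;

/// Root segment of a dotted path: "a.b.c" -> "a".
fn chain_root(path: &str) -> &str {
    path.split('.').next().unwrap_or(path)
}

/// Rows whose fetch call received `subject` (or the source it aliases) as an argument.
/// `population` maps row var -> argument paths, e.g. "community" -> ["data.community_id"].
fn rows_populated_by<'a>(
    population: &'a HashMap<String, Vec<String>>,
    subject: &str,
    alias_source: Option<&str>,
) -> Vec<&'a str> {
    population
        .iter()
        .filter(|(_, args)| {
            args.iter()
                .any(|arg| arg.as_str() == subject || alias_source == Some(arg.as_str()))
        })
        .map(|(row, _)| row.as_str())
        .collect()
}

/// True when some check subject is rooted at a row that `subject` helped fetch.
fn covered_by_row_check(
    check_subjects: &[&str],
    population: &HashMap<String, Vec<String>>,
    subject: &str,
    alias_source: Option<&str>,
) -> bool {
    let rows = rows_populated_by(population, subject, alias_source);
    check_subjects
        .iter()
        .any(|cs| rows.iter().any(|row| *row == chain_root(cs)))
}
```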
@@ -426,7 +573,54 @@ fn related_subject_base(subject: &ValueRef) -> Option<String> {
 }
 
 fn is_relevant_target_subject(subject: &ValueRef, unit: &AnalysisUnit) -> bool {
-    is_id_like(subject) && !is_actor_context_subject(subject, unit)
+    is_id_like(subject)
+        && !is_actor_context_subject(subject, unit)
+        && !is_const_bound_subject(subject, unit)
+        && !is_typed_bounded_subject(subject, unit)
 }
 
+/// True iff `subject` is a plain identifier whose declaration binds
+/// it to a literal constant (`id := "id"`, `let userId = 1`, etc.).
+/// Such bindings cannot be user-controlled and so must not be
+/// classified as scoped-identifier subjects. Only matches plain
+/// `Identifier`-kind subjects (no base/field) — member chains like
+/// `req.params.id` still pass through to the regular checks.
+fn is_const_bound_subject(subject: &ValueRef, unit: &AnalysisUnit) -> bool {
+    if subject.base.is_some() || subject.field.is_some() {
+        return false;
+    }
+    unit.const_bound_vars.contains(&subject.name)
+}
+
+/// True iff `subject` is a plain identifier that resolves to a
+/// function parameter whose static type is a payload-incompatible
+/// scalar (numeric or boolean — see [`super::apply_typed_bounded_params`]).
+/// Spring `@PathVariable Long userId`, Axum `Path<i64>`, NestJS
+/// `@Param('id') id: number`, and FastAPI `user_id: int` all qualify.
+///
+/// Phase 6: also matches member-access subjects like `dto.userId`
+/// when `dto` is a typed-extractor parameter recognised by a Phase
+/// 1-2 matcher AND the field's declared TypeKind is Int/Bool.
+fn is_typed_bounded_subject(subject: &ValueRef, unit: &AnalysisUnit) -> bool {
+    if subject.base.is_none() && subject.field.is_none() {
+        return unit.typed_bounded_vars.contains(&subject.name);
+    }
+    // Phase 6: member-access shape `base.field` whose `base` is a
+    // typed-extractor parameter and whose field is declared as an
+    // Int/Bool in the same-file DTO definition. Per Hard Rule 3,
+    // only fires when the base param itself was recognised by a
+    // Phase 1-2 matcher — bare `dto.age` without a framework gate
+    // never lifts.
+    let Some(base) = subject.base.as_deref() else {
+        return false;
+    };
+    let Some(field) = subject.field.as_deref() else {
+        return false;
+    };
+    let root = base.split('.').next().unwrap_or(base);
+    unit.typed_bounded_dto_fields
+        .get(root)
+        .is_some_and(|fields| fields.iter().any(|f| f == field))
+}
+
 fn is_actor_context_subject(subject: &ValueRef, unit: &AnalysisUnit) -> bool {
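The two branches of `is_typed_bounded_subject` can be illustrated with simplified inputs. In this sketch, `Subject` is a hypothetical stand-in for `ValueRef`. `bounded_vars` mirrors `typed_bounded_vars`, and `dto_int_fields` mirrors `typed_bounded_dto_fields`, which maps each extractor param to its fields declared as Int/Bool.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified subject shape (hypothetical stand-in for `ValueRef`).
struct Subject<'a> {
    name: &'a str,
    base: Option<&'a str>,
    field: Option<&'a str>,
}

fn is_typed_bounded(
    s: &Subject,
    bounded_vars: &HashSet<String>,
    dto_int_fields: &HashMap<String, Vec<String>>,
) -> bool {
    match (s.base, s.field) {
        // Plain identifier: bounded iff its parameter type was recovered as Int/Bool.
        (None, None) => bounded_vars.contains(s.name),
        // `base.field`: the root of `base` must be a recognised extractor param,
        // and `field` must be one of its Int/Bool-declared fields.
        (Some(base), Some(field)) => {
            let root = base.split('.').next().unwrap_or(base);
            dto_int_fields
                .get(root)
                .is_some_and(|fields| fields.iter().any(|f| f == field))
        }
        // Half-formed shapes never lift.
        _ => false,
    }
}
```

Keying the DTO map by the chain root means `dto.userId` is lifted only when `dto` itself was recorded. This matches the "bare `dto.age` without a framework gate never lifts" rule in the comment above.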
@@ -434,6 +628,20 @@ fn is_actor_context_subject(subject: &ValueRef, unit: &AnalysisUnit) -> bool {
         return true;
     }
 
+    // Per-unit dynamic session-base set (TRPC `Options { ctx: { user:
+    // TrpcSessionUser } }` populates `<localCtx>.user` via the
+    // typed-extractor pre-pass). The static `is_self_scoped_session_base`
+    // list deliberately omits bare `ctx.user` because `ctx` is generic
+    // and a blanket addition over-suppresses in non-TRPC code; this
+    // branch fires only when the param's static type literally
+    // references `TrpcSessionUser` (or a known TRPC alias).
+    if let Some(base) = subject.base.as_deref()
+        && unit.self_scoped_session_bases.contains(base)
+        && subject.field.as_deref().is_some_and(is_self_actor_id_field)
+    {
+        return true;
+    }
+
     // A3: `V.id`-shape subjects where `V` is bound from a login-guard /
     // auth-check call (or from a typed self-actor extractor parameter)
     // are the caller's own id. `V.group_id` / `V.workspace_id` stay
@@ -563,7 +771,7 @@ fn is_delegated_read_with_actor_context(
     op: &SensitiveOperation,
     relevant_subjects: &[&ValueRef],
 ) -> bool {
-    unit.kind == super::model::AnalysisUnitKind::RouteHandler
+    unit.kind == AnalysisUnitKind::RouteHandler
         && op.kind == OperationKind::Read
         && op.callee.to_ascii_lowercase().contains("service")
         && op.subjects.iter().any(is_self_scoped_session_subject)
@@ -583,7 +791,15 @@ fn is_id_like(subject: &ValueRef) -> bool {
         .as_deref()
         .or(subject.base.as_deref())
         .unwrap_or(&subject.name);
-    let lower = field.to_ascii_lowercase();
+    is_id_like_name(field)
+}
+
+/// String-level analogue of `is_id_like` for working with parameter
+/// names (which carry no `ValueRef` structure). Mirrors the same
+/// suffix vocabulary so a parameter `doc_id` / `groupId` / `userIds`
+/// is recognised as an id-bearing input.
+fn is_id_like_name(name: &str) -> bool {
+    let lower = name.to_ascii_lowercase();
     lower == "id"
         || lower.ends_with("id")
         || lower.ends_with("_id")
@@ -593,6 +809,86 @@ fn is_id_like(subject: &ValueRef) -> bool {
         || lower.contains("noteid")
 }
 
+/// True when the analysis unit shows positive evidence of receiving
+/// user-controlled input — the precondition for any auth rule that
+/// reasons about "scoped identifier" or "token-acceptance flow"
+/// shapes.
+///
+/// A unit qualifies if any of the following hold:
+/// * It is a recognised framework route handler (`RouteHandler` —
+///   the strongest signal: registered with a router).
+/// * It accesses a request-shaped value (`request.body`, `req.params`,
+///   `c.Query(..)`, etc.) — populated as `context_inputs`.
+/// * It declares at least one parameter whose name signals an
+///   externally-supplied value (id-like, token-like, request-like).
+///   Internal helpers that take only typed objects
+///   (`promotion: Promotion`, `apps`, `schema_editor`, `config`,
+///   `items`) are excluded.
+///
+/// Migrations, Celery tasks, pytest fixtures, conftest hooks, and
+/// pure utility helpers fail all three conditions and are skipped —
+/// they cannot, by construction, be the entry point of an
+/// authentication-bearing flow.
+fn unit_has_user_input_evidence(unit: &AnalysisUnit) -> bool {
+    if unit.kind == AnalysisUnitKind::RouteHandler {
+        return true;
+    }
+    if !unit.context_inputs.is_empty() {
+        return true;
+    }
+    unit.params.iter().any(|p| is_external_input_param_name(p))
+}
+
+/// Parameter-name heuristic: does this name carry external/user input
+/// as part of its calling contract? Captures three classes of name:
+/// * id-like (`*_id`, `*Id`, `id`, `*Ids`),
+/// * token-like (`token`, `*_token`, `accessToken`),
+/// * framework-request objects (`request`, `req`, `ctx` — the
+///   standard names used by Express/Django/Flask/Gin/Axum/NestJS
+///   handlers as the parameter that carries the HTTP request).
+///
+/// Used by `unit_has_user_input_evidence` to recognise helper
+/// functions that, while not registered as route handlers, are
+/// clearly invoked with caller-supplied identifiers or request data.
+fn is_external_input_param_name(name: &str) -> bool {
+    if is_id_like_name(name) {
+        return true;
+    }
+    let lower = name.to_ascii_lowercase();
+    // Token-shaped: bare `token` or any `*_token` / `*Token` /
+    // `accessToken` / `refreshToken`-style suffix. Conservative —
+    // only fires on explicit token-naming, not on incidental
+    // substrings.
+    if lower == "token" || lower.ends_with("_token") || lower.ends_with("token") {
+        return true;
+    }
+    // Standard framework request-parameter names. These cover the
+    // cross-language convention for the parameter holding the HTTP
+    // request object (`req` / `request` / `ctx` / `context` / `info`)
+    // **and** the typed-extractor parameter naming used by
+    // Axum/Actix/NestJS handlers (`path`, `payload`, `body`, `dto`,
+    // `form`, `query`). In `web::Path<String>` / `web::Json<T>` /
+    // `@Body() dto: ...` the parameter name itself is the standard
+    // convention used by every example in the framework docs, so
+    // matching on the name is a reliable proxy for the typed
+    // extractor binding. Bare `c` is too common (incidental local
+    // variable) to include without an additional type signal.
+    matches!(
+        lower.as_str(),
+        "req"
+            | "request"
+            | "ctx"
+            | "context"
+            | "info"
+            | "path"
+            | "payload"
+            | "body"
+            | "dto"
+            | "form"
+            | "query"
+    )
+}
+
 fn is_batch_collection(subject: &ValueRef) -> bool {
     subject.source_kind == ValueSourceKind::Identifier
         && subject.name.to_ascii_lowercase().ends_with("ids")
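`unit_has_user_input_evidence` and `is_external_input_param_name` together form a short-circuiting gate: route-handler kind, then request-shaped accesses, then parameter names. The sketch below models that gate over simplified inputs. `Unit`, `looks_id_like`, `is_external_input_param`, and `has_user_input_evidence` are all hypothetical names. `looks_id_like` is a deliberately reduced stand-in for `is_id_like_name`, whose full suffix list is elided in this diff.

```rust
/// Simplified unit (hypothetical stand-in for `AnalysisUnit`).
struct Unit {
    is_route_handler: bool,
    context_inputs: Vec<String>,
    params: Vec<String>,
}

/// Reduced id test (assumption): only the `id` / `ids` suffixes.
fn looks_id_like(lower: &str) -> bool {
    lower.ends_with("id") || lower.ends_with("ids")
}

fn is_external_input_param(name: &str) -> bool {
    let lower = name.to_ascii_lowercase();
    looks_id_like(&lower)
        || lower.ends_with("token")
        || matches!(
            lower.as_str(),
            "req" | "request" | "ctx" | "context" | "info" | "path" | "payload" | "body"
                | "dto" | "form" | "query"
        )
}

/// Cheapest, strongest signal first; parameter names are the fallback.
fn has_user_input_evidence(unit: &Unit) -> bool {
    unit.is_route_handler
        || !unit.context_inputs.is_empty()
        || unit.params.iter().any(|p| is_external_input_param(p))
}
```

Suffix matching on `id` also accepts names such as `valid` or `paid`. The sketch inherits that trade-off from the `ends_with("id")` check it mirrors.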
@@ -600,7 +896,10 @@ fn is_batch_collection(subject: &ValueRef) -> bool {
 
 #[cfg(test)]
 mod tests {
-    use super::{is_actor_context_subject, is_relevant_target_subject};
+    use super::{
+        auth_check_covers_subject, is_actor_context_subject, is_external_input_param_name,
+        is_relevant_target_subject, unit_has_user_input_evidence,
+    };
     use crate::auth_analysis::model::{AnalysisUnit, AnalysisUnitKind, ValueRef, ValueSourceKind};
     use std::collections::{HashMap, HashSet};
 
@@ -618,9 +917,15 @@ mod tests {
             condition_texts: Vec::new(),
             line: 1,
             row_field_vars: HashMap::new(),
+            var_alias_chain: HashMap::new(),
+            row_population_data: HashMap::new(),
             self_actor_vars: HashSet::new(),
             self_actor_id_vars: HashSet::new(),
             authorized_sql_vars: HashSet::new(),
+            const_bound_vars: HashSet::new(),
+            typed_bounded_vars: HashSet::new(),
+            typed_bounded_dto_fields: HashMap::new(),
+            self_scoped_session_bases: HashSet::new(),
         }
     }
 
@ -716,4 +1021,395 @@ mod tests {
|
|||
// Foreign-user fields still flag.
|
||||
assert!(!is_actor_context_subject(&member("target", "email"), &unit));
|
||||
}
|
||||
|
||||
/// Real-repo regression (gin/context_test.go): `id := "id";
|
||||
/// c.AddParam(id, value)` previously fired the rule because `id`
|
||||
/// matched is_id_like but had no actor-context exemption. After
|
||||
/// the const-binding tracker, `id` (a plain Local with no base /
|
||||
/// field) bound to a literal is excluded from relevant subjects.
|
||||
#[test]
|
||||
fn const_bound_plain_subjects_are_not_relevant() {
|
||||
let mut unit = empty_unit();
|
||||
unit.const_bound_vars.insert("id".into());
|
||||
|
||||
// `id` matches is_id_like (name=="id") but is constant-bound.
|
||||
assert!(!is_relevant_target_subject(&plain("id"), &unit));
|
||||
|
||||
// Plain `id` NOT in the const-bound set still flags as
|
||||
// relevant — regression guard for the user-controlled case.
|
||||
let unit2 = empty_unit();
|
||||
assert!(is_relevant_target_subject(&plain("id"), &unit2));
|
||||
|
||||
// Member access `req.id` is unaffected by const-bound check
|
||||
// (different ValueRef shape).
|
||||
unit.const_bound_vars.insert("req".into());
|
||||
assert!(is_relevant_target_subject(&member("req", "id"), &unit));
|
||||
}
|
||||
|
||||
/// Phase 5 typed-bounded subject exclusion: a parameter whose
|
||||
/// static type was recovered as `Int`/`Bool` (Spring `Long userId`,
|
||||
/// Axum `Path<i64>`, FastAPI `user_id: int`) has its name added to
|
||||
/// `unit.typed_bounded_vars` by `apply_typed_bounded_params`. The
|
||||
/// subject `userId` then must not be classified as a scoped
|
||||
/// identifier — the framework guarantees the value is numeric and
|
||||
/// cannot drive ownership-bypass.
|
||||
#[test]
|
||||
fn typed_bounded_plain_subjects_are_not_relevant() {
|
||||
let mut unit = empty_unit();
|
||||
unit.typed_bounded_vars.insert("user_id".into());
|
||||
|
||||
// `user_id` matches is_id_like but is bounded by static type.
|
||||
assert!(!is_relevant_target_subject(&plain("user_id"), &unit));
|
||||
|
||||
// Plain `user_id` NOT in the typed-bounded set still flags.
|
||||
let unit2 = empty_unit();
|
||||
assert!(is_relevant_target_subject(&plain("user_id"), &unit2));
|
||||
|
||||
// Member access `req.user_id` is unaffected (only plain
|
||||
// identifiers are exempted — fields/base remain regular
|
||||
// subjects so DTO-shape leaks still flag).
|
||||
unit.typed_bounded_vars.insert("req".into());
|
||||
assert!(is_relevant_target_subject(&member("req", "user_id"), &unit));
|
||||
}
|
||||
|
||||
/// Real-repo regression: pure-backend units (Django migrations,
|
||||
/// Celery tasks with no params, pytest fixtures) must fail the
|
||||
/// user-input precondition so token-override / ownership rules
|
||||
/// don't fire. Conversely, helpers with id-like / token-like /
|
||||
/// request-named parameters do count as user-input-bearing.
|
||||
#[test]
|
||||
fn unit_user_input_evidence_recognises_external_inputs() {
|
||||
// Function with no params and no context_inputs (Celery task
|
||||
// shape) — must NOT count as user-input-bearing.
|
||||
let mut unit = empty_unit();
|
||||
assert!(!unit_has_user_input_evidence(&unit));
|
||||
|
||||
// Adding internal-typed params (apps, schema_editor — Django
|
||||
// migration RunPython callback shape) keeps the gate closed.
|
||||
unit.params.push("apps".into());
|
||||
unit.params.push("schema_editor".into());
|
||||
assert!(!unit_has_user_input_evidence(&unit));
|
||||
|
||||
// pytest hook shape: (config, items) — gate stays closed.
|
||||
let mut unit = empty_unit();
|
||||
unit.params.push("config".into());
|
||||
unit.params.push("items".into());
|
||||
assert!(!unit_has_user_input_evidence(&unit));
|
||||
|
||||
// Adding an id-like param flips the gate open.
|
||||
unit.params.push("doc_id".into());
|
||||
assert!(unit_has_user_input_evidence(&unit));
|
||||
|
||||
// Token-named param flips the gate open (Express helper
|
||||
// `acceptInvitation(token, currentUser, roleOverride)`).
|
||||
let mut unit = empty_unit();
|
||||
unit.params.push("token".into());
|
||||
unit.params.push("currentUser".into());
|
||||
unit.params.push("roleOverride".into());
|
||||
assert!(unit_has_user_input_evidence(&unit));
|
||||
|
||||
// Framework request-name param flips the gate open
|
||||
// (Django/Flask `def view(request, project_id):`).
|
||||
let mut unit = empty_unit();
|
||||
unit.params.push("request".into());
|
||||
assert!(unit_has_user_input_evidence(&unit));
|
||||
|
||||
// Axum/Actix typed-extractor convention name flips it open.
|
||||
let mut unit = empty_unit();
|
||||
unit.params.push("path".into());
|
||||
assert!(unit_has_user_input_evidence(&unit));
|
||||
|
||||
// RouteHandler kind always wins, regardless of params.
|
||||
let mut unit = empty_unit();
|
||||
unit.kind = AnalysisUnitKind::RouteHandler;
|
||||
assert!(unit_has_user_input_evidence(&unit));
|
||||
}
|
||||
|
||||
/// `is_external_input_param_name` covers id-, token-, and
|
||||
/// framework-request shapes; bare internal-typed names are
|
||||
/// rejected so internal helpers stay outside the gate.
|
||||
#[test]
|
||||
fn external_input_param_name_classification() {
|
||||
// ID-shaped names.
|
||||
assert!(is_external_input_param_name("id"));
|
||||
assert!(is_external_input_param_name("doc_id"));
|
||||
assert!(is_external_input_param_name("groupId"));
|
||||
assert!(is_external_input_param_name("voucher_code_ids"));
|
||||
|
||||
// Token-shaped names.
|
||||
assert!(is_external_input_param_name("token"));
|
||||
assert!(is_external_input_param_name("access_token"));
|
||||
assert!(is_external_input_param_name("refreshToken"));
|
||||
|
||||
// Framework request / extractor names.
|
||||
assert!(is_external_input_param_name("request"));
|
||||
assert!(is_external_input_param_name("req"));
|
||||
assert!(is_external_input_param_name("ctx"));
|
||||
assert!(is_external_input_param_name("path"));
|
||||
assert!(is_external_input_param_name("payload"));
|
||||
assert!(is_external_input_param_name("dto"));
|
||||
assert!(is_external_input_param_name("query"));
|
||||
|
||||
// Internal-typed names that internal helpers / migrations
|
||||
// commonly use must NOT match.
|
||||
assert!(!is_external_input_param_name("apps"));
|
||||
assert!(!is_external_input_param_name("schema_editor"));
|
||||
assert!(!is_external_input_param_name("config"));
|
||||
assert!(!is_external_input_param_name("items"));
|
||||
assert!(!is_external_input_param_name("promotion"));
|
||||
assert!(!is_external_input_param_name("update_rule_variants"));
|
||||
assert!(!is_external_input_param_name("manager"));
|
||||
// `c` alone is too common as a local variable to count.
|
||||
assert!(!is_external_input_param_name("c"));
|
||||
}
|
||||
|
||||
/// Phase A4 row-fetch exemption.
|
||||
///
|
||||
/// Row var declared at line 10; auth check naming the row appears
|
||||
/// at line 20. An operation at line 10 (the fetch) is exempted
|
||||
/// because the auth check authorises the resulting row. Coverage
|
||||
/// is intentionally narrow — operations between fetch (10) and
|
||||
/// check (20) that are NOT row-fetch sites must still flag.
|
||||
#[test]
|
||||
fn row_fetch_exemption_covers_fetch_when_check_names_row() {
|
||||
use super::has_row_fetch_exemption;
|
||||
use crate::auth_analysis::model::{
|
||||
AuthCheck, AuthCheckKind, OperationKind, SensitiveOperation,
|
||||
};
|
||||
|
||||
let mut unit = empty_unit();
|
||||
// `let community = Community::read(pool, data.community_id)?;` at line 10
|
||||
unit.row_population_data.insert(
|
||||
"community".to_string(),
|
||||
(10, vec![member("data", "community_id")]),
|
||||
);
|
||||
// Auth check at line 20 with `community` as a subject base.
|
||||
unit.auth_checks.push(AuthCheck {
|
||||
kind: AuthCheckKind::Membership,
|
||||
callee: "check_community_user_action".into(),
|
||||
subjects: vec![member("community", "id")],
|
||||
span: (0, 0),
|
||||
line: 20,
|
||||
args: Vec::new(),
|
||||
condition_text: None,
|
||||
});
|
||||
|
||||
let fetch_op = SensitiveOperation {
|
||||
kind: OperationKind::Read,
|
||||
sink_class: None,
|
||||
callee: "Community.read".into(),
|
||||
subjects: vec![member("data", "community_id")],
|
||||
span: (0, 0),
|
||||
line: 10,
|
||||
text: String::new(),
|
||||
};
|
||||
assert!(has_row_fetch_exemption(&unit, &fetch_op));
|
||||
|
||||
// Operation at a different line (between fetch and check) is
|
||||
// NOT a row-fetch site — exemption does not apply.
|
||||
let mid_op = SensitiveOperation {
|
||||
kind: OperationKind::Mutation,
|
||||
sink_class: None,
|
||||
callee: "delete_post".into(),
|
||||
subjects: vec![member("data", "post_id")],
|
||||
span: (0, 0),
|
||||
line: 15,
|
||||
text: String::new(),
|
||||
};
|
||||
assert!(!has_row_fetch_exemption(&unit, &mid_op));
|
||||
}
|
||||
+
+    #[test]
+    fn row_fetch_exemption_skips_when_no_check_names_row() {
+        use super::has_row_fetch_exemption;
+        use crate::auth_analysis::model::{OperationKind, SensitiveOperation};
+
+        let mut unit = empty_unit();
+        unit.row_population_data.insert(
+            "community".to_string(),
+            (10, vec![member("data", "community_id")]),
+        );
+        // No auth check pushed — exemption must NOT apply.
+
+        let fetch_op = SensitiveOperation {
+            kind: OperationKind::Read,
+            sink_class: None,
+            callee: "Community.read".into(),
+            subjects: vec![member("data", "community_id")],
+            span: (0, 0),
+            line: 10,
+            text: String::new(),
+        };
+        assert!(!has_row_fetch_exemption(&unit, &fetch_op));
+    }
+
+    #[test]
+    fn row_fetch_exemption_ignores_login_token_checks() {
+        use super::has_row_fetch_exemption;
+        use crate::auth_analysis::model::{
+            AuthCheck, AuthCheckKind, OperationKind, SensitiveOperation,
+        };
+
+        let mut unit = empty_unit();
+        unit.row_population_data.insert(
+            "community".to_string(),
+            (10, vec![member("data", "community_id")]),
+        );
+        // Login-only check on the row should NOT exempt the row-fetch
+        // — login proves identity, not authorization.
+        unit.auth_checks.push(AuthCheck {
+            kind: AuthCheckKind::LoginGuard,
+            callee: "require_login".into(),
+            subjects: vec![member("community", "id")],
+            span: (0, 0),
+            line: 20,
+            args: Vec::new(),
+            condition_text: None,
+        });
+
+        let fetch_op = SensitiveOperation {
+            kind: OperationKind::Read,
+            sink_class: None,
+            callee: "Community.read".into(),
+            subjects: vec![member("data", "community_id")],
+            span: (0, 0),
+            line: 10,
+            text: String::new(),
+        };
+        assert!(!has_row_fetch_exemption(&unit, &fetch_op));
+    }
+
+    /// Row-population reverse-walk (lemmy fetch-then-check pattern).
+    ///
+    /// `let community = Community::read(pool, data.community_id)` at
+    /// line 10 records `community → [data.community_id]`. An auth
+    /// check on `community` at line 20 must materially cover any
+    /// downstream operation that re-uses `data.community_id` (e.g. a
+    /// later `delete_mods_for_community(pool, community_id)`),
+    /// because the check authorised access to the row that was
+    /// fetched using that id.
+    #[test]
+    fn auth_check_covers_subject_via_row_population_reverse_walk() {
+        use crate::auth_analysis::model::{AuthCheck, AuthCheckKind};
+
+        let mut unit = empty_unit();
+        unit.row_population_data.insert(
+            "community".to_string(),
+            (10, vec![member("data", "community_id")]),
+        );
+        let check = AuthCheck {
+            kind: AuthCheckKind::Membership,
+            callee: "check_community_user_action".into(),
+            subjects: vec![member("community", "id")],
+            span: (0, 0),
+            line: 20,
+            args: Vec::new(),
+            condition_text: None,
+        };
+
+        // Direct member subject `data.community_id` (the original
+        // request field) — covered via reverse-walk.
+        assert!(auth_check_covers_subject(
+            &check,
+            &member("data", "community_id"),
+            &unit
+        ));
+
+        // A later op that re-passed the *same* id-bearing argument
+        // (`Community::read(pool, data.community_id)`) gets covered
+        // even though the check's subject names the row, not the id.
+        // Before the fix, this fired as
+        // `rs.auth.missing_ownership_check` on lemmy
+        // `community/transfer.rs:88` and similar.
+
+        // Negative: an unrelated id (different request field that
+        // never populated this row) must NOT be covered.
+        assert!(!auth_check_covers_subject(
+            &check,
+            &member("data", "post_id"),
+            &unit
+        ));
+    }
+
+    /// Subject as plain identifier copied from the request
+    /// (`let community_id = data.community_id; let community =
+    /// Community::read(pool, community_id);`) must also benefit from
+    /// the reverse-walk — `row_population_data["community"]` then
+    /// records `[community_id]` (a plain identifier, not the
+    /// member-access shape).
+    #[test]
+    fn auth_check_covers_subject_via_row_population_reverse_walk_plain_arg() {
+        use crate::auth_analysis::model::{AuthCheck, AuthCheckKind};
+
+        let mut unit = empty_unit();
+        unit.row_population_data
+            .insert("community".to_string(), (10, vec![plain("community_id")]));
+        let check = AuthCheck {
+            kind: AuthCheckKind::Membership,
+            callee: "check_community_mod_action".into(),
+            subjects: vec![member("community", "id")],
+            span: (0, 0),
+            line: 20,
+            args: Vec::new(),
+            condition_text: None,
+        };
+
+        assert!(auth_check_covers_subject(
+            &check,
+            &plain("community_id"),
+            &unit
+        ));
+        // Different plain id is not covered.
+        assert!(!auth_check_covers_subject(&check, &plain("post_id"), &unit));
+    }
+
+    /// Local-alias chain coverage (lemmy `community/transfer.rs` shape).
+    ///
+    /// `let community = Community::read(pool, req.community_id)` at
+    /// line 10 records `community → [req.community_id]`. After the
+    /// auth check on the row, the handler aliases the request field
+    /// into a local: `let community_id = req.community_id;` then
+    /// reuses the bare `community_id` in a downstream sink.
+    /// `var_alias_chain["community_id"] = "req.community_id"` lets
+    /// the reverse-walk match the population args (which still
+    /// contain the original member chain) against the plain subject.
+    #[test]
+    fn auth_check_covers_subject_via_row_population_alias_chain() {
+        use crate::auth_analysis::model::{AuthCheck, AuthCheckKind};
+
+        let mut unit = empty_unit();
+        unit.row_population_data.insert(
+            "community".to_string(),
+            (10, vec![member("req", "community_id")]),
+        );
+        unit.var_alias_chain
+            .insert("community_id".to_string(), "req.community_id".to_string());
+        let check = AuthCheck {
+            kind: AuthCheckKind::Membership,
+            callee: "check_community_user_action".into(),
+            subjects: vec![member("community", "id")],
+            span: (0, 0),
+            line: 20,
+            args: Vec::new(),
+            condition_text: None,
+        };
+
+        // Sink subject is the bare alias — covered via the chain.
+        assert!(auth_check_covers_subject(
+            &check,
+            &plain("community_id"),
+            &unit
+        ));
+
+        // The original member-access subject is still covered (no
+        // regression in the existing reverse-walk path).
+        assert!(auth_check_covers_subject(
+            &check,
+            &member("req", "community_id"),
+            &unit
+        ));
+
+        // Plain identifier with no alias entry must NOT be covered.
+        assert!(!auth_check_covers_subject(&check, &plain("post_id"), &unit));
+    }
 }
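The three `row_fetch_exemption_*` tests above pin the fetch-then-authorize exemption. Below is a minimal sketch of one rule that satisfies those tests. It is not the crate's `has_row_fetch_exemption`, and it rests on three assumptions:

* Subjects are plain dotted strings rather than `ValueRef`.
* The fetch operation is identified by the row's recorded declaration line.
* Any check kind other than a login guard counts as authorizing.

`CheckKind`, `Check`, `Op`, `Rows` and `row_fetch_exempt` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical check kinds; only the login-vs-authorization split matters here.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum CheckKind {
    LoginGuard,
    Membership,
}

/// Hypothetical auth check: subjects such as "community.id".
pub struct Check {
    pub kind: CheckKind,
    pub subjects: Vec<String>,
}

/// Hypothetical sensitive operation: subjects such as "data.community_id".
pub struct Op {
    pub line: usize,
    pub subjects: Vec<String>,
}

/// Row variable -> (line of `let ROW = CALL(..)`, argument texts of CALL).
pub type Rows = HashMap<String, (usize, Vec<String>)>;

/// Base of a dotted subject: "community.id" -> "community".
fn base(subject: &str) -> &str {
    subject.split('.').next().unwrap_or(subject)
}

/// True when `op` sits on a row's fetch line, every op subject is one of that
/// fetch's arguments, and at least one non-login check names the fetched row.
pub fn row_fetch_exempt(rows: &Rows, checks: &[Check], op: &Op) -> bool {
    rows.iter().any(|(row, (line, args))| {
        *line == op.line
            && op.subjects.iter().all(|s| args.contains(s))
            && checks.iter().any(|c| {
                c.kind != CheckKind::LoginGuard
                    && c.subjects.iter().any(|s| base(s) == row.as_str())
            })
    })
}
```

The sketch reproduces the test outcomes:

* It exempts the line-10 `Community.read`.
* It rejects the line-15 `delete_post`.
* It rejects the fetch when no check exists.
* It rejects the fetch when the only check is a login guard.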
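The `auth_check_covers_subject_via_row_population_*` tests exercise a second use of the same map. A check that names a row also covers the ids used to fetch that row. This includes a bare local copied from one of those ids, which is recorded in `var_alias_chain`.

The sketch below models only this reverse-walk branch, again with string subjects. `covers_via_reverse_walk` and `Rows` are hypothetical names. The crate's `auth_check_covers_subject` presumably has other coverage paths that are not shown here.

```rust
use std::collections::HashMap;

/// Row variable -> (line of `let ROW = CALL(..)`, argument texts of CALL).
pub type Rows = HashMap<String, (usize, Vec<String>)>;

/// Base of a dotted subject: "community.id" -> "community".
fn base(subject: &str) -> &str {
    subject.split('.').next().unwrap_or(subject)
}

/// `sink` is covered when some check subject names a row whose population
/// args contain `sink` itself, or contain the member chain that the bare
/// local `sink` was aliased from (`community_id -> req.community_id`).
pub fn covers_via_reverse_walk(
    check_subjects: &[&str],
    sink: &str,
    rows: &Rows,
    alias_chain: &HashMap<String, String>,
) -> bool {
    // Resolve a bare local alias to the member chain it was copied from.
    let aliased: Option<&str> = alias_chain.get(sink).map(String::as_str);
    check_subjects.iter().any(|s| {
        rows.get(base(s)).is_some_and(|(_line, args)| {
            args.iter().any(|a| a == sink || Some(a.as_str()) == aliased)
        })
    })
}
```

A check on `community.id` therefore covers three subjects: `req.community_id`, the alias `community_id`, and nothing unrelated such as `post_id`.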
|
|
@@ -1,4 +1,5 @@
 use crate::auth_analysis::model::SinkClass;
+use crate::labels::bare_method_name;
 use crate::utils::config::Config;

 #[derive(Debug, Clone)]
@@ -175,7 +176,7 @@ impl AuthAnalysisRules {
     /// receiver — `someElement.addEventListener` is just as
     /// categorically client-side as `document.addEventListener`.
     pub fn callee_has_non_sink_method(&self, callee: &str) -> bool {
-        let last = callee.rsplit('.').next().unwrap_or(callee);
+        let last = bare_method_name(callee);
         let last = last.rsplit("::").next().unwrap_or(last);
         if last.is_empty() {
             return false;
@@ -244,11 +245,29 @@ impl AuthAnalysisRules {
         if self.receiver_matches_any_prefix(first, &self.cache_receiver_prefixes) {
             return Some(SinkClass::CacheCrossTenant);
         }
-        if self.is_mutation(callee) {
-            return Some(SinkClass::DbMutation);
-        }
-        if self.is_read(callee) {
-            return Some(SinkClass::DbCrossTenantRead);
+        // Verb-name fallback (`is_mutation` / `is_read`) is the loosest
+        // dispatch: it prefix-matches the bare method name against
+        // generic verbs (`Get`, `Save`, `Find`, …) regardless of the
+        // receiver. When the receiver chain itself contains a call
+        // expression (`w.Header().Get(..)`, `r.URL.Query().Get(..)`,
+        // `db.Tx(..).Query(..)`), the receiver is the *return value of
+        // another call* — its type is opaque to the auth analyser and
+        // the bare verb match is too speculative to assume a data-layer
+        // sink. The realtime/outbound/cache prefix dispatches above
+        // already match by the chain root; if none of them claimed the
+        // receiver, dropping the verb-name fallback for chained-call
+        // shapes prevents the entire `w.Header().Get` /
+        // `r.URL.Query().Get` cluster from masquerading as a
+        // `DbCrossTenantRead`. A canonical data-layer call still has a
+        // bare-identifier receiver (`repo.Find(id)`, `db.Query(..)`)
+        // and is unaffected.
+        if !receiver_is_chained_call(callee) {
+            if self.is_mutation(callee) {
+                return Some(SinkClass::DbMutation);
+            }
+            if self.is_read(callee) {
+                return Some(SinkClass::DbCrossTenantRead);
+            }
         }
         None
     }
@@ -596,6 +615,38 @@ pub fn build_auth_rules(config: &Config, lang_slug: &str) -> AuthAnalysisRules {
             "verify_access!".into(),
             "can_access?".into(),
             "can?".into(),
+            // Rails per-record permission predicates — the canonical
+            // "load by id, then check on the loaded record" idiom
+            // (see redmine `app/controllers/issues_controller.rb`,
+            // mastodon controllers, diaspora ApplicationController).
+            // Combined with `row_population_data` reverse-walk, this
+            // recognises the post-fetch ownership check that is
+            // textually after the find call.
+            "visible?".into(),
+            "editable?".into(),
+            "editable_by?".into(),
+            "deletable?".into(),
+            "deletable_by?".into(),
+            "destroyable?".into(),
+            "destroyable_by?".into(),
+            "commentable?".into(),
+            "commentable_by?".into(),
+            "permitted?".into(),
+            "accessible?".into(),
+            "accessible_by?".into(),
+            "authorized?".into(),
+            "allowed_to?".into(),
+            "allowed?".into(),
+            "viewable?".into(),
+            "viewable_by?".into(),
+            "writable?".into(),
+            "writable_by?".into(),
+            "readable?".into(),
+            "readable_by?".into(),
+            "manageable?".into(),
+            "manageable_by?".into(),
+            "owned_by?".into(),
+            "belongs_to?".into(),
         ],
         mutation_indicator_names: vec![
             "update".into(),
@@ -1294,13 +1345,32 @@ pub fn first_receiver_segment(callee: &str) -> &str {
     callee.split('.').next().unwrap_or(callee)
 }

+/// True when the callee's receiver chain contains a call expression —
+/// i.e. the LAST segment is being invoked on the *return value* of an
+/// earlier call (`w.Header().Get`, `r.URL.Query().Get`,
+/// `db.Tx(opts).Query`). Detected as: the substring before the last
+/// `.` contains a `(`.
+///
+/// `classify_sink_class` consults this to suppress the loose verb-name
+/// fallback (`is_read` / `is_mutation`) for chained-call shapes whose
+/// receiver type is opaque to the analyser.
+pub fn receiver_is_chained_call(callee: &str) -> bool {
+    let Some((receiver, _method)) = callee.rsplit_once('.') else {
+        return false;
+    };
+    receiver.contains('(')
+}
+
 /// Recognise `require_<resource>_<role>` / `ensure_<resource>_<role>`
 /// shapes where `<role>` is a closed-vocabulary authorization noun
 /// (`member`, `owner`, `admin`, `access`, `permission`, `manager`,
-/// `editor`, `viewer`). The resource segment is project-specific
-/// (`trip`, `doc`, `project`, `workspace`, …) and cannot be enumerated
-/// in the static defaults — but the prefix+role pattern is unambiguous
-/// enough that recognising it as an authorization check is safe.
+/// `editor`, `viewer`, `user`, `mod`). The resource segment is
+/// project-specific (`trip`, `doc`, `project`, `community`, …) and
+/// cannot be enumerated in the static defaults — but the
+/// prefix+role pattern is unambiguous enough that recognising it as
+/// an authorization check is safe. Also accepts `is_<role>` /
+/// `is_<role>_(or|and)_<role>...` predicate forms (`is_admin`,
+/// `is_mod_or_admin`).
 ///
 /// Strips path-namespace and method prefixes before matching:
 /// `authz::require_trip_member` → `require_trip_member`;
@@ -1309,23 +1379,60 @@ fn is_require_resource_role_call(name: &str) -> bool {
     let last = name.rsplit("::").next().unwrap_or(name);
     let last = last.rsplit('.').next().unwrap_or(last);
     let lower = last.to_ascii_lowercase();
-    let after_prefix = if let Some(rest) = lower.strip_prefix("require_") {
-        rest
-    } else if let Some(rest) = lower.strip_prefix("ensure_") {
-        rest
-    } else {
-        return false;
-    };
-    let Some(last_underscore) = after_prefix.rfind('_') else {
-        return false;
-    };
-    // Must have at least one resource char before the role and a
-    // non-empty role after. Rejects degenerate `require__member`,
-    // `require_member` (no resource).
-    if last_underscore == 0 || last_underscore == after_prefix.len() - 1 {
-        return false;
+
+    // Pattern 1: `<verb>_<resource>_<role>[_<context>]?` where
+    // <verb> ∈ {require, ensure, check, assert, verify} and
+    // <context> ∈ {action, allowed, valid} (a small closed suffix
+    // set that wraps the role, e.g. `check_community_mod_action`).
+    if let Some(after_prefix) = strip_auth_verb_prefix(&lower) {
+        let core = strip_role_context_suffix(after_prefix);
+        if let Some(last_underscore) = core.rfind('_')
+            && last_underscore > 0
+            && last_underscore < core.len() - 1
+        {
+            let role = &core[last_underscore + 1..];
+            if is_known_auth_role(role) {
+                return true;
+            }
+        }
     }
-    let role = &after_prefix[last_underscore + 1..];
+
+    // Pattern 2: `is_<role>` and `is_<role>_(or|and)_<role>...`.
+    // Conservative role list — excludes `user` / `staff` to avoid
+    // matching ambiguous predicates like `is_user`.
+    if let Some(rest) = lower.strip_prefix("is_")
+        && !rest.is_empty()
+        && all_tokens_are_predicate_roles(rest)
+    {
+        return true;
+    }
+
+    false
+}
+
+fn strip_auth_verb_prefix(lower: &str) -> Option<&str> {
+    for verb in ["require_", "ensure_", "check_", "assert_", "verify_"] {
+        if let Some(rest) = lower.strip_prefix(verb) {
+            return Some(rest);
+        }
+    }
+    None
+}
+
+/// Strip a single trailing `_<context>` suffix where <context> wraps
+/// a role word with extra noise (`_action` / `_allowed` / `_valid`).
+/// Does NOT strip `_access` / `_permission` because those are
+/// themselves valid role suffixes (`require_doc_access`).
+fn strip_role_context_suffix(s: &str) -> &str {
+    for suffix in ["_action", "_allowed", "_valid"] {
+        if let Some(stripped) = s.strip_suffix(suffix) {
+            return stripped;
+        }
+    }
+    s
+}
+
+fn is_known_auth_role(role: &str) -> bool {
     matches!(
         role,
         "member"
@@ -1344,9 +1451,55 @@ fn is_require_resource_role_call(name: &str) -> bool {
             | "viewer"
             | "viewers"
             | "role"
+            | "user"
+            | "mod"
+            | "mods"
+            | "moderator"
+            | "moderators"
     )
 }

+/// `is_<role>` predicate role set. Tighter than the
+/// `<verb>_<resource>_<role>` set because predicates lack the
+/// resource segment that disambiguates ambiguous role nouns
+/// (`is_user` could be a typeof check, not an authorization check).
+fn is_predicate_auth_role(role: &str) -> bool {
+    matches!(
+        role,
+        "admin"
+            | "admins"
+            | "owner"
+            | "owners"
+            | "member"
+            | "members"
+            | "manager"
+            | "managers"
+            | "moderator"
+            | "moderators"
+            | "mod"
+            | "mods"
+            | "editor"
+            | "editors"
+    )
+}
+
+/// Returns `true` iff every `_or_` / `_and_`-separated token in `rest`
+/// is a known predicate auth role. E.g. `mod_or_admin` → true,
+/// `mod_or_owner_and_admin` → true, `mod_or_logged_in` → false.
+fn all_tokens_are_predicate_roles(rest: &str) -> bool {
+    let mut tokens: Vec<&str> = vec![rest];
+    for sep in &["_or_", "_and_"] {
+        let mut next: Vec<&str> = Vec::new();
+        for t in &tokens {
+            for piece in t.split(sep) {
+                next.push(piece);
+            }
+        }
+        tokens = next;
+    }
+    !tokens.is_empty() && tokens.iter().all(|t| is_predicate_auth_role(t))
+}
+
 pub fn matches_name(name: &str, pattern: &str) -> bool {
     let name_last = name.rsplit('.').next().unwrap_or(name);
     let pattern_last = pattern.rsplit('.').next().unwrap_or(pattern);
@@ -1521,6 +1674,51 @@ mod tests {
         );
     }

+    #[test]
+    fn receiver_is_chained_call_detects_intermediate_calls() {
+        use super::receiver_is_chained_call;
+        // Chained-call shape: receiver chain contains a `(`.
+        assert!(receiver_is_chained_call("w.Header().Get"));
+        assert!(receiver_is_chained_call("r.URL.Query().Get"));
+        assert!(receiver_is_chained_call("db.Tx(opts).Query"));
+        assert!(receiver_is_chained_call("client.WithToken(t).Get"));
+        // Pure field/identifier chain — no `(` anywhere.
+        assert!(!receiver_is_chained_call("repo.Find"));
+        assert!(!receiver_is_chained_call("c.Fs.Create"));
+        assert!(!receiver_is_chained_call("globalBatchJobsMetrics.save"));
+        assert!(!receiver_is_chained_call("self.cache.insert"));
+        // Bare callee with no receiver.
+        assert!(!receiver_is_chained_call("Get"));
+        assert!(!receiver_is_chained_call("HashMap::new"));
+    }
+
+    #[test]
+    fn classify_sink_class_suppresses_chained_call_verb_fallback() {
+        use crate::auth_analysis::model::SinkClass;
+        use std::collections::HashSet;
+        let cfg = Config::default();
+        let rules = build_auth_rules(&cfg, "go");
+        let empty: HashSet<String> = HashSet::new();
+
+        // Chained-call receiver: verb-name fallback is suppressed.
+        // The minio `w.Header().Get(constName)` cluster — `Get` would
+        // match the `Get` read indicator on a bare receiver but the
+        // chained-call shape masks the receiver type.
+        assert_eq!(rules.classify_sink_class("w.Header().Get", &empty), None);
+        assert_eq!(rules.classify_sink_class("r.URL.Query().Get", &empty), None);
+        // Bare-identifier receiver: verb-name fallback still fires.
+        // Pin the regression guard so this fix doesn't over-suppress
+        // canonical data-layer shapes.
+        assert_eq!(
+            rules.classify_sink_class("repo.Find", &empty),
+            Some(SinkClass::DbCrossTenantRead)
+        );
+        assert_eq!(
+            rules.classify_sink_class("repo.Save", &empty),
+            Some(SinkClass::DbMutation)
+        );
+    }
+
     #[test]
     fn sink_class_is_auth_relevant_only_for_non_local_classes() {
         use crate::auth_analysis::model::SinkClass;
@@ -1614,4 +1812,50 @@ mod tests {
         assert!(!rules.is_authorization_check("require_member"));
         assert!(!rules.is_authorization_check("require_owner"));
     }
+
+    /// Phase A4 — broader verb / role / context-suffix shapes seen in
+    /// real-world Rust apps. `check_<resource>_<role>_action` is the
+    /// canonical lemmy idiom; verifying the `is_<role>` predicate
+    /// recogniser closes `is_mod_or_admin` style checks.
+    #[test]
+    fn is_authorization_check_recognises_check_action_and_predicate_shapes() {
+        let cfg = Config::default();
+        let rules = build_auth_rules(&cfg, "rust");
+
+        // `check_<resource>_<role>_action` (lemmy `check_community_*_action`)
+        assert!(rules.is_authorization_check("check_community_user_action"));
+        assert!(rules.is_authorization_check("check_community_mod_action"));
+        assert!(rules.is_authorization_check("check_community_admin_action"));
+        assert!(rules.is_authorization_check("check_post_owner_action"));
+        // Verb variants
+        assert!(rules.is_authorization_check("assert_post_owner"));
+        assert!(rules.is_authorization_check("verify_doc_editor"));
+        // `_allowed` / `_valid` context suffix wrapping the role
+        assert!(rules.is_authorization_check("require_trip_member_allowed"));
+        assert!(rules.is_authorization_check("ensure_doc_owner_valid"));
+        // Path-namespaced
+        assert!(rules.is_authorization_check("authz::check_community_user_action"));
+        assert!(rules.is_authorization_check("self.check_community_mod_action"));
+
+        // `is_<role>` and `is_<role>_(or|and)_<role>` predicates.
+        assert!(rules.is_authorization_check("is_admin"));
+        assert!(rules.is_authorization_check("is_owner"));
+        assert!(rules.is_authorization_check("is_member"));
+        assert!(rules.is_authorization_check("is_moderator"));
+        assert!(rules.is_authorization_check("is_mod_or_admin"));
+        assert!(rules.is_authorization_check("is_owner_or_admin"));
+        assert!(rules.is_authorization_check("is_admin_or_moderator"));
+        assert!(rules.is_authorization_check("is_member_and_owner"));
+
+        // Negatives — predicates whose tokens are NOT known auth roles.
+        assert!(!rules.is_authorization_check("is_user"));
+        assert!(!rules.is_authorization_check("is_logged_in"));
+        assert!(!rules.is_authorization_check("is_active"));
+        assert!(!rules.is_authorization_check("is_visible"));
+        assert!(!rules.is_authorization_check("is_admin_or_logged_in"));
+        // `_action` / `_allowed` / `_valid` suffix without preceding
+        // role still rejects.
+        assert!(!rules.is_authorization_check("check_db_action"));
+        assert!(!rules.is_authorization_check("check_session_valid"));
+    }
 }
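The `classify_sink_class` change gates the verb-name fallback on `receiver_is_chained_call`. The sketch below isolates that ordering, under these assumptions:

* The verb lists are hypothetical placeholders. The real `is_mutation` / `is_read` vocabularies come from `build_auth_rules` and are not part of this diff.
* Matching is a prefix test on the last `.`-segment, as the added comment describes.
* The realtime/outbound/cache prefix dispatches that run first are omitted.

`Sink`, `verb_matches` and `verb_fallback` are hypothetical names.

```rust
/// Simplified sink classes for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Sink {
    DbMutation,
    DbRead,
}

// Hypothetical placeholder vocabularies, not the crate's configured lists.
const MUTATION_VERBS: &[&str] = &["Save", "Delete", "Update", "Insert"];
const READ_VERBS: &[&str] = &["Get", "Find", "Query", "Load"];

/// Same detection rule as the diff: the text before the last `.` contains `(`.
pub fn receiver_is_chained_call(callee: &str) -> bool {
    callee
        .rsplit_once('.')
        .is_some_and(|(receiver, _method)| receiver.contains('('))
}

/// Prefix-match the last `.`-segment of the callee against a verb list.
fn verb_matches(callee: &str, verbs: &[&str]) -> bool {
    let method = callee.rsplit('.').next().unwrap_or(callee);
    verbs.iter().any(|v| method.starts_with(*v))
}

/// Verb-name fallback, skipped when the receiver is another call's result.
pub fn verb_fallback(callee: &str) -> Option<Sink> {
    if receiver_is_chained_call(callee) {
        return None;
    }
    if verb_matches(callee, MUTATION_VERBS) {
        Some(Sink::DbMutation)
    } else if verb_matches(callee, READ_VERBS) {
        Some(Sink::DbRead)
    } else {
        None
    }
}
```

The design choice is visible in the two `Get` cases. On a bare receiver (`repo.Find`) the verb alone decides. On `w.Header().Get` the fallback never runs, so the call is left unclassified rather than guessed to be a data-layer read.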
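`is_require_resource_role_call` now accepts two shapes:

* `<verb>_<resource>_<role>[_<context>]`
* `is_<role>(_or_|_and_<role>)*`

The standalone sketch below restates both patterns so they can be run in isolation. The resource-role list is assembled only from names visible in this diff. The full `is_known_auth_role` list is partly elided between hunks, so treat that list as illustrative. `looks_like_auth_check` is a hypothetical name.

```rust
/// Role nouns for `<verb>_<resource>_<role>`; illustrative subset only.
const RESOURCE_ROLES: &[&str] = &[
    "member", "owner", "admin", "access", "permission", "manager", "editor",
    "viewer", "viewers", "role", "user", "mod", "mods", "moderator", "moderators",
];

/// Tighter set for bare `is_<role>` predicates (deliberately no `user`).
const PREDICATE_ROLES: &[&str] = &[
    "admin", "admins", "owner", "owners", "member", "members", "manager", "managers",
    "moderator", "moderators", "mod", "mods", "editor", "editors",
];

const VERBS: &[&str] = &["require_", "ensure_", "check_", "assert_", "verify_"];
const CONTEXT_SUFFIXES: &[&str] = &["_action", "_allowed", "_valid"];

pub fn looks_like_auth_check(name: &str) -> bool {
    // Strip `authz::` path segments and `self.` receivers, then lowercase.
    let last = name.rsplit("::").next().unwrap_or(name);
    let last = last.rsplit('.').next().unwrap_or(last);
    let lower = last.to_ascii_lowercase();

    // Pattern 1: <verb>_<resource>_<role>, optionally wrapped in one context suffix.
    if let Some(rest) = VERBS.iter().find_map(|v| lower.strip_prefix(*v)) {
        let core = CONTEXT_SUFFIXES
            .iter()
            .find_map(|s| rest.strip_suffix(*s))
            .unwrap_or(rest);
        if let Some(i) = core.rfind('_') {
            // Needs a non-empty resource before the role and a non-empty role.
            if i > 0 && i + 1 < core.len() && RESOURCE_ROLES.contains(&&core[i + 1..]) {
                return true;
            }
        }
    }

    // Pattern 2: is_<role>, is_<role>_or_<role>, is_<role>_and_<role>, ...
    if let Some(rest) = lower.strip_prefix("is_") {
        return !rest.is_empty()
            && rest
                .split("_or_")
                .flat_map(|t| t.split("_and_"))
                .all(|t| PREDICATE_ROLES.contains(&t));
    }
    false
}
```

Why the bare forms are rejected:

* `require_member` has no resource segment, so Pattern 1 finds no `_` separator.
* `check_db_action` shrinks to `db` once `_action` is stripped, so no role remains.
* `is_user` fails because `user` is only trusted when a resource segment disambiguates it.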
|
|
@@ -378,6 +378,19 @@ fn classify_rocket_param(
     }
 }

+/// Classify a route-handler parameter type as a route-level auth
+/// guard. Used to tag the route as gated by a login or admin check
+/// when one of its parameters is a typed auth extractor.
+///
+/// **Looser than [`super::common::is_self_actor_type_text`] by
+/// design.** This recogniser runs only on the type of a route-bound
+/// parameter — appearing in a route handler signature is itself a
+/// strong signal — and a false positive here just over-credits the
+/// route with a login guard, which is conservative w.r.t. flagging.
+/// `is_self_actor_type_text` runs on every parameter, including in
+/// non-route functions, and a false positive there suppresses
+/// downstream `V.id` flagging entirely; that path uses a structural
+/// recogniser keyed on the `<PREFIX>User<SUFFIX>?` shape.
 fn classify_guard_type(type_text: &str) -> Option<AuthCheckKind> {
     let lower = type_text.to_ascii_lowercase();
     if is_extractor_wrapper(&lower) {
|
File diff suppressed because it is too large
|
@@ -9,6 +9,7 @@ use crate::auth_analysis::extract::common::{attach_route_handler, collect_top_le
 use crate::auth_analysis::model::{
     AnalysisUnitKind, AuthorizationModel, CallSite, Framework, HttpMethod,
 };
+use crate::labels::bare_method_name;
 use crate::utils::project::{DetectedFramework, FrameworkContext};
 use std::path::Path;
 use tree_sitter::{Node, Tree};
@@ -55,7 +56,7 @@ fn maybe_collect_django_path(
         return;
     };
     let callee = text(function, bytes);
-    let target = callee.rsplit('.').next().unwrap_or(&callee);
+    let target = bare_method_name(&callee);
     if !matches!(target, "path" | "re_path") {
         return;
     }
|
@@ -6,6 +6,7 @@ use super::common::{
 use crate::auth_analysis::config::{AuthAnalysisRules, matches_name};
 use crate::auth_analysis::extract::common::{collect_top_level_units, decorated_definition_child};
 use crate::auth_analysis::model::{AuthorizationModel, CallSite, Framework, HttpMethod};
+use crate::labels::bare_method_name;
 use crate::utils::project::{DetectedFramework, FrameworkContext};
 use std::path::Path;
 use tree_sitter::{Node, Tree};
@@ -117,7 +118,7 @@ fn parse_flask_route_decorator(
     };

     let callee = text(function, bytes);
-    let method_name = callee.rsplit('.').next().unwrap_or(&callee);
+    let method_name = bare_method_name(&callee);
     let arguments = decorator_expr.child_by_field_name("arguments")?;
     let args = named_children(arguments);

|
@@ -7,6 +7,7 @@ use crate::auth_analysis::config::{AuthAnalysisRules, matches_name, strip_quotes
 use crate::auth_analysis::model::{
     AnalysisUnitKind, AuthorizationModel, CallSite, Framework, HttpMethod, RouteRegistration,
 };
+use crate::labels::bare_method_name;
 use crate::utils::project::{DetectedFramework, FrameworkContext};
 use std::path::Path;
 use tree_sitter::{Node, Tree};
@@ -175,7 +176,7 @@ fn class_filter_directives(body: Node<'_>, bytes: &[u8]) -> Vec<FilterDirective>
             continue;
         }
         let callee = call_name(child, bytes);
-        let directive_name = callee.rsplit('.').next().unwrap_or(&callee);
+        let directive_name = bare_method_name(&callee);
         if !matches_name(directive_name, "before_action")
             && !matches_name(directive_name, "prepend_before_action")
             && !matches_name(directive_name, "skip_before_action")
|
@@ -7,6 +7,7 @@ use crate::auth_analysis::config::{AuthAnalysisRules, matches_name};
 use crate::auth_analysis::model::{
     AnalysisUnitKind, AuthorizationModel, CallSite, Framework, HttpMethod, RouteRegistration,
 };
+use crate::labels::bare_method_name;
 use crate::utils::project::{DetectedFramework, FrameworkContext};
 use std::path::Path;
 use tree_sitter::{Node, Tree};
@@ -43,7 +44,7 @@ fn collect_before_filters(root: Node<'_>, bytes: &[u8]) -> Vec<CallSite> {
             continue;
         }
         let callee = call_name(child, bytes);
-        let target = callee.rsplit('.').next().unwrap_or(&callee);
+        let target = bare_method_name(&callee);
         if !matches_name(target, "before") {
             continue;
         }
@@ -79,7 +80,7 @@ fn maybe_collect_route(
     model: &mut AuthorizationModel,
 ) {
     let callee = call_name(node, bytes);
-    let route_name = callee.rsplit('.').next().unwrap_or(&callee);
+    let route_name = bare_method_name(&callee);
     let method = match route_name.to_ascii_lowercase().as_str() {
         "get" => HttpMethod::Get,
         "post" => HttpMethod::Post,
|
@@ -59,6 +59,7 @@ pub fn run_auth_analysis(
     // (skipped for slug-lookup / unit-test call sites).
     if let Some(types) = var_types {
         apply_var_types_to_model(&mut model, &rules, types);
+        apply_typed_bounded_params(&mut model, types);
     }

     // Lift per-function auth-check summaries and synthesise call-site
@@ -220,6 +221,47 @@ fn apply_var_types_to_model(
     }
 }

+/// Populate each [`model::AnalysisUnit::typed_bounded_vars`] with the
+/// names of formal parameters whose SSA-inferred [`TypeKind`] is a
+/// payload-incompatible scalar ([`TypeKind::Int`] or
+/// [`TypeKind::Bool`]). Only parameter-rooted entries are considered;
+/// function-local bindings stay outside this set so a downstream
+/// reassignment from user input (`let id = req.params.id`) never gets
+/// suppressed by accident.
+///
+/// Phase 6: when a parameter's type is a [`TypeKind::Dto`], lift each
+/// of its `Int`/`Bool` fields as `typed_bounded_dto_fields[<param>]`
+/// so member-access subjects like `dto.age` are recognised as
+/// payload-incompatible. Only fires when the base param itself was
+/// recognised as a typed extractor by a Phase 1-2 matcher — bare
+/// parameters with no framework gate never lift their fields.
+fn apply_typed_bounded_params(model: &mut model::AuthorizationModel, var_types: &VarTypes) {
+    for unit in &mut model.units {
+        for name in &unit.params {
+            let Some(ty) = var_types.get(name) else {
+                continue;
+            };
+            match ty {
+                TypeKind::Int | TypeKind::Bool => {
+                    unit.typed_bounded_vars.insert(name.clone());
+                }
+                TypeKind::Dto(dto) => {
+                    let mut bounded = Vec::new();
+                    for (field_name, field_kind) in &dto.fields {
+                        if matches!(field_kind, TypeKind::Int | TypeKind::Bool) {
+                            bounded.push(field_name.clone());
+                        }
+                    }
+                    if !bounded.is_empty() {
+                        unit.typed_bounded_dto_fields.insert(name.clone(), bounded);
+                    }
+                }
+                _ => {}
+            }
+        }
+    }
+}
+
 /// First segment of a callee's receiver chain (`map.insert` → `"map"`,
 /// `self.cache.set` → `"self"`). Returns `None` when the callee has no
 /// receiver (e.g. a free function call).
@@ -676,9 +718,15 @@ mod tests {
             condition_texts: Vec::new(),
             line: 1,
             row_field_vars: HashMap::new(),
+            var_alias_chain: HashMap::new(),
+            row_population_data: HashMap::new(),
             self_actor_vars: HashSet::new(),
             self_actor_id_vars: HashSet::new(),
             authorized_sql_vars: HashSet::new(),
+            const_bound_vars: HashSet::new(),
+            typed_bounded_vars: HashSet::new(),
+            typed_bounded_dto_fields: HashMap::new(),
+            self_scoped_session_bases: HashSet::new(),
         }
     }

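`apply_typed_bounded_params` runs once per file over the SSA-derived `VarTypes`. The reduced sketch below shows which names land in each set. It uses hypothetical stand-ins:

* `TypeKind` and `Unit` replace the crate's types.
* The `Dto` payload is approximated as a `Vec` of name/kind pairs, because the real type exposes a `fields` collection whose exact shape is not shown.

The "typed extractor" gate described in the doc comment is not modeled separately. The sketch lifts DTO fields for any parameter typed as a DTO.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, reduced stand-in for the crate's `TypeKind`.
#[derive(Debug, Clone)]
pub enum TypeKind {
    Int,
    Bool,
    Str,
    /// DTO with its declared (field name, field type) pairs.
    Dto(Vec<(String, TypeKind)>),
}

/// Hypothetical, reduced stand-in for `AnalysisUnit`.
#[derive(Debug, Default)]
pub struct Unit {
    pub params: Vec<String>,
    pub typed_bounded_vars: HashSet<String>,
    pub typed_bounded_dto_fields: HashMap<String, Vec<String>>,
}

fn is_bounded_scalar(kind: &TypeKind) -> bool {
    matches!(kind, TypeKind::Int | TypeKind::Bool)
}

/// Only formal parameters are inspected, so a local such as
/// `let id = req.params.id` never enters either set.
pub fn apply_typed_bounded_params(unit: &mut Unit, var_types: &HashMap<String, TypeKind>) {
    for name in &unit.params {
        match var_types.get(name) {
            // Scalar parameter: the value cannot carry a string payload.
            Some(kind) if is_bounded_scalar(kind) => {
                unit.typed_bounded_vars.insert(name.clone());
            }
            // DTO parameter: record only its scalar fields, e.g. `dto.age`.
            Some(TypeKind::Dto(fields)) => {
                let bounded: Vec<String> = fields
                    .iter()
                    .filter(|(_, k)| is_bounded_scalar(k))
                    .map(|(f, _)| f.clone())
                    .collect();
                if !bounded.is_empty() {
                    unit.typed_bounded_dto_fields.insert(name.clone(), bounded);
                }
            }
            _ => {}
        }
    }
}
```

Consider a unit with parameters `id: Int`, `name: Str` and `dto { age: Int, bio: Str }`:

* `id` goes into `typed_bounded_vars`.
* `dto` maps to `["age"]` in `typed_bounded_dto_fields`.
* A non-parameter `Int` entry in `VarTypes` is ignored.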
|
|
@@ -168,6 +168,26 @@ pub struct AnalysisUnit {
     /// row-level ownership-equality check on the row implicitly covers
     /// downstream uses of fields read from the same row.
     pub row_field_vars: HashMap<String, String>,
+    /// Map from local variable name to the full member-chain expression
+    /// it was bound from (`let community_id = req.community_id` →
+    /// `community_id → "req.community_id"`). Distinct from
+    /// `row_field_vars`, which records only the receiver (loses the
+    /// field name). Powers the row-population reverse-walk's local-
+    /// alias case: when a sink subject is a plain identifier, the
+    /// reverse walk consults this map to also accept rows whose
+    /// population args contain the aliased chain.
+    pub var_alias_chain: HashMap<String, String>,
+    /// Per row-binding metadata: the `let ROW = CALL(..)` declaration
+    /// line and the value-refs appearing in the call's arguments.
+    /// Populated for every `let V = call(..)` shape. Powers the
+    /// "fetch-then-authorize" exemption in `checks.rs`: if a row-fetch
+    /// operation produces variable `V` and SOME auth check elsewhere
+    /// in the unit names `V`, the row-fetch operation is considered
+    /// authorized — even though the check appears textually after the
+    /// fetch. This is the standard idiom in row-level authz code:
+    /// fetch the row first to extract the resource id, then call
+    /// `check_<resource>_<role>(&user, &row, ...)` to authorize it.
+    pub row_population_data: HashMap<String, (usize, Vec<ValueRef>)>,
     /// Variables bound to an authenticated-user value. Populated from
     /// `let V = require_auth(..).await?` (or any call matching the
     /// configured login-guard / authorization-check names) and from
@@ -196,6 +216,46 @@ pub struct AnalysisUnit {
    /// and treats a subject as covered when the chain terminates in
    /// one of these names.
    pub authorized_sql_vars: HashSet<String>,
    /// Local variables bound (by `let`, `:=`, `var`, `const`) to a
    /// pure literal: string, integer, float, or boolean. These are
    /// developer-chosen constants and cannot be user-controlled, so
    /// they must never trip `<lang>.auth.missing_ownership_check`
    /// even when the variable name passes `is_id_like`. Closes the
    /// gin/context_test.go FP where `id := "id"` triggered the rule.
    pub const_bound_vars: HashSet<String>,
    /// Function parameter names whose static type maps to a
    /// payload-incompatible scalar ([`crate::ssa::type_facts::TypeKind::Int`]
    /// or [`crate::ssa::type_facts::TypeKind::Bool`]). Populated
    /// per-file by [`super::apply_typed_bounded_params`] using the
    /// SSA-derived `VarTypes` map. Consulted by
    /// `is_typed_bounded_subject` so parameters like Spring `Long
    /// userId`, Axum `Path<i64>`, or FastAPI `user_id: int` are not
    /// classified as scoped-identifier subjects even when their name
    /// passes `is_id_like`; the framework guarantees the value is a
    /// number that cannot carry a SQL/file/shell payload.
    pub typed_bounded_vars: HashSet<String>,
    /// Phase 6: per-DTO-extractor parameter, the field names whose
    /// declared type is a payload-incompatible scalar. Map key is the
    /// parameter name (e.g. `dto`), value is the list of field names
    /// (e.g. `["age", "count"]`). Populated by
    /// [`super::apply_typed_bounded_params`] only when the parameter
    /// itself was recognised as a typed extractor by a Phase 1-2
    /// matcher; bare parameters with no framework gate never lift
    /// their fields.
    pub typed_bounded_dto_fields: HashMap<String, Vec<String>>,
    /// Per-unit dynamic session-base text set, supplementing the
    /// hard-coded list in `is_self_scoped_session_base`. Populated by
    /// the extractor when a parameter's static type signals a known
    /// auth-context shape, e.g. TRPC's `Options { ctx: { user:
    /// NonNullable<TrpcSessionUser> } }` adds `<localCtx>.user` so
    /// downstream `ctx.user.id` accesses count as actor context. Each
    /// entry is the dotted base text (e.g. `"ctx.user"`,
    /// `"opts.ctx.user"`) that should match a subject's `base` when
    /// the subject's `field` is an id-like field name. Distinct from
    /// `self_actor_vars` (single-segment locals) because TRPC
    /// destructures route through a base chain, not a top-level
    /// binding.
    pub self_scoped_session_bases: HashSet<String>,
}
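
// Editorial sketch. This is hypothetical and is not nyx's extractor;
// `sketch_is_pure_literal` is an illustrative name. It is a text-level
// version of the "pure literal" test behind `const_bound_vars`. The real
// extractor presumably classifies tree-sitter nodes rather than source
// text. The helper is deliberately conservative: hex, underscored, and
// suffixed numerals are rejected rather than accepted. For the gin case
// above, the right-hand side of `id := "id"` is the text `"id"`, which it
// accepts.
#[allow(dead_code)]
fn sketch_is_pure_literal(rhs: &str) -> bool {
    let s = rhs.trim();
    // Booleans: lower-case spellings plus Python's `True` / `False`.
    if matches!(s, "true" | "false" | "True" | "False") {
        return true;
    }
    // Numbers: require a leading digit (after an optional `-`). This keeps
    // identifiers such as `inf` or `NaN`, which `f64::from_str` accepts,
    // from being treated as literals.
    let unsigned = s.strip_prefix('-').unwrap_or(s);
    if unsigned.starts_with(|c: char| c.is_ascii_digit()) {
        return s.parse::<f64>().is_ok();
    }
    // Strings: one quoted run with no unescaped interior quote. This
    // rejects `"a" + id + "b"` even though it starts and ends with `"`.
    for quote in ['"', '\''] {
        if s.len() >= 2 && s.starts_with(quote) && s.ends_with(quote) {
            let inner = &s[1..s.len() - 1];
            let mut escaped = false;
            for c in inner.chars() {
                if escaped {
                    escaped = false;
                } else if c == '\\' {
                    escaped = true;
                } else if c == quote {
                    return false;
                }
            }
            // A trailing backslash means the closing quote is escaped.
            return !escaped;
        }
    }
    false
}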
|
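// Editorial sketch. This is hypothetical: the function name and signature
// are illustrative, and value-refs are modelled as plain strings instead
// of `ValueRef`. It shows how a row-population reverse walk could combine
// `var_alias_chain` with `row_population_data`. For a plain-identifier
// sink subject, it accepts rows whose population args mention either the
// identifier itself or the member chain it was bound from.
#[allow(dead_code)]
fn sketch_rows_populated_from(
    subject: &str,
    var_alias_chain: &std::collections::HashMap<String, String>,
    // row var -> (declaration line, population-arg texts)
    row_population_data: &std::collections::HashMap<String, (usize, Vec<String>)>,
) -> Vec<String> {
    // Accept the subject plus, when it is an alias, the chain it came from
    // (`community_id` -> `req.community_id`).
    let mut accepted = vec![subject];
    if let Some(chain) = var_alias_chain.get(subject) {
        accepted.push(chain.as_str());
    }
    let mut rows: Vec<String> = row_population_data
        .iter()
        .filter(|(_, (_, args))| args.iter().any(|a| accepted.contains(&a.as_str())))
        .map(|(row, _)| row.clone())
        .collect();
    // HashMap iteration order is unspecified; sort for a stable result.
    rows.sort();
    rows
}
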
/// Per-function summary of which positional parameters are

1258
src/callgraph.rs
File diff suppressed because it is too large

@@ -1107,6 +1107,7 @@ fn clone_preserves_all_sub_structs() {
        kind: StmtKind::Call,
        call: CallMeta {
            callee: Some("foo".into()),
            callee_text: Some("obj.foo".into()),
            outer_callee: Some("bar".into()),
            callee_span: Some((7, 17)),
            call_ordinal: 5,

@@ -2041,3 +2042,857 @@ fn numeric_length_access_ignores_method_calls_with_args() {
        "is_numeric_length_access must stay false for arg-bearing calls"
    );
}

// ── Pointer-Phase 6 / W5: subscript lowering tests ────────────────────────

use std::sync::Mutex;

static POINTER_ENV_GUARD: Mutex<()> = Mutex::new(());

/// Runs `f` with `NYX_POINTER_ANALYSIS` set to `value` (or removed when
/// `value` is `None`) so tests can toggle the CFG-side subscript
/// synthesis. The previous value is restored afterwards, even if `f`
/// panics, so the rest of the test suite stays bit-identical to the unset
/// state. Mirrors the env-var serialisation pattern used elsewhere in
/// the test suite (see `tests/pointer_disabled_bit_identity.rs`). Only
/// tests that go through this helper are serialised by `POINTER_ENV_GUARD`.
fn with_pointer_env<R>(value: Option<&str>, f: impl FnOnce() -> R) -> R {
    // Recover from poisoning: a panicking test must not wedge later callers.
    let _lock = POINTER_ENV_GUARD.lock().unwrap_or_else(|e| e.into_inner());

    // Restores the saved value on drop. Locals drop in reverse declaration
    // order, so this runs before `_lock` is released.
    struct Restore(Option<String>);
    impl Drop for Restore {
        fn drop(&mut self) {
            unsafe {
                match self.0.take() {
                    Some(v) => std::env::set_var("NYX_POINTER_ANALYSIS", v),
                    None => std::env::remove_var("NYX_POINTER_ANALYSIS"),
                }
            }
        }
    }
    let _restore = Restore(std::env::var("NYX_POINTER_ANALYSIS").ok());

    unsafe {
        match value {
            Some(v) => std::env::set_var("NYX_POINTER_ANALYSIS", v),
            None => std::env::remove_var("NYX_POINTER_ANALYSIS"),
        }
    }
    f()
}

fn with_pointer_on<R>(f: impl FnOnce() -> R) -> R {
    with_pointer_env(Some("1"), f)
}

fn count_nodes_with_callee(cfg: &Cfg, callee: &str) -> usize {
    cfg.node_indices()
        .filter(|i| cfg[*i].call.callee.as_deref() == Some(callee))
        .count()
}

fn find_node_with_callee<'a>(cfg: &'a Cfg, callee: &str) -> Option<&'a NodeInfo> {
    cfg.node_indices()
        .map(|i| &cfg[i])
        .find(|n| n.call.callee.as_deref() == Some(callee))
}

#[test]
fn js_subscript_read_lowers_to_index_get_call() {
    with_pointer_on(|| {
        // `arr[0]` as a sink call argument should be pre-emitted as a
        // synth `__index_get__` call before the consuming sink.
        let src = br#"function f(arr) {
    sink(arr[0]);
}"#;
        let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
        let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);
        let node = find_node_with_callee(&cfg, "__index_get__")
            .expect("__index_get__ node should be present");
        assert_eq!(node.call.receiver.as_deref(), Some("arr"));
        assert_eq!(node.call.arg_uses.len(), 1, "expect one arg group (index)");
        assert_eq!(node.call.arg_uses[0], vec!["0"]);
        assert!(
            node.taint
                .defines
                .as_deref()
                .is_some_and(|d| d.starts_with("__nyx_idxget_")),
            "synth defines should use the __nyx_idxget_ prefix"
        );
    });
}

#[test]
fn js_subscript_write_lowers_to_index_set_call() {
    with_pointer_on(|| {
        let src = br#"function f(arr, v) {
    arr[0] = v;
}"#;
        let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
        let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);
        let node = find_node_with_callee(&cfg, "__index_set__")
            .expect("__index_set__ node should be present");
        assert_eq!(node.call.receiver.as_deref(), Some("arr"));
        assert_eq!(
            node.call.arg_uses.len(),
            2,
            "expect arg_uses [[idx], [val]]"
        );
        assert_eq!(node.call.arg_uses[0], vec!["0"]);
        assert_eq!(node.call.arg_uses[1], vec!["v"]);
    });
}

#[test]
fn py_subscript_read_lowers_to_index_get_call() {
    with_pointer_on(|| {
        let src = br#"def f(arr):
    sink(arr[0])
"#;
        let ts_lang = Language::from(tree_sitter_python::LANGUAGE);
        let (cfg, _entry) = parse_and_build(src, "python", ts_lang);
        let node = find_node_with_callee(&cfg, "__index_get__")
            .expect("python: __index_get__ node should be present");
        assert_eq!(node.call.receiver.as_deref(), Some("arr"));
    });
}

#[test]
fn py_subscript_write_lowers_to_index_set_call() {
    with_pointer_on(|| {
        let src = br#"def f(arr, v):
    arr[0] = v
"#;
        let ts_lang = Language::from(tree_sitter_python::LANGUAGE);
        let (cfg, _entry) = parse_and_build(src, "python", ts_lang);
        let node = find_node_with_callee(&cfg, "__index_set__")
            .expect("python: __index_set__ node should be present");
        assert_eq!(node.call.receiver.as_deref(), Some("arr"));
        assert_eq!(node.call.arg_uses.len(), 2);
        assert_eq!(node.call.arg_uses[1], vec!["v"]);
    });
}

#[test]
fn go_index_expr_read_lowers_to_index_get_call() {
    with_pointer_on(|| {
        let src = br#"package main
func f(arr []string) {
    sink(arr[0])
}
"#;
        let ts_lang = Language::from(tree_sitter_go::LANGUAGE);
        let (cfg, _entry) = parse_and_build(src, "go", ts_lang);
        let node = find_node_with_callee(&cfg, "__index_get__")
            .expect("go: __index_get__ node should be present");
        assert_eq!(node.call.receiver.as_deref(), Some("arr"));
    });
}

#[test]
fn go_index_expr_write_lowers_to_index_set_call() {
    with_pointer_on(|| {
        let src = br#"package main
func f(m map[string]int, k string, v int) {
    m[k] = v
}
"#;
        let ts_lang = Language::from(tree_sitter_go::LANGUAGE);
        let (cfg, _entry) = parse_and_build(src, "go", ts_lang);
        let node = find_node_with_callee(&cfg, "__index_set__")
            .expect("go: __index_set__ node should be present");
        assert_eq!(node.call.receiver.as_deref(), Some("m"));
        assert_eq!(node.call.arg_uses.len(), 2);
        assert_eq!(node.call.arg_uses[0], vec!["k"]);
        assert_eq!(node.call.arg_uses[1], vec!["v"]);
    });
}

#[test]
fn pointer_disabled_skips_subscript_synthesis() {
    // Strict-additive contract: when NYX_POINTER_ANALYSIS=0 the CFG
    // must contain zero __index_get__/__index_set__ nodes regardless
    // of the source shape. This is the off-by-default invariant the
    // bit-identity gate relies on.
    with_pointer_env(Some("0"), || {
        let src = br#"function f(arr, v) {
    sink(arr[0]);
    arr[1] = v;
}"#;
        let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
        let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);
        assert_eq!(count_nodes_with_callee(&cfg, "__index_get__"), 0);
        assert_eq!(count_nodes_with_callee(&cfg, "__index_set__"), 0);
    });
}
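
// Editorial sketch. This is hypothetical and is not the CFG builder. It is
// a plain-data model of the shape the subscript tests above assert on.
// A read `recv[idx]` becomes an `__index_get__` call with receiver `recv`
// and one argument group `[idx]`, and its result is bound to a fresh
// `__nyx_idxget_*` temporary. A write `recv[idx] = val` becomes
// `__index_set__` with argument groups `[[idx], [val]]`. Two details are
// assumptions: the numeric suffix scheme, and `defines: None` on the write
// side. The tests only pin the prefix and the argument shape, and the
// NYX_POINTER_ANALYSIS gating is not modelled here.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct SketchIndexCall {
    callee: &'static str,
    receiver: String,
    arg_uses: Vec<Vec<String>>,
    defines: Option<String>,
}

#[allow(dead_code)]
fn sketch_lower_index_read(recv: &str, idx: &str, next_tmp: &mut usize) -> SketchIndexCall {
    // Fresh synthetic temporary for the read result.
    let defines = format!("__nyx_idxget_{}", *next_tmp);
    *next_tmp += 1;
    SketchIndexCall {
        callee: "__index_get__",
        receiver: recv.to_string(),
        arg_uses: vec![vec![idx.to_string()]],
        defines: Some(defines),
    }
}

#[allow(dead_code)]
fn sketch_lower_index_write(recv: &str, idx: &str, val: &str) -> SketchIndexCall {
    SketchIndexCall {
        callee: "__index_set__",
        receiver: recv.to_string(),
        arg_uses: vec![vec![idx.to_string()], vec![val.to_string()]],
        defines: None,
    }
}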

// ─────────────────────────────────────────────────────────────────
// Gap-filling: switch / for / do-while / nested loops / re-throw
// ─────────────────────────────────────────────────────────────────

/// JS `switch` should produce one synthetic dispatch `If` node per
/// case (default excluded when at the tail), plus True edges into
/// each case body. Verifies the discriminant cascade is wired.
#[test]
fn js_switch_cascade_has_one_if_per_case() {
    let src = br#"function f(x) {
    switch (x) {
        case 1: a(); break;
        case 2: b(); break;
        default: c();
    }
}"#;
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);

    // Two non-default cases => 2 dispatch If nodes (the tail default
    // is wired via the previous header's False edge, not its own If).
    assert_eq!(
        if_nodes(&cfg).len(),
        2,
        "switch with 2 explicit cases + default should emit 2 dispatch If nodes"
    );

    // Each dispatch If must have at least one True and one False edge
    // (True → case body, False → next case / default).
    for i in if_nodes(&cfg) {
        let trues = cfg
            .edges(i)
            .filter(|e| matches!(e.weight(), EdgeKind::True))
            .count();
        let falses = cfg
            .edges(i)
            .filter(|e| matches!(e.weight(), EdgeKind::False))
            .count();
        assert!(
            trues >= 1,
            "case dispatch should have at least one True edge"
        );
        assert!(
            falses >= 1,
            "case dispatch should have at least one False edge"
        );
    }
}

/// Default case in the *middle* of a switch must be reordered to the
/// tail so the dispatch cascade stays a clean True/False chain. The
/// observable CFG shape (number of If nodes, presence of True/False
/// edges per dispatch) should match the default-at-tail case.
#[test]
fn js_switch_default_in_middle_reorders_to_tail() {
    let src = br#"function f(x) {
    switch (x) {
        case 1: a(); break;
        default: c(); break;
        case 2: b(); break;
    }
}"#;
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);

    // 2 non-default cases ⇒ 2 If dispatch nodes (default reordered to tail).
    assert_eq!(
        if_nodes(&cfg).len(),
        2,
        "default-in-middle should still produce one If per non-default case"
    );
}

/// JS switch fall-through (`case 1: first(); case 2: second();`): case 1's
/// exit should flow into case 2's body so taint from `first()`
/// reaches `second()`'s sinks.
///
/// We assert two things:
/// (a) Reachability: `second()` is reachable from `first()` over
///     forward (Seq/True/False) edges. This is the semantic property
///     taint analysis depends on. Checking it directly avoids
///     over-fitting to the structural shape. It can only hold through
///     a fall-through wire, because the dispatch chain runs before
///     `first()` and the function contains no loop leading back into it.
/// (b) `first()` has at least one outgoing Seq edge, the edge kind the
///     fall-through wire uses. This is a necessary condition only;
///     (a) is the check that pins the fall-through down.
///
/// Note on the structural shape: case bodies are wrapped in synthetic
/// Seq passthrough nodes (one per surrounding scope), so the
/// fall-through edge from `first()` lands on the *first wrapper
/// Seq node* of case 2, not on `second()` itself. Asserting that
/// `second()` has ≥2 in-edges would therefore be wrong: the True
/// edge from the case-2 dispatch If targets the wrapper node, and
/// only a single Seq chain leads from there to `second()`.
#[test]
fn js_switch_fallthrough_no_break() {
    use std::collections::HashSet;
    let src = br#"function f(x) {
    switch (x) {
        case 1: first();
        case 2: second(); break;
    }
}"#;
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);

    let first = cfg
        .node_indices()
        .find(|&n| cfg[n].call.callee.as_deref() == Some("first"))
        .expect("expected a Call node for `first`");
    let second = cfg
        .node_indices()
        .find(|&n| cfg[n].call.callee.as_deref() == Some("second"))
        .expect("expected a Call node for `second`");

    // (a) Reachability from first → second over forward (non-Back) edges.
    let mut seen: HashSet<NodeIndex> = HashSet::new();
    let mut stack = vec![first];
    while let Some(n) = stack.pop() {
        if !seen.insert(n) {
            continue;
        }
        for e in cfg.edges(n) {
            if matches!(e.weight(), EdgeKind::Seq | EdgeKind::True | EdgeKind::False) {
                stack.push(e.target());
            }
        }
    }
    assert!(
        seen.contains(&second),
        "fall-through: `second` must be reachable from `first` over forward edges"
    );

    // (b) `first()` must have at least one outgoing Seq edge, because the
    //     fall-through wire is always Seq. This is a necessary condition
    //     only; the reachability check in (a) is what requires the wire.
    let first_seq_outs = cfg
        .edges(first)
        .filter(|e| matches!(e.weight(), EdgeKind::Seq))
        .count();
    assert!(
        first_seq_outs >= 1,
        "fall-through: `first()` must have a Seq out-edge (the fall-through wire)"
    );
}
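
// Editorial sketch. This is hypothetical and is not nyx's switch lowering.
// It is a minimal model of the dispatch cascade the switch tests above
// describe.
// - Bodies keep their source order, which is what fall-through follows,
//   while dispatch order moves a `default` case to the tail.
// - Each non-default case gets one `If` header. True enters its body and
//   False moves to the next header. The last header's False goes to the
//   default body if there is one, else to the switch exit.
// - A case that does not end in `break` falls through to the next body in
//   source order via a Seq edge.
// Simplification: a `break` is modelled as a direct Seq edge to the exit
// rather than as a separate Break node. Node ids are bodies 0..n, then the
// headers, then the exit.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchEdge {
    True,
    False,
    Seq,
}

#[allow(dead_code)]
struct SketchCase {
    is_default: bool,
    ends_with_break: bool,
}

/// Returns `(header_ids, exit_id, edges)`, each edge being `(from, to, kind)`.
#[allow(dead_code)]
fn sketch_switch_cascade(
    cases: &[SketchCase],
) -> (Vec<usize>, usize, Vec<(usize, usize, SketchEdge)>) {
    let n = cases.len();
    // Dispatch order: explicit cases in source order, default (if any) last.
    let explicit: Vec<usize> = (0..n).filter(|&i| !cases[i].is_default).collect();
    let default_case = (0..n).find(|&i| cases[i].is_default);
    let headers: Vec<usize> = (0..explicit.len()).map(|k| n + k).collect();
    let exit = n + headers.len();

    let mut edges = Vec::new();
    for (k, &case_idx) in explicit.iter().enumerate() {
        edges.push((headers[k], case_idx, SketchEdge::True));
        let on_false = match headers.get(k + 1) {
            Some(&next) => next,
            None => default_case.unwrap_or(exit),
        };
        edges.push((headers[k], on_false, SketchEdge::False));
    }
    // Fall-through follows source order, independent of dispatch order.
    for (i, case) in cases.iter().enumerate() {
        let next = if !case.ends_with_break && i + 1 < n { i + 1 } else { exit };
        edges.push((i, next, SketchEdge::Seq));
    }
    (headers, exit, edges)
}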

/// `for (i = 0; i < 10; i++) { body(); }` should produce a Loop node
/// with at least one Back edge from the body back to the loop header.
#[test]
fn js_for_loop_has_back_edge() {
    let src = br#"function f() { for (let i = 0; i < 10; i++) { body(); } }"#;
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);

    let loop_nodes: Vec<_> = cfg
        .node_indices()
        .filter(|&n| cfg[n].kind == StmtKind::Loop)
        .collect();
    assert_eq!(loop_nodes.len(), 1, "expected exactly one Loop node");

    let back_edges = cfg
        .edge_references()
        .filter(|e| matches!(e.weight(), EdgeKind::Back))
        .count();
    assert!(
        back_edges >= 1,
        "for loop must have at least one Back edge to its header"
    );
}

/// `do { ... } while (cond);` is mapped to `Kind::While` for many
/// languages but the grammar puts the body *before* the condition.
/// The CFG must still produce a Loop node and at least one Back edge.
#[test]
fn js_do_while_has_loop_node_and_back_edge() {
    let src = br#"function f() { do { body(); } while (cond); }"#;
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);

    let loop_count = cfg
        .node_indices()
        .filter(|&n| cfg[n].kind == StmtKind::Loop)
        .count();
    assert_eq!(loop_count, 1, "do-while should produce one Loop node");
    assert!(
        cfg.edge_references()
            .any(|e| matches!(e.weight(), EdgeKind::Back)),
        "do-while must have at least one Back edge"
    );
}

/// In `while (a) { while (b) { break; } inner_after(); }`, the `break`
/// terminates only the *inner* loop, so control continues at
/// `inner_after()` inside the outer body. Structurally we check that
/// there are exactly two Loop nodes and one Break node, and that the
/// Break has no Back edge onto the outer loop header.
#[test]
fn js_nested_while_break_targets_inner_loop() {
    let src = br#"function f() {
    while (a) {
        while (b) { break; }
        inner_after();
    }
}"#;
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);

    let loops: Vec<_> = cfg
        .node_indices()
        .filter(|&n| cfg[n].kind == StmtKind::Loop)
        .collect();
    assert_eq!(loops.len(), 2, "expected two Loop nodes");

    let breaks: Vec<_> = cfg
        .node_indices()
        .filter(|&n| cfg[n].kind == StmtKind::Break)
        .collect();
    assert_eq!(breaks.len(), 1, "expected exactly one Break node");

    // The inner loop body's break should NOT close back via Back edge
    // onto the outer header (outer header is loops[0] in source order).
    let outer_header = loops[0];
    let brk = breaks[0];
    let crosses_outer = cfg
        .edges(brk)
        .any(|e| e.target() == outer_header && matches!(e.weight(), EdgeKind::Back));
    assert!(
        !crosses_outer,
        "inner break must not back-edge onto the outer loop header"
    );
}

/// `continue` in the inner loop must back-edge onto the *inner*
/// header, not the outer. With nested while loops we expect exactly
/// one Continue node and at least one Back edge originating at it
/// going to the inner (second-emitted) Loop header.
#[test]
fn js_nested_while_continue_targets_inner_loop() {
    let src = br#"function f() {
    while (a) {
        while (b) { continue; }
    }
}"#;
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);

    let loops: Vec<_> = cfg
        .node_indices()
        .filter(|&n| cfg[n].kind == StmtKind::Loop)
        .collect();
    assert_eq!(loops.len(), 2, "expected two Loop nodes");
    let outer_header = loops[0];
    let inner_header = loops[1];

    let cont = cfg
        .node_indices()
        .find(|&n| cfg[n].kind == StmtKind::Continue)
        .expect("expected Continue node");

    let back_edges_from_cont: Vec<_> = cfg
        .edges(cont)
        .filter(|e| matches!(e.weight(), EdgeKind::Back))
        .collect();
    assert!(
        !back_edges_from_cont.is_empty(),
        "continue must originate at least one Back edge"
    );
    assert!(
        back_edges_from_cont
            .iter()
            .any(|e| e.target() == inner_header),
        "continue's Back edge must target the inner loop header"
    );
    assert!(
        !back_edges_from_cont
            .iter()
            .any(|e| e.target() == outer_header),
        "continue must not back-edge onto the outer loop header"
    );
}

/// `throw` inside a `catch` block should still register a throw
/// target so a surrounding outer try (or function-level exit) can
/// receive it. We verify here that the throw produces a Throw node
/// even when it is reached only via an Exception edge from the inner
/// try body (i.e. the re-throw path is preserved structurally).
#[test]
fn js_throw_inside_catch_emits_throw_node() {
    let src = br#"function f() {
    try {
        try { foo(); } catch (e) { throw e; }
    } catch (e2) {
        handle();
    }
}"#;
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);

    let throws: Vec<_> = cfg
        .node_indices()
        .filter(|&n| cfg[n].kind == StmtKind::Throw)
        .collect();
    assert_eq!(
        throws.len(),
        1,
        "expected exactly one Throw node for the inner re-throw"
    );

    // The outer `catch (e2)` body must be reachable. Check that the
    // `handle()` call exists and has at least one incoming edge.
    let handle = cfg
        .node_indices()
        .find(|&n| cfg[n].call.callee.as_deref() == Some("handle"))
        .expect("expected `handle()` call node");
    let in_edges = cfg
        .edges_directed(handle, petgraph::Direction::Incoming)
        .count();
    assert!(in_edges >= 1, "outer catch body must be reachable");
}

/// Empty if/else branches (e.g., `if (a) {} else {}`) must not panic
/// and the resulting CFG must still have a single If node with both
/// True and False edges going somewhere reachable. This guards
/// against off-by-one bugs in `then_first_node`/exits handling.
#[test]
fn js_if_with_empty_branches_does_not_panic() {
    let src = b"function f() { if (a) {} else {} return; }";
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _entry) = parse_and_build(src, "javascript", ts_lang);

    let ifs = if_nodes(&cfg);
    assert_eq!(ifs.len(), 1, "expected one If node");
    let i = ifs[0];

    let trues: Vec<_> = cfg
        .edges(i)
        .filter(|e| matches!(e.weight(), EdgeKind::True))
        .collect();
    let falses: Vec<_> = cfg
        .edges(i)
        .filter(|e| matches!(e.weight(), EdgeKind::False))
        .collect();
    assert!(!trues.is_empty(), "empty-then If must still emit True edge");
    assert!(
        !falses.is_empty(),
        "empty-else If must still emit False edge"
    );
}

/// A function body with no statements should still produce a
/// well-formed CFG (entry/exit only); no panic, no orphan nodes from
/// `build_sub` returning an empty exit set.
#[test]
fn js_empty_function_body_well_formed() {
    let src = b"function f() {}";
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let file_cfg = parse_to_file_cfg(src, "javascript", ts_lang);
    // We expect 2 bodies: top-level + the function body. Both must be
    // valid graphs with at least an entry node.
    assert!(
        file_cfg.bodies.len() >= 2,
        "expected at least 2 bodies (top-level + function)"
    );
    for body in &file_cfg.bodies {
        assert!(
            body.graph.node_count() >= 1,
            "every body must have at least one node"
        );
    }
}

// ─────────────────────────────────────────────────────────────────────
// Loop CFG structure: every loop variant must produce a Loop header
// with at least one Back edge that targets that header. Without these
// invariants the SSA loop-induction-variable phi placement is wrong
// and the abstract-interp widening points are missed.
// ─────────────────────────────────────────────────────────────────────

fn loop_headers(cfg: &Cfg) -> Vec<NodeIndex> {
    cfg.node_indices()
        .filter(|&n| cfg[n].kind == StmtKind::Loop)
        .collect()
}

fn back_edges(cfg: &Cfg) -> Vec<(NodeIndex, NodeIndex)> {
    cfg.edge_references()
        .filter(|e| matches!(e.weight(), EdgeKind::Back))
        .map(|e| (e.source(), e.target()))
        .collect()
}

fn assert_loop_with_back_edge(cfg: &Cfg, label: &str) {
    let headers = loop_headers(cfg);
    assert!(
        !headers.is_empty(),
        "{label}: expected at least one Loop header, found none"
    );
    let backs = back_edges(cfg);
    assert!(
        !backs.is_empty(),
        "{label}: expected at least one Back edge"
    );
    for (_, dst) in &backs {
        assert!(
            headers.contains(dst),
            "{label}: Back edge target {:?} is not a Loop header (headers={:?})",
            dst,
            headers
        );
    }
}
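
// Editorial sketch. This is hypothetical and independent of nyx's builder:
// it uses a plain adjacency list, not petgraph or `Cfg`. The textbook way
// to recognise back edges without trusting the builder's `EdgeKind::Back`
// labels is dominance: an edge `u -> h` is a back edge exactly when `h`
// dominates `u`. Its target is then, by definition, a natural-loop header,
// which is the invariant `assert_loop_with_back_edge` checks. The sketch
// runs the classic iterative dominator fixpoint and assumes every node is
// reachable from `entry`. Unreachable nodes would keep the "all nodes"
// dominator set and be misclassified.
#[allow(dead_code)]
fn sketch_dominance_back_edges(succs: &[Vec<usize>], entry: usize) -> Vec<(usize, usize)> {
    use std::collections::BTreeSet;

    let n = succs.len();
    let mut preds: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (u, outs) in succs.iter().enumerate() {
        for &v in outs {
            preds[v].push(u);
        }
    }
    // dom[v] = nodes dominating v; start from "everything" and shrink.
    let all: BTreeSet<usize> = (0..n).collect();
    let mut dom: Vec<BTreeSet<usize>> = vec![all.clone(); n];
    dom[entry] = BTreeSet::from([entry]);
    let mut changed = true;
    while changed {
        changed = false;
        for v in (0..n).filter(|&v| v != entry) {
            // Intersect the predecessors' dominator sets, then add v itself.
            let mut next = preds[v]
                .iter()
                .map(|&p| dom[p].clone())
                .reduce(|acc, d| acc.intersection(&d).copied().collect())
                .unwrap_or_else(|| all.clone());
            next.insert(v);
            if next != dom[v] {
                dom[v] = next;
                changed = true;
            }
        }
    }
    // u -> h is a back edge iff h dominates u.
    let mut backs = Vec::new();
    for (u, outs) in succs.iter().enumerate() {
        for &h in outs {
            if dom[u].contains(&h) {
                backs.push((u, h));
            }
        }
    }
    backs
}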

#[test]
fn js_for_loop_back_edge() {
    let src = b"function f() { for (let i = 0; i < 10; i++) { body(i); } }";
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "javascript", ts_lang);
    assert_loop_with_back_edge(&cfg, "js classic for");
}

#[test]
fn js_do_while_back_edge() {
    let src = b"function f() { do { body(); } while (cond()); }";
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "javascript", ts_lang);
    assert_loop_with_back_edge(&cfg, "js do-while");
}

#[test]
fn js_for_in_back_edge() {
    let src = b"function f() { for (let k in obj) { use(k); } }";
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "javascript", ts_lang);
    assert_loop_with_back_edge(&cfg, "js for-in");
}

#[test]
fn js_for_of_back_edge() {
    let src = b"function f() { for (const x of items) { use(x); } }";
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "javascript", ts_lang);
    // tree-sitter-javascript parses both `for...in` and `for...of` as a
    // `for_in_statement`, so this shares the for-in lowering path; the
    // body-with-back-edge invariant must still hold.
    assert_loop_with_back_edge(&cfg, "js for-of");
}

#[test]
fn python_for_loop_back_edge() {
    let src = b"def f():\n    for x in items:\n        use(x)\n";
    let ts_lang = Language::from(tree_sitter_python::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "python", ts_lang);
    assert_loop_with_back_edge(&cfg, "python for");
}

#[test]
fn python_while_loop_back_edge() {
    let src = b"def f():\n    while cond():\n        use(x)\n";
    let ts_lang = Language::from(tree_sitter_python::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "python", ts_lang);
    assert_loop_with_back_edge(&cfg, "python while");
}

#[test]
fn java_enhanced_for_back_edge() {
    let src = b"class A { void f(int[] xs) { for (int x : xs) { use(x); } } }";
    let ts_lang = Language::from(tree_sitter_java::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "java", ts_lang);
    assert_loop_with_back_edge(&cfg, "java enhanced-for");
}

#[test]
fn java_do_while_back_edge() {
    let src = b"class A { void f() { do { body(); } while (cond()); } }";
    let ts_lang = Language::from(tree_sitter_java::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "java", ts_lang);
    assert_loop_with_back_edge(&cfg, "java do-while");
}

#[test]
fn cpp_range_for_back_edge() {
    let src = b"void f(int* xs) { for (int x : range) { use(x); } }";
    let ts_lang = Language::from(tree_sitter_cpp::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "cpp", ts_lang);
    assert_loop_with_back_edge(&cfg, "cpp range-for");
}

#[test]
fn c_do_while_back_edge() {
    let src = b"void f() { do { body(); } while (cond()); }";
    let ts_lang = Language::from(tree_sitter_c::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "c", ts_lang);
    assert_loop_with_back_edge(&cfg, "c do-while");
}

#[test]
fn go_for_loop_back_edge() {
    let src = b"package p\nfunc f() { for i := 0; i < 10; i++ { body(i) } }";
    let ts_lang = Language::from(tree_sitter_go::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "go", ts_lang);
    assert_loop_with_back_edge(&cfg, "go for");
}

#[test]
fn ruby_while_back_edge() {
    let src = b"def f\n  while cond\n    body\n  end\nend\n";
    let ts_lang = Language::from(tree_sitter_ruby::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "ruby", ts_lang);
    assert_loop_with_back_edge(&cfg, "ruby while");
}

#[test]
fn ruby_until_back_edge() {
    // `until cond` is `while not cond`; should still produce a loop.
    let src = b"def f\n  until done\n    body\n  end\nend\n";
    let ts_lang = Language::from(tree_sitter_ruby::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "ruby", ts_lang);
    assert_loop_with_back_edge(&cfg, "ruby until");
}

#[test]
fn php_foreach_back_edge() {
    // `use` is a reserved word in PHP, so the loop body calls `sink()`
    // to keep the snippet parseable.
    let src = b"<?php function f($items) { foreach ($items as $x) { sink($x); } }";
    let ts_lang = Language::from(tree_sitter_php::LANGUAGE_PHP);
    let (cfg, _) = parse_and_build(src, "php", ts_lang);
    assert_loop_with_back_edge(&cfg, "php foreach");
}

#[test]
fn rust_for_loop_back_edge() {
    let src = b"fn f() { for x in 0..10 { use_fn(x); } }";
    let ts_lang = Language::from(tree_sitter_rust::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "rust", ts_lang);
    assert_loop_with_back_edge(&cfg, "rust for");
}

#[test]
fn rust_while_loop_back_edge() {
    let src = b"fn f() { while cond() { body(); } }";
    let ts_lang = Language::from(tree_sitter_rust::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "rust", ts_lang);
    assert_loop_with_back_edge(&cfg, "rust while");
}

#[test]
fn nested_loops_two_headers_two_back_edges() {
    // Nested loops must produce two distinct loop headers and a back
    // edge for each. This guards against headers being collapsed and
    // back edges being mis-routed to the outer header.
    let src = b"function f() { for (let i = 0; i < 10; i++) { for (let j = 0; j < 10; j++) { use(i, j); } } }";
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "javascript", ts_lang);
    let headers = loop_headers(&cfg);
    assert_eq!(headers.len(), 2, "expected 2 loop headers in nested loops");
    let backs = back_edges(&cfg);
    assert!(
        backs.len() >= 2,
        "expected ≥2 back edges in nested loops, got {}",
        backs.len()
    );
    // Every back edge must target one of the two headers.
    for (_, dst) in &backs {
        assert!(headers.contains(dst), "back edge target not a loop header");
    }
    // Each header should be the target of at least one back edge.
    let mut hit = std::collections::HashSet::new();
    for (_, dst) in &backs {
        hit.insert(*dst);
    }
    assert_eq!(
        hit.len(),
        2,
        "each header must receive at least one back edge"
    );
}

#[test]
fn loop_with_break_no_back_edge_from_break() {
    // A `break` short-circuits the loop body: its edge must NOT be a
    // back edge to the header (it leaves the loop entirely).
    let src = b"function f() { while (cond()) { if (done()) break; body(); } }";
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "javascript", ts_lang);
    let headers = loop_headers(&cfg);
    assert_eq!(headers.len(), 1, "expected 1 loop header");
    let header = headers[0];

    // Find any Break node and verify none of its outgoing edges are
    // Back edges to the header.
    for n in cfg.node_indices() {
        if cfg[n].kind != StmtKind::Break {
            continue;
        }
        for e in cfg.edges(n) {
            assert!(
                !(matches!(e.weight(), EdgeKind::Back) && e.target() == header),
                "break must not produce a back edge to the loop header"
            );
        }
    }
}

#[test]
fn loop_with_continue_back_edge_to_header() {
    // `continue` must produce a Back edge to the loop header.
    let src = b"function f() { while (cond()) { if (skip()) continue; body(); } }";
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "javascript", ts_lang);
    let headers = loop_headers(&cfg);
    assert_eq!(headers.len(), 1);
    let header = headers[0];

    let mut found = false;
    for n in cfg.node_indices() {
if cfg[n].kind != StmtKind::Continue {
|
||||
continue;
|
||||
}
|
||||
for e in cfg.edges(n) {
|
||||
if matches!(e.weight(), EdgeKind::Back) && e.target() == header {
|
||||
found = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
assert!(
|
||||
found,
|
||||
"expected at least one Back edge from a Continue node to the loop header"
|
||||
);
|
||||
}
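The loop tests above lean on `loop_headers` and `back_edges` helpers whose bodies sit outside this hunk, and they read `EdgeKind::Back` straight off the CFG. As a standalone mental model only, the sketch below recomputes the same facts structurally: during a DFS from the entry, an edge whose target is still on the DFS path is a back edge, and every target of a back edge is a loop header. The adjacency-list graph (`Vec<Vec<usize>>`) and the names `dfs_back_edges` and `headers_of` are hypothetical and unrelated to Nyx's CFG types.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy)]
enum Colour {
    White, // not yet visited
    Grey,  // on the current DFS path
    Black, // fully explored
}

/// Edges `(src, dst)` whose target is on the DFS path when explored.
pub fn dfs_back_edges(succ: &[Vec<usize>], entry: usize) -> Vec<(usize, usize)> {
    let mut colour = vec![Colour::White; succ.len()];
    let mut backs = Vec::new();
    // Explicit stack of (node, index of the next successor to explore).
    let mut stack = vec![(entry, 0usize)];
    colour[entry] = Colour::Grey;
    while let Some(top) = stack.last_mut() {
        let node = top.0;
        if top.1 < succ[node].len() {
            let dst = succ[node][top.1];
            top.1 += 1;
            let c = colour[dst];
            match c {
                Colour::White => {
                    colour[dst] = Colour::Grey;
                    stack.push((dst, 0));
                }
                // Target still on the path: this edge closes a loop.
                Colour::Grey => backs.push((node, dst)),
                // Forward / cross edge: not a back edge.
                Colour::Black => {}
            }
        } else {
            colour[node] = Colour::Black;
            stack.pop();
        }
    }
    backs
}

/// Loop headers are exactly the targets of back edges.
pub fn headers_of(backs: &[(usize, usize)]) -> BTreeSet<usize> {
    backs.iter().map(|&(_, dst)| dst).collect()
}
```

On a nested-loop graph this yields one back edge per header. A `break` edge to the loop exit is never reported, because the exit is not on the DFS path when the edge is explored, whereas a `continue` edge back to the header is.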
/// Regression guard for the 2026-04-27 chained-method-call inner-gate
/// rebinding (CVE-2025-64430 hunt session). Without the fix, the outer
/// `.on('error', cb)` call swallows classification of the inner
/// `https.get(uri, cb)` call, so neither the gate label nor
/// `sink_payload_args` is populated for this CFG node.
#[test]
fn chained_method_call_rebinds_to_inner_gated_sink() {
    // Use `https.get` (a gated SSRF sink) so the gate fires only when
    // the inner-call rebinding works. The outer `.on(...)` is a plain
    // method call that does not classify on its own.
    let src = b"function f(uri) { https.get(uri, r => {}).on('error', e => {}); }";
    let ts_lang = Language::from(tree_sitter_javascript::LANGUAGE);
    let (cfg, _) = parse_and_build(src, "javascript", ts_lang);

    // Find a Call node whose callee was rebound to the inner gated callee.
    let mut found = false;
    for n in cfg.node_indices() {
        let info = &cfg[n];
        if info.kind != StmtKind::Call {
            continue;
        }
        let Some(callee) = info.call.callee.as_deref() else {
            continue;
        };
        // The inner callee is `https.get`; the outer chained `.on` should
        // no longer be the recorded callee for this node.
        if callee.ends_with("https.get") {
            // The inner-gate path must have populated sink_payload_args
            // (the gate's payload arg is position 0, the URL string).
            assert!(
                info.call.sink_payload_args.is_some(),
                "expected sink_payload_args to be populated for chained \
                 inner-gate https.get; got None on call node with callee {callee:?}"
            );
            found = true;
            break;
        }
    }
    assert!(
        found,
        "expected at least one Call node whose callee was rebound from \
         the outer `.on(...)` to the inner `https.get` after the chained-call \
         inner-gate rebinding fired"
    );
}
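The regression test above asserts that, for `https.get(uri, cb).on('error', cb)`, the CFG node records the inner `https.get` callee rather than the outer `.on`. The sketch below illustrates the general idea on a hypothetical miniature call AST (`Expr`) with a hypothetical `GATED` sink list: starting from the outermost call, it walks down the receiver chain and returns the first callee that is a known gated sink. It is not Nyx's implementation, which works on tree-sitter nodes and label rules.

```rust
/// Hypothetical, simplified expression tree: just enough to model
/// `recv.method(args)` chains.
#[derive(Debug)]
pub enum Expr {
    Ident(String),
    Member(Box<Expr>, String),
    Call(Box<Expr>, Vec<Expr>),
}

/// Hypothetical gated sinks; a real rule set would come from config.
const GATED: &[&str] = &["https.get", "http.get"];

/// Render a callee as dotted text (`https.get`); `None` if it contains a call.
fn callee_text(e: &Expr) -> Option<String> {
    match e {
        Expr::Ident(name) => Some(name.clone()),
        Expr::Member(obj, prop) => Some(format!("{}.{}", callee_text(obj)?, prop)),
        Expr::Call(..) => None,
    }
}

/// Follow the receiver chain (`a.b(x).c(y)` -> `a.b(x)`) from the
/// outermost call and return the first gated callee found.
pub fn rebind_to_gated(outer: &Expr) -> Option<String> {
    let mut cur = outer;
    loop {
        let Expr::Call(callee, _) = cur else {
            return None;
        };
        if let Some(text) = callee_text(callee) {
            if GATED.contains(&text.as_str()) {
                return Some(text);
            }
        }
        // Descend into the receiver of `<receiver>.method(...)`.
        match &**callee {
            Expr::Member(recv, _) => cur = &**recv,
            _ => return None,
        }
    }
}
```

Usage: building `https.get(uri).on(cb)` as `Call(Member(Call(Member(Ident("https"), "get"), [uri]), "on"), [cb])` makes `rebind_to_gated` return `Some("https.get")`, while a plain `log()` call returns `None`.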
450
src/cfg/dto.rs
Normal file
@@ -0,0 +1,450 @@
//! Phase 6.1: per-language DTO definition collectors.
//!
//! Walks a parsed file's AST and emits `(class_name, DtoFields)` pairs
//! for class / interface / struct / Pydantic-model declarations whose
//! field types resolve to a recognised [`TypeKind`].
//!
//! Strictly additive: a class is recorded only when at least one of its
//! fields classifies. Classes with no classifiable fields are omitted
//! from the returned map, so callers see no DTO entry for them and must
//! fall back to the pre-Phase-6 Object/Unknown classification.

use std::collections::HashMap;

use tree_sitter::Node;

use super::helpers::text_of;
use super::params::{java_type_to_kind, python_primitive_to_kind, ts_type_to_kind};
use crate::ssa::type_facts::{DtoFields, TypeKind};

/// Collect all DTO-shaped class definitions in a parsed file.
///
/// Dispatches per-language; returns an empty map for languages without
/// a Phase 6 collector (Go, Ruby, PHP, C/C++ — DTOs in those ecosystems
/// either don't follow framework conventions Nyx tracks today, or are
/// already covered by other type-inference paths).
pub(super) fn collect_dto_classes(
    root: Node<'_>,
    lang: &str,
    code: &[u8],
) -> HashMap<String, DtoFields> {
    let mut out: HashMap<String, DtoFields> = HashMap::new();
    match lang {
        "java" => collect_java(root, code, &mut out),
        "typescript" | "ts" | "javascript" | "js" => collect_ts(root, code, &mut out),
        "rust" | "rs" => collect_rust(root, code, &mut out),
        "python" | "py" => collect_python(root, code, &mut out),
        _ => {}
    }
    out
}

// ─────────────────────────────────────────────────────────────────────
// Java
// ─────────────────────────────────────────────────────────────────────

/// Walk the AST for `class_declaration` nodes whose body contains
/// `field_declaration`s with classifiable types. Only class-level
/// fields are collected; method-local declarations are ignored.
fn collect_java(root: Node<'_>, code: &[u8], out: &mut HashMap<String, DtoFields>) {
    walk(root, &mut |node| {
        if node.kind() != "class_declaration" {
            return;
        }
        let Some(name_node) = node.child_by_field_name("name") else {
            return;
        };
        let Some(class_name) = text_of(name_node, code) else {
            return;
        };
        let Some(body) = node.child_by_field_name("body") else {
            return;
        };
        let mut fields = DtoFields::new(class_name.clone());
        let mut cursor = body.walk();
        for child in body.named_children(&mut cursor) {
            if child.kind() != "field_declaration" {
                continue;
            }
            let Some(type_node) = child.child_by_field_name("type") else {
                continue;
            };
            let Some(type_text) = text_of(type_node, code) else {
                continue;
            };
            let Some(kind) = java_type_to_kind(&type_text) else {
                continue;
            };
            // A field declaration can carry several `declarator` fields
            // (`private Long a, b;`); record every one of them, not just
            // the first.
            let mut decl_cursor = child.walk();
            for declarator in child.children_by_field_name("declarator", &mut decl_cursor) {
                // `variable_declarator` exposes the variable under `name`.
                let Some(name_inner) = declarator.child_by_field_name("name") else {
                    continue;
                };
                if let Some(field_name) = text_of(name_inner, code) {
                    fields.insert(field_name, kind.clone());
                }
            }
        }
        if !fields.fields.is_empty() {
            out.insert(class_name, fields);
        }
    });
}
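All four collectors share one contract: a class is recorded only if at least one field classifies. A std-only sketch of that rule on pre-extracted `(type_text, declarator_list)` pairs rather than tree-sitter nodes; `Kind`, `classify_java`, and `collect_fields` are hypothetical stand-ins, and `classify_java` merely approximates what `java_type_to_kind` (in `params.rs`, not shown here) might accept. The sketch also gives every name in a multi-declarator field such as `private Long a, b;` the field's kind.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for Nyx's `TypeKind`.
#[derive(Debug, Clone, PartialEq)]
pub enum Kind {
    Int,
    Str,
}

/// Illustrative classifier only; the real table may differ.
pub fn classify_java(ty: &str) -> Option<Kind> {
    match ty.trim() {
        "int" | "long" | "Integer" | "Long" => Some(Kind::Int),
        "String" => Some(Kind::Str),
        // Generic containers such as `List<String>` stay unclassified.
        _ => None,
    }
}

/// `decls` holds `(type_text, declarator_list)` pairs, e.g. `("Long", "a, b")`.
/// Returns `None` when nothing classifies, so the caller keeps its
/// default classification for the class.
pub fn collect_fields(decls: &[(&str, &str)]) -> Option<BTreeMap<String, Kind>> {
    let mut fields = BTreeMap::new();
    for (ty, names) in decls {
        let Some(kind) = classify_java(ty) else {
            continue;
        };
        // Every declarator in the list gets the same kind.
        for name in names.split(',').map(str::trim).filter(|n| !n.is_empty()) {
            fields.insert(name.to_string(), kind.clone());
        }
    }
    if fields.is_empty() {
        None
    } else {
        Some(fields)
    }
}
```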
// ─────────────────────────────────────────────────────────────────────
// TypeScript / JavaScript
// ─────────────────────────────────────────────────────────────────────

/// Walk for `interface_declaration` and `class_declaration` nodes.
/// Interfaces with `property_signature` children and classes with
/// `public_field_definition` (or `field_definition`) children produce
/// DTO entries when at least one property's type annotation classifies.
fn collect_ts(root: Node<'_>, code: &[u8], out: &mut HashMap<String, DtoFields>) {
    walk(root, &mut |node| match node.kind() {
        "interface_declaration" => {
            let Some(name_node) = node.child_by_field_name("name") else {
                return;
            };
            let Some(class_name) = text_of(name_node, code) else {
                return;
            };
            let Some(body) = node.child_by_field_name("body") else {
                return;
            };
            let mut fields = DtoFields::new(class_name.clone());
            let mut cursor = body.walk();
            for child in body.named_children(&mut cursor) {
                if child.kind() != "property_signature" {
                    continue;
                }
                let Some((field_name, kind)) = extract_ts_property(child, code) else {
                    continue;
                };
                fields.insert(field_name, kind);
            }
            if !fields.fields.is_empty() {
                out.insert(class_name, fields);
            }
        }
        "class_declaration" => {
            let Some(name_node) = node.child_by_field_name("name") else {
                return;
            };
            let Some(class_name) = text_of(name_node, code) else {
                return;
            };
            let Some(body) = node.child_by_field_name("body") else {
                return;
            };
            let mut fields = DtoFields::new(class_name.clone());
            let mut cursor = body.walk();
            for child in body.named_children(&mut cursor) {
                if child.kind() != "public_field_definition" && child.kind() != "field_definition" {
                    continue;
                }
                let Some((field_name, kind)) = extract_ts_property(child, code) else {
                    continue;
                };
                fields.insert(field_name, kind);
            }
            if !fields.fields.is_empty() {
                out.insert(class_name, fields);
            }
        }
        _ => {}
    });
}

/// Extract `(field_name, TypeKind)` from a TS `property_signature` /
/// `public_field_definition`. Returns None when either piece is absent
/// or the type doesn't classify.
fn extract_ts_property<'a>(node: Node<'a>, code: &'a [u8]) -> Option<(String, TypeKind)> {
    let name_node = node.child_by_field_name("name")?;
    let field_name = text_of(name_node, code)?;
    let type_anno = node.child_by_field_name("type")?;
    // type_annotation node text is `: T` — walk to the inner type.
    let type_text = type_anno
        .named_child(0)
        .and_then(|t| text_of(t, code))
        .or_else(|| text_of(type_anno, code))?;
    let stripped = type_text.trim().trim_start_matches(':').trim();
    let kind = ts_type_to_kind(stripped)?;
    Some((field_name, kind))
}

// ─────────────────────────────────────────────────────────────────────
// Rust
// ─────────────────────────────────────────────────────────────────────

/// Walk for `struct_item` nodes whose body lists named fields.
fn collect_rust(root: Node<'_>, code: &[u8], out: &mut HashMap<String, DtoFields>) {
    walk(root, &mut |node| {
        if node.kind() != "struct_item" {
            return;
        }
        let Some(name_node) = node.child_by_field_name("name") else {
            return;
        };
        let Some(class_name) = text_of(name_node, code) else {
            return;
        };
        let Some(body) = node.child_by_field_name("body") else {
            return;
        };
        if body.kind() != "field_declaration_list" {
            // Tuple struct or unit struct — no named fields.
            return;
        }
        let mut fields = DtoFields::new(class_name.clone());
        let mut cursor = body.walk();
        for child in body.named_children(&mut cursor) {
            if child.kind() != "field_declaration" {
                continue;
            }
            let Some(name_inner) = child.child_by_field_name("name") else {
                continue;
            };
            let Some(type_inner) = child.child_by_field_name("type") else {
                continue;
            };
            let Some(field_name) = text_of(name_inner, code) else {
                continue;
            };
            let Some(type_text) = text_of(type_inner, code) else {
                continue;
            };
            let Some(kind) = super::params::rust_primitive_to_kind(type_text.trim()) else {
                continue;
            };
            fields.insert(field_name, kind);
        }
        if !fields.fields.is_empty() {
            out.insert(class_name, fields);
        }
    });
}

// ─────────────────────────────────────────────────────────────────────
// Python (Pydantic)
// ─────────────────────────────────────────────────────────────────────

/// Walk for `class_definition` nodes whose superclass list contains
/// `BaseModel` / `pydantic.BaseModel`. Each typed class attribute in the
/// body (`name: type` or `name: type = default`) whose type classifies
/// produces a field entry.
fn collect_python(root: Node<'_>, code: &[u8], out: &mut HashMap<String, DtoFields>) {
    walk(root, &mut |node| {
        if node.kind() != "class_definition" {
            return;
        }
        let Some(name_node) = node.child_by_field_name("name") else {
            return;
        };
        let Some(class_name) = text_of(name_node, code) else {
            return;
        };
        if !python_inherits_basemodel(node, code) {
            return;
        }
        let Some(body) = node.child_by_field_name("body") else {
            return;
        };
        let mut fields = DtoFields::new(class_name.clone());
        let mut cursor = body.walk();
        for stmt in body.named_children(&mut cursor) {
            // tree-sitter-python parses both `name: type` and
            // `name: type = default` as an `assignment` (the default, if
            // any, sits in its `right` field) wrapped in an
            // `expression_statement`. Untyped assignments have no `type`
            // field and are skipped below.
            if stmt.kind() != "expression_statement" {
                continue;
            }
            let Some(inner) = stmt.named_child(0) else {
                continue;
            };
            if inner.kind() != "assignment" {
                continue;
            }
            let Some(left) = inner.child_by_field_name("left") else {
                continue;
            };
            let Some(field_name) = text_of(left, code) else {
                continue;
            };
            let Some(type_node) = inner.child_by_field_name("type") else {
                continue;
            };
            let Some(type_text) = text_of(type_node, code) else {
                continue;
            };
            let Some(kind) = python_primitive_to_kind(type_text.trim()) else {
                continue;
            };
            fields.insert(field_name, kind);
        }
        if !fields.fields.is_empty() {
            out.insert(class_name, fields);
        }
    });
}

/// Conservative supertype scan: returns true when one of the class's
/// superclass entries is exactly `BaseModel` or `pydantic.BaseModel`.
/// Each entry is compared as a whole token rather than by substring, so
/// non-Pydantic bases such as `BaseModelMixin` do not match.
fn python_inherits_basemodel<'a>(class_node: Node<'a>, code: &'a [u8]) -> bool {
    let Some(supers) = class_node.child_by_field_name("superclasses") else {
        return false;
    };
    let mut cursor = supers.walk();
    for child in supers.named_children(&mut cursor) {
        if let Some(text) = text_of(child, code) {
            let head = text.trim();
            if head == "BaseModel" || head == "pydantic.BaseModel" {
                return true;
            }
        }
    }
    false
}
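The same whole-entry rule can be exercised without a parser. A sketch assuming the raw text of the superclass `argument_list` (for example `(Auditable, pydantic.BaseModel)`); `inherits_basemodel` is a hypothetical name, and splitting on commas is a simplification that would misread nested generics such as `Generic[K, V]`. Like the AST version, it does not see through import aliases (`from pydantic import BaseModel as BM`).

```rust
/// Whole-entry check for Pydantic bases in raw superclass text such as
/// `(Base, pydantic.BaseModel)`. Text-based, hypothetical helper.
pub fn inherits_basemodel(superclasses: &str) -> bool {
    superclasses
        .trim()
        .trim_start_matches('(')
        .trim_end_matches(')')
        .split(',')
        .map(str::trim)
        // Exact comparison: `BaseModelMixin` or `MyBaseModel` do not match.
        .any(|entry| entry == "BaseModel" || entry == "pydantic.BaseModel")
}
```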
// ─────────────────────────────────────────────────────────────────────
// Walk helper
// ─────────────────────────────────────────────────────────────────────

fn walk<'a, F: FnMut(Node<'a>)>(node: Node<'a>, f: &mut F) {
    f(node);
    let mut cursor = node.walk();
    for child in node.named_children(&mut cursor) {
        walk(child, f);
    }
}
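`walk` recurses once per named child, so its native stack depth tracks AST nesting depth. If deeply nested (for example generated) sources ever became a concern, an explicit stack yields the same pre-order visit sequence without recursion. The sketch uses a hypothetical owned `ToyNode`; against tree-sitter one would instead push `Node` handles gathered from `named_children`, or drive a `TreeCursor` directly.

```rust
/// Hypothetical owned tree used only for this sketch.
pub struct ToyNode {
    pub kind: &'static str,
    pub children: Vec<ToyNode>,
}

/// Pre-order traversal with an explicit stack; visits nodes in the same
/// order as a recursive "visit self, then each child" walk.
pub fn walk_iter<'a, F: FnMut(&'a ToyNode)>(root: &'a ToyNode, mut f: F) {
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        f(node);
        // Push children in reverse so the first child is visited next.
        stack.extend(node.children.iter().rev());
    }
}
```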
#[cfg(test)]
mod tests {
    use super::*;

    fn collect(lang: &str, src: &str) -> HashMap<String, DtoFields> {
        let mut parser = tree_sitter::Parser::new();
        let language = match lang {
            "java" => tree_sitter_java::LANGUAGE.into(),
            "typescript" => tree_sitter_typescript::LANGUAGE_TYPESCRIPT.into(),
            "rust" => tree_sitter_rust::LANGUAGE.into(),
            "python" => tree_sitter_python::LANGUAGE.into(),
            other => panic!("unsupported lang: {other}"),
        };
        parser.set_language(&language).unwrap();
        let tree = parser.parse(src, None).unwrap();
        collect_dto_classes(tree.root_node(), lang, src.as_bytes())
    }

    #[test]
    fn java_class_with_long_and_string_fields() {
        let src = r#"
public class CreateUser {
    private Long age;
    private String email;
}
"#;
        let dtos = collect("java", src);
        let dto = dtos.get("CreateUser").expect("CreateUser DTO recorded");
        assert_eq!(dto.get("age"), Some(&TypeKind::Int));
        assert_eq!(dto.get("email"), Some(&TypeKind::String));
    }

    #[test]
    fn java_unclassifiable_field_dropped() {
        let src = r#"
public class HoldsList {
    private List<String> items;
    private Long count;
}
"#;
        let dtos = collect("java", src);
        let dto = dtos.get("HoldsList").expect("class recorded");
        // Only the Long field qualifies; List<String> is not currently
        // recognised by `java_type_to_kind`.
        assert_eq!(dto.get("count"), Some(&TypeKind::Int));
        assert!(dto.get("items").is_none());
    }

    #[test]
    fn ts_interface_with_number_and_string_fields() {
        let src = r#"
export interface CreateUser {
    age: number;
    email: string;
}
"#;
        let dtos = collect("typescript", src);
        let dto = dtos.get("CreateUser").expect("CreateUser interface");
        assert_eq!(dto.get("age"), Some(&TypeKind::Int));
        assert_eq!(dto.get("email"), Some(&TypeKind::String));
    }

    #[test]
    fn ts_class_with_typed_field_definitions() {
        let src = r#"
export class CreateUser {
    age!: number;
    email!: string;
}
"#;
        let dtos = collect("typescript", src);
        let dto = dtos.get("CreateUser").expect("CreateUser class");
        assert_eq!(dto.get("age"), Some(&TypeKind::Int));
        assert_eq!(dto.get("email"), Some(&TypeKind::String));
    }

    #[test]
    fn rust_struct_with_int_and_string_fields() {
        let src = r#"
pub struct CreateUser {
    pub age: i64,
    pub email: String,
}
"#;
        let dtos = collect("rust", src);
        let dto = dtos.get("CreateUser").expect("CreateUser struct");
        assert_eq!(dto.get("age"), Some(&TypeKind::Int));
        assert_eq!(dto.get("email"), Some(&TypeKind::String));
    }

    #[test]
    fn rust_tuple_struct_skipped() {
        let src = r#"
pub struct Wrap(i64, String);
"#;
        let dtos = collect("rust", src);
        // Tuple structs have no named fields and must NOT produce a
        // DtoFields entry — Phase 6 only handles named-field DTOs.
        assert!(!dtos.contains_key("Wrap"));
    }

    #[test]
    fn python_pydantic_basemodel_with_int_and_str() {
        let src = r#"
class CreateUser(BaseModel):
    age: int
    email: str
"#;
        let dtos = collect("python", src);
        let dto = dtos.get("CreateUser").expect("CreateUser model");
        assert_eq!(dto.get("age"), Some(&TypeKind::Int));
        assert_eq!(dto.get("email"), Some(&TypeKind::String));
    }

    #[test]
    fn python_class_without_basemodel_is_skipped() {
        // Hard Rule 3 spirit: only Pydantic models should be lifted as
        // DTOs. Plain classes with typed attributes don't qualify.
        let src = r#"
class NotADto:
    age: int
    email: str
"#;
        let dtos = collect("python", src);
        assert!(!dtos.contains_key("NotADto"));
    }
}
@@ -328,13 +328,21 @@ pub(crate) fn member_expr_text(n: Node, code: &[u8]) -> Option<String> {
pub(crate) fn member_expr_text_inner(n: Node, code: &[u8]) -> Option<String> {
    match n.kind() {
        "member_expression" | "attribute" | "selector_expression" => {
            // Tree-sitter exposes the receiver under `object` (JS/TS, Python),
            // `value` (Rust field_expression — handled in the matching arm
            // above), or `operand` (Go selector_expression). Without the
            // `operand` fallback, Go member access like `r.Body` collapsed to
            // just the trailing field (`Body`), so source rules keyed on the
            // dotted form (e.g. Go's `r.Body`) would never match.
            let obj = n
                .child_by_field_name("object")
                .or_else(|| n.child_by_field_name("value"))
                .or_else(|| n.child_by_field_name("operand"))
                .and_then(|o| member_expr_text_inner(o, code))
                .or_else(|| {
                    n.child_by_field_name("object")
                        .or_else(|| n.child_by_field_name("value"))
                        .or_else(|| n.child_by_field_name("operand"))
                        .and_then(|o| text_of(o, code))
                });
            let prop = n
@@ -700,3 +708,79 @@ pub(crate) fn collect_idents(n: Node, code: &[u8], out: &mut Vec<String>) {
        }
    }
}

/// Pointer-Phase 6 / W5: AST kind names for subscript / index expressions
/// across the languages whose container-element flow we model.
///
/// JS/TS use `subscript_expression`; Python uses `subscript`; Go uses
/// `index_expression`. Other languages either lower indexing through
/// method calls (Rust slice indexing) or are out of scope for the
/// initial W5 rollout (Java/Ruby/PHP/C/C++). The kind strings are not
/// unique to those grammars (tree-sitter-c/-cpp also emit
/// `subscript_expression`, and tree-sitter-rust emits `index_expression`),
/// so this check alone does not restrict matching to JS/TS, Python, or Go.
#[inline]
pub(crate) fn is_subscript_kind(kind: &str) -> bool {
    matches!(
        kind,
        "subscript_expression" | "subscript" | "index_expression"
    )
}

/// Pointer-Phase 6 / W5: when the LHS of an assignment statement is a
/// subscript / index expression (or a single-element wrapper around
/// one), return that node. Returns `None` for multi-target Go
/// `expression_list`s, identifier LHSs, member-expression LHSs, etc.
pub(crate) fn subscript_lhs_node<'a>(lhs: Node<'a>, lang: &str) -> Option<Node<'a>> {
    if is_subscript_kind(lhs.kind()) {
        return Some(lhs);
    }
    // Go: `assignment_statement.left` is an `expression_list`; for
    // single-target subscript writes (`m[k] = v`) it has exactly one
    // named child which is `index_expression`.
    if lang == "go" && lhs.kind() == "expression_list" {
        let mut cursor = lhs.walk();
        let named: Vec<Node> = lhs.named_children(&mut cursor).collect();
        if named.len() == 1 && is_subscript_kind(named[0].kind()) {
            return Some(named[0]);
        }
    }
    None
}

/// Pointer-Phase 6 / W5: extract `(array_text, index_text)` from a
/// subscript / index AST node.
///
/// Returns `None` when the array operand is not a plain identifier. We
/// only synthesise `__index_get__` / `__index_set__` calls when the
/// receiver resolves cleanly to an SSA-renamed local, since the W2/W4
/// container hooks need a stable receiver var_name to drive
/// `pt(receiver)`.
pub(crate) fn subscript_components<'a>(n: Node<'a>, code: &'a [u8]) -> Option<(String, String)> {
    if !is_subscript_kind(n.kind()) {
        return None;
    }
    let arr = n
        .child_by_field_name("object")
        .or_else(|| n.child_by_field_name("operand"))
        .or_else(|| n.child_by_field_name("value"))
        .or_else(|| n.child(0))?;
    let idx = n
        .child_by_field_name("index")
        .or_else(|| n.child_by_field_name("subscript"))
        .or_else(|| {
            // Fallback: take the second named child (the first is the
            // array operand).
            let mut cur = n.walk();
            n.named_children(&mut cur).nth(1)
        })?;
    let arr_kind = arr.kind();
    // Only proceed when the array is a plain identifier — otherwise
    // we can't bind a stable receiver name for the synth Call.
    if !matches!(
        arr_kind,
        "identifier" | "variable_name" | "simple_identifier"
    ) {
        return None;
    }
    let arr_text = text_of(arr, code)?;
    // PHP-style `$x` strip not needed here — Go/JS/Python don't use it.
    let idx_text = text_of(idx, code)?;
    Some((arr_text, idx_text))
}
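Per the doc comments above, `subscript_lhs_node` and `subscript_components` feed the synthesis of `__index_get__` / `__index_set__` calls, and only when the array operand is a plain identifier. The code that builds those calls is outside this hunk, so the following is a hedged, string-based sketch of the idea: the `SynthCall` struct, the `(index[, value])` argument order, and the character-level identifier check are all assumptions.

```rust
/// Hypothetical shape of a synthetic container call.
#[derive(Debug, PartialEq)]
pub struct SynthCall {
    pub callee: &'static str,
    pub receiver: String,
    pub args: Vec<String>,
}

/// Plain-identifier check, standing in for the AST-kind test on the
/// array operand.
fn is_plain_ident(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c == '_' || c.is_alphabetic())
        && chars.all(|c| c == '_' || c.is_alphanumeric())
}

/// Lower `recv[idx]` to `__index_get__` (read) or, when `value` is
/// `Some`, to `__index_set__` (write). Receivers such as `obj.items`
/// are rejected because they have no stable local name.
pub fn lower_subscript(recv: &str, idx: &str, value: Option<&str>) -> Option<SynthCall> {
    if !is_plain_ident(recv) {
        return None;
    }
    let (callee, args) = match value {
        Some(v) => ("__index_set__", vec![idx.to_string(), v.to_string()]),
        None => ("__index_get__", vec![idx.to_string()]),
    };
    Some(SynthCall { callee, receiver: recv.to_string(), args })
}
```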
547
src/cfg/hierarchy.rs
Normal file
@@ -0,0 +1,547 @@
//! Phase 6: per-language class / trait / interface hierarchy extraction.
//!
//! Walks a parsed file's AST and emits `(sub_container, super_container)`
//! pairs for every declared inheritance / impl / implements relationship.
//! The result is consumed by [`crate::callgraph::TypeHierarchyIndex`] to
//! fan out method-call edges to every concrete implementer when a
//! receiver's static type is a super-class / trait / interface.
//!
//! Strictly additive: a language without an extractor (Go, C) returns
//! the empty vector and the resolver falls back to today's
//! single-container behaviour.

use std::collections::HashSet;

use tree_sitter::Node;

use super::helpers::text_of;

/// Collect `(sub_container, super_container)` edges for a parsed file.
///
/// The returned vector is **deduplicated within the file** but may
/// contain duplicates across files (each file emits its own edges).
/// The downstream [`crate::callgraph::TypeHierarchyIndex::build`]
/// dedups across files.
pub(crate) fn collect_hierarchy_edges(
    root: Node<'_>,
    lang: &str,
    code: &[u8],
) -> Vec<(String, String)> {
    let mut acc: Vec<(String, String)> = Vec::new();
    let mut seen: HashSet<(String, String)> = HashSet::new();
    let mut push = |sub: String, sup: String| {
        if sub.is_empty() || sup.is_empty() {
            return;
        }
        if seen.insert((sub.clone(), sup.clone())) {
            acc.push((sub, sup));
        }
    };

    match lang {
        "java" => collect_java(root, code, &mut push),
        "rust" | "rs" => collect_rust(root, code, &mut push),
        "typescript" | "ts" | "tsx" | "javascript" | "js" => collect_ts(root, code, &mut push),
        "python" | "py" => collect_python(root, code, &mut push),
        "ruby" | "rb" => collect_ruby(root, code, &mut push),
        "php" => collect_php(root, code, &mut push),
        "cpp" | "c++" => collect_cpp(root, code, &mut push),
        // Go: structural / implicit interface satisfaction is intractable
        // per-file; Phase 6 deliberately skips it.
        // C: no inheritance.
        _ => {}
    }
    acc
}
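`collect_hierarchy_edges` only emits direct `(sub, super)` pairs; per its doc comment, `TypeHierarchyIndex::build` dedups them across files, and the index is used to fan a virtual call out to concrete implementers. The sketch below shows one plausible shape for such an index: a `super -> subs` map plus a worklist closure that also tolerates cycles in malformed input. The function names and the choice of transitive (rather than direct-only) fan-out are assumptions, not the callgraph module's actual API.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Build a `super -> direct subs` map from `(sub, super)` edges; the set
/// de-duplicates edges reported by several files.
pub fn build_index(edges: &[(String, String)]) -> HashMap<String, BTreeSet<String>> {
    let mut subs: HashMap<String, BTreeSet<String>> = HashMap::new();
    for (sub, sup) in edges {
        subs.entry(sup.clone()).or_default().insert(sub.clone());
    }
    subs
}

/// Every transitive subtype of `root` (excluding `root` itself).
pub fn implementers(index: &HashMap<String, BTreeSet<String>>, root: &str) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    // `seen` guards against cycles such as A: B, B: A.
    let mut seen: HashSet<&str> = HashSet::from([root]);
    let mut work = vec![root];
    while let Some(cur) = work.pop() {
        if let Some(children) = index.get(cur) {
            for c in children {
                if seen.insert(c.as_str()) {
                    out.insert(c.clone());
                    work.push(c.as_str());
                }
            }
        }
    }
    out
}
```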
// ─────────────────────────────────────────────────────────────────────
// Java
// ─────────────────────────────────────────────────────────────────────

fn collect_java<F: FnMut(String, String)>(root: Node<'_>, code: &[u8], push: &mut F) {
    walk(root, &mut |node| {
        let kind = node.kind();
        if kind != "class_declaration" && kind != "interface_declaration" {
            return;
        }
        let Some(name_node) = node.child_by_field_name("name") else {
            return;
        };
        let Some(sub) = text_of(name_node, code) else {
            return;
        };
        // `superclass` field on class_declaration — singular `extends Y`.
        if let Some(superclass) = node.child_by_field_name("superclass") {
            let mut cursor = superclass.walk();
            for c in superclass.named_children(&mut cursor) {
                if let Some(t) = type_identifier_text(c, code) {
                    push(sub.clone(), t);
                }
            }
        }
        // The `interfaces` field on class_declaration (`implements I, J`)
        // is a `super_interfaces` node wrapping a `type_list`.
        if let Some(ifaces) = node.child_by_field_name("interfaces") {
            collect_java_type_list(ifaces, code, &sub, push);
        }
        // `extends_interfaces` is a named child of interface_declaration
        // but not a field (`extends Foo, Bar` on an interface), so walk
        // the children directly.
        let mut cursor = node.walk();
        for c in node.named_children(&mut cursor) {
            if c.kind() == "extends_interfaces" {
                collect_java_type_list(c, code, &sub, push);
            }
        }
    });
}

fn collect_java_type_list<F: FnMut(String, String)>(
    n: Node<'_>,
    code: &[u8],
    sub: &str,
    push: &mut F,
) {
    let mut cursor = n.walk();
    for child in n.named_children(&mut cursor) {
        match child.kind() {
            "type_list" | "interface_type_list" => {
                collect_java_type_list(child, code, sub, push);
            }
            _ => {
                if let Some(t) = type_identifier_text(child, code) {
                    push(sub.to_string(), t);
                }
            }
        }
    }
}

/// Strip generic / nested `type_arguments` from a type-reference node
/// down to the bare identifier.
fn type_identifier_text(n: Node<'_>, code: &[u8]) -> Option<String> {
    match n.kind() {
        "type_identifier" | "identifier" => text_of(n, code),
        "generic_type" => {
            // `Foo<T>` — the leading child is the bare type identifier.
            let mut cursor = n.walk();
            for c in n.named_children(&mut cursor) {
                if matches!(
                    c.kind(),
                    "type_identifier" | "identifier" | "scoped_type_identifier"
                ) {
                    // Recurse so `pkg.Foo<T>` also reduces to `Foo`,
                    // matching the `scoped_type_identifier` arm below.
                    return type_identifier_text(c, code);
                }
            }
            None
        }
        "scoped_type_identifier" => {
            // `pkg.Foo` — return last segment.
            text_of(n, code).map(|s| {
                let last = s.rsplit('.').next().unwrap_or(&s);
                last.to_string()
            })
        }
        _ => None,
    }
}

// ─────────────────────────────────────────────────────────────────────
// Rust
// ─────────────────────────────────────────────────────────────────────

/// Walk for `impl_item` nodes and emit edges from the concrete type to
/// the trait being implemented. Inherent impls (`impl Foo {}`) emit
/// no edge — there is no super-trait relationship to record.
fn collect_rust<F: FnMut(String, String)>(root: Node<'_>, code: &[u8], push: &mut F) {
    walk(root, &mut |node| {
        if node.kind() != "impl_item" {
            return;
        }
        // tree-sitter-rust uses `trait` and `type` field names.
        let Some(trait_node) = node.child_by_field_name("trait") else {
            return; // inherent impl
        };
        let Some(type_node) = node.child_by_field_name("type") else {
            return;
        };
        let Some(trait_name) = rust_path_leaf(trait_node, code) else {
            return;
        };
        let Some(type_name) = rust_path_leaf(type_node, code) else {
            return;
        };
        push(type_name, trait_name);
    });
}

fn rust_path_leaf(n: Node<'_>, code: &[u8]) -> Option<String> {
    match n.kind() {
        "type_identifier" | "identifier" => text_of(n, code),
        "scoped_type_identifier" | "scoped_identifier" => {
            // `crate::foo::Bar` — last segment.
            let s = text_of(n, code)?;
            Some(s.rsplit("::").next().unwrap_or(&s).to_string())
        }
        "generic_type" => {
            let mut cursor = n.walk();
            for c in n.named_children(&mut cursor) {
                if matches!(
                    c.kind(),
                    "type_identifier" | "scoped_type_identifier" | "identifier"
                ) {
                    return rust_path_leaf(c, code);
                }
            }
            None
        }
        _ => None,
    }
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────────
// TypeScript / JavaScript
// ─────────────────────────────────────────────────────────────────────

fn collect_ts<F: FnMut(String, String)>(root: Node<'_>, code: &[u8], push: &mut F) {
    walk(root, &mut |node| {
        let kind = node.kind();
        if kind != "class_declaration" && kind != "interface_declaration" {
            return;
        }
        let Some(name_node) = node.child_by_field_name("name") else {
            return;
        };
        let Some(sub) = text_of(name_node, code) else {
            return;
        };

        let mut cursor = node.walk();
        for child in node.named_children(&mut cursor) {
            match child.kind() {
                "class_heritage" => {
                    let mut h = child.walk();
                    for c in child.named_children(&mut h) {
                        match c.kind() {
                            "extends_clause" | "implements_clause" => {
                                collect_ts_heritage(c, code, &sub, push)
                            }
                            _ => {}
                        }
                    }
                }
                "extends_clause" | "extends_type_clause" | "implements_clause" => {
                    collect_ts_heritage(child, code, &sub, push)
                }
                _ => {}
            }
        }
    });
}

fn collect_ts_heritage<F: FnMut(String, String)>(
    clause: Node<'_>,
    code: &[u8],
    sub: &str,
    push: &mut F,
) {
    let mut cursor = clause.walk();
    for c in clause.named_children(&mut cursor) {
        match c.kind() {
            "identifier" | "type_identifier" => {
                if let Some(t) = text_of(c, code) {
                    push(sub.to_string(), t);
                }
            }
            "generic_type" | "type_arguments" | "type_query" => {
                let mut cursor2 = c.walk();
                for inner in c.named_children(&mut cursor2) {
                    if matches!(inner.kind(), "identifier" | "type_identifier")
                        && let Some(t) = text_of(inner, code)
                    {
                        push(sub.to_string(), t);
                        break;
                    }
                }
            }
            _ => {}
        }
    }
}

// ─────────────────────────────────────────────────────────────────────
// Python
// ─────────────────────────────────────────────────────────────────────

fn collect_python<F: FnMut(String, String)>(root: Node<'_>, code: &[u8], push: &mut F) {
    walk(root, &mut |node| {
        if node.kind() != "class_definition" {
            return;
        }
        let Some(name_node) = node.child_by_field_name("name") else {
            return;
        };
        let Some(sub) = text_of(name_node, code) else {
            return;
        };
        let Some(superclasses) = node.child_by_field_name("superclasses") else {
            return; // no parents
        };
        // `superclasses` is an `argument_list` — each non-keyword
        // argument is a base class.
        let mut cursor = superclasses.walk();
        for arg in superclasses.named_children(&mut cursor) {
            if let Some(t) = python_base_text(arg, code) {
                // Skip Python `object` — not informative.
                if t != "object" {
                    push(sub.clone(), t);
                }
            }
        }
    });
}

fn python_base_text(n: Node<'_>, code: &[u8]) -> Option<String> {
    match n.kind() {
        "identifier" => text_of(n, code),
        "attribute" => {
            // `pkg.Base` — last segment.
            let s = text_of(n, code)?;
            Some(s.rsplit('.').next().unwrap_or(&s).to_string())
        }
        // Skip keyword arguments like `metaclass=...`.
        "keyword_argument" => None,
        _ => None,
    }
}

// ─────────────────────────────────────────────────────────────────────
// Ruby
// ─────────────────────────────────────────────────────────────────────

fn collect_ruby<F: FnMut(String, String)>(root: Node<'_>, code: &[u8], push: &mut F) {
    walk(root, &mut |node| {
        if node.kind() != "class" {
            return;
        }
        let Some(name_node) = node.child_by_field_name("name") else {
            return;
        };
        let Some(sub) = text_of(name_node, code) else {
            return;
        };
        if let Some(superclass) = node.child_by_field_name("superclass") {
            // `superclass` wraps the parent identifier.
            let mut cursor = superclass.walk();
            for c in superclass.named_children(&mut cursor) {
                if matches!(c.kind(), "constant" | "scope_resolution")
                    && let Some(t) = text_of(c, code)
                {
                    let leaf = t.rsplit("::").next().unwrap_or(&t).to_string();
                    push(sub.clone(), leaf);
                    break;
                }
            }
        }
    });
}

// ─────────────────────────────────────────────────────────────────────
// PHP
// ─────────────────────────────────────────────────────────────────────

fn collect_php<F: FnMut(String, String)>(root: Node<'_>, code: &[u8], push: &mut F) {
    walk(root, &mut |node| {
        let kind = node.kind();
        if kind != "class_declaration" && kind != "interface_declaration" {
            return;
        }
        let Some(name_node) = node.child_by_field_name("name") else {
            return;
        };
        let Some(sub) = text_of(name_node, code) else {
            return;
        };
        // PHP class_declaration may have base_clause and class_interface_clause.
        let mut cursor = node.walk();
        for c in node.named_children(&mut cursor) {
            match c.kind() {
                "base_clause" | "class_interface_clause" => {
                    let mut cc = c.walk();
                    for inner in c.named_children(&mut cc) {
                        if matches!(inner.kind(), "name" | "qualified_name")
                            && let Some(t) = text_of(inner, code)
                        {
                            let leaf = t.rsplit('\\').next().unwrap_or(&t).to_string();
                            push(sub.clone(), leaf);
                        }
                    }
                }
                _ => {}
            }
        }
    });
}

// ─────────────────────────────────────────────────────────────────────
// C++
// ─────────────────────────────────────────────────────────────────────

fn collect_cpp<F: FnMut(String, String)>(root: Node<'_>, code: &[u8], push: &mut F) {
    walk(root, &mut |node| {
        let kind = node.kind();
        if kind != "class_specifier" && kind != "struct_specifier" {
            return;
        }
        let Some(name_node) = node.child_by_field_name("name") else {
            return;
        };
        let Some(sub) = text_of(name_node, code) else {
            return;
        };
        // tree-sitter-cpp uses `base_class_clause` for the `: public Y` part.
        let mut cursor = node.walk();
        for c in node.named_children(&mut cursor) {
            if c.kind() == "base_class_clause" {
                let mut cc = c.walk();
                for inner in c.named_children(&mut cc) {
                    if matches!(
                        inner.kind(),
                        "type_identifier" | "qualified_identifier" | "template_type"
                    ) && let Some(t) = text_of(inner, code)
                    {
                        // Drop template arguments before taking the last `::`
                        // segment, so `ns::Base<ns::T>` yields `Base`, not `T>`.
                        let head = t.split('<').next().unwrap_or(&t).trim();
                        let leaf = head.rsplit("::").next().unwrap_or(head).to_string();
                        push(sub.clone(), leaf);
                    }
                }
            }
        }
    });
}

// ─────────────────────────────────────────────────────────────────────
// Helpers
// ─────────────────────────────────────────────────────────────────────

fn walk<'a, F: FnMut(Node<'a>)>(node: Node<'a>, f: &mut F) {
    f(node);
    let mut cursor = node.walk();
    for child in node.named_children(&mut cursor) {
        walk(child, f);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn collect(lang: &str, src: &str) -> Vec<(String, String)> {
        let mut parser = tree_sitter::Parser::new();
        let ts_lang = match lang {
            "java" => tree_sitter::Language::from(tree_sitter_java::LANGUAGE),
            "rust" => tree_sitter::Language::from(tree_sitter_rust::LANGUAGE),
            "python" => tree_sitter::Language::from(tree_sitter_python::LANGUAGE),
            "typescript" => {
                tree_sitter::Language::from(tree_sitter_typescript::LANGUAGE_TYPESCRIPT)
            }
            "ruby" => tree_sitter::Language::from(tree_sitter_ruby::LANGUAGE),
            "php" => tree_sitter::Language::from(tree_sitter_php::LANGUAGE_PHP),
            "cpp" => tree_sitter::Language::from(tree_sitter_cpp::LANGUAGE),
            _ => panic!("unsupported test lang: {lang}"),
        };
        parser.set_language(&ts_lang).unwrap();
        let tree = parser.parse(src.as_bytes(), None).unwrap();
        collect_hierarchy_edges(tree.root_node(), lang, src.as_bytes())
    }

    #[test]
    fn java_class_extends_emits_edge() {
        let src = "class Derived extends Base {}";
        let edges = collect("java", src);
        assert!(edges.contains(&("Derived".into(), "Base".into())));
    }

    #[test]
    fn java_class_implements_emits_per_interface_edge() {
        let src = "class UserRepo implements Repository, Cache {}";
        let edges = collect("java", src);
        assert!(edges.contains(&("UserRepo".into(), "Repository".into())));
        assert!(edges.contains(&("UserRepo".into(), "Cache".into())));
    }

    #[test]
    fn java_interface_extends_emits_edges() {
        let src = "interface Mine extends Foo, Bar {}";
        let edges = collect("java", src);
        // tree-sitter-java models `extends` on interface as `extends_interfaces`
        // rooted at the same node — at least one of the parents should land.
        assert!(
            edges.iter().any(|(s, _)| s == "Mine"),
            "interface extends should emit at least one edge; got {edges:?}"
        );
    }

    #[test]
    fn rust_impl_trait_for_type_emits_edge() {
        let src = "impl Repository for UserRepo {}";
        let edges = collect("rust", src);
        assert!(edges.contains(&("UserRepo".into(), "Repository".into())));
    }

    #[test]
    fn rust_inherent_impl_emits_no_edge() {
        let src = "impl UserRepo { fn new() {} }";
        let edges = collect("rust", src);
        assert!(
            edges.is_empty(),
            "inherent impl must not emit; got {edges:?}"
        );
    }

    #[test]
    fn ts_class_extends_implements_emits_edges() {
        let src = "class UserRepo extends BaseRepo implements Repository {}";
        let edges = collect("typescript", src);
        assert!(edges.contains(&("UserRepo".into(), "BaseRepo".into())));
        assert!(edges.contains(&("UserRepo".into(), "Repository".into())));
    }

    #[test]
    fn python_class_inherits_from_bases() {
        let src = "class Derived(Base, Mixin):\n pass\n";
        let edges = collect("python", src);
        assert!(edges.contains(&("Derived".into(), "Base".into())));
        assert!(edges.contains(&("Derived".into(), "Mixin".into())));
    }

    #[test]
    fn python_class_object_base_skipped() {
        // Inheriting from `object` is not informative — Python's
        // implicit root. Phase 6 omits these edges to keep the
        // hierarchy index focused on user-defined relationships.
        let src = "class Plain(object):\n pass\n";
        let edges = collect("python", src);
        assert!(
            !edges.contains(&("Plain".into(), "object".into())),
            "object base must be filtered; got {edges:?}"
        );
    }

    #[test]
    fn ruby_class_lt_super_emits_edge() {
        let src = "class Derived < Base\nend\n";
        let edges = collect("ruby", src);
        assert!(edges.contains(&("Derived".into(), "Base".into())));
    }

    #[test]
    fn dedup_within_file() {
        let src = r#"
            class A extends B {}
            class A extends B {}
        "#;
        let edges = collect("java", src);
        let count = edges.iter().filter(|(s, p)| s == "A" && p == "B").count();
        assert_eq!(count, 1, "duplicates within a file must be deduped");
    }
}
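The per-language collectors above only emit flat `(sub, parent)` pairs, deduplicated within a file. As a minimal sketch of how such pairs could be consumed for virtual-dispatch fan-out, the hypothetical `HierarchyIndex` below (not part of this crate's API) groups edges by parent, drops duplicate edges through a `BTreeSet`, and collects every transitive subtype of a type with a breadth-first walk that tolerates cycles in malformed input.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Hypothetical index from a parent type to its direct subtypes, built from
/// `(sub, parent)` edges like those produced by the per-language collectors.
#[derive(Default)]
pub struct HierarchyIndex {
    children: HashMap<String, BTreeSet<String>>,
}

impl HierarchyIndex {
    /// Build the index; the per-parent `BTreeSet` drops duplicate edges.
    pub fn from_edges<I>(edges: I) -> Self
    where
        I: IntoIterator<Item = (String, String)>,
    {
        let mut idx = HierarchyIndex::default();
        for (sub, parent) in edges {
            idx.children.entry(parent).or_default().insert(sub);
        }
        idx
    }

    /// All transitive subtypes of `ty` (excluding `ty` itself), sorted.
    /// The `seen` set stops the walk from looping on cyclic edges.
    pub fn subtypes_of(&self, ty: &str) -> Vec<String> {
        let mut seen: BTreeSet<String> = BTreeSet::new();
        let mut queue: VecDeque<&str> = VecDeque::from([ty]);
        while let Some(cur) = queue.pop_front() {
            if let Some(kids) = self.children.get(cur) {
                for k in kids {
                    if k != ty && seen.insert(k.clone()) {
                        queue.push_back(k.as_str());
                    }
                }
            }
        }
        seen.into_iter().collect()
    }
}

fn main() {
    let edges = vec![
        ("UserRepo".to_string(), "BaseRepo".to_string()),
        ("UserRepo".to_string(), "BaseRepo".to_string()), // duplicate edge
        ("BaseRepo".to_string(), "Repository".to_string()),
        ("CachedRepo".to_string(), "Repository".to_string()),
    ];
    let idx = HierarchyIndex::from_edges(edges);
    println!("{:?}", idx.subtypes_of("Repository"));
}
```

Running `main` prints `["BaseRepo", "CachedRepo", "UserRepo"]`: `UserRepo` is reached only through `BaseRepo`, and the duplicated edge contributes once.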
@@ -244,6 +244,214 @@ pub(super) fn has_keyword_arg(call_node: Node, keyword_name: &str, code: &[u8])
    false
}

/// Inspect the first positional argument of a call node and return its
/// tree-sitter `kind()` plus a flag indicating whether the argument's subtree
/// contains an `interpolation` or `string_interpolation` node. Skips
/// parenthesisation (`(arg0)` is treated as `arg0`). Returns `None` when the
/// call has no arguments.
///
/// Used by per-language shape-aware sink suppression — for example, Ruby
/// ActiveRecord query methods (`where`, `order`, `pluck`, …) are intrinsically
/// parameterised when arg 0 is a hash/symbol/array/non-interpolated string,
/// regardless of taint reaching that argument.
pub(super) fn arg0_kind_and_interpolation(call_node: Node) -> Option<(String, bool)> {
    let args = call_node.child_by_field_name("arguments")?;
    let mut cursor = args.walk();
    let arg0 = args.named_children(&mut cursor).next()?;
    let arg0 = unwrap_parens(arg0);
    let kind = arg0.kind().to_string();
    let has_interp = subtree_has_interpolation(arg0);
    Some((kind, has_interp))
}

/// Walk a Java method-chain receiver looking for an inner `method_invocation`
/// whose method name matches one of `target_methods` (e.g. `createQuery`,
/// `prepareStatement`). Returns the kind of that inner call's arg 0 — used
/// to verify the SQL-bearing call up-chain was given a string literal rather
/// than a concatenation / method call.
///
/// Conservative: returns `None` when no matching call is found in the chain.
/// It never descends into the arguments of an unrelated call, so the walk
/// stays strictly on the receiver spine.
pub(super) fn java_chain_arg0_kind_for_method(
    expr: Node,
    target_methods: &[&str],
    code: &[u8],
) -> Option<String> {
    let n = unwrap_parens(expr);
    if n.kind() == "method_invocation"
        && let Some(name_node) = n.child_by_field_name("name")
        && let Some(name) = text_of(name_node, code)
        && target_methods.iter().any(|m| *m == name)
    {
        let args = n.child_by_field_name("arguments")?;
        let mut cursor = args.walk();
        let arg0 = args.named_children(&mut cursor).next()?;
        let arg0 = unwrap_parens(arg0);
        return Some(arg0.kind().to_string());
    }
    // Drill down the receiver spine. Java grammar uses `object` for the
    // receiver of a `method_invocation`.
    if n.kind() == "method_invocation"
        && let Some(recv) = n.child_by_field_name("object")
        && let Some(found) = java_chain_arg0_kind_for_method(recv, target_methods, code)
    {
        return Some(found);
    }
    None
}

/// Walk down the receiver side of a Ruby method chain looking for the inner
/// call whose method identifier matches one of `target_methods`, then return
/// that inner call's [`arg0_kind_and_interpolation`]. Used when the CFG node
/// represents a chained expression like `Model.where(...).preload(...).to_a`,
/// whose outermost call (`to_a`) has no arguments, so the shape suppressor
/// must reach down the chain to inspect `where`'s arg 0.
///
/// Conservative: returns `None` if the chain doesn't contain a matching
/// method, so callers fall through to the no-suppression path.
pub(super) fn ruby_chain_arg0_for_method(
    expr: Node,
    target_methods: &[&str],
    code: &[u8],
) -> Option<(String, bool)> {
    let n = unwrap_parens(expr);
    if n.kind() == "call"
        && let Some(method) = n.child_by_field_name("method")
        && let Some(name) = text_of(method, code)
        && target_methods.iter().any(|m| *m == name)
    {
        return arg0_kind_and_interpolation(n);
    }
    // Recurse into the receiver chain (`call.receiver` → next call up).
    if n.kind() == "call"
        && let Some(recv) = n
            .child_by_field_name("receiver")
            .or_else(|| n.child_by_field_name("object"))
        && let Some(found) = ruby_chain_arg0_for_method(recv, target_methods, code)
    {
        return Some(found);
    }
    // Also descend into named children to handle wrapping (assignment RHS,
    // begin-end blocks, parenthesised expressions, etc.).
    let mut cursor = n.walk();
    for c in n.named_children(&mut cursor) {
        if let Some(found) = ruby_chain_arg0_for_method(c, target_methods, code) {
            return Some(found);
        }
    }
    None
}

fn subtree_has_interpolation(n: Node) -> bool {
    if n.kind() == "interpolation" || n.kind() == "string_interpolation" {
        return true;
    }
    let mut cursor = n.walk();
    n.named_children(&mut cursor).any(subtree_has_interpolation)
}

/// For a chained method call (`a.b().c().d()`), walk down the receiver
/// chain (`function.object`) and return the innermost call_expression
/// alongside its callee text (e.g. `"http.get"`).
///
/// Returns `None` when:
/// * `outer` is not itself a CallFn / CallMethod node, or
/// * its `function`/`method` field is not a member-style expression whose
///   `object` field is itself a call (i.e. there is no chained receiver).
///
/// Motivated by CVE-2025-64430 (Parse Server SSRF via
/// `http.get(uri, cb).on('error', e => ...)`). Without this, the outer
/// `.on(...)` call swallows classification of the inner gated sink.
pub(super) fn find_chained_inner_call<'a>(
    outer: Node<'a>,
    lang: &str,
    code: &[u8],
) -> Option<(Node<'a>, String)> {
    if !matches!(lookup(lang, outer.kind()), Kind::CallFn | Kind::CallMethod) {
        return None;
    }
    let function = outer
        .child_by_field_name("function")
        .or_else(|| outer.child_by_field_name("method"))?;
    // The function/method field for a chained call is a member_expression
    // (JS/TS) or attribute (Python) etc.; its `object` field is the
    // receiver expression. Only proceed when that receiver is itself a
    // call.
    let object = function.child_by_field_name("object")?;
    if !matches!(lookup(lang, object.kind()), Kind::CallFn | Kind::CallMethod) {
        return None;
    }
    // Recurse: the inner call may itself be chained
    // (`axios.get(u).then(h).catch(h)` — innermost is `axios.get`).
    if let Some(inner) = find_chained_inner_call(object, lang, code) {
        return Some(inner);
    }
    // `object` is the innermost call_expression in the chain. Extract
    // its callee identifier the same way `first_call_ident_with_span`
    // does for a CallFn (member_expression text → "http.get").
    let inner_func = object
        .child_by_field_name("function")
        .or_else(|| object.child_by_field_name("method"))
        .or_else(|| object.child_by_field_name("name"))?;
    // Multi-line dotted member expressions (`http\n .get`) include
    // formatting whitespace in the source-text slice. The labels map
    // keys are literal `"http.get"` etc. — strip whitespace so the
    // chained-call inner-gate rebinding fires for both single-line and
    // multi-line chain styles. Also strips `\r` for CRLF sources.
    // Motivated by upstream Parse Server CVE-2025-64430 which uses the
    // multi-line `http\n .get(uri, ...)\n .on(...)` form.
    let raw = text_of(inner_func, code)?;
    let inner_text: String = raw.chars().filter(|c| !c.is_whitespace()).collect();
    Some((object, inner_text))
}

/// Recursively walk the receiver chain of `outer` (a CallFn / CallMethod
/// node) and push each *named argument* of every inner call along the
/// way onto `out`. Outer's own arguments are NOT included: the caller
/// already handles those via the standard `pre_emit_arg_source_nodes`
/// pass over `outer.arguments`.
///
/// For `json.NewDecoder(r.Body).Decode(emoji)`:
///   outer = `.Decode(emoji)` — caller iterates `emoji`
///   inner = `json.NewDecoder(r.Body)` — pushed arg: `r.Body`
///
/// We only pull from each inner call's `arguments` field, never from its
/// `function`/`method`/receiver expressions. That distinction matters
/// because chained source-receivers like `r.URL.Query()` expose a
/// member-text path that classifies as a Source — but it's the OUTER
/// chain text (`r.URL.Query.Get`) that already classifies, so emitting
/// a synth source for the inner-call's own callee would double-count.
///
/// Used by Go (where chain shapes like `json.NewDecoder(r.Body).Decode`
/// hide source-labeled args inside parens between dots, leaving the
/// outer callee text un-classifiable). The helper itself is
/// language-neutral, but callers should gate per-language until each
/// language's regression coverage catches up.
pub(super) fn walk_chain_inner_call_args<'a>(outer: Node<'a>, lang: &str, out: &mut Vec<Node<'a>>) {
    if !matches!(lookup(lang, outer.kind()), Kind::CallFn | Kind::CallMethod) {
        return;
    }
    let function = outer
        .child_by_field_name("function")
        .or_else(|| outer.child_by_field_name("method"));
    let Some(function) = function else { return };
    let object = function
        .child_by_field_name("object")
        .or_else(|| function.child_by_field_name("operand"))
        .or_else(|| function.child_by_field_name("value"));
    let Some(inner) = object else { return };
    if !matches!(lookup(lang, inner.kind()), Kind::CallFn | Kind::CallMethod) {
        return;
    }
    if let Some(args) = inner.child_by_field_name("arguments") {
        let mut cursor = args.walk();
        for arg in args.named_children(&mut cursor) {
            out.push(arg);
        }
    }
    walk_chain_inner_call_args(inner, lang, out);
}

/// Recursively find a call-expression node within an AST subtree (up to
/// 4 levels deep). Unlike `find_call_node`, which only checks 2 levels,
/// this handles `await`-wrapped calls inside declarations.
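The receiver-spine walk in `find_chained_inner_call` can be illustrated without tree-sitter. The sketch below is a hypothetical toy: `Expr`, `call`, and this simplified `find_chained_inner_call` stand in for real nodes and for the member-expression callee, which is reduced to a plain string. It keeps the two behaviours the doc comments describe: recurse to the innermost chained call, and strip all whitespace (including `\r`) from the callee text so multi-line chains match single-line label keys.

```rust
/// Toy stand-in for tree-sitter call nodes (hypothetical, not the crate's AST).
enum Expr {
    /// A call with its callee source text and, when chained, its receiver.
    Call { callee: String, receiver: Option<Box<Expr>> },
    /// Any non-call receiver, e.g. a plain identifier.
    Ident(String),
}

/// Follow the receiver spine of `outer` and return the innermost call plus its
/// callee text with all whitespace removed. Returns `None` when `outer` is not
/// a call or its receiver is not itself a call.
fn find_chained_inner_call(outer: &Expr) -> Option<(&Expr, String)> {
    let Expr::Call { receiver: Some(recv), .. } = outer else {
        return None;
    };
    let inner: &Expr = recv;
    let Expr::Call { callee, .. } = inner else {
        return None;
    };
    // The inner call may itself be chained; prefer the deepest one.
    if let Some(deeper) = find_chained_inner_call(inner) {
        return Some(deeper);
    }
    // Remove newlines and indentation from multi-line chains so that
    // "http\n  .get" compares equal to a "http.get" label key.
    let text: String = callee.chars().filter(|c| !c.is_whitespace()).collect();
    Some((inner, text))
}

/// Convenience constructor for the toy tree.
fn call(callee: &str, receiver: Option<Expr>) -> Expr {
    Expr::Call {
        callee: callee.to_string(),
        receiver: receiver.map(Box::new),
    }
}

fn main() {
    // Models `http\n  .get(uri, cb)\n  .on('error', h)`.
    let chain = call("on", Some(call("http\n  .get", None)));
    if let Some((_, text)) = find_chained_inner_call(&chain) {
        println!("{text}");
    }
    // A receiver that is not a call yields no inner call.
    let plain = call("on", Some(Expr::Ident("emitter".to_string())));
    assert!(find_chained_inner_call(&plain).is_none());
}
```

Running `main` prints `http.get`. For a three-link chain such as `axios.get(u).then(h).catch(h)` the recursion keeps descending and reports `axios.get`.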
826
src/cfg/mod.rs
File diff suppressed because it is too large
@@ -1,23 +1,36 @@
 use super::{
-    AstMeta, Cfg, EdgeKind, NodeInfo, StmtKind, TaintMeta, collect_idents, connect_all,
-    is_anon_fn_name, text_of,
+    AstMeta, Cfg, DTO_CLASSES, EdgeKind, NodeInfo, StmtKind, TaintMeta, collect_idents,
+    connect_all, is_anon_fn_name, text_of,
 };
 use crate::labels::{DataLabel, LangAnalysisRules, classify, param_config};
+use crate::ssa::type_facts::TypeKind;
 use petgraph::graph::NodeIndex;
 use smallvec::smallvec;
 use tree_sitter::Node;
 
-/// Extract parameter names from a function AST node.
-///
-/// Uses the language's `ParamConfig` to find the parameter list field
-/// and extract identifiers from each parameter child.
-pub(super) fn extract_param_names<'a>(
+/// Phase 6.2 — resolve a syntactic class / struct / interface / model
+/// name against the per-file [`DTO_CLASSES`] map populated at the top
+/// of `build_cfg`. Returns the [`TypeKind::Dto`] carrying the
+/// per-field type map when the class is declared in the same file;
+/// returns `None` otherwise so callers can fall through to the
+/// pre-Phase-6 behaviour (Object / Unknown).
+fn lookup_dto_class(class_name: &str) -> Option<TypeKind> {
+    DTO_CLASSES.with(|cell| cell.borrow().get(class_name).cloned().map(TypeKind::Dto))
+}
+
+/// Extract parameter names + per-position [`TypeKind`] from a function
+/// AST node. Each entry's second slot is `Some(TypeKind)` when the
+/// parameter's decorator, attribute, or static type annotation maps to
+/// a known kind, and `None` otherwise. Strictly additive — when no
+/// type info is recoverable, behaviour is identical to the names-only
+/// path.
+pub(super) fn extract_param_meta<'a>(
     func_node: Node<'a>,
     lang: &str,
     code: &'a [u8],
-) -> Vec<String> {
+) -> Vec<(String, Option<TypeKind>)> {
     let cfg = param_config(lang);
-    let mut names = Vec::new();
+    let mut out: Vec<(String, Option<TypeKind>)> = Vec::new();
     // Try the params_field directly on the function node first.
     // For C/C++, the parameter list is nested inside the declarator
     // (function_definition > declarator:function_declarator > parameters:parameter_list),
@@ -28,13 +41,28 @@ pub(super) fn extract_param_names<'a>(
             .and_then(|d| d.child_by_field_name(cfg.params_field))
     });
     let Some(params) = params else {
-        return names;
+        // Single-param arrow shorthand (`uri => ...` in JS/TS): tree-sitter
+        // exposes the lone identifier under the singular `parameter` field
+        // rather than wrapping it in `formal_parameters`. Without this
+        // fallback the function appears parameterless to the SSA pipeline,
+        // breaking cross-function param_to_sink resolution for any
+        // single-arg arrow helper. Motivated by CVE-2025-64430.
+        if func_node.kind() == "arrow_function" {
+            if let Some(p) = func_node.child_by_field_name("parameter") {
+                if p.kind() == "identifier" {
+                    if let Some(name) = text_of(p, code) {
+                        out.push((name, None));
+                    }
+                }
+            }
+        }
+        return out;
     };
     let mut cursor = params.walk();
     for child in params.children(&mut cursor) {
         // Self/this parameter (e.g. Rust's `self_parameter`)
         if cfg.self_param_kinds.contains(&child.kind()) {
-            names.push("self".into());
+            out.push(("self".into(), None));
             continue;
         }
 
@@ -52,7 +80,8 @@ pub(super) fn extract_param_names<'a>(
                     tmp.into_iter().next()
                 };
                 if let Some(name) = candidate {
-                    names.push(name);
+                    let ty = classify_param_type(child, lang, code);
+                    out.push((name, ty));
                     found = true;
                     break;
                 }
@@ -63,7 +92,7 @@ pub(super) fn extract_param_names<'a>(
             && child.kind() == "identifier"
             && let Some(txt) = text_of(child, code)
         {
-            names.push(txt);
+            out.push((txt, None));
             found = true;
         }
         // Fallback for C/C++: look for nested declarator → identifier
@@ -71,7 +100,8 @@ pub(super) fn extract_param_names<'a>(
             let mut tmp = Vec::new();
             collect_idents(child, code, &mut tmp);
             if let Some(last) = tmp.pop() {
-                names.push(last);
+                let ty = classify_param_type(child, lang, code);
+                out.push((last, ty));
                 found = true;
             }
         }
@@ -86,7 +116,8 @@ pub(super) fn extract_param_names<'a>(
                 let mut tmp = Vec::new();
                 collect_idents(child, code, &mut tmp);
                 if let Some(first) = tmp.into_iter().next() {
-                    names.push(first);
+                    let ty = classify_param_type(child, lang, code);
+                    out.push((first, ty));
                 }
             }
             continue;
@@ -96,11 +127,11 @@ pub(super) fn extract_param_names<'a>(
         // where the child is an `identifier` node, not a `parameter` wrapper.
         if child.kind() == "identifier" {
             if let Some(txt) = text_of(child, code) {
-                names.push(txt);
+                out.push((txt, None));
             }
         }
     }
-    names
+    out
 }
 
 /// Walk up from a function definition node and build a container path.
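`lookup_dto_class` reads a per-file `DTO_CLASSES` map through `.with(|cell| cell.borrow()...)`, which suggests a `thread_local!` `RefCell` populated at the top of `build_cfg`. A minimal sketch of that pattern follows, under stated assumptions: `TypeKind`, `populate_dto_classes`, and `classify` are hypothetical simplified stand-ins, and the per-field type map is reduced to `String -> String`. `classify` tries a primitive mapping first and only then falls back to the same-file DTO lookup, after stripping generics and any package qualifier.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the crate's `TypeKind`.
#[derive(Clone, Debug, PartialEq)]
enum TypeKind {
    Int,
    String,
    Dto(HashMap<String, String>),
}

thread_local! {
    /// Per-file map from class name to its field-name -> field-type map.
    static DTO_CLASSES: RefCell<HashMap<String, HashMap<String, String>>> =
        RefCell::new(HashMap::new());
}

/// Replace the current thread's map, e.g. once per analysed file.
fn populate_dto_classes(classes: HashMap<String, HashMap<String, String>>) {
    DTO_CLASSES.with(|cell| *cell.borrow_mut() = classes);
}

fn lookup_dto_class(class_name: &str) -> Option<TypeKind> {
    DTO_CLASSES.with(|cell| cell.borrow().get(class_name).cloned().map(TypeKind::Dto))
}

/// Primitive kinds first, then the same-file DTO map as a fallback.
fn classify(type_text: &str) -> Option<TypeKind> {
    // Strip generic arguments and any package qualifier: `a.b.Foo<T>` -> `Foo`.
    let bare = type_text.split('<').next().unwrap_or(type_text).trim();
    let last = bare.rsplit('.').next().unwrap_or(bare);
    match last {
        "int" | "long" | "Integer" | "Long" => Some(TypeKind::Int),
        "String" => Some(TypeKind::String),
        _ => lookup_dto_class(last),
    }
}

fn main() {
    let mut fields = HashMap::new();
    fields.insert("email".to_string(), "String".to_string());
    let mut classes = HashMap::new();
    classes.insert("CreateUser".to_string(), fields);
    populate_dto_classes(classes);

    println!("{:?}", classify("java.lang.Long"));
    println!("{:?}", classify("CreateUser"));
    println!("{:?}", classify("OtherFileDto"));
}
```

Running `main` prints `Some(Int)`, `Some(Dto({"email": "String"}))`, and `None`: a class declared outside the current file resolves to no kind, which leaves the caller on its previous behaviour.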
@@ -392,6 +423,369 @@ pub(super) fn inject_framework_param_sources(
    preds
}

/// Classify a parameter AST node to a [`TypeKind`] using per-language
/// decorator / attribute / annotation matchers. Strictly additive: when
/// no recognised pattern matches, returns `None` and the engine
/// behaves exactly as before.
///
/// Recognised patterns (Phase 2):
/// * Java (Spring) — `@PathVariable`/`@RequestParam Long X` →
///   [`TypeKind::Int`]; `@RequestBody T` → [`TypeKind::Dto`] when `T` is a
///   class declared in the same file, otherwise no kind.
/// * TypeScript (NestJS) — `@Param('id') id: number` →
///   [`TypeKind::Int`]; `@Body() dto: T` / `@Query('q') q: string`.
/// * Rust (Axum / Rocket / Actix) — `Path<i64>` / `Path<u32>` /
///   `web::Path<i64>` → [`TypeKind::Int`]; `Path<String>` →
///   [`TypeKind::String`].
/// * Python (FastAPI) — `def h(x: int)` → [`TypeKind::Int`];
///   `Annotated[int, Path()]` → [`TypeKind::Int`].
pub(super) fn classify_param_type<'a>(
    param: Node<'a>,
    lang: &str,
    code: &'a [u8],
) -> Option<TypeKind> {
    match lang {
        "java" => classify_param_type_java(param, code),
        "typescript" | "ts" => classify_param_type_ts(param, code),
        "javascript" | "js" => classify_param_type_ts(param, code),
        "rust" | "rs" => classify_param_type_rust(param, code),
        "python" | "py" => classify_param_type_python(param, code),
        _ => None,
    }
}

/// Java (Spring) — recognise typed-extractor parameters via the
/// surrounding annotation. Per Hard Rule 3, plain `Long X` without a
/// known framework annotation is **not** treated as a typed extractor —
/// the parameter could be a regular function argument that the
/// framework never validates. Recognised annotations:
/// `@PathVariable`, `@RequestParam`, `@RequestBody`, `@RequestHeader`,
/// `@CookieValue`, `@MatrixVariable`, `@ModelAttribute`. When an
/// annotation matches, the parameter's static type is consulted via
/// [`java_type_to_kind`], falling back to the same-file DTO map for
/// class-typed parameters.
fn classify_param_type_java<'a>(param: Node<'a>, code: &'a [u8]) -> Option<TypeKind> {
    if param.kind() != "formal_parameter" && param.kind() != "spread_parameter" {
        return None;
    }
    if !has_java_framework_annotation(param, code) {
        return None;
    }
    let type_node = param.child_by_field_name("type")?;
    let type_text = text_of(type_node, code)?;
    if let Some(k) = java_type_to_kind(&type_text) {
        return Some(k);
    }
    // Phase 6.2: when the static type is a class name we don't classify
    // as a primitive (e.g. `@RequestBody CreateUser dto`), look up the
    // class in the same-file DTO map. Strip any generics for the
    // leading type so `Foo<Bar>` still resolves on `Foo`.
    let bare = type_text.split('<').next().unwrap_or(&type_text).trim();
    let last = bare.rsplit('.').next().unwrap_or(bare);
    lookup_dto_class(last)
}

/// Walk the parameter's modifiers (annotations) and check whether any of
/// them is a recognised Spring web-binding annotation. tree-sitter-java
/// exposes annotations as `marker_annotation` / `annotation` nodes inside
/// the formal_parameter's `modifiers` child.
fn has_java_framework_annotation(param: Node<'_>, code: &[u8]) -> bool {
    const KNOWN: &[&str] = &[
        "@PathVariable",
        "@RequestParam",
        "@RequestBody",
        "@RequestHeader",
        "@CookieValue",
        "@MatrixVariable",
        "@ModelAttribute",
    ];
    // Inspect modifiers child first.
    if let Some(modifiers) = param.child_by_field_name("modifiers") {
        if let Some(text) = text_of(modifiers, code) {
            for k in KNOWN {
                if text.contains(k) {
                    return true;
                }
            }
        }
    }
    // Fall back to scanning every child (named or not): tree-sitter-java
    // emits annotations as direct children of formal_parameter in some
    // grammar versions.
    let mut cursor = param.walk();
    for child in param.children(&mut cursor) {
        let kind = child.kind();
        if matches!(kind, "marker_annotation" | "annotation" | "modifiers")
            && let Some(text) = text_of(child, code)
        {
            for k in KNOWN {
                if text.contains(k) {
                    return true;
                }
            }
        }
    }
    false
}

/// Map a Java type-text fragment to a [`TypeKind`]. Public to the
|
||||
/// `cfg` module so the Phase 6 DTO collector can reuse the same
|
||||
/// classifier for class fields.
|
||||
pub(super) fn java_type_to_kind(t: &str) -> Option<TypeKind> {
|
||||
let bare = t.trim().trim_start_matches('@').trim();
|
||||
// Drop generic args for the leading type.
|
||||
let bare = bare.split('<').next().unwrap_or(bare).trim();
|
||||
let last = bare.rsplit('.').next().unwrap_or(bare);
|
||||
match last {
|
||||
"int" | "long" | "short" | "byte" | "Integer" | "Long" | "Short" | "Byte"
|
||||
| "BigInteger" => Some(TypeKind::Int),
|
||||
"boolean" | "Boolean" => Some(TypeKind::Bool),
|
||||
"double" | "float" | "Double" | "Float" | "BigDecimal" => Some(TypeKind::Int),
|
||||
"String" | "CharSequence" => Some(TypeKind::String),
|
||||
_ => None,
|
||||
}
|
||||
}
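A minimal sketch of the normalisation `java_type_to_kind` applies before matching, written as standalone Rust so it runs outside the crate. `Kind`, `leading_simple_name` and `classify` are hypothetical names, `Kind` is a simplified stand-in for `TypeKind`, and only a subset of the matched type names is included.

```rust
/// Simplified stand-in for `TypeKind` (assumption: only the three
/// primitive classes the classifiers above return).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Int,
    Bool,
    Str,
}

/// Hypothetical helper: reduce a Java type text to the simple name of its
/// leading type by dropping a stray `@`, any generic arguments and the
/// package qualifier (the same steps as `java_type_to_kind`).
pub fn leading_simple_name(t: &str) -> &str {
    let bare = t.trim().trim_start_matches('@').trim();
    let bare = bare.split('<').next().unwrap_or(bare).trim();
    bare.rsplit('.').next().unwrap_or(bare)
}

/// Hypothetical classifier over a subset of the names matched above.
pub fn classify(t: &str) -> Option<Kind> {
    match leading_simple_name(t) {
        "int" | "long" | "Integer" | "Long" => Some(Kind::Int),
        "boolean" | "Boolean" => Some(Kind::Bool),
        "String" | "CharSequence" => Some(Kind::Str),
        _ => None,
    }
}
```

Because only the leading type survives, a wrapper such as `Optional<Long>` reduces to `Optional` and stays unclassified, whereas `java.lang.Long` reduces to `Long` and maps to the integer kind.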

/// Map a TypeScript type-text fragment (already stripped of leading
/// `:` / whitespace) to a primitive [`TypeKind`]. Used by both the
/// per-parameter classifier and the Phase 6 DTO collector.
pub(super) fn ts_type_to_kind(t: &str) -> Option<TypeKind> {
let head = t.split('<').next().unwrap_or(t).trim();
match head {
"number" | "bigint" => Some(TypeKind::Int),
"boolean" => Some(TypeKind::Bool),
"string" => Some(TypeKind::String),
_ => None,
}
}

/// TypeScript (NestJS) — recognise typed-extractor parameters via a
/// known NestJS decorator (`@Param`, `@Body`, `@Query`, `@Headers` /
/// `@Header`, `@Cookie`, `@UploadedFile`). Per Hard Rule 3, a bare
/// `function h(id: number)` is not a framework extractor: without a
/// NestJS decorator no runtime gate is implied. Pipe coercions
/// (`ParseIntPipe` / `ParseBoolPipe`) override the static type.
fn classify_param_type_ts<'a>(param: Node<'a>, code: &'a [u8]) -> Option<TypeKind> {
if !has_ts_decorator_argument(
param,
code,
&[
"@Param",
"@Body",
"@Query",
"@Headers",
"@Header",
"@Cookie",
"@UploadedFile",
],
) {
return None;
}
// Decorator-based pipe coercion overrides the static type.
if has_ts_decorator_argument(param, code, &["ParseIntPipe"]) {
return Some(TypeKind::Int);
}
if has_ts_decorator_argument(param, code, &["ParseBoolPipe"]) {
return Some(TypeKind::Bool);
}
let t = param
.child_by_field_name("type")
.and_then(|n| inner_ts_type_text(n, code))?;
let stripped = t.trim().trim_start_matches(':').trim();
if let Some(k) = ts_type_to_kind(stripped) {
return Some(k);
}
// Phase 6.2: NestJS `@Body() dto: CreateUser` — when the static
// type is a class / interface name declared in the same file,
// resolve via the DTO map. Generic args dropped for the leading
// type so `Foo<Bar>` matches on `Foo`.
let head = stripped.split('<').next().unwrap_or(stripped).trim();
lookup_dto_class(head)
}
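The precedence in `classify_param_type_ts` (framework decorator required, then pipe coercion, then the static type) can be sketched without tree-sitter by passing decorator texts as plain strings. This is an illustration under that assumption: `Kind`, `any_contains` and `resolve_ts_param` are hypothetical names, and the same-file DTO lookup is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Int,
    Bool,
    Str,
}

/// Decorators that imply a NestJS runtime gate (same list as above).
const FRAMEWORK: &[&str] = &[
    "@Param", "@Body", "@Query", "@Headers", "@Header", "@Cookie", "@UploadedFile",
];

/// True if any decorator text contains any of the `wanted` substrings.
fn any_contains(decorators: &[&str], wanted: &[&str]) -> bool {
    decorators.iter().any(|d| wanted.iter().any(|w| d.contains(w)))
}

pub fn resolve_ts_param(decorators: &[&str], type_annotation: &str) -> Option<Kind> {
    // 1. No framework decorator: no runtime gate, so not a typed extractor.
    if !any_contains(decorators, FRAMEWORK) {
        return None;
    }
    // 2. Pipe coercion overrides the static type.
    if any_contains(decorators, &["ParseIntPipe"]) {
        return Some(Kind::Int);
    }
    if any_contains(decorators, &["ParseBoolPipe"]) {
        return Some(Kind::Bool);
    }
    // 3. Classify the static type: drop the leading `:` and generic args.
    let stripped = type_annotation.trim().trim_start_matches(':').trim();
    match stripped.split('<').next().unwrap_or(stripped).trim() {
        "number" | "bigint" => Some(Kind::Int),
        "boolean" => Some(Kind::Bool),
        "string" => Some(Kind::Str),
        // The real classifier would try the same-file DTO map here.
        _ => None,
    }
}
```

For `@Query('id', ParseIntPipe) id: string`, the pipe wins and the parameter is treated as an integer even though its static type is `string`.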

fn inner_ts_type_text<'a>(type_anno: Node<'a>, code: &'a [u8]) -> Option<String> {
// type_annotation node text is `: T` — unwrap to T.
if let Some(child) = type_anno.named_child(0) {
return text_of(child, code);
}
text_of(type_anno, code)
}

/// Check whether any decorator attached to a TypeScript / NestJS
/// parameter contains one of the `wanted` strings anywhere in its text,
/// i.e. in the decorator name or its argument list (e.g. `ParseIntPipe`
/// in `@Query('id', ParseIntPipe)`). This is a plain substring match.
/// Decorators are looked for both among the parameter's preceding
/// siblings and among its own children.
fn has_ts_decorator_argument(param: Node<'_>, code: &[u8], wanted: &[&str]) -> bool {
let mut cur = param.prev_sibling();
while let Some(node) = cur {
if node.kind() == "decorator" {
if let Some(text) = text_of(node, code) {
for w in wanted {
if text.contains(w) {
return true;
}
}
}
}
cur = node.prev_sibling();
}
// Some grammars attach decorators as children of the param.
let mut cursor = param.walk();
for child in param.children(&mut cursor) {
if child.kind() == "decorator" {
if let Some(text) = text_of(child, code) {
for w in wanted {
if text.contains(w) {
return true;
}
}
}
}
}
false
}

/// Rust (Axum / Rocket / Actix) — read the parameter's type text and
/// look for `Path<i64>` / `Json<T>` / `Form<T>` / `Query<T>` shapes.
/// Per Hard Rule 3, bare primitives (`fn h(id: i64)` without an
/// extractor wrapper) are **not** treated as typed extractors — only
/// framework-wrapped types qualify.
fn classify_param_type_rust<'a>(param: Node<'a>, code: &'a [u8]) -> Option<TypeKind> {
if param.kind() != "parameter" {
return None;
}
let type_node = param.child_by_field_name("type")?;
let type_text = text_of(type_node, code)?;
rust_type_to_kind(&type_text)
}

fn rust_type_to_kind(t: &str) -> Option<TypeKind> {
let stripped = t.trim();
// Strip reference / mutability noise so `&Path<i64>` still matches
// the wrapper detection below. `trim_start_matches` removes every
// leading `&`, so a single call also covers `&&Path<i64>`.
let stripped = stripped
.trim_start_matches('&')
.trim_start_matches("mut ")
.trim();
// Only framework wrapper extractors qualify — bare primitives like
// `i64` could be regular function parameters with no framework
// validation gate.
for wrap in [
"Path",
"Json",
"Form",
"Query",
"web::Path",
"web::Json",
"web::Form",
"web::Query",
"rocket::http::uri::Origin",
] {
let prefix = format!("{wrap}<");
if let Some(rest) = stripped.strip_prefix(&prefix) {
if let Some(inner) = rest.strip_suffix('>') {
let inner = inner.trim();
// Tuple extractor `Path<(i64, String)>` — first element wins.
if inner.starts_with('(') {
let inside = inner.trim_start_matches('(').trim_end_matches(')');
let first = inside.split(',').next().unwrap_or("").trim();
if let Some(k) = rust_primitive_to_kind(first) {
return Some(k);
}
}
// Bare path generic `Path<i64>`.
if let Some(k) = rust_primitive_to_kind(inner) {
return Some(k);
}
// Phase 6.2: `Json<T>` / `Form<T>` / `Query<T>` /
// `Path<T>` with a same-file struct type — resolve via
// the DTO map. Strip nested generics so `Json<Foo<i64>>`
// matches on `Foo`.
let head = inner.split('<').next().unwrap_or(inner).trim();
if let Some(k) = lookup_dto_class(head) {
return Some(k);
}
// Custom struct outside the same file — leave None
// (cross-file resolution is Phase 6.4).
return None;
}
}
}
None
}
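A standalone sketch of the wrapper peeling done by `rust_type_to_kind`: strip leading `&` / `mut`, require a known extractor wrapper, then classify the inner type, taking the first element of a tuple extractor. `Kind`, `primitive` and `extractor_kind` are hypothetical names; the wrapper list is a subset of the one above and the DTO lookup is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Int,
    Bool,
    Str,
}

/// Subset of `rust_primitive_to_kind`.
fn primitive(t: &str) -> Option<Kind> {
    match t.trim() {
        "i8" | "i16" | "i32" | "i64" | "isize" | "u8" | "u16" | "u32" | "u64" | "usize" => {
            Some(Kind::Int)
        }
        "bool" => Some(Kind::Bool),
        "String" | "&str" | "str" => Some(Kind::Str),
        _ => None,
    }
}

/// Peel one extractor wrapper and classify what is inside it.
pub fn extractor_kind(t: &str) -> Option<Kind> {
    // One `trim_start_matches('&')` removes every leading `&`.
    let t = t.trim().trim_start_matches('&').trim_start_matches("mut ").trim();
    for wrap in ["web::Path", "web::Json", "web::Query", "Path", "Json", "Query"] {
        // Require the exact shape `Wrapper<...>`.
        let Some(inner) = t
            .strip_prefix(wrap)
            .and_then(|r| r.strip_prefix('<'))
            .and_then(|r| r.strip_suffix('>'))
        else {
            continue;
        };
        let inner = inner.trim();
        // Tuple extractor `Path<(A, B)>`: the first element decides.
        if let Some(tuple) = inner.strip_prefix('(').and_then(|r| r.strip_suffix(')')) {
            return tuple.split(',').next().and_then(primitive);
        }
        return primitive(inner);
    }
    // Bare types are not framework extractors (Hard Rule 3).
    None
}
```

A bare `i64` returns `None` here as well: without a wrapper there is no framework gate to rely on.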

/// Map a Rust primitive / `String` / `&str` to a [`TypeKind`]. Public
/// to the `cfg` module so the Phase 6 DTO collector can reuse it for
/// `struct` field types.
pub(super) fn rust_primitive_to_kind(t: &str) -> Option<TypeKind> {
let t = t.trim();
match t {
"i8" | "i16" | "i32" | "i64" | "i128" | "isize" | "u8" | "u16" | "u32" | "u64" | "u128"
| "usize" => Some(TypeKind::Int),
"f32" | "f64" => Some(TypeKind::Int),
"bool" => Some(TypeKind::Bool),
"String" | "&str" | "str" => Some(TypeKind::String),
_ => None,
}
}

/// Python (FastAPI) — recognise typed-extractor parameters via the
/// `Annotated[X, Path()/Query()/Body()/Header()/Cookie()/Form()/File()]`
/// shape. Per Hard Rule 3, a bare `def h(id: int)` is **not** a
/// framework extractor: the function may be a plain Python function
/// and the type annotation provides no runtime gate.
fn classify_param_type_python<'a>(param: Node<'a>, code: &'a [u8]) -> Option<TypeKind> {
let type_node = param.child_by_field_name("type")?;
let type_text = text_of(type_node, code)?;
python_type_to_kind(&type_text)
}

fn python_type_to_kind(t: &str) -> Option<TypeKind> {
let stripped = t.trim();
// `Annotated[int, Path()]` — only matches when one of the generic
// args names a recognised FastAPI binding marker. Otherwise no
// framework gate is implied.
if let Some(inner) = stripped
.strip_prefix("Annotated[")
.or_else(|| stripped.strip_prefix("typing.Annotated["))
{
let inside = inner.trim_end_matches(']');
if !contains_fastapi_marker(inside) {
return None;
}
let first = inside.split(',').next().unwrap_or("").trim();
if let Some(k) = python_primitive_to_kind(first) {
return Some(k);
}
// Phase 6.2: `Annotated[CreateUser, Body()]` with a same-file
// Pydantic model — resolve via the DTO map. Generic args are
// dropped via the same head-split as `python_primitive_to_kind`.
let head = first.split('[').next().unwrap_or(first).trim();
return lookup_dto_class(head);
}
None
}
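The FastAPI rule reduces to string checks on the annotation text. The sketch below restates `python_type_to_kind` and `contains_fastapi_marker` as one standalone function, assuming the annotation text is already extracted; `Kind` and `fastapi_kind` are hypothetical names, and the DTO lookup is omitted, so unknown leading types simply return `None`. As in `python_primitive_to_kind`, `float` maps to the integer kind.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Int,
    Bool,
    Str,
}

/// FastAPI binding markers, as in `contains_fastapi_marker`.
const MARKERS: &[&str] = &["Path(", "Query(", "Body(", "Header(", "Cookie(", "Form(", "File("];

/// Classify an annotation only if it has the `Annotated[X, <marker>(...)]` shape.
pub fn fastapi_kind(t: &str) -> Option<Kind> {
    let t = t.trim();
    let inner = t
        .strip_prefix("Annotated[")
        .or_else(|| t.strip_prefix("typing.Annotated["))?;
    let inside = inner.trim_end_matches(']');
    // Annotated metadata without a FastAPI marker implies no framework gate.
    if !MARKERS.iter().any(|m| inside.contains(m)) {
        return None;
    }
    // Only the first generic argument is the value's type; drop its own `[...]`.
    let first = inside.split(',').next().unwrap_or("").trim();
    match first.split('[').next().unwrap_or(first).trim() {
        // `float` maps to the integer kind, mirroring `python_primitive_to_kind`.
        "int" | "float" => Some(Kind::Int),
        "bool" => Some(Kind::Bool),
        "str" => Some(Kind::Str),
        _ => None,
    }
}
```

`Annotated[list[int], Query()]` carries a marker, but its leading type `list` is not a primitive, so it is not classified.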

fn contains_fastapi_marker(s: &str) -> bool {
const MARKERS: &[&str] = &[
"Path(", "Query(", "Body(", "Header(", "Cookie(", "Form(", "File(",
];
MARKERS.iter().any(|m| s.contains(m))
}

/// Map a Python type expression to a primitive [`TypeKind`]. Used by
/// both the per-parameter classifier and the Phase 6 Pydantic-model
/// field collector.
pub(super) fn python_primitive_to_kind(t: &str) -> Option<TypeKind> {
let head = t.trim().split('[').next().unwrap_or(t).trim();
match head {
"int" => Some(TypeKind::Int),
"bool" => Some(TypeKind::Bool),
"float" => Some(TypeKind::Int),
"str" => Some(TypeKind::String),
_ => None,
}
}

/// Check if a callee name matches any configured terminator.
pub(super) fn is_configured_terminator(
callee: &str,

@ -407,3 +801,157 @@ pub(super) fn is_configured_terminator(
false
}
}

#[cfg(test)]
mod typed_extractor_tests {
use super::{
contains_fastapi_marker, java_type_to_kind, python_primitive_to_kind, python_type_to_kind,
rust_primitive_to_kind, rust_type_to_kind,
};
use crate::ssa::type_facts::TypeKind;

// ── Java (Spring) ────────────────────────────────────────────────────

#[test]
fn java_long_path_variable_maps_to_int() {
assert_eq!(java_type_to_kind("Long"), Some(TypeKind::Int));
assert_eq!(java_type_to_kind("long"), Some(TypeKind::Int));
assert_eq!(java_type_to_kind("Integer"), Some(TypeKind::Int));
assert_eq!(java_type_to_kind("int"), Some(TypeKind::Int));
assert_eq!(java_type_to_kind("Short"), Some(TypeKind::Int));
assert_eq!(java_type_to_kind("BigInteger"), Some(TypeKind::Int));
assert_eq!(
java_type_to_kind("java.lang.Long"),
Some(TypeKind::Int),
"fully-qualified Long must still map to Int"
);
}

#[test]
fn java_string_request_param_maps_to_string() {
assert_eq!(java_type_to_kind("String"), Some(TypeKind::String));
assert_eq!(java_type_to_kind("CharSequence"), Some(TypeKind::String));
}

#[test]
fn java_boolean_maps_to_bool() {
assert_eq!(java_type_to_kind("Boolean"), Some(TypeKind::Bool));
assert_eq!(java_type_to_kind("boolean"), Some(TypeKind::Bool));
}

#[test]
fn java_request_body_dto_returns_none_until_phase_six() {
// @RequestBody CreateUserDto dto — no kind today; Phase 6 will
// return DtoObject(name) once cross-file class resolution lands.
assert_eq!(java_type_to_kind("CreateUserDto"), None);
assert_eq!(java_type_to_kind("List<String>"), None);
}

// ── Rust (Axum / Rocket / Actix) ─────────────────────────────────────

#[test]
fn rust_path_int_extractor_maps_to_int() {
assert_eq!(rust_type_to_kind("Path<i64>"), Some(TypeKind::Int));
assert_eq!(rust_type_to_kind("Path<u32>"), Some(TypeKind::Int));
assert_eq!(rust_type_to_kind("Path<usize>"), Some(TypeKind::Int));
assert_eq!(rust_type_to_kind("Path<i32>"), Some(TypeKind::Int));
assert_eq!(rust_type_to_kind("web::Path<i64>"), Some(TypeKind::Int));
}

#[test]
fn rust_path_tuple_first_element_wins() {
// Path<(i64, String)> — first slot is the int extractor that
// matters for sink suppression.
assert_eq!(
rust_type_to_kind("Path<(i64, String)>"),
Some(TypeKind::Int)
);
}

#[test]
fn rust_path_string_maps_to_string() {
assert_eq!(rust_type_to_kind("Path<String>"), Some(TypeKind::String));
assert_eq!(rust_type_to_kind("Path<&str>"), Some(TypeKind::String));
}

#[test]
fn rust_json_dto_returns_none_until_phase_six() {
// Json<T> / Form<T> / Query<T> with a custom struct type — no
// primitive resolution available; Phase 6 lifts to DTO.
assert_eq!(rust_type_to_kind("Json<CreateUserDto>"), None);
assert_eq!(rust_type_to_kind("Form<CreateUserDto>"), None);
assert_eq!(rust_type_to_kind("Query<Filters>"), None);
}

/// Per Hard Rule 3, bare primitives (`fn h(id: i64)`) are NOT
/// framework extractors — only wrapper types (`Path<i64>` etc.)
/// imply a framework runtime gate. Bare i64 must return None.
#[test]
fn rust_bare_primitives_are_not_framework_extractors() {
assert_eq!(rust_type_to_kind("i64"), None);
assert_eq!(rust_type_to_kind("u32"), None);
assert_eq!(rust_type_to_kind("bool"), None);
assert_eq!(rust_type_to_kind("String"), None);
// `rust_primitive_to_kind` (used for tuple inner / wrapper inner)
// remains a separate primitive-only mapping.
assert_eq!(rust_primitive_to_kind("i64"), Some(TypeKind::Int));
assert_eq!(rust_primitive_to_kind("bool"), Some(TypeKind::Bool));
}

// ── Python (FastAPI) ─────────────────────────────────────────────────

#[test]
fn python_bare_primitives_are_not_framework_extractors() {
// Per Hard Rule 3: bare `def h(id: int)` is NOT a typed
// extractor — without an `Annotated[..., Path()/Query()/Body()]`
// wrapper, no FastAPI gate is implied.
assert_eq!(python_type_to_kind("int"), None);
assert_eq!(python_type_to_kind("float"), None);
assert_eq!(python_type_to_kind("bool"), None);
assert_eq!(python_type_to_kind("str"), None);
// The inner primitive resolver is unchanged.
assert_eq!(python_primitive_to_kind("int"), Some(TypeKind::Int));
}

#[test]
fn python_annotated_with_fastapi_marker_qualifies() {
assert_eq!(
python_type_to_kind("Annotated[int, Path()]"),
Some(TypeKind::Int)
);
assert_eq!(
python_type_to_kind("typing.Annotated[int, Path()]"),
Some(TypeKind::Int)
);
assert_eq!(
python_type_to_kind("Annotated[str, Query(max_length=50)]"),
Some(TypeKind::String)
);
assert_eq!(
python_type_to_kind("Annotated[bool, Body()]"),
Some(TypeKind::Bool)
);
}

#[test]
fn python_annotated_without_marker_returns_none() {
// Annotated without a FastAPI binding marker is a generic
// type-system tag — not a framework extractor.
assert_eq!(python_type_to_kind("Annotated[int, str]"), None);
assert_eq!(python_type_to_kind("Annotated[int, MyMeta]"), None);
}

#[test]
fn python_pydantic_model_returns_none_until_phase_six() {
assert_eq!(python_type_to_kind("CreateUser"), None);
assert_eq!(python_type_to_kind("BaseModel"), None);
}

#[test]
fn fastapi_marker_detection() {
assert!(contains_fastapi_marker("int, Path()"));
assert!(contains_fastapi_marker("str, Query(max_length=5)"));
assert!(contains_fastapi_marker("bytes, File()"));
assert!(!contains_fastapi_marker("int, str"));
}
}

@ -4,6 +4,32 @@ use crate::patterns::Severity;
use petgraph::graph::NodeIndex;
use petgraph::visit::EdgeRef;

/// Strict err-identifier match for cfg-error-fallthrough.
///
/// The previous heuristic `lower.contains("err")` over-matched method
/// names like Java `logger.isErrorEnabled()` (the camelCase identifier
/// `isErrorEnabled` matched because it contains `err`). The rule's
/// real target is a variable / field that holds an error value.
///
/// Returns true if the identifier, compared case-insensitively, is
/// exactly `err` / `error` or a snake-case error name (`err_x`,
/// `error_x`, `x_err`, `x_error`). CamelCase names (`isErrorEnabled`,
/// `getError`, `errorMsg`) are rejected; the cost is occasional false
/// negatives on Java-style error fields, which is acceptable for a
/// precision fix.
fn is_error_var_ident(name: &str) -> bool {
let lower = name.to_ascii_lowercase();
if lower == "err" || lower == "error" {
return true;
}
if lower.starts_with("err_") || lower.starts_with("error_") {
return true;
}
if lower.ends_with("_err") || lower.ends_with("_error") {
return true;
}
false
}
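To see what the precision fix changes, the removed closure from `IncompleteErrorHandling` and the new strict check can be compared side by side. Both functions below restate code from this diff under hypothetical names (`old_mentions_err`, `new_mentions_err`) so they can run on their own.

```rust
/// The removed heuristic: any identifier containing "err" counts.
pub fn old_mentions_err(name: &str) -> bool {
    let lower = name.to_ascii_lowercase();
    lower == "err" || lower == "error" || lower.contains("err")
}

/// The strict check, restated from `is_error_var_ident`.
pub fn new_mentions_err(name: &str) -> bool {
    let lower = name.to_ascii_lowercase();
    lower == "err"
        || lower == "error"
        || lower.starts_with("err_")
        || lower.starts_with("error_")
        || lower.ends_with("_err")
        || lower.ends_with("_error")
}
```

`isErrorEnabled`, `errCode` and even `merry` satisfy the old check but not the new one, while `err`, `ERR`, `err_resp` and `parse_error` satisfy both.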

/// Does the condition text contain a unary `!` (logical-not, NOT `!=`)
/// applied to an identifier or member chain whose name contains "err"?
///

@ -137,14 +163,31 @@ fn terminates_on_all_paths(
true
}

/// Find successor nodes after an If node merges (nodes reachable from both branches).
/// Find successor nodes after an If node merges.
///
/// Seeds the walk **only** from the False and Seq edges out of the if
/// node (never its True edge), then follows every forward edge except
/// Back and Exception edges, so that sinks inside the True body are NOT
/// counted as "post-if" fallthrough sinks. The False edge represents
/// the no-error branch, which is the path the rule wants to scan for
/// "did execution fall through past an unhandled error?".
///
/// For `if err != nil { warn(); }` with no statement after the if,
/// the False edge leads to the function exit and no sinks are found.
/// For `if err != nil { warn(); } sink(x)`, the False edge leads to
/// `sink(x)` and the rule fires correctly.
fn find_post_if_sinks(cfg: &crate::cfg::Cfg, if_node: NodeIndex) -> Vec<NodeIndex> {
let mut sinks_after = Vec::new();

// Get all successors of the if node's merge point
// Walk through successors looking for sinks
let mut visited = std::collections::HashSet::new();
let mut stack: Vec<NodeIndex> = cfg.neighbors(if_node).collect();

// Seed from the if node's False and Seq edges only. Some CFG shapes
// omit the False edge for one-branch ifs, so Seq edges stand in for
// it there. True edges, which lead into the body, are never followed
// from the if node itself.
let mut stack: Vec<NodeIndex> = cfg
.edges(if_node)
.filter(|e| matches!(e.weight(), EdgeKind::False | EdgeKind::Seq))
.map(|e| e.target())
.collect();

while let Some(current) = stack.pop() {
if !visited.insert(current) {

@ -156,13 +199,13 @@ fn find_post_if_sinks(cfg: &crate::cfg::Cfg, if_node: NodeIndex) -> Vec<NodeInde
sinks_after.push(current);
}

for succ in cfg.neighbors(current) {
let is_back_edge = cfg
.edges(current)
.any(|e| e.target() == succ && matches!(e.weight(), EdgeKind::Back));
if !is_back_edge {
stack.push(succ);
for edge in cfg.edges(current) {
let succ = edge.target();
// Don't follow back edges (loops) or exception edges.
if matches!(edge.weight(), EdgeKind::Back | EdgeKind::Exception) {
continue;
}
stack.push(succ);
}
}

@ -193,10 +236,7 @@ impl CfgAnalysis for IncompleteErrorHandling {
continue;
}

let mentions_err = info.condition_vars.iter().any(|u| {
let lower = u.to_ascii_lowercase();
lower == "err" || lower == "error" || lower.contains("err")
});
let mentions_err = info.condition_vars.iter().any(|u| is_error_var_ident(u));

if !mentions_err {
continue;

@ -289,3 +329,45 @@ mod negation_tests {
assert!(!contains_negated_err_identifier("hasError(x)"));
}
}

#[cfg(test)]
mod err_ident_tests {
use super::is_error_var_ident;

#[test]
fn matches_canonical_error_vars() {
assert!(is_error_var_ident("err"));
assert!(is_error_var_ident("error"));
assert!(is_error_var_ident("ERR"));
assert!(is_error_var_ident("Error"));
}

#[test]
fn matches_snake_case_error_vars() {
assert!(is_error_var_ident("err_resp"));
assert!(is_error_var_ident("error_msg"));
assert!(is_error_var_ident("response_err"));
assert!(is_error_var_ident("parse_error"));
}

#[test]
fn rejects_camelcase_method_names() {
// Spring `logger.isErrorEnabled()` lifts `isErrorEnabled` into
// `condition_vars`; under the old `lower.contains("err")` check
// this fired the rule. The new strict check rejects it — the
// condition is asking "is logging enabled", not "is there an
// error".
assert!(!is_error_var_ident("isErrorEnabled"));
assert!(!is_error_var_ident("getError"));
assert!(!is_error_var_ident("hasError"));
assert!(!is_error_var_ident("errorMsg"));
assert!(!is_error_var_ident("errCode"));
}

#[test]
fn rejects_unrelated_idents() {
assert!(!is_error_var_ident("user"));
assert!(!is_error_var_ident("merry"));
assert!(!is_error_var_ident("perform"));
}
}
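The seeding rule in `find_post_if_sinks` can be exercised on a toy CFG without petgraph. This sketch assumes a plain adjacency list and a simplified edge enum; `Edge`, `Cfg` and `post_if_sinks` are hypothetical names, not the crate's types. It seeds from the if node's False / Seq edges and skips Back and Exception edges, mirroring the new loop above.

```rust
use std::collections::HashSet;

/// Simplified edge kinds; an assumption modelled on the `EdgeKind`
/// variants that `find_post_if_sinks` matches on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Edge {
    Seq,
    True,
    False,
    Back,
    Exception,
}

/// Toy CFG: `succs[n]` lists `(target, edge kind)`, `is_sink[n]` marks sinks.
pub struct Cfg {
    pub succs: Vec<Vec<(usize, Edge)>>,
    pub is_sink: Vec<bool>,
}

/// Sinks reachable after `if_node` without entering its True body.
pub fn post_if_sinks(cfg: &Cfg, if_node: usize) -> Vec<usize> {
    // Seed only from the False / Seq edges out of the if node.
    let mut stack: Vec<usize> = cfg.succs[if_node]
        .iter()
        .filter(|(_, e)| matches!(e, Edge::False | Edge::Seq))
        .map(|&(t, _)| t)
        .collect();
    let mut visited = HashSet::new();
    let mut sinks = Vec::new();
    while let Some(n) = stack.pop() {
        if !visited.insert(n) {
            continue;
        }
        if cfg.is_sink[n] {
            sinks.push(n);
        }
        // Follow every forward edge except loops and exception edges.
        for &(t, e) in &cfg.succs[n] {
            if !matches!(e, Edge::Back | Edge::Exception) {
                stack.push(t);
            }
        }
    }
    sinks.sort_unstable();
    sinks
}
```

On a graph for `if err != nil { warn(); } sink(x)` in which `warn()` is also marked as a sink, only `sink(x)` is reported; the `warn()` call in the True body is never visited.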

@ -266,6 +266,10 @@ fn ssa_operand_const_or_param(
}
SsaOp::Source => return false,
SsaOp::Nop | SsaOp::Undef => {}
// FieldProj: walk the receiver — `obj.f` is constant iff `obj`
// is constant under the same definition. The field name itself
// is structural and adds no runtime value.
SsaOp::FieldProj { receiver, .. } => stack.push(*receiver),
}
}
true

@ -332,6 +336,9 @@ fn ssa_operand_constant(
// Undef is a non-user, non-dynamic sentinel — treat like Const
// (no additional operands to trace).
SsaOp::Undef => {}
// FieldProj: structural field read; constness reduces to the
// receiver's constness.
SsaOp::FieldProj { receiver, .. } => stack.push(*receiver),
}
}
true
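The FieldProj arms reduce the constness of `obj.f` to that of `obj`. A minimal worklist sketch of that reduction over a toy op set follows; `Op` and `is_constant` are hypothetical, and treating both `Param` and `Source` as non-constant is an assumption of the sketch rather than the exact rule of either function above.

```rust
/// Toy SSA definitions (hypothetical, not the crate's `SsaOp`).
pub enum Op {
    Const,
    Param,
    Source,
    /// `obj.f`: the field name is structural; only the receiver matters.
    FieldProj { receiver: usize },
    Add(usize, usize),
}

/// Worklist check: is `root` built only from constants?
pub fn is_constant(defs: &[Op], root: usize) -> bool {
    let mut stack = vec![root];
    let mut seen = vec![false; defs.len()];
    while let Some(v) = stack.pop() {
        // Skip values already checked (`replace` returns the previous flag).
        if std::mem::replace(&mut seen[v], true) {
            continue;
        }
        match &defs[v] {
            Op::Const => {}
            // In this sketch both count as non-constant.
            Op::Param | Op::Source => return false,
            // Constness of `obj.f` reduces to constness of `obj`.
            Op::FieldProj { receiver } => stack.push(*receiver),
            Op::Add(a, b) => {
                stack.push(*a);
                stack.push(*b);
            }
        }
    }
    true
}
```

`c.f` with `c` a constant is constant; `p.f` with `p` a parameter is not, regardless of the field name.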

@ -28,6 +28,15 @@ pub struct BodyConstFacts {
pub ssa: SsaBody,
pub const_values: HashMap<SsaValue, ConstLattice>,
pub type_facts: TypeFactResult,
/// Field-sensitive Steensgaard points-to facts.
///
/// Computed only when [`crate::pointer::is_enabled()`] (i.e. the
/// `NYX_POINTER_ANALYSIS=1` env var is set). Phase 2 of the
/// pointer-analysis rollout consumes this in `state::transfer.rs`
/// to suppress proxy-acquire mis-attribution on field-aliased
/// locals like `m := c.mu`. When `None`, every consumer must fall
/// back to its existing pointer-unaware behaviour.
pub pointer_facts: Option<crate::pointer::PointsToFacts>,
}

/// Lower a body to SSA and run constant propagation. Returns `None` when

@ -42,11 +51,22 @@ pub fn build_body_const_facts(body: &crate::cfg::BodyCfg, lang: Lang) -> Option<
&body.meta.params,
)
.ok()?;
let opt = crate::ssa::optimize_ssa(&mut ssa, &body.graph, Some(lang));
let opt = crate::ssa::optimize_ssa_with_param_types(
&mut ssa,
&body.graph,
Some(lang),
&body.meta.param_types,
);
let pointer_facts = if crate::pointer::is_enabled() {
Some(crate::pointer::analyse_body(&ssa, body.meta.id))
} else {
None
};
Some(BodyConstFacts {
ssa,
const_values: opt.const_values,
type_facts: opt.type_facts,
pointer_facts,
})
}
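The `pointer_facts: Option<crate::pointer::PointsToFacts>` field encodes an opt-in analysis with a mandatory fallback. The sketch below illustrates that contract under stated assumptions: `pointer_enabled` treats exactly `"1"` as on (following the doc comment's `NYX_POINTER_ANALYSIS=1`), and points-to facts are modelled as a plain map from a local to the field path it aliases. `PointsTo`, `acquire_target` and both flag helpers are hypothetical, not the API of `crate::pointer`.

```rust
use std::collections::HashMap;

/// Assumption: exactly "1" enables the analysis (per `NYX_POINTER_ANALYSIS=1`).
pub fn pointer_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads the flag from the process environment.
pub fn pointer_enabled_from_env() -> bool {
    pointer_enabled(std::env::var("NYX_POINTER_ANALYSIS").ok().as_deref())
}

/// Toy points-to facts: local name -> field path it aliases.
pub type PointsTo = HashMap<String, String>;

/// Attribute an acquire on `local` to the field it aliases when facts
/// exist; with `None` (analysis off), fall back to the local itself.
pub fn acquire_target<'a>(local: &'a str, facts: Option<&'a PointsTo>) -> &'a str {
    facts
        .and_then(|f| f.get(local))
        .map(String::as_str)
        .unwrap_or(local)
}
```

With facts present, an acquire through `m` (from `m := c.mu`) is attributed to `c.mu`; with `None`, the consumer falls back to the local name, which corresponds to the pointer-unaware behaviour.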

125
src/cli.rs

@ -32,6 +32,22 @@ impl Commands {
pub fn is_serve(&self) -> bool {
matches!(self, Commands::Serve { .. })
}

/// Pure read-only / informational commands that should run without the
/// "note: Using …" config preamble or the trailing "Finished in …"
/// timing line. These commands' output is often piped or grepped; the
/// surrounding chrome is noise.
pub fn is_informational(&self) -> bool {
match self {
Commands::Scan { explain_engine, .. } => *explain_engine,
Commands::List { .. } => true,
Commands::Config { action } => {
matches!(action, ConfigAction::Show { .. } | ConfigAction::Path)
}
Commands::Index { action } => matches!(action, IndexAction::Status { .. }),
_ => false,
}
}
}
|
||||
|
||||
/// Output format for scan results.
|
||||
|
|
@ -167,11 +183,11 @@ pub enum Commands {
|
|||
path: String,
|
||||
|
||||
/// Index mode: auto (default), off (no index), rebuild (force rebuild)
|
||||
#[arg(long, value_enum, default_value_t = IndexMode::Auto)]
|
||||
#[arg(long, value_enum, default_value_t = IndexMode::Auto, help_heading = "Analysis")]
|
||||
index: IndexMode,
|
||||
|
||||
/// Output format (defaults to config's default_format, or "console")
|
||||
#[arg(short, long, value_enum)]
|
||||
#[arg(short, long, value_enum, help_heading = "Output")]
|
||||
format: Option<OutputFormat>,
|
||||
|
||||
/// Severity filter expression: HIGH, HIGH,MEDIUM, or >=MEDIUM
|
||||
|
|
@ -179,30 +195,30 @@ pub enum Commands {
|
|||
/// Filters findings AFTER all severity normalization (e.g. nonprod
|
||||
/// downgrades). Only findings matching the expression are emitted.
|
||||
/// Case-insensitive. Shell-quote expressions containing ">".
|
||||
#[arg(long)]
|
||||
#[arg(long, help_heading = "Output")]
|
||||
severity: Option<String>,
|
||||
|
||||
/// Analysis mode: full (default), ast, cfg, taint
|
||||
#[arg(long, value_enum, default_value_t = ScanMode::Full)]
|
||||
#[arg(long, value_enum, default_value_t = ScanMode::Full, help_heading = "Analysis")]
|
||||
mode: ScanMode,
|
||||
|
||||
/// Named scan profile to apply (e.g. quick, full, ci, taint_only, conservative_large_repo)
|
||||
///
|
||||
/// Profiles override scan-related config settings. CLI flags still
|
||||
/// take precedence over profile values.
|
||||
#[arg(long)]
|
||||
#[arg(long, help_heading = "Analysis")]
|
||||
profile: Option<String>,
|
||||
|
||||
/// Engine-depth shortcut: fast, balanced, or deep. Sets the full
|
||||
/// stack of analysis toggles at once; individual engine flags still
|
||||
/// override this after application.
|
||||
#[arg(long, value_enum)]
|
||||
#[arg(long, value_enum, help_heading = "Analysis")]
|
||||
engine_profile: Option<EngineProfile>,
|
||||
|
||||
/// Print the effective engine configuration and exit without
|
||||
/// scanning. Useful for understanding how CLI flags and config
|
||||
/// values resolve together.
|
||||
#[arg(long)]
|
||||
#[arg(long, help_heading = "Analysis")]
|
||||
explain_engine: bool,
|
||||
|
||||
/// Scan all targets (alias for --mode full)
|
||||
|
|
@ -213,57 +229,57 @@ pub enum Commands {
|
|||
///
|
||||
/// By default, findings in non-production paths are downgraded by one
|
||||
/// severity tier. This flag preserves original severity.
|
||||
#[arg(long, alias = "include-nonprod")]
|
||||
#[arg(long, alias = "include-nonprod", help_heading = "Output")]
|
||||
keep_nonprod_severity: bool,
|
||||
|
||||
/// Suppress all human-readable status output
|
||||
#[arg(long)]
|
||||
#[arg(long, help_heading = "Output")]
|
||||
quiet: bool,
|
||||
|
||||
/// Exit with code 1 if any finding meets or exceeds this severity
|
||||
///
|
||||
/// Useful for CI gating. Example: --fail-on HIGH
|
||||
#[arg(long)]
|
||||
#[arg(long, help_heading = "Output")]
|
||||
fail_on: Option<String>,
|
||||
|
||||
/// Disable state-model analysis (resource lifecycle, auth state)
|
||||
#[arg(long)]
|
||||
#[arg(long, help_heading = "Analysis")]
|
||||
no_state: bool,
|
||||
|
||||
/// Disable attack-surface ranking (findings are sorted by exploitability by default)
|
||||
#[arg(long)]
|
||||
#[arg(long, help_heading = "Output")]
|
||||
no_rank: bool,
|
||||
|
||||
/// Show inline-suppressed findings (dimmed, tagged [SUPPRESSED])
|
||||
#[arg(long)]
|
||||
#[arg(long, help_heading = "Output")]
|
||||
show_suppressed: bool,
|
||||
|
||||
/// Show all findings: disables category filtering, rollups, and LOW budgets
|
||||
#[arg(long = "all")]
|
||||
#[arg(long = "all", help_heading = "Output")]
|
||||
show_all: bool,
|
||||
|
||||
/// Include Quality findings (excluded by default)
|
||||
#[arg(long)]
|
||||
#[arg(long, help_heading = "Output")]
|
||||
include_quality: bool,
|
||||
|
||||
/// Maximum total LOW findings to show
|
||||
#[arg(long, default_value_t = 20)]
|
||||
#[arg(long, default_value_t = 20, help_heading = "Output")]
|
||||
max_low: u32,
|
||||
|
||||
/// Maximum LOW findings per file
|
||||
#[arg(long, default_value_t = 1)]
|
||||
#[arg(long, default_value_t = 1, help_heading = "Output")]
|
||||
max_low_per_file: u32,
|
||||
|
||||
/// Maximum LOW findings per rule
|
||||
#[arg(long, default_value_t = 10)]
|
||||
#[arg(long, default_value_t = 10, help_heading = "Output")]
|
||||
max_low_per_rule: u32,
|
||||
|
||||
/// Number of example locations in rollup findings
|
||||
#[arg(long, default_value_t = 5)]
|
||||
#[arg(long, default_value_t = 5, help_heading = "Output")]
|
||||
rollup_examples: u32,
|
||||
|
||||
/// Show all instances for a specific rule (bypasses rollup for that rule)
|
||||
#[arg(long)]
|
||||
#[arg(long, help_heading = "Output")]
|
||||
show_instances: Option<String>,
|
||||
|
||||
/// Minimum attack-surface score to include in output
|
||||
|
|
@@ -271,89 +287,97 @@ pub enum Commands {
/// Findings with a rank score below this threshold are suppressed.
/// Requires ranking to be enabled (has no effect with --no-rank).
/// Example: --min-score 50
#[arg(long)]
#[arg(long, help_heading = "Output")]
min_score: Option<u32>,

/// Minimum confidence level to include in output
///
/// Values: low, medium, high. Findings below this level are dropped.
/// JSON/SARIF include all unless filtered.
#[arg(long)]
#[arg(long, help_heading = "Output")]
min_confidence: Option<String>,

/// Drop findings emitted from capped / widened / bailed analysis
///
/// Suppresses any finding whose engine provenance notes indicate
/// over-reporting (predicate/path widening) or analysis bail
/// (SSA lowering failure, parse timeout). Under-report notes —
/// where the emitted finding is still a real flow but the
/// result set is a lower bound — are kept.
/// (SSA lowering failure, parse timeout). Under-report notes
/// (where the emitted finding is still a real flow but the
/// result set is a lower bound) are kept.
///
/// Intended for strict CI gates where a finding from non-converged
/// analysis is worse than no finding. Applied after ranking and
/// before the `max_results` truncation.
#[arg(long)]
#[arg(long, help_heading = "Output")]
require_converged: bool,

// ── Analysis engine toggles (override [analysis.engine] config) ───
/// Enable path-constraint solving (default: on)
#[arg(long, overrides_with = "no_constraint_solving")]
#[arg(
long,
overrides_with = "no_constraint_solving",
help_heading = "Engine"
)]
constraint_solving: bool,
/// Disable path-constraint solving
#[arg(long, overrides_with = "constraint_solving")]
#[arg(long, overrides_with = "constraint_solving", help_heading = "Engine")]
no_constraint_solving: bool,

/// Enable abstract interpretation (default: on)
#[arg(long, overrides_with = "no_abstract_interp")]
#[arg(long, overrides_with = "no_abstract_interp", help_heading = "Engine")]
abstract_interp: bool,
/// Disable abstract interpretation
#[arg(long, overrides_with = "abstract_interp")]
#[arg(long, overrides_with = "abstract_interp", help_heading = "Engine")]
no_abstract_interp: bool,

/// Enable k=1 context-sensitive callee inlining (default: on)
#[arg(long, overrides_with = "no_context_sensitive")]
#[arg(long, overrides_with = "no_context_sensitive", help_heading = "Engine")]
context_sensitive: bool,
/// Disable context-sensitive callee inlining
#[arg(long, overrides_with = "context_sensitive")]
#[arg(long, overrides_with = "context_sensitive", help_heading = "Engine")]
no_context_sensitive: bool,

/// Enable the symex pipeline (default: on)
#[arg(long, overrides_with = "no_symex")]
#[arg(long, overrides_with = "no_symex", help_heading = "Symex")]
symex: bool,
/// Disable the symex pipeline entirely
#[arg(long, overrides_with = "symex")]
#[arg(long, overrides_with = "symex", help_heading = "Symex")]
no_symex: bool,

/// Enable cross-file symbolic body execution (default: on)
#[arg(long, overrides_with = "no_cross_file_symex")]
#[arg(long, overrides_with = "no_cross_file_symex", help_heading = "Symex")]
cross_file_symex: bool,
/// Disable cross-file symbolic body execution
#[arg(long, overrides_with = "cross_file_symex")]
#[arg(long, overrides_with = "cross_file_symex", help_heading = "Symex")]
no_cross_file_symex: bool,

/// Enable interprocedural symex frame stack (default: on)
#[arg(long, overrides_with = "no_symex_interproc")]
#[arg(long, overrides_with = "no_symex_interproc", help_heading = "Symex")]
symex_interproc: bool,
/// Disable interprocedural symex
#[arg(long, overrides_with = "symex_interproc")]
#[arg(long, overrides_with = "symex_interproc", help_heading = "Symex")]
no_symex_interproc: bool,

/// Enable SMT solver backend when nyx is built with the `smt` feature (default: on)
#[arg(long, overrides_with = "no_smt")]
#[arg(long, overrides_with = "no_smt", help_heading = "Symex")]
smt: bool,
/// Disable SMT solver backend
#[arg(long, overrides_with = "smt")]
#[arg(long, overrides_with = "smt", help_heading = "Symex")]
no_smt: bool,

/// Enable demand-driven backwards analysis (default: off)
#[arg(long, overrides_with = "no_backwards_analysis")]
#[arg(
long,
overrides_with = "no_backwards_analysis",
help_heading = "Engine"
)]
backwards_analysis: bool,
/// Disable demand-driven backwards analysis
#[arg(long, overrides_with = "backwards_analysis")]
#[arg(long, overrides_with = "backwards_analysis", help_heading = "Engine")]
no_backwards_analysis: bool,

/// Override per-file tree-sitter parse timeout (ms). 0 disables the cap.
#[arg(long)]
#[arg(long, help_heading = "Limits")]
parse_timeout_ms: Option<u64>,

/// Maximum taint origins retained per lattice value (default: 32).
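Each engine toggle above is declared as a `--x` / `--no-x` pair linked by clap's `overrides_with`, so when both appear on the command line the later one wins and the other should read as `false`. A minimal sketch of folding such a pair into the `[analysis.engine]` value from the config file; `resolve_toggle` is a hypothetical name, and environment-variable overrides (the `NYX_*` names printed by `--explain-engine`) are deliberately not modelled because their precedence is not shown here.

```rust
/// Fold a `--flag` / `--no-flag` pair into the config-file value.
/// Assumes the CLI parser already made the pair mutually exclusive
/// (clap's `overrides_with` keeps only the last one given); if both were
/// somehow true, the positive flag wins in this sketch.
fn resolve_toggle(on: bool, off: bool, config_value: bool) -> bool {
    match (on, off) {
        (true, _) => true,
        (false, true) => false,
        (false, false) => config_value, // neither flag given: keep the config
    }
}
```

For example, `--no-symex` with `symex.enabled = true` in the config resolves to `false`, while passing neither flag leaves the config value untouched.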
@@ -363,7 +387,7 @@ pub enum Commands {
/// `OriginsTruncated` engine note is recorded on affected findings.
/// Raise for very wide codebases where truncation is observed;
/// lower only when lattice width is a measured bottleneck.
#[arg(long)]
#[arg(long, help_heading = "Limits")]
max_origins: Option<u32>,

/// Maximum abstract heap objects retained per points-to set (default: 32).
@@ -373,7 +397,7 @@ pub enum Commands {
/// `PointsToTruncated` engine note is recorded on affected findings.
/// Raise for factory-heavy codebases where truncation is observed;
/// lower only when points-to width is a measured bottleneck.
#[arg(long)]
#[arg(long, help_heading = "Limits")]
max_pointsto: Option<u32>,

// ── Deprecated aliases (hidden) ─────────────────────────────────
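`--max-origins` and `--max-pointsto` describe per-value caps that record an `OriginsTruncated` or `PointsToTruncated` note when they bite. The sketch below models that idea with a hypothetical `BoundedOrigins` set of `u32` origin ids: inserts past the cap are rejected and a `truncated` flag is raised, and a lattice join unions two sets under the same cap. Which entries a real engine keeps once the cap is hit is an assumption here (this version simply keeps the ones it saw first).

```rust
/// Hypothetical capped origin set. `truncated` is the condition under which
/// an `OriginsTruncated`-style note would be attached to a finding.
#[derive(Debug, Default)]
struct BoundedOrigins {
    items: Vec<u32>, // origin ids, kept sorted and deduplicated
    truncated: bool,
}

impl BoundedOrigins {
    /// Add one origin; once `cap` distinct origins are held, new ones are
    /// dropped and the set remembers that it lost information.
    fn insert(&mut self, origin: u32, cap: usize) {
        match self.items.binary_search(&origin) {
            Ok(_) => {} // already present, nothing to do
            Err(pos) if self.items.len() < cap => self.items.insert(pos, origin),
            Err(_) => self.truncated = true,
        }
    }

    /// Lattice join: union of both sides under the same cap. Truncation is
    /// sticky, so a join never hides that either input was already capped.
    fn join(&mut self, other: &BoundedOrigins, cap: usize) {
        for &o in &other.items {
            self.insert(o, cap);
        }
        self.truncated |= other.truncated;
    }
}
```

The sticky flag is what makes a note like this meaningful: the result set is a lower bound once any contributing value hit the cap.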
@@ -449,8 +473,15 @@ pub enum Commands {

#[derive(Subcommand)]
pub enum ConfigAction {
/// Print effective merged configuration as TOML
Show,
/// Print configuration as TOML. By default shows only the values
/// that differ from built-in defaults. Pass `--all` for the full
/// effective configuration.
Show {
/// Print the full effective configuration instead of just
/// the user's overrides.
#[arg(long)]
all: bool,
},

/// Print configuration directory path
Path,

@@ -4,14 +4,210 @@ use console::style;
use std::fs;
use std::path::Path;

/// Show the effective merged configuration as TOML.
pub fn show(config: &Config) -> NyxResult<()> {
let toml_str =
toml::to_string_pretty(config).map_err(|e| format!("Failed to serialize config: {e}"))?;
println!("{toml_str}");
/// Show the configuration as TOML.
///
/// By default emits only the values that differ from `Config::default()`,
/// which answers the common question "what's actually customized here?"
/// without burying the user under hundreds of lines of defaults. Pass
/// `all=true` to emit the full effective configuration (useful when piping
/// into a starter `nyx.local` file).
///
/// Section headers are coloured cyan and keys dimmed when stdout is a
/// terminal. `console::style` automatically strips ANSI when output is
/// redirected to a file or another command, so the bytes a pipe sees are
/// always plain valid TOML.
pub fn show(config: &Config, all: bool) -> NyxResult<()> {
let toml_str = if all {
toml::to_string_pretty(config).map_err(|e| format!("Failed to serialize config: {e}"))?
} else {
diff_from_defaults_toml(config)?
};

let trimmed = toml_str.trim();
let override_count = count_top_level_keys(trimmed);

if !all {
let header = if override_count == 0 {
"# No overrides, using built-in defaults. Run `nyx config show --all` for the full effective config.".to_string()
} else {
format!(
"# {} override(s) shown. Run `nyx config show --all` for the full effective config.",
override_count
)
};
println!("{}", style(header).dim());
}

if trimmed.is_empty() {
return Ok(());
}

print_toml_with_highlights(&toml_str);
Ok(())
}

/// Render TOML with section headers in cyan/bold and key names dimmed.
/// `console::style` strips ANSI automatically when stdout is not a TTY,
/// so piped output remains valid TOML.
fn print_toml_with_highlights(toml_str: &str) {
for line in toml_str.lines() {
let trimmed = line.trim_start();
if (trimmed.starts_with('[') && trimmed.contains(']')) || trimmed.starts_with("[[") {
println!("{}", style(line).cyan().bold());
continue;
}
// key = value lines (but not `[xxx]`). Split on the first `=`
// that isn't inside a quoted string — TOML keys don't contain
// `=` outside quotes, so a leading-segment split is safe enough
// for the common case. Continuation lines from multi-line
// arrays/strings won't have `=` and fall through to plain.
if let Some(eq_idx) = find_top_level_equals(line) {
let (key_part, rest) = line.split_at(eq_idx);
println!("{}{}", style(key_part).dim(), rest);
continue;
}
println!("{line}");
}
}

/// Locate the index of the first `=` outside any quoted segment in a
/// TOML key/value line. Returns `None` for non-assignment lines.
fn find_top_level_equals(line: &str) -> Option<usize> {
let mut in_string = false;
let mut quote_char = '"';
for (idx, ch) in line.char_indices() {
if in_string {
if ch == quote_char {
in_string = false;
}
} else {
match ch {
'#' => return None,
'"' | '\'' => {
in_string = true;
quote_char = ch;
}
'=' => return Some(idx),
_ => {}
}
}
}
None
}

/// Diff the user's effective config against `Config::default()` and
/// render the surviving subset as pretty TOML. Returns the empty
/// string when nothing differs.
fn diff_from_defaults_toml(config: &Config) -> NyxResult<String> {
// Normalize both sides through the same merge pipeline. When a
// user has a `nyx.local` the runtime already runs effective through
// `merge_configs`; when there's no user file it doesn't, so
// exclusion arrays stay in their original order and won't compare
// equal to the merged-default's sorted form. Re-merging both
// sides is idempotent for the already-merged case and brings the
// no-user-file case into the same shape, so the diff is stable.
let normalized_effective =
crate::utils::config::merge_configs(Config::default(), config.clone());
let normalized_default =
crate::utils::config::merge_configs(Config::default(), Config::default());

let effective: toml::Value = toml::Value::try_from(&normalized_effective)
.map_err(|e| format!("Failed to serialize config: {e}"))?;
let defaults: toml::Value = toml::Value::try_from(&normalized_default)
.map_err(|e| format!("Failed to serialize default config: {e}"))?;

let pruned = prune_matching(&effective, &defaults)
.unwrap_or(toml::Value::Table(toml::value::Table::new()));

let table = match pruned {
toml::Value::Table(t) => t,
_ => toml::value::Table::new(),
};

if table.is_empty() {
return Ok(String::new());
}

toml::to_string_pretty(&table)
.map_err(|e| format!("Failed to serialize diff config: {e}").into())
}

/// Recursively drop entries from `effective` that match `defaults`.
/// Returns `None` when the resulting subtree is empty (so the caller
/// can drop the key entirely). Non-table values compare by equality;
/// arrays are kept whole when they differ at all (TOML lacks a clean
/// per-element diff representation).
fn prune_matching(effective: &toml::Value, defaults: &toml::Value) -> Option<toml::Value> {
match (effective, defaults) {
(toml::Value::Table(eff), toml::Value::Table(def)) => {
let mut out = toml::value::Table::new();
for (k, v) in eff {
match def.get(k) {
Some(dv) => {
if let Some(diff) = prune_matching(v, dv) {
out.insert(k.clone(), diff);
}
}
None => {
// Key absent in defaults — keep entirely.
out.insert(k.clone(), v.clone());
}
}
}
if out.is_empty() {
None
} else {
Some(toml::Value::Table(out))
}
}
// Identical leaf — drop.
_ if effective == defaults => None,
// Differing leaf or shape change — keep the effective value.
_ => Some(effective.clone()),
}
}

/// Count individual `key = value` overrides in a TOML string,
/// ignoring section headers, comments, blank lines, and continuation
/// lines from multi-line arrays/tables. Drives the
/// `# N override(s) shown` banner.
fn count_top_level_keys(toml_str: &str) -> usize {
let mut count = 0;
let mut in_multiline = false;
for line in toml_str.lines() {
let trimmed = line.trim();
if trimmed.is_empty() || trimmed.starts_with('#') {
continue;
}
if trimmed.starts_with('[') {
// Section header — not an override on its own. Reset
// any stuck multi-line state defensively.
in_multiline = false;
continue;
}
if in_multiline {
// Inside a multi-line array/inline table — closing bracket
// ends it, intermediate lines don't count.
if trimmed.starts_with(']') || trimmed.starts_with('}') {
in_multiline = false;
}
continue;
}
if find_top_level_equals(line).is_some() {
count += 1;
// A `key = [` or `key = {` opens a multi-line block whose
// continuation lines should not be counted as new keys.
let after_eq = line.split_once('=').map(|x| x.1.trim_start()).unwrap_or("");
if (after_eq.starts_with('[') && !after_eq.contains(']'))
|| (after_eq.starts_with('{') && !after_eq.contains('}'))
{
in_multiline = true;
}
}
}
count
}

/// Print the configuration directory path.
pub fn path(config_dir: &Path) -> NyxResult<()> {
println!("{}", config_dir.display());

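`prune_matching` walks the effective and default configs in lockstep, drops leaves equal to the default, and keeps any differing array whole. The standalone sketch below reproduces that recursion over a hypothetical `Value` enum that stands in for `toml::Value` (only integers, strings, arrays, and tables; no floats, booleans, or datetimes), so the behaviour can be exercised without the `toml` crate. `table` is a hypothetical convenience constructor.

```rust
use std::collections::BTreeMap;

/// Reduced stand-in for `toml::Value`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    Array(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

/// Build a table from `(key, value)` pairs.
fn table(pairs: Vec<(&str, Value)>) -> Value {
    Value::Table(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Same shape as `prune_matching`: recurse into tables, drop leaves that
/// equal the default, keep differing leaves (including whole arrays), and
/// return `None` when nothing in a subtree differs.
fn prune(effective: &Value, defaults: &Value) -> Option<Value> {
    match (effective, defaults) {
        (Value::Table(eff), Value::Table(def)) => {
            let mut out = BTreeMap::new();
            for (k, v) in eff {
                let kept = match def.get(k) {
                    Some(dv) => prune(v, dv),
                    None => Some(v.clone()), // key unknown to defaults: keep it
                };
                if let Some(v) = kept {
                    out.insert(k.clone(), v);
                }
            }
            if out.is_empty() { None } else { Some(Value::Table(out)) }
        }
        _ if effective == defaults => None,
        _ => Some(effective.clone()),
    }
}
```

Because tables are compared key by key, an override buried deep in a section surfaces together with only its parent tables, which keeps the default `nyx config show` output short.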
@@ -55,32 +55,35 @@ pub fn handle(
let status_path = std::path::Path::new(&path).canonicalize()?;
let (project_name, db_path) = get_project_info(&status_path, database_dir)?;

println!("{}", style("Project status").blue().bold().underlined());
println!("{}", style("Index status").bold());
println!(
" {:14} {}",
style("Project"),
" {:10} {}",
style("Project").dim(),
style(&project_name).white().bold()
);
println!(
" {:14} {}",
style("Index path"),
" {:10} {}",
style("Path").dim(),
style(db_path.display()).underlined()
);
println!(
" {:14} {}",
style("Exists"),
style(db_path.exists()).bold()
);

if db_path.exists() {
let meta = fs::metadata(&db_path)?;
let size = ByteSize::b(meta.len());
let mtime: DateTime<Local> = meta.modified()?.into();
println!(" {:14} {}", style("Size"), size);
println!(
" {:14} {}",
style("Modified"),
mtime.format("%Y-%m-%d %H:%M:%S")
" {:10} {} {}",
style("Indexed").dim(),
style("✔").green().bold(),
style(mtime.format("%Y-%m-%d %H:%M:%S")).dim()
);
println!(" {:10} {}", style("Size").dim(), size);
} else {
println!(
" {:10} {} {}",
style("Indexed").dim(),
style("✖").red().bold(),
style("(run `nyx index build` to create)").dim()
);
}

@@ -5,42 +5,49 @@ use console::style;
use std::fs;

pub fn handle(verbose: bool, database_dir: &std::path::Path) -> NyxResult<()> {
println!("{}", style("Indexed projects").blue().bold().underlined());
println!("{}", style("Indexed projects").bold());

if !database_dir.exists() {
println!(" {}", style("∅ No indexed projects found").dim());
println!(" {}", style("(none)").dim());
std::process::exit(0);
}

for entry in fs::read_dir(database_dir)? {
let path = entry?.path();
if path.extension().and_then(|s| s.to_str()) != Some("sqlite") {
continue;
}
let mut entries: Vec<_> = fs::read_dir(database_dir)?
.filter_map(|e| e.ok())
.map(|e| e.path())
.filter(|p| p.extension().and_then(|s| s.to_str()) == Some("sqlite"))
.collect();
entries.sort();

if entries.is_empty() {
println!(" {}", style("(none)").dim());
std::process::exit(0);
}

for path in &entries {
let name = path
.file_stem()
.and_then(|s| s.to_str())
.unwrap_or("unknown");
println!(" {}", style(name).white().bold());

if verbose {
let meta = fs::metadata(&path)?;
let meta = fs::metadata(path)?;
let size = ByteSize::b(meta.len());
let mtime: DateTime<Local> = meta.modified()?.into();
println!(
" {:10} {}",
style("Path"),
style(path.display()).underlined()
);
println!(" {:10} {}", style("Size"), size);
println!(
" {:10} {}",
style("Modified"),
mtime.format("%Y-%m-%d %H:%M:%S")
" {} {} {}",
style(name).white().bold(),
style(format!("({size})")).dim(),
style(format!("· {}", mtime.format("%Y-%m-%d %H:%M:%S"))).dim()
);
println!(" {}", style(path.display()).dim().underlined());
} else {
println!(" {}", style(name).white().bold());
}
}

println!();
println!("{}", style(format!("{} project(s)", entries.len())).dim());

std::process::exit(0);
}

@@ -94,11 +94,30 @@ pub fn handle_command(
}

// ── Resolve deprecated aliases ──────────────────────────────
// Each alias still works but emits a one-line stderr nudge so
// users learn the new flag. Suppressed under --quiet and
// structured output formats so machine pipelines stay clean.
use crate::cli::OutputFormat;
let effective_format = format.unwrap_or(config.output.default_format);
let structured = matches!(effective_format, OutputFormat::Json | OutputFormat::Sarif);
let suppress_warnings = quiet || config.output.quiet || structured;
let warn_dep = |old: &str, new: &str| {
if !suppress_warnings {
eprintln!(
"{}: {} is deprecated; use {} instead.",
console::style("warn").yellow().bold(),
console::style(old).bold(),
console::style(new).bold()
);
}
};

// Index mode: explicit --index wins, then deprecated flags
let effective_index = if no_index {
warn_dep("--no-index", "--index off");
IndexMode::Off
} else if rebuild_index {
warn_dep("--rebuild-index", "--index rebuild");
IndexMode::Rebuild
} else {
index
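The alias handling above resolves each deprecated flag with a plain if/else chain and routes the nudge through a `warn_dep` closure that stays silent under `--quiet` or structured output. A testable sketch of the index-mode case, assuming a simplified `IndexMode` whose `Auto` variant is a hypothetical placeholder for whatever `--index` carries by default; `resolve_index` is also hypothetical, and warnings are pushed into a `Vec` rather than printed to stderr.

```rust
/// Simplified stand-in for the real `IndexMode`; `Auto` is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IndexMode {
    Auto,
    Off,
    Rebuild,
}

/// Same if/else order as the hunk above: `--no-index` is checked first,
/// then `--rebuild-index`, and only then the `--index` value. Nothing is
/// recorded when `suppress_warnings` is set (quiet or JSON/SARIF output).
fn resolve_index(
    index: IndexMode,
    no_index: bool,
    rebuild_index: bool,
    suppress_warnings: bool,
    warnings: &mut Vec<String>,
) -> IndexMode {
    let mut warn_dep = |old: &str, new: &str| {
        if !suppress_warnings {
            warnings.push(format!("warn: {old} is deprecated; use {new} instead."));
        }
    };
    if no_index {
        warn_dep("--no-index", "--index off");
        IndexMode::Off
    } else if rebuild_index {
        warn_dep("--rebuild-index", "--index rebuild");
        IndexMode::Rebuild
    } else {
        index
    }
}
```

Collecting warnings instead of printing them keeps the precedence logic separate from the terminal, which is what makes the suppression rule checkable in isolation.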
@@ -106,10 +125,13 @@ pub fn handle_command(

// Analysis mode: explicit --mode wins, then deprecated flags
let effective_mode = if ast_only {
warn_dep("--ast-only", "--mode ast");
ScanMode::Ast
} else if cfg_only {
warn_dep("--cfg-only", "--mode cfg");
ScanMode::Cfg
} else if all_targets {
warn_dep("--all-targets", "--mode full");
ScanMode::Full
} else {
mode
@@ -121,6 +143,7 @@ pub fn handle_command(
crate::errors::NyxError::Msg(format!("invalid --severity expression: {e}"))
})?)
} else if high_only {
warn_dep("--high-only", "--severity HIGH");
Some(SeverityFilter::parse("HIGH").unwrap())
} else {
None
@@ -304,7 +327,7 @@ pub fn handle_command(
Commands::Config { action } => {
use crate::cli::ConfigAction;
match action {
ConfigAction::Show => self::config::show(config)?,
ConfigAction::Show { all } => self::config::show(config, all)?,
ConfigAction::Path => self::config::path(config_dir)?,
ConfigAction::AddRule {
lang,
@@ -352,96 +375,135 @@ pub fn handle_command(
/// `nyx scan --explain-engine`. Writes to stdout so it composes with
/// standard shell redirection and process substitution.
fn print_engine_explanation(config: &Config, engine_profile: Option<EngineProfile>) {
fn onoff(b: bool) -> &'static str {
if b { "on" } else { "off" }
use console::style;

// Plain-text on/off, padded to 3 chars so the trailing column aligns
// regardless of which value is rendered. Colour is layered on top —
// the visible width stays 3 characters because `console::style` emits
// zero-width ANSI codes (and nothing at all when NO_COLOR is set).
fn onoff(b: bool) -> String {
if b {
style("on ").green().to_string()
} else {
style("off").red().dim().to_string()
}
}

let engine = config.analysis.engine;
let scanner = &config.scanner;
let profile_label = engine_profile
.map(|p| p.to_string())
.unwrap_or_else(|| "(none — using config defaults)".to_string());
.unwrap_or_else(|| "(none, using config defaults)".to_string());
let smt_compiled = cfg!(feature = "smt");
let pipeline_on = matches!(
config.scanner.mode,
AnalysisMode::Full | AnalysisMode::Cfg | AnalysisMode::Taint
);

println!("Effective engine configuration:");
println!(" Engine profile: {profile_label}");
println!(" AST patterns: on");
// Layout: 2sp + label (left-aligned, 24w) + space + value + 3sp + flag info.
// Label width 24 fits the longest entry ("Abstract interpretation:") with
// a single trailing space before the value column. Numeric rows reuse
// the same alignment so the value column is consistent across sections.
let row_flag = |label: &str, on: bool, flags: &str| {
println!(
" {:<24} {} {}",
format!("{label}:"),
onoff(on),
style(flags).dim()
);
};
let row_plain = |label: &str, value: &str| {
println!(" {:<24} {}", format!("{label}:"), value);
};
let row_num = |label: &str, value: String, flags: &str| {
println!(
" {:<24} {:<10} {}",
format!("{label}:"),
value,
style(flags).dim()
);
};
let section = |title: &str| {
println!();
println!(" {}", style(title).cyan().bold());
};

println!("{}", style("Effective engine configuration").white().bold());
println!(
" CFG construction: {}",
onoff(matches!(
config.scanner.mode,
AnalysisMode::Full | AnalysisMode::Cfg | AnalysisMode::Taint
))
" {:<24} {}",
"Engine profile:",
style(&profile_label).bold()
);
println!(
" CFG analysis: {}",
onoff(matches!(
config.scanner.mode,
AnalysisMode::Full | AnalysisMode::Cfg | AnalysisMode::Taint
))

section("Pipeline");
row_plain("AST patterns", &onoff(true));
row_plain("CFG construction", &onoff(pipeline_on));
row_plain("CFG analysis", &onoff(pipeline_on));
row_plain("Taint (SSA)", &onoff(pipeline_on));
row_plain("State analysis", &onoff(scanner.enable_state_analysis));
row_plain("Auth analysis", &onoff(scanner.enable_auth_analysis));

section("Engine toggles");
row_flag(
"Abstract interpretation",
engine.abstract_interpretation,
"--abstract-interp / NYX_ABSTRACT_INTERP",
);
println!(
" Taint (SSA): {}",
onoff(matches!(
config.scanner.mode,
AnalysisMode::Full | AnalysisMode::Cfg | AnalysisMode::Taint
))
row_flag(
"Context sensitivity",
engine.context_sensitive,
"--context-sensitive / NYX_CONTEXT_SENSITIVE (k=1)",
);
println!(
" Abstract interpretation: {} (--abstract-interp / NYX_ABSTRACT_INTERP)",
onoff(engine.abstract_interpretation)
row_flag(
"Constraint solving",
engine.constraint_solving,
"--constraint-solving / NYX_CONSTRAINT",
);
println!(
" Context sensitivity: {} (--context-sensitive / NYX_CONTEXT_SENSITIVE, k=1)",
onoff(engine.context_sensitive)
// Backwards-taint label and value column kept exact-width-compatible
// with the legacy format so external scripts grepping for
// "Backwards taint: on" continue to match. The label slot is
// 24 chars + 1 space → column 25, which lines up with that legacy
// 9-space gap after "Backwards taint:" (16 chars).
row_flag(
"Backwards taint",
engine.backwards_analysis,
"--backwards-analysis / NYX_BACKWARDS",
);
println!(
" Constraint solving: {} (--constraint-solving / NYX_CONSTRAINT)",
onoff(engine.constraint_solving)

section("Symbolic execution");
row_flag("Symex", engine.symex.enabled, "--symex / NYX_SYMEX");
row_flag(
"Cross-file symex",
engine.symex.cross_file,
"--cross-file-symex / NYX_CROSS_FILE_SYMEX",
);
println!(
" Symbolic execution: {} (--symex / NYX_SYMEX)",
onoff(engine.symex.enabled)
row_flag(
"Interproc symex",
engine.symex.interprocedural,
"--symex-interproc / NYX_SYMEX_INTERPROC",
);
println!(
" Cross-file symex: {} (--cross-file-symex / NYX_CROSS_FILE_SYMEX)",
onoff(engine.symex.cross_file)
let smt_note = if smt_compiled {
"--smt"
} else {
"--smt (this binary built without `smt` feature)"
};
row_flag("SMT (Z3)", engine.symex.smt && smt_compiled, smt_note);

section("Limits");
row_num(
"Parse timeout",
format!("{} ms", engine.parse_timeout_ms),
"--parse-timeout-ms / NYX_PARSE_TIMEOUT_MS (0 disables)",
);
println!(
" Interproc symex: {} (--symex-interproc / NYX_SYMEX_INTERPROC)",
onoff(engine.symex.interprocedural)
row_num(
"Max taint origins",
engine.max_origins.to_string(),
"--max-origins / NYX_MAX_ORIGINS (per-lattice-value cap)",
);
println!(
" Backwards taint: {} (--backwards-analysis / NYX_BACKWARDS)",
onoff(engine.backwards_analysis)
);
println!(
" SMT (Z3): {} (--smt{}; requires --features smt)",
onoff(engine.symex.smt && smt_compiled),
if smt_compiled {
""
} else {
", binary built WITHOUT smt feature"
}
);
println!(
" State analysis: {} (scanner.enable_state_analysis)",
onoff(scanner.enable_state_analysis)
);
println!(
" Auth analysis: {} (scanner.enable_auth_analysis)",
onoff(scanner.enable_auth_analysis)
);
println!(
" Parse timeout: {} ms (--parse-timeout-ms / NYX_PARSE_TIMEOUT_MS; 0 disables)",
engine.parse_timeout_ms
);
println!(
" Max taint origins: {} (--max-origins / NYX_MAX_ORIGINS; per-lattice-value cap)",
engine.max_origins
);
println!(
" Max points-to set: {} (--max-pointsto / NYX_MAX_POINTSTO; per-variable heap-object cap)",
engine.max_pointsto
row_num(
"Max points-to set",
engine.max_pointsto.to_string(),
"--max-pointsto / NYX_MAX_POINTSTO (per-variable heap cap)",
);
println!();
}

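The `row_flag` layout comment can be checked with plain strings. The sketch below (hypothetical `row` helper, colour omitted) builds one line the same way: two spaces, the label and colon left-aligned in 24 columns, one space, a fixed 3-character value, three spaces, then the flag hint. Since "Backwards taint:" is 16 characters, it gets 8 padding spaces plus the separator, the same 9-space gap the legacy line used, and the value column lands at byte 27 for every label up to 24 characters.

```rust
/// Plain-text rendering of one `row_flag` line: two-space indent, label and
/// colon left-aligned in a 24-column slot, one space, a value that is always
/// 3 characters wide ("on " or "off"), three spaces, then the flag hint.
fn row(label: &str, on: bool, flags: &str) -> String {
    // Pad the plain text, never a styled string: a width spec applied to a
    // string containing ANSI escapes would count the escape bytes as well.
    let value = if on { "on " } else { "off" };
    format!("  {:<24} {}   {}", format!("{label}:"), value, flags)
}
```

"Abstract interpretation:" is exactly 24 characters, so it gets no padding and still places its value in the same column as every shorter label.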
@@ -860,6 +860,35 @@ fn effective_scc_cap() -> usize {
if o == 0 { SCC_FIXPOINT_SAFETY_CAP } else { o }
}

/// Observability hook: records the cumulative number of cross-batch
/// summary refinements (FuncSummary, SsaFuncSummary, body, auth)
/// persisted by non-recursive topo batches in the most recent
/// [`run_topo_batches`] invocation. Intended for the regression tests
/// that prove the topo-refinement pipeline is wired and producing
/// observable cross-batch state — see
/// `tests/topo_pass2_refinement_tests.rs`. Cheap relaxed load.
static LAST_TOPO_NONRECURSIVE_REFINEMENTS: AtomicUsize = AtomicUsize::new(0);

/// Returns the cumulative count of non-recursive batch refinements
/// (summary + ssa-summary + body + auth inserts) persisted to
/// `global_summaries` during the most recent [`run_topo_batches`] call.
/// Reset to zero at the start of each invocation.
pub fn last_topo_nonrecursive_refinements() -> usize {
LAST_TOPO_NONRECURSIVE_REFINEMENTS.load(Ordering::Relaxed)
}

/// Returns `true` when topo-pass-2 cross-batch summary refinement is
/// enabled. Default: enabled. Set `NYX_TOPO_REFINE=0` (or `false`)
/// to fall back to the legacy non-recursive branch that runs
/// [`run_rules_on_file`] without persisting refined SSA / body / auth
/// artifacts to `global_summaries`.
fn topo_refine_enabled() -> bool {
match std::env::var("NYX_TOPO_REFINE") {
Ok(v) => !matches!(v.as_str(), "0" | "false" | "FALSE" | "False"),
Err(_) => true,
}
}

/// Run pass 2 analysis on a sequence of topo-ordered file batches.
///
/// For batches with mutual recursion, iterates until summaries converge
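`topo_refine_enabled` treats only four exact spellings as "off" and everything else, including an unset or non-UTF-8 variable, as "on". A sketch that splits the decision from the environment lookup so it can be unit-tested without mutating process state; `refine_enabled_from` and `topo_refine_enabled_sketch` are hypothetical names.

```rust
/// The pure decision: `None` (unset, or not valid UTF-8) means enabled;
/// only "0", "false", "FALSE", and "False" disable refinement.
fn refine_enabled_from(value: Option<&str>) -> bool {
    match value {
        Some(v) => !matches!(v, "0" | "false" | "FALSE" | "False"),
        None => true,
    }
}

/// Thin wrapper doing the real lookup. `env::var(..).ok()` maps both
/// `NotPresent` and `NotUnicode` to `None`, matching the `Err(_) => true`
/// arm of the original.
fn topo_refine_enabled_sketch() -> bool {
    refine_enabled_from(std::env::var("NYX_TOPO_REFINE").ok().as_deref())
}
```

Note the consequence of exact matching: values such as `no`, `off`, or `fAlSe` leave refinement enabled.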
@@ -897,6 +926,9 @@ fn run_topo_batches(
// Reset the observability counter for this invocation so tests and
// diagnostics always see fresh data.
LAST_SCC_MAX_ITERATIONS.store(0, Ordering::Relaxed);
LAST_TOPO_NONRECURSIVE_REFINEMENTS.store(0, Ordering::Relaxed);

let refine_nonrecursive = topo_refine_enabled();

for (batch_idx, batch) in batches.iter().enumerate() {
if batch.has_mutual_recursion {
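The reset-then-accumulate pattern behind `LAST_TOPO_NONRECURSIVE_REFINEMENTS` can be sketched with a standalone `AtomicUsize`. The names and the `fetch_add` accumulation are assumptions for illustration (the hunks shown here only reveal the `store(0, ..)` reset and the relaxed `load`). `Ordering::Relaxed` is enough as long as the counter is only read after the run returns and is not used to publish other data.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counter standing in for `LAST_TOPO_NONRECURSIVE_REFINEMENTS`.
static LAST_REFINEMENTS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical driver: one entry per batch, giving how many refined
/// summaries that batch persisted.
fn run_batches_sketch(per_batch_inserts: &[usize]) {
    // Reset first so a reader never sees a count left over from a previous run.
    LAST_REFINEMENTS.store(0, Ordering::Relaxed);
    for &n in per_batch_inserts {
        LAST_REFINEMENTS.fetch_add(n, Ordering::Relaxed);
    }
}

/// Cheap relaxed read, meant to be called after `run_batches_sketch` returns.
fn last_refinements() -> usize {
    LAST_REFINEMENTS.load(Ordering::Relaxed)
}
```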
@ -1227,8 +1259,155 @@ fn run_topo_batches(
|
|||
                p.inc_batches_completed(1);
            }
            result.extend(iteration_diags);
        } else if refine_nonrecursive {
            // Non-recursive batch with cross-batch refinement.
            //
            // Run `analyse_file_fused` so the batch produces refined
            // FuncSummary / SsaFuncSummary / CalleeSsaBody / AuthCheckSummary
            // artifacts on top of pass-1's output. After the batch's
            // parallel section completes, persist those refinements into
            // `global_summaries` sequentially. Subsequent batches in
            // topo order (caller-most batches) then resolve their call
            // sites against the refined cross-file context — the final
            // step in the callee-first topo pipeline that pass-2
            // sequencing was always meant to deliver.
            //
            // Opt out via `NYX_TOPO_REFINE=0` if a precision regression
            // surfaces; the legacy `run_rules_on_file` branch stays
            // available for triage.
            #[allow(clippy::type_complexity)]
            let batch_results: Vec<(
                std::path::PathBuf,
                Vec<Diag>,
                Vec<crate::summary::FuncSummary>,
                Vec<(
                    crate::symbol::FuncKey,
                    crate::summary::ssa_summary::SsaFuncSummary,
                )>,
                Vec<(
                    crate::symbol::FuncKey,
                    crate::taint::ssa_transfer::CalleeSsaBody,
                )>,
                Vec<(
                    crate::symbol::FuncKey,
                    crate::auth_analysis::model::AuthCheckSummary,
                )>,
            )> = batch
                .files
                .par_iter()
                .map(|path| {
                    if let Some(p) = progress {
                        p.set_current_file(&path.to_string_lossy());
                    }
                    let bytes = match std::fs::read(path) {
                        Ok(b) => b,
                        Err(e) => {
                            tracing::warn!(
                                "pass 2 (non-recursive): cannot read {}: {e}",
                                path.display()
                            );
                            if let Some(l) = logs {
                                l.warn(
                                    format!("Cannot read file for pass 2: {e}"),
                                    Some(path.display().to_string()),
                                    None,
                                );
                            }
                            pb.inc(1);
                            if let Some(p) = progress {
                                p.inc_analyzed(1);
                            }
                            return (path.to_path_buf(), vec![], vec![], vec![], vec![], vec![]);
                        }
                    };
                    match recover_or_propagate(
                        cfg.scanner.enable_panic_recovery,
                        path,
                        logs,
                        || analyse_file_fused(&bytes, path, cfg, Some(global_summaries), scan_root),
                    ) {
                        Ok(r) => {
                            pb.inc(1);
                            if let Some(p) = progress {
                                p.inc_analyzed(1);
                            }
                            (
                                path.to_path_buf(),
                                r.diags,
                                r.summaries,
                                r.ssa_summaries,
                                r.ssa_bodies,
                                r.auth_summaries,
                            )
                        }
                        Err(e) => {
                            tracing::warn!("pass 2 (non-recursive): {}: {e}", path.display());
                            if let Some(l) = logs {
                                l.warn(
                                    format!("Pass 2 analysis failed: {e}"),
                                    Some(path.display().to_string()),
                                    None,
                                );
                            }
                            pb.inc(1);
                            if let Some(p) = progress {
                                p.inc_analyzed(1);
                            }
                            (path.to_path_buf(), vec![], vec![], vec![], vec![], vec![])
                        }
                    }
                })
                .collect();

            // Sequential persistence: union refined artifacts back into
            // `global_summaries` so caller-most batches see them.
            let mut batch_diags: Vec<Diag> = Vec::new();
            let mut refined_summaries: usize = 0;
            let mut refined_ssa: usize = 0;
            let mut refined_bodies: usize = 0;
            let mut refined_auth: usize = 0;
            for (_path, diags, summaries, ssa_summaries, ssa_bodies, auth_summaries) in
                batch_results
            {
                batch_diags.extend(diags);
                for s in summaries {
                    let key = s.func_key(root_str_ref);
                    global_summaries.insert(key, s);
                    refined_summaries += 1;
                }
                for (key, ssa_sum) in ssa_summaries {
                    global_summaries.insert_ssa(key, ssa_sum);
                    refined_ssa += 1;
                }
                for (key, body) in ssa_bodies {
                    global_summaries.insert_body(key, body);
                    refined_bodies += 1;
                }
                for (key, auth_sum) in auth_summaries {
                    global_summaries.insert_auth(key, auth_sum);
                    refined_auth += 1;
                }
            }
            let total_refinements = refined_summaries + refined_ssa + refined_bodies + refined_auth;
            LAST_TOPO_NONRECURSIVE_REFINEMENTS.fetch_add(total_refinements, Ordering::Relaxed);

            tracing::debug!(
                batch = batch_idx,
                files = batch.files.len(),
                recursive = false,
                refined_summaries,
                refined_ssa,
                refined_bodies,
                refined_auth,
                "non-recursive batch complete (refinements persisted)"
            );
            if let Some(p) = progress {
                p.inc_batches_completed(1);
            }
            result.extend(batch_diags);
        } else {
            // Non-recursive batch: single pass.
            // Legacy non-recursive batch (NYX_TOPO_REFINE=0): single
            // pass that discards refined SSA / body / auth artifacts.
            let batch_diags: Vec<Diag> = batch
                .files
                .par_iter()
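The refined non-recursive branch has two phases: files are analysed in parallel against a read-only view of `global_summaries`, and only after `collect()` are the results written back on a single thread. A std-only sketch of that shape, assuming `std::thread::scope` in place of rayon's `par_iter`, a `HashMap<String, u32>` in place of `GlobalSummaries`, and hypothetical `analyse` / `run_batch` helpers:

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical per-file artifact: (function key, refined summary).
type Refinement = (String, u32);

/// Placeholder analysis: reads the shared context (like passing
/// `Some(global_summaries)`) and produces one refined entry per file.
fn analyse(file: &str, global: &HashMap<String, u32>) -> Vec<Refinement> {
    let key = format!("{file}::main");
    let previous = global.get(&key).copied().unwrap_or(0);
    vec![(key, previous + 1)]
}

/// Analyse one batch in parallel, then persist sequentially.
/// Returns how many refinements were written, like `total_refinements`.
fn run_batch(files: &[&str], global: &mut HashMap<String, u32>) -> usize {
    // Parallel phase: workers share an immutable view and never write.
    let view: &HashMap<String, u32> = &*global;
    let results: Vec<Vec<Refinement>> = thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|&f| s.spawn(move || analyse(f, view)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("analysis thread panicked"))
            .collect()
    });

    // Sequential phase: refined entries overwrite the pass-1 ones.
    let mut persisted = 0;
    for file_results in results {
        for (key, summary) in file_results {
            global.insert(key, summary);
            persisted += 1;
        }
    }
    persisted
}
```

Because writes happen only after every worker has joined, no file in a batch observes another file's refinement from the same batch; the refinements become visible to the next batch in topo order, which is the property the comment above relies on.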
@@ -1267,7 +1446,7 @@ fn run_topo_batches(
                batch = batch_idx,
                files = batch.files.len(),
                recursive = false,
                "non-recursive batch complete"
                "non-recursive batch complete (legacy, refinement disabled)"
            );
            if let Some(p) = progress {
                p.inc_batches_completed(1);
@@ -1507,7 +1686,7 @@ pub(crate) fn scan_filesystem_with_observer(
        );
    }
    let pass1_start = std::time::Instant::now();
    let global_summaries: GlobalSummaries = {
    let mut global_summaries: GlobalSummaries = {
        let _span = tracing::info_span!("pass1_fused", files = all_paths.len()).entered();
        let pb = make_progress_bar(
            all_paths.len() as u64,
@@ -1621,6 +1800,13 @@ pub(crate) fn scan_filesystem_with_observer(
        l.info("Building call graph", None);
    }
    let cg_start = std::time::Instant::now();
    // Install the type-hierarchy index on `global_summaries` BEFORE
    // building the call graph so the runtime taint engine consults
    // exactly the same view of virtual dispatch that the call-graph
    // builder uses to fan out edges. See
    // `GlobalSummaries::install_hierarchy` and
    // `GlobalSummaries::resolve_callee_widened`.
    global_summaries.install_hierarchy();
    let (call_graph, cg_analysis) = build_and_analyse_call_graph(&global_summaries);
    log_unresolved_callees(&call_graph);
    if let Some(p) = progress {
@@ -1939,72 +2125,36 @@ pub fn scan_with_index_parallel_observer(
                        p.record_language(lang);
                    }
                }
                if let Err(e) =
                    idx.replace_summaries_for_file(path, &hash, &func_sums)
                {
                    record_persist_error(
                        &persist_errors_ref,
                        format!("function summaries {}: {e}", path.display()),
                    );
                }
                // Persist SSA summaries with full FuncKey metadata
                if !ssa_sums.is_empty() {
                    let ssa_rows: Vec<_> = ssa_sums
                        .into_iter()
                        .map(|(key, sum)| {
                            (
                                key.name,
                                key.arity.unwrap_or(0),
                                key.lang.as_str().to_string(),
                                key.namespace,
                                key.container,
                                key.disambig,
                                key.kind,
                                sum,
                            )
                        })
                        .collect();
                    if let Err(e) =
                        idx.replace_ssa_summaries_for_file(path, &hash, &ssa_rows)
                    {
                        record_persist_error(
                            &persist_errors_ref,
                            format!("SSA summaries {}: {e}", path.display()),
                        );
                    }
                }
                // Persist SSA callee bodies
                if !ssa_bodies.is_empty() {
                    let body_rows: Vec<_> = ssa_bodies
                        .into_iter()
                        .map(|(key, body)| {
                            (
                                key.name,
                                key.arity.unwrap_or(0),
                                key.lang.as_str().to_string(),
                                key.namespace,
                                key.container,
                                key.disambig,
                                key.kind,
                                body,
                            )
                        })
                        .collect();
                    if let Err(e) =
                        idx.replace_ssa_bodies_for_file(path, &hash, &body_rows)
                    {
                        record_persist_error(
                            &persist_errors_ref,
                            format!("SSA bodies {}: {e}", path.display()),
                        );
                    }
                }
                // Persist per-function auth-check summaries.
                // Empty-lifts are skipped; an empty input
                // list still clears any stale entry for
                // this file so a helper that lost its
                // ownership check no longer leaks lifts
                // into subsequent pass-2 runs.
                let ssa_rows: Vec<_> = ssa_sums
                    .into_iter()
                    .map(|(key, sum)| {
                        (
                            key.name,
                            key.arity.unwrap_or(0),
                            key.lang.as_str().to_string(),
                            key.namespace,
                            key.container,
                            key.disambig,
                            key.kind,
                            sum,
                        )
                    })
                    .collect();
                let body_rows: Vec<_> = ssa_bodies
                    .into_iter()
                    .map(|(key, body)| {
                        (
                            key.name,
                            key.arity.unwrap_or(0),
                            key.lang.as_str().to_string(),
                            key.namespace,
                            key.container,
                            key.disambig,
                            key.kind,
                            body,
                        )
                    })
                    .collect();
                let auth_rows: Vec<_> = auth_sums
                    .into_iter()
                    .map(|(key, sum)| {
@@ -2020,12 +2170,14 @@ pub fn scan_with_index_parallel_observer(
                        )
                    })
                    .collect();
                if let Err(e) =
                    idx.replace_auth_summaries_for_file(path, &hash, &auth_rows)
                {
                // Single transaction for all four caches:
                // one fsync per file instead of four.
                if let Err(e) = idx.replace_all_for_file(
                    path, &hash, &func_sums, &ssa_rows, &body_rows, &auth_rows,
                ) {
                    record_persist_error(
                        &persist_errors_ref,
                        format!("auth summaries {}: {e}", path.display()),
                        format!("summaries {}: {e}", path.display()),
                    );
                }
            }
@@ -2383,6 +2535,13 @@ pub fn scan_with_index_parallel_observer(
        p.set_stage(ScanStage::BuildingCallGraph);
    }
    let cg_start = std::time::Instant::now();
    // Install the type-hierarchy index on `global_summaries` BEFORE
    // building the call graph so the runtime taint engine consults
    // exactly the same view of virtual dispatch that the call-graph
    // builder uses to fan out edges. See
    // `GlobalSummaries::install_hierarchy` and
    // `GlobalSummaries::resolve_callee_widened`.
    global_summaries.install_hierarchy();
    let (call_graph, cg_analysis) = build_and_analyse_call_graph(&global_summaries);
    log_unresolved_callees(&call_graph);
    if let Some(p) = progress {

@@ -1,6 +1,6 @@
use crate::database::index::Indexer;
use crate::errors::NyxResult;
use crate::server::app::{AppState, build_router};
use crate::server::app::{AppState, ServerEvent, build_router};
use crate::server::jobs::JobManager;
use crate::server::security::LocalServerSecurity;
use crate::utils::config::Config;
@@ -81,10 +81,29 @@ pub fn handle(
        security,
        config: Arc::new(RwLock::new(config.clone())),
        job_manager: Arc::new(JobManager::new(max_jobs, rayon_stack_size)),
        event_tx,
        event_tx: event_tx.clone(),
        db_pool,
        findings_cache: Arc::new(RwLock::new(None)),
    };

    // Invalidate the findings cache whenever a scan finishes so the next
    // request rebuilds against fresh diags. The next-request rebuild keeps
    // this hot-path simple — we only clear the slot here, never recompute.
    let cache_for_invalidate = Arc::clone(&state.findings_cache);
    let mut event_rx = event_tx.subscribe();
    tokio::spawn(async move {
        loop {
            match event_rx.recv().await {
                Ok(ServerEvent::ScanCompleted { .. } | ServerEvent::ScanFailed { .. }) => {
                    *cache_for_invalidate.write() = None;
                }
                Ok(_) => {}
                Err(tokio::sync::broadcast::error::RecvError::Lagged(_)) => continue,
                Err(tokio::sync::broadcast::error::RecvError::Closed) => break,
            }
        }
    });

    let router = build_router(state);

    if open_browser {
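The invalidation task only clears the cached slot and lets the next request rebuild it. A std-only sketch of the same listener, assuming an `mpsc` channel and a thread in place of tokio's broadcast channel and task; `Event`, `FindingsCache` and `spawn_invalidator` are hypothetical names. Unlike `broadcast`, `mpsc` has a single consumer and never reports `Lagged`, and std's `RwLock::write` returns a poisoning `Result`, hence the `unwrap`.

```rust
use std::sync::mpsc;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the server's `ServerEvent`.
#[allow(dead_code)]
enum Event {
    ScanStarted,
    ScanCompleted,
    ScanFailed,
}

/// Cached findings: `None` means "rebuild on the next request".
type FindingsCache = Arc<RwLock<Option<Vec<String>>>>;

/// Clears the cache on every terminal scan event. The loop ends once all
/// senders are dropped, which plays the role of `RecvError::Closed => break`.
fn spawn_invalidator(cache: FindingsCache, rx: mpsc::Receiver<Event>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for event in rx {
            match event {
                Event::ScanCompleted | Event::ScanFailed => {
                    // Only clear the slot; never recompute here.
                    *cache.write().unwrap() = None;
                }
                Event::ScanStarted => {}
            }
        }
    })
}
```

One design point the sketch cannot show: with `broadcast`, `Lagged` means events were dropped, so a missed `ScanCompleted` leaves the old findings cached until the next scan ends. Clearing the slot on `Lagged` as well would be the conservative choice.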

@@ -184,6 +184,12 @@ fn type_kind_index(kind: &TypeKind) -> u32 {
        TypeKind::Url => 10,
        TypeKind::HttpClient => 11,
        TypeKind::LocalCollection => 12,
        // Phase 6 DTO types carry per-field structural info that the
        // bitset domain can't represent. Collapse to Unknown so callers
        // still see "any type possible" rather than crashing on an
        // unhandled variant. Same-file/cross-file Dto-aware paths read
        // the structured TypeKind directly, not via this index.
        TypeKind::Dto(_) => 6,
    }
}

@@ -608,6 +608,8 @@ mod tests {
            value_defs,
            cfg_node_map: std::collections::HashMap::new(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        }
    }

507
src/database.rs
@@ -98,7 +98,7 @@ pub mod index {
            container TEXT NOT NULL DEFAULT '',
            disambig INTEGER,
            kind TEXT NOT NULL DEFAULT 'fn',
            body TEXT NOT NULL,
            body BLOB NOT NULL,
            updated_at INTEGER NOT NULL,
            UNIQUE(project, file_path, name, container, arity, disambig, kind)
        );
@@ -173,6 +173,26 @@ pub mod index {
            created_at TEXT NOT NULL,
            UNIQUE(suppress_by, match_value)
        );

        -- First time we observed each finding fingerprint. Lazy-populated by the
        -- overview endpoint when computing backlog age — INSERT OR IGNORE means
        -- only the earliest scan that mentioned a fingerprint sticks.
        CREATE TABLE IF NOT EXISTS finding_first_seen (
            fingerprint TEXT PRIMARY KEY,
            first_seen_at TEXT NOT NULL
        );

        -- Indexes on (project, file_path) for the per-file replace_* paths.
        -- Without these, every DELETE WHERE project=? AND file_path=? does a
        -- full table scan, which dominates indexing time as the cache grows.
        CREATE INDEX IF NOT EXISTS idx_function_summaries_project_file
            ON function_summaries(project, file_path);
        CREATE INDEX IF NOT EXISTS idx_ssa_function_summaries_project_file
            ON ssa_function_summaries(project, file_path);
        CREATE INDEX IF NOT EXISTS idx_ssa_function_bodies_project_file
            ON ssa_function_bodies(project, file_path);
        CREATE INDEX IF NOT EXISTS idx_auth_check_summaries_project_file
            ON auth_check_summaries(project, file_path);
    "#;

    /// Engine version used to detect stale caches across upgrades.
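`finding_first_seen` relies on `fingerprint` being the primary key: `INSERT OR IGNORE` keeps whichever row was written first. The in-memory model below (hypothetical `record_first_seen` / `first_seen_map` helpers over a `HashMap`, not functions in this crate) shows the same contract. Note that "first written" equals "earliest scan" only if the overview endpoint records scans in chronological order.

```rust
use std::collections::HashMap;

/// Models INSERT OR IGNORE on a PRIMARY KEY: the first timestamp recorded
/// for a fingerprint sticks, later ones are silently dropped.
fn record_first_seen(table: &mut HashMap<String, String>, fingerprint: &str, ts: &str) {
    table
        .entry(fingerprint.to_string())
        .or_insert_with(|| ts.to_string());
}

/// Same contract as `get_first_seen_map`: unknown fingerprints are
/// simply absent from the result rather than mapped to a default.
fn first_seen_map(
    table: &HashMap<String, String>,
    fingerprints: &[String],
) -> HashMap<String, String> {
    fingerprints
        .iter()
        .filter_map(|fp| table.get(fp).map(|ts| (fp.clone(), ts.clone())))
        .collect()
}
```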
@@ -191,7 +211,10 @@ pub mod index {
    /// byte offset to a depth-first structural index. Pre-0.5.0 caches
    /// store byte-offset disambigs and would fail to match bodies built
    /// by the new engine, so they are silently rebuilt on open.
    pub const SCHEMA_VERSION: &str = "2";
    /// * `"3"` — `ssa_function_bodies.body` changed from JSON TEXT to a
    ///   MessagePack (rmp-serde) BLOB. Old JSON payloads cannot be deserialised
    ///   by the new engine, so they are silently rebuilt on open.
    pub const SCHEMA_VERSION: &str = "3";

    // TODO: ADD CLEANS FOR EACH TABLE BASED ON PROJECT WHICH RUNS ON CLEAN
    // TODO: ADD DROP AND GIVE A CLI PARAMETER FOR DROP
@@ -263,7 +286,12 @@ pub mod index {
            | OpenFlags::SQLITE_OPEN_CREATE
            | OpenFlags::SQLITE_OPEN_NO_MUTEX;
        let manager = SqliteConnectionManager::file(database_path).with_flags(flags);
        let pool = Arc::new(Pool::new(manager)?);
        // r2d2's default `max_size` is 10, which can stall rayon
        // workers on machines with more cores than that during the
        // parallel indexing pass. Size the pool to comfortably hold
        // a connection per rayon thread plus a small slack.
        let max_conns = (num_cpus::get() as u32 + 4).max(16);
        let pool = Arc::new(Pool::builder().max_size(max_conns).build(manager)?);

        {
            let conn = pool.get()?;
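The new pool-size rule takes the larger of `cpus + 4` and 16, so it only grows past the floor on hosts with more than 12 logical CPUs; either way it exceeds r2d2's default of 10. A sketch with the rule factored into a testable function; `pool_max_size` and `pool_max_size_for_host` are hypothetical, and std's `available_parallelism` stands in for `num_cpus::get()` (the two are not guaranteed to agree in every environment).

```rust
/// Pool-size rule from the diff: one connection per CPU plus 4 spare,
/// never fewer than 16. Taking `cpus` as a parameter keeps it testable.
fn pool_max_size(cpus: usize) -> u32 {
    (cpus as u32 + 4).max(16)
}

/// Host-dependent variant using std instead of the `num_cpus` crate.
/// Falls back to 1 CPU if the parallelism query fails.
fn pool_max_size_for_host() -> u32 {
    let cpus = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    pool_max_size(cpus)
}
```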
@@ -411,13 +439,17 @@ pub mod index {
                tracing::info!(
                    "db schema version changed ({old} → {current}), clearing summary caches"
                );
                // Drop ssa_function_bodies entirely: column type changed
                // to BLOB in v3 and `CREATE TABLE IF NOT EXISTS` will
                // not migrate the column on an existing table.
                conn.execute_batch(
                    "DELETE FROM function_summaries;
                    "DROP TABLE IF EXISTS ssa_function_bodies;
                     DELETE FROM function_summaries;
                     DELETE FROM ssa_function_summaries;
                     DELETE FROM ssa_function_bodies;
                     DELETE FROM auth_check_summaries;
                     DELETE FROM files;",
                )?;
                conn.execute_batch(SCHEMA)?;
                conn.execute(
                    "INSERT OR REPLACE INTO nyx_metadata (key, value) VALUES ('schema_version', ?1)",
                    params![current],
@@ -1005,26 +1037,32 @@ pub mod index {
            }
        }

        /// Load symbol metadata (name, arity, lang, namespace) for a single file.
        /// Load symbol metadata (name, arity, lang, namespace, container, kind)
        /// for a single file.
        ///
        /// Lighter than `load_all_ssa_summaries` — skips JSON deserialization of
        /// the full summary body and filters by file_path in the query.
        /// the full summary body and filters by file_path in the query. `kind`
        /// is the [`crate::symbol::FuncKind`] slug (`"fn"`, `"method"`,
        /// `"closure"`, ...) so consumers can distinguish anonymous functions
        /// from named ones.
        pub fn load_ssa_summaries_for_file(
            &self,
            file_path: &str,
        ) -> NyxResult<Vec<(String, i64, String, String)>> {
        ) -> NyxResult<Vec<(String, i64, String, String, String, String)>> {
            let mut stmt = self.c().prepare(
                "SELECT name, arity, lang, namespace
                "SELECT name, arity, lang, namespace, container, kind
                 FROM ssa_function_summaries
                 WHERE project = ?1 AND file_path = ?2",
            )?;
            let rows: Vec<(String, i64, String, String)> = stmt
            let rows: Vec<(String, i64, String, String, String, String)> = stmt
                .query_map(rusqlite::params![self.project, file_path], |row| {
                    Ok((
                        row.get::<_, String>(0)?,
                        row.get::<_, i64>(1)?,
                        row.get::<_, String>(2)?,
                        row.get::<_, String>(3)?,
                        row.get::<_, String>(4)?,
                        row.get::<_, String>(5)?,
                    ))
                })?
                .filter_map(Result::ok)
@@ -1035,7 +1073,11 @@ pub mod index {
        /// Atomically replace all SSA callee bodies for a single file.
        ///
        /// Persists cross-file callee bodies for interprocedural symex.
        /// Bodies are serialized as JSON TEXT, matching the ssa_function_summaries pattern.
        /// Bodies are serialized as MessagePack (rmp-serde, named-field
        /// encoding) BLOBs — JSON proved too costly at indexing time on
        /// large SSA structures, and bincode's positional format trips
        /// over the `#[serde(skip_serializing_if = ...)]` attributes
        /// scattered through `OptimizeResult` and friends.
        /// Input tuple: `(name, arity, lang, namespace, container, disambig, kind, body)`.
        pub fn replace_ssa_bodies_for_file(
            &mut self,
@@ -1070,7 +1112,7 @@ pub mod index {
            )?;

            for (name, arity, lang, namespace, container, disambig, kind, body) in bodies {
                let json = serde_json::to_string(body)
                let blob = rmp_serde::to_vec_named(body)
                    .map_err(|e| NyxError::Msg(format!("SSA body serialise: {e}")))?;
                let disambig_sql = disambig.map(|d| d as i64);
                stmt.execute(params![
@@ -1084,7 +1126,7 @@ pub mod index {
                    container,
                    disambig_sql,
                    kind.as_str(),
                    json,
                    blob,
                    now
                ])?;
            }
@@ -1128,7 +1170,7 @@ pub mod index {
                String,
                Option<i64>,
                String,
                String,
                Vec<u8>,
            )> = stmt
                .query_map([&self.project], |row| {
                    Ok((
@@ -1140,7 +1182,7 @@ pub mod index {
                        row.get::<_, String>(5)?,
                        row.get::<_, Option<i64>>(6)?,
                        row.get::<_, String>(7)?,
                        row.get::<_, String>(8)?,
                        row.get::<_, Vec<u8>>(8)?,
                    ))
                })?
                .filter_map(|r| match r {
@@ -1157,10 +1199,10 @@ pub mod index {
            let results: Vec<_> = rows
                .par_iter()
                .filter_map(
                    |(fp, name, lang, arity, ns, container, disambig, kind, json)| {
                        serde_json::from_str::<crate::taint::ssa_transfer::CalleeSsaBody>(json)
                    |(fp, name, lang, arity, ns, container, disambig, kind, blob)| {
                        rmp_serde::from_slice::<crate::taint::ssa_transfer::CalleeSsaBody>(blob)
                            .map_err(|e| {
                                tracing::warn!("failed to deserialize SSA body JSON: {e}");
                                tracing::warn!("failed to deserialize SSA body: {e}");
                                e
                            })
                            .ok()
@@ -1188,8 +1230,8 @@ pub mod index {
            Ok(results)
        } else {
            let mut out = Vec::with_capacity(rows.len());
            for (fp, name, lang, arity, ns, container, disambig, kind, json) in &rows {
                match serde_json::from_str::<crate::taint::ssa_transfer::CalleeSsaBody>(json) {
            for (fp, name, lang, arity, ns, container, disambig, kind, blob) in &rows {
                match rmp_serde::from_slice::<crate::taint::ssa_transfer::CalleeSsaBody>(blob) {
                    Ok(mut b) => {
                        // See note in parallel branch above.
                        crate::taint::ssa_transfer::rebuild_body_graph(&mut b);
@@ -1206,7 +1248,7 @@ pub mod index {
                        ));
                    }
                    Err(e) => {
                        tracing::warn!("failed to deserialize SSA body JSON: {e}");
                        tracing::warn!("failed to deserialize SSA body: {e}");
                    }
                }
            }
@@ -1278,6 +1320,205 @@ pub mod index {
            Ok(())
        }

        /// Atomically replace all four per-file caches in a single
        /// transaction. Equivalent in effect to calling
        /// [`Self::replace_summaries_for_file`],
        /// [`Self::replace_ssa_summaries_for_file`],
        /// [`Self::replace_ssa_bodies_for_file`] and
        /// [`Self::replace_auth_summaries_for_file`] in sequence, but
        /// issues a single fsync at commit instead of four — the
        /// dominant cost on large scans.
        ///
        /// Behaviour parity with the four-call sequence:
        /// * function and auth summaries: DELETE-then-INSERT regardless
        ///   of input length, so emptying a file's summaries clears
        ///   stale rows.
        /// * SSA summaries and bodies: only touched when the input is
        ///   non-empty, matching the existing scan path.
        #[allow(clippy::too_many_arguments)]
        pub fn replace_all_for_file(
            &mut self,
            file_path: &Path,
            file_hash: &[u8],
            func_summaries: &[crate::summary::FuncSummary],
            ssa_summaries: &[(
                String,
                usize,
                String,
                String,
                String,
                Option<u32>,
                crate::symbol::FuncKind,
                crate::summary::ssa_summary::SsaFuncSummary,
            )],
            ssa_bodies: &[(
                String,
                usize,
                String,
                String,
                String,
                Option<u32>,
                crate::symbol::FuncKind,
                crate::taint::ssa_transfer::CalleeSsaBody,
            )],
            auth_summaries: &[(
                String,
                usize,
                String,
                String,
                String,
                Option<u32>,
                crate::symbol::FuncKind,
                crate::auth_analysis::model::AuthCheckSummary,
            )],
        ) -> NyxResult<()> {
            let tx = self.conn.transaction()?;
            let path_str = file_path.to_string_lossy();
            let now = SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs() as i64;

            // function_summaries — always replace.
            tx.execute(
                "DELETE FROM function_summaries WHERE project = ?1 AND file_path = ?2",
                params![self.project, path_str],
            )?;
            {
                let mut stmt = tx.prepare(
                    "INSERT OR REPLACE INTO function_summaries
                     (project, file_path, file_hash, name, arity, lang,
                      container, disambig, kind, summary, updated_at)
                     VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11)",
                )?;
                for s in func_summaries {
                    let json = serde_json::to_string(s)
                        .map_err(|e| NyxError::Msg(format!("summary serialise: {e}")))?;
                    let disambig_sql = s.disambig.map(|d| d as i64);
                    stmt.execute(params![
                        self.project,
                        path_str,
                        file_hash,
                        s.name,
                        s.param_count as i64,
                        s.lang,
                        s.container,
                        disambig_sql,
                        s.kind.as_str(),
                        json,
                        now
                    ])?;
                }
            }

            // ssa_function_summaries — only touched when non-empty.
            if !ssa_summaries.is_empty() {
                tx.execute(
                    "DELETE FROM ssa_function_summaries
                     WHERE project = ?1 AND file_path = ?2",
                    params![self.project, path_str],
                )?;
                let mut stmt = tx.prepare(
                    "INSERT OR REPLACE INTO ssa_function_summaries
                     (project, file_path, file_hash, name, arity, lang, namespace,
                      container, disambig, kind, summary, updated_at)
                     VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)",
                )?;
                for (name, arity, lang, namespace, container, disambig, kind, summary) in
                    ssa_summaries
                {
                    let json = serde_json::to_string(summary)
                        .map_err(|e| NyxError::Msg(format!("SSA summary serialise: {e}")))?;
                    let disambig_sql = disambig.map(|d| d as i64);
                    stmt.execute(params![
                        self.project,
                        path_str,
                        file_hash,
                        name,
                        *arity as i64,
                        lang,
                        namespace,
                        container,
                        disambig_sql,
                        kind.as_str(),
                        json,
                        now
                    ])?;
                }
            }

            // ssa_function_bodies — only touched when non-empty.
            if !ssa_bodies.is_empty() {
                tx.execute(
                    "DELETE FROM ssa_function_bodies
                     WHERE project = ?1 AND file_path = ?2",
                    params![self.project, path_str],
                )?;
                let mut stmt = tx.prepare(
                    "INSERT OR REPLACE INTO ssa_function_bodies
                     (project, file_path, file_hash, name, arity, lang, namespace,
                      container, disambig, kind, body, updated_at)
                     VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)",
                )?;
                for (name, arity, lang, namespace, container, disambig, kind, body) in ssa_bodies {
                    let blob = rmp_serde::to_vec_named(body)
                        .map_err(|e| NyxError::Msg(format!("SSA body serialise: {e}")))?;
                    let disambig_sql = disambig.map(|d| d as i64);
                    stmt.execute(params![
                        self.project,
                        path_str,
                        file_hash,
                        name,
                        *arity as i64,
                        lang,
                        namespace,
                        container,
                        disambig_sql,
                        kind.as_str(),
                        blob,
                        now
                    ])?;
                }
            }

            // auth_check_summaries — always replace, even when empty,
            // so a helper that lost its ownership check no longer
            // leaks lifts into subsequent pass-2 runs.
            tx.execute(
                "DELETE FROM auth_check_summaries WHERE project = ?1 AND file_path = ?2",
                params![self.project, path_str],
            )?;
            {
                let mut stmt = tx.prepare(
                    "INSERT OR REPLACE INTO auth_check_summaries
                     (project, file_path, file_hash, name, arity, lang, namespace,
                      container, disambig, kind, summary, updated_at)
                     VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12)",
                )?;
                for (name, arity, lang, namespace, container, disambig, kind, summary) in
                    auth_summaries
                {
                    let json = serde_json::to_string(summary)
                        .map_err(|e| NyxError::Msg(format!("auth summary serialise: {e}")))?;
                    let disambig_sql = disambig.map(|d| d as i64);
                    stmt.execute(params![
                        self.project,
                        path_str,
                        file_hash,
                        name,
                        *arity as i64,
                        lang,
                        namespace,
                        container,
                        disambig_sql,
                        kind.as_str(),
                        json,
                        now
                    ])?;
                }
            }

            tx.commit()?;
            Ok(())
        }

        /// Load every `AuthCheckSummary` for this project.
        ///
        /// Returns rows with full metadata for `FuncKey` reconstruction:
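The parity rules on `replace_all_for_file` are asymmetric: function and auth summaries are always replaced, while SSA summaries and bodies are left alone when the new input is empty. A hypothetical in-memory model (`Caches` is not a type in this crate) that makes the consequence easy to test; it does not model the all-or-nothing commit, which the SQLite transaction provides.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of the four per-file caches, keyed by path.
#[derive(Default)]
struct Caches {
    funcs: HashMap<String, Vec<String>>,
    ssa: HashMap<String, Vec<String>>,
    bodies: HashMap<String, Vec<String>>,
    auth: HashMap<String, Vec<String>>,
}

impl Caches {
    /// Applies the parity rules documented on `replace_all_for_file`.
    fn replace_all_for_file(
        &mut self,
        file: &str,
        funcs: Vec<String>,
        ssa: Vec<String>,
        bodies: Vec<String>,
        auth: Vec<String>,
    ) {
        // Function and auth summaries: always replaced, so an empty
        // input clears stale rows for this file.
        self.funcs.insert(file.to_string(), funcs);
        self.auth.insert(file.to_string(), auth);
        // SSA summaries and bodies: untouched when the input is empty.
        if !ssa.is_empty() {
            self.ssa.insert(file.to_string(), ssa);
        }
        if !bodies.is_empty() {
            self.bodies.insert(file.to_string(), bodies);
        }
    }
}
```

As the test shows, a rescan that produces no SSA bodies keeps the previous bodies for that file, whereas an empty auth list clears stale auth rows.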
@@ -1962,6 +2203,103 @@ pub mod index {
            Ok(rows)
        }

        /// Record the first time a finding fingerprint was observed. Idempotent —
        /// the earliest call wins via INSERT OR IGNORE. Used by the overview
        /// backlog-age computation; ts should be the originating scan's
        /// `started_at` (RFC-3339).
        pub fn record_finding_first_seen(&self, fingerprint: &str, ts: &str) -> NyxResult<()> {
            self.c().execute(
                "INSERT OR IGNORE INTO finding_first_seen (fingerprint, first_seen_at) VALUES (?1, ?2)",
                params![fingerprint, ts],
            )?;
            Ok(())
        }

        /// Bulk variant of `record_finding_first_seen`: one transaction, conflicts ignored.
        pub fn record_finding_first_seen_bulk(
            &self,
            entries: &[(String, String)],
        ) -> NyxResult<()> {
            if entries.is_empty() {
                return Ok(());
            }
            let conn = self.c();
            let tx = conn.unchecked_transaction()?;
            {
                let mut stmt = tx.prepare(
                    "INSERT OR IGNORE INTO finding_first_seen (fingerprint, first_seen_at) VALUES (?1, ?2)",
                )?;
                for (fp, ts) in entries {
                    stmt.execute(params![fp, ts])?;
                }
            }
            tx.commit()?;
            Ok(())
        }

        /// Look up first-seen timestamps for a set of fingerprints. Missing
        /// entries are simply absent from the returned map.
        pub fn get_first_seen_map(
            &self,
            fingerprints: &[String],
        ) -> NyxResult<std::collections::HashMap<String, String>> {
            if fingerprints.is_empty() {
                return Ok(std::collections::HashMap::new());
            }
            // SQLite caps bound parameters per statement (SQLITE_MAX_VARIABLE_NUMBER),
            // so chunk large fingerprint sets to stay under that limit.
            let mut out = std::collections::HashMap::with_capacity(fingerprints.len());
            let conn = self.c();
            for chunk in fingerprints.chunks(500) {
                let placeholders = (1..=chunk.len())
                    .map(|i| format!("?{i}"))
                    .collect::<Vec<_>>()
                    .join(",");
                let sql = format!(
                    "SELECT fingerprint, first_seen_at FROM finding_first_seen WHERE fingerprint IN ({placeholders})"
                );
                let mut stmt = conn.prepare(&sql)?;
                let params: Vec<&dyn rusqlite::ToSql> =
                    chunk.iter().map(|s| s as &dyn rusqlite::ToSql).collect();
                let rows = stmt.query_map(params.as_slice(), |row| {
                    Ok((row.get::<_, String>(0)?, row.get::<_, String>(1)?))
                })?;
                for r in rows.flatten() {
                    out.insert(r.0, r.1);
                }
            }
            Ok(out)
        }

        /// Get a single metadata value by key. Returns None if absent.
        pub fn get_metadata(&self, key: &str) -> NyxResult<Option<String>> {
            let conn = self.c();
            let mut stmt = conn.prepare("SELECT value FROM nyx_metadata WHERE key = ?1")?;
            let mut rows = stmt.query(params![key])?;
            if let Some(row) = rows.next()? {
                Ok(Some(row.get(0)?))
            } else {
                Ok(None)
            }
        }

        /// Set a metadata value (insert-or-replace).
        pub fn set_metadata(&self, key: &str, value: &str) -> NyxResult<()> {
            self.c().execute(
                "INSERT OR REPLACE INTO nyx_metadata (key, value) VALUES (?1, ?2)",
                params![key, value],
            )?;
            Ok(())
        }

        /// Remove a metadata key. Returns true if a row was deleted.
        pub fn delete_metadata(&self, key: &str) -> NyxResult<bool> {
            let n = self
                .c()
                .execute("DELETE FROM nyx_metadata WHERE key = ?1", params![key])?;
            Ok(n > 0)
        }

        /// Delete a suppression rule by ID. Returns true if a row was deleted.
        pub fn delete_suppression_rule(&self, id: i64) -> NyxResult<bool> {
            let count = self.c().execute(
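`get_first_seen_map` builds one `IN (...)` query per chunk of 500 fingerprints, with placeholders numbered from `?1` in every statement. SQLite limits bound parameters per statement through `SQLITE_MAX_VARIABLE_NUMBER` (999 by default before 3.32.0, 32766 from 3.32.0), so 500 stays under either default. A sketch isolating the SQL-building step so it can be checked without a database; `chunked_in_queries` is a hypothetical helper, not part of the crate.

```rust
/// Build one SELECT per chunk of keys, pairing each SQL string with the
/// slice of keys to bind to it. Placeholders restart at ?1 per statement.
fn chunked_in_queries(keys: &[String], chunk_size: usize) -> Vec<(String, &[String])> {
    // `slice::chunks` panics on 0, so reject it with a clearer message.
    assert!(chunk_size > 0, "chunk_size must be positive");
    keys.chunks(chunk_size)
        .map(|chunk| {
            let placeholders = (1..=chunk.len())
                .map(|i| format!("?{i}"))
                .collect::<Vec<_>>()
                .join(",");
            let sql = format!(
                "SELECT fingerprint, first_seen_at FROM finding_first_seen \
                 WHERE fingerprint IN ({placeholders})"
            );
            (sql, chunk)
        })
        .collect()
}
```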
@@ -2175,7 +2513,9 @@ fn ssa_summaries_round_trip() {
                    abstract_transfer: vec![],
                    param_return_paths: vec![],
                    points_to: Default::default(),
                    field_points_to: Default::default(),
                    return_path_facts: smallvec::SmallVec::new(),
                    typed_call_receivers: vec![],
                },
            ),
            (
@@ -2207,7 +2547,9 @@ fn ssa_summaries_round_trip() {
                    abstract_transfer: vec![],
                    param_return_paths: vec![],
                    points_to: Default::default(),
                    field_points_to: Default::default(),
                    return_path_facts: smallvec::SmallVec::new(),
                    typed_call_receivers: vec![],
                },
            ),
        ];
@@ -2377,7 +2719,9 @@ fn ssa_summaries_hash_rescan_replaces_stale() {
abstract_transfer: vec![],
|
||||
param_return_paths: vec![],
|
||||
points_to: Default::default(),
|
||||
field_points_to: Default::default(),
|
||||
return_path_facts: smallvec::SmallVec::new(),
|
||||
typed_call_receivers: vec![],
|
||||
},
|
||||
)];
|
||||
idx.replace_ssa_summaries_for_file(&f, &hash_v1, &sums_v1)
|
||||
|
|
@@ -2411,7 +2755,9 @@ fn ssa_summaries_hash_rescan_replaces_stale() {
abstract_transfer: vec![],
param_return_paths: vec![],
points_to: Default::default(),
field_points_to: Default::default(),
return_path_facts: smallvec::SmallVec::new(),
typed_call_receivers: vec![],
},
)];
idx.replace_ssa_summaries_for_file(&f, &hash_v2, &sums_v2)

@@ -2466,7 +2812,9 @@ fn clear_drops_ssa_summaries_table() {
abstract_transfer: vec![],
param_return_paths: vec![],
points_to: Default::default(),
field_points_to: Default::default(),
return_path_facts: smallvec::SmallVec::new(),
typed_call_receivers: vec![],
},
)];
idx.replace_ssa_summaries_for_file(&f, &hash, &sums)

@@ -2521,6 +2869,8 @@ fn make_test_callee_body(
value_defs,
cfg_node_map: std::collections::HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::new(),
field_writes: std::collections::HashMap::new(),
},
opt: crate::ssa::OptimizeResult {
const_values: std::collections::HashMap::new(),

@@ -2733,7 +3083,9 @@ fn make_test_ssa_summary() -> crate::summary::ssa_summary::SsaFuncSummary {
abstract_transfer: vec![],
param_return_paths: vec![],
points_to: Default::default(),
field_points_to: Default::default(),
return_path_facts: smallvec::SmallVec::new(),
typed_call_receivers: vec![],
}
}

@@ -3382,3 +3734,116 @@ fn metadata_table_survives_clear() {
let stored = index::Indexer::get_stored_engine_version(&pool).unwrap();
assert_eq!(stored.as_deref(), Some(index::ENGINE_VERSION));
}

/// Pointer-Phase 5 / A3 audit: field_points_to round-trips through
/// the SsaFuncSummary SQLite blob. Pin that the new field_points_to
/// records preserve param_field_reads, param_field_writes, the
/// receiver sentinel (`u32::MAX`), the container-element marker
/// (`<elem>`), and the `overflow` flag across serialise → store →
/// load → deserialise. This is the strict-additive contract for
/// pre-Phase-5 blobs (default-empty deserialises cleanly) and the
/// completeness check for the W3 cross-call resolver.
#[test]
fn ssa_summaries_round_trip_preserves_field_points_to() {
use crate::summary::points_to::FieldPointsToSummary;
use crate::summary::ssa_summary::SsaFuncSummary;

let td = tempfile::tempdir().unwrap();
let db = td.path().join("nyx.sqlite");
let f = td.path().join("store.rs");
std::fs::write(&f, "// helper that writes obj.cache").unwrap();

let pool = index::Indexer::init(&db).unwrap();
let mut idx = index::Indexer::from_pool("proj", &pool).unwrap();

let hash = index::Indexer::digest_bytes(b"// helper that writes obj.cache");

// Build a summary with one read on param 0 ("name"), one write on
// param 1 ("cache"), one read on the receiver sentinel ("kind"),
// and an ELEM marker on param 0. Round-trip must preserve all
// four channels.
let mut fpt = FieldPointsToSummary::empty();
fpt.add_read(0, "name");
fpt.add_write(1, "cache");
fpt.add_read(u32::MAX, "kind");
fpt.add_write(0, "<elem>");

let summary = SsaFuncSummary {
field_points_to: fpt.clone(),
..Default::default()
};
let row = (
"store".to_string(),
2_usize,
"rust".to_string(),
"store.rs".to_string(),
String::new(),
None,
crate::symbol::FuncKind::Function,
summary,
);
idx.replace_ssa_summaries_for_file(&f, &hash, &[row])
.unwrap();

let loaded = idx.load_all_ssa_summaries().unwrap();
assert_eq!(loaded.len(), 1, "single summary stored, single returned");
let (_, name, _, _, _, _, _, _, sum) = &loaded[0];
assert_eq!(name, "store");
assert_eq!(
sum.field_points_to, fpt,
"field_points_to must round-trip byte-equal",
);

// Spot-check sentinel + ELEM marker channels.
let recv_read = sum
.field_points_to
.param_field_reads
.iter()
.find(|(p, _)| *p == u32::MAX)
.expect("receiver read at u32::MAX sentinel");
assert!(recv_read.1.iter().any(|s| s == "kind"));

let elem_write = sum
.field_points_to
.param_field_writes
.iter()
.find(|(p, _)| *p == 0)
.expect("param 0 writes recorded");
assert!(
elem_write.1.iter().any(|s| s == "<elem>"),
"<elem> marker must survive round-trip without conversion",
);
assert!(!sum.field_points_to.overflow);
}

/// Pre-Phase-5 blob compatibility: a summary serialised without
/// `field_points_to` deserialises with the empty default — no
/// migration needed because the field is `#[serde(default)]`.
#[test]
fn ssa_summaries_pre_phase5_blob_decodes_with_empty_field_points_to() {
use crate::summary::ssa_summary::SsaFuncSummary;

// Hand-craft JSON without the `field_points_to` key.
let pre_phase5_json = r#"{
"param_to_return": [],
"param_to_sink": [],
"source_caps": 0,
"param_to_sink_param": [],
"param_container_to_return": [],
"param_to_container_store": [],
"return_type": null,
"return_abstract": null,
"source_to_callback": [],
"receiver_to_return": null,
"receiver_to_sink": 0,
"abstract_transfer": [],
"param_return_paths": [],
"return_path_facts": [],
"typed_call_receivers": []
}"#;
let sum: SsaFuncSummary = serde_json::from_str(pre_phase5_json).unwrap();
assert!(
sum.field_points_to.is_empty(),
"missing field_points_to must default to empty",
);
}

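The round-trip test appears to pin a per-parameter layout:

- `param_field_reads` and `param_field_writes` hold `(param_index, field_names)` entries.
- `u32::MAX` stands in for the receiver.
- `<elem>` marks container elements.

A minimal std-only sketch of that layout follows, assuming entries are looked up by parameter index and duplicate field names are stored once. The `FieldAccesses` type, `record`, and `fields_for` are hypothetical. The real `FieldPointsToSummary`, its `overflow` flag, and its serde encoding are not reproduced.

```rust
/// Hypothetical stand-in for the receiver sentinel pinned by the test.
const RECEIVER: u32 = u32::MAX;
/// Hypothetical stand-in for the container-element marker.
const ELEM: &str = "<elem>";

/// Hypothetical per-parameter field-access record (not nyx's type):
/// each entry maps a parameter index to the field names touched through it.
#[derive(Clone, Debug, Default, PartialEq)]
struct FieldAccesses {
    param_field_reads: Vec<(u32, Vec<String>)>,
    param_field_writes: Vec<(u32, Vec<String>)>,
}

/// Append `field` under `param`, creating the entry on first use and
/// skipping duplicates.
fn record(list: &mut Vec<(u32, Vec<String>)>, param: u32, field: &str) {
    let idx = match list.iter().position(|(p, _)| *p == param) {
        Some(i) => i,
        None => {
            list.push((param, Vec::new()));
            list.len() - 1
        }
    };
    let fields = &mut list[idx].1;
    if !fields.iter().any(|f| f == field) {
        fields.push(field.to_string());
    }
}

/// Fields recorded for `param`, or an empty slice if none.
fn fields_for(list: &[(u32, Vec<String>)], param: u32) -> &[String] {
    list.iter()
        .find(|(p, _)| *p == param)
        .map(|(_, fields)| fields.as_slice())
        .unwrap_or(&[])
}

impl FieldAccesses {
    fn add_read(&mut self, param: u32, field: &str) {
        record(&mut self.param_field_reads, param, field);
    }

    fn add_write(&mut self, param: u32, field: &str) {
        record(&mut self.param_field_writes, param, field);
    }
}

fn main() {
    let mut acc = FieldAccesses::default();
    acc.add_read(0, "name");
    acc.add_write(1, "cache");
    acc.add_read(RECEIVER, "kind");
    acc.add_write(0, ELEM);
    acc.add_read(0, "name"); // duplicate read, stored once
    println!("reads of param 0:  {:?}", fields_for(&acc.param_field_reads, 0));
    println!("receiver reads:    {:?}", fields_for(&acc.param_field_reads, RECEIVER));
    println!("writes of param 0: {:?}", fields_for(&acc.param_field_writes, 0));
}
```

Keying the receiver by `u32::MAX` lets it share the same list as ordinary parameters without a separate field. A real parameter index should never reach that value.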
@@ -276,7 +276,7 @@ fn render_diag(d: &Diag, width: usize) -> String {
Some(format!(
" {}",
style(format!(
"[capped: {count} note{} — {}]",
"[capped: {count} note{}, {}]",
if count == 1 { "" } else { "s" },
direction.tag(),
))

@@ -34,13 +34,19 @@ pub static RULES: &[LabelRule] = &[
case_sensitive: false,
},
// Type conversion sanitizers (C++ STL forms).
// The full `std::sto*` family (including 64-bit `*ll`/`*ull` and `*ld`)
// returns an integral or floating value; downstream string-injection
// caps no longer apply.
LabelRule {
matchers: &[
"std::stoi",
"std::stol",
"std::stoll",
"std::stoul",
"std::stoull",
"std::stof",
"std::stod",
"std::stold",
],
label: DataLabel::Sanitizer(Cap::all()),
case_sensitive: false,

@@ -111,9 +117,19 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
"lambda_expression" => Kind::Function,
// Namespace bodies and C++ class bodies descend as plain Blocks so the
// CFG builder can reach the nested function_definitions/lambdas inside
// and extract them as separate bodies.
// and extract them as separate bodies. Without these, a
// `class_specifier` / `struct_specifier` falls through to the
// generic `_ =>` arm in `build_sub`, which records a leaf `Seq`
// node and never walks the body — so inline member-function
// definitions (and methods of nested classes) are silently dropped.
"declaration_list" => Kind::Block,
"field_declaration_list" => Kind::Block,
"class_specifier" => Kind::Block,
"struct_specifier" => Kind::Block,
"union_specifier" => Kind::Block,
"enum_specifier" => Kind::Block,
"template_declaration" => Kind::Block,
"linkage_specification" => Kind::Block,

// data-flow
"call_expression" => Kind::CallFn,

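The comment explains why C++ class bodies must map to `Kind::Block`. If they fall through to a leaf, the CFG builder never reaches inline member functions.

The sketch below is a toy model of that traversal. As the comment describes, it assumes unmapped kinds are recorded as leaves whose children are skipped. The `Node` tree, `collect`, `functions_found`, and the two-variant `Kind` are hypothetical simplifications, not nyx's `build_sub`.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the CFG builder's node kinds.
#[derive(Debug, PartialEq)]
enum Kind {
    Function,
    Block,
}

// A toy syntax tree; `kind` mimics a tree-sitter node kind string.
struct Node {
    kind: &'static str,
    name: &'static str,
    children: Vec<Node>,
}

fn node(kind: &'static str, name: &'static str, children: Vec<Node>) -> Node {
    Node { kind, name, children }
}

// Collect the names of every function body reachable from `n`.
// Kinds missing from the map are treated as leaves: their children are
// never visited, which is how unmapped class bodies get dropped.
fn collect(n: &Node, kinds: &HashMap<&str, Kind>, out: &mut Vec<&'static str>) {
    match kinds.get(n.kind) {
        Some(Kind::Function) => out.push(n.name),
        Some(Kind::Block) => {}
        None => return,
    }
    for child in &n.children {
        collect(child, kinds, out);
    }
}

// `void main() {}` plus `class Store { int get() { ... } };`
fn sample_tree() -> Node {
    node("translation_unit", "", vec![
        node("function_definition", "main", vec![]),
        node("class_specifier", "Store", vec![
            node("field_declaration_list", "", vec![
                node("function_definition", "Store::get", vec![]),
            ]),
        ]),
    ])
}

fn functions_found(map_class_bodies: bool) -> Vec<&'static str> {
    let mut kinds = HashMap::new();
    kinds.insert("translation_unit", Kind::Block);
    kinds.insert("function_definition", Kind::Function);
    if map_class_bodies {
        kinds.insert("class_specifier", Kind::Block);
        kinds.insert("field_declaration_list", Kind::Block);
    }
    let mut out = Vec::new();
    collect(&sample_tree(), &kinds, &mut out);
    out
}

fn main() {
    println!("without class mapping: {:?}", functions_found(false));
    println!("with class mapping:    {:?}", functions_found(true));
}
```

Without the two extra entries, only `main` is found. With them, the walker descends through the class body and also finds `Store::get`.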
@@ -81,6 +81,13 @@ pub static RULES: &[LabelRule] = &[
"os.Create",
"ioutil.ReadFile",
"os.ReadFile",
// Mutating filesystem operations. Path-traversal CVEs commonly
// sink into delete/write rather than read (Owncast CVE-2024-31450
// sinks into `os.Remove(filepath.Join(root, userInput))`).
"os.Remove",
"os.RemoveAll",
"os.WriteFile",
"ioutil.WriteFile",
],
label: DataLabel::Sink(Cap::FILE_IO),
case_sensitive: false,

@@ -94,10 +101,22 @@ pub static RULES: &[LabelRule] = &[
matchers: &[
"http.Get",
"http.Post",
"http.Head",
"http.NewRequest",
"http.NewRequestWithContext",
"net.Dial",
"net.DialTimeout",
// `http.DefaultClient` is the package-level default `*http.Client`.
// Idiomatic Go SSRF sinks (Owncast CVE-2023-3188) use the
// `http.DefaultClient.Get(url)` form rather than the bare
// `http.Get(url)` helper, so the suffix-matched callee text needs
// an explicit entry here — bare `Get/Post/Do/Head` would
// over-match unrelated method names.
"http.DefaultClient.Get",
"http.DefaultClient.Post",
"http.DefaultClient.Head",
"http.DefaultClient.Do",
"http.DefaultClient.PostForm",
],
label: DataLabel::Sink(Cap::SSRF),
case_sensitive: false,

@@ -505,6 +505,38 @@ pub static GATED_SINKS: &[SinkGate] = &[
object_destination_fields: &["host", "hostname", "path", "protocol", "port", "origin"],
},
},
// Node `http.get(options[, cb])` / `https.get(options[, cb])` —
// convenience wrappers around `.request()` that auto-call `.end()`.
// Same destination semantics as `.request`. Motivated by
// CVE-2025-64430 (Parse Server SSRF via http.get(uri)).
SinkGate {
callee_matcher: "http.get",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::SSRF),
case_sensitive: false,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &["host", "hostname", "path", "protocol", "port", "origin"],
},
},
SinkGate {
callee_matcher: "https.get",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::SSRF),
case_sensitive: false,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &["host", "hostname", "path", "protocol", "port", "origin"],
},
},
];

pub static KINDS: Map<&'static str, Kind> = phf_map! {

@@ -5,7 +5,7 @@ mod java;
mod javascript;
mod php;
mod python;
mod ruby;
pub(crate) mod ruby;
mod rust;
mod typescript;

@@ -689,9 +689,13 @@ fn ends_with_cs(haystack: &[u8], needle: &[u8], case_sensitive: bool) -> bool {
}
}

/// Prefix check with configurable case sensitivity.
/// Prefix check with configurable case sensitivity. The `=` exact-match
/// sigil is meaningless for prefix matchers (which by definition match many
/// suffixes); it is stripped if present so a malformed matcher like
/// `=foo_` still behaves predictably.
#[inline]
fn starts_with_cs(haystack: &[u8], needle: &[u8], case_sensitive: bool) -> bool {
let (needle, _) = unpack_matcher(needle);
if needle.len() > haystack.len() {
return false;
}

@@ -708,14 +712,37 @@ fn starts_with_cs(haystack: &[u8], needle: &[u8], case_sensitive: bool) -> bool
/// Word-boundary suffix match with configurable case sensitivity.
#[inline]
fn match_suffix_cs(text: &[u8], matcher: &[u8], case_sensitive: bool) -> bool {
if ends_with_cs(text, matcher, case_sensitive) {
let start = text.len() - matcher.len();
start == 0 || matches!(text[start - 1], b'.' | b':')
let (m, exact_only) = unpack_matcher(matcher);
if ends_with_cs(text, m, case_sensitive) {
let start = text.len() - m.len();
if exact_only {
// `=foo` matchers fire only when `text` IS `foo` (no `Mod.foo`,
// `Class::foo`, or any preceding namespace). Lets a label rule
// distinguish bare `Kernel#open` from `File.open` — the former
// shells out on `|cmd`, the latter never does (CVE-2020-8130).
start == 0
} else {
start == 0 || matches!(text[start - 1], b'.' | b':')
}
} else {
false
}
}

/// Strip an optional `=` "exact-match" sigil from the start of a matcher.
/// Matchers prefixed with `=` (e.g. `"=open"`) only fire when the candidate
/// text equals the matcher exactly — the boundary-`.`-or-`:` allowance is
/// suppressed. Used to distinguish bare-callee Ruby/Python builtins from
/// methods of the same name on a typed receiver.
#[inline]
fn unpack_matcher(matcher: &[u8]) -> (&[u8], bool) {
if matcher.first() == Some(&b'=') {
(&matcher[1..], true)
} else {
(matcher, false)
}
}

/// Try to classify a piece of syntax text.
/// `lang` is the canonicalised language key ("rust", "javascript", ...).
///

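The interplay between the `=` sigil and the `.`/`:` boundary check is easiest to see in isolation. The sketch below re-implements `unpack_matcher` and `match_suffix_cs` as shown in the hunk.

It also supplies an `ends_with_cs` whose real body is not part of this diff. That version is an assumption: it compares the tail of the haystack and folds ASCII case when `case_sensitive` is false.

```rust
/// Strip an optional leading `=` (exact-match sigil), as in the diff.
fn unpack_matcher(matcher: &[u8]) -> (&[u8], bool) {
    if matcher.first() == Some(&b'=') {
        (&matcher[1..], true)
    } else {
        (matcher, false)
    }
}

/// Assumed implementation: suffix comparison with optional ASCII case folding.
fn ends_with_cs(haystack: &[u8], needle: &[u8], case_sensitive: bool) -> bool {
    if needle.len() > haystack.len() {
        return false;
    }
    let tail = &haystack[haystack.len() - needle.len()..];
    if case_sensitive {
        tail == needle
    } else {
        tail.eq_ignore_ascii_case(needle)
    }
}

/// Suffix match that requires a `.`/`:` boundary, or the whole text
/// when the matcher carries the `=` sigil.
fn match_suffix_cs(text: &[u8], matcher: &[u8], case_sensitive: bool) -> bool {
    let (m, exact_only) = unpack_matcher(matcher);
    if !ends_with_cs(text, m, case_sensitive) {
        return false;
    }
    let start = text.len() - m.len();
    if exact_only {
        start == 0
    } else {
        start == 0 || matches!(text[start - 1], b'.' | b':')
    }
}

fn main() {
    let cases: [(&str, &str); 6] = [
        ("open", "=open"),                                 // bare call: exact match
        ("File.open", "=open"),                            // receiver present: rejected
        ("Kernel::open", "open"),                          // `:` boundary accepted
        ("reopen", "open"),                                // no boundary: rejected
        ("http.DefaultClient.Get", "http.DefaultClient.Get"),
        ("client.Get", "http.DefaultClient.Get"),          // user client: rejected
    ];
    for (text, matcher) in cases {
        let hit = match_suffix_cs(text.as_bytes(), matcher.as_bytes(), false);
        println!("{text:<24} vs {matcher:<24} -> {hit}");
    }
}
```

The same boundary rule explains the Go SSRF entries. A multi-segment matcher such as `http.DefaultClient.Get` can only match callee text that ends with that whole path, so a user-named `client.Get` never matches.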
@@ -1063,6 +1090,29 @@ pub fn normalize_chained_call_for_classify(text: &str) -> String {
normalize_chained_call(text)
}

/// Return the bare method-name segment of a callee text.
///
/// Centralised replacement for the textual `callee.rsplit('.').next().unwrap_or(callee)`
/// pattern that used to be scattered across the codebase.
///
/// Behaviour-preserving across the Phase 2 SSA chain decomposition rollout:
/// - When SSA lowering rewrites a chained-receiver call (`c.mu.Lock()` →
/// `Call("Lock", [v_mu])`), the call's `callee` is already the bare method
/// name, so this helper is a no-op pass-through.
/// - For 1-dot callees (`obj.method`) and for languages where Phase 2 lowering
/// doesn't run yet (PHP/Ruby) the helper still extracts the trailing method
/// from the textual form, exactly as the old per-callsite split did.
/// - For bare callees (no dot), it returns the input unchanged.
///
/// Use this helper when you need the *terminal* method name from a callee
/// string regardless of whether the call had a chained receiver. When you
/// have an `SsaOp::Call` in hand, prefer reading `callee` directly and
/// walking `receiver` through `FieldProj` ops — that's the precise path.
/// This helper is the textual fallback for callsites that only see a `&str`.
pub fn bare_method_name(callee: &str) -> &str {
callee.rsplit('.').next().unwrap_or(callee)
}

/// Normalize a chained method call: strip `()` between `.` segments.
/// e.g. `r.URL.Query().Get` → `r.URL.Query.Get`
/// e.g. `r.URL.Query().Get("host")` → `r.URL.Query.Get`

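`bare_method_name` splits on `.` only. The Ruby `ar_query_safe_shape` helper in the same commit extracts its leaf with `rsplit(['.', ':'])` instead, which also treats `:` as a separator.

The sketch below contrasts the two on `::`-qualified callees. `leaf_segment` is a hypothetical name for the second behavior. It uses a closure predicate that splits on the same two characters.

```rust
/// Mirrors `bare_method_name` from the diff: split on `.` only.
fn bare_method_name(callee: &str) -> &str {
    callee.rsplit('.').next().unwrap_or(callee)
}

/// Hypothetical variant that also splits on `:`, like the leaf extraction
/// inside `ar_query_safe_shape`.
fn leaf_segment(callee: &str) -> &str {
    callee
        .rsplit(|c: char| c == '.' || c == ':')
        .next()
        .unwrap_or(callee)
}

fn main() {
    for callee in ["foo", "obj.method", "Foo::Bar.where", "Foo::bar", "std::stoi"] {
        println!(
            "{} -> bare_method_name: {:?}, leaf_segment: {:?}",
            callee,
            bare_method_name(callee),
            leaf_segment(callee)
        );
    }
}
```

The two agree whenever the last separator is a `.`. They differ on a trailing `::` path: `bare_method_name("Foo::bar")` returns the whole string, while the `:`-aware split returns `bar`.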
@@ -1260,6 +1310,26 @@ pub fn custom_rule_id(lang: &str, kind: &str, matchers: &[String]) -> String {
mod tests {
use super::*;

#[test]
fn bare_method_name_strips_chain() {
// No-dot input → returned as-is.
assert_eq!(bare_method_name("foo"), "foo");
// 1-dot → trailing segment (Phase 2 leaves these alone in SSA).
assert_eq!(bare_method_name("obj.method"), "method");
// Multi-dot → trailing segment (matches AST-only callees from
// PHP/Ruby and any pre-Phase-2 textual paths kept around in
// `callee_text` for display).
assert_eq!(bare_method_name("a.b.c.method"), "method");
// Trailing dot → empty trailing segment, matching the legacy
// `rsplit('.').next()` behaviour bit-for-bit.
assert_eq!(bare_method_name("foo."), "");
// Empty input.
assert_eq!(bare_method_name(""), "");
// Phase 2 invariant: when SSA decomposed a chain, `callee` is
// the bare method already and the helper is a no-op.
assert_eq!(bare_method_name("Lock"), "Lock");
}

#[test]
fn handler_param_names_exact_and_prefix() {
// Exact names still match.

@@ -1376,6 +1446,115 @@ mod tests {
assert_eq!(result, None);
}

// CVE Hunt Session 2 (Go CVE-2024-31450 Owncast path traversal):
// mutating filesystem helpers (`os.Remove`, `os.WriteFile`,
// `os.RemoveAll`, `ioutil.WriteFile`) sink path-traversal flows that
// the prior Go ruleset only saw on the read side (`os.Open`,
// `os.ReadFile`).
#[test]
fn classify_go_os_remove_is_file_io_sink() {
let result = classify("go", "os.Remove", None);
assert_eq!(result, Some(DataLabel::Sink(Cap::FILE_IO)));
}

#[test]
fn classify_go_os_write_file_is_file_io_sink() {
let result = classify("go", "os.WriteFile", None);
assert_eq!(result, Some(DataLabel::Sink(Cap::FILE_IO)));
}

#[test]
fn classify_go_os_remove_all_is_file_io_sink() {
let result = classify("go", "os.RemoveAll", None);
assert_eq!(result, Some(DataLabel::Sink(Cap::FILE_IO)));
}

// CVE Hunt Session 2 (Go CVE-2023-3188 Owncast SSRF):
// `http.DefaultClient.Get/Post/Head/Do/PostForm` is the idiomatic Go
// SSRF sink shape (`http.DefaultClient` is the package-level shared
// `*http.Client`). Bare `Get`/`Post` matchers would over-match
// unrelated method names; the explicit `http.DefaultClient.*` matcher
// restricts the suffix-match to the stdlib helper while leaving
// user-defined `myClient.Get` alone (no false positives).
#[test]
fn classify_go_http_default_client_get_is_ssrf_sink() {
let result = classify("go", "http.DefaultClient.Get", None);
assert_eq!(result, Some(DataLabel::Sink(Cap::SSRF)));
}

#[test]
fn classify_go_http_default_client_post_is_ssrf_sink() {
let result = classify("go", "http.DefaultClient.Post", None);
assert_eq!(result, Some(DataLabel::Sink(Cap::SSRF)));
}

#[test]
fn classify_go_http_default_client_do_is_ssrf_sink() {
let result = classify("go", "http.DefaultClient.Do", None);
assert_eq!(result, Some(DataLabel::Sink(Cap::SSRF)));
}

#[test]
fn classify_go_user_client_get_is_not_ssrf_sink() {
// `client.Get` on a user-named *http.Client variable should NOT
// match — the Go SSRF set is restricted to the stdlib package
// helper `http.DefaultClient`. Type-aware resolution would be the
// path to a broader rule, not a bare-name match.
let result = classify("go", "client.Get", None);
assert_eq!(result, None);
}

// CVE Hunt Session 3 (Ruby CVE-2020-8130 rake `Kernel#open` CMDI):
// bare `open(path)` interprets a leading `|` as a shell pipe. The
// `=` exact-match sigil distinguishes the dangerous bare-callee form
// from `File.open` / `IO.open` / `URI.open`, each of which has its
// own non-piping semantics. Without the sigil, the suffix-with-
// boundary matcher would over-fire on every `X.open` call.
#[test]
fn classify_ruby_bare_open_is_shell_escape_sink() {
let result = classify("ruby", "open", None);
assert_eq!(result, Some(DataLabel::Sink(Cap::SHELL_ESCAPE)));
}

#[test]
fn classify_ruby_file_open_is_not_shell_escape_sink() {
// The exact-match sigil on `=open` must NOT fire on `File.open`.
// `File.open` is a separate FILE_IO sink (existing rule); the
// CMDI rule must not double-classify it.
let result = classify_all("ruby", "File.open", None);
// FILE_IO from the existing `File.open` matcher is allowed.
assert!(result.contains(&DataLabel::Sink(Cap::FILE_IO)));
// SHELL_ESCAPE from the new bare-`open` matcher must NOT appear.
assert!(!result.contains(&DataLabel::Sink(Cap::SHELL_ESCAPE)));
}

#[test]
fn classify_ruby_io_open_is_not_shell_escape_sink() {
// `IO.open` takes a file descriptor — never pipes. The bare-
// open CMDI rule must leave it alone.
let result = classify("ruby", "IO.open", None);
assert_ne!(result, Some(DataLabel::Sink(Cap::SHELL_ESCAPE)));
}

#[test]
fn classify_ruby_uri_open_remains_ssrf_sink() {
// `URI.open` is the existing SSRF sink. Adding `=open` as a
// CMDI rule must not break or shadow it.
let result = classify("ruby", "URI.open", None);
assert_eq!(result, Some(DataLabel::Sink(Cap::SSRF)));
}

#[test]
fn unpack_matcher_strips_exact_sigil() {
let (m, exact) = unpack_matcher(b"=open");
assert_eq!(m, b"open");
assert!(exact);

let (m, exact) = unpack_matcher(b"open");
assert_eq!(m, b"open");
assert!(!exact);
}

#[test]
fn classify_case_sensitive_suffix_boundary() {
let extras = vec![RuntimeLabelRule {

@@ -1391,6 +1570,29 @@ mod tests {
assert_eq!(result, None);
}

#[test]
fn classify_cpp_sto_family_is_sanitizer() {
// Phase 1: full `std::sto*` family (including 64-bit and `long
// double` variants) clears every taint cap that flows through it,
// matching the existing `std::stoi`/`std::stol` rule.
for callee in [
"std::stoi",
"std::stol",
"std::stoll",
"std::stoul",
"std::stoull",
"std::stof",
"std::stod",
"std::stold",
] {
assert_eq!(
classify("cpp", callee, None),
Some(DataLabel::Sanitizer(Cap::all())),
"{callee} should be a Cap::all() sanitizer",
);
}
}

#[test]
fn parse_cap_works() {
assert_eq!(parse_cap("html_escape"), Some(Cap::HTML_ESCAPE));

@@ -73,6 +73,19 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
case_sensitive: false,
},
// Bare `Kernel#open(path)` interprets a path beginning with `|` as a
// shell command (`open("|cmd")` runs `cmd`). `=open` exact-matcher
// syntax limits this rule to the bare call — `File.open`, `IO.open`,
// `URI.open` etc. each have their own non-pipe semantics and are
// covered by their own labels (or intentionally not labeled as CMDI).
// CVE-2020-8130 (rake `Rake::FileList#egrep`) was the canonical
// exploit: an attacker-supplied filename starting with `|` ran through
// `open(fn, "r")`. The fix replaced the call with `File.open(fn, "r")`.
LabelRule {
matchers: &["=open"],
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
case_sensitive: false,
},
// Backtick shell execution: tree-sitter-ruby represents `` `cmd` `` as a
// `subshell` node with no callee field. push_node normalises the synthetic
// callee name to "subshell" and extract_arg_uses lifts interpolation

@@ -225,6 +238,60 @@ pub static PARAM_CONFIG: ParamConfig = ParamConfig {
ident_fields: &["name"],
};

/// ActiveRecord query methods that the static [`RULES`] table classifies as
/// `Sink(Cap::SQL_QUERY)`. These are SQL injection vectors only when arg 0
/// is a string with interpolation (`#{x}`) or a non-literal identifier — the
/// hash form (`where(id: x)`) and the parameterised form (`where("a = ?", x)`)
/// are intrinsically safe because Rails escapes the values.
const AR_QUERY_METHOD_NAMES: &[&str] = &["where", "order", "group", "having", "joins", "pluck"];

/// Tree-sitter argument-0 node kinds that mark an ActiveRecord query call as
/// shape-safe. Hash literals (`pair`, `hash`), symbol literals
/// (`simple_symbol`, `hash_key_symbol`), array literals (`array`), and pure
/// string literals without `#{...}` interpolation are all safe. Strings WITH
/// interpolation and identifiers / method calls are *not* in this list —
/// callers must check `has_interpolation` and the kind separately.
const AR_QUERY_SAFE_ARG0_KINDS: &[&str] = &[
"pair",
"hash",
"simple_symbol",
"hash_key_symbol",
"array",
"string",
"string_literal",
];

/// Returns `true` when a Ruby `call` node is an ActiveRecord query method
/// (`where`, `order`, `pluck`, …) whose argument 0 has a parameter-safe shape.
///
/// Used by [`crate::cfg`] to synthesise a `Sanitizer(SQL_QUERY)` label on
/// the same node as the `Sink(SQL_QUERY)` label, suppressing both
/// `taint-unsanitised-flow` (sanitiser sees taint at the sink) and
/// `cfg-unguarded-sink` (sanitiser dominates the sink reflexively).
///
/// Real-world FP shapes this closes (redmine, mastodon, diaspora):
/// * `Issue.where(:id => params[:id])` — hash form
/// * `Model.where(id: x, name: y)` — keyword-shorthand pairs
/// * `Project.order(:created_at)` — symbol literal
/// * `Issue.pluck(:id, :name)` — symbol literals
/// * `Model.where("active = ?", x)` — parameterised string
///
/// Real-world TPs preserved:
/// * `User.where("name = '#{name}'")` — string with interpolation
/// * `Model.where(some_string_var)` — dynamic identifier (conservative)
pub fn ar_query_safe_shape(callee_text: &str, arg0_kind: &str, has_interpolation: bool) -> bool {
// Match the callee's last segment ("Model.where" → "where", "where" → "where").
let leaf = callee_text.rsplit(['.', ':']).next().unwrap_or(callee_text);
if !AR_QUERY_METHOD_NAMES.contains(&leaf) {
return false;
}
// Strings are safe only when they don't contain `#{...}` interpolation.
if matches!(arg0_kind, "string" | "string_literal") && has_interpolation {
return false;
}
AR_QUERY_SAFE_ARG0_KINDS.contains(&arg0_kind)
}

/// Framework-conditional rules for Ruby.
pub fn framework_rules(ctx: &FrameworkContext) -> Vec<RuntimeLabelRule> {
let mut rules = Vec::new();

@@ -249,3 +316,61 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec<RuntimeLabelRule> {

rules
}

#[cfg(test)]
mod ar_query_tests {
use super::ar_query_safe_shape;

#[test]
fn hash_form_is_safe() {
// Model.where(:id => x) — pair node directly in argument_list
assert!(ar_query_safe_shape("Model.where", "pair", false));
// Model.where(id: x)
assert!(ar_query_safe_shape("where", "pair", false));
}

#[test]
fn symbol_form_is_safe() {
assert!(ar_query_safe_shape("Project.order", "simple_symbol", false));
assert!(ar_query_safe_shape("Issue.pluck", "simple_symbol", false));
assert!(ar_query_safe_shape("Model.joins", "simple_symbol", false));
}

#[test]
fn parameterised_string_is_safe() {
// Model.where("a = ?", x) — first arg is a string literal w/o interpolation
assert!(ar_query_safe_shape("where", "string", false));
assert!(ar_query_safe_shape("where", "string_literal", false));
}

#[test]
fn interpolated_string_is_dangerous() {
// Model.where("a = #{x}") — string node WITH interpolation child
assert!(!ar_query_safe_shape("where", "string", true));
}

#[test]
fn dynamic_identifier_is_dangerous() {
// Model.where(some_var) — kind is identifier, not in safe list
assert!(!ar_query_safe_shape("where", "identifier", false));
}

#[test]
fn array_form_is_safe() {
// Model.pluck([:id, :name]) — uncommon but valid
assert!(ar_query_safe_shape("pluck", "array", false));
}

#[test]
fn non_ar_method_is_never_suppressed() {
// find_by_sql is a real raw-SQL sink — never suppress.
assert!(!ar_query_safe_shape("find_by_sql", "string", false));
assert!(!ar_query_safe_shape("connection.execute", "pair", false));
}

#[test]
fn callee_with_module_path_resolves_leaf() {
assert!(ar_query_safe_shape("Foo::Bar.where", "pair", false));
assert!(ar_query_safe_shape("a.b.c.where", "pair", false));
}
}

@@ -58,6 +58,7 @@ pub mod interop;
pub mod labels;
pub mod output;
pub mod patterns;
pub mod pointer;
pub mod rank;
pub mod rust_resolve;
#[cfg(feature = "serve")]

15
src/main.rs
@@ -65,20 +65,27 @@ fn main() -> NyxResult<()> {
.expect("set rayon stack size");

let is_serve = cli.command.is_serve();
let is_info = cli.command.is_informational();
let quiet = config.output.quiet || cli.command.is_structured_output(&config);

// Print config note before scanning (human-readable mode only).
if let Some(note) = config_note.filter(|_| !quiet) {
// Print config note before scanning (human-readable mode only). Pure
// informational commands suppress it too — their output is usually
// piped or grepped and the preamble is noise.
if let Some(note) = config_note.filter(|_| !quiet && !is_info) {
eprint!("{note}");
}

commands::handle_command(cli.command, database_dir, config_dir, &mut config)?;

if !quiet && !is_serve {
// "Finished in" is useful for long scans but pure noise on fast paths
// (small repos, `index status`, `clean` etc.). Suppress it under a
// second; users who care about precise timings can use `time`/`hyperfine`.
let elapsed = now.elapsed();
if !quiet && !is_serve && !is_info && elapsed.as_secs_f32() >= 1.0 {
eprintln!(
"{} in {:.3}s.",
style("Finished").green().bold(),
now.elapsed().as_secs_f32()
elapsed.as_secs_f32()
);
}
Ok(())

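The reworked tail of `main` prints the trailing `Finished` line only when all four conditions hold. Pulling that condition into a pure function makes the one-second threshold easy to check. `should_print_finished` is a hypothetical name, not a function in nyx.

```rust
use std::time::Duration;

/// Hypothetical extraction of the condition guarding the "Finished in" line.
fn should_print_finished(quiet: bool, is_serve: bool, is_info: bool, elapsed: Duration) -> bool {
    !quiet && !is_serve && !is_info && elapsed.as_secs_f32() >= 1.0
}

fn main() {
    let fast = Duration::from_millis(250);
    let slow = Duration::from_millis(1_500);
    // Fast human-readable run: suppressed.
    println!("{}", should_print_finished(false, false, false, fast));
    // Slow human-readable run: printed.
    println!("{}", should_print_finished(false, false, false, slow));
    // Informational command, however slow: suppressed.
    println!("{}", should_print_finished(false, false, true, slow));
}
```

The new code reads `now.elapsed()` once into `elapsed`. That keeps the threshold check and the printed duration consistent, whereas the removed line called `now.elapsed()` a second time inside `eprintln!`.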
@@ -12,7 +12,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Banned functions (always dangerous) ────────────────────
Pattern {
id: "c.memory.gets",
description: "gets() — no bounds checking, always exploitable",
description: "gets() has no bounds checking and is always exploitable",
query: r#"(call_expression function: (identifier) @id (#eq? @id "gets")) @vuln"#,
severity: Severity::High,
tier: PatternTier::A,

@@ -21,7 +21,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "c.memory.strcpy",
description: "strcpy() — no bounds checking on destination buffer",
description: "strcpy() does not bounds-check the destination buffer",
query: r#"(call_expression function: (identifier) @id (#eq? @id "strcpy")) @vuln"#,
severity: Severity::High,
tier: PatternTier::A,

@ -30,7 +30,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
},
|
||||
Pattern {
|
||||
id: "c.memory.strcat",
|
||||
description: "strcat() — no bounds checking on destination buffer",
|
||||
description: "strcat() does not bounds-check the destination buffer",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "strcat")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
|
|
@ -39,7 +39,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
},
|
||||
Pattern {
|
||||
id: "c.memory.sprintf",
|
||||
description: "sprintf() — no length limit on output buffer",
|
||||
description: "sprintf() does not limit the output buffer length",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "sprintf")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
|
|
@ -48,7 +48,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
},
|
||||
Pattern {
|
||||
id: "c.memory.scanf_percent_s",
|
||||
description: "scanf(\"%s\") — unbounded string read",
|
||||
description: "scanf(\"%s\") performs an unbounded string read",
|
||||
query: r#"(call_expression
|
||||
function: (identifier) @id (#eq? @id "scanf")
|
||||
arguments: (argument_list
|
||||
|
|
@ -62,7 +62,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
// ── Tier A: Command execution ──────────────────────────────────────
|
||||
Pattern {
|
||||
id: "c.cmdi.system",
|
||||
description: "system() — shell command execution",
|
||||
description: "system() runs a shell command",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "system")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
|
|
@ -71,7 +71,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
},
|
||||
Pattern {
|
||||
id: "c.cmdi.popen",
|
||||
description: "popen() — shell command execution with pipe",
|
||||
description: "popen() runs a shell command and returns a pipe",
|
||||
query: r#"(call_expression function: (identifier) @id (#eq? @id "popen")) @vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
|
|
@ -81,7 +81,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
// ── Tier A: Format-string ──────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "c.memory.printf_no_fmt",
|
||||
description: "printf(var) — format-string vulnerability when first arg is not literal",
|
||||
description: "printf(var) is a format-string vulnerability when the first arg is not a literal",
|
||||
query: r#"(call_expression
|
||||
function: (identifier) @id (#eq? @id "printf")
|
||||
arguments: (argument_list
|
||||
|
|
|
|||
|
|
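Every description rewrite in these hunks follows the same convention. The em-dash separator is replaced by a short verb phrase, and each id carries its table's language prefix (`c.`, `cpp.`, `go.`, and so on). The sketch below shows a check that could keep the tables in that shape. The trimmed-down `Pattern`, `Severity` and `PatternTier` definitions are assumptions reconstructed from the fields visible in the diff, not the crate's real types, and `lint_patterns` is hypothetical.

```rust
// Assumed, trimmed-down mirrors of the types used in the pattern tables;
// only the fields and variants visible in the diff are reproduced.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    High,
    Medium,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PatternTier {
    A,
    B,
}

#[allow(dead_code)]
pub struct Pattern {
    pub id: &'static str,
    pub description: &'static str,
    pub query: &'static str,
    pub severity: Severity,
    pub tier: PatternTier,
}

/// Hypothetical lint: returns the ids of patterns whose id does not start
/// with `<lang_prefix>.` or whose description still contains an em-dash.
pub fn lint_patterns(lang_prefix: &str, patterns: &[Pattern]) -> Vec<&'static str> {
    const EM_DASH: char = '\u{2014}';
    patterns
        .iter()
        .filter(|p| {
            // Require the dot so that "cpp.*" ids do not pass for prefix "c".
            let prefix_ok = p
                .id
                .strip_prefix(lang_prefix)
                .map_or(false, |rest| rest.starts_with('.'));
            !prefix_ok || p.description.contains(EM_DASH)
        })
        .map(|p| p.id)
        .collect()
}
```

Running the lint once per language table, for example from a unit test, would catch a regression to the old description style or a pattern pasted into the wrong table.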
@@ -10,7 +10,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Banned C functions (inherited) ─────────────────────────
 Pattern {
 id: "cpp.memory.gets",
-description: "gets() — no bounds checking, always exploitable",
+description: "gets() has no bounds checking and is always exploitable",
 query: r#"(call_expression function: (identifier) @id (#eq? @id "gets")) @vuln"#,
 severity: Severity::High,
 tier: PatternTier::A,
@@ -19,7 +19,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "cpp.memory.strcpy",
-description: "strcpy() — no bounds checking on destination buffer",
+description: "strcpy() does not bounds-check the destination buffer",
 query: r#"(call_expression function: (identifier) @id (#eq? @id "strcpy")) @vuln"#,
 severity: Severity::High,
 tier: PatternTier::A,
@@ -28,7 +28,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "cpp.memory.strcat",
-description: "strcat() — no bounds checking on destination buffer",
+description: "strcat() does not bounds-check the destination buffer",
 query: r#"(call_expression function: (identifier) @id (#eq? @id "strcat")) @vuln"#,
 severity: Severity::High,
 tier: PatternTier::A,
@@ -37,7 +37,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "cpp.memory.sprintf",
-description: "sprintf() — no length limit on output buffer",
+description: "sprintf() does not limit the output buffer length",
 query: r#"(call_expression function: (identifier) @id (#eq? @id "sprintf")) @vuln"#,
 severity: Severity::High,
 tier: PatternTier::A,
@@ -47,7 +47,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Command execution ──────────────────────────────────────
 Pattern {
 id: "cpp.cmdi.system",
-description: "system() — shell command execution",
+description: "system() runs a shell command",
 query: r#"(call_expression function: (identifier) @id (#eq? @id "system")) @vuln"#,
 severity: Severity::High,
 tier: PatternTier::A,
@@ -56,7 +56,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "cpp.cmdi.popen",
-description: "popen() — shell command execution",
+description: "popen() runs a shell command",
 query: r#"(call_expression function: (identifier) @id (#eq? @id "popen")) @vuln"#,
 severity: Severity::High,
 tier: PatternTier::A,
@@ -67,7 +67,7 @@ pub const PATTERNS: &[Pattern] = &[
 // C++ casts are parsed as call_expression with template_function
 Pattern {
 id: "cpp.memory.reinterpret_cast",
-description: "reinterpret_cast — type-punning cast",
+description: "reinterpret_cast performs a type-punning cast",
 query: r#"(call_expression
 function: (template_function
 name: (identifier) @n (#eq? @n "reinterpret_cast")))
@@ -79,7 +79,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "cpp.memory.const_cast",
-description: "const_cast — removes const/volatile qualifier",
+description: "const_cast removes the const/volatile qualifier",
 query: r#"(call_expression
 function: (template_function
 name: (identifier) @n (#eq? @n "const_cast")))
@@ -92,7 +92,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier B: Format-string (variable first arg) ─────────────────────
 Pattern {
 id: "cpp.memory.printf_no_fmt",
-description: "printf(var) — format-string vulnerability when first arg is not literal",
+description: "printf(var) is a format-string vulnerability when the first arg is not a literal",
 query: r#"(call_expression
 function: (identifier) @id (#eq? @id "printf")
 arguments: (argument_list

@@ -10,7 +10,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Command execution ──────────────────────────────────────
 Pattern {
 id: "go.cmdi.exec_command",
-description: "exec.Command() — arbitrary process execution",
+description: "exec.Command() runs an arbitrary process",
 query: r#"(call_expression
 function: (selector_expression
 field: (field_identifier) @f (#eq? @f "Command")))
@@ -23,7 +23,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Unsafe pointer ─────────────────────────────────────────
 Pattern {
 id: "go.memory.unsafe_pointer",
-description: "unsafe.Pointer — bypasses Go type system",
+description: "unsafe.Pointer bypasses the Go type system",
 query: r#"(call_expression
 function: (selector_expression
 operand: (identifier) @pkg (#eq? @pkg "unsafe")
@@ -37,7 +37,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: TLS misconfiguration ───────────────────────────────────
 Pattern {
 id: "go.transport.insecure_skip_verify",
-description: "InsecureSkipVerify: true — disables TLS certificate validation",
+description: "InsecureSkipVerify: true disables TLS certificate validation",
 query: r#"(keyed_element
 (literal_element
 (identifier) @k (#eq? @k "InsecureSkipVerify"))
@@ -51,7 +51,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Weak crypto ────────────────────────────────────────────
 Pattern {
 id: "go.crypto.md5",
-description: "md5.New() / md5.Sum() — weak hash algorithm",
+description: "md5.New() / md5.Sum() use a weak hash algorithm",
 query: r#"(call_expression
 function: (selector_expression
 operand: (identifier) @pkg (#eq? @pkg "md5")))
@@ -63,7 +63,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "go.crypto.sha1",
-description: "sha1.New() / sha1.Sum() — weak hash algorithm",
+description: "sha1.New() / sha1.Sum() use a weak hash algorithm",
 query: r#"(call_expression
 function: (selector_expression
 operand: (identifier) @pkg (#eq? @pkg "sha1")))
@@ -106,7 +106,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Deserialization ────────────────────────────────────────
 Pattern {
 id: "go.deser.gob_decode",
-description: "gob.NewDecoder — Go binary deserialization",
+description: "gob.NewDecoder performs Go binary deserialization",
 query: r#"(call_expression
 function: (selector_expression
 operand: (identifier) @pkg (#eq? @pkg "gob")

@@ -11,7 +11,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Deserialization ────────────────────────────────────────
 Pattern {
 id: "java.deser.readobject",
-description: "ObjectInputStream.readObject() — unsafe deserialization",
+description: "ObjectInputStream.readObject() performs unsafe deserialization",
 // Match any .readObject() call — the method name is specific enough.
 query: r#"(method_invocation
 name: (identifier) @id (#eq? @id "readObject"))
@@ -24,7 +24,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Command execution ──────────────────────────────────────
 Pattern {
 id: "java.cmdi.runtime_exec",
-description: "Runtime.getRuntime().exec() — shell command execution",
+description: "Runtime.getRuntime().exec() runs a shell command",
 query: r#"(method_invocation
 object: (method_invocation
 name: (identifier) @n (#eq? @n "getRuntime"))
@@ -38,7 +38,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Reflection ─────────────────────────────────────────────
 Pattern {
 id: "java.reflection.class_forname",
-description: "Class.forName() — dynamic class loading",
+description: "Class.forName() performs dynamic class loading",
 query: r#"(method_invocation
 object: (identifier) @c (#eq? @c "Class")
 name: (identifier) @id (#eq? @id "forName"))
@@ -50,7 +50,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "java.reflection.method_invoke",
-description: "Method.invoke() — reflective method invocation",
+description: "Method.invoke() is a reflective method invocation",
 query: r#"(method_invocation
 name: (identifier) @id (#eq? @id "invoke"))
 @vuln"#,
@@ -76,7 +76,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Weak crypto ────────────────────────────────────────────
 Pattern {
 id: "java.crypto.insecure_random",
-description: "new Random() — java.util.Random is not cryptographically secure",
+description: "new Random() (java.util.Random) is not cryptographically secure",
 query: r#"(object_creation_expression
 type: (type_identifier) @t (#eq? @t "Random"))
 @vuln"#,
@@ -87,7 +87,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "java.crypto.weak_digest",
-description: "MessageDigest.getInstance(\"MD5\"/\"SHA1\") — weak hash algorithm",
+description: "MessageDigest.getInstance(\"MD5\"/\"SHA1\") uses a weak hash algorithm",
 query: r#"(method_invocation
 object: (identifier) @c (#eq? @c "MessageDigest")
 name: (identifier) @id (#eq? @id "getInstance")
@@ -102,7 +102,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: XSS (servlet) ──────────────────────────────────────────
 Pattern {
 id: "java.xss.getwriter_print",
-description: "response.getWriter().print/println — direct output without encoding",
+description: "response.getWriter().print/println writes output without encoding",
 query: r#"(method_invocation
 object: (method_invocation
 name: (identifier) @gw (#eq? @gw "getWriter"))

@@ -12,7 +12,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Code execution ─────────────────────────────────────────
 Pattern {
 id: "js.code_exec.eval",
-description: "eval() — dynamic code execution",
+description: "eval() runs dynamic code",
 query: r#"(call_expression
 function: (identifier) @id (#eq? @id "eval"))
 @vuln"#,
@@ -23,7 +23,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "js.code_exec.new_function",
-description: "new Function() constructor — eval equivalent",
+description: "new Function() constructor is equivalent to eval",
 query: r#"(new_expression
 constructor: (identifier) @id (#eq? @id "Function"))
 @vuln"#,
@@ -34,7 +34,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "js.code_exec.settimeout_string",
-description: "setTimeout/setInterval with string argument — implicit eval",
+description: "setTimeout/setInterval with a string argument runs implicit eval",
 query: r#"(call_expression
 function: (identifier) @id (#match? @id "^(setTimeout|setInterval)$")
 arguments: (arguments (string) @code))
@@ -47,7 +47,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: XSS sinks ──────────────────────────────────────────────
 Pattern {
 id: "js.xss.document_write",
-description: "document.write() — XSS sink",
+description: "document.write() is an XSS sink",
 query: r#"(call_expression
 function: (member_expression
 object: (identifier) @obj (#eq? @obj "document")
@@ -60,7 +60,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "js.xss.outer_html",
-description: "Assignment to .outerHTML — XSS sink",
+description: "Assignment to .outerHTML is an XSS sink",
 query: r#"(assignment_expression
 left: (member_expression
 property: (property_identifier) @prop (#eq? @prop "outerHTML")))
@@ -72,7 +72,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "js.xss.insert_adjacent_html",
-description: "insertAdjacentHTML() — XSS sink",
+description: "insertAdjacentHTML() is an XSS sink",
 query: r#"(call_expression
 function: (member_expression
 property: (property_identifier) @prop (#eq? @prop "insertAdjacentHTML")))
@@ -85,7 +85,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Prototype pollution ────────────────────────────────────
 Pattern {
 id: "js.prototype.proto_assignment",
-description: "Assignment to __proto__ — prototype pollution",
+description: "Assignment to __proto__ causes prototype pollution",
 query: r#"(assignment_expression
 left: (member_expression
 property: (property_identifier) @prop (#eq? @prop "__proto__")))
@@ -97,7 +97,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "js.prototype.extend_object",
-description: "Assignment to Object.prototype — prototype mutation",
+description: "Assignment to Object.prototype mutates the prototype",
 query: r#"(assignment_expression
 left: (member_expression
 object: (member_expression
@@ -126,7 +126,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "js.crypto.weak_hash_import",
-description: "Direct md5()/sha1() call — weak hash from imported package",
+description: "Direct md5()/sha1() call uses a weak hash from an imported package",
 query: r#"(call_expression
 function: (identifier) @id (#match? @id "^(md5|sha1)$"))
 @vuln"#,
@@ -137,7 +137,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "js.crypto.math_random",
-description: "Math.random() — not cryptographically secure",
+description: "Math.random() is not cryptographically secure",
 query: r#"(call_expression
 function: (member_expression
 object: (identifier) @obj (#eq? @obj "Math")
@@ -165,7 +165,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Open redirect ──────────────────────────────────────────
 Pattern {
 id: "js.xss.location_assign",
-description: "Assignment to location/location.href — open redirect",
+description: "Assignment to location/location.href is an open-redirect sink",
 query: r#"(assignment_expression
 left: (member_expression
 object: (identifier) @obj (#match? @obj "^(window|location|document)$")
@@ -207,7 +207,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Insecure session / cookie configuration ─────────────────
 Pattern {
 id: "js.config.insecure_session_httponly",
-description: "Session cookie with httpOnly: false — allows XSS-based session theft",
+description: "Session cookie with httpOnly: false allows XSS-based session theft",
 query: r#"(pair
 key: (property_identifier) @key (#eq? @key "httpOnly")
 value: (false) @val)
@@ -219,7 +219,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "js.config.insecure_session_secure",
-description: "Session cookie with secure: false — cookie sent over plain HTTP",
+description: "Session cookie with secure: false sends the cookie over plain HTTP",
 query: r#"(pair
 key: (property_identifier) @key (#eq? @key "secure")
 value: (false) @val)
@@ -276,7 +276,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Verbose error response ────────────────────────────────
 Pattern {
 id: "js.config.verbose_error_response",
-description: "Error object passed to response renderer — may leak stack traces to users",
+description: "Error object passed to response renderer can leak stack traces to users",
 query: r#"(call_expression
 function: (member_expression
 property: (property_identifier) @method
@@ -295,7 +295,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier B: CORS dynamic origin reflection ────────────────────────
 Pattern {
 id: "js.config.cors_dynamic_origin",
-description: "CORS Access-Control-Allow-Origin set to dynamic value — may reflect arbitrary origins",
+description: "CORS Access-Control-Allow-Origin set to a dynamic value can reflect arbitrary origins",
 query: r#"(call_expression
 function: (member_expression
 property: (property_identifier) @method (#eq? @method "setHeader"))

@@ -12,7 +12,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Code execution ─────────────────────────────────────────
 Pattern {
 id: "php.code_exec.eval",
-description: "eval() — dynamic code execution",
+description: "eval() runs dynamic code",
 query: r#"(function_call_expression
 function: (name) @n (#eq? @n "eval"))
 @vuln"#,
@@ -23,7 +23,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "php.code_exec.create_function",
-description: "create_function() — deprecated eval-like constructor",
+description: "create_function() is a deprecated eval-like constructor",
 query: r#"(function_call_expression
 function: (name) @n (#eq? @n "create_function"))
 @vuln"#,
@@ -34,7 +34,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "php.code_exec.preg_replace_e",
-description: "preg_replace with /e modifier — code execution via regex",
+description: "preg_replace with /e modifier executes code via regex",
 query: r#"(function_call_expression
 function: (name) @n (#eq? @n "preg_replace")
 arguments: (arguments
@@ -48,7 +48,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "php.code_exec.assert_string",
-description: "assert() with string argument — evaluates PHP code",
+description: "assert() with a string argument evaluates PHP code",
 query: r#"(function_call_expression
 function: (name) @n (#eq? @n "assert")
 arguments: (arguments
@@ -62,7 +62,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Command execution ──────────────────────────────────────
 Pattern {
 id: "php.cmdi.system",
-description: "system/shell_exec/exec/passthru — shell command execution",
+description: "system/shell_exec/exec/passthru runs a shell command",
 query: r#"(function_call_expression
 function: (name) @n (#match? @n "^(system|shell_exec|exec|passthru|proc_open|popen)$"))
 @vuln"#,
@@ -74,7 +74,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Deserialization ────────────────────────────────────────
 Pattern {
 id: "php.deser.unserialize",
-description: "unserialize() — PHP object injection",
+description: "unserialize() enables PHP object injection",
 query: r#"(function_call_expression
 function: (name) @n (#eq? @n "unserialize"))
 @vuln"#,
@@ -100,7 +100,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier B: Path traversal (include with variable) ─────────────────
 Pattern {
 id: "php.path.include_variable",
-description: "include/require with variable path — file inclusion vulnerability",
+description: "include/require with a variable path is a file-inclusion vulnerability",
 query: r#"(include_expression (variable_name)) @vuln"#,
 severity: Severity::High,
 tier: PatternTier::B,
@@ -110,7 +110,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Crypto ─────────────────────────────────────────────────
 Pattern {
 id: "php.crypto.md5",
-description: "md5() — weak hash function",
+description: "md5() is a weak hash function",
 query: r#"(function_call_expression
 function: (name) @n (#eq? @n "md5"))
 @vuln"#,
@@ -121,7 +121,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "php.crypto.sha1",
-description: "sha1() — weak hash function",
+description: "sha1() is a weak hash function",
 query: r#"(function_call_expression
 function: (name) @n (#eq? @n "sha1"))
 @vuln"#,
@@ -132,7 +132,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "php.crypto.rand",
-description: "rand()/mt_rand() — not cryptographically secure",
+description: "rand()/mt_rand() is not cryptographically secure",
 query: r#"(function_call_expression
 function: (name) @n (#match? @n "^(rand|mt_rand)$"))
 @vuln"#,

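Several of these queries constrain a capture with `#match?` and a fully anchored alternation, `php.cmdi.system` for example. In tree-sitter queries, `#match?` tests the captured node's text against a regular expression. The `^` and `$` anchors are therefore what keep `exec` from also matching `execute` or `my_exec`. Note that this alternation also covers `proc_open` and `popen`, which the updated description does not name.

The sketch below is a hypothetical helper, not part of nyx or tree-sitter. It reduces an anchored alternation of plain identifiers to an exact-membership test. It assumes the `^(a|b|c)$` shape and returns `None` for anything else rather than pretending to be a regex engine.

```rust
/// True if `s` is a non-empty run of ASCII letters, digits or underscores,
/// i.e. an alternative that contains no regex metacharacters.
fn is_plain_identifier(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Hypothetical helper: emulates a `#match?` predicate whose regex has the
/// exact shape `^(alt1|alt2|...)$` with plain-identifier alternatives.
/// Under that assumption the predicate is exact membership; any other regex
/// shape returns `None` because it would need a real regex engine.
pub fn anchored_alternation_matches(regex: &str, name: &str) -> Option<bool> {
    // Strip the assumed `^(` ... `)$` wrapper; bail out on any other shape.
    let inner = regex.strip_prefix("^(")?.strip_suffix(")$")?;
    if !inner.split('|').all(is_plain_identifier) {
        return None;
    }
    Some(inner.split('|').any(|alt| alt == name))
}
```

For the `php.cmdi.system` regex, `exec` and `popen` match, while `execute` does not. The exact-membership reading is why the query can safely use short names such as `exec`.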
@@ -11,7 +11,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Code execution ─────────────────────────────────────────
 Pattern {
 id: "py.code_exec.eval",
-description: "eval() — dynamic code execution",
+description: "eval() runs dynamic code",
 query: r#"(call function: (identifier) @id (#eq? @id "eval")) @vuln"#,
 severity: Severity::High,
 tier: PatternTier::A,
@@ -20,7 +20,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "py.code_exec.exec",
-description: "exec() — dynamic code execution",
+description: "exec() runs dynamic code",
 query: r#"(call function: (identifier) @id (#eq? @id "exec")) @vuln"#,
 severity: Severity::High,
 tier: PatternTier::A,
@@ -29,7 +29,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "py.code_exec.compile",
-description: "compile() with exec/eval mode — code compilation from string",
+description: "compile() with exec/eval mode compiles code from a string",
 query: r#"(call function: (identifier) @id (#eq? @id "compile")) @vuln"#,
 severity: Severity::Medium,
 tier: PatternTier::A,
@@ -39,7 +39,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Command execution ──────────────────────────────────────
 Pattern {
 id: "py.cmdi.os_system",
-description: "os.system() — shell command execution",
+description: "os.system() runs a shell command",
 query: r#"(call
 function: (attribute
 object: (identifier) @pkg (#eq? @pkg "os")
@@ -52,7 +52,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "py.cmdi.os_popen",
-description: "os.popen() — shell command execution",
+description: "os.popen() runs a shell command",
 query: r#"(call
 function: (attribute
 object: (identifier) @pkg (#eq? @pkg "os")
@@ -83,7 +83,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Deserialization ────────────────────────────────────────
 Pattern {
 id: "py.deser.pickle_loads",
-description: "pickle.loads/load — arbitrary object deserialization",
+description: "pickle.loads/load deserializes arbitrary objects",
 query: r#"(call
 function: (attribute
 object: (identifier) @pkg (#eq? @pkg "pickle")
@@ -96,7 +96,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "py.deser.yaml_load",
-description: "yaml.load() without SafeLoader — arbitrary object instantiation",
+description: "yaml.load() without SafeLoader instantiates arbitrary objects",
 query: r#"(call
 function: (attribute
 object: (identifier) @pkg (#eq? @pkg "yaml")
@@ -109,7 +109,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "py.deser.shelve_open",
-description: "shelve.open() — pickle-backed deserialization",
+description: "shelve.open() performs pickle-backed deserialization",
 query: r#"(call
 function: (attribute
 object: (identifier) @pkg (#eq? @pkg "shelve")
@@ -123,7 +123,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier B: SQL injection (format/concat heuristic) ────────────────
 Pattern {
 id: "py.sqli.execute_format",
-description: "cursor.execute with string concatenation — SQL injection risk",
+description: "cursor.execute with string concatenation risks SQL injection",
 query: r#"(call
 function: (attribute
 attribute: (identifier) @fn (#eq? @fn "execute"))
@@ -138,7 +138,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Weak crypto ────────────────────────────────────────────
 Pattern {
 id: "py.crypto.md5",
-description: "hashlib.md5() — weak hash algorithm",
+description: "hashlib.md5() uses a weak hash algorithm",
 query: r#"(call
 function: (attribute
 object: (identifier) @pkg (#eq? @pkg "hashlib")
@@ -151,7 +151,7 @@ pub const PATTERNS: &[Pattern] = &[
 },
 Pattern {
 id: "py.crypto.sha1",
-description: "hashlib.sha1() — weak hash algorithm",
+description: "hashlib.sha1() uses a weak hash algorithm",
 query: r#"(call
 function: (attribute
 object: (identifier) @pkg (#eq? @pkg "hashlib")
@@ -165,7 +165,7 @@ pub const PATTERNS: &[Pattern] = &[
 // ── Tier A: Template injection ─────────────────────────────────────
 Pattern {
 id: "py.xss.jinja_from_string",
-description: "jinja2.Template from string — potential template injection",
+description: "jinja2.Template from string risks template injection",
 query: r#"(call
 function: (attribute
 attribute: (identifier) @fn (#eq? @fn "from_string")))

@ -11,7 +11,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
// ── Tier A: Code execution ─────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "rb.code_exec.eval",
|
||||
description: "Kernel#eval — dynamic code execution",
|
||||
description: "Kernel#eval runs dynamic code",
|
||||
query: r#"(call (identifier) @id (#eq? @id "eval")) @vuln"#,
|
||||
severity: Severity::High,
|
||||
tier: PatternTier::A,
|
||||
|
|
@ -20,7 +20,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
},
|
||||
Pattern {
|
||||
id: "rb.code_exec.instance_eval",
|
||||
description: "instance_eval — evaluates string in object context",
|
||||
description: "instance_eval evaluates a string in object context",
|
||||
query: r#"(call
|
||||
method: (identifier) @id (#eq? @id "instance_eval"))
|
||||
@vuln"#,
|
||||
|
|
@ -31,7 +31,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
},
|
||||
Pattern {
|
||||
id: "rb.code_exec.class_eval",
|
||||
description: "class_eval / module_eval — evaluates string in class context",
|
||||
description: "class_eval / module_eval evaluates a string in class context",
|
||||
query: r#"(call
|
||||
method: (identifier) @id (#match? @id "^(class_eval|module_eval)$"))
|
||||
@vuln"#,
|
||||
|
|
@ -53,7 +53,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
// ── Tier A: Shell execution ─────────────────────────────────────────
|
||||
Pattern {
|
||||
id: "rb.cmdi.system_interp",
|
||||
description: "system/exec call — command execution risk",
|
||||
description: "system/exec call runs a command",
|
||||
query: r#"(call
|
||||
method: (identifier) @m (#match? @m "^(system|exec)$"))
@vuln"#,

@@ -65,7 +65,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Deserialization ────────────────────────────────────────
Pattern {
id: "rb.deser.yaml_load",
-description: "YAML.load — arbitrary object deserialization (use safe_load instead)",
+description: "YAML.load deserializes arbitrary objects (use safe_load instead)",
query: r#"(call
receiver: (constant) @recv (#match? @recv "^(YAML|Psych)$")
method: (identifier) @m (#eq? @m "load"))

@@ -77,7 +77,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "rb.deser.marshal_load",
-description: "Marshal.load — arbitrary Ruby object deserialization",
+description: "Marshal.load deserializes arbitrary Ruby objects",
query: r#"(call
receiver: (constant) @recv (#eq? @recv "Marshal")
method: (identifier) @m (#eq? @m "load"))

@@ -90,7 +90,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Reflection ─────────────────────────────────────────────
Pattern {
id: "rb.reflection.send_dynamic",
-description: "send() with non-symbol argument — arbitrary method dispatch",
+description: "send() with a non-symbol argument is arbitrary method dispatch",
query: r#"(call
method: (identifier) @m (#eq? @m "send")
arguments: (argument_list

@@ -103,7 +103,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "rb.reflection.constantize",
-description: "constantize / safe_constantize — dynamic class resolution",
+description: "constantize / safe_constantize performs dynamic class resolution",
query: r#"(call
method: (identifier) @m (#match? @m "^(constantize|safe_constantize)$"))
@vuln"#,

@@ -115,7 +115,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: SSRF ───────────────────────────────────────────────────
Pattern {
id: "rb.ssrf.open_uri",
-description: "Kernel#open with HTTP URL — SSRF via open-uri",
+description: "Kernel#open with an HTTP URL is an SSRF sink via open-uri",
query: r#"(call
method: (identifier) @m (#eq? @m "open")
arguments: (argument_list

@@ -129,7 +129,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Crypto ─────────────────────────────────────────────────
Pattern {
id: "rb.crypto.md5",
-description: "Digest::MD5 — weak hash algorithm",
+description: "Digest::MD5 is a weak hash algorithm",
query: r#"(scope_resolution
name: (constant) @c (#eq? @c "MD5"))
@vuln"#,
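Every description change in these hunks rewrites the old fragment form, which joined a subject and an explanation with an em dash (U+2014), into a plain sentence. A check along the following lines could stop new patterns from reintroducing the old form. This is a sketch only: the `Pattern` struct below is a trimmed, hypothetical stand-in that models just the `id` and `description` fields visible in the diff, not the crate's real type.

```rust
/// Trimmed, hypothetical stand-in for the real `Pattern` type: only the
/// fields needed for the style check are modelled here.
pub struct Pattern {
    pub id: &'static str,
    pub description: &'static str,
}

/// Returns the ids of patterns whose description still uses the old
/// "subject <em dash> explanation" fragment style.
pub fn fragment_style_ids(patterns: &[Pattern]) -> Vec<&'static str> {
    patterns
        .iter()
        // U+2014 is the em dash used by the pre-cleanup descriptions.
        .filter(|p| p.description.contains('\u{2014}'))
        .map(|p| p.id)
        .collect()
}
```

In the crate itself such a check would presumably run from a `#[test]` over each language's real `PATTERNS` slice; that wiring is omitted because the real `Pattern` definition is not part of this diff.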
@@ -11,7 +11,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Memory Safety (unsafe) ─────────────────────────────────
Pattern {
id: "rs.memory.transmute",
-description: "std::mem::transmute — unchecked type reinterpretation",
+description: "std::mem::transmute performs unchecked type reinterpretation",
query: r#"(call_expression
function: (scoped_identifier
path: (identifier) @p (#eq? @p "mem")

@@ -24,7 +24,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "rs.memory.copy_nonoverlapping",
-description: "ptr::copy_nonoverlapping — raw pointer memcpy",
+description: "ptr::copy_nonoverlapping is a raw pointer memcpy",
query: r#"(call_expression
function: (scoped_identifier
path: (identifier) @p (#eq? @p "ptr")

@@ -37,7 +37,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "rs.memory.get_unchecked",
-description: "get_unchecked / get_unchecked_mut — unchecked indexing",
+description: "get_unchecked / get_unchecked_mut performs unchecked indexing",
query: r#"(call_expression
function: (field_expression
field: (field_identifier) @m

@@ -50,7 +50,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "rs.memory.mem_zeroed",
-description: "std::mem::zeroed — zero-initialised memory may be UB for non-POD types",
+description: "std::mem::zeroed is UB for non-POD types since the zero pattern may not be a valid value",
query: r#"(call_expression
function: (scoped_identifier
path: (identifier) @p (#eq? @p "mem")

@@ -63,7 +63,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "rs.memory.ptr_read",
-description: "ptr::read / ptr::read_volatile — raw pointer dereference",
+description: "ptr::read / ptr::read_volatile dereferences a raw pointer",
query: r#"(call_expression
function: (scoped_identifier
path: (identifier) @p (#eq? @p "ptr")

@@ -77,7 +77,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Code quality / robustness ──────────────────────────────
Pattern {
id: "rs.quality.unsafe_block",
-description: "unsafe block — manual memory safety obligation",
+description: "unsafe block carries a manual memory safety obligation",
query: "(unsafe_block) @vuln",
severity: Severity::Medium,
tier: PatternTier::A,

@@ -98,7 +98,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "rs.quality.unwrap",
-description: ".unwrap() — panics on None/Err",
+description: ".unwrap() panics on None/Err",
query: r#"(call_expression
function: (field_expression
field: (field_identifier) @name (#eq? @name "unwrap")))

@@ -110,7 +110,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "rs.quality.expect",
-description: ".expect() — panics on None/Err",
+description: ".expect() panics on None/Err",
query: r#"(call_expression
function: (field_expression
field: (field_identifier) @name (#eq? @name "expect")))

@@ -144,7 +144,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Narrowing cast ─────────────────────────────────────────
Pattern {
id: "rs.memory.narrow_cast",
-description: "`as` cast to 8/16-bit integer — possible truncation",
+description: "`as` cast to 8/16-bit integer can truncate",
query: r#"(type_cast_expression
type: (primitive_type) @to
(#match? @to "^(u8|i8|u16|i16)$"))

@@ -156,7 +156,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "rs.memory.mem_forget",
-description: "std::mem::forget — may leak resources",
+description: "std::mem::forget can leak resources",
query: r#"(call_expression
function: (scoped_identifier
path: (identifier) @p (#eq? @p "mem")
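`rs.memory.narrow_cast` fires on `as` casts to 8/16-bit integers because `as` silently keeps only the low bits of the source value. A minimal std-only illustration: `u8::try_from` reports the overflow instead, and returning the `Result` (or clamping with `unwrap_or`) avoids the `.unwrap()` / `.expect()` calls that `rs.quality.unwrap` and `rs.quality.expect` flag. The function names are illustrative, not taken from the codebase.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// What `rs.memory.narrow_cast` flags: `as` keeps only the low 8 bits,
/// so 300 becomes 300 - 256 = 44 with no error.
pub fn truncating(x: u32) -> u8 {
    x as u8
}

/// Checked alternative: out-of-range input becomes an `Err` that the
/// caller must handle, with no panic path.
pub fn checked(x: u32) -> Result<u8, TryFromIntError> {
    u8::try_from(x)
}

/// Saturating alternative for call sites where clamping is acceptable.
/// `unwrap_or` is not matched by the `unwrap` pattern's exact-name check.
pub fn saturating(x: u32) -> u8 {
    u8::try_from(x).unwrap_or(u8::MAX)
}
```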
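The `rs.memory.mem_forget` description says `std::mem::forget` can leak resources. The mechanism is that a forgotten value never has its `Drop` impl run. The sketch below makes that observable with a shared counter; `Guard` is a hypothetical type standing in for something like a file handle or lock guard.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical resource whose destructor records that cleanup ran.
pub struct Guard {
    drops: Rc<Cell<u32>>,
}

impl Guard {
    pub fn new(drops: Rc<Cell<u32>>) -> Self {
        Guard { drops }
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        // Stands in for releasing a file handle, lock, socket, etc.
        self.drops.set(self.drops.get() + 1);
    }
}

/// Creates one guard that is dropped normally and one that is forgotten,
/// then returns how many destructors actually ran.
pub fn drops_after_forget() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let _normal = Guard::new(Rc::clone(&drops));
    } // `_normal` goes out of scope here, so its destructor runs.
    let leaked = Guard::new(Rc::clone(&drops));
    std::mem::forget(leaked); // destructor never runs for this guard
    drops.get()
}
```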
@@ -10,7 +10,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Code execution ─────────────────────────────────────────
Pattern {
id: "ts.code_exec.eval",
-description: "eval() — dynamic code execution",
+description: "eval() runs dynamic code",
query: r#"(call_expression
function: (identifier) @id (#eq? @id "eval"))
@vuln"#,

@@ -21,7 +21,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "ts.code_exec.new_function",
-description: "new Function() constructor — eval equivalent",
+description: "new Function() constructor is equivalent to eval",
query: r#"(new_expression
constructor: (identifier) @id (#eq? @id "Function"))
@vuln"#,

@@ -32,7 +32,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "ts.code_exec.settimeout_string",
-description: "setTimeout/setInterval with string argument — implicit eval",
+description: "setTimeout/setInterval with a string argument runs implicit eval",
query: r#"(call_expression
function: (identifier) @id (#match? @id "^(setTimeout|setInterval)$")
arguments: (arguments (string) @code))

@@ -45,7 +45,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: XSS sinks ──────────────────────────────────────────────
Pattern {
id: "ts.xss.document_write",
-description: "document.write() — XSS sink",
+description: "document.write() is an XSS sink",
query: r#"(call_expression
function: (member_expression
object: (identifier) @obj (#eq? @obj "document")

@@ -58,7 +58,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "ts.xss.outer_html",
-description: "Assignment to .outerHTML — XSS sink",
+description: "Assignment to .outerHTML is an XSS sink",
query: r#"(assignment_expression
left: (member_expression
property: (property_identifier) @prop (#eq? @prop "outerHTML")))

@@ -70,7 +70,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "ts.xss.insert_adjacent_html",
-description: "insertAdjacentHTML() — XSS sink",
+description: "insertAdjacentHTML() is an XSS sink",
query: r#"(call_expression
function: (member_expression
property: (property_identifier) @prop (#eq? @prop "insertAdjacentHTML")))

@@ -97,7 +97,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "ts.crypto.weak_hash_import",
-description: "Direct md5()/sha1() call — weak hash from imported package",
+description: "Direct md5()/sha1() call uses a weak hash from an imported package",
query: r#"(call_expression
function: (identifier) @id (#match? @id "^(md5|sha1)$"))
@vuln"#,

@@ -108,7 +108,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "ts.crypto.math_random",
-description: "Math.random() — not cryptographically secure",
+description: "Math.random() is not cryptographically secure",
query: r#"(call_expression
function: (member_expression
object: (identifier) @obj (#eq? @obj "Math")

@@ -136,7 +136,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: TypeScript-specific type-safety escapes ────────────────
Pattern {
id: "ts.quality.any_annotation",
-description: "Type annotation of `any` — disables type checking",
+description: "Type annotation of `any` disables type checking",
query: r#"(type_annotation (predefined_type) @t (#eq? @t "any")) @vuln"#,
severity: Severity::Low,
tier: PatternTier::A,

@@ -145,7 +145,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "ts.quality.as_any",
-description: "Type assertion `as any` — type-safety escape hatch",
+description: "Type assertion `as any` is a type-safety escape hatch",
query: r#"(as_expression (predefined_type) @t (#eq? @t "any")) @vuln"#,
severity: Severity::Low,
tier: PatternTier::A,

@@ -155,7 +155,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Prototype pollution ────────────────────────────────────
Pattern {
id: "ts.prototype.proto_assignment",
-description: "Assignment to __proto__ — prototype pollution",
+description: "Assignment to __proto__ causes prototype pollution",
query: r#"(assignment_expression
left: (member_expression
property: (property_identifier) @prop (#eq? @prop "__proto__")))

@@ -168,7 +168,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Open redirect ──────────────────────────────────────────
Pattern {
id: "ts.xss.location_assign",
-description: "Assignment to location/location.href — open redirect",
+description: "Assignment to location/location.href is an open-redirect sink",
query: r#"(assignment_expression
left: (member_expression
object: (identifier) @obj (#match? @obj "^(window|location|document)$")

@@ -196,7 +196,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Insecure session / cookie configuration ─────────────────
Pattern {
id: "ts.config.insecure_session_httponly",
-description: "Session cookie with httpOnly: false — allows XSS-based session theft",
+description: "Session cookie with httpOnly: false allows XSS-based session theft",
query: r#"(pair
key: (property_identifier) @key (#eq? @key "httpOnly")
value: (false) @val)

@@ -208,7 +208,7 @@ pub const PATTERNS: &[Pattern] = &[
},
Pattern {
id: "ts.config.insecure_session_secure",
-description: "Session cookie with secure: false — cookie sent over plain HTTP",
+description: "Session cookie with secure: false sends the cookie over plain HTTP",
query: r#"(pair
key: (property_identifier) @key (#eq? @key "secure")
value: (false) @val)

@@ -265,7 +265,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier A: Verbose error response ────────────────────────────────
Pattern {
id: "ts.config.verbose_error_response",
-description: "Error object passed to response renderer — may leak stack traces to users",
+description: "Error object passed to response renderer can leak stack traces to users",
query: r#"(call_expression
function: (member_expression
property: (property_identifier) @method

@@ -284,7 +284,7 @@ pub const PATTERNS: &[Pattern] = &[
// ── Tier B: CORS dynamic origin reflection ────────────────────────
Pattern {
id: "ts.config.cors_dynamic_origin",
-description: "CORS Access-Control-Allow-Origin set to dynamic value — may reflect arbitrary origins",
+description: "CORS Access-Control-Allow-Origin set to a dynamic value can reflect arbitrary origins",
query: r#"(call_expression
function: (member_expression
property: (property_identifier) @method (#eq? @method "setHeader"))
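Every pattern id in these hunks follows a `<language>.<category>.<name>` scheme (`rb.deser.yaml_load`, `rs.memory.transmute`, `ts.xss.outer_html`). If a consumer wanted to group or filter findings along that scheme, a parser like the one below would be enough. `PatternId` and `parse_pattern_id` are hypothetical helpers for illustration; nothing in the diff shows the crate doing this.

```rust
/// Hypothetical decomposition of a pattern id such as `ts.xss.outer_html`.
#[derive(Debug, PartialEq, Eq)]
pub struct PatternId<'a> {
    pub language: &'a str,
    pub category: &'a str,
    pub name: &'a str,
}

/// Splits `<language>.<category>.<name>`; returns `None` unless there are
/// exactly three non-empty, dot-separated segments.
pub fn parse_pattern_id(id: &str) -> Option<PatternId<'_>> {
    let mut parts = id.split('.');
    let language = parts.next()?;
    let category = parts.next()?;
    let name = parts.next()?;
    // Reject trailing segments and empty pieces such as "a..c".
    if parts.next().is_some() || [language, category, name].iter().any(|s| s.is_empty()) {
        return None;
    }
    Some(PatternId { language, category, name })
}
```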
1276
src/pointer/analysis.rs
Normal file
File diff suppressed because it is too large

466
src/pointer/domain.rs
Normal file
@@ -0,0 +1,466 @@
//! Abstract domain for field-sensitive Steensgaard points-to.
//!
//! Locations are interned to compact `LocId(u32)` handles so the
//! union-find resolver can operate on dense integer keys. Field
//! locations are keyed structurally by `(parent_loc_id, field_id)` —
//! interning a `Field(parent, f)` always returns the same `LocId` no
//! matter how many times the same `(parent, f)` pair is requested.

use crate::cfg::BodyId;
use crate::ssa::ir::FieldId;
use smallvec::SmallVec;
use std::collections::HashMap;

/// Maximum nesting depth for `Field(...)` chains before folding to `Top`.
///
/// Bounds the per-body work for pathological recursive walks like
/// `a.next.next.next.…` and matches the bound called out in the
/// pointer-analysis prompt.
pub const MAX_FIELD_DEPTH: u8 = 3;

/// Maximum members per [`PointsToSet`] before we collapse the set to
/// the over-approximation `{Top}`. Keeps both the set and downstream
/// constraint propagation bounded; mirrors the spirit of
/// [`crate::ssa::heap::effective_max_pointsto`] without sharing the
/// exact value (this analysis runs flow-insensitively across the body
/// so its sets are typically smaller).
pub const MAX_POINTSTO_MEMBERS: usize = 16;

/// Compact handle for an interned [`AbsLoc`].
///
/// All abstract locations referenced by a single body share one
/// [`LocInterner`] — `LocId`s are only meaningful relative to that
/// interner. IDs are assigned densely from 0 and are stable for the
/// lifetime of the interner so the union-find can index parent / rank
/// arrays directly.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct LocId(pub u32);

/// Sentinel "anywhere" location. Always `LocId(0)` — the interner
/// reserves the first slot at construction so callers can compare
/// against it cheaply.
pub const LOC_TOP: LocId = LocId(0);

/// Abstract heap location in the points-to lattice.
///
/// A pointer-targets-this kind of fact. Cyclic field chains (e.g.
/// `a.next.next.…`) are bounded by [`MAX_FIELD_DEPTH`]; once the cap
/// is exceeded the chain folds to [`AbsLoc::Top`].
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum AbsLoc {
    /// "Anywhere" — the over-approximation used when precision is
    /// unrecoverable (e.g. a value sourced from outside the analysed
    /// body, or a points-to set that exceeded the cap).
    Top,
    /// Allocation site within a body, identified by the SSA value of
    /// the defining instruction. SSA guarantees a single definition
    /// per value, so the SSA value uniquely names the allocation site.
    ///
    /// `body` disambiguates allocations across bodies in the same
    /// file. The interned `u32` is the `SsaValue.0` of the call /
    /// constructor instruction.
    Alloc(BodyId, u32),
    /// Function parameter — the abstract identity of the value
    /// supplied by the caller for parameter `index`. The receiver
    /// (`self` / `this`) uses [`AbsLoc::SelfParam`] instead.
    Param(BodyId, usize),
    /// Implicit method receiver (`self` / `this`). Distinct from
    /// `Param(_, _)` so callers don't have to encode an "is the
    /// receiver" sentinel index.
    SelfParam(BodyId),
    /// Heap field of a parent location: `parent.f`. `parent` is
    /// itself a [`LocId`] — chains of field accesses produce nested
    /// `Field` locations. Depth is bounded by [`MAX_FIELD_DEPTH`].
    Field { parent: LocId, field: FieldId },
}

/// Per-body interner mapping [`AbsLoc`] → dense [`LocId`].
///
/// Owns the canonical store: callers only hold [`LocId`]s and resolve
/// them through the interner. The first slot ([`LOC_TOP`]) is always
/// `Top`, so the union-find resolver can short-circuit "is this Top?"
/// queries with a single integer compare.
#[derive(Clone, Debug)]
pub struct LocInterner {
    /// Locations indexed by `LocId.0`.
    locs: Vec<AbsLoc>,
    /// Reverse lookup: `(BodyId, alloc-ssa-value)` → `LocId`.
    alloc_lookup: HashMap<(BodyId, u32), LocId>,
    /// Reverse lookup: `(BodyId, param-index)` → `LocId`.
    param_lookup: HashMap<(BodyId, usize), LocId>,
    /// Reverse lookup for `SelfParam`.
    self_param_lookup: HashMap<BodyId, LocId>,
    /// Reverse lookup for `Field { parent, field }`.
    field_lookup: HashMap<(LocId, FieldId), LocId>,
    /// Interned depth of each location (0 for non-Field). Used to
    /// fold deeply-nested `Field` chains to [`AbsLoc::Top`].
    depths: Vec<u8>,
}
|
impl Default for LocInterner {
    fn default() -> Self {
        Self::new()
    }
}

impl LocInterner {
    /// Create a fresh interner with [`LOC_TOP`] pre-installed.
    pub fn new() -> Self {
        Self {
            locs: vec![AbsLoc::Top],
            alloc_lookup: HashMap::new(),
            param_lookup: HashMap::new(),
            self_param_lookup: HashMap::new(),
            field_lookup: HashMap::new(),
            depths: vec![0],
        }
    }

    /// Total number of interned locations (including the reserved
    /// [`LOC_TOP`] slot).
    #[inline]
    pub fn len(&self) -> usize {
        self.locs.len()
    }

    /// Whether the interner only holds the reserved [`LOC_TOP`] slot.
    #[inline]
    pub fn is_empty(&self) -> bool {
        self.locs.len() <= 1
    }

    /// Resolve a [`LocId`] back to its [`AbsLoc`]. Panics on
    /// out-of-range ids — only ids the interner produced are valid.
    #[inline]
    pub fn resolve(&self, id: LocId) -> &AbsLoc {
        &self.locs[id.0 as usize]
    }

    /// Depth of an interned location. `0` for non-`Field` locations;
    /// `1 + depth(parent)` for `Field { parent, .. }`.
    #[inline]
    pub fn depth(&self, id: LocId) -> u8 {
        self.depths[id.0 as usize]
    }

    /// Intern an `Alloc` location.
    pub fn intern_alloc(&mut self, body: BodyId, ssa_value: u32) -> LocId {
        if let Some(&id) = self.alloc_lookup.get(&(body, ssa_value)) {
            return id;
        }
        let id = self.push(AbsLoc::Alloc(body, ssa_value), 0);
        self.alloc_lookup.insert((body, ssa_value), id);
        id
    }

    /// Intern a positional `Param` location.
    pub fn intern_param(&mut self, body: BodyId, index: usize) -> LocId {
        if let Some(&id) = self.param_lookup.get(&(body, index)) {
            return id;
        }
        let id = self.push(AbsLoc::Param(body, index), 0);
        self.param_lookup.insert((body, index), id);
        id
    }

    /// Intern a `SelfParam` location for the given body.
    pub fn intern_self_param(&mut self, body: BodyId) -> LocId {
        if let Some(&id) = self.self_param_lookup.get(&body) {
            return id;
        }
        let id = self.push(AbsLoc::SelfParam(body), 0);
        self.self_param_lookup.insert(body, id);
        id
    }

    /// Intern a `Field { parent, field }` location. Returns
    /// [`LOC_TOP`] when `parent` is `Top` or when the resulting depth
    /// would exceed [`MAX_FIELD_DEPTH`].
    pub fn intern_field(&mut self, parent: LocId, field: FieldId) -> LocId {
        if parent == LOC_TOP {
            return LOC_TOP;
        }
        let parent_depth = self.depth(parent);
        if parent_depth >= MAX_FIELD_DEPTH {
            return LOC_TOP;
        }
        let key = (parent, field);
        if let Some(&id) = self.field_lookup.get(&key) {
            return id;
        }
        let id = self.push(AbsLoc::Field { parent, field }, parent_depth + 1);
        self.field_lookup.insert(key, id);
        id
    }

    fn push(&mut self, loc: AbsLoc, depth: u8) -> LocId {
        let id = LocId(self.locs.len() as u32);
        self.locs.push(loc);
        self.depths.push(depth);
        id
    }
}
|
/// Coarse classification of a value's points-to set, used by consumers
/// (Phase 2: resource lifecycle) that don't need full set membership but
/// do need to know "is this value's heap identity a *field* of some
/// other value, or does it stand on its own?".
///
/// The classifier is intentionally narrow: only [`PtrProxyHint::FieldOnly`]
/// is interesting to today's consumers, every other shape (empty, root,
/// `Top`, mixed) collapses to [`PtrProxyHint::Other`] so the consumer
/// keeps its existing behaviour.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum PtrProxyHint {
    /// Every member of the points-to set is an [`AbsLoc::Field`]. The
    /// value is a sub-object alias — e.g. `m` in `m := c.mu`.
    FieldOnly,
    /// Anything else: the set is empty, contains a root location
    /// ([`AbsLoc::SelfParam`] / [`AbsLoc::Param`] / [`AbsLoc::Alloc`]),
    /// contains [`AbsLoc::Top`], or mixes fields with roots. Consumers
    /// fall back to their default behaviour.
    Other,
}

/// Bounded points-to set: a small sorted vector of [`LocId`]s.
///
/// "Bounded" means the set silently collapses to `{Top}` on overflow;
/// downstream consumers treat `Top`-containing sets as
/// over-approximations exactly the same way [`AbsLoc::Top`] is treated
/// at the singleton level.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct PointsToSet {
    /// Sorted, deduped list of locations. When the cap is exceeded
    /// the set is replaced by `[LOC_TOP]`.
    ids: SmallVec<[LocId; 4]>,
}

impl Default for PointsToSet {
    fn default() -> Self {
        Self::empty()
    }
}

impl PointsToSet {
    /// Empty set — the value points to nothing tracked by the
    /// analysis (e.g. a scalar constant).
    pub fn empty() -> Self {
        Self {
            ids: SmallVec::new(),
        }
    }

    /// Singleton set wrapping `id`.
    pub fn singleton(id: LocId) -> Self {
        let mut ids = SmallVec::new();
        ids.push(id);
        Self { ids }
    }

    /// `{Top}` — the universal over-approximation.
    pub fn top() -> Self {
        Self::singleton(LOC_TOP)
    }

    /// True when the set contains [`LOC_TOP`] (i.e. has saturated to
    /// the over-approximation).
    pub fn is_top(&self) -> bool {
        self.ids.contains(&LOC_TOP)
    }

    pub fn is_empty(&self) -> bool {
        self.ids.is_empty()
    }

    pub fn len(&self) -> usize {
        self.ids.len()
    }

    /// Iterate over members in sorted order.
    pub fn iter(&self) -> impl Iterator<Item = LocId> + '_ {
        self.ids.iter().copied()
    }

    /// Whether `id` is one of the set members (or the set is `Top`).
    pub fn contains(&self, id: LocId) -> bool {
        if self.is_top() {
            return true;
        }
        self.ids.binary_search(&id).is_ok()
    }

    /// Insert `id`, maintaining sort/dedup. Saturates to `{Top}`
    /// when the set would exceed [`MAX_POINTSTO_MEMBERS`].
    pub fn insert(&mut self, id: LocId) {
        if self.is_top() {
            return;
        }
        if id == LOC_TOP {
            self.ids.clear();
            self.ids.push(LOC_TOP);
            return;
        }
        match self.ids.binary_search(&id) {
            Ok(_) => {}
            Err(pos) => {
                if self.ids.len() >= MAX_POINTSTO_MEMBERS {
                    self.ids.clear();
                    self.ids.push(LOC_TOP);
                } else {
                    self.ids.insert(pos, id);
                }
            }
        }
    }

    /// Set-union, in place. Returns `true` when `self` changed —
    /// the constraint solver uses the bit to decide whether the
    /// containing equivalence class needs another pass.
    pub fn union_in_place(&mut self, other: &PointsToSet) -> bool {
        if self.is_top() {
            return false;
        }
        if other.is_top() {
            let was_top = self.is_top();
            self.ids.clear();
            self.ids.push(LOC_TOP);
            return !was_top;
        }
        let mut changed = false;
        for id in other.iter() {
            if id == LOC_TOP {
                let was_top = self.is_top();
                self.ids.clear();
                self.ids.push(LOC_TOP);
                return !was_top;
            }
            match self.ids.binary_search(&id) {
                Ok(_) => {}
                Err(pos) => {
                    if self.ids.len() >= MAX_POINTSTO_MEMBERS {
                        self.ids.clear();
                        self.ids.push(LOC_TOP);
                        return true;
                    }
                    self.ids.insert(pos, id);
                    changed = true;
                }
            }
        }
        changed
    }
}
|
#[cfg(test)]
mod tests {
    use super::*;

    fn body() -> BodyId {
        BodyId(0)
    }

    #[test]
    fn loc_top_is_zero() {
        let interner = LocInterner::new();
        assert_eq!(interner.len(), 1);
        assert_eq!(interner.resolve(LOC_TOP), &AbsLoc::Top);
    }

    #[test]
    fn alloc_intern_dedupes() {
        let mut interner = LocInterner::new();
        let a = interner.intern_alloc(body(), 7);
        let b = interner.intern_alloc(body(), 7);
        let c = interner.intern_alloc(body(), 8);
        assert_eq!(a, b);
        assert_ne!(a, c);
    }

    #[test]
    fn param_intern_dedupes_by_index() {
        let mut interner = LocInterner::new();
        let p0 = interner.intern_param(body(), 0);
        let p1 = interner.intern_param(body(), 1);
        let p0_again = interner.intern_param(body(), 0);
        assert_eq!(p0, p0_again);
        assert_ne!(p0, p1);
    }

    #[test]
    fn field_intern_dedupes_structurally() {
        let mut interner = LocInterner::new();
        let parent = interner.intern_self_param(body());
        let f = FieldId(7);
        let a = interner.intern_field(parent, f);
        let b = interner.intern_field(parent, f);
        assert_eq!(a, b, "same parent + same field id ⇒ same loc id");
    }

    #[test]
    fn field_chain_depth_bounded() {
        let mut interner = LocInterner::new();
        let mut cur = interner.intern_self_param(body());
        let f = FieldId(1);
        for _ in 0..MAX_FIELD_DEPTH {
            cur = interner.intern_field(cur, f);
            assert_ne!(cur, LOC_TOP, "depth ≤ MAX should not fold");
        }
        let folded = interner.intern_field(cur, f);
        assert_eq!(folded, LOC_TOP, "exceeding MAX_FIELD_DEPTH folds to Top");
    }

    #[test]
    fn field_of_top_is_top() {
        let mut interner = LocInterner::new();
        let folded = interner.intern_field(LOC_TOP, FieldId(0));
        assert_eq!(folded, LOC_TOP);
    }

    #[test]
    fn pointsto_set_empty_singleton_top() {
        assert!(PointsToSet::empty().is_empty());
        assert!(PointsToSet::top().is_top());
        let mut interner = LocInterner::new();
        let p = interner.intern_self_param(body());
        let s = PointsToSet::singleton(p);
        assert!(s.contains(p));
        assert!(!s.is_top());
    }

    #[test]
    fn pointsto_set_insert_and_union() {
        let mut interner = LocInterner::new();
        let p0 = interner.intern_param(body(), 0);
        let p1 = interner.intern_param(body(), 1);
        let mut a = PointsToSet::singleton(p0);
        let b = PointsToSet::singleton(p1);
        let changed = a.union_in_place(&b);
        assert!(changed);
        assert_eq!(a.len(), 2);
        assert!(a.contains(p0));
        assert!(a.contains(p1));
        // Re-union is idempotent.
        let changed2 = a.union_in_place(&b);
        assert!(!changed2);
    }

    #[test]
    fn pointsto_set_saturates_to_top_on_overflow() {
        let mut interner = LocInterner::new();
        let mut s = PointsToSet::empty();
        for i in 0..(MAX_POINTSTO_MEMBERS as u32 + 4) {
            s.insert(interner.intern_alloc(body(), i));
        }
        assert!(s.is_top(), "set should collapse to {{Top}} on overflow");
    }

    #[test]
    fn pointsto_set_union_with_top_is_top() {
        let mut interner = LocInterner::new();
        let p = interner.intern_param(body(), 0);
        let mut a = PointsToSet::singleton(p);
        let changed = a.union_in_place(&PointsToSet::top());
        assert!(changed);
        assert!(a.is_top());
    }
}
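The doc comments in `domain.rs` say `LocId`s are dense so that the union-find resolver "can index parent / rank arrays directly". That resolver lives in `analysis.rs`, whose diff is suppressed on this page, so the following is only a generic sketch of the technique and not the real code: union by rank plus path compression over two `Vec`s keyed by plain `u32` ids. In a Steensgaard-style analysis an assignment such as `p = q` merges the classes of whatever `p` and `q` point to, which is the operation `union` models. `UnionFind` is a hypothetical name.

```rust
/// Generic union-find over dense ids `0..n`, standing in for a
/// Steensgaard-style resolver keyed by `LocId(u32)`.
pub struct UnionFind {
    parent: Vec<u32>,
    rank: Vec<u8>,
}

impl UnionFind {
    /// Every id starts in its own singleton class.
    pub fn new(n: usize) -> Self {
        UnionFind {
            parent: (0..n as u32).collect(),
            rank: vec![0; n],
        }
    }

    /// Representative of `x`'s class; compresses the path on the way up.
    pub fn find(&mut self, x: u32) -> u32 {
        let p = self.parent[x as usize];
        if p == x {
            return x;
        }
        let root = self.find(p);
        self.parent[x as usize] = root;
        root
    }

    /// Merges the classes of `a` and `b`, attaching the lower-rank root
    /// under the higher-rank one. Returns `true` if they were distinct.
    pub fn union(&mut self, a: u32, b: u32) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        let (hi, lo) = if self.rank[ra as usize] >= self.rank[rb as usize] {
            (ra, rb)
        } else {
            (rb, ra)
        };
        self.parent[lo as usize] = hi;
        if self.rank[ra as usize] == self.rank[rb as usize] {
            self.rank[hi as usize] += 1;
        }
        true
    }
}
```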
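`PtrProxyHint` documents its rule precisely: `FieldOnly` when every member of the set is a field location, `Other` for empty sets, anything containing `Top`, any root (`SelfParam` / `Param` / `Alloc`), or a mix. The function that computes it is not in `domain.rs`, so here is a self-contained sketch of that rule. `LocKind` is a hypothetical stand-in for `AbsLoc`'s shape, and the enum is a local copy of `PtrProxyHint`, so the snippet compiles without the crate.

```rust
/// Hypothetical stand-in for the shape of an `AbsLoc`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LocKind {
    Top,
    Alloc,
    Param,
    SelfParam,
    Field,
}

/// Local copy of the two-variant hint from `domain.rs`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PtrProxyHint {
    FieldOnly,
    Other,
}

/// `FieldOnly` only for a non-empty set whose members are all fields;
/// empty, any `Top`, any root, or a mix all map to `Other`.
pub fn classify(members: &[LocKind]) -> PtrProxyHint {
    if !members.is_empty() && members.iter().all(|k| *k == LocKind::Field) {
        PtrProxyHint::FieldOnly
    } else {
        PtrProxyHint::Other
    }
}
```

The non-empty check matters: without it, `all` would return `true` for an empty set and misclassify it as `FieldOnly`.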
44
src/pointer/mod.rs
Normal file
@@ -0,0 +1,44 @@
//! Field-sensitive Steensgaard alias / points-to analysis.
//!
//! Sibling pass to [`crate::ssa::heap`]. Where `heap.rs` tracks per-value
//! container identity for taint propagation through container element
//! abstractions, this module tracks **field-sensitive** points-to so the
//! engine can distinguish a receiver from one of its sub-fields:
//!
//! - `c.mu.Lock()` — the lock is acquired on `Field(c, mu)`, not on `c`
//!   itself. Without this distinction the resource-lifecycle pass
//!   mis-attributes the acquire to the receiver and emits a spurious
//!   "leakable resource" finding (the gin / `context.go` FP class).
//! - Cross-method field flow — method A writes `this.cache`, method B
//!   reads `this.cache`; both observe a shared abstract location
//!   `Field(SelfParam, cache)` only when fields have a stable identity
//!   independent of the parent value.
//!
//! Phase 1 of the rollout (this commit) ships the analysis but no
//! consumer. Behaviour is unchanged whether `NYX_POINTER_ANALYSIS=1` is
//! set or not — the analysis is opt-in and only computed when callers
//! ask for it. Phase 2 (resource lifecycle) and Phase 3 (taint engine)
//! will start consuming the resulting facts.

pub mod analysis;
pub mod domain;

pub use analysis::{
    PointsToFacts, analyse_body, extract_field_points_to, is_container_read_callee_pub,
    is_container_write_callee,
};
pub use domain::{AbsLoc, LocId, LocInterner, PointsToSet, PtrProxyHint};

/// Returns whether the field-sensitive pointer analysis is enabled at runtime.
///
/// Default: enabled (post-Phase-6 flip on 2026-04-26). Set
/// `NYX_POINTER_ANALYSIS=0` (or `false`) to disable for one release
/// cycle so customer scans can compare baselines. The env-var
/// override is removed entirely in the next release.
#[inline]
pub fn is_enabled() -> bool {
    !matches!(
        std::env::var("NYX_POINTER_ANALYSIS").ok().as_deref(),
        Some("0") | Some("false") | Some("FALSE")
    )
}
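`is_enabled` reads the process environment on every call, which makes its accepted spellings awkward to unit-test. One option (an assumption, not something this commit does) is to split the decision into a pure function over `Option<&str>`. The sketch keeps the exact semantics above: unset or any unrecognised value means enabled, and only `"0"`, `"false"` and `"FALSE"` disable the analysis, so a mixed-case `"False"` still leaves it on. `enabled_from` is a hypothetical name.

```rust
/// Pure form of the `NYX_POINTER_ANALYSIS` check: `None` (unset) and any
/// unrecognised value leave the analysis enabled.
pub fn enabled_from(value: Option<&str>) -> bool {
    !matches!(value, Some("0") | Some("false") | Some("FALSE"))
}

/// Same observable behaviour as `is_enabled`, delegating to the pure helper.
pub fn is_enabled() -> bool {
    enabled_from(std::env::var("NYX_POINTER_ANALYSIS").ok().as_deref())
}
```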
@ -1,4 +1,6 @@
|
|||
use crate::server::jobs::JobManager;
|
||||
use crate::server::models::{FilterValues, FindingSummary, FindingView};
|
||||
use crate::server::observability;
|
||||
use crate::server::progress::TimingBreakdown;
|
||||
use crate::server::routes;
|
||||
use crate::server::security::LocalServerSecurity;
|
||||
|
|
@@ -41,6 +43,21 @@ pub enum ServerEvent {
     ConfigChanged,
 }
 
+/// Pre-computed views over the latest scan's findings.
+///
+/// Built once per completed scan and reused across `/findings`,
+/// `/findings/summary`, `/findings/filters`, and `/overview` requests so we
+/// don't re-walk the diag list (or re-deserialize from SQLite) on every hit.
+/// The `job_id` lets readers detect a stale entry without holding a write
+/// lock on hot paths.
+#[derive(Debug, Clone)]
+pub struct CachedFindings {
+    pub job_id: String,
+    pub views: Arc<Vec<FindingView>>,
+    pub summary: Arc<FindingSummary>,
+    pub filters: Arc<FilterValues>,
+}
+
 /// Shared application state accessible to all route handlers.
 #[derive(Clone)]
 pub struct AppState {
@@ -52,6 +69,7 @@ pub struct AppState {
     pub job_manager: Arc<JobManager>,
     pub event_tx: broadcast::Sender<ServerEvent>,
     pub db_pool: Option<Arc<Pool<SqliteConnectionManager>>>,
+    pub findings_cache: Arc<RwLock<Option<CachedFindings>>>,
 }
 
 /// 50 MiB cap on request bodies — generous for config uploads, tight
@@ -83,6 +101,7 @@ pub fn build_router(state: AppState) -> Router {
             security,
             crate::server::security::guard_requests,
         ))
+        .layer(middleware::from_fn(observability::observe))
         .layer(CompressionLayer::new())
         .layer(SetResponseHeaderLayer::overriding(
             HeaderName::from_static("x-frame-options"),
@@ -124,6 +143,7 @@ mod tests {
             job_manager: Arc::new(JobManager::new(4, 8 * 1024 * 1024)),
             event_tx,
             db_pool: None,
+            findings_cache: Arc::new(RwLock::new(None)),
         }
     }
 
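`CachedFindings` records the `job_id` it was built for, so a handler can compare that id with the latest job under a read lock and take the write lock only when it has to rebuild. The sketch below shows that check under stated assumptions: it uses `std::sync::RwLock` (the server's `RwLock` import is not visible in this diff and may be an async lock), and `CachedFindings` is a hypothetical version trimmed to a single `views` field. `cached_views` and `store` are illustrative names.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, trimmed-down `CachedFindings` with a single payload field.
struct CachedFindings {
    job_id: String,
    views: Arc<Vec<String>>,
}

/// Returns the cached views only when they were built for `latest_job`.
/// Only a read lock is taken, so concurrent readers do not block each other.
fn cached_views(
    cache: &RwLock<Option<CachedFindings>>,
    latest_job: &str,
) -> Option<Arc<Vec<String>>> {
    let guard = cache.read().ok()?;
    match guard.as_ref() {
        // Cloning the Arc is cheap; the views themselves are shared.
        Some(c) if c.job_id == latest_job => Some(Arc::clone(&c.views)),
        // Empty, or built for an older job: the caller rebuilds.
        _ => None,
    }
}

/// Rebuild path: the only place that takes the write lock.
fn store(cache: &RwLock<Option<CachedFindings>>, job_id: &str, views: Vec<String>) {
    if let Ok(mut guard) = cache.write() {
        *guard = Some(CachedFindings {
            job_id: job_id.to_string(),
            views: Arc::new(views),
        });
    }
}

fn main() {
    let cache = RwLock::new(None);
    assert!(cached_views(&cache, "job-1").is_none());

    store(&cache, "job-1", vec!["finding-a".to_string()]);
    assert_eq!(cached_views(&cache, "job-1").map(|v| v.len()), Some(1));

    // A newer job id marks the entry stale without touching the write lock.
    assert!(cached_views(&cache, "job-2").is_none());
}
```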
@@ -6,11 +6,17 @@
 //! analysis pipeline on a single file/function for debug inspection.
 
 use crate::ast::build_cfg_for_file;
+use crate::auth_analysis::model::{
+    AnalysisUnit, AuthCheck, AuthorizationModel, CallSite, RouteRegistration, SensitiveOperation,
+    ValueRef,
+};
 use crate::callgraph::{CallGraph, CallGraphAnalysis};
 use crate::cfg::{Cfg, EdgeKind, FileCfg, FuncSummaries, StmtKind};
 use crate::constraint::{CompOp, ConditionExpr, ConstValue, Operand};
 use crate::labels::{Cap, DataLabel};
+use crate::pointer::{AbsLoc, PointsToFacts};
 use crate::ssa::ir::*;
+use crate::ssa::type_facts::{TypeFactResult, TypeKind};
 use crate::ssa::{self, OptimizeResult};
 use crate::state::symbol::SymbolInterner;
 use crate::summary::GlobalSummaries;
@@ -100,6 +106,13 @@ fn label_str(l: &DataLabel) -> String {
 pub struct FunctionInfo {
     pub name: String,
     pub namespace: String,
+    /// Enclosing container path (class / impl / module / outer function).
+    /// Empty for free top-level functions. Surfaced so the UI can render
+    /// closures as `<anon#N> [in outer_fn]`.
+    pub container: String,
+    /// Structural [`crate::symbol::FuncKind`] slug (`"fn"`, `"method"`,
+    /// `"closure"`, ...). Lets the UI offer a closure-filter toggle.
+    pub func_kind: String,
     pub param_count: usize,
     pub line: usize,
     pub source_caps: Vec<String>,
@@ -298,6 +311,7 @@ fn op_view(op: &SsaOp) -> (String, Vec<String>) {
             callee,
             args,
             receiver,
+            ..
         } => {
             let mut ops = Vec::new();
             if let Some(rv) = receiver {
@@ -320,6 +334,18 @@ fn op_view(op: &SsaOp) -> (String, Vec<String>) {
         SsaOp::CatchParam => ("CatchParam".into(), vec![]),
         SsaOp::Nop => ("Nop".into(), vec![]),
         SsaOp::Undef => ("Undef".into(), vec![]),
+        // FieldProj prints field-id (resolution to name requires the
+        // owning SsaBody, which the serializer does not have here).
+        // Debug consumers walk to the owning body when the name matters.
+        SsaOp::FieldProj {
+            receiver, field, ..
+        } => (
+            "FieldProj".into(),
+            vec![
+                format!("recv=v{}", receiver.0),
+                format!("field={}", field.0),
+            ],
+        ),
     }
 }
 
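The `FieldProj` arm renders only the numeric field id, because the serializer cannot see the owning body's interner. The sketch below shows the lookup a debug consumer could run once it has that interner, with a `#id` fallback for ids the table does not know. `FieldTable` is a hypothetical stand-in for the body's field interner and assumes dense ids starting at 0.

```rust
use std::collections::HashMap;

/// Hypothetical field-name interner: ids are dense indices into `names`.
#[derive(Default)]
struct FieldTable {
    names: Vec<String>,
    ids: HashMap<String, u32>,
}

impl FieldTable {
    /// Returns the existing id for `name`, or assigns the next free one.
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    /// Resolves an id for display. Ids outside this table (for example,
    /// ones taken from a different body) fall back to `#id`, not a panic.
    fn display(&self, id: u32) -> String {
        match self.names.get(id as usize) {
            Some(name) => name.clone(),
            None => format!("#{id}"),
        }
    }
}

fn main() {
    let mut fields = FieldTable::default();
    let mu = fields.intern("mu");
    assert_eq!(fields.intern("mu"), mu); // re-interning returns the same id
    assert_eq!(fields.display(mu), "mu");
    assert_eq!(fields.display(99), "#99");
}
```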
@@ -753,6 +779,13 @@ pub struct FuncSummaryView {
     pub file_path: String,
     pub lang: String,
     pub namespace: String,
+    /// Enclosing container path (class / impl / module / outer function).
+    /// Empty for free top-level functions.
+    pub container: String,
+    /// Structural [`crate::symbol::FuncKind`] slug — `"fn"`, `"method"`,
+    /// `"closure"`, etc. Lets the UI distinguish anonymous closures from
+    /// named functions for filtering.
+    pub func_kind: String,
     pub arity: Option<usize>,
     pub param_count: usize,
     pub source_caps: Vec<String>,
@@ -832,6 +865,8 @@ impl FuncSummaryView {
             file_path: summary.file_path.clone(),
             lang: format!("{:?}", key.lang),
             namespace: key.namespace.clone(),
+            container: key.container.clone(),
+            func_kind: key.kind.as_str().to_string(),
             arity: key.arity,
             param_count: summary.param_count,
             source_caps: cap_names(Cap::from_bits_truncate(summary.source_caps)),
@@ -864,6 +899,480 @@ fn transform_str(t: &TaintTransform) -> String {
     }
 }
 
+// ── Pointer / Points-to ──────────────────────────────────────────────────────
+
+#[derive(Debug, Serialize)]
+pub struct PointerLocationView {
+    pub id: u32,
+    pub kind: String,
+    pub display: String,
+    /// Parent location id for `Field { parent, field }` chains.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub parent: Option<u32>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub field: Option<String>,
+}
+
+#[derive(Debug, Serialize)]
+pub struct PointerValueView {
+    pub ssa_value: u32,
+    pub var_name: Option<String>,
+    /// `LocId`s referencing entries in [`PointerView::locations`].
+    pub points_to: Vec<u32>,
+    pub is_top: bool,
+}
+
+#[derive(Debug, Serialize)]
+pub struct PointerFieldEntryView {
+    /// Parameter index, or `null` for the implicit receiver.
+    pub param_index: Option<u32>,
+    pub field: String,
+}
+
+#[derive(Debug, Serialize)]
+pub struct PointerView {
+    pub locations: Vec<PointerLocationView>,
+    pub values: Vec<PointerValueView>,
+    /// Field reads attributed to params/receiver via the field-points-to
+    /// extractor (Phase 5).
+    pub field_reads: Vec<PointerFieldEntryView>,
+    /// Field writes attributed to params/receiver via the field-points-to
+    /// extractor (Phase 5).
+    pub field_writes: Vec<PointerFieldEntryView>,
+    /// Number of distinct interned locations beyond the reserved Top sentinel.
+    pub location_count: usize,
+}
+
+impl PointerView {
+    pub fn from_facts(facts: &PointsToFacts, ssa: &SsaBody) -> Self {
+        // Determine which LocIds are referenced by any pt set so we only
+        // emit those (plus Top when referenced).
+        let mut referenced: std::collections::BTreeSet<u32> = std::collections::BTreeSet::new();
+        for v in 0..ssa.num_values() as u32 {
+            let set = facts.pt(SsaValue(v));
+            for loc in set.iter() {
+                referenced.insert(loc.0);
+            }
+        }
+
+        // Build location views in interner order so parent ids land before
+        // child Field locations.
+        let mut locations: Vec<PointerLocationView> = Vec::new();
+        for raw_id in 0..facts.interner.len() as u32 {
+            if !referenced.contains(&raw_id) {
+                continue;
+            }
+            let loc_id = crate::pointer::LocId(raw_id);
+            let abs = facts.interner.resolve(loc_id);
+            let (kind, display, parent, field) = match abs {
+                AbsLoc::Top => ("Top".to_string(), "⊤".to_string(), None, None),
+                AbsLoc::Alloc(_, ssa_v) => {
+                    ("Alloc".to_string(), format!("alloc#v{}", ssa_v), None, None)
+                }
+                AbsLoc::Param(_, idx) => {
+                    ("Param".to_string(), format!("param[{}]", idx), None, None)
+                }
+                AbsLoc::SelfParam(_) => ("SelfParam".to_string(), "self".to_string(), None, None),
+                AbsLoc::Field { parent, field } => {
+                    let field_name = if *field == FieldId::ELEM {
+                        "<elem>".to_string()
+                    } else if (field.0 as usize) < ssa.field_interner.len() {
+                        ssa.field_interner.resolve(*field).to_string()
+                    } else {
+                        format!("#{}", field.0)
+                    };
+                    (
+                        "Field".to_string(),
+                        format!(".{}", field_name),
+                        Some(parent.0),
+                        Some(field_name),
+                    )
+                }
+            };
+            locations.push(PointerLocationView {
+                id: raw_id,
+                kind,
+                display,
+                parent,
+                field,
+            });
+        }
+
+        // Per-value pt sets — emit only values with non-empty sets to keep
+        // the payload focused on interesting facts.
+        let mut values: Vec<PointerValueView> = Vec::new();
+        for v in 0..ssa.num_values() as u32 {
+            let set = facts.pt(SsaValue(v));
+            if set.is_empty() {
+                continue;
+            }
+            values.push(PointerValueView {
+                ssa_value: v,
+                var_name: ssa
+                    .value_defs
+                    .get(v as usize)
+                    .and_then(|d| d.var_name.clone()),
+                points_to: set.iter().map(|loc| loc.0).collect(),
+                is_top: set.is_top(),
+            });
+        }
+
+        // Field reads / writes summary derived from the body + facts.
+        let summary = crate::pointer::extract_field_points_to(ssa, facts);
+        let to_field_entries = |entries: &[(u32, smallvec::SmallVec<[String; 2]>)]| {
+            entries
+                .iter()
+                .flat_map(|(idx, fields)| {
+                    let pi = if *idx == u32::MAX { None } else { Some(*idx) };
+                    fields.iter().map(move |f| PointerFieldEntryView {
+                        param_index: pi,
+                        field: f.clone(),
+                    })
+                })
+                .collect()
+        };
+        let field_reads = to_field_entries(&summary.param_field_reads);
+        let field_writes = to_field_entries(&summary.param_field_writes);
+
+        let location_count = facts.interner.len().saturating_sub(1);
+        PointerView {
+            locations,
+            values,
+            field_reads,
+            field_writes,
+            location_count,
+        }
+    }
+}
+
+// ── Type Facts (standalone view) ─────────────────────────────────────────────
+
+#[derive(Debug, Serialize)]
+pub struct DtoFieldView {
+    pub name: String,
+    pub kind: String,
+}
+
+#[derive(Debug, Serialize)]
+pub struct DtoFactView {
+    pub class_name: String,
+    pub fields: Vec<DtoFieldView>,
+}
+
+#[derive(Debug, Serialize)]
+pub struct TypeFactDetailView {
+    pub ssa_value: u32,
+    pub var_name: Option<String>,
+    pub line: usize,
+    /// Type kind tag — matches the [`TypeKind`] discriminant
+    /// (`String`, `Int`, `HttpClient`, `Dto`, …).
+    pub kind: String,
+    /// True when the value is allowed to be null/None.
+    pub nullable: bool,
+    /// Container/class name — set for `HttpClient`, `DatabaseConnection`,
+    /// `Dto`, etc. Mirrors [`TypeKind::container_name`].
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub container: Option<String>,
+    /// DTO field shape, populated only when `kind == "Dto"`.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub dto: Option<DtoFactView>,
+}
+
+#[derive(Debug, Serialize)]
+pub struct TypeFactsView {
+    pub facts: Vec<TypeFactDetailView>,
+    /// Total count of values reaching the analysis (for the "X of Y" header).
+    pub total_values: usize,
+    /// Count of values where the inferred type is `Unknown`. Surfaced so
+    /// the UI can show coverage at a glance.
+    pub unknown_count: usize,
+}
+
+impl TypeFactsView {
+    pub fn from_optimize(opt: &OptimizeResult, ssa: &SsaBody, bytes: &[u8]) -> Self {
+        Self::from_type_facts(&opt.type_facts, ssa, bytes)
+    }
+
+    pub fn from_type_facts(tf: &TypeFactResult, ssa: &SsaBody, bytes: &[u8]) -> Self {
+        let total_values = ssa.num_values();
+        let unknown_count = tf
+            .facts
+            .values()
+            .filter(|f| matches!(f.kind, TypeKind::Unknown))
+            .count();
+
+        let mut facts: Vec<TypeFactDetailView> = tf
+            .facts
+            .iter()
+            .filter(|(_, f)| !matches!(f.kind, TypeKind::Unknown))
+            .map(|(sv, fact)| {
+                // Find the defining instruction for this SSA value so we can
+                // resolve its source line. Falls back to 0 when no inst
+                // matches (the value lives only in `value_defs`).
+                let span: (usize, usize) = ssa
+                    .blocks
+                    .iter()
+                    .find_map(|blk| {
+                        blk.phis
+                            .iter()
+                            .chain(blk.body.iter())
+                            .find(|i| i.value == *sv)
+                            .map(|i| i.span)
+                    })
+                    .unwrap_or_default();
+                let line = byte_offset_to_line(bytes, span.0);
+
+                let dto = match &fact.kind {
+                    TypeKind::Dto(d) => Some(DtoFactView {
+                        class_name: d.class_name.clone(),
+                        fields: d
+                            .fields
+                            .iter()
+                            .map(|(name, k)| DtoFieldView {
+                                name: name.clone(),
+                                kind: type_kind_tag(k),
+                            })
+                            .collect(),
+                    }),
+                    _ => None,
+                };
+
+                TypeFactDetailView {
+                    ssa_value: sv.0,
+                    var_name: ssa
+                        .value_defs
+                        .get(sv.0 as usize)
+                        .and_then(|d| d.var_name.clone()),
+                    line,
+                    kind: type_kind_tag(&fact.kind),
+                    nullable: fact.nullable,
+                    container: fact.kind.container_name(),
+                    dto,
+                }
+            })
+            .collect();
+        facts.sort_by_key(|v| v.ssa_value);
+
+        TypeFactsView {
+            facts,
+            total_values,
+            unknown_count,
+        }
+    }
+}
+
+/// Stable string tag for a [`TypeKind`] (used by both the TypeFacts view
+/// and DTO field rendering). Uses the variant name so the UI can map
+/// each tag to a colour without parsing free-form `Debug` strings.
+fn type_kind_tag(k: &TypeKind) -> String {
+    match k {
+        TypeKind::String => "String".into(),
+        TypeKind::Int => "Int".into(),
+        TypeKind::Bool => "Bool".into(),
+        TypeKind::Object => "Object".into(),
+        TypeKind::Array => "Array".into(),
+        TypeKind::Null => "Null".into(),
+        TypeKind::Unknown => "Unknown".into(),
+        TypeKind::HttpResponse => "HttpResponse".into(),
+        TypeKind::DatabaseConnection => "DatabaseConnection".into(),
+        TypeKind::FileHandle => "FileHandle".into(),
+        TypeKind::Url => "Url".into(),
+        TypeKind::HttpClient => "HttpClient".into(),
+        TypeKind::LocalCollection => "LocalCollection".into(),
+        TypeKind::Dto(_) => "Dto".into(),
+    }
+}
+
+// ── Auth Analysis ────────────────────────────────────────────────────────────
+
+#[derive(Debug, Serialize)]
+pub struct AuthValueRefView {
+    pub source_kind: String,
+    pub name: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub base: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub field: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub index: Option<String>,
+    pub line: usize,
+}
+
+#[derive(Debug, Serialize)]
+pub struct AuthCheckView {
+    pub kind: String,
+    pub callee: String,
+    pub line: usize,
+    pub subjects: Vec<AuthValueRefView>,
+    pub args: Vec<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub condition_text: Option<String>,
+}
+
+#[derive(Debug, Serialize)]
+pub struct AuthOperationView {
+    pub kind: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub sink_class: Option<String>,
+    pub callee: String,
+    pub line: usize,
+    pub text: String,
+    pub subjects: Vec<AuthValueRefView>,
+}
+
+#[derive(Debug, Serialize)]
+pub struct AuthCallSiteView {
+    pub name: String,
+    pub line: usize,
+    pub args: Vec<String>,
+}
+
+#[derive(Debug, Serialize)]
+pub struct AuthUnitView {
+    pub kind: String,
+    pub name: Option<String>,
+    pub line: usize,
+    pub params: Vec<String>,
+    pub auth_checks: Vec<AuthCheckView>,
+    pub operations: Vec<AuthOperationView>,
+    pub call_sites: Vec<AuthCallSiteView>,
+    pub self_actor_vars: Vec<String>,
+    pub typed_bounded_vars: Vec<String>,
+    pub authorized_sql_vars: Vec<String>,
+    pub const_bound_vars: Vec<String>,
+}
+
+#[derive(Debug, Serialize)]
+pub struct AuthRouteView {
+    pub framework: String,
+    pub method: String,
+    pub path: String,
+    pub middleware: Vec<String>,
+    pub handler_params: Vec<String>,
+    pub line: usize,
+    pub unit_idx: usize,
+}
+
+#[derive(Debug, Serialize)]
+pub struct AuthAnalysisView {
+    pub routes: Vec<AuthRouteView>,
+    pub units: Vec<AuthUnitView>,
+    /// Whether the auth-analysis rule set is enabled for the file's
+    /// language. When `false`, the model is intentionally empty and the
+    /// UI should surface that the analysis is skipped (not failing).
+    pub enabled: bool,
+}
+
+impl AuthAnalysisView {
+    pub fn from_model(model: &AuthorizationModel, bytes: &[u8], enabled: bool) -> Self {
+        let routes = model.routes.iter().map(|r| route_view(r, bytes)).collect();
+        let units = model.units.iter().map(|u| unit_view(u, bytes)).collect();
+        AuthAnalysisView {
+            routes,
+            units,
+            enabled,
+        }
+    }
+}
+
+fn value_ref_view(vr: &ValueRef, bytes: &[u8]) -> AuthValueRefView {
+    AuthValueRefView {
+        source_kind: format!("{:?}", vr.source_kind),
+        name: vr.name.clone(),
+        base: vr.base.clone(),
+        field: vr.field.clone(),
+        index: vr.index.clone(),
+        line: byte_offset_to_line(bytes, vr.span.0),
+    }
+}
+
+fn auth_check_view(c: &AuthCheck, bytes: &[u8]) -> AuthCheckView {
+    AuthCheckView {
+        kind: format!("{:?}", c.kind),
+        callee: c.callee.clone(),
+        line: c.line,
+        subjects: c
+            .subjects
+            .iter()
+            .map(|s| value_ref_view(s, bytes))
+            .collect(),
+        args: c.args.clone(),
+        condition_text: c.condition_text.clone(),
+    }
+}
+
+fn operation_view(op: &SensitiveOperation, bytes: &[u8]) -> AuthOperationView {
+    AuthOperationView {
+        kind: format!("{:?}", op.kind),
+        sink_class: op.sink_class.map(|c| format!("{:?}", c)),
+        callee: op.callee.clone(),
+        line: op.line,
+        text: op.text.clone(),
+        subjects: op
+            .subjects
+            .iter()
+            .map(|s| value_ref_view(s, bytes))
+            .collect(),
+    }
+}
+
+fn call_site_view(c: &CallSite, bytes: &[u8]) -> AuthCallSiteView {
+    AuthCallSiteView {
+        name: c.name.clone(),
+        line: byte_offset_to_line(bytes, c.span.0),
+        args: c.args.clone(),
+    }
+}
+
+fn unit_view(unit: &AnalysisUnit, bytes: &[u8]) -> AuthUnitView {
+    let mut self_actor_vars: Vec<String> = unit.self_actor_vars.iter().cloned().collect();
+    self_actor_vars.sort();
+    let mut typed_bounded_vars: Vec<String> = unit.typed_bounded_vars.iter().cloned().collect();
+    typed_bounded_vars.sort();
+    let mut authorized_sql_vars: Vec<String> = unit.authorized_sql_vars.iter().cloned().collect();
+    authorized_sql_vars.sort();
+    let mut const_bound_vars: Vec<String> = unit.const_bound_vars.iter().cloned().collect();
+    const_bound_vars.sort();
+
+    AuthUnitView {
+        kind: format!("{:?}", unit.kind),
+        name: unit.name.clone(),
+        line: unit.line,
+        params: unit.params.clone(),
+        auth_checks: unit
+            .auth_checks
+            .iter()
+            .map(|c| auth_check_view(c, bytes))
+            .collect(),
+        operations: unit
+            .operations
+            .iter()
+            .map(|op| operation_view(op, bytes))
+            .collect(),
+        call_sites: unit
+            .call_sites
+            .iter()
+            .map(|c| call_site_view(c, bytes))
+            .collect(),
+        self_actor_vars,
+        typed_bounded_vars,
+        authorized_sql_vars,
+        const_bound_vars,
+    }
+}
+
+fn route_view(r: &RouteRegistration, _bytes: &[u8]) -> AuthRouteView {
+    AuthRouteView {
+        framework: format!("{:?}", r.framework),
+        method: format!("{:?}", r.method),
+        path: r.path.clone(),
+        middleware: r.middleware.clone(),
+        handler_params: r.handler_params.clone(),
+        line: r.line,
+        unit_idx: r.unit_idx,
+    }
+}
+
 // ═════════════════════════════════════════════════════════════════════════════
 // On-demand analysis pipeline
 // ═════════════════════════════════════════════════════════════════════════════
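`PointerView::from_facts` first gathers every location id that appears in some value's points-to set. It then walks the interner upward from id 0 and keeps only those ids. A `Field` location can only be interned after its parent, so whenever both are referenced this walk places the parent ahead of the child. The self-contained sketch below does the same filtering. It uses a hypothetical `Loc` enum and plain vectors in place of `LocInterner` and `PointsToSet`.

```rust
use std::collections::BTreeSet;

/// Hypothetical abstract location. A `Field` stores its parent's id, and an
/// interner can only hand out that id after the parent has been interned.
enum Loc {
    Param,
    Field { parent: u32, name: &'static str },
}

/// Keeps only the locations some value points to, in interner (id) order.
fn referenced_locations<'a>(interner: &'a [Loc], pt_sets: &[Vec<u32>]) -> Vec<(u32, &'a Loc)> {
    // Every id seen in any points-to set. The BTreeSet drops duplicates
    // when several values share a location.
    let referenced: BTreeSet<u32> = pt_sets.iter().flatten().copied().collect();
    // Walking ids in ascending order makes the output follow interning order.
    (0..interner.len() as u32)
        .filter(|id| referenced.contains(id))
        .map(|id| (id, &interner[id as usize]))
        .collect()
}

fn main() {
    let interner = vec![
        Loc::Param,                           // id 0
        Loc::Field { parent: 0, name: "mu" }, // id 1
        Loc::Param,                           // id 2, never pointed to
    ];
    let pt_sets: Vec<Vec<u32>> = vec![vec![0], vec![1, 0], vec![]];
    let view = referenced_locations(&interner, &pt_sets);

    let ids: Vec<u32> = view.iter().map(|(id, _)| *id).collect();
    assert_eq!(ids, vec![0, 1]);

    for (pos, (_, loc)) in view.iter().enumerate() {
        if let Loc::Field { parent, name } = loc {
            assert_eq!(*name, "mu");
            // The parent was interned first, so it appears earlier.
            assert!(ids[..pos].contains(parent));
        }
    }
}
```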
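`TypeFactsView::from_type_facts` finds each value's defining instruction by scanning each block's phis and then its body, and converts the start of that instruction's span to a line number. A value with no defining instruction falls back to offset 0. The sketch below reproduces that lookup over simplified types. `Inst`, `defining_line`, and `line_of` are hypothetical. `line_of` assumes 1-based lines counted by `\n`, because `byte_offset_to_line` itself is not part of this diff.

```rust
/// Hypothetical stand-in for an SSA instruction: the value it defines and
/// the byte span of the source text it came from.
struct Inst {
    value: u32,
    span: (usize, usize),
}

/// Hypothetical 1-based line lookup: counts `\n` bytes before `offset`.
fn line_of(bytes: &[u8], offset: usize) -> usize {
    let end = offset.min(bytes.len());
    bytes[..end].iter().filter(|&&b| b == b'\n').count() + 1
}

/// Each block is modelled as `(phis, body)`. Phis are searched before the
/// body, and a value with no defining instruction falls back to span (0, 0).
fn defining_line(blocks: &[(Vec<Inst>, Vec<Inst>)], value: u32, bytes: &[u8]) -> usize {
    let span = blocks
        .iter()
        .find_map(|(phis, body)| {
            phis.iter()
                .chain(body.iter())
                .find(|i| i.value == value)
                .map(|i| i.span)
        })
        .unwrap_or_default();
    line_of(bytes, span.0)
}

fn main() {
    let src = b"let a = 1;\nlet b = a + 2;\n";
    let blocks: Vec<(Vec<Inst>, Vec<Inst>)> = vec![(
        Vec::new(),
        vec![
            Inst { value: 0, span: (4, 5) },   // `a`, line 1
            Inst { value: 1, span: (15, 16) }, // `b`, line 2
        ],
    )];
    assert_eq!(defining_line(&blocks, 0, src), 1);
    assert_eq!(defining_line(&blocks, 1, src), 2);
    // No defining instruction: span (0, 0) resolves to line 1.
    assert_eq!(defining_line(&blocks, 7, src), 1);
}
```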
@@ -914,6 +1423,8 @@ pub fn function_list(analysis: &FileAnalysis) -> Vec<FunctionInfo> {
         .map(|(key, summary)| FunctionInfo {
             name: key.name.clone(),
             namespace: key.namespace.clone(),
+            container: key.container.clone(),
+            func_kind: key.kind.as_str().to_string(),
             param_count: summary.param_count,
             line: byte_offset_to_line(&analysis.bytes, analysis.cfg()[summary.entry].ast.span.0),
             source_caps: cap_names(summary.source_caps),
@@ -924,10 +1435,16 @@ pub fn function_list(analysis: &FileAnalysis) -> Vec<FunctionInfo> {
 }
 
 /// Lower a single function to SSA and optimize it.
-pub fn analyse_function_ssa(
-    analysis: &FileAnalysis,
+///
+/// Returns the per-function body graph alongside the SSA. SSA is lowered
+/// against `body.graph`, whose `NodeIndex` space is body-local — the file's
+/// top-level CFG (`analysis.cfg()`) has a different index space, so any
+/// downstream analysis that indexes by `inst.cfg_node` must use the returned
+/// `&Cfg`, not `analysis.cfg()`.
+pub fn analyse_function_ssa<'a>(
+    analysis: &'a FileAnalysis,
     func_name: &str,
-) -> Result<(SsaBody, OptimizeResult), StatusCode> {
+) -> Result<(SsaBody, OptimizeResult, &'a Cfg), StatusCode> {
     // Find the function body by name from the per-body CFGs.
     let body = analysis
         .file_cfg
@@ -945,9 +1462,48 @@ pub fn analyse_function_ssa(
     );
 
     let mut ssa = ssa_result.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
-    let opt = ssa::optimize_ssa(&mut ssa, &body.graph, Some(analysis.lang));
+    let opt = ssa::optimize_ssa_with_param_types(
+        &mut ssa,
+        &body.graph,
+        Some(analysis.lang),
+        &body.meta.param_types,
+    );
 
-    Ok((ssa, opt))
+    Ok((ssa, opt, &body.graph))
 }
 
+/// Lower a function and run the field-sensitive Steensgaard pointer
+/// analysis on its body. Returns the SSA body alongside the resulting
+/// [`PointsToFacts`] so the debug view can attribute names to SSA values.
+pub fn analyse_function_pointer(
+    analysis: &FileAnalysis,
+    func_name: &str,
+) -> Result<(SsaBody, PointsToFacts), StatusCode> {
+    let body = analysis
+        .file_cfg
+        .bodies
+        .iter()
+        .find(|b| b.meta.name.as_deref() == Some(func_name))
+        .ok_or(StatusCode::NOT_FOUND)?;
+
+    let ssa_result = crate::ssa::lower::lower_to_ssa_with_params(
+        &body.graph,
+        body.entry,
+        Some(func_name),
+        false,
+        &body.meta.params,
+    );
+
+    let mut ssa = ssa_result.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+    let _opt = ssa::optimize_ssa_with_param_types(
+        &mut ssa,
+        &body.graph,
+        Some(analysis.lang),
+        &body.meta.param_types,
+    );
+
+    let facts = crate::pointer::analyse_body(&ssa, body.meta.id);
+    Ok((ssa, facts))
+}
+
 /// Run taint analysis on a function's SSA body.
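The doc comment on `analyse_function_ssa` explains why the function now also returns `&'a Cfg`. Node indices recorded during lowering are only meaningful in the body graph they were produced against. The sketch below is a toy model of that contract. `FileAnalysis`, `Body`, and `lower_function` are hypothetical, and a `Vec` of node labels stands in for a real graph. Returning the graph alongside the result, borrowed for the analysis's lifetime, keeps callers from pairing the two incorrectly. The last assertion shows the same indices overrunning the smaller top-level graph, which is the failure the regression test later in this file pins.

```rust
/// Hypothetical body: a name plus its own graph, modelled as node labels
/// whose positions are body-local node indices.
struct Body {
    name: String,
    nodes: Vec<&'static str>,
}

/// Hypothetical file analysis with a top-level graph and per-function bodies.
struct FileAnalysis {
    toplevel: Vec<&'static str>,
    bodies: Vec<Body>,
}

/// "Lowers" `name` by recording body-local node indices, and returns the
/// graph those indices belong to, borrowed for as long as `analysis` lives.
fn lower_function<'a>(
    analysis: &'a FileAnalysis,
    name: &str,
) -> Option<(Vec<usize>, &'a [&'static str])> {
    let body = analysis.bodies.iter().find(|b| b.name == name)?;
    let node_refs: Vec<usize> = (0..body.nodes.len()).collect();
    Some((node_refs, body.nodes.as_slice()))
}

fn main() {
    let analysis = FileAnalysis {
        toplevel: vec!["main"],
        bodies: vec![Body {
            name: "main".to_string(),
            nodes: vec!["entry", "match", "return", "exit"],
        }],
    };
    let (refs, graph) = lower_function(&analysis, "main").expect("main has a body");

    // Every recorded index is valid in the graph it was produced against.
    assert!(refs.iter().all(|&i| graph.get(i).is_some()));

    // Reusing the same indices on the smaller top-level graph overruns it.
    // Plain `toplevel[i]` indexing would panic here.
    assert!(refs.iter().any(|&i| analysis.toplevel.get(i).is_none()));
}
```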
@@ -999,6 +1555,7 @@ pub fn analyse_function_taint(
         static_map: None,
         auto_seed_handler_params: matches!(lang, Lang::JavaScript | Lang::TypeScript),
         cross_file_bodies: global_summaries.and_then(|gs| gs.bodies_by_key()),
+        pointer_facts: None,
     };
 
     crate::taint::ssa_transfer::run_ssa_taint_full_with_exits(ssa, cfg, &transfer)
@@ -1078,6 +1635,31 @@ pub fn analyse_file_summaries(
     Ok(global)
 }
 
+/// Run the file-level authorization extraction pipeline for the debug UI.
+///
+/// Returns the structured `AuthorizationModel` (routes, units, sensitive
+/// operations, auth checks) plus the file bytes and an `enabled` flag —
+/// the bytes drive line-number resolution in the view, and `enabled`
+/// surfaces "auth analysis is off for this language" without conflating
+/// it with an empty result.
+pub fn analyse_file_auth(
+    file_path: &Path,
+    config: &Config,
+) -> Result<(AuthorizationModel, Vec<u8>, bool), StatusCode> {
+    let bytes = std::fs::read(file_path).map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+    let model = crate::ast::extract_auth_model_for_debug(file_path, config)
+        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
+        .ok_or(StatusCode::BAD_REQUEST)?;
+    // Determine whether the auth rules were actually enabled for this
+    // file's language — `extract_auth_model_for_debug` returns an empty
+    // model both when the rules are disabled and when the file just
+    // happens to have no routes. The view distinguishes the two so the
+    // UI can show "analysis disabled" instead of "no routes found".
+    let lang_slug = crate::ast::lang_slug_for_path(file_path).unwrap_or("");
+    let rules = crate::auth_analysis::config::build_auth_rules(config, lang_slug);
+    Ok((model, bytes, rules.enabled))
+}
+
 /// Format a `ConditionExpr` as a human-readable string.
 fn format_condition_expr(cond: &ConditionExpr) -> String {
     match cond {
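`analyse_file_auth` returns `rules.enabled` alongside the model because an empty `AuthorizationModel` is ambiguous. It can mean either that the rules are switched off for the language or that the file declares no routes. The minimal sketch below shows how a consumer might turn that pair into a UI state. `AuthPanel` and `auth_panel` are hypothetical names.

```rust
/// Hypothetical UI state for a file's auth-analysis panel.
#[derive(Debug, PartialEq)]
enum AuthPanel {
    Disabled,
    NoRoutes,
    Routes(usize),
}

/// A model alone cannot tell "rules off" from "nothing found", because both
/// yield zero routes. The separate `enabled` flag breaks the tie.
fn auth_panel(route_count: usize, enabled: bool) -> AuthPanel {
    match (enabled, route_count) {
        (false, _) => AuthPanel::Disabled,
        (true, 0) => AuthPanel::NoRoutes,
        (true, n) => AuthPanel::Routes(n),
    }
}

fn main() {
    assert_eq!(auth_panel(0, false), AuthPanel::Disabled);
    assert_eq!(auth_panel(0, true), AuthPanel::NoRoutes);
    assert_eq!(auth_panel(3, true), AuthPanel::Routes(3));
}
```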
@@ -1150,7 +1732,7 @@ function demo() {
 
         let config = Config::default();
         let analysis = analyse_file(&path, &config).expect("file should analyse");
-        let (ssa, opt) =
+        let (ssa, opt, _cfg) =
             analyse_function_ssa(&analysis, "demo").expect("function should lower to SSA");
         let body = analysis
             .file_cfg
@@ -1205,7 +1787,7 @@ function sink() {
 
         let config = Config::default();
         let analysis = analyse_file(&path, &config).expect("file should analyse");
-        let (ssa, opt) =
+        let (ssa, opt, _cfg) =
             analyse_function_ssa(&analysis, "sink").expect("function should lower to SSA");
         let body = analysis
             .file_cfg
@@ -1249,7 +1831,7 @@ function consume() {
 
         let config = Config::default();
         let analysis = analyse_file(&path, &config).expect("file should analyse");
-        let (ssa, opt) =
+        let (ssa, opt, _cfg) =
             analyse_function_ssa(&analysis, "consume").expect("function should lower to SSA");
         let body = analysis
             .file_cfg
@@ -1287,7 +1869,9 @@ function consume() {
                 abstract_transfer: vec![],
                 param_return_paths: vec![],
                 points_to: Default::default(),
+                field_points_to: Default::default(),
                 return_path_facts: smallvec::SmallVec::new(),
+                typed_call_receivers: vec![],
             },
         );
 
@@ -1373,4 +1957,249 @@ async function recentAuditLogs() {
             "sibling function nodes should not appear in writeAuditLog view"
         );
     }
+
+    #[test]
+    fn pointer_view_serializes_synthetic_facts() {
+        // The Steensgaard analyser is exercised against synthetic SSA
+        // bodies in `src/pointer/analysis.rs` because real-world
+        // lowering can yield bodies whose Param ops have been folded
+        // away. Here we just pin the view-model wiring: feeding the
+        // serialiser an SsaBody with one SelfParam + one FieldProj
+        // produces non-empty locations / values / field_reads sections.
+        use crate::cfg::BodyId;
+        use crate::pointer::analyse_body;
+        use crate::ssa::ir::{
+            BlockId, FieldInterner, SsaBlock, SsaBody, SsaInst, SsaOp, SsaValue, Terminator,
+            ValueDef,
+        };
+        use petgraph::graph::NodeIndex;
+        use smallvec::SmallVec;
+
+        let mut field_interner = FieldInterner::new();
+        let mu = field_interner.intern("mu");
+
+        let v_self = SsaValue(0);
+        let v_field = SsaValue(1);
+        let value_defs = vec![
+            ValueDef {
+                var_name: Some("c".into()),
+                cfg_node: NodeIndex::new(0),
+                block: BlockId(0),
+            },
+            ValueDef {
+                var_name: Some("c.mu".into()),
+                cfg_node: NodeIndex::new(0),
+                block: BlockId(0),
+            },
+        ];
+        let body = SsaBody {
+            blocks: vec![SsaBlock {
+                id: BlockId(0),
+                phis: vec![],
+                body: vec![
+                    SsaInst {
+                        value: v_self,
+                        op: SsaOp::SelfParam,
+                        cfg_node: NodeIndex::new(0),
+                        var_name: Some("c".into()),
+                        span: (0, 0),
+                    },
+                    SsaInst {
+                        value: v_field,
+                        op: SsaOp::FieldProj {
+                            receiver: v_self,
+                            field: mu,
+                            projected_type: None,
+                        },
+                        cfg_node: NodeIndex::new(0),
+                        var_name: Some("c.mu".into()),
+                        span: (0, 0),
+                    },
+                ],
+                terminator: Terminator::Return(None),
+                preds: SmallVec::new(),
+                succs: SmallVec::new(),
+            }],
+            entry: BlockId(0),
+            value_defs,
+            cfg_node_map: std::collections::HashMap::new(),
+            exception_edges: vec![],
+            field_interner,
+            field_writes: std::collections::HashMap::new(),
+        };
+
+        let facts = analyse_body(&body, BodyId(0));
+        let view = PointerView::from_facts(&facts, &body);
+        assert!(
+            view.location_count > 0,
+            "synthetic body should produce at least one location"
+        );
+        assert!(
+            view.locations.iter().any(|l| l.kind == "SelfParam"),
+            "expected a SelfParam location in the serialised view"
+        );
+        assert!(
+            view.locations.iter().any(|l| l.kind == "Field"),
+            "expected a Field location in the serialised view"
+        );
+        assert!(
+            view.field_reads.iter().any(|e| e.field == "mu"),
+            "expected a `mu` field read; got {:?}",
+            view.field_reads,
+        );
+    }
+
/// Regression: `analyse_function_ssa` lowers SSA against `body.graph`
|
||||
/// (per-function NodeIndex space). Routes used to pass `analysis.cfg()`
|
||||
/// (the file's top-level CFG) to `analyse_function_taint`, which made
|
||||
/// every `cfg[inst.cfg_node]` lookup index a foreign graph and panicked
|
||||
/// with `index out of bounds` on any non-toplevel function whose body
|
||||
/// had more nodes than the toplevel. Reproduce: a small Rust file with
|
||||
/// a few top-level items and a `main` whose body branches enough to
|
||||
/// allocate body-local NodeIndex values past the toplevel's count.
|
||||
#[test]
|
||||
fn taint_route_uses_per_function_cfg_for_index_lookups() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let path = dir.path().join("docgen_like.rs");
|
||||
std::fs::write(
|
||||
&path,
|
||||
r#"
|
||||
use std::env;
|
||||
use std::fs;
|
||||
|
||||
const BEGIN_MARKER: &str = "<!-- BEGIN -->";
|
||||
const END_MARKER: &str = "<!-- END -->";
|
||||
|
||||
fn main() {
|
||||
let args: Vec<String> = env::args().collect();
|
||||
let target = args.get(1).cloned().unwrap_or_else(|| "x".to_string());
|
||||
let original = match fs::read_to_string(&target) {
|
||||
Ok(s) => s,
|
||||
Err(_) => return,
|
||||
};
|
||||
let begin = match original.find(BEGIN_MARKER) {
|
||||
Some(i) => i,
|
||||
None => return,
|
||||
};
|
||||
let end = match original.find(END_MARKER) {
|
||||
Some(i) => i,
|
||||
None => return,
|
||||
};
|
||||
if end < begin {
|
||||
return;
|
||||
}
|
||||
let _ = fs::write(&target, &original);
|
||||
}
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = Config::default();
|
||||
let analysis = analyse_file(&path, &config).expect("file should analyse");
|
||||
let (ssa, opt, body_cfg) =
|
||||
analyse_function_ssa(&analysis, "main").expect("function should lower to SSA");
|
||||
|
||||
// Sanity check that this fixture exercises the bug shape: main's body
|
||||
// graph must have more nodes than the file's top-level CFG, so a
|
||||
// mistaken `analysis.cfg()` would panic on `cfg[inst.cfg_node]`.
|
||||
assert!(
|
||||
body_cfg.node_count() > analysis.cfg().node_count(),
|
||||
"fixture must have more body nodes than toplevel nodes to exercise the bug"
|
||||
);
|
||||
|
||||
// Must not panic. Pre-fix this would `index out of bounds` inside
|
||||
// `transfer_inst` because the SSA was lowered against `body_cfg` but
|
||||
// the engine was given `analysis.cfg()`.
|
||||
let _ = analyse_function_taint(
|
||||
&ssa,
|
||||
body_cfg,
|
||||
analysis.lang,
|
||||
analysis.summaries(),
|
||||
None,
|
||||
&opt,
|
||||
);
|
||||
|
||||
// Belt-and-suspenders: show that calling with the wrong (top-level)
// CFG would have panicked. We can't catch that panic across rayon
// worker threads here, so instead confirm that at least one
// `inst.cfg_node` index lies outside `analysis.cfg()`'s range, which
// is exactly what triggers the OOB indexing inside `transfer_inst`.
|
||||
let toplevel_count = analysis.cfg().node_count();
|
||||
let max_inst_idx = ssa
|
||||
.blocks
|
||||
.iter()
|
||||
.flat_map(|b| b.phis.iter().chain(b.body.iter()))
|
||||
.map(|inst| inst.cfg_node.index())
|
||||
.max()
|
||||
.unwrap_or(0);
|
||||
assert!(
|
||||
max_inst_idx >= toplevel_count,
|
||||
"regression: at least one inst.cfg_node ({max_inst_idx}) must exceed the \
|
||||
toplevel CFG node count ({toplevel_count}) for this test to exercise the bug"
|
||||
);
|
||||
}
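The failure mode this test guards against reduces to a simple rule: an index allocated in one graph's node space is only valid for that graph. The std-only sketch below illustrates it with a hypothetical `Graph` type and `label_of` helper (stand-ins, not nyx's CFG types); `Vec::get` returns `None` where `cfg[idx]` would panic.

```rust
/// Hypothetical stand-in for a CFG: node payloads addressed by dense indices.
struct Graph {
    nodes: Vec<&'static str>,
}

impl Graph {
    fn node_count(&self) -> usize {
        self.nodes.len()
    }

    /// Checked lookup: returns `None` where `self.nodes[idx]` would
    /// panic with "index out of bounds".
    fn label_of(&self, idx: usize) -> Option<&'static str> {
        self.nodes.get(idx).copied()
    }
}

/// True if every index in `indices` resolves against `graph`.
fn all_indices_resolve(graph: &Graph, indices: &[usize]) -> bool {
    indices.iter().all(|&i| graph.label_of(i).is_some())
}
```

Indices taken from a larger per-function body graph resolve against that body but not against a smaller top-level graph; unchecked indexing into the top-level graph with those indices is the panic the test reproduces.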
|
||||
|
||||
#[test]
|
||||
fn type_facts_view_groups_security_types() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let path = dir.path().join("h.java");
|
||||
std::fs::write(
|
||||
&path,
|
||||
r#"
|
||||
import java.net.http.HttpClient;
|
||||
|
||||
public class Demo {
|
||||
public void run() {
|
||||
HttpClient c = HttpClient.newHttpClient();
|
||||
c.send(null, null);
|
||||
}
|
||||
}
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = Config::default();
|
||||
let analysis = analyse_file(&path, &config).expect("file should analyse");
|
||||
let (ssa, opt, _cfg) = analyse_function_ssa(&analysis, "run").expect("ssa should lower");
|
||||
let view = TypeFactsView::from_optimize(&opt, &ssa, &analysis.bytes);
|
||||
assert!(
|
||||
view.facts.iter().any(|f| f.kind == "HttpClient"),
|
||||
"expected HttpClient inference for `c = HttpClient.newHttpClient()`; got {:?}",
|
||||
view.facts.iter().map(|f| &f.kind).collect::<Vec<_>>(),
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn auth_view_renders_routes_for_express_handlers() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let path = dir.path().join("app.js");
|
||||
std::fs::write(
|
||||
&path,
|
||||
r#"
|
||||
const express = require('express');
|
||||
const app = express();
|
||||
|
||||
app.get('/api/users/:id', (req, res) => {
|
||||
db.query('SELECT * FROM users WHERE id=$1', [req.params.id]);
|
||||
});
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = Config::default();
|
||||
let (model, bytes, enabled) =
|
||||
analyse_file_auth(&path, &config).expect("auth analysis should run");
|
||||
assert!(enabled, "auth analysis should be enabled for JavaScript");
|
||||
let view = AuthAnalysisView::from_model(&model, &bytes, enabled);
|
||||
assert!(view.enabled);
|
||||
assert!(
|
||||
view.routes.iter().any(|r| r.path.contains("/api/users")),
|
||||
"expected the express GET route to surface; got {:?}",
|
||||
view.routes.iter().map(|r| &r.path).collect::<Vec<_>>(),
|
||||
);
|
||||
assert!(
|
||||
!view.units.is_empty(),
|
||||
"expected at least one analysis unit for the handler"
|
||||
);
|
||||
}
|
||||
}
182
src/server/error.rs
Normal file
@@ -0,0 +1,182 @@
//! Unified error type for HTTP route handlers.
|
||||
//!
|
||||
//! All routes should return [`ApiResult<T>`] (an alias for `Result<T, ApiError>`).
|
||||
//! `ApiError` serializes as `{ "error": "<human msg>", "code": "<machine code>",
|
||||
//! "detail"?: ... }` and carries the HTTP status code through `IntoResponse`.
|
||||
|
||||
use axum::Json;
|
||||
use axum::http::StatusCode;
|
||||
use axum::response::{IntoResponse, Response};
|
||||
use serde::Serialize;
|
||||
use serde_json::Value;
|
||||
|
||||
/// Machine-readable error codes. Stable strings the frontend can branch on.
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub enum ApiCode {
|
||||
BadRequest,
|
||||
Forbidden,
|
||||
NotFound,
|
||||
Conflict,
|
||||
PayloadTooLarge,
|
||||
Unprocessable,
|
||||
Internal,
|
||||
ServiceUnavailable,
|
||||
}
|
||||
|
||||
impl ApiCode {
|
||||
fn as_str(self) -> &'static str {
|
||||
match self {
|
||||
ApiCode::BadRequest => "bad_request",
|
||||
ApiCode::Forbidden => "forbidden",
|
||||
ApiCode::NotFound => "not_found",
|
||||
ApiCode::Conflict => "conflict",
|
||||
ApiCode::PayloadTooLarge => "payload_too_large",
|
||||
ApiCode::Unprocessable => "unprocessable",
|
||||
ApiCode::Internal => "internal",
|
||||
ApiCode::ServiceUnavailable => "service_unavailable",
|
||||
}
|
||||
}
|
||||
|
||||
fn status(self) -> StatusCode {
|
||||
match self {
|
||||
ApiCode::BadRequest => StatusCode::BAD_REQUEST,
|
||||
ApiCode::Forbidden => StatusCode::FORBIDDEN,
|
||||
ApiCode::NotFound => StatusCode::NOT_FOUND,
|
||||
ApiCode::Conflict => StatusCode::CONFLICT,
|
||||
ApiCode::PayloadTooLarge => StatusCode::PAYLOAD_TOO_LARGE,
|
||||
ApiCode::Unprocessable => StatusCode::UNPROCESSABLE_ENTITY,
|
||||
ApiCode::Internal => StatusCode::INTERNAL_SERVER_ERROR,
|
||||
ApiCode::ServiceUnavailable => StatusCode::SERVICE_UNAVAILABLE,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug)]
|
||||
pub struct ApiError {
|
||||
code: ApiCode,
|
||||
message: String,
|
||||
detail: Option<Value>,
|
||||
}
|
||||
|
||||
impl ApiError {
|
||||
pub fn new(code: ApiCode, message: impl Into<String>) -> Self {
|
||||
Self {
|
||||
code,
|
||||
message: message.into(),
|
||||
detail: None,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn with_detail(mut self, detail: Value) -> Self {
|
||||
self.detail = Some(detail);
|
||||
self
|
||||
}
|
||||
|
||||
pub fn bad_request(msg: impl Into<String>) -> Self {
|
||||
Self::new(ApiCode::BadRequest, msg)
|
||||
}
|
||||
pub fn forbidden(msg: impl Into<String>) -> Self {
|
||||
Self::new(ApiCode::Forbidden, msg)
|
||||
}
|
||||
pub fn not_found(msg: impl Into<String>) -> Self {
|
||||
Self::new(ApiCode::NotFound, msg)
|
||||
}
|
||||
pub fn conflict(msg: impl Into<String>) -> Self {
|
||||
Self::new(ApiCode::Conflict, msg)
|
||||
}
|
||||
pub fn unprocessable(msg: impl Into<String>) -> Self {
|
||||
Self::new(ApiCode::Unprocessable, msg)
|
||||
}
|
||||
pub fn internal(msg: impl Into<String>) -> Self {
|
||||
Self::new(ApiCode::Internal, msg)
|
||||
}
|
||||
pub fn service_unavailable(msg: impl Into<String>) -> Self {
|
||||
Self::new(ApiCode::ServiceUnavailable, msg)
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Serialize)]
|
||||
struct ApiErrorBody<'a> {
|
||||
error: &'a str,
|
||||
code: &'a str,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
detail: Option<&'a Value>,
|
||||
}
|
||||
|
||||
impl IntoResponse for ApiError {
|
||||
fn into_response(self) -> Response {
|
||||
let body = ApiErrorBody {
|
||||
error: &self.message,
|
||||
code: self.code.as_str(),
|
||||
detail: self.detail.as_ref(),
|
||||
};
|
||||
(
|
||||
self.code.status(),
|
||||
Json(serde_json::to_value(&body).unwrap()),
|
||||
)
|
||||
.into_response()
|
||||
}
|
||||
}
|
||||
|
||||
impl From<StatusCode> for ApiError {
|
||||
fn from(status: StatusCode) -> Self {
|
||||
let code = match status {
|
||||
StatusCode::BAD_REQUEST => ApiCode::BadRequest,
|
||||
StatusCode::FORBIDDEN => ApiCode::Forbidden,
|
||||
StatusCode::NOT_FOUND => ApiCode::NotFound,
|
||||
StatusCode::CONFLICT => ApiCode::Conflict,
|
||||
StatusCode::PAYLOAD_TOO_LARGE => ApiCode::PayloadTooLarge,
|
||||
StatusCode::UNPROCESSABLE_ENTITY => ApiCode::Unprocessable,
|
||||
StatusCode::SERVICE_UNAVAILABLE => ApiCode::ServiceUnavailable,
|
||||
_ => ApiCode::Internal,
|
||||
};
|
||||
Self::new(
|
||||
code,
|
||||
status.canonical_reason().unwrap_or("error").to_string(),
|
||||
)
|
||||
}
|
||||
}
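The `From<StatusCode>` conversion maps a handful of statuses to dedicated codes and folds everything else into `internal`. This std-only sketch reproduces the same table keyed on the numeric status; `code_for_status` is a hypothetical helper, not part of this file.

```rust
/// Machine code for a numeric HTTP status, mirroring `From<StatusCode>`:
/// any status without a dedicated arm (e.g. 401 or 500) becomes "internal".
fn code_for_status(status: u16) -> &'static str {
    match status {
        400 => "bad_request",
        403 => "forbidden",
        404 => "not_found",
        409 => "conflict",
        413 => "payload_too_large",
        422 => "unprocessable",
        503 => "service_unavailable",
        _ => "internal",
    }
}
```

One consequence worth knowing: `ApiError` derives its HTTP status from its code, so converting an unmapped status such as 401 yields a 500 response whose message is the canonical reason (`"Unauthorized"`).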
|
||||
|
||||
impl From<std::io::Error> for ApiError {
|
||||
fn from(err: std::io::Error) -> Self {
|
||||
Self::internal(err.to_string())
|
||||
}
|
||||
}
|
||||
|
||||
impl From<serde_json::Error> for ApiError {
|
||||
fn from(err: serde_json::Error) -> Self {
|
||||
Self::bad_request(format!("invalid JSON: {err}"))
|
||||
}
|
||||
}
|
||||
|
||||
pub type ApiResult<T> = Result<T, ApiError>;
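Because `ApiError` implements `From<std::io::Error>` and `From<serde_json::Error>`, a handler that returns `ApiResult<T>` can use `?` directly. The std-only sketch below shows that pattern with a hypothetical `MiniError` that mirrors only the `io::Error` conversion; it is not the module's type.

```rust
use std::io;

/// Hypothetical, trimmed-down analogue of `ApiError`: a machine code
/// plus a human-readable message.
#[derive(Debug)]
struct MiniError {
    code: &'static str,
    message: String,
}

/// Mirrors `impl From<std::io::Error> for ApiError`: I/O failures are
/// reported as internal errors.
impl From<io::Error> for MiniError {
    fn from(err: io::Error) -> Self {
        MiniError {
            code: "internal",
            message: err.to_string(),
        }
    }
}

type MiniResult<T> = Result<T, MiniError>;

/// `?` runs the `From` conversion above before returning early.
fn read_text(path: &str) -> MiniResult<String> {
    let text = std::fs::read_to_string(path)?;
    Ok(text)
}
```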
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use axum::body::to_bytes;
|
||||
|
||||
#[tokio::test]
|
||||
async fn serializes_with_error_code_detail() {
|
||||
let err = ApiError::not_found("scan not found").with_detail(serde_json::json!({"id":"x"}));
|
||||
let resp = err.into_response();
|
||||
assert_eq!(resp.status(), StatusCode::NOT_FOUND);
|
||||
let body = to_bytes(resp.into_body(), 8 * 1024).await.unwrap();
|
||||
let v: serde_json::Value = serde_json::from_slice(&body).unwrap();
|
||||
assert_eq!(v["error"], "scan not found");
|
||||
assert_eq!(v["code"], "not_found");
|
||||
assert_eq!(v["detail"]["id"], "x");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn omits_detail_when_absent() {
|
||||
let err = ApiError::bad_request("bad input");
|
||||
let body = ApiErrorBody {
|
||||
error: &err.message,
|
||||
code: err.code.as_str(),
|
||||
detail: err.detail.as_ref(),
|
||||
};
|
||||
let s = serde_json::to_string(&body).unwrap();
|
||||
assert!(!s.contains("detail"), "expected no detail key, got {s}");
|
||||
}
|
||||
}
927
src/server/health.rs
Normal file
@@ -0,0 +1,927 @@
//! Health-score scoring engine — v3.5.
|
||||
//!
|
||||
//! Pure-function scoring over a `HealthInputs` struct. Documented in
|
||||
//! `docs/health-score-audit.md` (calibration, rationale) and
|
||||
//! `docs/health-score.md` (customer methodology).
|
||||
//!
|
||||
//! ## Conceptual model
|
||||
//!
|
||||
//! The score reflects two intersecting forces:
|
||||
//!
|
||||
//! 1. **Density of risk.** The *quantitative* axis: per-finding weight
|
||||
//! that combines severity, confidence, symex verdict, and a test-
|
||||
//! path discount, divided by a size proxy, mapped through a log
|
||||
//! curve to a 0–100 base.
|
||||
//!
|
||||
//! 2. **HIGH-count guardrails.** The *qualitative* axis: HIGH counts
|
||||
//! cap the maximum grade and floor "no HIGH" to at least C. These
//! are non-negotiable promises: even a perfect-everywhere-else repo
//! with 6 high-confidence, symex-confirmed HIGHs grades F.
|
||||
//!
|
||||
//! Modifiers (triage, trend, stale, regression, suppression hygiene)
|
||||
//! are nudges totalling at most ±15 within whatever band the
|
||||
//! guardrails carve out.
|
||||
//!
|
||||
//! ## What v3.5 changed vs v2/v3
|
||||
//!
|
||||
//! * Verdict-weighted credibility (`Confirmed > NotAttempted >
|
||||
//! Inconclusive > Infeasible`). This is the structural protection
|
||||
//! against false-positive-driven F grades while the scanner is
|
||||
//! still maturing — it auto-tightens as symex coverage grows.
|
||||
//! * Cross-file vs intra-file vs AST-only weighting via
|
||||
//! `context_factor`.
|
||||
//! * Test-path downweight (0.3×) — a HIGH in a test fixture is
|
||||
//! genuinely less concerning than one in a request handler.
|
||||
//! * Effective HIGH count for ceilings — the HIGH-count caps key on
|
||||
//! credibility-adjusted HIGHs, not raw HIGHs. A repo with 5
|
||||
//! low-confidence HIGHs that got `NotAttempted` from symex doesn't
|
||||
//! pay the same ceiling cost as a repo with 5 `Confirmed` HIGHs.
|
||||
//! * Tighter modifier ranges so they can't flip a band.
|
||||
//! * No `parse_success_rate` (it's actually a cache-miss metric —
|
||||
//! see `project_parse_success_rate_misnomer.md`).
|
||||
|
||||
use crate::commands::scan::Diag;
|
||||
use crate::evidence::{Confidence, Verdict};
|
||||
use crate::patterns::Severity;
|
||||
use crate::server::models::{BacklogStats, FindingSummary, HealthComponent, HealthScore};
|
||||
|
||||
// ── Tunables ─────────────────────────────────────────────────────────────────
|
||||
//
|
||||
// Calibrated for v0.5.0 scanner FP rate. As Nyx symex coverage and
|
||||
// rule precision improve, the HIGH ceilings should tighten — see
|
||||
// `docs/health-score-audit.md` "Calibration trajectory" for the
|
||||
// roadmap.
|
||||
|
||||
/// Below this file count, the size divisor is floored at 1.0, so tiny
/// repos never get a divisor below 1 (which would inflate their
/// density rather than dilute it).
|
||||
const FILES_FLOOR: f64 = 100.0;
|
||||
|
||||
/// Above this file count, no further dilution credit. A 50MLOC
|
||||
/// monorepo doesn't get a pass on a HIGH because it's "drowned" in
|
||||
/// other code.
|
||||
const FILES_CEILING: f64 = 50_000.0;
|
||||
|
||||
/// Quality lints saturate fast. 300 quality lints = max drag.
|
||||
const QUALITY_DRAG_PER_FINDING: f64 = 0.05;
|
||||
const QUALITY_DRAG_CAP: f64 = 15.0;
|
||||
|
||||
/// Below this finding count, the Triage component contributes
|
||||
/// weight 0 — we don't punish fresh users for not having triaged
|
||||
/// what didn't need triaging.
|
||||
const TRIAGE_FLOOR: usize = 20;
|
||||
|
||||
/// Stale-HIGH penalty parameters.
|
||||
const STALE_PENALTY_PER_FINDING: f64 = 2.0;
|
||||
const STALE_PENALTY_CAP: f64 = 10.0;
|
||||
|
||||
// ── Public API ───────────────────────────────────────────────────────────────
|
||||
|
||||
/// Pure inputs to the health-score calculation. No app state, no DB
|
||||
/// handles — those upstream concerns are flattened into primitives the
|
||||
/// scorer actually consumes.
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub struct HealthInputs<'a> {
|
||||
pub summary: &'a FindingSummary,
|
||||
pub findings: &'a [Diag],
|
||||
pub triage_coverage: f64,
|
||||
pub new_since_last: usize,
|
||||
pub fixed_since_last: usize,
|
||||
pub reintroduced: usize,
|
||||
/// Files scanned in the latest scan. Used as a proxy for repo
|
||||
/// size. `None` disables size adjustment (matches v1 callers).
|
||||
pub repo_files: Option<u64>,
|
||||
/// Backlog stats from the overview pipeline. `None` is fine on
|
||||
/// first scans (no aging data yet).
|
||||
pub backlog: Option<&'a BacklogStats>,
|
||||
/// Whether we have ≥2 completed scans. Without history Trend
|
||||
/// is meaningless and contributes weight 0.
|
||||
pub has_history: bool,
|
||||
/// Fraction of suppressions that use blanket (rule/file/
/// rule_in_file) rules instead of fingerprint-level ones. `None` if
/// there are no suppressions. Intended to drive a small ±2 modifier,
/// since high blanket rates suggest gaming the score; `compute` does
/// not read it yet.
|
||||
pub blanket_suppression_rate: Option<f64>,
|
||||
}
|
||||
|
||||
/// Compute the health score from pure inputs.
|
||||
pub fn compute(inp: &HealthInputs<'_>) -> HealthScore {
|
||||
// Step 1: Per-finding credibility-weighted weight, plus the
|
||||
// bookkeeping we need for the breakdown components.
|
||||
let weighted = aggregate_findings(inp.findings);
|
||||
|
||||
// Step 2: Density adjustment.
|
||||
let size_divisor = size_divisor(inp.repo_files);
|
||||
let density_weight = weighted.raw_weight / size_divisor;
|
||||
|
||||
// Step 3: Map density to base score via log curve.
|
||||
let base_score = density_to_base_score(density_weight);
|
||||
|
||||
// Step 4: Apply quality-lint drag.
|
||||
let quality_drag = quality_drag(weighted.quality_count);
|
||||
let base_after_drag = (base_score - quality_drag).clamp(0.0, 100.0);
|
||||
|
||||
// Step 5: HIGH-count guardrails — keyed on *effective* HIGH count
|
||||
// (credibility-weighted), not raw count. This is what protects
|
||||
// users from FP-driven F grades while the scanner is maturing.
|
||||
let ceiling = high_total_ceiling(weighted.effective_high);
|
||||
let floor = high_total_floor(weighted.effective_high);
|
||||
let score_clamped = base_after_drag.clamp(floor, ceiling);
|
||||
|
||||
// Step 6: Build the breakdown components (also computes their
|
||||
// sub-scores for transparency).
|
||||
let components = build_components(inp, &weighted, base_after_drag, size_divisor);
|
||||
|
||||
// Step 7: Sum modifiers (already encoded in component weights;
|
||||
// see `build_components`).
|
||||
let modifier_sum = components
|
||||
.iter()
|
||||
.filter(|c| c.label != "Severity pressure")
|
||||
.map(signed_modifier_contribution)
|
||||
.sum::<f64>();
|
||||
|
||||
// Reapply ceiling AND floor after modifiers. Ceiling: modifiers
|
||||
// can't lift past a HIGH cap. Floor: triage/regression
|
||||
// modifiers can't break the no-HIGH ≥ C guarantee.
|
||||
let final_uncapped = (score_clamped + modifier_sum).clamp(0.0, 100.0);
|
||||
let score = final_uncapped.min(ceiling).max(floor).round() as u8;
|
||||
let grade = grade_for(score).to_string();
|
||||
|
||||
HealthScore {
|
||||
score,
|
||||
grade,
|
||||
components,
|
||||
}
|
||||
}
|
||||
|
||||
// ── Aggregation ──────────────────────────────────────────────────────────────
|
||||
|
||||
#[derive(Debug, Default)]
|
||||
struct WeightedAggregate {
|
||||
/// Sum of `severity_base × credibility` across security findings,
/// where credibility is `confidence_factor × verdict_factor ×
/// context_factor` clamped to [0, 1.2]. Quality lints are handled
/// separately via `quality_drag`.
|
||||
raw_weight: f64,
|
||||
/// Number of `*.quality.*` findings — drives `quality_drag`.
|
||||
quality_count: usize,
|
||||
/// Credibility-adjusted HIGH count (rounded) — drives the HIGH
|
||||
/// ceiling and floor. A low-confidence + Inconclusive HIGH might
|
||||
/// contribute 0.2; five of them would round to 1.
|
||||
effective_high: usize,
|
||||
/// Raw counts (for the breakdown text).
|
||||
raw_high: usize,
|
||||
raw_medium: usize,
|
||||
raw_low_security: usize,
|
||||
/// Confidence rate, as a percentage: (high + 0.5 × medium) divided by
/// the number of security (non-quality) findings. Drives the
/// confidence component; 100 if there are no findings.
|
||||
confidence_rate: f64,
|
||||
/// Symex coverage: fraction (0.0..=1.0) of taint findings whose verdict
/// is anything other than `NotAttempted`. Not currently read by the
/// score or the component breakdown.
|
||||
symex_coverage: f64,
|
||||
}
|
||||
|
||||
fn aggregate_findings(findings: &[Diag]) -> WeightedAggregate {
|
||||
let mut agg = WeightedAggregate::default();
|
||||
let mut effective_high_sum = 0.0f64;
|
||||
let mut conf_score_sum = 0.0f64;
|
||||
let mut taint_total = 0usize;
|
||||
let mut taint_with_verdict = 0usize;
|
||||
|
||||
for f in findings {
|
||||
let is_quality = f.id.contains(".quality.") || f.id.starts_with("quality.");
|
||||
if is_quality {
|
||||
agg.quality_count += 1;
|
||||
continue;
|
||||
}
|
||||
|
||||
let severity = f.severity;
|
||||
let conf_factor = confidence_factor(f.confidence);
|
||||
let verdict_factor = verdict_factor(f);
|
||||
let context_factor = context_factor(f);
|
||||
|
||||
let credibility = (conf_factor * verdict_factor * context_factor).clamp(0.0, 1.2);
|
||||
let weight = severity_base(severity) * credibility;
|
||||
agg.raw_weight += weight;
|
||||
|
||||
match severity {
|
||||
Severity::High => {
|
||||
agg.raw_high += 1;
|
||||
effective_high_sum += credibility;
|
||||
}
|
||||
Severity::Medium => agg.raw_medium += 1,
|
||||
Severity::Low => agg.raw_low_security += 1,
|
||||
}
|
||||
|
||||
// Confidence component contribution (independent of severity).
|
||||
conf_score_sum += match f.confidence {
|
||||
Some(Confidence::High) => 1.0,
|
||||
Some(Confidence::Medium) => 0.5,
|
||||
_ => 0.0,
|
||||
};
|
||||
|
||||
// Symex coverage tracking — only meaningful for findings with
|
||||
// taint-flow evidence (the ones symex even attempts).
|
||||
if let Some(ev) = f.evidence.as_ref()
|
||||
&& ev.symbolic.is_some()
|
||||
{
|
||||
taint_total += 1;
|
||||
if !matches!(
|
||||
ev.symbolic.as_ref().map(|s| s.verdict),
|
||||
Some(Verdict::NotAttempted) | None
|
||||
) {
|
||||
taint_with_verdict += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
agg.effective_high = effective_high_sum.round() as usize;
|
||||
agg.confidence_rate = if findings.is_empty() {
|
||||
100.0
|
||||
} else {
|
||||
let security_total = (findings.len() - agg.quality_count).max(1);
|
||||
(conf_score_sum / security_total as f64) * 100.0
|
||||
};
|
||||
agg.symex_coverage = if taint_total == 0 {
|
||||
0.0
|
||||
} else {
|
||||
taint_with_verdict as f64 / taint_total as f64
|
||||
};
|
||||
agg
|
||||
}
|
||||
|
||||
fn severity_base(s: Severity) -> f64 {
|
||||
match s {
|
||||
Severity::High => 10.0,
|
||||
Severity::Medium => 3.0,
|
||||
Severity::Low => 0.5,
|
||||
}
|
||||
}
|
||||
|
||||
fn confidence_factor(c: Option<Confidence>) -> f64 {
|
||||
match c {
|
||||
Some(Confidence::High) => 1.0,
|
||||
Some(Confidence::Medium) => 0.6,
|
||||
Some(Confidence::Low) => 0.3,
|
||||
None => 0.5,
|
||||
}
|
||||
}
|
||||
|
||||
/// `verdict_factor` is the heart of the FP protection. An AST-only
/// finding (no taint flow, so symex is never attempted) gets the
/// `NotAttempted` baseline of 1.0. A taint finding that symex
/// confirmed gets 1.2 (a credibility boost), an `Inconclusive` one
/// gets 0.7, and one that symex proved infeasible gets 0.1
/// (near-suppress).
|
||||
fn verdict_factor(f: &Diag) -> f64 {
|
||||
let Some(ev) = f.evidence.as_ref() else {
|
||||
return 1.0;
|
||||
};
|
||||
let Some(sv) = ev.symbolic.as_ref() else {
|
||||
return 1.0;
|
||||
};
|
||||
match sv.verdict {
|
||||
Verdict::Confirmed => 1.2,
|
||||
Verdict::NotAttempted => 1.0,
|
||||
Verdict::Inconclusive => 0.7,
|
||||
Verdict::Infeasible => 0.1,
|
||||
}
|
||||
}
|
||||
|
||||
/// Cross-file flow (or a flow that used a summary) → 1.15. Intra-file
/// taint flow → 1.0. AST-only (no flow_steps) → 0.75. Test path → 0.3
/// regardless of the others: it is checked first, so test paths always
/// override cross-file boosts.
|
||||
fn context_factor(f: &Diag) -> f64 {
|
||||
if is_test_path(&f.path) {
|
||||
return 0.3;
|
||||
}
|
||||
let Some(ev) = f.evidence.as_ref() else {
|
||||
return 0.75; // No evidence at all — pattern match
|
||||
};
|
||||
if ev.flow_steps.is_empty() {
|
||||
return 0.75;
|
||||
}
|
||||
if ev.flow_steps.iter().any(|s| s.is_cross_file) || ev.uses_summary {
|
||||
return 1.15;
|
||||
}
|
||||
1.0
|
||||
}
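Putting the three factors together: each security finding adds `severity_base × clamp(confidence × verdict × context, 0, 1.2)` to `raw_weight`, and each HIGH adds its credibility to a sum that is rounded into the effective HIGH count. The sketch below reproduces that arithmetic with the factor values from this file; `credibility`, `finding_weight` and `effective_high` are illustrative names, not the module's API.

```rust
/// Credibility of one finding: the product of its confidence, verdict
/// and context factors, clamped to [0.0, 1.2] as in `aggregate_findings`.
fn credibility(confidence: f64, verdict: f64, context: f64) -> f64 {
    (confidence * verdict * context).clamp(0.0, 1.2)
}

/// Weight added to `raw_weight`: severity base (HIGH 10, MEDIUM 3,
/// LOW 0.5) times credibility.
fn finding_weight(severity_base: f64, credibility: f64) -> f64 {
    severity_base * credibility
}

/// Effective HIGH count: per-HIGH credibilities summed, then rounded.
fn effective_high(credibilities: &[f64]) -> usize {
    credibilities.iter().sum::<f64>().round() as usize
}
```

For example, a high-confidence, symex-confirmed, cross-file HIGH computes 1.0 × 1.2 × 1.15 = 1.38 and is clamped to 1.2, while five low-confidence `Inconclusive` HIGHs (0.3 × 0.7 ≈ 0.21 each) count as a single effective HIGH.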
|
||||
|
||||
fn is_test_path(path: &str) -> bool {
|
||||
let p = path.to_ascii_lowercase();
|
||||
// Path-segment matches.
|
||||
p.contains("/test/")
|
||||
|| p.contains("/tests/")
|
||||
|| p.contains("/spec/")
|
||||
|| p.contains("/__tests__/")
|
||||
|| p.contains("/testdata/")
|
||||
// Filename suffix conventions.
|
||||
|| p.ends_with("_test.go")
|
||||
|| p.ends_with("_spec.rb")
|
||||
|| p.ends_with(".test.ts")
|
||||
|| p.ends_with(".test.js")
|
||||
|| p.ends_with(".spec.ts")
|
||||
|| p.ends_with(".spec.js")
|
||||
|| file_basename(&p)
|
||||
.map(|b| b.starts_with("test_") && b.ends_with(".py"))
|
||||
.unwrap_or(false)
|
||||
}
|
||||
|
||||
fn file_basename(path: &str) -> Option<&str> {
|
||||
path.rsplit('/').next()
|
||||
}
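The segment checks in `is_test_path` look for a leading `/`, so a repo-relative path such as `tests/api.rs` only matches if paths reach it with a directory prefix. Whether `Diag::path` is repo-relative is an assumption here. This standalone sketch of the same heuristic (hypothetical name `looks_like_test_path`) normalises the path first so that both forms match.

```rust
/// Test-path heuristic in the style of `is_test_path`. Paths are
/// lower-cased, backslashes become `/`, and a leading `/` is added so
/// that repo-relative inputs like `tests/api.rs` hit the segment checks.
fn looks_like_test_path(path: &str) -> bool {
    const SEGMENTS: [&str; 5] = ["/test/", "/tests/", "/spec/", "/__tests__/", "/testdata/"];
    const SUFFIXES: [&str; 6] = [
        "_test.go", "_spec.rb", ".test.ts", ".test.js", ".spec.ts", ".spec.js",
    ];
    let p = format!("/{}", path.to_ascii_lowercase().replace('\\', "/"));
    let basename = p.rsplit('/').next().unwrap_or("");
    SEGMENTS.iter().any(|s| p.contains(*s))
        || SUFFIXES.iter().any(|s| p.ends_with(*s))
        || (basename.starts_with("test_") && basename.ends_with(".py"))
}
```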
|
||||
|
||||
// ── Density math ─────────────────────────────────────────────────────────────
|
||||
|
||||
fn size_divisor(repo_files: Option<u64>) -> f64 {
|
||||
let f = match repo_files {
|
||||
Some(n) => (n as f64).clamp(FILES_FLOOR, FILES_CEILING),
|
||||
None => FILES_FLOOR,
|
||||
};
|
||||
(f / FILES_FLOOR).sqrt()
|
||||
}
|
||||
|
||||
fn density_to_base_score(density_weight: f64) -> f64 {
|
||||
if density_weight <= 0.0 {
|
||||
return 100.0;
|
||||
}
|
||||
let raw = 100.0 - 22.0 * (1.0 + density_weight / 4.0).log10();
|
||||
raw.clamp(0.0, 100.0)
|
||||
}
|
||||
|
||||
fn quality_drag(quality_count: usize) -> f64 {
|
||||
(quality_count as f64 * QUALITY_DRAG_PER_FINDING).min(QUALITY_DRAG_CAP)
|
||||
}
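A worked example of the density math, re-implemented standalone with the same constants. A single high-confidence, intra-file HIGH with no symex verdict weighs 10.0; in a 400-file repo the size divisor is √(400/100) = 2, so density is 5.0 and the base score is 100 − 22·log10(1 + 5/4) ≈ 92.25. The inverse helper `density_for_base_score` is a hypothetical addition, useful for asking how much density a given base score corresponds to.

```rust
const FILES_FLOOR: f64 = 100.0;
const FILES_CEILING: f64 = 50_000.0;

/// √(files / 100), with the file count clamped to [100, 50_000].
fn size_divisor(repo_files: Option<u64>) -> f64 {
    let f = match repo_files {
        Some(n) => (n as f64).clamp(FILES_FLOOR, FILES_CEILING),
        None => FILES_FLOOR,
    };
    (f / FILES_FLOOR).sqrt()
}

/// Base score: 100 − 22·log10(1 + density / 4), clamped to [0, 100].
fn density_to_base_score(density: f64) -> f64 {
    if density <= 0.0 {
        return 100.0;
    }
    (100.0 - 22.0 * (1.0 + density / 4.0).log10()).clamp(0.0, 100.0)
}

/// Inverse of the curve (hypothetical helper): the density at which the
/// base score reaches `score`, for 0 < score <= 100.
fn density_for_base_score(score: f64) -> f64 {
    4.0 * (10f64.powf((100.0 - score) / 22.0) - 1.0)
}

/// Quality-lint drag: 0.05 points per lint, capped at 15.
fn quality_drag(quality_count: usize) -> f64 {
    (quality_count as f64 * 0.05).min(15.0)
}
```

For instance, pulling the base down to 70 on density alone takes a density of about 88.4, roughly nine such HIGHs in a repo of 100 files or fewer, although the HIGH ceiling would bind long before that.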
|
||||
|
||||
// ── HIGH guardrails — calibrated for v0.5.0 FP rate ──────────────────────────
|
||||
|
||||
/// Final-score ceiling keyed on *effective* HIGH count (credibility-
|
||||
/// weighted, not raw). See module docstring for the rationale.
|
||||
fn high_total_ceiling(effective_high: usize) -> f64 {
|
||||
match effective_high {
|
||||
0 => 100.0,
|
||||
1 => 85.0, // 1 credible HIGH → max B
|
||||
2 => 78.0, // 2 → max C+
|
||||
3..=5 => 68.0, // 3-5 → max D+
|
||||
6..=10 => 58.0,
|
||||
_ => 45.0,
|
||||
}
|
||||
}
|
||||
|
||||
/// Final-score floor keyed on *effective* HIGH count. Zero HIGH never
|
||||
/// grades below C. This is the structural promise that the score
|
||||
/// isn't an automated F-machine.
|
||||
fn high_total_floor(effective_high: usize) -> f64 {
|
||||
if effective_high == 0 { 70.0 } else { 0.0 }
|
||||
}
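The guardrails and grading compose as follows. This std-only sketch copies the ceiling and floor tables and the `grade_for` bands from this file and applies them the way `compute` does: clamp the drag-adjusted base into the band, add the modifier sum, re-clamp, and round. `final_score` is a hypothetical helper name. With six high-confidence, symex-confirmed HIGHs the effective count is round(6 × 1.2) = 7, so the score is capped at 58 (an F) no matter how good the base is.

```rust
fn high_total_ceiling(effective_high: usize) -> f64 {
    match effective_high {
        0 => 100.0,
        1 => 85.0,
        2 => 78.0,
        3..=5 => 68.0,
        6..=10 => 58.0,
        _ => 45.0,
    }
}

fn high_total_floor(effective_high: usize) -> f64 {
    if effective_high == 0 { 70.0 } else { 0.0 }
}

fn grade_for(score: u8) -> &'static str {
    match score {
        90..=100 => "A",
        80..=89 => "B",
        70..=79 => "C",
        60..=69 => "D",
        _ => "F",
    }
}

/// Clamp the base into the guardrail band, add modifiers, then re-clamp
/// so modifiers can neither lift past the ceiling nor sink below the floor.
fn final_score(base_after_drag: f64, modifier_sum: f64, effective_high: usize) -> u8 {
    let ceiling = high_total_ceiling(effective_high);
    let floor = high_total_floor(effective_high);
    let clamped = base_after_drag.clamp(floor, ceiling);
    let uncapped = (clamped + modifier_sum).clamp(0.0, 100.0);
    uncapped.min(ceiling).max(floor).round() as u8
}
```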
|
||||
|
||||
// ── Stale-HIGH penalty ──────────────────────────────────────────────────────
|
||||
|
||||
fn stale_high_penalty(effective_high: usize, backlog: Option<&BacklogStats>) -> f64 {
|
||||
let Some(b) = backlog else { return 0.0 };
|
||||
if effective_high == 0 || b.stale_count == 0 {
|
||||
return 0.0;
|
||||
}
|
||||
(b.stale_count as f64 * STALE_PENALTY_PER_FINDING).min(STALE_PENALTY_CAP)
|
||||
}
|
||||
|
||||
// ── Component breakdown ──────────────────────────────────────────────────────
|
||||
|
||||
fn build_components(
|
||||
inp: &HealthInputs<'_>,
|
||||
weighted: &WeightedAggregate,
|
||||
base_after_drag: f64,
|
||||
size_divisor: f64,
|
||||
) -> Vec<HealthComponent> {
|
||||
let total = inp.summary.total;
|
||||
|
||||
// Severity component is the primary score-bearing component;
|
||||
// it absorbs the base+drag+ceiling+floor result.
|
||||
let sev_score = base_after_drag.round().clamp(0.0, 100.0) as u8;
|
||||
let sev_detail = severity_detail(weighted, size_divisor, inp.repo_files, inp.backlog);
|
||||
|
||||
// Confidence component — high-conf rate scaled into 0..=100.
|
||||
let conf_score = weighted.confidence_rate.round().clamp(0.0, 100.0) as u8;
|
||||
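// `saturating_sub` guards against a summary total smaller than the
// quality-lint count derived from `findings` (plain `-` would
// underflow and panic in debug builds).
let security_total = total.saturating_sub(weighted.quality_count);
let conf_detail = format!(
"High-confidence rate {:.0}% across {} security finding{}",
weighted.confidence_rate,
security_total,
plural_s(security_total)
);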
|
||||
|
||||
// Trend component — only contributes weight when has_history.
|
||||
let net = inp.fixed_since_last as i64 - inp.new_since_last as i64;
|
||||
let trend_score = (50 + net * 5).clamp(0, 100) as u8;
|
||||
let trend_weight = if inp.has_history { 0.20 } else { 0.0 };
|
||||
let trend_detail = if inp.has_history {
|
||||
format!(
|
||||
"Net {} since last scan ({} fixed, {} new)",
|
||||
net, inp.fixed_since_last, inp.new_since_last
|
||||
)
|
||||
} else {
|
||||
"Not applicable: no prior scan to compare against (re-scan to populate)".into()
|
||||
};
|
||||
|
||||
// Triage — drops out when total < TRIAGE_FLOOR.
|
||||
let triage_active = total >= TRIAGE_FLOOR;
|
||||
let triage_score = (inp.triage_coverage * 100.0).round().clamp(0.0, 100.0) as u8;
|
||||
let triage_weight = if triage_active { 0.20 } else { 0.0 };
|
||||
let triage_detail = if triage_active {
|
||||
format!(
|
||||
"{:.0}% of findings have a triage state",
|
||||
inp.triage_coverage * 100.0
|
||||
)
|
||||
} else {
|
||||
format!(
|
||||
"Not applicable: only {} finding{} (need ≥{} to evaluate)",
|
||||
total,
|
||||
plural_s(total),
|
||||
TRIAGE_FLOOR
|
||||
)
|
||||
};
|
||||
|
||||
// Regression resistance.
|
||||
let stale_penalty = stale_high_penalty(weighted.effective_high, inp.backlog);
|
||||
let reintro_penalty = (inp.reintroduced as f64 * 5.0).min(10.0);
|
||||
let regression_score = (100.0 - reintro_penalty - stale_penalty)
|
||||
.clamp(0.0, 100.0)
|
||||
.round() as u8;
|
||||
let regression_detail = match (inp.reintroduced, stale_penalty) {
|
||||
(0, 0.0) => "No reintroduced or stale-HIGH findings".into(),
|
||||
(0, p) => format!(
|
||||
"{} stale finding{} affecting HIGH severity (−{:.0})",
|
||||
inp.backlog.map(|b| b.stale_count).unwrap_or(0),
|
||||
plural_s(inp.backlog.map(|b| b.stale_count).unwrap_or(0)),
|
||||
p
|
||||
),
|
||||
(n, 0.0) => format!(
|
||||
"{} previously-fixed finding{} reintroduced (−{:.0})",
|
||||
n,
|
||||
plural_s(n),
|
||||
(n as f64 * 5.0).min(10.0)
|
||||
),
|
||||
(n, p) => format!(
|
||||
"{} reintroduced (−{:.0}) + stale-HIGH penalty (−{:.0})",
|
||||
n,
|
||||
(n as f64 * 5.0).min(10.0),
|
||||
p
|
||||
),
|
||||
};
|
||||
|
||||
vec![
|
||||
HealthComponent {
|
||||
label: "Severity pressure".into(),
|
||||
score: sev_score,
|
||||
weight: 1.0, // Severity is the *base*, not a modifier — full weight in the blend.
|
||||
detail: sev_detail,
|
||||
},
|
||||
HealthComponent {
|
||||
label: "Confidence quality".into(),
|
||||
score: conf_score,
|
||||
weight: 0.0, // Confidence influence is already baked into raw_weight via confidence_factor.
|
||||
detail: conf_detail,
|
||||
},
|
||||
HealthComponent {
|
||||
label: "Trend".into(),
|
||||
score: trend_score,
|
||||
weight: trend_weight,
|
||||
detail: trend_detail,
|
||||
},
|
||||
HealthComponent {
|
||||
label: "Triage coverage".into(),
|
||||
score: triage_score,
|
||||
weight: triage_weight,
|
||||
detail: triage_detail,
|
||||
},
|
||||
HealthComponent {
|
||||
label: "Regression resistance".into(),
|
||||
score: regression_score,
|
||||
weight: 0.15,
|
||||
detail: regression_detail,
|
||||
},
|
||||
]
|
||||
}
|
||||
|
||||
/// How a non-severity component contributes to the modifier sum.
/// Each component's score (0..=100) is mapped to a signed point delta
/// (Trend ±3, Triage −3..+5, Regression −15..0), gated by the
/// component's weight (which becomes 0 when the component drops out).
|
||||
fn signed_modifier_contribution(c: &HealthComponent) -> f64 {
|
||||
if c.weight == 0.0 {
|
||||
return 0.0;
|
||||
}
|
||||
match c.label.as_str() {
|
||||
"Confidence quality" => {
|
||||
// Nominal mapping (not applied): high-conf rate above 80% → +3,
// above 50% → +1, below → 0. This component has weight 0 because
// its influence is baked into raw_weight via confidence_factor, so
// the early return above fires first; the arm is kept for
// transparency in the breakdown only.
|
||||
0.0
|
||||
}
|
||||
"Trend" => {
|
||||
// Net positive trend → +3 max; negative → −3 max.
|
||||
// Linear in (score − 50)/50 × 3, clamped.
|
||||
let centred = (c.score as f64 - 50.0) / 50.0;
|
||||
(centred * 3.0).clamp(-3.0, 3.0)
|
||||
}
|
||||
"Triage coverage" => {
|
||||
// 100% triaged → +5; 50% → 0; 0% → −3; linear on each side of 50%.
|
||||
if c.score >= 50 {
|
||||
((c.score as f64 - 50.0) / 50.0 * 5.0).min(5.0)
|
||||
} else {
|
||||
-((50.0 - c.score as f64) / 50.0 * 3.0).min(3.0)
|
||||
}
|
||||
}
|
||||
"Regression resistance" => {
|
||||
// Linear penalty: (score − 100) × 0.15, clamped to [−15, 0].
// At score 100 → 0; at 70 → −4.5; at 0 → −15. Both the reintroduced
// and stale penalties cap at 10, so in practice the score bottoms
// out at 80, i.e. at most −3.
|
||||
((c.score as f64 - 100.0) * 0.15).clamp(-15.0, 0.0)
|
||||
}
|
||||
_ => 0.0,
|
||||
}
|
||||
}
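The three active modifier curves, extracted as standalone functions with a few spot values. The formulas are the ones in `build_components` and `signed_modifier_contribution`; the function names are illustrative. Because `regression_score` cannot fall below 80 (each penalty caps at 10), the regression modifier stays within [−3, 0] in practice, so the total modifier sum lies within roughly [−9, +8].

```rust
/// Trend: `build_components` maps net = fixed − new to 50 + 5·net
/// (clamped to 0..=100); the modifier is (score − 50)/50 × 3, clamped to ±3.
fn trend_modifier(fixed: i64, new: i64) -> f64 {
    let score = (50 + (fixed - new) * 5).clamp(0, 100) as f64;
    ((score - 50.0) / 50.0 * 3.0).clamp(-3.0, 3.0)
}

/// Triage: 0% → −3, 50% → 0, 100% → +5, linear on each side of 50%.
fn triage_modifier(score: u8) -> f64 {
    let s = score as f64;
    if s >= 50.0 {
        ((s - 50.0) / 50.0 * 5.0).min(5.0)
    } else {
        -((50.0 - s) / 50.0 * 3.0).min(3.0)
    }
}

/// Regression resistance: (score − 100) × 0.15, clamped to [−15, 0].
fn regression_modifier(score: u8) -> f64 {
    ((score as f64 - 100.0) * 0.15).clamp(-15.0, 0.0)
}
```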

fn severity_detail(
    w: &WeightedAggregate,
    size_divisor: f64,
    repo_files: Option<u64>,
    backlog: Option<&BacklogStats>,
) -> String {
    let mut parts = Vec::new();
    parts.push(format!("{:.0} weighted points", w.raw_weight));
    parts.push(format!(
        "{} High, {} Medium, {} Low",
        w.raw_high, w.raw_medium, w.raw_low_security
    ));
    if w.quality_count > 0 {
        parts.push(format!("{} quality lints", w.quality_count));
    }
    if w.effective_high != w.raw_high {
        parts.push(format!(
            "effective HIGH={} (credibility-adjusted)",
            w.effective_high
        ));
    }
    if let Some(f) = repo_files
        && (size_divisor - 1.0).abs() > 0.01
    {
        parts.push(format!("size factor 1/{:.2}× ({} files)", size_divisor, f));
    }
    let stale = stale_high_penalty(w.effective_high, backlog);
    if stale > 0.0
        && let Some(b) = backlog
    {
        parts.push(format!("−{:.0} stale-HIGH ({} >30d)", stale, b.stale_count));
    }
    parts.join(" · ")
}

// ── Misc ─────────────────────────────────────────────────────────────────────

fn grade_for(score: u8) -> &'static str {
    match score {
        90..=100 => "A",
        80..=89 => "B",
        70..=79 => "C",
        60..=69 => "D",
        _ => "F",
    }
}

fn plural_s(n: usize) -> &'static str {
    if n == 1 { "" } else { "s" }
}

// ── Tests ────────────────────────────────────────────────────────────────────

#[cfg(test)]
mod tests {
    use super::*;
    use crate::patterns::{FindingCategory, Severity};

    fn diag(severity: Severity, id: &str, conf: Option<Confidence>) -> Diag {
        Diag {
            path: "src/lib.rs".into(),
            line: 1,
            col: 1,
            severity,
            id: id.into(),
            category: FindingCategory::Security,
            path_validated: false,
            guard_kind: None,
            message: None,
            labels: Vec::new(),
            confidence: conf,
            evidence: None,
            rank_score: None,
            rank_reason: None,
            suppressed: false,
            suppression: None,
            rollup: None,
            finding_id: String::new(),
            alternative_finding_ids: Vec::new(),
        }
    }

    fn diag_in(path: &str, severity: Severity, conf: Option<Confidence>) -> Diag {
        let mut d = diag(severity, "rs.taint.x", conf);
        d.path = path.into();
        d
    }

    fn summary_of(findings: &[Diag]) -> FindingSummary {
        let mut s = FindingSummary {
            total: findings.len(),
            ..Default::default()
        };
        for d in findings {
            *s.by_severity
                .entry(d.severity.as_db_str().to_string())
                .or_insert(0) += 1;
        }
        s
    }

    fn first_scan<'a>(
        summary: &'a FindingSummary,
        findings: &'a [Diag],
        triage: f64,
        files: u64,
    ) -> HealthInputs<'a> {
        HealthInputs {
            summary,
            findings,
            triage_coverage: triage,
            new_since_last: 0,
            fixed_since_last: 0,
            reintroduced: 0,
            repo_files: Some(files),
            backlog: None,
            has_history: false,
            blanket_suppression_rate: None,
        }
    }

    #[allow(dead_code)]
    fn with_history<'a>(
        summary: &'a FindingSummary,
        findings: &'a [Diag],
        triage: f64,
        files: u64,
    ) -> HealthInputs<'a> {
        HealthInputs {
            has_history: true,
            ..first_scan(summary, findings, triage, files)
        }
    }

    #[allow(dead_code)]
    fn sev_score(h: &HealthScore) -> u8 {
        h.components
            .iter()
            .find(|c| c.label == "Severity pressure")
            .unwrap()
            .score
    }

    // ── Foundational behaviour ───────────────────────────────────────

    #[test]
    fn clean_repo_first_scan_grades_a() {
        let findings: Vec<Diag> = vec![];
        let s = summary_of(&findings);
        let h = compute(&first_scan(&s, &findings, 0.0, 100));
        assert_eq!(h.grade, "A");
        assert!(h.score >= 95, "clean first-scan ≥95, got {}", h.score);
    }

    #[test]
    fn no_high_repo_never_grades_below_c() {
        // 0 HIGH, lots of mediums + quality.
        let mut findings: Vec<Diag> = (0..200)
            .map(|_| diag(Severity::Medium, "rs.taint.foo", Some(Confidence::High)))
            .collect();
        findings.extend(
            (0..2000).map(|_| diag(Severity::Low, "rs.quality.unwrap", Some(Confidence::High))),
        );
        let s = summary_of(&findings);
        let h = compute(&first_scan(&s, &findings, 0.0, 200));
        assert!(h.score >= 70, "0 HIGH must grade ≥C (70), got {}", h.score);
    }

    #[test]
    fn quality_lints_alone_grade_at_least_b() {
        // 1000 quality lints, no security findings. Drag caps at 15
        // so base ~100−15=85. Should grade at worst B.
        let findings: Vec<Diag> = (0..1000)
            .map(|_| diag(Severity::Low, "rs.quality.unwrap", Some(Confidence::High)))
            .collect();
        let s = summary_of(&findings);
        let h = compute(&first_scan(&s, &findings, 0.0, 100));
        assert!(h.score >= 80, "1000 quality lints → ≥B, got {}", h.score);
    }

    #[test]
    fn one_high_caps_at_b() {
        let findings = vec![diag(Severity::High, "rs.taint.x", Some(Confidence::High))];
        let s = summary_of(&findings);
        let h = compute(&first_scan(&s, &findings, 0.0, 100));
        assert!(h.score <= 89, "1 HIGH must not grade A, got {}", h.score);
        assert_ne!(h.grade, "A");
    }

    #[test]
    fn many_confirmed_high_grades_f() {
        // 8 HIGHs all symex-Confirmed → effective_high ≈ 9.6 → F band.
        let findings: Vec<Diag> = (0..8)
            .map(|_| {
                let mut d = diag(Severity::High, "rs.taint.x", Some(Confidence::High));
                let ev = crate::evidence::Evidence {
                    symbolic: Some(crate::evidence::SymbolicVerdict {
                        verdict: crate::evidence::Verdict::Confirmed,
                        constraints_checked: 0,
                        paths_explored: 0,
                        witness: None,
                        interproc_call_chains: Vec::new(),
                        cutoff_notes: Vec::new(),
                    }),
                    ..Default::default()
                };
                d.evidence = Some(ev);
                d
            })
            .collect();
        let s = summary_of(&findings);
        let h = compute(&first_scan(&s, &findings, 0.0, 1000));
        assert_eq!(h.grade, "F");
    }

    #[test]
    fn low_credibility_high_does_not_count_as_full() {
        // 5 raw HIGHs, all Low confidence, all NotAttempted (no
        // evidence). Each has credibility ≈ 0.3 × 1.0 × 0.75 = 0.225.
        // Sum = 1.125 → effective_high = 1. Ceiling 85.
        let findings: Vec<Diag> = (0..5)
            .map(|_| {
                let mut d = diag(Severity::High, "rs.taint.x", Some(Confidence::Low));
                // Force AST-only: no evidence at all.
                d.evidence = None;
                d
            })
            .collect();
        let s = summary_of(&findings);
        let h = compute(&first_scan(&s, &findings, 0.0, 100));
        // The score reflects credibility — should NOT crater to F.
        assert!(
            h.score >= 60,
            "low-credibility HIGHs shouldn't crater to F, got {}",
            h.score
        );
    }

    #[test]
    fn test_path_findings_are_discounted() {
        let in_test = vec![diag_in(
            "src/feature/__tests__/handler.test.ts",
            Severity::High,
            Some(Confidence::High),
        )];
        let in_prod = vec![diag_in(
            "src/feature/handler.ts",
            Severity::High,
            Some(Confidence::High),
        )];
        let st = summary_of(&in_test);
        let sp = summary_of(&in_prod);

        let h_test = compute(&first_scan(&st, &in_test, 0.0, 50));
        let h_prod = compute(&first_scan(&sp, &in_prod, 0.0, 50));
        assert!(
            h_test.score > h_prod.score,
            "test-path HIGH ({}) should grade better than prod HIGH ({})",
            h_test.score,
            h_prod.score
        );
    }

    #[test]
    fn density_dampens_for_large_repos_but_caps() {
        let findings: Vec<Diag> = (0..3)
            .map(|_| diag(Severity::Medium, "rs.taint.x", Some(Confidence::High)))
            .collect();
        let s = summary_of(&findings);
        let small = compute(&first_scan(&s, &findings, 0.0, 100));
        let mid = compute(&first_scan(&s, &findings, 0.0, 5000));
        let big = compute(&first_scan(&s, &findings, 0.0, 50_000));
        let huge = compute(&first_scan(&s, &findings, 0.0, 500_000));
        assert!(
            small.score <= mid.score,
            "small {} mid {}",
            small.score,
            mid.score
        );
        assert!(
            mid.score <= big.score,
            "mid {} big {}",
            mid.score,
            big.score
        );
        assert!(
            (big.score as i32 - huge.score as i32).abs() <= 1,
            "size cap broken: big {} huge {}",
            big.score,
            huge.score
        );
    }

    #[test]
    fn triage_drops_when_total_under_floor() {
        let findings: Vec<Diag> = (0..5)
            .map(|_| diag(Severity::Low, "rs.x", Some(Confidence::High)))
            .collect();
        let s = summary_of(&findings);
        let h = compute(&first_scan(&s, &findings, 0.0, 100));
        let triage = h
            .components
            .iter()
            .find(|c| c.label == "Triage coverage")
            .unwrap();
        assert_eq!(triage.weight, 0.0);
        assert!(triage.detail.contains("Not applicable"));
    }

    #[test]
    fn trend_drops_on_first_scan() {
        let findings: Vec<Diag> = (0..30)
            .map(|_| diag(Severity::Medium, "rs.x", Some(Confidence::High)))
            .collect();
        let s = summary_of(&findings);
        let h = compute(&first_scan(&s, &findings, 0.5, 100));
        let trend = h.components.iter().find(|c| c.label == "Trend").unwrap();
        assert_eq!(trend.weight, 0.0);
        assert!(trend.detail.contains("Not applicable"));
    }

    #[test]
    fn stale_high_penalty_lowers_regression_component() {
        let findings = vec![diag(Severity::High, "rs.taint.x", Some(Confidence::High))];
        let s = summary_of(&findings);

        let backlog_clean = BacklogStats {
            oldest_open_days: Some(2),
            median_age_days: Some(1),
            stale_count: 0,
            age_buckets: vec![],
        };
        let backlog_stale = BacklogStats {
            oldest_open_days: Some(120),
            median_age_days: Some(60),
            stale_count: 3,
            age_buckets: vec![],
        };

        let fresh_inputs = HealthInputs {
            backlog: Some(&backlog_clean),
            has_history: true,
            ..first_scan(&s, &findings, 0.0, 100)
        };
        let rotting_inputs = HealthInputs {
            backlog: Some(&backlog_stale),
            has_history: true,
            ..first_scan(&s, &findings, 0.0, 100)
        };
        let fresh = compute(&fresh_inputs);
        let rotting = compute(&rotting_inputs);
        let fresh_reg = fresh
            .components
            .iter()
            .find(|c| c.label == "Regression resistance")
            .unwrap()
            .score;
        let rot_reg = rotting
            .components
            .iter()
            .find(|c| c.label == "Regression resistance")
            .unwrap()
            .score;
        assert!(
            rot_reg < fresh_reg,
            "stale should lower regression score: fresh {} vs rotting {}",
            fresh_reg,
            rot_reg
        );
    }

    #[test]
    fn grade_thresholds() {
        assert_eq!(grade_for(100), "A");
        assert_eq!(grade_for(90), "A");
        assert_eq!(grade_for(89), "B");
        assert_eq!(grade_for(80), "B");
        assert_eq!(grade_for(79), "C");
        assert_eq!(grade_for(70), "C");
        assert_eq!(grade_for(69), "D");
        assert_eq!(grade_for(60), "D");
        assert_eq!(grade_for(59), "F");
        assert_eq!(grade_for(0), "F");
    }
}

@ -1,8 +1,12 @@
pub mod app;
pub mod assets;
pub mod debug;
pub mod error;
pub mod health;
pub mod jobs;
pub mod models;
pub mod observability;
pub mod owasp;
pub mod progress;
pub mod routes;
pub mod scan_log;

@ -582,6 +582,187 @@ pub struct OverviewResponse {
    pub noisy_rules: Vec<NoisyRule>,
    pub recent_scans: Vec<ScanSummary>,
    pub insights: Vec<Insight>,

    // ── Tier 1 ──
    #[serde(skip_serializing_if = "Option::is_none")]
    pub health: Option<HealthScore>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub posture: Option<PostureSummary>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub backlog: Option<BacklogStats>,
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub weighted_top_files: Vec<WeightedFile>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub confidence_distribution: Option<ConfidenceDistribution>,

    // ── Tier 2 ──
    #[serde(skip_serializing_if = "Option::is_none")]
    pub scanner_quality: Option<ScannerQuality>,
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub issue_categories: Vec<IssueCategoryBucket>,
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub hot_sinks: Vec<HotSink>,
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub owasp_buckets: Vec<OwaspBucket>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub cross_file_ratio: Option<f64>,

    // ── Tier 3 ──
    #[serde(skip_serializing_if = "Option::is_none")]
    pub baseline: Option<BaselineInfo>,
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub language_health: Vec<LanguageHealth>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub suppression_hygiene: Option<SuppressionHygiene>,
}

/// Composite repo-health rollup.
#[derive(Debug, Clone, Serialize)]
pub struct HealthScore {
    /// 0–100 score; higher is better.
    pub score: u8,
    /// Letter grade A–F derived from score.
    pub grade: String,
    /// Sub-component contributions (0–100 each) for transparency.
    pub components: Vec<HealthComponent>,
}

/// Single line item in the health-score breakdown.
#[derive(Debug, Clone, Serialize)]
pub struct HealthComponent {
    /// Human label (e.g. "Severity pressure", "Trend", "Triage coverage").
    pub label: String,
    /// 0–100 — already inverted so higher = healthier.
    pub score: u8,
    /// Weight applied when blending into the final score (0.0–1.0).
    pub weight: f64,
    /// Short rationale shown in tooltip.
    pub detail: String,
}

/// One-line trend posture for the page header.
#[derive(Debug, Clone, Serialize)]
pub struct PostureSummary {
    /// "improving" | "regressing" | "stable" | "unknown"
    pub trend: String,
    /// "success" | "warning" | "danger" | "info"
    pub severity: String,
    /// Short message shown verbatim in the banner.
    pub message: String,
    /// Findings that were previously fixed and have re-appeared.
    pub reintroduced_count: usize,
}

/// Backlog age statistics computed from finding_first_seen.
#[derive(Debug, Clone, Serialize)]
pub struct BacklogStats {
    /// Days since the oldest still-open finding was first seen.
    pub oldest_open_days: Option<u32>,
    /// Median age of currently-open findings, in days.
    pub median_age_days: Option<u32>,
    /// Findings older than 30 days that remain open.
    pub stale_count: usize,
    /// Histogram buckets (label, count) — fixed 5 buckets.
    pub age_buckets: Vec<OverviewCount>,
}

/// Top-file row including severity stack for the weighted ranking.
#[derive(Debug, Clone, Serialize)]
pub struct WeightedFile {
    pub name: String,
    pub score: u32,
    pub high: usize,
    pub medium: usize,
    pub low: usize,
    pub total: usize,
}

/// Confidence-level distribution.
#[derive(Debug, Clone, Serialize, Default)]
pub struct ConfidenceDistribution {
    pub high: usize,
    pub medium: usize,
    pub low: usize,
    pub none: usize,
}

/// Engine-quality metrics that describe analysis depth/coverage.
#[derive(Debug, Clone, Serialize)]
pub struct ScannerQuality {
    pub files_scanned: u64,
    pub files_skipped: u64,
    /// 0.0–1.0 — files_scanned / (files_scanned + files_skipped).
    pub parse_success_rate: f64,
    pub functions_analyzed: u64,
    pub call_edges: u64,
    pub unresolved_calls: u64,
    /// 0.0–1.0 — call_edges / (call_edges + unresolved_calls).
    pub call_resolution_rate: f64,
    /// % of taint findings that received a symbolic verdict (Confirmed|Infeasible|Inconclusive).
    pub symex_verified_rate: f64,
    /// Count broken down by symbolic verdict label.
    pub symex_breakdown: HashMap<String, usize>,
}
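Both rate fields above are documented as `part / (part + rest)`. A minimal sketch of that ratio follows. `coverage_ratio` is a hypothetical helper, and returning `None` when both counts are zero (or the sum overflows) is an assumption, since the struct does not say how an empty scan is reported.

```rust
/// Hypothetical helper for rates such as `parse_success_rate`
/// (files_scanned, files_skipped) or `call_resolution_rate`
/// (call_edges, unresolved_calls). Returns `None` when the denominator
/// is zero or when `part + rest` overflows `u64`.
fn coverage_ratio(part: u64, rest: u64) -> Option<f64> {
    let denom = part.checked_add(rest)?;
    if denom == 0 {
        return None;
    }
    Some(part as f64 / denom as f64)
}
```

A caller would still pick a display fallback, for example `coverage_ratio(files_scanned, files_skipped).unwrap_or(0.0)`. Which fallback the dashboard expects is not specified here.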

/// One issue-category bucket (rule-family derived). Broader than OWASP, with
/// engine-friendly labels like "Tainted Flow" or "Code Quality".
#[derive(Debug, Clone, Serialize)]
pub struct IssueCategoryBucket {
    pub label: String,
    pub count: usize,
}

/// "Hot sink" — a single callee that absorbs many findings.
#[derive(Debug, Clone, Serialize)]
pub struct HotSink {
    /// Callee name (best-effort; from flow_steps last Sink).
    pub callee: String,
    pub count: usize,
}

/// One OWASP Top-10 (2021) bucket.
#[derive(Debug, Clone, Serialize)]
pub struct OwaspBucket {
    /// Category code, e.g. "A01"; the category name (e.g. "Broken Access Control") is in `label`.
    pub code: String,
    pub label: String,
    pub count: usize,
}

/// Per-language posture.
#[derive(Debug, Clone, Serialize)]
pub struct LanguageHealth {
    pub language: String,
    pub findings: usize,
    pub high: usize,
    pub medium: usize,
    pub low: usize,
}

/// Suppression-quality breakdown.
#[derive(Debug, Clone, Serialize)]
pub struct SuppressionHygiene {
    /// Findings explicitly triaged by fingerprint.
    pub fingerprint_level: usize,
    /// Findings suppressed by rule-level suppression.
    pub rule_level: usize,
    /// Findings suppressed by file-level suppression.
    pub file_level: usize,
    /// Findings suppressed by rule-in-file suppression.
    pub rule_in_file_level: usize,
    /// % of suppressed findings using low-specificity (rule/file/rule_in_file) rules.
    pub blanket_rate: f64,
}
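`blanket_rate` is described as the share of suppressed findings that use a low-specificity (rule, file, or rule-in-file) suppression. The minimal sketch below rests on two assumptions. First, the denominator is the sum of all four counters. Second, the value is a percentage in 0–100. The doc comment says "%", but the `f64` type alone does not settle whether the scale is 0–1 or 0–100. `SuppressionCounts` is a hypothetical stand-in for `SuppressionHygiene`.

```rust
/// Hypothetical stand-in for the four counters on `SuppressionHygiene`.
struct SuppressionCounts {
    fingerprint_level: usize,
    rule_level: usize,
    file_level: usize,
    rule_in_file_level: usize,
}

/// Percentage (0.0-100.0) of suppressions that are blanket rather than
/// fingerprint-specific. Assumes all four counters share one denominator
/// and returns 0.0 when nothing is suppressed.
fn blanket_rate(c: &SuppressionCounts) -> f64 {
    let blanket = c.rule_level + c.file_level + c.rule_in_file_level;
    let total = blanket + c.fingerprint_level;
    if total == 0 {
        return 0.0;
    }
    blanket as f64 * 100.0 / total as f64
}
```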

/// Pinned baseline scan and current drift relative to it.
#[derive(Debug, Clone, Serialize)]
pub struct BaselineInfo {
    pub scan_id: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub started_at: Option<String>,
    pub baseline_total: usize,
    pub drift_new: usize,
    pub drift_fixed: usize,
}

/// A name + count pair for overview top-N lists.

140
src/server/observability.rs
Normal file

@ -0,0 +1,140 @@
//! Per-request observability: request IDs + structured access logs.
//!
//! Layered above the security guard. Reuses a caller-supplied `X-Request-Id` or generates a
//! 12-character id, sets it on both the request and the response, and logs one record per
//! request (ERROR for 5xx, WARN for 4xx, INFO otherwise) with method, path, status, duration.

use axum::extract::Request;
use axum::http::{HeaderName, HeaderValue};
use axum::middleware::Next;
use axum::response::Response;
use std::time::Instant;
use uuid::Uuid;

const REQUEST_ID_HEADER: HeaderName = HeaderName::from_static("x-request-id");

pub async fn observe(mut request: Request, next: Next) -> Response {
    let request_id = request
        .headers()
        .get(&REQUEST_ID_HEADER)
        .and_then(|v| v.to_str().ok())
        .map(|s| s.to_string())
        .unwrap_or_else(|| Uuid::new_v4().as_simple().to_string()[..12].to_string());

    if let Ok(value) = HeaderValue::from_str(&request_id) {
        request.headers_mut().insert(REQUEST_ID_HEADER, value);
    }

    let method = request.method().clone();
    let path = request
        .uri()
        .path_and_query()
        .map(|p| p.as_str().to_string())
        .unwrap_or_else(|| request.uri().path().to_string());

    let started = Instant::now();
    let mut response = next.run(request).await;
    let elapsed_ms = started.elapsed().as_secs_f64() * 1000.0;
    let status = response.status();

    if let Ok(value) = HeaderValue::from_str(&request_id) {
        response.headers_mut().insert(REQUEST_ID_HEADER, value);
    }

    // Skip the noisy SSE channel (ignoring any query string): the long-lived stream pollutes logs.
    if path.split('?').next() != Some("/api/events") {
        if status.is_server_error() {
            tracing::error!(
                request_id = %request_id,
                method = %method,
                path = %path,
                status = status.as_u16(),
                elapsed_ms = format!("{elapsed_ms:.1}"),
                "request"
            );
        } else if status.is_client_error() {
            tracing::warn!(
                request_id = %request_id,
                method = %method,
                path = %path,
                status = status.as_u16(),
                elapsed_ms = format!("{elapsed_ms:.1}"),
                "request"
            );
        } else {
            tracing::info!(
                request_id = %request_id,
                method = %method,
                path = %path,
                status = status.as_u16(),
                elapsed_ms = format!("{elapsed_ms:.1}"),
                "request"
            );
        }
    }

    response
}
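The id-selection rule in `observe` does not depend on axum. It keeps the caller's id; otherwise it cuts a fresh UUID's 32-character simple form down to 12 characters. The std-only sketch below makes some assumptions. `resolve_request_id` and the `generate` closure are hypothetical, and the closure stands in for `Uuid::new_v4().as_simple().to_string()`. A header value that fails `to_str()` is modelled as `None`, matching the `.ok()` in the middleware.

```rust
/// Hypothetical, framework-free version of the request-id choice made in
/// `observe`. `incoming` is the `x-request-id` header already converted to
/// `&str`; values that are not valid visible ASCII should be passed as `None`.
fn resolve_request_id<F>(incoming: Option<&str>, generate: F) -> String
where
    F: FnOnce() -> String,
{
    match incoming {
        // Preserve a caller-supplied id verbatim so logs can be
        // correlated across services.
        Some(id) => id.to_string(),
        // Otherwise keep a 12-character prefix of a freshly generated id.
        // For a UUID's hex form this matches `[..12]`; `take` additionally
        // cannot panic on shorter input.
        None => generate().chars().take(12).collect(),
    }
}
```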

#[cfg(test)]
mod tests {
    use super::*;
    use axum::Router;
    use axum::body::Body;
    use axum::http::{Request as HttpRequest, StatusCode};
    use axum::middleware;
    use axum::routing::get;
    use tower::util::ServiceExt;

    #[tokio::test]
    async fn adds_request_id_header_when_absent() {
        let app: Router = Router::new()
            .route("/ping", get(|| async { "pong" }))
            .layer(middleware::from_fn(observe));

        let resp = app
            .oneshot(
                HttpRequest::builder()
                    .uri("/ping")
                    .body(Body::empty())
                    .unwrap(),
            )
            .await
            .unwrap();
        assert_eq!(resp.status(), StatusCode::OK);
        let id = resp
            .headers()
            .get("x-request-id")
            .unwrap()
            .to_str()
            .unwrap();
        assert!(!id.is_empty());
        assert_eq!(id.len(), 12);
    }

    #[tokio::test]
    async fn preserves_caller_supplied_request_id() {
        let app: Router = Router::new()
            .route("/ping", get(|| async { "pong" }))
            .layer(middleware::from_fn(observe));

        let resp = app
            .oneshot(
                HttpRequest::builder()
                    .uri("/ping")
                    .header("x-request-id", "abc-123")
                    .body(Body::empty())
                    .unwrap(),
            )
            .await
            .unwrap();
        assert_eq!(
            resp.headers()
                .get("x-request-id")
                .unwrap()
                .to_str()
                .unwrap(),
            "abc-123"
        );
    }
}

236
src/server/owasp.rs
Normal file

@ -0,0 +1,236 @@
//! Static rule-id → OWASP Top-10 (2021) mapping for the dashboard.
//!
//! Rule IDs follow the convention `{lang}.{family}.{name}` (e.g. `js.xss.outer_html`).
//! The family segment is what determines the bucket. Conservative — when in doubt,
//! map to the closest fit; rules with no obvious bucket are left unbucketed.

use crate::server::models::OwaspBucket;
use std::collections::HashMap;

/// Extract the family token from a rule ID. Handles two ID shapes:
/// 1. `lang.family.name` — typical (e.g. `js.xss.outer_html`)
/// 2. `family-subname` or single-segment — engine-emitted (e.g.
///    `state-resource-leak`, `taint-unsanitised-flow`, `cfg-error-fallthrough`)
fn extract_family(rule_id: &str) -> &str {
    if let Some(idx) = rule_id.find('.') {
        let after = &rule_id[idx + 1..];
        return match after.find('.') {
            Some(n) => &after[..n],
            None => after,
        };
    }
    if let Some(idx) = rule_id.find('-') {
        return &rule_id[..idx];
    }
    rule_id
}

/// Return the OWASP 2021 (code, label) pair for a given rule id, or `None` if unmapped.
pub fn owasp_bucket_for(rule_id: &str) -> Option<(&'static str, &'static str)> {
    let family = extract_family(rule_id);
    if family.is_empty() {
        return None;
    }

    Some(match family {
        // A01 — Broken Access Control
        "auth" | "csrf" | "mass_assign" | "path" | "redirect" => ("A01", "Broken Access Control"),
        // A02 — Cryptographic Failures
        "crypto" | "secrets" => ("A02", "Cryptographic Failures"),
        // A03 — Injection (covers SQLi, XSS, command, code-eval, template, NoSQL, LDAP, reflection,
        // and engine-level taint findings without a more specific family tag).
        "sqli" | "xss" | "cmdi" | "code_exec" | "template" | "nosql" | "ldap" | "reflection"
        | "taint" => ("A03", "Injection"),
        // A05 — Security Misconfiguration (TLS verify off, cookie flags, prototype pollution)
        "config" | "transport" | "prototype" => ("A05", "Security Misconfiguration"),
        // A08 — Software and Data Integrity Failures
        "deser" => ("A08", "Software and Data Integrity Failures"),
        // A09 — Logging & Monitoring Failures
        "log" => ("A09", "Logging and Monitoring Failures"),
        // A10 — SSRF
        "ssrf" => ("A10", "Server-Side Request Forgery"),
        // Memory-safety + state-machine resource lifecycle bugs — closest OWASP fit is
        // A04 Insecure Design (defensive depth).
        "memory" | "state" => ("A04", "Insecure Design"),
        // Quality findings (e.g. rs.quality.unwrap) and CFG structural issues
        // (cfg-error-fallthrough) are reliability / code-health, not direct OWASP
        // categories. We return None so they don't pollute the security buckets.
        _ => return None,
    })
}

/// Bucket all rule-id counts into OWASP categories, returning sorted-desc.
pub fn bucket_findings(by_rule: &HashMap<String, usize>) -> Vec<OwaspBucket> {
    let mut totals: HashMap<&'static str, (&'static str, usize)> = HashMap::new();
    for (rule_id, &count) in by_rule {
        if let Some((code, label)) = owasp_bucket_for(rule_id) {
            let entry = totals.entry(code).or_insert((label, 0));
            entry.1 += count;
        }
    }
    let mut out: Vec<OwaspBucket> = totals
        .into_iter()
        .map(|(code, (label, count))| OwaspBucket {
            code: code.to_string(),
            label: label.to_string(),
            count,
        })
        .collect();
    out.sort_by(|a, b| b.count.cmp(&a.count).then_with(|| a.code.cmp(&b.code)));
    out
}

/// Bucket rule-id counts into issue categories using the family segment.
/// Broader than OWASP, with friendlier labels (e.g. "Tainted Flow", "Code Quality").
pub fn issue_categories(
    by_rule: &HashMap<String, usize>,
) -> Vec<crate::server::models::IssueCategoryBucket> {
    let mut totals: HashMap<&'static str, usize> = HashMap::new();
    for (rule_id, &count) in by_rule {
        let label = issue_category_label(rule_id);
        *totals.entry(label).or_insert(0) += count;
    }
    let mut out: Vec<_> = totals
        .into_iter()
        .map(
            |(label, count)| crate::server::models::IssueCategoryBucket {
                label: label.to_string(),
                count,
            },
        )
        .collect();
    out.sort_by(|a, b| b.count.cmp(&a.count).then_with(|| a.label.cmp(&b.label)));
    out
}

fn issue_category_label(rule_id: &str) -> &'static str {
    match extract_family(rule_id) {
        "sqli" => "SQL Injection",
        "xss" => "Cross-Site Scripting",
        "cmdi" => "Command Injection",
        "code_exec" => "Code Execution",
        "deser" => "Deserialization",
        "ssrf" => "SSRF",
        "path" => "Path Traversal",
        "auth" => "Access Control",
        "csrf" => "CSRF",
        "mass_assign" => "Mass Assignment",
        "crypto" => "Weak Crypto",
        "secrets" => "Hardcoded Secrets",
        "config" => "Misconfiguration",
        "transport" => "Insecure Transport",
        "prototype" => "Prototype Pollution",
        "memory" => "Memory Safety",
        "reflection" => "Reflection",
        "redirect" => "Open Redirect",
        "log" => "Logging",
        "template" => "Template Injection",
        "taint" => "Tainted Flow",
        "state" => "Resource Lifecycle",
        "cfg" => "Control-Flow",
        "quality" => "Code Quality",
        _ => "Other",
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn maps_xss_to_a03() {
        assert_eq!(
            owasp_bucket_for("js.xss.outer_html"),
            Some(("A03", "Injection"))
        );
    }

    #[test]
    fn maps_auth_to_a01() {
        assert_eq!(
            owasp_bucket_for("rs.auth.missing_ownership_check"),
            Some(("A01", "Broken Access Control"))
        );
    }

    #[test]
    fn unknown_family_returns_none() {
        assert_eq!(owasp_bucket_for("js.weirdthing.foo"), None);
    }

    #[test]
    fn malformed_rule_returns_none() {
        // dashed id "not-a-rule" → family "not" → unmapped → None
        assert_eq!(owasp_bucket_for("not-a-rule"), None);
        // "js.onlytwo" — family is "onlytwo" which is unmapped
        assert_eq!(owasp_bucket_for("js.onlytwo"), None);
    }

    #[test]
    fn extract_family_handles_dashed_ids() {
        assert_eq!(extract_family("state-resource-leak"), "state");
        assert_eq!(extract_family("taint-unsanitised-flow"), "taint");
        assert_eq!(extract_family("cfg-error-fallthrough"), "cfg");
        assert_eq!(extract_family("rs.quality.unwrap"), "quality");
        assert_eq!(extract_family(""), "");
    }

    #[test]
    fn taint_findings_bucket_to_a03() {
        assert_eq!(
            owasp_bucket_for("taint-unsanitised-flow"),
            Some(("A03", "Injection"))
        );
    }

    #[test]
    fn quality_and_cfg_are_not_owasp() {
        assert_eq!(owasp_bucket_for("rs.quality.unwrap"), None);
        assert_eq!(owasp_bucket_for("cfg-error-fallthrough"), None);
    }

    #[test]
    fn issue_category_handles_engine_ids() {
        assert_eq!(issue_category_label("rs.quality.unwrap"), "Code Quality");
        assert_eq!(
            issue_category_label("state-resource-leak"),
            "Resource Lifecycle"
        );
        assert_eq!(
            issue_category_label("cfg-error-fallthrough"),
            "Control-Flow"
        );
        assert_eq!(
            issue_category_label("taint-unsanitised-flow"),
            "Tainted Flow"
        );
    }

    #[test]
    fn bucket_findings_sorts_desc() {
        let mut m = HashMap::new();
        m.insert("js.xss.outer_html".to_string(), 3);
        m.insert("rs.auth.missing_ownership_check".to_string(), 5);
        m.insert("js.crypto.math_random".to_string(), 2);
        let out = bucket_findings(&m);
        assert_eq!(out[0].code, "A01");
        assert_eq!(out[0].count, 5);
        assert_eq!(out[1].code, "A03");
        assert_eq!(out[1].count, 3);
        assert_eq!(out[2].code, "A02");
        assert_eq!(out[2].count, 2);
    }

    #[test]
    fn issue_category_label_recognises_simple_families() {
        assert_eq!(
            issue_category_label("js.xss.outer_html"),
            "Cross-Site Scripting"
        );
        assert_eq!(
            issue_category_label("py.cmdi.os_system"),
            "Command Injection"
        );
        assert_eq!(issue_category_label("garbage"), "Other");
    }
}

@ -1,16 +1,17 @@
use crate::commands::config as config_cmd;
use crate::labels;
use crate::server::app::{AppState, ServerEvent};
use crate::server::models::{LabelEntryView, ProfileView, RuleView, TerminatorView};
use crate::utils::config::{CapName, RuleKind, ScanProfile};
use crate::utils::config::{CapName, Config, RuleKind, ScanProfile};
use axum::extract::{Path, State};
use axum::http::StatusCode;
use axum::routing::get;
use axum::{Json, Router};
use std::fs;

pub fn routes() -> Router<AppState> {
    Router::new()
        .route("/config", get(get_config))
        .route("/config/raw", get(get_config_raw).put(put_config_raw))
        .route(
            "/config/rules",
            get(list_rules).post(add_rule).delete(remove_rule),

@@ -55,6 +56,67 @@ async fn get_config(State(state): State<AppState>) -> Json<serde_json::Value> {
    Json(serde_json::to_value(&*config).unwrap_or_default())
}

// ── Raw nyx.local read/write ─────────────────────────────────────────────────

async fn get_config_raw(State(state): State<AppState>) -> Json<serde_json::Value> {
    let local_path = state.config_dir.join("nyx.local");
    let exists = local_path.exists();
    let content = if exists {
        fs::read_to_string(&local_path).unwrap_or_default()
    } else {
        String::new()
    };

    Json(serde_json::json!({
        "path": local_path.display().to_string(),
        "exists": exists,
        "content": content,
    }))
}

async fn put_config_raw(
    State(state): State<AppState>,
    Json(body): Json<serde_json::Value>,
) -> Result<Json<serde_json::Value>, (StatusCode, Json<serde_json::Value>)> {
    let content = body
        .get("content")
        .and_then(|v| v.as_str())
        .ok_or_else(|| bad_request("missing content field"))?
        .to_string();

    // Validate by parsing into Config (round-trip check).
    let parsed: Config =
        toml::from_str(&content).map_err(|e| bad_request(&format!("invalid TOML: {e}")))?;
    if let Err(errs) = parsed.validate() {
        let joined = errs
            .iter()
            .map(|e| e.to_string())
            .collect::<Vec<_>>()
            .join("; ");
        return Err(bad_request(&format!("config validation failed: {joined}")));
    }

    let local_path = state.config_dir.join("nyx.local");
    fs::write(&local_path, &content)
        .map_err(|e| bad_request(&format!("failed to write {}: {e}", local_path.display())))?;

    // Reload the merged config so live state matches the file.
    match Config::load(&state.config_dir) {
        Ok((reloaded, _note)) => {
            *state.config.write() = reloaded;
        }
        Err(e) => return Err(bad_request(&format!("config reload failed: {e}"))),
    }

    let _ = state.event_tx.send(ServerEvent::ConfigChanged);

    Ok(Json(serde_json::json!({
        "status": "ok",
        "path": local_path.display().to_string(),
        "bytes": content.len(),
    })))
}

// ── Custom rules (existing endpoints) ────────────────────────────────────────

async fn list_rules(State(state): State<AppState>) -> Json<Vec<RuleView>> {
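`put_config_raw` parses and validates the submitted TOML before touching `nyx.local`, so an invalid body never overwrites a working file. The sketch below isolates that ordering using only the standard library. The `validate` closure is an assumed stand-in for the `toml::from_str` + `Config::validate()` pair, and `save_validated` is a hypothetical name.

```rust
use std::fs;
use std::path::Path;

/// Validate `content` first and only then persist it, so a rejected body
/// never reaches disk. Returns the number of bytes written on success.
pub fn save_validated<F>(path: &Path, content: &str, validate: F) -> Result<usize, String>
where
    F: Fn(&str) -> Result<(), Vec<String>>,
{
    if let Err(errs) = validate(content) {
        // Mirror the handler: join every validation error into one message.
        return Err(format!("config validation failed: {}", errs.join("; ")));
    }
    fs::write(path, content).map_err(|e| format!("failed to write {}: {e}", path.display()))?;
    Ok(content.len())
}
```

The real handler additionally reloads the merged config and broadcasts `ServerEvent::ConfigChanged` after the write; those steps depend on server state and are left out here.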
@@ -220,29 +282,17 @@ async fn remove_terminator(
// ── Sources / Sinks / Sanitizers (by kind) ───────────────────────────────────

fn list_by_kind(state: &AppState, target_kind: &str) -> Vec<LabelEntryView> {
    let builtins = labels::enumerate_builtin_rules();
    let config = state.config.read();

    let mut out: Vec<LabelEntryView> = builtins
        .iter()
        .filter(|r| r.kind == target_kind && !r.is_gated)
        .map(|r| LabelEntryView {
            lang: r.language.clone(),
            matchers: r.matchers.clone(),
            cap: r.cap.clone(),
            case_sensitive: r.case_sensitive,
            is_builtin: true,
        })
        .collect();

    // Add custom rules of the target kind
    // Built-in rules live on /api/rules — keep this endpoint focused on the
    // user's own additions in nyx.local.
    let target_rule_kind = match target_kind {
        "source" => RuleKind::Source,
        "sanitizer" => RuleKind::Sanitizer,
        "sink" => RuleKind::Sink,
        _ => return out,
        _ => return Vec::new(),
    };

    let config = state.config.read();
    let mut out: Vec<LabelEntryView> = Vec::new();
    for (lang, lang_cfg) in &config.analysis.languages {
        for cr in &lang_cfg.rules {
            if cr.kind == target_rule_kind {
@@ -256,7 +306,6 @@ fn list_by_kind(state: &AppState, target_kind: &str) -> Vec<LabelEntryView> {
            }
        }
    }

    out
}

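After this change `list_by_kind` returns only the user's own rules from `nyx.local`, and an unrecognised kind yields an empty list instead of the built-ins. A simplified sketch of that filtering follows. `RuleKind` and `CustomRule` are hypothetical stand-ins for the real config structs, and the `BTreeMap` is chosen only so the output order is deterministic.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuleKind {
    Source,
    Sanitizer,
    Sink,
}

/// Simplified user rule from nyx.local (hypothetical shape).
#[derive(Debug, Clone)]
pub struct CustomRule {
    pub kind: RuleKind,
    pub matchers: Vec<String>,
}

/// Return (language, matchers) for every user rule of `target_kind`.
/// Unknown kinds yield an empty list rather than falling back to built-ins.
pub fn custom_by_kind(
    languages: &BTreeMap<String, Vec<CustomRule>>,
    target_kind: &str,
) -> Vec<(String, Vec<String>)> {
    let wanted = match target_kind {
        "source" => RuleKind::Source,
        "sanitizer" => RuleKind::Sanitizer,
        "sink" => RuleKind::Sink,
        _ => return Vec::new(),
    };
    let mut out = Vec::new();
    for (lang, rules) in languages {
        for r in rules.iter().filter(|r| r.kind == wanted) {
            out.push((lang.clone(), r.matchers.clone()));
        }
    }
    out
}
```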
@@ -26,6 +26,9 @@ pub fn routes() -> Router<AppState> {
        .route("/debug/call-graph", get(get_call_graph))
        .route("/debug/abstract-interp", get(get_abstract_interp))
        .route("/debug/symex", get(get_symex))
        .route("/debug/pointer", get(get_pointer))
        .route("/debug/type-facts", get(get_type_facts))
        .route("/debug/auth", get(get_auth))
}

// ── Query params ─────────────────────────────────────────────────────────────
@@ -117,7 +120,7 @@ async fn get_ssa(
    let path = validate_and_resolve(&state.scan_root, &q.file)?;
    let config = state.config.read();
    let analysis = debug::analyse_file(&path, &config)?;
    let (ssa, _opt) = debug::analyse_function_ssa(&analysis, &q.function)?;
    let (ssa, _opt, _cfg) = debug::analyse_function_ssa(&analysis, &q.function)?;
    Ok(Json(SsaBodyView::from_ssa(&ssa, &analysis.bytes)))
}

@@ -130,7 +133,7 @@ async fn get_taint(
    let path = validate_and_resolve(&state.scan_root, &q.file)?;
    let config = state.config.read();
    let analysis = debug::analyse_file(&path, &config)?;
    let (ssa, opt) = debug::analyse_function_ssa(&analysis, &q.function)?;
    let (ssa, opt, body_cfg) = debug::analyse_function_ssa(&analysis, &q.function)?;

    // Try to load global summaries from DB for cross-file context
    let global = load_global_summaries(&state);
@@ -141,7 +144,7 @@ async fn get_taint(

    let (events, _entry_states, exit_states) = debug::analyse_function_taint(
        &ssa,
        analysis.cfg(),
        body_cfg,
        analysis.lang,
        analysis.summaries(),
        global.as_ref(),
@@ -168,13 +171,13 @@ async fn get_abstract_interp(
    let path = validate_and_resolve(&state.scan_root, &q.file)?;
    let config = state.config.read();
    let analysis = debug::analyse_file(&path, &config)?;
    let (ssa, opt) = debug::analyse_function_ssa(&analysis, &q.function)?;
    let (ssa, opt, body_cfg) = debug::analyse_function_ssa(&analysis, &q.function)?;

    let global = load_global_summaries(&state);

    let (_events, block_states, _exit_states) = debug::analyse_function_taint(
        &ssa,
        analysis.cfg(),
        body_cfg,
        analysis.lang,
        analysis.summaries(),
        global.as_ref(),
@@ -262,16 +265,59 @@ async fn get_symex(
    let path = validate_and_resolve(&state.scan_root, &q.file)?;
    let config = state.config.read();
    let analysis = debug::analyse_file(&path, &config)?;
    let (ssa, opt) = debug::analyse_function_ssa(&analysis, &q.function)?;
    let (ssa, opt, body_cfg) = debug::analyse_function_ssa(&analysis, &q.function)?;

    let global = load_global_summaries(&state);

    let sym_state =
        debug::analyse_function_symex(&ssa, analysis.cfg(), analysis.lang, &opt, global.as_ref());
        debug::analyse_function_symex(&ssa, body_cfg, analysis.lang, &opt, global.as_ref());

    Ok(Json(SymexView::from_symbolic_state(&sym_state, &ssa)))
}

/// GET /api/debug/pointer?file=<path>&function=<name>
/// Return the field-sensitive Steensgaard points-to facts for a function.
async fn get_pointer(
    State(state): State<AppState>,
    Query(q): Query<FileFunctionQuery>,
) -> Result<Json<PointerView>, StatusCode> {
    let path = validate_and_resolve(&state.scan_root, &q.file)?;
    let config = state.config.read();
    let analysis = debug::analyse_file(&path, &config)?;
    let (ssa, facts) = debug::analyse_function_pointer(&analysis, &q.function)?;
    Ok(Json(PointerView::from_facts(&facts, &ssa)))
}

/// GET /api/debug/type-facts?file=<path>&function=<name>
/// Return per-function type-fact details derived from the SSA optimiser.
async fn get_type_facts(
    State(state): State<AppState>,
    Query(q): Query<FileFunctionQuery>,
) -> Result<Json<TypeFactsView>, StatusCode> {
    let path = validate_and_resolve(&state.scan_root, &q.file)?;
    let config = state.config.read();
    let analysis = debug::analyse_file(&path, &config)?;
    let (ssa, opt, _cfg) = debug::analyse_function_ssa(&analysis, &q.function)?;
    Ok(Json(TypeFactsView::from_optimize(
        &opt,
        &ssa,
        &analysis.bytes,
    )))
}

/// GET /api/debug/auth?file=<path>
/// Return the file-scoped authorization model — routes, units,
/// sensitive operations, and auth checks — for the debug UI.
async fn get_auth(
    State(state): State<AppState>,
    Query(q): Query<FileQuery>,
) -> Result<Json<AuthAnalysisView>, StatusCode> {
    let path = validate_and_resolve(&state.scan_root, &q.file)?;
    let config = state.config.read();
    let (model, bytes, enabled) = debug::analyse_file_auth(&path, &config)?;
    Ok(Json(AuthAnalysisView::from_model(&model, &bytes, enabled)))
}

// ── Helpers ──────────────────────────────────────────────────────────────────

/// Load global summaries from DB if available.
@@ -396,7 +442,9 @@ mod tests {
                abstract_transfer: vec![],
                param_return_paths: vec![],
                points_to: Default::default(),
                field_points_to: Default::default(),
                return_path_facts: smallvec::SmallVec::new(),
                typed_call_receivers: vec![],
            },
        )],
    )
@@ -466,6 +514,8 @@ mod tests {
                value_defs: vec![],
                cfg_node_map: std::collections::HashMap::new(),
                exception_edges: vec![],
                field_interner: crate::ssa::ir::FieldInterner::default(),
                field_writes: std::collections::HashMap::new(),
            },
            false,
            false,
@@ -486,6 +536,8 @@ mod tests {
                value_defs: vec![],
                cfg_node_map: std::collections::HashMap::new(),
                exception_edges: vec![],
                field_interner: crate::ssa::ir::FieldInterner::default(),
                field_writes: std::collections::HashMap::new(),
            },
            true,
            true,
@@ -506,6 +558,8 @@ mod tests {
                value_defs: vec![],
                cfg_node_map: std::collections::HashMap::new(),
                exception_edges: vec![],
                field_interner: crate::ssa::ir::FieldInterner::default(),
                field_writes: std::collections::HashMap::new(),
            },
            true,
            false,
@@ -599,7 +653,9 @@ mod tests {
                abstract_transfer: vec![],
                param_return_paths: vec![],
                points_to: Default::default(),
                field_points_to: Default::default(),
                return_path_facts: smallvec::SmallVec::new(),
                typed_call_receivers: vec![],
            },
        )],
    )

@@ -54,7 +54,19 @@ struct TreeEntry {
#[derive(Debug, Serialize)]
struct SymbolEntry {
    name: String,
    /// Legacy display kind (`"function"` / `"method"`) used by existing CSS
    /// classes in the frontend. Kept for backward-compat — new consumers
    /// should prefer `func_kind`.
    kind: String,
    /// Structural [`crate::symbol::FuncKind`] slug (`"fn"`, `"method"`,
    /// `"closure"`, `"ctor"`, `"getter"`, `"setter"`, `"toplevel"`). Lets
    /// the UI distinguish anonymous closures (`<anon#N>`) from named
    /// functions and offer a default-hide toggle.
    func_kind: String,
    /// Enclosing container path (class / impl / module / outer function).
    /// Empty for free top-level functions. Surfaced so the UI can render
    /// closures as `<anon#N> [in outer_fn]`.
    container: String,
    line: Option<usize>,
    finding_count: usize,
    namespace: Option<String>,
@@ -278,16 +290,21 @@ async fn get_symbols(

    let entries: Vec<SymbolEntry> = symbols
        .into_iter()
        .map(|(name, arity, _lang, namespace)| {
            let kind = if !namespace.is_empty() && namespace != name {
                "method".to_string()
            } else {
                "function".to_string()
        .map(|(name, arity, _lang, namespace, container, func_kind)| {
            // Legacy `kind` field — still used by existing CSS classes
            // (`symbol-kind-method`, `symbol-kind-function`). Map any
            // method-like FuncKind onto `"method"` and everything else
            // onto `"function"` so the rendered icon stays sensible.
            let kind = match func_kind.as_str() {
                "method" | "ctor" | "getter" | "setter" => "method".to_string(),
                _ => "function".to_string(),
            };
            let finding_count = func_finding_counts.get(&name).copied().unwrap_or(0);
            SymbolEntry {
                name,
                kind,
                func_kind,
                container,
                line: None,
                finding_count,
                namespace: if namespace.is_empty() {

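The `get_symbols` change derives the legacy `kind` from the structural `func_kind` slug instead of from the namespace. The sketch restates that mapping and adds a hypothetical `display_label` helper showing the `<anon#N> [in outer_fn]` rendering that the `container` doc comment describes; the helper's name and exact format are assumptions about what the frontend does.

```rust
/// Collapse a structural FuncKind slug into the two legacy CSS kinds.
pub fn legacy_kind(func_kind: &str) -> &'static str {
    match func_kind {
        // Method-like kinds keep the method icon.
        "method" | "ctor" | "getter" | "setter" => "method",
        // "fn", "closure", "toplevel" and anything unknown render as functions.
        _ => "function",
    }
}

/// Hypothetical display label: closures get their container appended so an
/// anonymous `<anon#N>` can be traced back to its enclosing function.
pub fn display_label(name: &str, func_kind: &str, container: &str) -> String {
    if func_kind == "closure" && !container.is_empty() {
        format!("{name} [in {container}]")
    } else {
        name.to_string()
    }
}
```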
@@ -1,7 +1,7 @@
use crate::server::app::AppState;
use crate::server::error::{ApiError, ApiResult};
use crate::utils::path::{DEFAULT_UI_MAX_FILE_BYTES, RepoPathError, open_repo_text_file};
use axum::extract::{Query, State};
use axum::http::StatusCode;
use axum::routing::get;
use axum::{Json, Router};
use serde::{Deserialize, Serialize};
@@ -33,9 +33,9 @@ struct FileResponse {
async fn get_file(
    State(state): State<AppState>,
    Query(query): Query<FileQuery>,
) -> Result<Json<FileResponse>, StatusCode> {
) -> ApiResult<Json<FileResponse>> {
    let opened = open_repo_text_file(&state.scan_root, &query.path, DEFAULT_UI_MAX_FILE_BYTES)
        .map_err(map_path_error)?;
        .map_err(|e| map_path_error(e, &query.path))?;
    let content = opened.content;
    let all_lines: Vec<&str> = content.lines().collect();
    let total_lines = all_lines.len();
@@ -64,14 +64,25 @@ async fn get_file(
    }))
}

fn map_path_error(err: RepoPathError) -> StatusCode {
fn map_path_error(err: RepoPathError, path: &str) -> ApiError {
    match err {
        RepoPathError::InvalidPath | RepoPathError::OutsideRoot => StatusCode::FORBIDDEN,
        RepoPathError::NotFound => StatusCode::NOT_FOUND,
        RepoPathError::TooLarge
        | RepoPathError::InvalidText
        | RepoPathError::NotFile
        | RepoPathError::NotDirectory => StatusCode::BAD_REQUEST,
        RepoPathError::Io => StatusCode::INTERNAL_SERVER_ERROR,
        RepoPathError::InvalidPath => ApiError::forbidden(format!("invalid path: {path}")),
        RepoPathError::OutsideRoot => {
            ApiError::forbidden(format!("path outside scan root: {path}"))
        }
        RepoPathError::NotFound => ApiError::not_found(format!("file not found: {path}")),
        RepoPathError::TooLarge => {
            ApiError::bad_request(format!("file too large to display: {path}"))
        }
        RepoPathError::InvalidText => {
            ApiError::bad_request(format!("file is not valid UTF-8 text: {path}"))
        }
        RepoPathError::NotFile => {
            ApiError::bad_request(format!("path is not a regular file: {path}"))
        }
        RepoPathError::NotDirectory => {
            ApiError::bad_request(format!("path is not a directory: {path}"))
        }
        RepoPathError::Io => ApiError::internal(format!("I/O error reading: {path}")),
    }
}

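`map_path_error` now carries the offending path into a message alongside the status code. A dependency-free sketch of the same mapping follows, using hypothetical `RepoPathError` and `ApiError` stand-ins with plain `u16` status codes in place of the server's types and constructors.

```rust
/// Simplified stand-in for the server's path-resolution error.
#[derive(Debug, Clone, Copy)]
pub enum RepoPathError {
    InvalidPath,
    OutsideRoot,
    NotFound,
    TooLarge,
    InvalidText,
    NotFile,
    NotDirectory,
    Io,
}

/// Simplified stand-in for the server's API error: status plus message.
#[derive(Debug, PartialEq)]
pub struct ApiError {
    pub status: u16,
    pub message: String,
}

pub fn map_path_error(err: RepoPathError, path: &str) -> ApiError {
    let (status, message) = match err {
        // Path escapes or malformed paths are refused, not "not found".
        RepoPathError::InvalidPath => (403, format!("invalid path: {path}")),
        RepoPathError::OutsideRoot => (403, format!("path outside scan root: {path}")),
        RepoPathError::NotFound => (404, format!("file not found: {path}")),
        // Client-side problems with the requested file.
        RepoPathError::TooLarge => (400, format!("file too large to display: {path}")),
        RepoPathError::InvalidText => (400, format!("file is not valid UTF-8 text: {path}")),
        RepoPathError::NotFile => (400, format!("path is not a regular file: {path}")),
        RepoPathError::NotDirectory => (400, format!("path is not a directory: {path}")),
        RepoPathError::Io => (500, format!("I/O error reading: {path}")),
    };
    ApiError { status, message }
}
```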
@@ -2,13 +2,13 @@

use crate::commands::scan::Diag;
use crate::database::index::Indexer;
use crate::server::app::AppState;
use crate::server::app::{AppState, CachedFindings};
use crate::server::error::{ApiError, ApiResult};
use crate::server::models::{
    FilterValues, FindingSummary, FindingView, collect_filter_values, finding_from_diag,
    finding_from_diag_with_detail, overlay_triage_states, summarize_findings,
};
use axum::extract::{Path, Query, State};
use axum::http::StatusCode;
use axum::routing::get;
use axum::{Json, Router};
use serde::Deserialize;
@@ -22,16 +22,30 @@ pub fn routes() -> Router<AppState> {
        .route("/findings/{index}", get(get_finding))
}

/// Sentinel job id for "we read this from SQLite, not from JobManager."
/// Used as the cache key when no in-memory job exists (e.g. fresh server boot).
const DB_FALLBACK_KEY: &str = "__db_fallback__";

/// Bundle returned by [`load_latest_findings`]: the raw diags plus the cache
/// key under which their derived views should be stored. The cache key is the
/// in-memory job id when available, or [`DB_FALLBACK_KEY`] when we fell back
/// to SQLite.
struct LoadedFindings {
    cache_key: String,
    findings: Arc<Vec<Diag>>,
}

/// Load findings for the latest completed scan, falling back to DB if no
/// in-memory completed scan exists (e.g. after a server restart).
pub fn load_latest_findings(state: &AppState) -> Arc<Vec<Diag>> {
    // In-memory first
fn load_latest_findings_internal(state: &AppState) -> LoadedFindings {
    if let Some(job) = state.job_manager.get_latest_completed() {
        if let Some(ref findings) = job.findings {
            return Arc::clone(findings);
            return LoadedFindings {
                cache_key: job.id.clone(),
                findings: Arc::clone(findings),
            };
        }
    }
    // DB fallback — find the most recent completed scan with findings
    if let Some(ref pool) = state.db_pool {
        if let Ok(idx) = Indexer::from_pool("_scans", pool) {
            if let Ok(scans) = idx.list_scans(20) {
@@ -39,7 +53,10 @@ pub fn load_latest_findings(state: &AppState) -> Arc<Vec<Diag>> {
                    if scan.status == "completed" {
                        if let Some(json) = scan.findings_json.as_deref() {
                            if let Ok(diags) = serde_json::from_str::<Vec<Diag>>(json) {
                                return Arc::new(diags);
                                return LoadedFindings {
                                    cache_key: format!("{DB_FALLBACK_KEY}:{}", scan.id),
                                    findings: Arc::new(diags),
                                };
                            }
                        }
                    }
@@ -47,10 +64,61 @@ pub fn load_latest_findings(state: &AppState) -> Arc<Vec<Diag>> {
            }
        }
    }
    Arc::new(Vec::new())
    LoadedFindings {
        cache_key: DB_FALLBACK_KEY.to_string(),
        findings: Arc::new(Vec::new()),
    }
}

/// Build (or fetch from cache) the per-scan derived views.
///
/// Returns clones of `Arc`s so callers can drop the lock immediately and work
/// without contention. Triage state is *not* baked into the cached views — it
/// changes on a different cadence and is overlaid per request.
fn cached_for_latest(state: &AppState) -> CachedFindings {
    let loaded = load_latest_findings_internal(state);

    // Fast path: cache hit for the same job id.
    if let Some(cached) = state.findings_cache.read().as_ref() {
        if cached.job_id == loaded.cache_key {
            return cached.clone();
        }
    }

    // Slow path: rebuild. Guard against concurrent rebuilds of the same key —
    // a second writer that finds the cache already populated for our key
    // simply returns it.
    let mut guard = state.findings_cache.write();
    if let Some(existing) = guard.as_ref() {
        if existing.job_id == loaded.cache_key {
            return existing.clone();
        }
    }

    let views: Vec<FindingView> = loaded
        .findings
        .iter()
        .enumerate()
        .map(|(i, d)| finding_from_diag(i, d))
        .collect();
    let summary = summarize_findings(&loaded.findings);
    let filters = collect_filter_values(&loaded.findings);

    let entry = CachedFindings {
        job_id: loaded.cache_key,
        views: Arc::new(views),
        summary: Arc::new(summary),
        filters: Arc::new(filters),
    };
    *guard = Some(entry.clone());
    entry
}

/// Load triage states and suppression rules from DB, apply to views.
///
/// Triage state is overlaid onto a freshly-cloned `Vec` rather than mutating
/// the cached views so concurrent readers see consistent data and the cache
/// stays valid across triage edits.
fn apply_triage_overlay(state: &AppState, views: &mut [FindingView]) {
    if let Some(ref pool) = state.db_pool {
        if let Ok(idx) = Indexer::from_pool("_triage", pool) {
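`cached_for_latest` is a double-checked cache: a read lock serves hits, and on a miss the write lock is taken and the key re-checked so concurrent misses build the views only once. The sketch reproduces that pattern with `std::sync::RwLock`, whose `read()`/`write()` return a `Result` (the server's lock hands out guards directly, so the `unwrap()` calls are specific to this sketch). `Cached`, its `Vec<String>` payload and `get_or_build` are hypothetical simplifications of `CachedFindings`.

```rust
use std::sync::{Arc, RwLock};

/// Derived views cached for one scan, keyed by job id or DB-fallback key.
#[derive(Clone, Debug)]
pub struct Cached {
    pub key: String,
    pub views: Arc<Vec<String>>, // stand-in for Vec<FindingView>
}

/// Return the cached entry for `key`, building it with `build` on a miss.
pub fn get_or_build<F>(cache: &RwLock<Option<Cached>>, key: &str, build: F) -> Cached
where
    F: FnOnce() -> Vec<String>,
{
    // Fast path: shared read lock; cloning only bumps Arc refcounts.
    if let Some(c) = cache.read().unwrap().as_ref() {
        if c.key == key {
            return c.clone();
        }
    }
    // Slow path: exclusive lock, then re-check, because another thread may
    // have built the same key between our read and write acquisitions.
    let mut guard = cache.write().unwrap();
    if let Some(c) = guard.as_ref() {
        if c.key == key {
            return c.clone();
        }
    }
    let entry = Cached {
        key: key.to_string(),
        views: Arc::new(build()),
    };
    *guard = Some(entry.clone());
    entry
}
```

Because triage state is overlaid per request on a cloned `Vec`, the cached entry never needs invalidating on triage edits; only a new cache key (a new job id, or `__db_fallback__:<scan id>`) triggers a rebuild.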
@@ -80,19 +148,11 @@ struct FindingsQuery {
async fn list_findings(
    State(state): State<AppState>,
    Query(query): Query<FindingsQuery>,
) -> Result<Json<serde_json::Value>, StatusCode> {
    let findings = load_latest_findings(&state);

    let mut views: Vec<FindingView> = findings
        .iter()
        .enumerate()
        .map(|(i, d)| finding_from_diag(i, d))
        .collect();

    // Overlay triage states from DB before filtering
) -> ApiResult<Json<serde_json::Value>> {
    let cached = cached_for_latest(&state);
    let mut views: Vec<FindingView> = (*cached.views).clone();
    apply_triage_overlay(&state, &mut views);

    // Apply filters.
    if let Some(ref sev) = query.severity {
        let sev_upper = sev.to_ascii_uppercase();
        views.retain(|f| f.severity.as_db_str() == sev_upper);
@@ -138,7 +198,6 @@ async fn list_findings(
        });
    }

    // Sort.
    match query.sort_by.as_deref() {
        Some("severity") => views.sort_by_key(|a| a.severity),
        Some("path") | Some("file") => views.sort_by(|a, b| a.path.cmp(&b.path)),
@@ -163,13 +222,12 @@ async fn list_findings(
        }),
        Some("status") => views.sort_by(|a, b| a.status.cmp(&b.status)),
        Some("category") => views.sort_by_key(|a| a.category.to_string()),
        _ => {} // default order (by index)
        _ => {}
    }
    if query.sort_dir.as_deref() == Some("desc") {
        views.reverse();
    }

    // Paginate.
    let total = views.len();
    let page = query.page.unwrap_or(1).max(1);
    let per_page = query.per_page.unwrap_or(50).clamp(1, 10000);
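`list_findings` applies the requested sort, reverses the list for `sort_dir=desc`, and then paginates with a 1-based `page` and a `per_page` clamped to `1..=10000` (default 50). The slicing itself falls in an elided part of the diff, so the skip/take arithmetic in this hypothetical `paginate` helper is an assumption rather than the handler's exact code.

```rust
/// Apply the sort direction, then return (total, items on the requested page).
pub fn paginate<T>(
    mut items: Vec<T>,
    desc: bool,
    page: Option<usize>,
    per_page: Option<usize>,
) -> (usize, Vec<T>) {
    if desc {
        // Descending simply reverses whatever order the sort produced.
        items.reverse();
    }
    let total = items.len();
    let page = page.unwrap_or(1).max(1);
    let per_page = per_page.unwrap_or(50).clamp(1, 10_000);
    // Saturating multiply: a huge page number yields an empty page, not a panic.
    let start = (page - 1).saturating_mul(per_page);
    let page_items = items.into_iter().skip(start).take(per_page).collect();
    (total, page_items)
}
```

Because `desc` reverses whatever order the sort produced, a request with `sort_dir=desc` and no `sort_by` returns findings in reverse index order.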
@@ -185,22 +243,28 @@ async fn list_findings(
}

async fn findings_summary(State(state): State<AppState>) -> Json<FindingSummary> {
    let findings = load_latest_findings(&state);
    Json(summarize_findings(&findings))
    Json((*cached_for_latest(&state).summary).clone())
}

async fn findings_filters(State(state): State<AppState>) -> Json<FilterValues> {
    let findings = load_latest_findings(&state);
    Json(collect_filter_values(&findings))
    Json((*cached_for_latest(&state).filters).clone())
}

async fn get_finding(
    State(state): State<AppState>,
    Path(index): Path<usize>,
) -> Result<Json<FindingView>, StatusCode> {
    let findings = load_latest_findings(&state);
    let diag = findings.get(index).ok_or(StatusCode::NOT_FOUND)?;
) -> ApiResult<Json<FindingView>> {
    let findings = load_latest_findings_internal(&state).findings;
    let diag = findings
        .get(index)
        .ok_or_else(|| ApiError::not_found(format!("finding {index} not found")))?;
    let mut view = finding_from_diag_with_detail(index, diag, &state.scan_root, &findings);
    apply_triage_overlay(&state, std::slice::from_mut(&mut view));
    Ok(Json(view))
}

/// Public alias for callers (overview, explorer, triage) that just want
/// the raw diag list. Kept as `load_latest_findings` for source-compat.
pub fn load_latest_findings(state: &AppState) -> Arc<Vec<Diag>> {
    load_latest_findings_internal(state).findings
}

@@ -2,21 +2,31 @@

use crate::commands::scan::Diag;
use crate::database::index::{Indexer, ScanRecord};
use crate::evidence::Confidence;
use crate::evidence::{Confidence, Verdict};
use crate::server::app::AppState;
use crate::server::models::{
    Insight, NoisyRule, OverviewResponse, ScanSummary, TrendPoint, by_language_from_findings,
    compute_fingerprint, summarize_findings, top_directories_from_findings, top_n_from_map,
    BacklogStats, BaselineInfo, ConfidenceDistribution, HotSink, Insight, LanguageHealth,
    NoisyRule, OverviewCount, OverviewResponse, PostureSummary, ScanSummary, ScannerQuality,
    SuppressionHygiene, TrendPoint, WeightedFile, by_language_from_findings, compute_fingerprint,
    lang_for_finding_path, summarize_findings, top_directories_from_findings, top_n_from_map,
};
use axum::extract::State;
use axum::routing::get;
use crate::server::owasp;
use axum::extract::{Path as AxPath, State};
use axum::http::StatusCode;
use axum::routing::{delete, get, post};
use axum::{Json, Router};
use serde::Deserialize;
use std::collections::{HashMap, HashSet};

const BASELINE_KEY: &str = "baseline_scan_id";

pub fn routes() -> Router<AppState> {
    Router::new()
        .route("/overview", get(overview))
        .route("/overview/trends", get(overview_trends))
        .route("/overview/baseline", post(set_baseline))
        .route("/overview/baseline", delete(clear_baseline))
        .route("/overview/baseline/{scan_id}", post(set_baseline_path))
}

/// GET /api/overview — aggregated dashboard data.
@@ -25,7 +35,7 @@ async fn overview(State(state): State<AppState>) -> Json<OverviewResponse> {
    let findings = crate::server::routes::findings::load_latest_findings(&state);

    // 2. Collect recent scans (in-memory + DB, deduped)
    let recent_scans = collect_recent_scans(&state, 10);
    let recent_scans = collect_recent_scans(&state, 20);

    // 3. Basic summary
    let summary = summarize_findings(&findings);
@@ -37,8 +47,10 @@ async fn overview(State(state): State<AppState>) -> Json<OverviewResponse> {
    let latest_scan_at = latest_completed.and_then(|s| s.started_at.clone());
    let latest_scan_duration = latest_completed.and_then(|s| s.duration_secs);

    // 5. New/fixed since last scan
    let (new_since_last, fixed_since_last) = compute_delta(&state, &findings);
    // 5. Walk historical scans once for delta + posture + backlog + drift.
    let history = ScanHistory::load(&state, 20);
    let (new_since_last, fixed_since_last, reintroduced_count) =
        history.compare_to_current(&findings);

    // 6. High confidence rate
    let high_confidence_rate = if findings.is_empty() {
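`ScanHistory::compare_to_current` replaces `compute_delta` and now also reports reintroduced findings. Its body is not part of this diff, so the following is a hedged sketch of one plausible definition over fingerprint sets: "new" means in the current scan but not in the latest prior one, "fixed" means the reverse, and "reintroduced" counts new fingerprints that some older scan already contained. The slice-of-sets input shape is an assumption; with no prior scan the sketch reports zeros, in the spirit of the first-scan check mentioned in the `has_history` comment.

```rust
use std::collections::HashSet;

/// Returns `(new, fixed, reintroduced)` for the current scan's fingerprints.
/// `prior` holds earlier completed scans, oldest first (assumed shape).
pub fn compare_to_current(
    prior: &[HashSet<String>],
    current: &HashSet<String>,
) -> (usize, usize, usize) {
    // First scan: nothing to diff against.
    let Some((last, older)) = prior.split_last() else {
        return (0, 0, 0);
    };
    let new: Vec<&String> = current.difference(last).collect();
    let fixed = last.difference(current).count();
    // A "new" fingerprint that an older scan already contained came back.
    let reintroduced = new
        .iter()
        .filter(|fp| older.iter().any(|scan| scan.contains(fp.as_str())))
        .count();
    (new.len(), fixed, reintroduced)
}
```

The real `ScanHistory` also tracks `first_seen` timestamps and per-scan totals for the posture and backlog panels, which this sketch leaves out.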
@@ -67,6 +79,7 @@ async fn overview(State(state): State<AppState>) -> Json<OverviewResponse> {
        &summary,
        new_since_last,
        fixed_since_last,
        reintroduced_count,
        triage_coverage,
        &noisy_rules,
    );
@@ -80,6 +93,51 @@ async fn overview(State(state): State<AppState>) -> Json<OverviewResponse> {
        "normal".to_string()
    };

    // ── New (Tier 1/2/3) ──
    let confidence_distribution = Some(compute_confidence_distribution(&findings));
    let weighted_top_files = compute_weighted_top_files(&findings, 10);
    let cross_file_ratio = Some(compute_cross_file_ratio(&findings));
    let hot_sinks = compute_hot_sinks(&findings, 5);
    let owasp_buckets = owasp::bucket_findings(&summary.by_rule);
    let issue_categories = owasp::issue_categories(&summary.by_rule);
    let scanner_quality =
        compute_scanner_quality(&state, &findings, latest_completed.map(|s| s.id.as_str()));
    let language_health = compute_language_health(&findings);
    let suppression_hygiene = Some(compute_suppression_hygiene(&state, &findings));
    let backlog = Some(compute_backlog(&state, &findings, &history));
    let baseline = compute_baseline_info(&state, &findings);
    let posture = Some(build_posture(
        new_since_last,
        fixed_since_last,
        reintroduced_count,
        &history,
        summary.total,
    ));
    let health = Some(crate::server::health::compute(
        &crate::server::health::HealthInputs {
            summary: &summary,
            findings: &findings,
            triage_coverage,
            new_since_last,
            fixed_since_last,
            reintroduced: reintroduced_count,
            // Files-scanned proxy for repo size — used for size-aware
            // severity dampening in `health::compute`. See
            // `docs/health-score-audit.md` for calibration data.
            repo_files: scanner_quality
                .as_ref()
                .map(|q| q.files_scanned)
                .filter(|&f| f > 0),
            backlog: backlog.as_ref(),
            // Trend is meaningless without ≥2 completed scans —
            // matches the first-scan check `compare_to_current` uses.
            has_history: history.scans.len() >= 2,
            // Suppression-hygiene modifier — populated when the
            // suppression panel was computable for this scan.
            blanket_suppression_rate: suppression_hygiene.as_ref().map(|s| s.blanket_rate),
        },
    ));

    Json(OverviewResponse {
        state: state_str,
        total_findings: summary.total,
@@ -90,8 +148,8 @@ async fn overview(State(state): State<AppState>) -> Json<OverviewResponse> {
        latest_scan_duration_secs: latest_scan_duration,
        latest_scan_id,
        latest_scan_at,
        by_severity: summary.by_severity,
        by_category: summary.by_category,
        by_severity: summary.by_severity.clone(),
        by_category: summary.by_category.clone(),
        by_language,
        top_files,
        top_directories,
@@ -99,6 +157,19 @@ async fn overview(State(state): State<AppState>) -> Json<OverviewResponse> {
        noisy_rules,
        recent_scans: recent_scans.into_iter().take(10).collect(),
        insights,
        health,
        posture,
        backlog,
        weighted_top_files,
        confidence_distribution,
        scanner_quality,
        issue_categories,
        hot_sinks,
        owasp_buckets,
        cross_file_ratio,
        baseline,
        language_health,
        suppression_hygiene,
    })
}

@@ -142,8 +213,198 @@ async fn overview_trends(State(state): State<AppState>) -> Json<Vec<TrendPoint>>
    Json(points)
}

#[derive(Debug, Deserialize)]
struct BaselineBody {
    scan_id: String,
}

/// POST /api/overview/baseline { scan_id } — pin a scan as the baseline for drift comparison.
async fn set_baseline(
    State(state): State<AppState>,
    Json(body): Json<BaselineBody>,
) -> Result<StatusCode, StatusCode> {
    set_baseline_inner(&state, &body.scan_id)
}

/// POST /api/overview/baseline/:scan_id — convenience path-form for clients without a JSON body.
async fn set_baseline_path(
    State(state): State<AppState>,
    AxPath(scan_id): AxPath<String>,
) -> Result<StatusCode, StatusCode> {
    set_baseline_inner(&state, &scan_id)
}

fn set_baseline_inner(state: &AppState, scan_id: &str) -> Result<StatusCode, StatusCode> {
    if scan_id.is_empty() {
        return Err(StatusCode::BAD_REQUEST);
    }
    let pool = state
        .db_pool
        .as_ref()
        .ok_or(StatusCode::SERVICE_UNAVAILABLE)?;
    let idx = Indexer::from_pool("_scans", pool).map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    idx.set_metadata(BASELINE_KEY, scan_id)
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    Ok(StatusCode::NO_CONTENT)
}

/// DELETE /api/overview/baseline — clear the pinned baseline.
async fn clear_baseline(State(state): State<AppState>) -> Result<StatusCode, StatusCode> {
    let pool = state
        .db_pool
        .as_ref()
        .ok_or(StatusCode::SERVICE_UNAVAILABLE)?;
    let idx = Indexer::from_pool("_scans", pool).map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    idx.delete_metadata(BASELINE_KEY)
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    Ok(StatusCode::NO_CONTENT)
}

// ── Helpers ──────────────────────────────────────────────────────────────────

/// Cached view of recent completed scans' fingerprints + timestamps. Built once
/// per overview request and reused by delta / posture / backlog / drift.
struct ScanHistory {
    /// Completed scans, oldest → newest.
    scans: Vec<HistoricScan>,
    /// fingerprint → earliest started_at (RFC-3339) seen across history.
    first_seen: HashMap<String, String>,
}

struct HistoricScan {
    #[allow(dead_code)]
    id: String,
    #[allow(dead_code)]
    started_at: Option<String>,
    fingerprints: HashSet<String>,
    total: usize,
}

impl ScanHistory {
    fn load(state: &AppState, limit: usize) -> Self {
        let mut scans = Vec::new();
        let mut first_seen: HashMap<String, String> = HashMap::new();

        let Some(ref pool) = state.db_pool else {
            return Self { scans, first_seen };
        };
        let Ok(idx) = Indexer::from_pool("_scans", pool) else {
            return Self { scans, first_seen };
        };

        let mut records = idx.list_scans(limit as i64).unwrap_or_default();
        // Filter to completed and reverse to oldest-first.
        records.retain(|r| r.status == "completed");
        records.reverse();

        let mut bulk_inserts: Vec<(String, String)> = Vec::new();

        for r in records {
            let fps: HashSet<String> = r
                .findings_json
                .as_deref()
                .and_then(|j| serde_json::from_str::<Vec<Diag>>(j).ok())
                .map(|diags| diags.iter().map(compute_fingerprint).collect())
                .unwrap_or_default();
            let total = fps.len();
            let started_at = r.started_at.clone();
            // Seed first_seen for new fingerprints.
            if let Some(ref ts) = started_at {
                for fp in &fps {
                    first_seen.entry(fp.clone()).or_insert_with(|| {
                        bulk_inserts.push((fp.clone(), ts.clone()));
                        ts.clone()
                    });
                }
            }
            scans.push(HistoricScan {
                id: r.id,
                started_at,
                fingerprints: fps,
                total,
            });
        }

        // Persist newly observed first-seen entries (best-effort; ignore errors).
        if !bulk_inserts.is_empty() {
            let _ = idx.record_finding_first_seen_bulk(&bulk_inserts);
        }

        Self { scans, first_seen }
    }

    /// Compare current findings against the second-most-recent completed scan
    /// (the latest one is usually the current scan itself) for new/fixed counts,
    /// and against all earlier scans for regression detection.
    /// Returns (new_count, fixed_count, reintroduced_count).
    fn compare_to_current(&self, current: &[Diag]) -> (usize, usize, usize) {
        if self.scans.is_empty() {
            return (0, 0, 0);
        }
        let current_fps: HashSet<String> = current.iter().map(compute_fingerprint).collect();

        // For new/fixed delta, compare against the *previous* completed scan
        // (i.e. the one before the latest, since the latest is "current" in DB
        // most of the time). With only one scan there is nothing to diff
        // against, so no delta is reported.
        let (new_count, fixed_count) = if self.scans.len() >= 2 {
            let prev = &self.scans[self.scans.len() - 2];
            let new_count = current_fps.difference(&prev.fingerprints).count();
            let fixed_count = prev.fingerprints.difference(&current_fps).count();
            (new_count, fixed_count)
        } else {
            (0, 0)
        };

        // Regression: fingerprints that were present in some past scan, were
        // absent in the immediately-preceding scan, and are present now.
        let reintroduced = if self.scans.len() >= 2 {
            let prev_fps = &self.scans[self.scans.len() - 2].fingerprints;
            let mut count = 0usize;
            for fp in current_fps.iter() {
                if prev_fps.contains(fp) {
                    continue;
                }
                // Was present in any earlier scan?
                let earlier = self
                    .scans
                    .iter()
                    .take(self.scans.len() - 2)
                    .any(|s| s.fingerprints.contains(fp));
                if earlier {
                    count += 1;
                }
            }
            count
        } else {
            0
        };

        (new_count, fixed_count, reintroduced)
    }

    /// Normalized change in total findings between the oldest and newest of
    /// the last 5 completed scans, clamped to [-1.0, 1.0]: positive means the
    /// count fell (improving), negative means it rose (regressing), and 0.0
    /// means unchanged. Returns None with fewer than 3 scans.
    fn trend_slope(&self) -> Option<f64> {
        if self.scans.len() < 3 {
            return None;
        }
        let tail: Vec<f64> = self
            .scans
            .iter()
            .rev()
            .take(5)
            .map(|s| s.total as f64)
            .collect();
        let first = *tail.last()?;
        let last = *tail.first()?;
        if first <= 0.0 && last <= 0.0 {
            return Some(0.0);
        }
        // Improving = total decreased → positive score. Normalize by max.
        let max = first.max(last).max(1.0);
        Some(((first - last) / max).clamp(-1.0, 1.0))
    }
}

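The delta and regression rules in `compare_to_current` reduce to set differences over fingerprint sets. The following is a minimal std-only sketch, assuming each completed scan is represented by a plain `HashSet<String>` of fingerprints, oldest first; `compare_sets` and `set` are hypothetical helpers, not part of the codebase.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `ScanHistory::compare_to_current`.
/// `history` holds one fingerprint set per completed scan, oldest first.
/// Returns (new, fixed, reintroduced).
fn compare_sets(history: &[HashSet<String>], current: &HashSet<String>) -> (usize, usize, usize) {
    // Fewer than two scans: nothing to diff against.
    if history.len() < 2 {
        return (0, 0, 0);
    }
    // The latest stored scan is usually `current` itself, so diff against the one before it.
    let prev = &history[history.len() - 2];
    let new_count = current.difference(prev).count();
    let fixed_count = prev.difference(current).count();
    // Reintroduced: absent from `prev` but present in some scan older than `prev`.
    let older = &history[..history.len() - 2];
    let reintroduced = current
        .iter()
        .filter(|fp| !prev.contains(*fp))
        .filter(|fp| older.iter().any(|s| s.contains(*fp)))
        .count();
    (new_count, fixed_count, reintroduced)
}

/// Convenience constructor for the examples (hypothetical).
fn set(items: &[&str]) -> HashSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}
```

For a history of `{a, b}`, `{b}`, `{b, c}` and a current set `{a, b, c}`, `a` and `c` are new relative to `{b}`, and `a` also counts as reintroduced because it appeared in the oldest scan.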
/// Collect recent scans from in-memory jobs + DB, deduped by ID.
fn collect_recent_scans(state: &AppState, limit: usize) -> Vec<ScanSummary> {
    let mut seen = HashSet::new();

@@ -181,55 +442,11 @@ fn collect_recent_scans(state: &AppState, limit: usize) -> Vec<ScanSummary> {
        }
    }

    // Sort by started_at descending
    scans.sort_by(|a, b| b.started_at.cmp(&a.started_at));
    scans.truncate(limit);
    scans
}

/// Compute new/fixed finding counts by comparing the two most recent completed scans.
fn compute_delta(state: &AppState, current_findings: &[Diag]) -> (usize, usize) {
    if current_findings.is_empty() {
        return (0, 0);
    }

    let current_fps: HashSet<String> = current_findings.iter().map(compute_fingerprint).collect();

    // Find previous completed scan's findings
    let previous_fps = load_previous_scan_fingerprints(state);
    if previous_fps.is_empty() {
        return (0, 0);
    }

    let new_count = current_fps.difference(&previous_fps).count();
    let fixed_count = previous_fps.difference(&current_fps).count();
    (new_count, fixed_count)
}

/// Load fingerprints from the second-most-recent completed scan.
fn load_previous_scan_fingerprints(state: &AppState) -> HashSet<String> {
    if let Some(ref pool) = state.db_pool {
        if let Ok(idx) = Indexer::from_pool("_scans", pool) {
            if let Ok(scans) = idx.list_scans(10) {
                let completed: Vec<&ScanRecord> = scans
                    .iter()
                    .filter(|s| s.status == "completed" && s.findings_json.is_some())
                    .collect();

                // Skip the first (latest) completed scan — we want the previous one
                if let Some(prev) = completed.get(1) {
                    if let Some(json) = prev.findings_json.as_deref() {
                        if let Ok(diags) = serde_json::from_str::<Vec<Diag>>(json) {
                            return diags.iter().map(compute_fingerprint).collect();
                        }
                    }
                }
            }
        }
    }
    HashSet::new()
}

/// Compute triage coverage: fraction of findings with non-"open" triage state.
fn compute_triage_coverage(state: &AppState, findings: &[Diag]) -> f64 {
    if findings.is_empty() {

@@ -249,24 +466,19 @@ fn compute_triage_coverage(state: &AppState, findings: &[Diag]) -> f64 {
    let mut non_open = 0usize;
    for d in findings {
        let fp = compute_fingerprint(d);
        // Check explicit triage state
        if let Some((triage_state, _, _)) = triage_map.get(&fp) {
            if triage_state != "open" {
                non_open += 1;
                continue;
            }
        }
        // Check suppression rules
        let path = &d.path;
        let rule_id = &d.id;
        for rule in &suppression_rules {
            let matches = match rule.suppress_by.as_str() {
                "fingerprint" => fp == rule.match_value,
                "rule" => *rule_id == rule.match_value,
                "rule_in_file" => format!("{rule_id}:{path}") == rule.match_value,
                "file" => *path == rule.match_value,
                _ => false,
            };

@@ -296,7 +508,6 @@ fn compute_noisy_rules(
    let triage_map = idx.get_all_triage_states().unwrap_or_default();
    let suppression_rules = idx.get_suppression_rules().unwrap_or_default();

    // Count suppressed findings per rule
    let mut suppressed_per_rule: HashMap<String, usize> = HashMap::new();
    for d in findings {
        let fp = compute_fingerprint(d);

@@ -347,12 +558,12 @@ fn generate_insights(
    summary: &crate::server::models::FindingSummary,
    new_since_last: usize,
    fixed_since_last: usize,
    reintroduced: usize,
    triage_coverage: f64,
    noisy_rules: &[NoisyRule],
) -> Vec<Insight> {
    let mut insights = Vec::new();

    // Untriaged high findings
    let high_count = summary.by_severity.get("HIGH").copied().unwrap_or(0);
    if high_count > 0 {
        insights.push(Insight {

@@ -366,7 +577,18 @@ fn generate_insights(
        });
    }

    if reintroduced > 0 {
        insights.push(Insight {
            kind: "regression".into(),
            message: format!(
                "{reintroduced} previously-fixed finding{} reintroduced",
                if reintroduced == 1 { "" } else { "s" }
            ),
            severity: "danger".into(),
            action_url: Some("/findings".into()),
        });
    }

    // New findings since last scan
    if new_since_last > 0 {
        insights.push(Insight {
            kind: "new_findings".into(),

@@ -379,7 +601,6 @@ fn generate_insights(
        });
    }

    // Fixed findings since last scan
    if fixed_since_last > 0 {
        insights.push(Insight {
            kind: "fixed_findings".into(),

@@ -392,7 +613,6 @@ fn generate_insights(
        });
    }

    // Noisy rules
    for rule in noisy_rules.iter().take(3) {
        insights.push(Insight {
            kind: "noisy_rule".into(),

@@ -407,7 +627,6 @@ fn generate_insights(
        });
    }

    // Low triage coverage
    if triage_coverage < 0.1 && summary.total > 20 {
        insights.push(Insight {
            kind: "low_triage".into(),

@@ -435,3 +654,481 @@ fn is_fresh_scan(scan: Option<&ScanSummary>) -> bool {
    }
    false
}

// ── Tier 1/2/3 computations ──────────────────────────────────────────────────

fn compute_confidence_distribution(findings: &[Diag]) -> ConfidenceDistribution {
    let mut d = ConfidenceDistribution::default();
    for f in findings {
        match f.confidence {
            Some(Confidence::High) => d.high += 1,
            Some(Confidence::Medium) => d.medium += 1,
            Some(Confidence::Low) => d.low += 1,
            None => d.none += 1,
        }
    }
    d
}

fn compute_weighted_top_files(findings: &[Diag], limit: usize) -> Vec<WeightedFile> {
    use crate::patterns::Severity;
    let mut per_file: HashMap<String, [usize; 3]> = HashMap::new(); // [high, medium, low]
    for f in findings {
        let entry = per_file.entry(f.path.clone()).or_insert([0, 0, 0]);
        match f.severity {
            Severity::High => entry[0] += 1,
            Severity::Medium => entry[1] += 1,
            Severity::Low => entry[2] += 1,
        }
    }
    let mut rows: Vec<WeightedFile> = per_file
        .into_iter()
        .map(|(name, [h, m, l])| WeightedFile {
            name,
            score: (h * 10 + m * 3 + l) as u32,
            high: h,
            medium: m,
            low: l,
            total: h + m + l,
        })
        .collect();
    rows.sort_by(|a, b| b.score.cmp(&a.score).then_with(|| b.total.cmp(&a.total)));
    rows.truncate(limit);
    rows
}

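`compute_weighted_top_files` ranks files by a severity-weighted score (high = 10, medium = 3, low = 1) and breaks ties on the raw finding count. A small sketch of that ranking, assuming rows arrive as `(path, [high, medium, low])` tuples; `weighted_score` and `rank_files` are illustrative names only.

```rust
/// Severity weights as used in `compute_weighted_top_files`: high=10, medium=3, low=1.
fn weighted_score(high: usize, medium: usize, low: usize) -> u32 {
    (high * 10 + medium * 3 + low) as u32
}

/// Rank (path, [high, medium, low]) rows by score, breaking ties by total count.
/// Returns (path, score, total) triples, truncated to `limit`.
fn rank_files(rows: Vec<(&str, [usize; 3])>, limit: usize) -> Vec<(String, u32, usize)> {
    let mut ranked: Vec<(String, u32, usize)> = rows
        .into_iter()
        .map(|(path, [h, m, l])| (path.to_string(), weighted_score(h, m, l), h + m + l))
        .collect();
    // Descending score, then descending total, mirroring the sort above.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| b.2.cmp(&a.2)));
    ranked.truncate(limit);
    ranked
}
```

With these weights a single high-severity finding outranks three medium ones (10 vs. 9), so a file with one high-severity issue sorts above a file with three medium ones.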
fn compute_cross_file_ratio(findings: &[Diag]) -> f64 {
    if findings.is_empty() {
        return 0.0;
    }
    let mut cross = 0usize;
    for f in findings {
        if let Some(ev) = f.evidence.as_ref() {
            if ev.uses_summary || ev.flow_steps.iter().any(|s| s.is_cross_file) {
                cross += 1;
            }
        }
    }
    cross as f64 / findings.len() as f64
}

/// Hot sinks are *only* meaningful for taint findings — counting AST rule IDs
/// (e.g. `rs.quality.unwrap`) here just duplicates the Top Rules table. So we
/// deliberately require a real Sink-step callee (or a parsable sink snippet)
/// and skip everything else. Empty result → frontend hides the card.
fn compute_hot_sinks(findings: &[Diag], limit: usize) -> Vec<HotSink> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for f in findings {
        let Some(ev) = f.evidence.as_ref() else {
            continue;
        };
        let from_flow = ev
            .flow_steps
            .iter()
            .rev()
            .find(|s| matches!(s.kind, crate::evidence::FlowStepKind::Sink))
            .and_then(|s| s.callee.clone())
            .filter(|c| !c.trim().is_empty());
        let from_sink_snippet = ev
            .sink
            .as_ref()
            .and_then(|s| s.snippet.as_ref())
            .and_then(|s| {
                let c = extract_callee_from_snippet(s);
                if c.is_empty() { None } else { Some(c) }
            });
        let Some(callee) = from_flow.or(from_sink_snippet) else {
            continue;
        };
        *counts.entry(callee).or_insert(0) += 1;
    }
    let mut rows: Vec<HotSink> = counts
        .into_iter()
        .map(|(callee, count)| HotSink { callee, count })
        .collect();
    rows.sort_by(|a, b| b.count.cmp(&a.count).then_with(|| a.callee.cmp(&b.callee)));
    rows.truncate(limit);
    rows
}

/// Pull the callee text from a sink snippet: everything before the first `(`,
/// or before the first whitespace when there is no `(`. Best-effort heuristic
/// for the dashboard's "hot sinks" list.
fn extract_callee_from_snippet(s: &str) -> String {
    let trimmed = s.trim();
    let end = trimmed
        .find('(')
        .or_else(|| trimmed.find(char::is_whitespace))
        .unwrap_or(trimmed.len());
    trimmed[..end].trim().to_string()
}

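The snippet fallback in `compute_hot_sinks` relies on `extract_callee_from_snippet`. This sketch reproduces that heuristic together with the count-and-rank step, assuming the input is a plain list of sink snippets rather than `Diag` evidence; `extract_callee` and `hot_sinks` are hypothetical names.

```rust
use std::collections::HashMap;

/// Same heuristic as `extract_callee_from_snippet`: text before the first `(`,
/// otherwise before the first whitespace, trimmed.
fn extract_callee(s: &str) -> String {
    let trimmed = s.trim();
    let end = trimmed
        .find('(')
        .or_else(|| trimmed.find(char::is_whitespace))
        .unwrap_or(trimmed.len());
    trimmed[..end].trim().to_string()
}

/// Count callees across snippets, skipping empty ones, then sort by
/// count descending and callee name ascending, as in `compute_hot_sinks`.
fn hot_sinks(snippets: &[&str], limit: usize) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for s in snippets {
        let callee = extract_callee(s);
        if callee.is_empty() {
            continue;
        }
        *counts.entry(callee).or_insert(0) += 1;
    }
    let mut rows: Vec<(String, usize)> = counts.into_iter().collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    rows.truncate(limit);
    rows
}
```

Because `(` is searched before whitespace, a snippet such as `x = foo(y)` yields `x = foo` rather than `foo`, which is worth keeping in mind when reading snippet-derived entries.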
fn compute_scanner_quality(
    state: &AppState,
    findings: &[Diag],
    latest_scan_id: Option<&str>,
) -> Option<ScannerQuality> {
    let pool = state.db_pool.as_ref()?;
    let idx = Indexer::from_pool("_scans", pool).ok()?;

    let mut files_scanned = 0u64;
    let mut files_skipped = 0u64;
    if let Some(scan_id) = latest_scan_id {
        let scans = idx.list_scans(20).unwrap_or_default();
        if let Some(rec) = scans.into_iter().find(|s| s.id == scan_id) {
            files_scanned = rec.files_scanned.unwrap_or(0).max(0) as u64;
            files_skipped = rec.files_skipped.unwrap_or(0).max(0) as u64;
        }
    }

    let parse_success_rate = if files_scanned + files_skipped > 0 {
        files_scanned as f64 / (files_scanned + files_skipped) as f64
    } else {
        0.0
    };

    // Engine metrics from scan_metrics table (if available via Indexer).
    let (functions_analyzed, call_edges, unresolved_calls) = latest_scan_id
        .and_then(|id| idx.get_scan_metrics(id).ok().flatten())
        .map(|m| (m.functions_analyzed, m.call_edges, m.unresolved_calls))
        .unwrap_or((0, 0, 0));

    let call_resolution_rate = if call_edges + unresolved_calls > 0 {
        call_edges as f64 / (call_edges + unresolved_calls) as f64
    } else {
        0.0
    };

    // Symex coverage from current findings.
    let mut breakdown: HashMap<String, usize> = HashMap::new();
    let mut taint_total = 0usize;
    for f in findings {
        let Some(ev) = f.evidence.as_ref() else {
            continue;
        };
        let Some(sv) = ev.symbolic.as_ref() else {
            continue;
        };
        taint_total += 1;
        let label = match sv.verdict {
            Verdict::Confirmed => "confirmed",
            Verdict::Infeasible => "infeasible",
            Verdict::Inconclusive => "inconclusive",
            Verdict::NotAttempted => "not_attempted",
        };
        *breakdown.entry(label.to_string()).or_insert(0) += 1;
    }
    let symex_verified_rate = if taint_total > 0 {
        let attempted = breakdown
            .iter()
            .filter(|(k, _)| k.as_str() != "not_attempted")
            .map(|(_, v)| *v)
            .sum::<usize>();
        attempted as f64 / taint_total as f64
    } else {
        0.0
    };

    Some(ScannerQuality {
        files_scanned,
        files_skipped,
        parse_success_rate,
        functions_analyzed,
        call_edges,
        unresolved_calls,
        call_resolution_rate,
        symex_verified_rate,
        symex_breakdown: breakdown,
    })
}

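`compute_scanner_quality` reports three ratios, each guarded against an empty denominator. A minimal sketch of the symex rate and the shared zero-guarded ratio, assuming a local copy of the four verdict variants (the real `Verdict` type lives elsewhere in the crate); `symex_verified_rate` and `ratio` here are hypothetical free functions.

```rust
/// Hypothetical local copy of the four verdicts matched in `compute_scanner_quality`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Confirmed,
    Infeasible,
    Inconclusive,
    NotAttempted,
}

/// Share of symbolically-annotated findings where symex actually ran, i.e. any
/// verdict other than `NotAttempted`. Returns 0.0 for an empty slice.
fn symex_verified_rate(verdicts: &[Verdict]) -> f64 {
    if verdicts.is_empty() {
        return 0.0;
    }
    let attempted = verdicts
        .iter()
        .filter(|v| **v != Verdict::NotAttempted)
        .count();
    attempted as f64 / verdicts.len() as f64
}

/// Zero-guarded ratio as used for parse_success_rate and call_resolution_rate:
/// hits / (hits + misses), or 0.0 when both are zero.
fn ratio(hits: u64, misses: u64) -> f64 {
    if hits + misses > 0 {
        hits as f64 / (hits + misses) as f64
    } else {
        0.0
    }
}
```

Returning 0.0 for an empty input means "no data" and "nothing resolved" look identical on the dashboard; an `Option<f64>` would be one way to keep them apart.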
fn compute_language_health(findings: &[Diag]) -> Vec<LanguageHealth> {
    use crate::patterns::Severity;
    let mut per_lang: HashMap<String, [usize; 4]> = HashMap::new(); // [total, h, m, l]
    for f in findings {
        let Some(lang) = lang_for_finding_path(&f.path) else {
            continue;
        };
        let entry = per_lang.entry(lang).or_insert([0; 4]);
        entry[0] += 1;
        match f.severity {
            Severity::High => entry[1] += 1,
            Severity::Medium => entry[2] += 1,
            Severity::Low => entry[3] += 1,
        }
    }
    let mut rows: Vec<LanguageHealth> = per_lang
        .into_iter()
        .map(|(language, [t, h, m, l])| LanguageHealth {
            language,
            findings: t,
            high: h,
            medium: m,
            low: l,
        })
        .collect();
    rows.sort_by(|a, b| {
        b.high
            .cmp(&a.high)
            .then_with(|| b.findings.cmp(&a.findings))
    });
    rows
}

fn compute_suppression_hygiene(state: &AppState, findings: &[Diag]) -> SuppressionHygiene {
    let mut hygiene = SuppressionHygiene {
        fingerprint_level: 0,
        rule_level: 0,
        file_level: 0,
        rule_in_file_level: 0,
        blanket_rate: 0.0,
    };
    if findings.is_empty() {
        return hygiene;
    }
    let Some(ref pool) = state.db_pool else {
        return hygiene;
    };
    let Ok(idx) = Indexer::from_pool("_scans", pool) else {
        return hygiene;
    };
    let triage_map = idx.get_all_triage_states().unwrap_or_default();
    let suppression_rules = idx.get_suppression_rules().unwrap_or_default();
    let mut total_suppressed = 0usize;
    for d in findings {
        let fp = compute_fingerprint(d);
        if let Some((s, _, _)) = triage_map.get(&fp) {
            if s == "suppressed" || s == "false_positive" {
                hygiene.fingerprint_level += 1;
                total_suppressed += 1;
                continue;
            }
        }
        for rule in &suppression_rules {
            let matched = match rule.suppress_by.as_str() {
                "fingerprint" => fp == rule.match_value,
                "rule" => d.id == rule.match_value,
                "rule_in_file" => format!("{}:{}", d.id, d.path) == rule.match_value,
                "file" => d.path == rule.match_value,
                _ => false,
            };
            if matched {
                match rule.suppress_by.as_str() {
                    "fingerprint" => hygiene.fingerprint_level += 1,
                    "rule" => hygiene.rule_level += 1,
                    "file" => hygiene.file_level += 1,
                    "rule_in_file" => hygiene.rule_in_file_level += 1,
                    _ => {}
                }
                total_suppressed += 1;
                break;
            }
        }
    }
    if total_suppressed > 0 {
        let blanket = hygiene.rule_level + hygiene.file_level + hygiene.rule_in_file_level;
        hygiene.blanket_rate = blanket as f64 / total_suppressed as f64;
    }
    hygiene
}

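`compute_suppression_hygiene` treats a suppression as *blanket* when it is scoped by rule, file, or rule-in-file rather than by a single fingerprint (triage states `suppressed` and `false_positive` are counted as fingerprint-level). A sketch of the matching and the resulting rate, using a hypothetical `SuppressBy` enum in place of the string-valued `suppress_by` field.

```rust
/// Hypothetical mirror of the `suppress_by` values matched above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SuppressBy {
    Fingerprint,
    Rule,
    File,
    RuleInFile,
}

/// Does a rule with this scope and match value cover the finding?
/// Rule-in-file values are assumed to be formatted as "<rule_id>:<path>".
fn rule_matches(by: SuppressBy, match_value: &str, fp: &str, rule_id: &str, path: &str) -> bool {
    match by {
        SuppressBy::Fingerprint => fp == match_value,
        SuppressBy::Rule => rule_id == match_value,
        SuppressBy::RuleInFile => format!("{rule_id}:{path}") == match_value,
        SuppressBy::File => path == match_value,
    }
}

/// Share of applied suppressions that are blanket (anything but fingerprint scope).
/// Returns 0.0 when nothing is suppressed.
fn blanket_rate(applied: &[SuppressBy]) -> f64 {
    if applied.is_empty() {
        return 0.0;
    }
    let blanket = applied
        .iter()
        .filter(|b| **b != SuppressBy::Fingerprint)
        .count();
    blanket as f64 / applied.len() as f64
}
```

A high blanket rate suggests that whole rules or files are being silenced, which can hide new true positives that a per-fingerprint suppression would not.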
fn compute_backlog(state: &AppState, findings: &[Diag], history: &ScanHistory) -> BacklogStats {
    // No useful aging data on the first scan — every fingerprint was first-seen
    // today by definition. Avoid the misleading "0d / 0d / 0" display.
    if history.scans.len() <= 1 {
        return BacklogStats {
            oldest_open_days: None,
            median_age_days: None,
            stale_count: 0,
            age_buckets: Vec::new(),
        };
    }

    let now = chrono::Utc::now();

    // Pull DB-cached first_seen first; fall back to in-memory history map.
    let fingerprints: Vec<String> = findings.iter().map(compute_fingerprint).collect();
    let mut cached: HashMap<String, String> = HashMap::new();
    if let Some(ref pool) = state.db_pool {
        if let Ok(idx) = Indexer::from_pool("_scans", pool) {
            cached = idx.get_first_seen_map(&fingerprints).unwrap_or_default();
        }
    }
    // Merge history's view (already persisted as we walked).
    for (fp, ts) in &history.first_seen {
        cached.entry(fp.clone()).or_insert_with(|| ts.clone());
    }

    let mut ages_days: Vec<u32> = Vec::with_capacity(fingerprints.len());
    for fp in &fingerprints {
        let Some(ts) = cached.get(fp) else {
            continue;
        };
        if let Ok(dt) = chrono::DateTime::parse_from_rfc3339(ts) {
            let elapsed = now - dt.with_timezone(&chrono::Utc);
            let days = elapsed.num_days().max(0) as u32;
            ages_days.push(days);
        }
    }

    let oldest_open_days = ages_days.iter().copied().max();
    let median_age_days = if ages_days.is_empty() {
        None
    } else {
        let mut sorted = ages_days.clone();
        sorted.sort_unstable();
        Some(sorted[sorted.len() / 2])
    };
    let stale_count = ages_days.iter().filter(|d| **d > 30).count();

    // Buckets: ≤1d, ≤7d, ≤30d, ≤90d, >90d
    let mut b = [0usize; 5];
    for d in &ages_days {
        let i = match *d {
            0..=1 => 0,
            2..=7 => 1,
            8..=30 => 2,
            31..=90 => 3,
            _ => 4,
        };
        b[i] += 1;
    }
    let labels = ["≤1d", "≤7d", "≤30d", "≤90d", ">90d"];
    let age_buckets = labels
        .iter()
        .zip(b.iter())
        .map(|(l, c)| OverviewCount {
            name: (*l).to_string(),
            count: *c,
        })
        .collect();

    BacklogStats {
        oldest_open_days,
        median_age_days,
        stale_count,
        age_buckets,
    }
}

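The aging statistics in `compute_backlog` are plain arithmetic over per-finding ages in days. This sketch isolates that arithmetic, assuming the ages have already been derived from first-seen timestamps; `AgeSummary` and `summarize_ages` are hypothetical.

```rust
/// Summary of open-finding ages in whole days (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct AgeSummary {
    oldest: Option<u32>,
    median: Option<u32>,
    stale: usize,        // ages strictly greater than 30 days
    buckets: [usize; 5], // ≤1d, ≤7d, ≤30d, ≤90d, >90d
}

fn summarize_ages(ages: &[u32]) -> AgeSummary {
    let oldest = ages.iter().copied().max();
    let median = if ages.is_empty() {
        None
    } else {
        let mut sorted = ages.to_vec();
        sorted.sort_unstable();
        // For even lengths this picks the upper of the two middle values.
        Some(sorted[sorted.len() / 2])
    };
    let stale = ages.iter().filter(|d| **d > 30).count();
    let mut buckets = [0usize; 5];
    for d in ages {
        let i = match *d {
            0..=1 => 0,
            2..=7 => 1,
            8..=30 => 2,
            31..=90 => 3,
            _ => 4,
        };
        buckets[i] += 1;
    }
    AgeSummary { oldest, median, stale, buckets }
}
```

For example, ages `[1, 5]` report a median of 5, not 3, because `sorted[len / 2]` indexes rather than averages.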
fn compute_baseline_info(state: &AppState, findings: &[Diag]) -> Option<BaselineInfo> {
    let pool = state.db_pool.as_ref()?;
    let idx = Indexer::from_pool("_scans", pool).ok()?;
    let scan_id = idx.get_metadata(BASELINE_KEY).ok().flatten()?;
    if scan_id.is_empty() {
        return None;
    }
    // Look up baseline scan record (separate from history, since history is capped at 20).
    let scans = idx.list_scans(200).ok()?;
    let baseline = scans.into_iter().find(|s| s.id == scan_id)?;
    let baseline_fps: HashSet<String> = baseline
        .findings_json
        .as_deref()
        .and_then(|j| serde_json::from_str::<Vec<Diag>>(j).ok())
        .map(|diags| diags.iter().map(compute_fingerprint).collect())
        .unwrap_or_default();
    let current_fps: HashSet<String> = findings.iter().map(compute_fingerprint).collect();
    let drift_new = current_fps.difference(&baseline_fps).count();
    let drift_fixed = baseline_fps.difference(&current_fps).count();
    Some(BaselineInfo {
        scan_id: baseline.id,
        started_at: baseline.started_at,
        baseline_total: baseline_fps.len(),
        drift_new,
        drift_fixed,
    })
}

fn build_posture(
    new_since_last: usize,
    fixed_since_last: usize,
    reintroduced: usize,
    history: &ScanHistory,
    current_total: usize,
) -> PostureSummary {
    // First-scan case: no prior data to diff against. Saying "stable / no change"
    // is misleading — we genuinely don't know yet.
    if history.scans.len() <= 1 {
        return PostureSummary {
            trend: "unknown".into(),
            severity: "info".into(),
            message: format!(
                "First scan: {current_total} finding{} detected. Re-scan to compare.",
                plural(current_total)
            ),
            reintroduced_count: 0,
        };
    }

    let net = fixed_since_last as i64 - new_since_last as i64;
    let trend_slope = history.trend_slope();

    // Severity selection priorities: regressions are loudest.
    let (trend, severity, message) = if reintroduced > 0 {
        (
            "regressing",
            "danger",
            format!(
                "Regressed: {reintroduced} previously-fixed finding{} returned",
                plural(reintroduced)
            ),
        )
    } else if net > 0 {
        (
            "improving",
            "success",
            format!(
                "Improving: net {net:+} since last scan ({fixed_since_last} fixed, {new_since_last} new)"
            ),
        )
    } else if net < 0 {
        (
            "regressing",
            "warning",
            format!(
                "Regressing: net {net:+} since last scan ({new_since_last} new, {fixed_since_last} fixed)"
            ),
        )
    } else if let Some(slope) = trend_slope {
        if slope > 0.1 {
            (
                "improving",
                "success",
                "Improving: gradual decline in finding count over the last 5 scans".to_string(),
            )
        } else if slope < -0.1 {
            (
                "regressing",
                "warning",
                "Regressing: gradual rise in finding count over the last 5 scans".to_string(),
            )
        } else {
            (
                "stable",
                "info",
                "Stable: no net change since last scan".to_string(),
            )
        }
    } else {
        (
            "stable",
            "info",
            "Stable: no net change since last scan".to_string(),
        )
    };

    PostureSummary {
        trend: trend.to_string(),
        severity: severity.to_string(),
        message,
        reintroduced_count: reintroduced,
    }
}

fn plural(n: usize) -> &'static str {
    if n == 1 { "" } else { "s" }
}

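`build_posture` checks its signals in a fixed priority order: the first-scan short-circuit, then reintroduced findings, then the net fixed-minus-new delta, then the multi-scan slope with a ±0.1 dead band. A self-contained sketch of that ordering plus the slope computation, assuming totals are supplied oldest first; `trend_slope` here is a free-function rewrite of the method, and `classify` is a hypothetical helper that returns only the `(trend, severity)` pair.

```rust
/// Normalized change between the oldest and newest of the last 5 totals
/// (oldest first), clamped to [-1.0, 1.0]. None with fewer than 3 totals.
fn trend_slope(totals: &[usize]) -> Option<f64> {
    if totals.len() < 3 {
        return None;
    }
    let window = &totals[totals.len().saturating_sub(5)..];
    let first = (*window.first()?) as f64; // oldest in the window
    let last = (*window.last()?) as f64; // newest
    if first <= 0.0 && last <= 0.0 {
        return Some(0.0);
    }
    // A falling count yields a positive (improving) score.
    let max = first.max(last).max(1.0);
    Some(((first - last) / max).clamp(-1.0, 1.0))
}

/// Priority order: first scan > reintroduced > net delta > slope (±0.1) > stable.
fn classify(
    history_len: usize,
    reintroduced: usize,
    fixed: usize,
    new: usize,
    slope: Option<f64>,
) -> (&'static str, &'static str) {
    if history_len <= 1 {
        return ("unknown", "info");
    }
    let net = fixed as i64 - new as i64;
    if reintroduced > 0 {
        ("regressing", "danger")
    } else if net > 0 {
        ("improving", "success")
    } else if net < 0 {
        ("regressing", "warning")
    } else {
        match slope {
            Some(s) if s > 0.1 => ("improving", "success"),
            Some(s) if s < -0.1 => ("regressing", "warning"),
            _ => ("stable", "info"),
        }
    }
}
```

Totals of 10, 8, 6 give a slope of (10 - 6) / 10 = 0.4, so a scan with zero net change still reads as improving when the longer trend is downward.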
// `compute_health_score` moved to `crate::server::health::compute`
// after the v2 audit (2026-04-28). See `docs/health-score-audit.md`
// for calibration data and the rationale, and `docs/health-score.md`
// for the customer-facing methodology.

@@ -215,6 +215,8 @@ mod tests {
            value_defs: defs,
            cfg_node_map: HashMap::new(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        }
    }

@@ -59,9 +59,12 @@ impl ConstLattice {
            return ConstLattice::Int(i);
        }

        // String: strip surrounding quotes. Require len >= 2 so a lone `'`
        // or `"` (where starts_with and ends_with both match the same byte)
        // does not produce the inverted range `[1..0]` and panic.
        if trimmed.len() >= 2
            && ((trimmed.starts_with('"') && trimmed.ends_with('"'))
                || (trimmed.starts_with('\'') && trimmed.ends_with('\'')))
        {
            let inner = &trimmed[1..trimmed.len() - 1];
            return ConstLattice::Str(inner.to_string());

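The `len() >= 2` guard matters because a one-character string such as `'` satisfies both `starts_with('\'')` and `ends_with('\'')`, and slicing `[1..0]` then panics because the range start exceeds its end. A standalone sketch of the guarded stripping; `strip_quotes` is a hypothetical helper that returns the unquoted text instead of a `ConstLattice`.

```rust
/// Strip one pair of matching surrounding quotes, if present.
/// Without the length check, a lone `'` or `"` would reach `&t[1..0]` and panic.
fn strip_quotes(s: &str) -> Option<&str> {
    let t = s.trim();
    let quoted = t.len() >= 2
        && ((t.starts_with('"') && t.ends_with('"'))
            || (t.starts_with('\'') && t.ends_with('\'')));
    // Quote characters are single-byte ASCII, so these indices are char boundaries.
    if quoted {
        Some(&t[1..t.len() - 1])
    } else {
        None
    }
}
```

An empty quoted literal such as `""` still strips cleanly to an empty string, since `[1..1]` is a valid empty range.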
@@ -279,6 +282,12 @@ fn eval_inst(inst: &SsaInst, values: &HashMap<SsaValue, ConstLattice>) -> ConstL
        | SsaOp::Param { .. }
        | SsaOp::SelfParam
        | SsaOp::CatchParam => ConstLattice::Varying,
        // FieldProj: projecting a field is dynamic with respect to the
        // const-propagation lattice — there is no general way to fold
        // `obj.field` to a known scalar at this phase. Returning Varying
        // matches Call: callers needing field-level constness will go
        // through the points-to / heap analysis.
        SsaOp::FieldProj { .. } => ConstLattice::Varying,
        SsaOp::Phi(_) => ConstLattice::Varying, // phis in body shouldn't happen
        SsaOp::Nop => ConstLattice::Varying,
        // Undef contributes no knowledge: `Top` is the lattice identity

@@ -303,6 +312,7 @@ fn inst_uses(inst: &SsaInst) -> Vec<SsaValue> {
            }
            vals
        }
        SsaOp::FieldProj { receiver, .. } => vec![*receiver],
        SsaOp::Source
        | SsaOp::Const(_)
        | SsaOp::Param { .. }

@@ -626,6 +636,8 @@ mod tests {
            value_defs,
            cfg_node_map,
            exception_edges: Vec::new(),
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        }
    }

@ -751,4 +763,129 @@ mod tests {
|
|||
Some(&ConstLattice::Bool(true))
|
||||
);
|
||||
}
|
||||
|
||||
/// Meet must be commutative: `a ⊓ b == b ⊓ a` for every pair of
|
||||
/// lattice values. Iterates a representative cross product; failure
|
||||
/// would indicate the implementation special-cased one operand.
|
||||
#[test]
|
||||
fn meet_lattice_is_commutative() {
|
||||
let vals = [
|
||||
ConstLattice::Top,
|
||||
ConstLattice::Varying,
|
||||
ConstLattice::Null,
|
||||
ConstLattice::Int(0),
|
||||
ConstLattice::Int(42),
|
||||
ConstLattice::Bool(true),
|
||||
ConstLattice::Bool(false),
|
||||
ConstLattice::Str("a".into()),
|
||||
ConstLattice::Str("b".into()),
|
||||
];
|
||||
for a in &vals {
|
||||
for b in &vals {
|
||||
assert_eq!(
|
||||
a.meet(b),
|
||||
b.meet(a),
|
||||
"meet should be commutative for ({a:?}, {b:?})"
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Meet must be associative: `(a ⊓ b) ⊓ c == a ⊓ (b ⊓ c)`.
|
||||
#[test]
|
||||
fn meet_lattice_is_associative() {
|
||||
let vals = [
|
||||
ConstLattice::Top,
|
||||
ConstLattice::Varying,
|
||||
ConstLattice::Null,
|
||||
ConstLattice::Int(0),
|
||||
ConstLattice::Int(42),
|
||||
ConstLattice::Bool(true),
|
||||
ConstLattice::Str("x".into()),
|
||||
];
|
||||
for a in &vals {
|
||||
for b in &vals {
|
||||
for c in &vals {
|
||||
let lhs = a.meet(b).meet(c);
|
||||
let rhs = a.meet(&b.meet(c));
|
||||
assert_eq!(lhs, rhs, "associativity broken on ({a:?},{b:?},{c:?})");
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Meet must be idempotent: `a ⊓ a == a` for every lattice value.
|
||||
#[test]
|
||||
fn meet_lattice_is_idempotent() {
|
||||
let vals = [
|
||||
ConstLattice::Top,
|
||||
ConstLattice::Varying,
|
||||
ConstLattice::Null,
|
||||
ConstLattice::Int(7),
|
||||
ConstLattice::Bool(false),
|
||||
ConstLattice::Str("y".into()),
|
||||
];
|
||||
for a in &vals {
|
||||
assert_eq!(a.meet(a), a.clone(), "idempotence broken on {a:?}");
|
||||
}
|
||||
}
|
||||
|
||||
/// Top is the meet identity: `Top ⊓ x == x` for every value.
|
||||
/// Varying is meet-absorbing: `Varying ⊓ x == Varying`.
|
||||
/// Two distinct concrete values meet to Varying.
|
||||
#[test]
|
||||
fn meet_lattice_extremes() {
|
||||
let xs = [
|
||||
ConstLattice::Null,
|
||||
ConstLattice::Int(1),
|
||||
ConstLattice::Bool(true),
|
||||
ConstLattice::Str("a".into()),
|
||||
];
|
||||
for x in &xs {
|
||||
assert_eq!(ConstLattice::Top.meet(x), x.clone());
|
||||
assert_eq!(x.meet(&ConstLattice::Top), x.clone());
|
||||
assert_eq!(ConstLattice::Varying.meet(x), ConstLattice::Varying);
|
||||
assert_eq!(x.meet(&ConstLattice::Varying), ConstLattice::Varying);
|
||||
}
|
||||
assert_eq!(
|
||||
ConstLattice::Int(1).meet(&ConstLattice::Int(2)),
|
||||
ConstLattice::Varying
|
||||
);
|
||||
assert_eq!(
|
||||
ConstLattice::Bool(true).meet(&ConstLattice::Bool(false)),
|
||||
ConstLattice::Varying
|
||||
);
|
||||
assert_eq!(
|
||||
ConstLattice::Str("a".into()).meet(&ConstLattice::Str("b".into())),
|
||||
ConstLattice::Varying
|
||||
);
|
||||
}
|
||||
|
||||
/// Const parsing must preserve integer signs: i64::MIN and i64::MAX must
|
||||
/// parse without overflow, while out-of-range digits and arbitrary text
|
||||
/// fall back to a bare-string const (the current contract, tested here so
|
||||
/// that a future change is caught explicitly).
|
||||
#[test]
|
||||
fn const_parse_extremes_and_fallback() {
|
||||
assert_eq!(
|
||||
ConstLattice::parse(&i64::MAX.to_string()),
|
||||
ConstLattice::Int(i64::MAX)
|
||||
);
|
||||
assert_eq!(
|
||||
ConstLattice::parse(&i64::MIN.to_string()),
|
||||
ConstLattice::Int(i64::MIN)
|
||||
);
|
||||
// A value outside the i64 range falls back to a bare string.
|
||||
let huge = "99999999999999999999";
|
||||
assert_eq!(
|
||||
ConstLattice::parse(huge),
|
||||
ConstLattice::Str(huge.to_string())
|
||||
);
|
||||
// The empty string parses as an empty Str (and must not panic).
|
||||
assert_eq!(ConstLattice::parse(""), ConstLattice::Str("".into()));
|
||||
// Lone quote characters must not panic in the quote-stripping path
|
||||
// (regression for fuzz crash-2f943c14: `'` triggered &s[1..0]).
|
||||
assert_eq!(ConstLattice::parse("'"), ConstLattice::Str("'".into()));
|
||||
assert_eq!(ConstLattice::parse("\""), ConstLattice::Str("\"".into()));
|
||||
}
|
||||
}
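The meet laws exercised by the tests above describe a flat constant lattice:

- `meet` is associative and idempotent.
- `Top` is the identity.
- `Varying` is absorbing.
- Two distinct concrete values meet to `Varying`.

Below is a minimal sketch under that assumption. `Flat` and its variants are hypothetical stand-ins, not the crate's `ConstLattice`.

```rust
/// Hypothetical flat constant lattice: Top sits above every concrete
/// value and Varying sits below every concrete value.
#[derive(Clone, Debug, PartialEq)]
enum Flat {
    Top,
    Varying,
    Null,
    Int(i64),
    Bool(bool),
    Str(String),
}

impl Flat {
    /// Greatest lower bound of two lattice values.
    fn meet(&self, other: &Flat) -> Flat {
        match (self, other) {
            // Top is the identity element.
            (Flat::Top, x) | (x, Flat::Top) => x.clone(),
            // Varying absorbs everything.
            (Flat::Varying, _) | (_, Flat::Varying) => Flat::Varying,
            // Equal concrete values stay put...
            (a, b) if a == b => a.clone(),
            // ...and any disagreement drops to Varying.
            _ => Flat::Varying,
        }
    }
}
```

With only one level of concrete values between `Top` and `Varying`, checking every triple from a small sample, as `meet_lattice_is_associative` does, is a cheap way to catch a broken match arm.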
|
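`const_parse_extremes_and_fallback` pins down the following parse contract:

- In-range integers parse exactly.
- Out-of-range digits and arbitrary text fall back to a string.
- A lone quote character must not panic.

A sketch of that contract follows. `parse_const` and `Parsed` are hypothetical names, and the real quote-stripping rules are not shown in this diff. The point illustrated is the length check before slicing: on the one-byte input `'`, `&s[1..s.len() - 1]` is `&s[1..0]`, which panics because the start exceeds the end.

```rust
#[derive(Debug, PartialEq)]
enum Parsed {
    Int(i64),
    Str(String),
}

/// Hypothetical parser mirroring the tested contract.
fn parse_const(s: &str) -> Parsed {
    // `parse::<i64>` accepts i64::MIN and i64::MAX and returns Err
    // (rather than wrapping) for anything outside that range.
    if let Ok(n) = s.parse::<i64>() {
        return Parsed::Int(n);
    }
    // Strip one matching pair of surrounding quotes, but only when the
    // input has at least two bytes; a lone `'` or `"` is kept verbatim.
    let bytes = s.as_bytes();
    if bytes.len() >= 2 {
        let (first, last) = (bytes[0], bytes[bytes.len() - 1]);
        if (first == b'"' || first == b'\'') && first == last {
            // Quotes are single-byte ASCII, so these are char boundaries.
            return Parsed::Str(s[1..s.len() - 1].to_string());
        }
    }
    Parsed::Str(s.to_string())
}
```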
||||
|
|
|
|||
|
|
@ -213,6 +213,8 @@ mod tests {
|
|||
.into_iter()
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let (eliminated, copy_map) = copy_propagate(&mut body, &cfg);
|
||||
|
|
@ -225,4 +227,494 @@ mod tests {
|
|||
assert!(matches!(body.blocks[0].body[1].op, SsaOp::Nop));
|
||||
assert!(matches!(body.blocks[0].body[2].op, SsaOp::Nop));
|
||||
}
|
||||
|
||||
/// `resolve_root` has a 1000-iteration safety cap to avoid livelock if
|
||||
/// a malformed copy map ever contains a cycle (SSA itself is acyclic,
|
||||
/// but defensively we want this guarantee on the helper). Confirm the
|
||||
/// cap actually fires by feeding a hand-crafted cycle a → b → a.
|
||||
#[test]
|
||||
fn resolve_root_terminates_on_cyclic_copy_map() {
|
||||
let mut map: std::collections::HashMap<SsaValue, SsaValue> =
|
||||
std::collections::HashMap::new();
|
||||
map.insert(SsaValue(0), SsaValue(1));
|
||||
map.insert(SsaValue(1), SsaValue(0));
|
||||
// Must terminate. Under malformed input the returned value carries no
|
||||
// correctness guarantee; termination is the only property checked.
|
||||
let _root = resolve_root(SsaValue(0), &map);
|
||||
}
|
||||
|
||||
/// A four-deep copy chain v3 = v2 = v1 = v0 must collapse to v0
|
||||
/// in a single `copy_propagate` pass — the resolved replacement
|
||||
/// map drives downstream alias recovery, so the *transitive*
|
||||
/// closure must be exposed, not just the immediate parent.
|
||||
#[test]
|
||||
fn deep_copy_chain_collapses_to_root() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let nodes: Vec<_> = (0..4)
|
||||
.map(|_| cfg.add_node(make_cfg_node(StmtKind::Seq)))
|
||||
.collect();
|
||||
|
||||
let mut block_body = vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("\"x\"".into())),
|
||||
cfg_node: nodes[0],
|
||||
var_name: Some("a".into()),
|
||||
span: (0, 1),
|
||||
}];
|
||||
for (i, node) in nodes.iter().enumerate().take(4).skip(1) {
|
||||
block_body.push(SsaInst {
|
||||
value: SsaValue(i as u32),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue((i - 1) as u32), 1)),
|
||||
cfg_node: *node,
|
||||
var_name: Some(format!("v{i}")),
|
||||
span: (i, i + 1),
|
||||
});
|
||||
}
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: block_body,
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: (0..4)
|
||||
.map(|i| ValueDef {
|
||||
var_name: Some(format!("v{i}")),
|
||||
cfg_node: nodes[i],
|
||||
block: BlockId(0),
|
||||
})
|
||||
.collect(),
|
||||
cfg_node_map: nodes
|
||||
.iter()
|
||||
.enumerate()
|
||||
.map(|(i, n)| (*n, SsaValue(i as u32)))
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let (eliminated, copy_map) = copy_propagate(&mut body, &cfg);
|
||||
assert_eq!(eliminated, 3, "v1, v2, v3 must all be eliminated");
|
||||
for i in 1..4 {
|
||||
assert_eq!(
|
||||
copy_map.get(&SsaValue(i)),
|
||||
Some(&SsaValue(0)),
|
||||
"v{i} must resolve transitively to v0"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
// ─────────────────────────────────────────────────────────────────
|
||||
// Skip-conditions: copy-prop must NOT erase semantic info attached
|
||||
// to a copy's CFG node. These guard the three early-exits in
|
||||
// `copy_propagate`: labels, numeric-length, and string_prefix.
|
||||
// ─────────────────────────────────────────────────────────────────
|
||||
|
||||
/// Build a single-block SSA body containing
|
||||
/// v0 = Const, v1 = Assign(v0)
|
||||
/// with `decorate` applied to v1's CFG node so individual
|
||||
/// skip-conditions can be exercised.
|
||||
fn build_two_inst_body(decorate: impl FnOnce(&mut NodeInfo)) -> (Cfg, SsaBody) {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let mut n1_info = make_cfg_node(StmtKind::Seq);
|
||||
decorate(&mut n1_info);
|
||||
let n1 = cfg.add_node(n1_info);
|
||||
let body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("42".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("x".into()),
|
||||
span: (0, 2),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
|
||||
cfg_node: n1,
|
||||
var_name: Some("y".into()),
|
||||
span: (3, 5),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("x".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("y".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
(cfg, body)
|
||||
}
|
||||
|
||||
/// Skip path 1: an Assign whose CFG node carries a label
|
||||
/// (sanitizer/source/sink) must NOT be propagated through. Erasing
|
||||
/// that label would silently drop a sanitization step from the
|
||||
/// taint path.
|
||||
#[test]
|
||||
fn copy_with_label_on_cfg_node_is_not_propagated() {
|
||||
use crate::labels::{Cap, DataLabel};
|
||||
use smallvec::smallvec;
|
||||
let (cfg, mut body) = build_two_inst_body(|info| {
|
||||
info.taint.labels = smallvec![DataLabel::Sanitizer(Cap::SHELL_ESCAPE)];
|
||||
});
|
||||
let (eliminated, _map) = copy_propagate(&mut body, &cfg);
|
||||
assert_eq!(eliminated, 0, "copy through a labeled node must be skipped");
|
||||
assert!(
|
||||
matches!(body.blocks[0].body[1].op, SsaOp::Assign(_)),
|
||||
"labeled copy must remain an Assign, not be Nop'd"
|
||||
);
|
||||
}
|
||||
|
||||
/// Skip path 2: numeric-length reads (`arr.length`, `map.size`)
|
||||
/// have a different type from their source — propagating through
|
||||
/// would erase the Int type fact.
|
||||
#[test]
|
||||
fn copy_through_numeric_length_access_is_not_propagated() {
|
||||
let (cfg, mut body) = build_two_inst_body(|info| {
|
||||
info.is_numeric_length_access = true;
|
||||
});
|
||||
let (eliminated, _map) = copy_propagate(&mut body, &cfg);
|
||||
assert_eq!(
|
||||
eliminated, 0,
|
||||
"copy through numeric-length access must be skipped"
|
||||
);
|
||||
}
|
||||
|
||||
/// Skip path 3: an Assign carrying a `string_prefix` (template
|
||||
/// literal or `"lit" + var` RHS) seeds a StringFact on its SSA
|
||||
/// value. Propagating past it erases the prefix-bearing value and
|
||||
/// breaks SSRF prefix-lock suppression downstream.
|
||||
#[test]
|
||||
fn copy_through_string_prefix_node_is_not_propagated() {
|
||||
let (cfg, mut body) = build_two_inst_body(|info| {
|
||||
info.string_prefix = Some("https://api.example.com/".into());
|
||||
});
|
||||
let (eliminated, _map) = copy_propagate(&mut body, &cfg);
|
||||
assert_eq!(
|
||||
eliminated, 0,
|
||||
"copy through string_prefix-bearing node must be skipped"
|
||||
);
|
||||
}
|
||||
|
||||
/// Multi-operand Assigns (e.g. `v2 = v0 + v1`) are NOT copies and
|
||||
/// must be left alone. Only single-operand Assigns are copies.
|
||||
#[test]
|
||||
fn multi_operand_assign_is_not_a_copy() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("1".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("x".into()),
|
||||
span: (0, 1),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Const(Some("2".into())),
|
||||
cfg_node: n1,
|
||||
var_name: Some("y".into()),
|
||||
span: (2, 3),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(2),
|
||||
op: SsaOp::Assign({
|
||||
let mut v: SmallVec<[SsaValue; 4]> = SmallVec::new();
|
||||
v.push(SsaValue(0));
|
||||
v.push(SsaValue(1));
|
||||
v
|
||||
}),
|
||||
cfg_node: n2,
|
||||
var_name: Some("z".into()),
|
||||
span: (4, 5),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("x".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("y".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("z".into()),
|
||||
cfg_node: n2,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
|
||||
.into_iter()
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
let (eliminated, _map) = copy_propagate(&mut body, &cfg);
|
||||
assert_eq!(eliminated, 0, "two-operand Assign is not a copy");
|
||||
assert!(
|
||||
matches!(body.blocks[0].body[2].op, SsaOp::Assign(_)),
|
||||
"multi-operand Assign must be preserved"
|
||||
);
|
||||
}
|
||||
|
||||
/// A Call's argument and receiver slots that reference a
|
||||
/// copy-eliminated value must be rewritten to the root.
|
||||
#[test]
|
||||
fn call_args_and_receiver_rewritten_through_copy() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n2 = cfg.add_node(make_cfg_node(StmtKind::Call));
|
||||
let mut arg_vec: SmallVec<[SsaValue; 2]> = SmallVec::new();
|
||||
arg_vec.push(SsaValue(1));
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("\"x\"".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("a".into()),
|
||||
span: (0, 1),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
|
||||
cfg_node: n1,
|
||||
var_name: Some("b".into()),
|
||||
span: (2, 3),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(2),
|
||||
op: SsaOp::Call {
|
||||
callee: "f".into(),
|
||||
callee_text: None,
|
||||
args: vec![arg_vec],
|
||||
receiver: Some(SsaValue(1)),
|
||||
},
|
||||
cfg_node: n2,
|
||||
var_name: None,
|
||||
span: (4, 7),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("a".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("b".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: None,
|
||||
cfg_node: n2,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
|
||||
.into_iter()
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
let (eliminated, _) = copy_propagate(&mut body, &cfg);
|
||||
assert_eq!(eliminated, 1, "v1 should be eliminated");
|
||||
let call_inst = &body.blocks[0].body[2];
|
||||
match &call_inst.op {
|
||||
SsaOp::Call { args, receiver, .. } => {
|
||||
assert_eq!(receiver, &Some(SsaValue(0)), "receiver rewritten to root");
|
||||
assert_eq!(args[0][0], SsaValue(0), "call arg rewritten to root");
|
||||
}
|
||||
other => panic!("expected Call op, got {:?}", other),
|
||||
}
|
||||
}
|
||||
|
||||
/// Phi operand referencing a copy-eliminated value must be
|
||||
/// rewritten to the root.
|
||||
#[test]
|
||||
fn phi_operand_rewritten_through_copy() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
// Block 0: v0=const, v1=assign(v0)
|
||||
// Block 1: v2 = phi(B0: v1)
|
||||
let mut phi_ops: smallvec::SmallVec<[(BlockId, SsaValue); 2]> = smallvec::SmallVec::new();
|
||||
phi_ops.push((BlockId(0), SsaValue(1)));
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![
|
||||
SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("\"v0\"".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("a".into()),
|
||||
span: (0, 1),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
|
||||
cfg_node: n1,
|
||||
var_name: Some("b".into()),
|
||||
span: (2, 3),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Goto(BlockId(1)),
|
||||
preds: SmallVec::new(),
|
||||
succs: {
|
||||
let mut s = SmallVec::new();
|
||||
s.push(BlockId(1));
|
||||
s
|
||||
},
|
||||
},
|
||||
SsaBlock {
|
||||
id: BlockId(1),
|
||||
phis: vec![SsaInst {
|
||||
value: SsaValue(2),
|
||||
op: SsaOp::Phi(phi_ops),
|
||||
cfg_node: n2,
|
||||
var_name: Some("b".into()),
|
||||
span: (4, 5),
|
||||
}],
|
||||
body: vec![],
|
||||
terminator: Terminator::Return(Some(SsaValue(2))),
|
||||
preds: {
|
||||
let mut p = SmallVec::new();
|
||||
p.push(BlockId(0));
|
||||
p
|
||||
},
|
||||
succs: SmallVec::new(),
|
||||
},
|
||||
],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("a".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("b".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("b".into()),
|
||||
cfg_node: n2,
|
||||
block: BlockId(1),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
|
||||
.into_iter()
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
let (eliminated, _map) = copy_propagate(&mut body, &cfg);
|
||||
assert_eq!(eliminated, 1);
|
||||
// The phi in block 1 should now reference v0, not v1.
|
||||
let phi = &body.blocks[1].phis[0];
|
||||
match &phi.op {
|
||||
SsaOp::Phi(ops) => {
|
||||
assert_eq!(
|
||||
ops[0].1,
|
||||
SsaValue(0),
|
||||
"phi operand should be rewritten to root v0"
|
||||
);
|
||||
}
|
||||
other => panic!("expected Phi op, got {:?}", other),
|
||||
}
|
||||
}
|
||||
|
||||
/// `copy_propagate` on a body with no Assign instructions returns
|
||||
/// `(0, empty_map)` and leaves the body untouched.
|
||||
#[test]
|
||||
fn no_op_when_no_copies_exist() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("42".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("x".into()),
|
||||
span: (0, 2),
|
||||
}],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![ValueDef {
|
||||
var_name: Some("x".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
let (eliminated, map) = copy_propagate(&mut body, &cfg);
|
||||
assert_eq!(eliminated, 0);
|
||||
assert!(map.is_empty());
|
||||
}
|
||||
}
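Two tests above pin down two properties of the copy map:

- `resolve_root_terminates_on_cyclic_copy_map`: following the map is bounded even when a malformed map contains a cycle.
- `deep_copy_chain_collapses_to_root`: the map handed downstream points every eliminated value directly at its final root.

The sketch below assumes a plain `HashMap<u32, u32>` in place of `SsaValue` keys. `resolve_root_capped` and `close_copy_map` are hypothetical names. The 1000-step cap mirrors the one described in the test's doc comment.

```rust
use std::collections::HashMap;

/// Upper bound on chain-following steps (mirrors the documented cap).
const MAX_STEPS: usize = 1000;

/// Follow `v -> map[v] -> ...` until reaching a value with no entry,
/// giving up after MAX_STEPS so a cyclic map cannot loop forever.
fn resolve_root_capped(mut v: u32, map: &HashMap<u32, u32>) -> u32 {
    for _ in 0..MAX_STEPS {
        match map.get(&v) {
            Some(&next) => v = next,
            None => return v,
        }
    }
    // Cap hit: return wherever the walk stopped; no correctness claim.
    v
}

/// Rewrite every entry to point directly at its root, i.e. expose the
/// transitive closure rather than just the immediate parent.
fn close_copy_map(map: &HashMap<u32, u32>) -> HashMap<u32, u32> {
    map.keys()
        .map(|&k| (k, resolve_root_capped(k, map)))
        .collect()
}
```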
|
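The three skip-conditions and the single-operand rule exercised above reduce to one predicate over an Assign and the facts on its CFG node. The sketch below uses hypothetical types (`NodeFacts`, `Op`, `propagatable_source`). The tests' decorators show that the real pass reads `taint.labels`, `is_numeric_length_access` and `string_prefix` from `NodeInfo`, but its exact control flow is not reproduced here.

```rust
/// Hypothetical subset of the per-CFG-node facts that copy-prop consults.
#[derive(Default)]
struct NodeFacts {
    labels: Vec<&'static str>,
    is_numeric_length_access: bool,
    string_prefix: Option<String>,
}

/// Hypothetical instruction shape: an Assign carries its operand list.
enum Op {
    Const(i64),
    Assign(Vec<u32>),
}

/// Returns the copied-from value if `op` is a copy that may be propagated.
fn propagatable_source(op: &Op, facts: &NodeFacts) -> Option<u32> {
    // Only single-operand Assigns are copies; `v2 = v0 + v1` is not.
    let src = match op {
        Op::Assign(operands) if operands.len() == 1 => operands[0],
        _ => return None,
    };
    // Skip 1: a source/sanitizer/sink label on the node must survive.
    if !facts.labels.is_empty() {
        return None;
    }
    // Skip 2: `arr.length` / `map.size` changes the value's type.
    if facts.is_numeric_length_access {
        return None;
    }
    // Skip 3: a string prefix seeds a StringFact on this very value.
    if facts.string_prefix.is_some() {
        return None;
    }
    Some(src)
}
```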
||||
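Two tests above check that every use site is redirected to the root, not just plain Assign operands:

- `call_args_and_receiver_rewritten_through_copy` covers call arguments and receivers.
- `phi_operand_rewritten_through_copy` covers phi incoming values.

The sketch below uses a hypothetical cut-down `Op` enum and assumes the copy map is already transitively closed.

```rust
use std::collections::HashMap;

/// Hypothetical op enum with the operand positions the tests cover.
#[derive(Debug, PartialEq)]
enum Op {
    Assign(Vec<u32>),
    Call { args: Vec<Vec<u32>>, receiver: Option<u32> },
    /// (predecessor block, incoming value)
    Phi(Vec<(u32, u32)>),
}

/// Replace every operand that appears in `roots` with its root value.
fn rewrite_operands(op: &mut Op, roots: &HashMap<u32, u32>) {
    let fix = |v: &mut u32| {
        if let Some(&r) = roots.get(&*v) {
            *v = r;
        }
    };
    match op {
        Op::Assign(ops) => ops.iter_mut().for_each(fix),
        Op::Call { args, receiver } => {
            // Each argument slot may hold several values; visit them all.
            args.iter_mut().flatten().for_each(fix);
            if let Some(r) = receiver {
                fix(r);
            }
        }
        // Only the incoming value is rewritten; block ids are untouched.
        Op::Phi(ops) => ops.iter_mut().for_each(|(_, v)| fix(v)),
    }
}
```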
|
|
|
|||
364
src/ssa/dce.rs
|
|
@ -143,6 +143,7 @@ fn inst_used_values(inst: &SsaInst) -> Vec<SsaValue> {
|
|||
}
|
||||
vals
|
||||
}
|
||||
SsaOp::FieldProj { receiver, .. } => vec![*receiver],
|
||||
SsaOp::Source
|
||||
| SsaOp::Const(_)
|
||||
| SsaOp::Param { .. }
|
||||
|
|
@ -214,6 +215,8 @@ mod tests {
|
|||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
|
|
@ -260,6 +263,8 @@ mod tests {
|
|||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
|
|
@ -307,6 +312,8 @@ mod tests {
|
|||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
|
|
@ -350,6 +357,8 @@ mod tests {
|
|||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
|
|
@ -385,6 +394,8 @@ mod tests {
|
|||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
|
|
@ -392,6 +403,142 @@ mod tests {
|
|||
assert!(body.blocks[0].body.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn dce_keeps_field_proj_when_used() {
|
||||
// v0 = source(); v1 = field_proj(v0, "field"); ret v1
|
||||
// The terminator references v1, so the FieldProj's receiver chain
|
||||
// (v0) must stay reachable.
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let mut interner = crate::ssa::ir::FieldInterner::new();
|
||||
let fid = interner.intern("field");
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Source,
|
||||
cfg_node: n0,
|
||||
var_name: Some("obj".into()),
|
||||
span: (0, 5),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::FieldProj {
|
||||
receiver: SsaValue(0),
|
||||
field: fid,
|
||||
projected_type: None,
|
||||
},
|
||||
cfg_node: n1,
|
||||
var_name: Some("obj.field".into()),
|
||||
span: (10, 20),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(Some(SsaValue(1))),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("obj".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("obj.field".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: interner,
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
assert_eq!(
|
||||
removed, 0,
|
||||
"FieldProj reachable from terminator must survive"
|
||||
);
|
||||
assert_eq!(body.blocks[0].body.len(), 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn dce_removes_dead_field_proj() {
|
||||
// v0 = const("x"); v1 = field_proj(v0, "field"); ret (no v1 use)
|
||||
// Both should be removed: v1 has no uses, v0's only use is v1, and
|
||||
// neither is a Source/Call/labeled instruction.
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let mut interner = crate::ssa::ir::FieldInterner::new();
|
||||
let fid = interner.intern("field");
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("x".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("obj".into()),
|
||||
span: (0, 1),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::FieldProj {
|
||||
receiver: SsaValue(0),
|
||||
field: fid,
|
||||
projected_type: None,
|
||||
},
|
||||
cfg_node: n1,
|
||||
var_name: Some("obj.field".into()),
|
||||
span: (2, 12),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("obj".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("obj.field".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: interner,
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
// First pass removes the FieldProj (no uses), second removes the Const
|
||||
// (no uses after FieldProj is gone).
|
||||
assert_eq!(
|
||||
removed, 2,
|
||||
"dead FieldProj and its dead receiver const must be removed"
|
||||
);
|
||||
assert!(body.blocks[0].body.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn used_def_preserved() {
|
||||
// v0 = const("42"), v1 = assign(v0) — v0 is used, both survive
|
||||
|
|
@ -438,6 +585,8 @@ mod tests {
|
|||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
|
|
@ -446,4 +595,219 @@ mod tests {
|
|||
assert_eq!(removed, 2);
|
||||
assert_eq!(body.blocks[0].body.len(), 0);
|
||||
}
|
||||
|
||||
/// DCE must NEVER remove a Call instruction even when its result has
|
||||
/// zero uses — calls have side effects (I/O, throws, mutations) that
|
||||
/// cannot be modeled as SSA-value uses. This is the conservative
|
||||
/// invariant `is_dead()` enforces; regressing it would silently drop
|
||||
/// real-world code from analysis (sinks, sanitizers expressed as
|
||||
/// expression-statements, etc.).
|
||||
#[test]
|
||||
fn dead_call_with_unused_result_preserved() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Call));
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Call {
|
||||
callee: "side_effect".into(),
|
||||
callee_text: None,
|
||||
args: Vec::new(),
|
||||
receiver: None,
|
||||
},
|
||||
cfg_node: n0,
|
||||
var_name: None,
|
||||
span: (0, 12),
|
||||
}],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![ValueDef {
|
||||
var_name: None,
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
assert_eq!(
|
||||
removed, 0,
|
||||
"Call with unused result must be preserved (side effects)"
|
||||
);
|
||||
assert_eq!(body.blocks[0].body.len(), 1);
|
||||
assert!(matches!(body.blocks[0].body[0].op, SsaOp::Call { .. }));
|
||||
}
|
||||
|
||||
/// A dead phi must be eliminated. The entry block defines two constants
|
||||
/// whose only use is a phi in its successor, which ends in Return(None).
|
||||
/// All defs are dead; DCE should strip every body and phi instruction.
|
||||
#[test]
|
||||
fn dead_phi_in_otherwise_dead_block_removed() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let entry_block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("1".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("a".into()),
|
||||
span: (0, 1),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Const(Some("2".into())),
|
||||
cfg_node: n1,
|
||||
var_name: Some("b".into()),
|
||||
span: (1, 2),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Goto(BlockId(1)),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::from_elem(BlockId(1), 1),
|
||||
};
|
||||
let join_block = SsaBlock {
|
||||
id: BlockId(1),
|
||||
phis: vec![SsaInst {
|
||||
value: SsaValue(2),
|
||||
op: SsaOp::Phi(smallvec::smallvec![
|
||||
(BlockId(0), SsaValue(0)),
|
||||
(BlockId(0), SsaValue(1)),
|
||||
]),
|
||||
cfg_node: n2,
|
||||
var_name: Some("phi".into()),
|
||||
span: (2, 3),
|
||||
}],
|
||||
body: vec![],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::from_elem(BlockId(0), 1),
|
||||
succs: SmallVec::new(),
|
||||
};
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![entry_block, join_block],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("a".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("b".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("phi".into()),
|
||||
cfg_node: n2,
|
||||
block: BlockId(1),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
|
||||
.into_iter()
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
// Pass 1: the phi (no uses) goes; that drops the use-counts on v0/v1.
|
||||
// Pass 2: v0 and v1 (now unused) go.
|
||||
assert_eq!(removed, 3, "dead phi + two operands should be removed");
|
||||
assert!(
|
||||
body.blocks[1].phis.is_empty(),
|
||||
"dead phi must be eliminated"
|
||||
);
|
||||
assert!(body.blocks[0].body.is_empty());
|
||||
}
|
||||
|
||||
/// DCE iteration: removing v2 makes v1 dead on the next pass, and then v0.
|
||||
/// Mirrors `used_def_preserved` but explicit about the chain.
|
||||
#[test]
|
||||
fn dce_iterates_until_fixpoint() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("1".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("a".into()),
|
||||
span: (0, 1),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
|
||||
cfg_node: n1,
|
||||
var_name: Some("b".into()),
|
||||
span: (1, 2),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(2),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(1), 1)),
|
||||
cfg_node: n2,
|
||||
var_name: Some("c".into()),
|
||||
span: (2, 3),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("a".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("b".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("c".into()),
|
||||
cfg_node: n2,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
|
||||
.into_iter()
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
assert_eq!(
|
||||
removed, 3,
|
||||
"DCE must reach fixpoint and remove all 3 dead defs in the chain"
|
||||
);
|
||||
assert!(body.blocks[0].body.is_empty());
|
||||
}
|
||||
}
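The DCE tests above pin down three behaviors:

- A def with no uses is removed unless it is a Call. The comment in `dce_removes_dead_field_proj` also names Source and labeled instructions as exempt.
- Removing one def can make its operands dead, so the pass iterates to a fixpoint.
- Phis participate like any other def.

The sketch below models that loop over a hypothetical flat instruction list, with phis and body merged and labels not modeled. `Inst`, `Op` and `dce` are illustrative names, not the crate's `eliminate_dead_defs`.

```rust
use std::collections::HashSet;

/// Hypothetical op kinds; `Call` and `Source` are never removed.
enum Op {
    Source,
    Const,
    Assign(Vec<u32>),
    FieldProj { receiver: u32 },
    Phi(Vec<u32>),
    Call { args: Vec<u32> },
}

struct Inst {
    value: u32,
    op: Op,
}

/// Values read by an instruction (a FieldProj reads its receiver).
fn operands(op: &Op) -> Vec<u32> {
    match op {
        Op::Source | Op::Const => vec![],
        Op::Assign(vs) | Op::Phi(vs) => vs.clone(),
        Op::FieldProj { receiver } => vec![*receiver],
        Op::Call { args } => args.clone(),
    }
}

/// Remove unused, side-effect-free defs until nothing changes.
/// `terminator_uses` stands in for values read by block terminators.
fn dce(insts: &mut Vec<Inst>, terminator_uses: &[u32]) -> usize {
    let mut removed = 0;
    loop {
        // Recompute the use set from scratch each round.
        let used: HashSet<u32> = insts
            .iter()
            .flat_map(|i| operands(&i.op))
            .chain(terminator_uses.iter().copied())
            .collect();
        let before = insts.len();
        insts.retain(|i| {
            matches!(i.op, Op::Call { .. } | Op::Source) || used.contains(&i.value)
        });
        let dropped = before - insts.len();
        if dropped == 0 {
            return removed;
        }
        removed += dropped;
    }
}
```

Recomputing the use set each round is quadratic in the worst case. A worklist that decrements use counts would avoid that, but the round-based form matches the "pass 1 / pass 2" reasoning in the test comments.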
|
||||
|
|
|
|||
|
|
@ -48,6 +48,7 @@ impl fmt::Display for SsaBody {
|
|||
callee,
|
||||
args,
|
||||
receiver,
|
||||
..
|
||||
} => {
|
||||
if let Some(rv) = receiver {
|
||||
write!(f, "v{}.{callee}(", rv.0)?;

@@ -64,6 +65,20 @@ impl fmt::Display for SsaBody {
                        .collect();
                    write!(f, "{})", arg_strs.join(", "))?;
                }
                SsaOp::FieldProj {
                    receiver,
                    field,
                    projected_type,
                } => {
                    // Resolve the field name through the body's interner
                    // so display output matches the original source field.
                    let name = self.field_interner.resolve(*field);
                    if let Some(ty) = projected_type {
                        write!(f, "field_proj(v{}, {name:?}) :: {ty:?}", receiver.0)?;
                    } else {
                        write!(f, "field_proj(v{}, {name:?})", receiver.0)?;
                    }
                }
                SsaOp::Source => write!(f, "source()")?,
                SsaOp::Const(val) => {
                    if let Some(v) = val {

@@ -23,7 +23,7 @@
#![allow(clippy::collapsible_if, clippy::unnecessary_map_or)]

use crate::cfg::Cfg;
use crate::labels::Cap;
use crate::labels::{Cap, bare_method_name};
use crate::ssa::ir::*;
use crate::ssa::pointsto::{ContainerOp, classify_container_op};
use crate::symbol::Lang;

@@ -588,7 +588,7 @@ fn is_container_literal(text: &str) -> bool {
/// Check if a callee creates a new container (constructor/factory).
pub fn is_container_constructor(callee: &str, lang: Lang) -> bool {
    // Extract last segment after '.' or '::' (whichever comes last)
    let after_dot = callee.rsplit('.').next().unwrap_or(callee);
    let after_dot = bare_method_name(callee);
    let suffix = after_dot.rsplit("::").next().unwrap_or(after_dot);
    let suffix_lower = suffix.to_ascii_lowercase();
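`is_container_constructor` narrows the callee in two steps: everything after the last `.` (now via `bare_method_name`), then everything after the last `::`. The sketch below reproduces that suffix extraction with plain `rsplit`, matching the line the diff replaces. `last_segment` is a hypothetical name, and the sketch assumes `bare_method_name` agrees with the `.` split for simple dotted callees; its actual source is not shown here.

```rust
/// Hypothetical helper: keep the text after the last `.`, then the text
/// after the last `::` within that piece.
pub fn last_segment(callee: &str) -> &str {
    // `rsplit` always yields at least one piece, so `unwrap_or` is only a guard.
    let after_dot = callee.rsplit('.').next().unwrap_or(callee);
    after_dot.rsplit("::").next().unwrap_or(after_dot)
}
```

Lower-casing the result afterwards, as `suffix_lower` does, lets one table of constructor names serve `new`, `New`, and similar spellings.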

@@ -548,6 +548,7 @@ fn op_kind(op: &SsaOp) -> &'static str {
        SsaOp::CatchParam => "CatchParam",
        SsaOp::Nop => "Nop",
        SsaOp::Undef => "Undef",
        SsaOp::FieldProj { .. } => "FieldProj",
    }
}

@@ -785,6 +786,8 @@ mod tests {
            value_defs: vec![],
            cfg_node_map: Default::default(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        };
        let errs = check_structural_invariants(&body);
        assert!(

@@ -830,6 +833,8 @@ mod tests {
            }],
            cfg_node_map: Default::default(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        };
        let errs = check_structural_invariants(&body);
        assert!(

@@ -878,6 +883,8 @@ mod tests {
            }],
            cfg_node_map: Default::default(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        };
        let errs = check_structural_invariants(&body);
        assert!(

@@ -904,6 +911,8 @@ mod tests {
            value_defs: vec![],
            cfg_node_map: Default::default(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        };
        let errs = check_structural_invariants(&body);
        assert!(

390
src/ssa/ir.rs

@@ -1,8 +1,10 @@
use crate::constraint::domain::ConstValue;
use crate::constraint::lower::ConditionExpr;
use crate::ssa::type_facts::TypeKind;
use petgraph::graph::NodeIndex;
use serde::{Deserialize, Serialize};
use smallvec::SmallVec;
use std::collections::HashMap;

/// Unique identifier for an SSA value (one per definition point).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]

@@ -12,6 +14,141 @@ pub struct SsaValue(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
pub struct BlockId(pub u32);

/// Interned field-name identifier, scoped to a single [`SsaBody`].
///
/// Different bodies may assign different `FieldId`s to the same field name,
/// so callers MUST resolve through the owning body's [`FieldInterner`]
/// (`SsaBody::field_name`) before using the name in cross-body contexts
/// (e.g. summary serialization).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
pub struct FieldId(pub u32);

impl FieldId {
    /// Pointer-Phase 4 sentinel for the abstract "any element of a
    /// container" field. Steensgaard-grade precision: every numeric
    /// or dynamic index access (`arr[i]`, `arr.shift()`, `map[k]`)
    /// projects through the same `Field(pt(container), ELEM)` cell so
    /// per-element taint propagation is independent of the SSA value
    /// referencing the container.
    ///
    /// `u32::MAX` is reserved by convention; the per-body
    /// [`FieldInterner`] never assigns it because interning is
    /// monotone-ascending from `0` and bodies don't approach 4 billion
    /// fields. Consumers should compare with `==` rather than reach
    /// into the wrapped `u32`.
    pub const ELEM: FieldId = FieldId(u32::MAX);

    /// "Tainted at every field" wildcard sentinel — distinct from
    /// [`Self::ELEM`] (which is container-element semantics: every
    /// numeric/dynamic index access projects through it).
    /// `ANY_FIELD` represents the case where a writeback-shaped sink
    /// (`json.NewDecoder(r.Body).Decode(&dest)`,
    /// `proto.Unmarshal(buf, &msg)`) taints the destination wholesale
    /// without a per-field decomposition the caller can enumerate.
    /// Read by [`SsaOp::FieldProj`] as a fallback when no specific
    /// `(loc, *field)` cell exists, so subsequent struct-field reads
    /// pick up the writeback's taint without over-tainting unrelated
    /// containers' element cells. `u32::MAX - 1` is reserved
    /// alongside `ELEM` and is similarly never assigned by the
    /// per-body interner.
    pub const ANY_FIELD: FieldId = FieldId(u32::MAX - 1);
}

/// Per-body interner for field names referenced by [`SsaOp::FieldProj`].
///
/// Names are deduped within a single SSA body: every distinct field-name
/// string is assigned a stable `FieldId(u32)` for the lifetime of the body.
/// The interner is serialized alongside the body so deserialization restores
/// IDs intact; cross-body summary code is responsible for resolving names
/// before passing them across body boundaries.
#[derive(Clone, Debug, Default, Serialize, Deserialize)]
pub struct FieldInterner {
    /// Names indexed by `FieldId.0`.
    names: Vec<String>,
    /// Reverse lookup: name → existing FieldId.
    #[serde(skip)]
    lookup: HashMap<String, u32>,
}

impl FieldInterner {
    /// Create an empty interner.
    pub fn new() -> Self {
        Self::default()
    }

    /// Intern a field name, returning its [`FieldId`]. Reuses the existing
    /// id if the name has already been interned.
    pub fn intern(&mut self, name: &str) -> FieldId {
        if let Some(&id) = self.lookup.get(name) {
            return FieldId(id);
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.lookup.insert(name.to_string(), id);
        FieldId(id)
    }

    /// Read-only lookup: returns the [`FieldId`] for `name` if it has
    /// already been interned, or `None` otherwise.
    ///
    /// Used by cross-call resolvers (Pointer-Phase 5 / W3) to avoid
    /// growing the caller's interner with field names introduced
    /// solely by the callee summary — such IDs would never be referenced
    /// by any other instruction in the caller's body, so the cells
    /// would be write-only and consume space without contributing
    /// to taint flow.
    pub fn lookup(&self, name: &str) -> Option<FieldId> {
        // Walk `names` directly so we don't require the post-deserialise
        // `ensure_lookup()` rebuild before this method is callable.
        // Callers usually own `&SsaBody` — interning was either done at
        // lowering time or via `ensure_lookup` post-deserialise — so the
        // hot path goes through the `lookup` table; the linear walk is
        // a fallback for the (small) deserialised-but-not-rebuilt case.
        if let Some(&id) = self.lookup.get(name) {
            return Some(FieldId(id));
        }
        for (idx, n) in self.names.iter().enumerate() {
            if n == name {
                return Some(FieldId(idx as u32));
            }
        }
        None
    }

    /// Resolve a [`FieldId`] back to its interned name.
    pub fn resolve(&self, id: FieldId) -> &str {
        &self.names[id.0 as usize]
    }

    /// Number of unique interned names.
    pub fn len(&self) -> usize {
        self.names.len()
    }

    /// Whether the interner contains no names.
    pub fn is_empty(&self) -> bool {
        self.names.is_empty()
    }

    /// Rebuild the reverse lookup after deserialization. Called lazily by
    /// [`Self::ensure_lookup`] so deserialized interners can still be used
    /// for further interning.
    fn rebuild_lookup(&mut self) {
        self.lookup.clear();
        for (i, n) in self.names.iter().enumerate() {
            self.lookup.entry(n.clone()).or_insert(i as u32);
        }
    }

    /// Ensure the reverse lookup is populated (rebuilds after a serde
    /// roundtrip when the lookup table was skipped).
    pub fn ensure_lookup(&mut self) {
        if self.lookup.len() != self.names.len() {
            self.rebuild_lookup();
        }
    }
}

/// SSA instruction operation.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub enum SsaOp {
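A std-only sketch of the interner contract described above: dense IDs from `0`, reserved sentinels at the top of the `u32` range, and a reverse map that must be rebuilt after deserialization. Serde is left out; the hypothetical `from_names` constructor simulates the post-deserialization state, in which `names` is restored but the `#[serde(skip)]` map is empty.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FieldId(pub u32);

impl FieldId {
    // Reserved sentinels, as in the diff: never produced by `intern`.
    pub const ELEM: FieldId = FieldId(u32::MAX);
    pub const ANY_FIELD: FieldId = FieldId(u32::MAX - 1);
}

#[derive(Default, Debug)]
pub struct FieldInterner {
    names: Vec<String>,
    lookup: HashMap<String, u32>,
}

impl FieldInterner {
    /// Hypothetical: mimics a deserialized interner (names kept, map skipped).
    pub fn from_names(names: Vec<String>) -> Self {
        FieldInterner { names, lookup: HashMap::new() }
    }

    pub fn intern(&mut self, name: &str) -> FieldId {
        if let Some(&id) = self.lookup.get(name) {
            return FieldId(id);
        }
        // IDs grow densely from 0, far below the reserved sentinels.
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.lookup.insert(name.to_string(), id);
        FieldId(id)
    }

    pub fn resolve(&self, id: FieldId) -> &str {
        &self.names[id.0 as usize]
    }

    /// Rebuild the reverse map when it is out of sync with `names`.
    pub fn ensure_lookup(&mut self) {
        if self.lookup.len() != self.names.len() {
            self.lookup.clear();
            for (i, n) in self.names.iter().enumerate() {
                self.lookup.entry(n.clone()).or_insert(i as u32);
            }
        }
    }
}
```

Without `ensure_lookup`, a restored interner's `intern("mu")` misses the empty map and appends a duplicate entry, which is why the round-trip test in this diff calls `ensure_lookup` before interning again.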

@@ -20,13 +157,48 @@ pub enum SsaOp {
    /// Assignment: result depends on the listed SSA values.
    Assign(SmallVec<[SsaValue; 4]>),
    /// Function/method call.
    ///
    /// `callee` is the canonical name SSA-time consumers should match on.
    /// When SSA lowering decomposes a chained-receiver method call into a
    /// `FieldProj` chain (e.g. `c.mu.Lock()` → `v_mu = FieldProj(v_c, "mu")`,
    /// `Call("Lock", [v_mu])`), `callee` carries the bare method name
    /// (`"Lock"`) and `callee_text` carries the original full path
    /// (`Some("c.mu.Lock")`). When no decomposition happens, `callee_text`
    /// is `None` and `callee` already holds the original textual form.
    Call {
        callee: String,
        /// Original textual full path when SSA decomposed a chained receiver.
        /// `None` when the callee was not rewritten — `callee` already holds
        /// the source-level textual form.
        ///
        /// **Debug / display only.** Analysis code must walk the SSA receiver
        /// chain (through `FieldProj` ops) for precise field structure, or
        /// use [`crate::labels::bare_method_name`] when only the terminal
        /// method name is needed from a textual callee.
        #[doc(hidden)]
        #[serde(default)]
        callee_text: Option<String>,
        /// Per-argument SSA value uses.
        args: Vec<SmallVec<[SsaValue; 2]>>,
        /// Receiver SSA value (for method calls).
        receiver: Option<SsaValue>,
    },
    /// Field projection: read field `field` of object `receiver`.
    ///
    /// Models member-access expressions (`obj.field`) as a first-class SSA
    /// op. Lowering walks the receiver tree so chained accesses like
    /// `c.writer.header` produce a chain of `FieldProj` ops with explicit
    /// per-step receivers — eliminating the textual-prefix parsing that
    /// previously misclassified deep receivers (the gin/context.go FP).
    ///
    /// `field` is interned in the owning [`SsaBody`]'s [`FieldInterner`].
    /// `projected_type` carries the inferred type of the projected field
    /// when known (populated by type-fact analysis), `None` otherwise.
    FieldProj {
        receiver: SsaValue,
        field: FieldId,
        projected_type: Option<TypeKind>,
    },
    /// Taint source introduction.
    Source,
    /// Constant / literal value (no taint).
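The `callee` / `callee_text` split is easier to see on a concrete path. Below is a minimal sketch of how a dotted call such as `c.mu.Lock` could be lowered into a `FieldProj` chain plus a bare-name call. `Op`, `lower_chained_call`, and the use of vector indices as value numbers are hypothetical. The sketch also assumes that a single-receiver call like `c.Lock()` is left undecomposed; the doc comment implies this but does not state it.

```rust
/// Hypothetical toy ops (not the crate's `SsaOp`); a value is its index.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Var(String),
    FieldProj { receiver: usize, field: String },
    Call { callee: String, callee_text: Option<String>, receiver: Option<usize> },
}

/// Lower `c.mu.Lock` to `v0 = c; v1 = FieldProj(v0, "mu"); v2 = Call("Lock", recv v1)`.
pub fn lower_chained_call(path: &str) -> Vec<Op> {
    let segments: Vec<&str> = path.split('.').collect();
    // `split` always yields at least one segment, so this cannot fail.
    let (method, receiver_path) = segments.split_last().unwrap();
    match receiver_path {
        // `foo()`: plain call, nothing to decompose.
        [] => vec![Op::Call { callee: path.to_string(), callee_text: None, receiver: None }],
        // `c.Lock()`: assumed undecomposed, so `callee` keeps the textual form.
        [base] => vec![
            Op::Var(base.to_string()),
            Op::Call { callee: path.to_string(), callee_text: None, receiver: Some(0) },
        ],
        // `c.mu.Lock()`: one FieldProj per intermediate segment.
        [base, fields @ ..] => {
            let mut ops = vec![Op::Var(base.to_string())];
            for field in fields {
                let receiver = ops.len() - 1;
                ops.push(Op::FieldProj { receiver, field: field.to_string() });
            }
            ops.push(Op::Call {
                callee: method.to_string(),
                callee_text: Some(path.to_string()),
                receiver: Some(ops.len() - 1),
            });
            ops
        }
    }
}
```

Consumers that match on `callee` then see `"Lock"` regardless of how deep the receiver chain is, while the field structure lives in the `FieldProj` ops.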

@@ -168,6 +340,31 @@ pub struct SsaBody {
    /// Recorded during lowering when exception edges are stripped from the CFG.
    /// Used by taint analysis to seed catch blocks with try-body taint state.
    pub exception_edges: Vec<(BlockId, BlockId)>,
    /// Per-body interner for [`SsaOp::FieldProj`] field names.
    ///
    /// Empty until the lowering phase emits FieldProj ops (Phase 2 of the
    /// field-projections rollout). Cross-body callers (cross-file
    /// summaries, debug serialization) MUST resolve interned ids through
    /// this interner before transporting field references to other bodies.
    #[serde(default)]
    pub field_interner: FieldInterner,
    /// Pointer-Phase 3 / W1: side-table mapping a synthetic base-update
    /// [`SsaOp::Assign`]'s defined value back to the `(receiver, field)`
    /// pair it represents. Populated by SSA lowering at the
    /// `obj.f = rhs` synthesis point so the taint engine can recognise
    /// the synthetic assign as a structural field WRITE — the assigned
    /// value is the new "obj" value, the use is the rhs, and the
    /// side-table records `(prior_obj_value, FieldId("f"))`.
    ///
    /// Empty by default; only synthetic assigns whose enclosing source
    /// statement was a dotted-path assignment (`a.b.c = …`) appear here.
    /// Lookup is expected `O(1)` (`HashMap`), and the per-body
    /// table is small (one entry per synthetic chain link).
    ///
    /// Serialized via `#[serde(default)]` so pre-W1 SSA blobs decode
    /// cleanly with an empty map (no migration needed).
    #[serde(default)]
    pub field_writes: HashMap<SsaValue, (SsaValue, FieldId)>,
}

impl SsaBody {
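To illustrate how a taint engine might consume a `field_writes`-style side table, here is a hypothetical transfer function for `new = Assign([rhs])`. It assumes a simple cell model (`Taint`, holding whole-value taint plus `(value, field)` cells) that is not the crate's; it only shows the distinction the doc comment draws between a structural field write and a plain copy.

```rust
use std::collections::{HashMap, HashSet};

pub type Value = u32;
pub type Field = u32;

/// Hypothetical taint state: whole tainted values plus tainted field cells.
#[derive(Default, Debug)]
pub struct Taint {
    pub values: HashSet<Value>,
    pub cells: HashSet<(Value, Field)>,
}

/// Transfer for `new = Assign([rhs])`, consulting a side table that maps
/// `new` to `(prior_obj, field)` when the assign is a synthetic field write.
pub fn transfer_assign(
    taint: &mut Taint,
    field_writes: &HashMap<Value, (Value, Field)>,
    new: Value,
    rhs: Value,
) {
    let rhs_tainted = taint.values.contains(&rhs);
    match field_writes.get(&new) {
        // Structural write `prior.field = rhs`: `new` keeps the prior
        // object's other cells, and `field` is tainted only if `rhs` is.
        Some(&(prior, field)) => {
            let inherited: Vec<Field> = taint
                .cells
                .iter()
                .filter(|&&(v, f)| v == prior && f != field)
                .map(|&(_, f)| f)
                .collect();
            for f in inherited {
                taint.cells.insert((new, f));
            }
            if rhs_tainted {
                taint.cells.insert((new, field));
            }
        }
        // Plain copy: the whole value inherits the rhs taint.
        None => {
            if rhs_tainted {
                taint.values.insert(new);
            }
        }
    }
}
```

Because `new` is a fresh SSA value, the written field is effectively a strong update, and a tainted rhs does not spill onto the object's unrelated fields.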

@@ -190,6 +387,53 @@ impl SsaBody {
    pub fn def_of(&self, v: SsaValue) -> &ValueDef {
        &self.value_defs[v.0 as usize]
    }

    /// Resolve a [`FieldId`] back to the interned field name within this body.
    pub fn field_name(&self, id: FieldId) -> &str {
        self.field_interner.resolve(id)
    }

    /// Intern a field name in this body's [`FieldInterner`], returning its
    /// stable [`FieldId`].
    pub fn intern_field(&mut self, name: &str) -> FieldId {
        self.field_interner.intern(name)
    }
}

impl SsaInst {
    /// Iterate over the SSA values used (read) by this instruction.
    ///
    /// Yields receiver/operand values for `Call`, `Phi`, `Assign`, and
    /// `FieldProj`; nothing for leaf ops (`Const`, `Param`, `Source`, etc.).
    /// Callers that need the values as a `Vec` should `.collect()`.
    pub fn uses_iter(&self) -> SmallVec<[SsaValue; 4]> {
        match &self.op {
            SsaOp::Phi(operands) => operands.iter().map(|(_, v)| *v).collect(),
            SsaOp::Assign(uses) => uses.iter().copied().collect(),
            SsaOp::Call { args, receiver, .. } => {
                let mut out: SmallVec<[SsaValue; 4]> = SmallVec::new();
                if let Some(rv) = receiver {
                    out.push(*rv);
                }
                for arg in args {
                    out.extend(arg.iter().copied());
                }
                out
            }
            SsaOp::FieldProj { receiver, .. } => {
                let mut out: SmallVec<[SsaValue; 4]> = SmallVec::new();
                out.push(*receiver);
                out
            }
            SsaOp::Source
            | SsaOp::Const(_)
            | SsaOp::Param { .. }
            | SsaOp::SelfParam
            | SsaOp::CatchParam
            | SsaOp::Nop
            | SsaOp::Undef => SmallVec::new(),
        }
    }
}

/// Errors that can occur during SSA lowering.
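`uses_iter` is what def-use passes such as DCE rely on: if `FieldProj` did not report its receiver, the receiver's definition could look dead and be removed. The following std-only sketch, over a hypothetical toy op enum that uses `Vec` instead of `SmallVec`, collects uses in the same order: receiver first, then argument uses in order.

```rust
/// Hypothetical toy mirror of the op shapes above.
pub enum Op {
    Phi(Vec<(u32, u32)>), // (predecessor block, incoming value)
    Assign(Vec<u32>),
    Call { args: Vec<Vec<u32>>, receiver: Option<u32> },
    FieldProj { receiver: u32, field: u32 },
    Const(Option<i64>),
}

/// Values read by an op; leaf ops read nothing.
pub fn uses(op: &Op) -> Vec<u32> {
    match op {
        Op::Phi(operands) => operands.iter().map(|&(_, v)| v).collect(),
        Op::Assign(vs) => vs.clone(),
        // Receiver (if any) first, then every argument's uses in order.
        Op::Call { args, receiver } => receiver
            .iter()
            .copied()
            .chain(args.iter().flatten().copied())
            .collect(),
        // The projected object is the only operand.
        Op::FieldProj { receiver, .. } => vec![*receiver],
        Op::Const(_) => Vec::new(),
    }
}
```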

@@ -211,3 +455,149 @@ impl std::fmt::Display for SsaError {
}

impl std::error::Error for SsaError {}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn field_interner_dedupes_names() {
        let mut interner = FieldInterner::new();
        let a = interner.intern("mu");
        let b = interner.intern("mu");
        let c = interner.intern("writer");
        assert_eq!(a, b, "interning same name twice yields same id");
        assert_ne!(a, c, "different names get different ids");
        assert_eq!(interner.resolve(a), "mu");
        assert_eq!(interner.resolve(c), "writer");
        assert_eq!(interner.len(), 2);
    }

    #[test]
    fn field_interner_serde_roundtrip_rebuilds_lookup() {
        let mut interner = FieldInterner::new();
        let a = interner.intern("mu");
        let b = interner.intern("writer");
        let json = serde_json::to_string(&interner).expect("serialize");
        let mut restored: FieldInterner = serde_json::from_str(&json).expect("deserialize");
        assert_eq!(restored.resolve(a), "mu");
        assert_eq!(restored.resolve(b), "writer");
        // After ensure_lookup, intern("mu") returns the original id (not a new one).
        restored.ensure_lookup();
        assert_eq!(restored.intern("mu"), a);
        assert_eq!(restored.intern("header"), FieldId(2));
    }

    #[test]
    fn field_proj_use_iter_includes_receiver() {
        let inst = SsaInst {
            value: SsaValue(3),
            op: SsaOp::FieldProj {
                receiver: SsaValue(1),
                field: FieldId(0),
                projected_type: None,
            },
            cfg_node: NodeIndex::new(0),
            var_name: Some("c.mu".into()),
            span: (0, 0),
        };
        let uses: Vec<SsaValue> = inst.uses_iter().into_iter().collect();
        assert_eq!(uses, vec![SsaValue(1)]);
    }

    /// Pointer-Phase 4 / A6 audit: the [`FieldId::ELEM`] sentinel is
    /// reserved for "any element of a container". The interner assigns
    /// IDs monotonically from `0`, so the sentinel `u32::MAX` can only
    /// collide if the body declares ~4 billion fields — a corner case
    /// no realistic codebase reaches. Pin the contract with a stress
    /// loop so future implementation drift can't silently shift IDs to
    /// the sentinel value.
    #[test]
    fn field_interner_never_assigns_elem_sentinel() {
        let mut interner = FieldInterner::new();
        for i in 0..1024 {
            let id = interner.intern(&format!("f{i}"));
            assert_ne!(
                id,
                FieldId::ELEM,
                "intern('f{i}') yielded the ELEM sentinel — invariant broken",
            );
        }
        // Lookup of the sentinel name (used by W3 to round-trip
        // container-element flow through summary) must NOT match a
        // real interned name even when the same name is interned.
        // The wire-format keeps `<elem>` as a *string marker* — it
        // never goes through `intern`. Instead, callers compare
        // explicitly against `FieldId::ELEM`.
        assert_ne!(interner.intern("<elem>"), FieldId::ELEM);
    }

    /// A6: the `<elem>` marker round-trips through extraction →
    /// SQLite → caller-side translation without colliding with a
    /// caller-interned `<elem>` field. When a caller's body has its
    /// own `<elem>` field, that gets a regular FieldId, distinct from
    /// the sentinel.
    #[test]
    fn elem_marker_distinct_from_interner_assigned_id() {
        let mut interner = FieldInterner::new();
        let lit_elem = interner.intern("<elem>");
        // Sentinel still compares equal to itself only.
        assert_eq!(FieldId::ELEM, FieldId(u32::MAX));
        assert_ne!(lit_elem, FieldId::ELEM);
        // Resolve the literal-string id back to its interned name.
        assert_eq!(interner.resolve(lit_elem), "<elem>");
    }

    #[test]
    fn field_proj_serde_roundtrip_with_field_name() {
        // Build a tiny body with one FieldProj op and check that the
        // body's interner survives round-trip and the id resolves back
        // to the original name.
        let mut body = SsaBody {
            blocks: vec![SsaBlock {
                id: BlockId(0),
                phis: vec![],
                body: vec![],
                terminator: Terminator::Return(None),
                preds: SmallVec::new(),
                succs: SmallVec::new(),
            }],
            entry: BlockId(0),
            value_defs: vec![ValueDef {
                var_name: Some("c".into()),
                cfg_node: NodeIndex::new(0),
                block: BlockId(0),
            }],
            cfg_node_map: HashMap::new(),
            exception_edges: vec![],
            field_interner: FieldInterner::new(),
            field_writes: HashMap::new(),
        };
        let fid = body.intern_field("mu");
        body.blocks[0].body.push(SsaInst {
            value: SsaValue(1),
            op: SsaOp::FieldProj {
                receiver: SsaValue(0),
                field: fid,
                projected_type: None,
            },
            cfg_node: NodeIndex::new(0),
            var_name: Some("c.mu".into()),
            span: (0, 0),
        });

        let json = serde_json::to_string(&body).expect("serialize body");
        let restored: SsaBody = serde_json::from_str(&json).expect("deserialize body");

        let inst = &restored.blocks[0].body[0];
        match &inst.op {
            SsaOp::FieldProj {
                receiver, field, ..
            } => {
                assert_eq!(*receiver, SsaValue(0));
                assert_eq!(restored.field_name(*field), "mu");
            }
            other => panic!("expected FieldProj, got {other:?}"),
        }
    }
}

1374
src/ssa/lower.rs
File diff suppressed because it is too large

@@ -21,6 +21,7 @@ pub use lower::lower_to_ssa_scoped_nop;
pub use lower::lower_to_ssa_with_params;

use crate::cfg::Cfg;
use crate::ssa::type_facts::TypeKind;
use crate::symbol::Lang;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

@@ -51,6 +52,19 @@ pub struct OptimizeResult {
///
/// Pipeline: const propagation → branch pruning → copy propagation → DCE → type facts.
pub fn optimize_ssa(body: &mut SsaBody, cfg: &Cfg, lang: Option<Lang>) -> OptimizeResult {
    optimize_ssa_with_param_types(body, cfg, lang, &[])
}

/// Same as [`optimize_ssa`] but seeds [`SsaOp::Param`] values with
/// per-position [`TypeKind`] facts derived from the function's
/// `BodyMeta.param_types`. Strictly additive: an empty slice or
/// `None` entries leave the type-fact analysis behaviour unchanged.
pub fn optimize_ssa_with_param_types(
    body: &mut SsaBody,
    cfg: &Cfg,
    lang: Option<Lang>,
    param_types: &[Option<TypeKind>],
) -> OptimizeResult {
    // 1. Constant propagation (SCCP)
    let cp = const_prop::const_propagate(body);
    let branches_pruned = const_prop::apply_const_prop(body, &cp);

@@ -65,7 +79,8 @@ pub fn optimize_ssa(body: &mut SsaBody, cfg: &Cfg, lang: Option<Lang>) -> Optimi
    let dead_defs_removed = dce::eliminate_dead_defs(body, cfg);

    // 5. Type fact analysis (uses const prop results + language for constructor inference)
    let type_facts = type_facts::analyze_types(body, cfg, &cp.values, lang);
    let type_facts =
        type_facts::analyze_types_with_param_types(body, cfg, &cp.values, lang, param_types);

    // 6. Points-to analysis (uses allocation site detection + SSA def-use)
    let points_to = heap::analyze_points_to(body, cfg, lang);
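The "strictly additive" contract of `optimize_ssa_with_param_types` can be illustrated with a small seeding function. This is a hypothetical sketch, not `analyze_types_with_param_types`: `ToyType`, `Param`, and `seed_param_types` are made-up names, and the sketch models only the initial fact map, not the propagation that follows it.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `TypeKind`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToyType {
    Int,
    Str,
}

/// Hypothetical parameter op: SSA value `value` is formal parameter `index`.
pub struct Param {
    pub value: u32,
    pub index: usize,
}

/// Seed initial type facts for parameters from a per-position slice.
/// Positions past the end of the slice and `None` entries add nothing,
/// so passing `&[]` yields the same empty starting map as no seeding.
pub fn seed_param_types(params: &[Param], param_types: &[Option<ToyType>]) -> HashMap<u32, ToyType> {
    let mut facts = HashMap::new();
    for p in params {
        if let Some(Some(ty)) = param_types.get(p.index) {
            facts.insert(p.value, *ty);
        }
    }
    facts
}
```

Routing `optimize_ssa` through the seeded variant with `&[]`, as the diff does, keeps a single pipeline while leaving the old behaviour as the degenerate case.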

@@ -415,6 +415,8 @@ mod tests {
            value_defs,
            cfg_node_map: HashMap::new(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        }
    }

@@ -606,6 +608,7 @@ mod tests {
                0,
                SsaOp::Call {
                    callee: "list".to_string(),
                    callee_text: None,
                    args: vec![],
                    receiver: None,
                },

@@ -4,6 +4,7 @@
//! across all supported languages so that taint flows correctly through
//! collection operations.

use crate::labels::bare_method_name;
use crate::symbol::Lang;
use smallvec::SmallVec;

@@ -29,6 +30,14 @@ pub enum ContainerOp {
    /// `index_arg`: same semantics as `Store::index_arg` — when present and
    /// provably constant, loads from `HeapSlot::Index(n)`.
    Load { index_arg: Option<usize> },
    /// Taint flows from the receiver container into the argument at
    /// `dest_arg` — i.e. the "writeback" pattern where a method writes its
    /// decoded/loaded value into a caller-supplied destination rather than
    /// returning it. Used for the Go `*.Decode(&dest)` family
    /// (`json.Decoder.Decode`, `xml.Decoder.Decode`, `gob.Decoder.Decode`),
    /// where `r.Body → json.NewDecoder(r.Body).Decode(&dest)` should taint
    /// `dest` even though `Decode` returns only an `error`.
    Writeback { dest_arg: usize },
}

/// Convenience: store with a single value argument, no index tracking.
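A sketch of the transfer that the `Writeback` variant describes, under stated assumptions: the `ContainerOp` here is a trimmed hypothetical copy, values are plain `u32`s, and the destination argument's values stand in for the pointee that `&dest` refers to. `apply_container_op` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Trimmed hypothetical copy of the container-op shapes above.
pub enum ContainerOp {
    Load,
    Writeback { dest_arg: usize },
}

/// Returns whether the call's *result* is tainted. A writeback instead
/// pushes the receiver's taint into the destination argument's values.
pub fn apply_container_op(
    op: &ContainerOp,
    receiver: u32,
    args: &[Vec<u32>],
    tainted: &mut HashSet<u32>,
) -> bool {
    let receiver_tainted = tainted.contains(&receiver);
    match op {
        ContainerOp::Load => receiver_tainted,
        ContainerOp::Writeback { dest_arg } => {
            if receiver_tainted {
                if let Some(dest) = args.get(*dest_arg) {
                    tainted.extend(dest.iter().copied());
                }
            }
            // `Decode` returns only an `error`, so the result stays clean.
            false
        }
    }
}
```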

@@ -92,7 +101,7 @@ fn load_indexed(idx_pos: usize) -> Option<ContainerOp> {
/// Returns `None` if the callee is not a recognised container operation.
pub fn classify_container_op(callee: &str, lang: Lang) -> Option<ContainerOp> {
    // Extract method name: last segment after '.' (or full name if no dot).
    let method = callee.rsplit('.').next().unwrap_or(callee);
    let method = bare_method_name(callee);

    match lang {
        Lang::JavaScript | Lang::TypeScript => classify_js(method),

@@ -121,6 +130,10 @@ fn classify_js(method: &str) -> Option<ContainerOp> {
        // map.get(key) — key at 0
        "get" => load_indexed(0),
        "values" | "keys" | "entries" => load(),
        // Pointer-Phase 6 / W5: synthetic callees emitted by CFG
        // lowering for subscript reads/writes (`arr[i]`, `arr[i] = v`).
        "__index_get__" => load_indexed(0),
        "__index_set__" => store_indexed(1, 0),
        _ => None,
    }
}
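The synthetic subscript arms above imply a calling convention for `__index_set__`: `store_indexed(1, 0)` puts the index at argument 0 and the stored value at argument 1, the same mapping the C++ `insert_or_assign` test later in this diff pins as `value_args == [1]`, `index_arg == Some(0)`. The sketch mirrors just those two arms with a hypothetical `classify_subscript` and shows which argument a store actually writes; `stored_values` is also hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum ContainerOp {
    Store { value_args: Vec<usize>, index_arg: Option<usize> },
    Load { index_arg: Option<usize> },
}

/// Hypothetical mirror of the two synthetic arms.
pub fn classify_subscript(callee: &str) -> Option<ContainerOp> {
    match callee {
        // `arr[i]` lowered as `__index_get__(i)`: index at arg 0.
        "__index_get__" => Some(ContainerOp::Load { index_arg: Some(0) }),
        // `arr[i] = v` lowered as `__index_set__(i, v)`: index 0, value 1.
        "__index_set__" => Some(ContainerOp::Store { value_args: vec![1], index_arg: Some(0) }),
        _ => None,
    }
}

/// The call arguments a store writes into the container.
pub fn stored_values<'a>(op: &ContainerOp, args: &[&'a str]) -> Vec<&'a str> {
    match op {
        ContainerOp::Store { value_args, .. } => {
            value_args.iter().filter_map(|&i| args.get(i).copied()).collect()
        }
        ContainerOp::Load { .. } => Vec::new(),
    }
}
```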

@@ -140,6 +153,10 @@ fn classify_python(method: &str) -> Option<ContainerOp> {
        "get" => load_indexed(0), // dict.get(key) / list index — key/index at 0
        "items" | "values" | "keys" => load(),
        "join" => load(),
        // Pointer-Phase 6 / W5: synthetic callees emitted by CFG
        // lowering for subscript reads/writes (`arr[i]`, `arr[i] = v`).
        "__index_get__" => load_indexed(0),
        "__index_set__" => store_indexed(1, 0),
        _ => None,
    }
}

@@ -173,6 +190,24 @@ fn classify_go(method: &str, callee: &str) -> Option<ContainerOp> {
    match method {
        "Add" | "Set" | "Store" | "Put" => store(0),
        "Get" | "Load" | "Pop" => load(),
        // Stream-decoder writeback. In Go, the canonical decode pattern
        // takes a destination as the sole positional argument and returns
        // only an `error`:
        //     decoder := json.NewDecoder(r.Body)
        //     decoder.Decode(&dest)
        // The decoder's receiver chain carries the source taint
        // (`r.Body` → `json.NewDecoder(r.Body)` → `decoder`); without a
        // writeback rule, the destination stays clean and downstream sinks
        // miss the flow. `Unmarshal` is the matching sibling pattern on
        // top-level decoders (e.g. `proto.Unmarshal(buf, &msg)`); the
        // method-call form has the bytes carried via the receiver, not arg 0,
        // so it lines up with the writeback contract just like `Decode`.
        "Decode" | "Unmarshal" => Some(ContainerOp::Writeback { dest_arg: 0 }),
        // Pointer-Phase 6 / W5: synthetic callees emitted by CFG
        // lowering for Go index_expression reads/writes (`arr[i]`,
        // `m[k] = v`).
        "__index_get__" => load_indexed(0),
        "__index_set__" => store_indexed(1, 0),
        _ => None,
    }
}
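The Go writeback rule is gated on both language and method name. The hypothetical, simplified mirror below (`classify_writeback`, a three-variant `Lang`, and a `.`-only `method_name` in place of `bare_method_name`) makes that gating explicit. Matching is case-sensitive and ignores the receiver's type, so any Go method named `Decode` or `Unmarshal` receives the rule.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Lang {
    Go,
    JavaScript,
    Python,
}

#[derive(Debug, PartialEq)]
pub enum ContainerOp {
    Writeback { dest_arg: usize },
}

/// Hypothetical simplification: last `.`-separated segment.
fn method_name(callee: &str) -> &str {
    callee.rsplit('.').next().unwrap_or(callee)
}

/// Writeback applies only to Go, keyed on the bare method name.
pub fn classify_writeback(callee: &str, lang: Lang) -> Option<ContainerOp> {
    match (lang, method_name(callee)) {
        (Lang::Go, "Decode" | "Unmarshal") => Some(ContainerOp::Writeback { dest_arg: 0 }),
        _ => None,
    }
}
```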

@@ -195,9 +230,22 @@ fn classify_php(method: &str) -> Option<ContainerOp> {

fn classify_cpp(method: &str) -> Option<ContainerOp> {
    match method {
        "push_back" | "emplace_back" | "insert" | "emplace" | "push" => store(0),
        "front" | "back" | "pop_back" | "pop_front" | "top" => load(),
        // vector.at(index) — index at 0
        // Mutating container operations.
        // `assign` overwrites the container's contents with the argument
        // sequence — modeled as Store so the receiver inherits the argument
        // taint, matching the runtime "the values now live inside this
        // container" semantics shared with `push_back`/`emplace_back`.
        "push_back" | "emplace_back" | "insert" | "emplace" | "push" | "assign" => store(0),
        // Map/unordered_map insertion: `m.insert_or_assign(k, v)` — value at 1.
        "insert_or_assign" => store_indexed(1, 0),
        // Read-only container observers. `find`/`count` return iterators or
        // counts that carry the container's value taint when queried with a
        // tainted needle; `data` returns a pointer to the underlying buffer
        // (its real identity-passthrough behaviour for `c_str`/`data` is
        // refined in the labels phase, but Load propagation gives us the
        // baseline cap-flow without further plumbing).
        "front" | "back" | "pop_back" | "pop_front" | "top" | "find" | "count" | "data" => load(),
        // Indexed reads: `vector::at(i)`, `unordered_map::at(k)`.
        "at" => load_indexed(0),
        _ => None,
    }
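To show what the new C++ arms mean for taint, the sketch pairs a trimmed copy of the classification with a toy propagation rule: a `Store` taints the receiver when any value argument is tainted, and a `Load` returns the receiver's taint. The local `classify_cpp`, `propagate`, and the boolean taint model are hypothetical simplifications; index tracking is ignored.

```rust
#[derive(Debug, PartialEq)]
pub enum ContainerOp {
    Store { value_args: Vec<usize>, index_arg: Option<usize> },
    Load { index_arg: Option<usize> },
}

/// Hypothetical trimmed copy of the new C++ arms.
pub fn classify_cpp(method: &str) -> Option<ContainerOp> {
    match method {
        "push_back" | "emplace_back" | "insert" | "emplace" | "push" | "assign" => {
            Some(ContainerOp::Store { value_args: vec![0], index_arg: None })
        }
        "insert_or_assign" => Some(ContainerOp::Store { value_args: vec![1], index_arg: Some(0) }),
        "front" | "back" | "pop_back" | "pop_front" | "top" | "find" | "count" | "data" => {
            Some(ContainerOp::Load { index_arg: None })
        }
        "at" => Some(ContainerOp::Load { index_arg: Some(0) }),
        _ => None,
    }
}

/// Toy transfer: returns whether the call result is tainted and may
/// mark the receiver container as tainted.
pub fn propagate(op: &ContainerOp, receiver_tainted: &mut bool, args_tainted: &[bool]) -> bool {
    match op {
        ContainerOp::Store { value_args, .. } => {
            // Only value positions matter; a tainted key alone does not taint the map.
            if value_args.iter().any(|&i| args_tainted.get(i).copied().unwrap_or(false)) {
                *receiver_tainted = true;
            }
            false
        }
        ContainerOp::Load { .. } => *receiver_tainted,
    }
}
```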

@@ -255,6 +303,40 @@ mod tests {
        assert!(matches!(op, Some(ContainerOp::Store { .. })));
    }

    // CVE Hunt Session 2 (Owncast CVE-2023-3188 / CVE-2024-31450 family):
    // Go `*.Decode(&dest)` is the canonical streaming-decoder writeback —
    // `json.NewDecoder(r.Body).Decode(&dest)`, `xml.NewDecoder(r).Decode(&out)`,
    // `gob.NewDecoder(buf).Decode(&v)`. The decoder receiver carries the
    // source taint and the destination is arg 0; the writeback rule is the
    // only way taint reaches `dest` because `Decode` itself returns only
    // `error`. The same-shape `Unmarshal` pattern (`proto.Unmarshal`,
    // `tar.Header.Unmarshal`) on a typed receiver follows the same contract.
    #[test]
    fn go_decode_is_writeback_dest_arg_zero() {
        match classify_container_op("decoder.Decode", Lang::Go) {
            Some(ContainerOp::Writeback { dest_arg }) => assert_eq!(dest_arg, 0),
            other => panic!("expected Writeback {{ dest_arg: 0 }}, got {other:?}"),
        }
    }

    #[test]
    fn go_unmarshal_is_writeback_dest_arg_zero() {
        match classify_container_op("hdr.Unmarshal", Lang::Go) {
            Some(ContainerOp::Writeback { dest_arg }) => assert_eq!(dest_arg, 0),
            other => panic!("expected Writeback {{ dest_arg: 0 }}, got {other:?}"),
        }
    }

    #[test]
    fn js_decode_is_not_writeback() {
        // The Writeback rule is a Go-specific pattern; JS/TS `decode`
        // helpers (`Buffer.from(s, 'base64').toString()` etc.) return their
        // result and don't have a writeback contract. Make sure we didn't
        // accidentally widen the rule into other languages.
        assert!(classify_container_op("decoder.Decode", Lang::JavaScript).is_none());
        assert!(classify_container_op("decoder.Decode", Lang::Python).is_none());
    }

    #[test]
    fn unknown_method_is_none() {
        assert!(classify_container_op("obj.frobnicate", Lang::JavaScript).is_none());
|
||||
|
|
@@ -311,4 +393,102 @@ mod tests {
            panic!("expected Load");
        }
    }

    // ── C++ Phase 1 additions ──────────────────────────────────────

    #[test]
    fn cpp_push_back_is_store() {
        let op = classify_container_op("v.push_back", Lang::Cpp);
        match op {
            Some(ContainerOp::Store {
                value_args,
                index_arg,
            }) => {
                assert_eq!(value_args.as_slice(), &[0]);
                assert_eq!(index_arg, None);
            }
            _ => panic!("expected Store"),
        }
    }

    #[test]
    fn cpp_assign_is_store() {
        // vector::assign(args) overwrites the container's contents — the
        // receiver inherits argument taint just like push_back.
        let op = classify_container_op("v.assign", Lang::Cpp);
        assert!(matches!(op, Some(ContainerOp::Store { .. })));
    }

    #[test]
    fn cpp_insert_or_assign_indexes_value() {
        // map::insert_or_assign(key, value) — value is at arg 1, key at arg 0.
        match classify_container_op("m.insert_or_assign", Lang::Cpp) {
            Some(ContainerOp::Store {
                value_args,
                index_arg,
            }) => {
                assert_eq!(value_args.as_slice(), &[1]);
                assert_eq!(index_arg, Some(0));
            }
            other => panic!("expected indexed Store, got {other:?}"),
        }
    }

    #[test]
    fn cpp_find_count_data_are_load() {
        for callee in ["m.find", "m.count", "v.data"] {
            assert!(
                matches!(
                    classify_container_op(callee, Lang::Cpp),
                    Some(ContainerOp::Load { .. })
                ),
                "{callee} should be a Load",
            );
        }
    }

    #[test]
    fn cpp_at_is_indexed_load() {
        match classify_container_op("v.at", Lang::Cpp) {
            Some(ContainerOp::Load { index_arg }) => assert_eq!(index_arg, Some(0)),
            other => panic!("expected indexed Load, got {other:?}"),
        }
    }

    /// W5: synthetic `__index_get__` is recognised as an indexed load
    /// in JS/TS, Python, and Go — driving the index_arg=0 path so a
    /// constant-key subscript read flows through `HeapSlot::Index(n)`.
    #[test]
    fn synth_index_get_classified_as_indexed_load_js_py_go() {
        for lang in [Lang::JavaScript, Lang::TypeScript, Lang::Python, Lang::Go] {
            match classify_container_op("__index_get__", lang) {
                Some(ContainerOp::Load { index_arg }) => {
                    assert_eq!(index_arg, Some(0), "{lang:?} should mark idx arg=0");
                }
                other => panic!("{lang:?}: expected indexed Load, got {other:?}"),
            }
        }
    }

    /// W5: synthetic `__index_set__` is recognised as an indexed store
    /// in JS/TS, Python, and Go — value at arg 1, index at arg 0.
    #[test]
    fn synth_index_set_classified_as_indexed_store_js_py_go() {
        for lang in [Lang::JavaScript, Lang::TypeScript, Lang::Python, Lang::Go] {
            match classify_container_op("__index_set__", lang) {
                Some(ContainerOp::Store {
                    value_args,
                    index_arg,
                }) => {
                    assert_eq!(
                        value_args.as_slice(),
                        &[1],
                        "{lang:?} value arg should be 1"
                    );
                    assert_eq!(index_arg, Some(0), "{lang:?} index arg should be 0");
                }
                other => panic!("{lang:?}: expected indexed Store, got {other:?}"),
            }
        }
    }
}
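The container-op tests above pin down a small contract: Go `Decode`/`Unmarshal` write back into argument 0, C++ `push_back` and `insert_or_assign` are stores (the latter keyed by argument 0), `at` is an indexed load, and the synthetic `__index_get__`/`__index_set__` callees are indexed ops only in JS/TS, Python, and Go. The following is a minimal sketch of that contract. It assumes a reduced `Lang` enum, a `ContainerOp` enum shaped after the patterns the tests match on, and a hypothetical `classify` function that keys on the bare method name. It is not nyx's `classify_container_op`, whose lookup tables are not shown in this diff.

```rust
/// Hypothetical, trimmed-down language tag (not nyx's `Lang`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lang {
    Go,
    Cpp,
    JavaScript,
    TypeScript,
    Python,
}

/// Hypothetical mirror of the container-op shapes the tests match on.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ContainerOp {
    /// Reads an element; `index_arg` is the argument holding the key/index.
    Load { index_arg: Option<usize> },
    /// Writes `value_args` into the receiver, optionally at `index_arg`.
    Store { value_args: Vec<usize>, index_arg: Option<usize> },
    /// The call writes its result into argument `dest_arg` (Go decoders).
    Writeback { dest_arg: usize },
}

/// Drop the receiver: `"m.insert_or_assign"` -> `"insert_or_assign"`.
/// A callee with no dot is returned unchanged.
fn bare_method(callee: &str) -> &str {
    callee.rsplit('.').next().unwrap_or(callee)
}

pub fn classify(callee: &str, lang: Lang) -> Option<ContainerOp> {
    // Languages whose lowering emits synthetic subscript callees.
    let has_synthetic_subscripts = matches!(
        lang,
        Lang::JavaScript | Lang::TypeScript | Lang::Python | Lang::Go
    );
    match (lang, bare_method(callee)) {
        // `x[k]` read: index at arg 0.
        (_, "__index_get__") if has_synthetic_subscripts => {
            Some(ContainerOp::Load { index_arg: Some(0) })
        }
        // `x[k] = v` write: index at arg 0, value at arg 1.
        (_, "__index_set__") if has_synthetic_subscripts => Some(ContainerOp::Store {
            value_args: vec![1],
            index_arg: Some(0),
        }),
        // Go decoders return only `error`; taint reaches arg 0 by writeback.
        (Lang::Go, "Decode" | "Unmarshal") => Some(ContainerOp::Writeback { dest_arg: 0 }),
        (Lang::Cpp, "push_back") => Some(ContainerOp::Store {
            value_args: vec![0],
            index_arg: None,
        }),
        // insert_or_assign(key, value): value at arg 1, key at arg 0.
        (Lang::Cpp, "insert_or_assign") => Some(ContainerOp::Store {
            value_args: vec![1],
            index_arg: Some(0),
        }),
        (Lang::Cpp, "at") => Some(ContainerOp::Load { index_arg: Some(0) }),
        (Lang::Cpp, "find" | "count" | "data") => Some(ContainerOp::Load { index_arg: None }),
        _ => None,
    }
}
```

Matching on the bare method name is what lets `decoder.Decode` and `hdr.Unmarshal` classify the same way regardless of the receiver's variable name, while gating on `Lang` keeps the Go-only writeback rule from leaking into JavaScript or Python.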
@@ -256,6 +256,7 @@ pub fn analyze(
                callee,
                args,
                receiver,
                ..
            } => {
                if candidates.contains_key(&inst.value) && is_rust_map_constructor(callee) {
                    continue;
@@ -437,6 +438,8 @@ mod tests {
            value_defs: vec![],
            cfg_node_map: std::collections::HashMap::new(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        };
        let cfg: Cfg = Graph::new();
        let const_values = HashMap::new();
@@ -1,6 +1,6 @@
#![allow(clippy::if_same_then_else)]

use std::collections::HashMap;
use std::collections::{BTreeMap, HashMap};

use super::const_prop::ConstLattice;
use super::ir::*;
@@ -32,6 +32,40 @@ pub enum TypeKind {
    /// `label_prefix` — it never participates in label-based callee
    /// resolution.
    LocalCollection,
    /// Phase 6: a framework-injected DTO body whose field types are
    /// known. Populated only when a parameter is recognised as a typed
    /// extractor by a Phase 1-2 matcher AND the DTO class / struct /
    /// Pydantic model is resolvable in the current scan scope.
    /// Strictly additive — when no DTO definition is found, callers
    /// fall through to today's pre-Phase-6 behaviour.
    Dto(DtoFields),
}

/// Phase 6: structural carrier for a recognised DTO type. Maps
/// declared field names to their inferred [`TypeKind`]. Nested DTOs
/// use [`TypeKind::Dto`] recursively.
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct DtoFields {
    pub class_name: String,
    /// Sorted-by-key map for stable iteration / serialisation.
    pub fields: BTreeMap<String, TypeKind>,
}

impl DtoFields {
    pub fn new(class_name: impl Into<String>) -> Self {
        Self {
            class_name: class_name.into(),
            fields: BTreeMap::new(),
        }
    }

    pub fn insert(&mut self, field: impl Into<String>, kind: TypeKind) {
        self.fields.insert(field.into(), kind);
    }

    pub fn get(&self, field: &str) -> Option<&TypeKind> {
        self.fields.get(field)
    }
}

impl TypeKind {
@@ -47,6 +81,38 @@ impl TypeKind {
            _ => None,
        }
    }

    /// Container name used by the typed call-graph devirtualisation
    /// (`docs/typed-call-graph-prompt.md`, Phase 2).
    ///
    /// Returns the class / impl / module string under which an SSA
    /// receiver value of this type would be looked up in
    /// [`crate::callgraph::ClassMethodIndex`]. Mirrors
    /// [`Self::label_prefix`] for the security-relevant abstract
    /// types (HttpClient → `"HttpClient"`, DatabaseConnection →
    /// `"DatabaseConnection"`, etc.) and additionally returns the DTO
    /// class name for [`TypeKind::Dto`] receivers.
    ///
    /// Scalar / unknown types return `None` — they have no defining
    /// container and would not narrow a method-call edge meaningfully.
    pub fn container_name(&self) -> Option<String> {
        if let Some(prefix) = self.label_prefix() {
            return Some(prefix.to_string());
        }
        if let Self::Dto(d) = self {
            return Some(d.class_name.clone());
        }
        None
    }

    /// Phase 6: convenience accessor for the inner `DtoFields` if this
    /// type is a recognised DTO.
    pub fn as_dto(&self) -> Option<&DtoFields> {
        match self {
            Self::Dto(d) => Some(d),
            _ => None,
        }
    }
}

/// A type fact about an SSA value.
@@ -79,6 +145,13 @@ impl TypeFact {
        };
        TypeFact { kind, nullable }
    }

    /// Phase 6: factory used by the field-access propagation rule.
    pub(crate) fn from_dto_field(receiver: &TypeKind, field: &str) -> Option<Self> {
        let dto = receiver.as_dto()?;
        let kind = dto.get(field)?.clone();
        Some(Self::from_kind(kind))
    }
}

/// Result of type fact analysis.
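`DtoFields` together with `TypeFact::from_dto_field` is a keyed lookup that succeeds only when the receiver is a DTO and the field was recorded. The sketch below reproduces that shape with a cut-down `TypeKind` (just `Int`, `String`, `Unknown`, and `Dto`). It also adds a hypothetical `resolve_path` helper, not part of this diff, showing how a caller could follow nested DTOs such as `body.address.zip` one `as_dto()` step at a time.

```rust
use std::collections::BTreeMap;

/// Cut-down stand-in for `TypeKind`; only the variants the sketch needs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum TypeKind {
    Int,
    String,
    Unknown,
    Dto(DtoFields),
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DtoFields {
    pub class_name: String,
    /// BTreeMap keeps iteration order stable.
    pub fields: BTreeMap<String, TypeKind>,
}

impl DtoFields {
    pub fn new(class_name: impl Into<String>) -> Self {
        Self { class_name: class_name.into(), fields: BTreeMap::new() }
    }

    pub fn insert(&mut self, field: impl Into<String>, kind: TypeKind) {
        self.fields.insert(field.into(), kind);
    }
}

impl TypeKind {
    pub fn as_dto(&self) -> Option<&DtoFields> {
        match self {
            TypeKind::Dto(d) => Some(d),
            _ => None,
        }
    }
}

/// Same contract as `from_dto_field`: `None` for a non-DTO receiver
/// and for a field the DTO does not declare.
pub fn field_kind(receiver: &TypeKind, field: &str) -> Option<TypeKind> {
    receiver.as_dto()?.fields.get(field).cloned()
}

/// Hypothetical helper: follow a dotted path through nested DTOs,
/// giving up (`None`) as soon as one step is not a DTO field.
pub fn resolve_path(receiver: &TypeKind, path: &[&str]) -> Option<TypeKind> {
    let mut current = receiver.clone();
    for field in path {
        current = field_kind(&current, field)?;
    }
    Some(current)
}
```

Returning `None` at the first unresolved step is the same safe fallback the Phase 6 comments describe: callers drop back to plain copy-propagation instead of guessing a field type.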
@@ -107,32 +180,41 @@ impl TypeFactResult {
    }
}

/// Check whether the given sink-operand SSA values are all int-typed for the
/// sink's capability set. Returns `false` when `sink_caps` carries no
/// type-suppressible bits, when `values` is empty, or when any value is not
/// known to be `TypeKind::Int`. Shared by the SSA taint engine and the
/// structural `cfg-unguarded-sink` analysis so both agree on when a sink's
/// arguments are provably non-injectable.
/// Check whether the given sink-operand SSA values are all type-safe for
/// the sink's capability set. Returns `false` when `sink_caps` carries
/// no type-suppressible bits, when `values` is empty, or when any value
/// is not known to be a payload-incompatible scalar type. Shared by
/// the SSA taint engine and the structural `cfg-unguarded-sink`
/// analysis so both agree on when a sink's arguments are provably
/// non-injectable.
///
/// Suppression policy:
/// * [`TypeKind::Int`] (and float, treated as numeric): suppresses
///   `SQL_QUERY`, `FILE_IO`, `SHELL_ESCAPE`, `HTML_ESCAPE`, `SSRF` —
///   numeric values cannot carry the metacharacters required to drive
///   any of these injection classes.
/// * [`TypeKind::Bool`]: suppresses every type-suppressible bit —
///   `true`/`false` cannot carry a payload of any kind.
pub fn is_type_safe_for_sink(
    values: &[SsaValue],
    sink_caps: crate::labels::Cap,
    type_facts: &TypeFactResult,
) -> bool {
    use crate::labels::Cap;
    // Int-typed values cannot carry injection payloads for these caps:
    //   SQL_QUERY    — digits can't form meta SQL tokens
    //   FILE_IO      — digits can't form path traversal sequences
    //   SHELL_ESCAPE — digits can't form shell metacharacters
    //   HTML_ESCAPE  — digits can't form HTML metachars (<, >, ", ', &, /, :)
    //                  in either text or attribute context
    let type_suppressible = Cap::SQL_QUERY | Cap::FILE_IO | Cap::SHELL_ESCAPE | Cap::HTML_ESCAPE;
    let type_suppressible =
        Cap::SQL_QUERY | Cap::FILE_IO | Cap::SHELL_ESCAPE | Cap::HTML_ESCAPE | Cap::SSRF;
    if !sink_caps.intersects(type_suppressible) {
        return false;
    }
    if values.is_empty() {
        return false;
    }
    values.iter().all(|v| type_facts.is_int(*v))
    values.iter().all(|v| {
        let Some(kind) = type_facts.get_type(*v) else {
            return false;
        };
        matches!(kind, TypeKind::Int | TypeKind::Bool)
    })
}

/// Infer a type from a constructor, factory, or allocator call.
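The new suppression rule reduces to three checks: the sink's capability mask must intersect the type-suppressible set (now including `SSRF`), the operand list must be non-empty, and every operand must be known to be `Int` or `Bool`, with a missing fact counting as unsafe. A sketch under stated assumptions: plain `u32` bit constants with arbitrary, hypothetical values stand in for `crate::labels::Cap`, and a `HashMap<u32, Kind>` stands in for `TypeFactResult`.

```rust
use std::collections::HashMap;

// Hypothetical capability bits standing in for `crate::labels::Cap`.
pub const SQL_QUERY: u32 = 1 << 0;
pub const FILE_IO: u32 = 1 << 1;
pub const SHELL_ESCAPE: u32 = 1 << 2;
pub const HTML_ESCAPE: u32 = 1 << 3;
pub const SSRF: u32 = 1 << 4;
pub const CODE_EXEC: u32 = 1 << 5;
pub const DESERIALIZE: u32 = 1 << 6;

/// Caps whose payloads need characters no number or boolean can spell.
const TYPE_SUPPRESSIBLE: u32 = SQL_QUERY | FILE_IO | SHELL_ESCAPE | HTML_ESCAPE | SSRF;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Int,
    Bool,
    String,
    Unknown,
}

/// `values` are SSA value ids; `facts` maps each id to its inferred kind.
pub fn is_type_safe_for_sink(values: &[u32], sink_caps: u32, facts: &HashMap<u32, Kind>) -> bool {
    // No type-suppressible bit on the sink: operand types cannot help.
    if (sink_caps & TYPE_SUPPRESSIBLE) == 0 {
        return false;
    }
    // An empty operand list proves nothing.
    if values.is_empty() {
        return false;
    }
    // Every operand must be a payload-incompatible scalar; a value
    // with no recorded fact is treated as unsafe.
    values
        .iter()
        .all(|v| matches!(facts.get(v), Some(Kind::Int) | Some(Kind::Bool)))
}
```

Requiring `all` operands to qualify is what keeps a mixed `Int`/`String` argument list from being suppressed, and the early mask check is why `CODE_EXEC` or `DESERIALIZE` sinks are never silenced by type facts alone.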
@@ -393,6 +475,21 @@ pub fn analyze_types(
    cfg: &Cfg,
    consts: &HashMap<SsaValue, ConstLattice>,
    lang: Option<Lang>,
) -> TypeFactResult {
    analyze_types_with_param_types(body, cfg, consts, lang, &[])
}

/// Same as [`analyze_types`] but seeds [`SsaOp::Param`] values with
/// per-position [`TypeKind`] facts from `param_types` (parallel-vec to
/// the function's BodyMeta.params). An entry of `None` (or an out-of-
/// range index) leaves the value at the default Param fact (Unknown),
/// preserving the pre-Phase-3 behaviour.
pub fn analyze_types_with_param_types(
    body: &SsaBody,
    cfg: &Cfg,
    consts: &HashMap<SsaValue, ConstLattice>,
    lang: Option<Lang>,
    param_types: &[Option<TypeKind>],
) -> TypeFactResult {
    let mut facts: HashMap<SsaValue, TypeFact> = HashMap::new();

@@ -424,7 +521,16 @@ pub fn analyze_types(
                }
            }
            SsaOp::Source => TypeFact::from_kind(TypeKind::String),
            SsaOp::Param { .. } => TypeFact::unknown(),
            SsaOp::Param { index } => {
                // Seed from the function's BodyMeta.param_types when
                // a TypeKind was recovered at CFG construction time.
                // Out-of-range / None entries fall back to Unknown,
                // matching the pre-Phase-3 behaviour.
                match param_types.get(*index).and_then(|t| t.clone()) {
                    Some(tk) => TypeFact::from_kind(tk),
                    None => TypeFact::unknown(),
                }
            }
            SsaOp::SelfParam => TypeFact::from_kind(TypeKind::Object),
            SsaOp::CatchParam => TypeFact::from_kind(TypeKind::Object),
            SsaOp::Call { callee, .. } => {
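The `Param` arm seeds each parameter from a parallel vector and falls back to `Unknown` when the slot is out of range or holds `None`; chaining `slice::get` with `Option::and_then` covers both cases in one expression. A stand-alone sketch of just that lookup, using a hypothetical three-variant `Kind` enum in place of `TypeKind`:

```rust
/// Hypothetical stand-in for `TypeKind`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Kind {
    Int,
    String,
    Unknown,
}

/// Seed a parameter's type from `param_types`, indexed by position.
/// `get` yields `None` for an out-of-range index, and `and_then`
/// flattens a present-but-`None` slot, so both fall back to `Unknown`.
pub fn seed_param(index: usize, param_types: &[Option<Kind>]) -> Kind {
    param_types
        .get(index)
        .and_then(|t| t.clone())
        .unwrap_or(Kind::Unknown)
}
```

Passing an empty slice, as the `analyze_types` wrapper does, therefore reproduces the old all-`Unknown` behaviour without a separate code path.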
@@ -473,6 +579,14 @@ pub fn analyze_types(
                // Defer: will be filled in second pass
                TypeFact::unknown()
            }
            // FieldProj: when the projection carries an inferred type
            // (set during lowering or by future field-type analysis),
            // honour it; otherwise the field type is unknown until a
            // points-to / heap query resolves it.
            SsaOp::FieldProj { projected_type, .. } => match projected_type {
                Some(tk) => TypeFact::from_kind(tk.clone()),
                None => TypeFact::unknown(),
            },
            // Undef contributes no type information — phi joins
            // pick up the type from the other (defined) operand.
            SsaOp::Undef => TypeFact::unknown(),
@@ -530,6 +644,38 @@ pub fn analyze_types(
            }
        }

        // Phase 6.3: FieldProj receiver-driven type narrowing. When
        // SSA lowering decomposed `a.b.c()` into a FieldProj chain,
        // intermediate FieldProj insts default to `projected_type =
        // None`. If the receiver value carries a Dto fact and the
        // projected field name is in its `fields` map, route the
        // FieldProj's type fact to the field's declared TypeKind.
        for inst in &block.body {
            let SsaOp::FieldProj {
                receiver,
                field,
                projected_type,
            } = &inst.op
            else {
                continue;
            };
            // If the lowering already pinned a type, keep it.
            if projected_type.is_some() {
                continue;
            }
            let Some(recv_fact) = facts.get(receiver).cloned() else {
                continue;
            };
            let field_name = body.field_name(*field).to_string();
            let Some(new_fact) = TypeFact::from_dto_field(&recv_fact.kind, &field_name) else {
                continue;
            };
            if facts.get(&inst.value) != Some(&new_fact) {
                facts.insert(inst.value, new_fact);
                changed = true;
            }
        }

        // Phi nodes
        for inst in &block.phis {
            if let SsaOp::Phi(operands) = &inst.op {
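The FieldProj narrowing runs inside the analysis's fixed-point loop: each projection without a pinned `projected_type` looks up its receiver's DTO fact, and any change sets `changed` so another round runs. That is what lets a chain like `a.b.c` resolve even when the outer projection is visited before the inner one has a fact. The sketch below iterates the rule to convergence over a flat list. The `Proj` struct, the string field names (instead of interned field ids), and the two-variant `Kind` are hypothetical simplifications, not nyx's `SsaOp::FieldProj`.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for `TypeKind`: a scalar or a DTO field map.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Kind {
    Int,
    Dto(BTreeMap<String, Kind>),
}

/// `value = receiver.field`, with an optional type pinned at lowering time.
pub struct Proj {
    pub value: u32,
    pub receiver: u32,
    pub field: &'static str,
    pub projected: Option<Kind>,
}

/// Apply the narrowing rule until no fact changes; returns the number
/// of rounds, including the final round that confirms convergence.
pub fn narrow(projs: &[Proj], facts: &mut HashMap<u32, Kind>) -> usize {
    let mut rounds = 0;
    loop {
        rounds += 1;
        let mut changed = false;
        for p in projs {
            // Lowering already pinned a type: leave it alone.
            if p.projected.is_some() {
                continue;
            }
            // Receiver must currently carry a DTO fact.
            let Some(Kind::Dto(fields)) = facts.get(&p.receiver) else {
                continue;
            };
            // ...and the DTO must declare the projected field.
            let Some(new_kind) = fields.get(p.field).cloned() else {
                continue;
            };
            if facts.get(&p.value) != Some(&new_kind) {
                facts.insert(p.value, new_kind);
                changed = true;
            }
        }
        if !changed {
            return rounds;
        }
    }
}
```

Visiting an outer projection before its inner one costs an extra round rather than a missed fact. The pinned-type `continue` mirrors the `projected_type.is_some()` check; in the real pass the pinned type has already been recorded by the first pass.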
@@ -566,13 +712,29 @@ pub fn analyze_types(
                }
                if let SsaOp::Assign(uses) = &inst.op {
                    if uses.len() == 1 {
                        let src_fact = facts
                            .get(&uses[0])
                            .cloned()
                            .unwrap_or_else(TypeFact::unknown);
                        // Phase 6.3: when the RHS is a single member-access
                        // expression and the receiver value carries a
                        // `TypeKind::Dto(fields)` fact, route the assignment's
                        // type to the field's declared `TypeKind`. Strictly
                        // additive — falls through to copy-prop when the
                        // receiver isn't a DTO or the field isn't recorded.
                        let dto_field_fact = cfg
                            .node_weight(inst.cfg_node)
                            .and_then(|ni| ni.member_field.as_deref())
                            .and_then(|field| {
                                let recv_kind = facts.get(&uses[0])?.kind.clone();
                                TypeFact::from_dto_field(&recv_kind, field)
                            });
                        let new_fact = match dto_field_fact {
                            Some(f) => f,
                            None => facts
                                .get(&uses[0])
                                .cloned()
                                .unwrap_or_else(TypeFact::unknown),
                        };
                        let old = facts.get(&inst.value);
                        if old != Some(&src_fact) {
                            facts.insert(inst.value, src_fact);
                        if old != Some(&new_fact) {
                            facts.insert(inst.value, new_fact);
                            changed = true;
                        }
                    } else if uses.len() == 2 {
@@ -840,6 +1002,8 @@ mod tests {
                .into_iter()
                .collect(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        };

        let consts = HashMap::from([
@@ -911,6 +1075,7 @@ mod tests {
                    value: SsaValue(0),
                    op: SsaOp::Call {
                        callee: "URL".into(),
                        callee_text: None,
                        args: vec![],
                        receiver: None,
                    },
@@ -922,6 +1087,7 @@ mod tests {
                    value: SsaValue(1),
                    op: SsaOp::Call {
                        callee: "HttpClient.newHttpClient".into(),
                        callee_text: None,
                        args: vec![],
                        receiver: None,
                    },
@@ -949,6 +1115,8 @@ mod tests {
            ],
            cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        };

        let consts = HashMap::new();
@@ -979,6 +1147,291 @@ mod tests {
        assert_eq!(result.get_type(SsaValue(99)), None);
    }

    /// Phase 4: Int-typed values must suppress every type-suppressible
    /// cap — including the freshly-added `SSRF` bit. Numeric IDs
    /// cannot rewrite a URL host, cannot form path traversal sequences,
    /// cannot carry SQL/HTML/shell metacharacters.
    #[test]
    fn int_suppresses_every_type_suppressible_cap() {
        use crate::labels::Cap;
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
        let result = TypeFactResult { facts };

        for cap in [
            Cap::SQL_QUERY,
            Cap::FILE_IO,
            Cap::SHELL_ESCAPE,
            Cap::HTML_ESCAPE,
            Cap::SSRF,
        ] {
            assert!(
                is_type_safe_for_sink(&[SsaValue(0)], cap, &result),
                "Int must suppress {cap:?}",
            );
        }
        // Caps outside the type-suppressible set never qualify.
        assert!(!is_type_safe_for_sink(
            &[SsaValue(0)],
            Cap::CODE_EXEC,
            &result
        ));
        assert!(!is_type_safe_for_sink(
            &[SsaValue(0)],
            Cap::DESERIALIZE,
            &result
        ));
    }

    /// Phase 4: Bool-typed values are even safer than ints — `true` /
    /// `false` cannot carry any payload and must suppress every
    /// type-suppressible cap.
    #[test]
    fn bool_suppresses_every_type_suppressible_cap() {
        use crate::labels::Cap;
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Bool));
        let result = TypeFactResult { facts };

        for cap in [
            Cap::SQL_QUERY,
            Cap::FILE_IO,
            Cap::SHELL_ESCAPE,
            Cap::HTML_ESCAPE,
            Cap::SSRF,
        ] {
            assert!(
                is_type_safe_for_sink(&[SsaValue(0)], cap, &result),
                "Bool must suppress {cap:?}",
            );
        }
    }

    /// String-typed values must NOT trigger suppression — they are the
    /// canonical injection carrier. Regression guard so a future
    /// change to `is_type_safe_for_sink` does not silently silence
    /// real String-payload findings.
    #[test]
    fn string_does_not_trigger_sink_suppression() {
        use crate::labels::Cap;
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::String));
        let result = TypeFactResult { facts };
        assert!(!is_type_safe_for_sink(
            &[SsaValue(0)],
            Cap::SQL_QUERY,
            &result
        ));
        assert!(!is_type_safe_for_sink(&[SsaValue(0)], Cap::SSRF, &result));
        assert!(!is_type_safe_for_sink(
            &[SsaValue(0)],
            Cap::SHELL_ESCAPE,
            &result
        ));
    }

    /// Audit A3: The full `(TypeKind, Cap)` suppression matrix. Encoded
    /// as a single table-driven test so any future change to
    /// `is_type_safe_for_sink` requires an intentional matrix edit + a
    /// test update. Truth values:
    ///
    /// | TypeKind  | SQL | FILE | SHELL | HTML | SSRF | CODE_EXEC | DESERIALIZE |
    /// |-----------|-----|------|-------|------|------|-----------|-------------|
    /// | Int       | Y   | Y    | Y     | Y    | Y    | N         | N           |
    /// | Bool      | Y   | Y    | Y     | Y    | Y    | N         | N           |
    /// | String    | N   | N    | N     | N    | N    | N         | N           |
    /// | Url       | N   | N    | N     | N    | N    | N         | N           |
    /// | Object    | N   | N    | N     | N    | N    | N         | N           |
    /// | Unknown   | N   | N    | N     | N    | N    | N         | N           |
    #[test]
    fn type_kind_cap_suppression_matrix() {
        use crate::labels::Cap;
        let caps = [
            ("SQL_QUERY", Cap::SQL_QUERY),
            ("FILE_IO", Cap::FILE_IO),
            ("SHELL_ESCAPE", Cap::SHELL_ESCAPE),
            ("HTML_ESCAPE", Cap::HTML_ESCAPE),
            ("SSRF", Cap::SSRF),
            ("CODE_EXEC", Cap::CODE_EXEC),
            ("DESERIALIZE", Cap::DESERIALIZE),
        ];
        // (kind_name, kind, [suppress for each cap in `caps` order])
        let rows: &[(&str, TypeKind, [bool; 7])] = &[
            (
                "Int",
                TypeKind::Int,
                [true, true, true, true, true, false, false],
            ),
            (
                "Bool",
                TypeKind::Bool,
                [true, true, true, true, true, false, false],
            ),
            (
                "String",
                TypeKind::String,
                [false, false, false, false, false, false, false],
            ),
            (
                "Url",
                TypeKind::Url,
                [false, false, false, false, false, false, false],
            ),
            (
                "Object",
                TypeKind::Object,
                [false, false, false, false, false, false, false],
            ),
            (
                "Unknown",
                TypeKind::Unknown,
                [false, false, false, false, false, false, false],
            ),
        ];
        for (kind_name, kind, expected) in rows {
            let mut facts = HashMap::new();
            facts.insert(SsaValue(0), TypeFact::from_kind(kind.clone()));
            let result = TypeFactResult { facts };
            for (i, (cap_name, cap)) in caps.iter().enumerate() {
                let got = is_type_safe_for_sink(&[SsaValue(0)], *cap, &result);
                assert_eq!(
                    got, expected[i],
                    "matrix mismatch for ({kind_name}, {cap_name}): expected {}, got {got}",
                    expected[i]
                );
            }
        }
    }

    /// Audit A3 (companion): empty `values` slice never suppresses,
    /// regardless of cap or per-value type facts.
    #[test]
    fn empty_values_never_suppress() {
        use crate::labels::Cap;
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
        let result = TypeFactResult { facts };
        for cap in [
            Cap::SQL_QUERY,
            Cap::FILE_IO,
            Cap::SHELL_ESCAPE,
            Cap::HTML_ESCAPE,
            Cap::SSRF,
            Cap::CODE_EXEC,
            Cap::DESERIALIZE,
        ] {
            assert!(
                !is_type_safe_for_sink(&[], cap, &result),
                "empty values must never suppress {cap:?}",
            );
        }
    }

    /// Audit A3 (companion): a Cap with NO type-suppressible bits never
    /// suppresses, even when the value's type kind is otherwise
    /// suppression-eligible.
    #[test]
    fn caps_without_type_suppressible_bits_never_fire() {
        use crate::labels::Cap;
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
        let result = TypeFactResult { facts };
        for cap in [
            Cap::CODE_EXEC,
            Cap::DESERIALIZE,
            Cap::CRYPTO,
            Cap::URL_ENCODE,
        ] {
            assert!(
                !is_type_safe_for_sink(&[SsaValue(0)], cap, &result),
                "Int must NOT suppress non-type-suppressible {cap:?}",
            );
        }
    }

    /// Audit A3 (companion): mixed-type operand list — only one Int
    /// among operands of unknown type — must NOT suppress. The
    /// suppression rule requires every operand to be payload-incompatible.
    #[test]
    fn mixed_type_operands_do_not_suppress() {
        use crate::labels::Cap;
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
        facts.insert(SsaValue(1), TypeFact::from_kind(TypeKind::String));
        let result = TypeFactResult { facts };
        assert!(!is_type_safe_for_sink(
            &[SsaValue(0), SsaValue(1)],
            Cap::SQL_QUERY,
            &result
        ));
    }

    /// Phase 3: Param values seeded from `param_types` must surface
    /// the right TypeKind for downstream sink suppression. An out-of-
    /// range index falls back to Unknown (the pre-Phase-3 default).
    #[test]
    fn param_types_seed_param_value_facts() {
        use crate::cfg::Cfg;
        let n0 = NodeIndex::new(0);
        let n1 = NodeIndex::new(1);
        let body = SsaBody {
            blocks: vec![SsaBlock {
                id: BlockId(0),
                phis: vec![],
                body: vec![
                    SsaInst {
                        value: SsaValue(0),
                        op: SsaOp::Param { index: 0 },
                        cfg_node: n0,
                        var_name: Some("user_id".into()),
                        span: (0, 7),
                    },
                    SsaInst {
                        value: SsaValue(1),
                        op: SsaOp::Param { index: 99 },
                        cfg_node: n1,
                        var_name: Some("oob".into()),
                        span: (8, 11),
                    },
                ],
                terminator: Terminator::Return(None),
                preds: SmallVec::new(),
                succs: SmallVec::new(),
            }],
            entry: BlockId(0),
            value_defs: vec![
                ValueDef {
                    var_name: Some("user_id".into()),
                    cfg_node: n0,
                    block: BlockId(0),
                },
                ValueDef {
                    var_name: Some("oob".into()),
                    cfg_node: n1,
                    block: BlockId(0),
                },
            ],
            cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        };

        let consts = HashMap::new();
        let cfg: Cfg = petgraph::Graph::new();
        let param_types = vec![Some(TypeKind::Int)];

        let result =
            analyze_types_with_param_types(&body, &cfg, &consts, Some(Lang::Java), &param_types);
        assert_eq!(result.get_type(SsaValue(0)), Some(&TypeKind::Int));
        // Index 99 is out of range → falls back to Unknown.
        assert_eq!(result.get_type(SsaValue(1)), Some(&TypeKind::Unknown));

        // Empty slice = pre-Phase-3 behaviour.
        let result2 = analyze_types(&body, &cfg, &consts, Some(Lang::Java));
        assert_eq!(result2.get_type(SsaValue(0)), Some(&TypeKind::Unknown));
    }

    // ── TypeHierarchy::is_subtype_of ─────────────────────────────────────

    #[test]
@@ -1484,4 +1937,90 @@ mod tests {
            Some(TypeKind::HttpClient)
        );
    }

    // ── Phase 6 DTO field-level taint ─────────────────────────────────────

    /// Phase 6: `TypeFact::from_dto_field` returns `Some(field_kind)`
    /// for a DTO receiver whose `fields` map contains the requested
    /// field, and `None` otherwise.
    #[test]
    fn dto_field_lookup_returns_field_type_kind() {
        let mut dto = DtoFields::new("CreateUser");
        dto.insert("age", TypeKind::Int);
        dto.insert("email", TypeKind::String);
        let recv = TypeKind::Dto(dto);
        let age = TypeFact::from_dto_field(&recv, "age").expect("age field present");
        assert_eq!(age.kind, TypeKind::Int);
        let email = TypeFact::from_dto_field(&recv, "email").expect("email field present");
        assert_eq!(email.kind, TypeKind::String);
        assert!(TypeFact::from_dto_field(&recv, "missing").is_none());
    }

    /// Phase 6: a non-DTO receiver kind never produces a field fact —
    /// `from_dto_field` falls through to the legacy copy-prop path.
    #[test]
    fn dto_field_lookup_on_non_dto_returns_none() {
        for k in [
            TypeKind::Int,
            TypeKind::String,
            TypeKind::Object,
            TypeKind::Unknown,
            TypeKind::HttpClient,
        ] {
            assert!(
                TypeFact::from_dto_field(&k, "any_field").is_none(),
                "non-DTO {k:?} must not produce a field fact",
            );
        }
    }

    /// Phase 6: nested DTO — the parent DTO's field type is
    /// `TypeKind::Dto`, and `from_dto_field` returns that nested DTO
    /// fact directly. Phase 6.3 callers can recurse into the inner
    /// fields by following the returned receiver's `as_dto()` chain.
    #[test]
    fn dto_field_lookup_supports_nested_dto() {
        let mut inner = DtoFields::new("Address");
        inner.insert("zip", TypeKind::String);
        let mut outer = DtoFields::new("CreateUser");
        outer.insert("address", TypeKind::Dto(inner.clone()));
        outer.insert("age", TypeKind::Int);
        let recv = TypeKind::Dto(outer);
        let addr = TypeFact::from_dto_field(&recv, "address").expect("address present");
        assert_eq!(addr.kind, TypeKind::Dto(inner));
    }

    /// Phase 6: an empty DTO (class declared but with no inferred
    /// fields) never resolves field reads. Documents the safe-fallback
    /// invariant so the legacy path runs when class fields couldn't be
    /// classified.
    #[test]
    fn empty_dto_never_resolves_fields() {
        let recv = TypeKind::Dto(DtoFields::new("EmptyDto"));
        assert!(TypeFact::from_dto_field(&recv, "anything").is_none());
    }

    /// Phase 6: an `Int`-typed field in a DTO survives the
    /// type-suppression matrix exactly the same way a freestanding
    /// `Int` does — sanity-check the bridge between Phase 6 and Phase 4.
    #[test]
    fn dto_int_field_suppresses_sql_query_via_matrix() {
        use crate::labels::Cap;
        let mut dto = DtoFields::new("CreateUser");
        dto.insert("age", TypeKind::Int);
        let field = TypeFact::from_dto_field(&TypeKind::Dto(dto), "age").unwrap();
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), field);
        let result = TypeFactResult { facts };
        assert!(is_type_safe_for_sink(
            &[SsaValue(0)],
            Cap::SQL_QUERY,
            &result
        ));
        assert!(!is_type_safe_for_sink(
            &[SsaValue(0)],
            Cap::CODE_EXEC,
            &result
        ));
    }
}
@@ -160,6 +160,24 @@ impl Lattice for AuthDomainState {

// ── ProductState ─────────────────────────────────────────────────────────

/// Per-chain-receiver proxy tracking entry.
///
/// The state machine carries this for every chained-receiver resource
/// proxy call (`c.mu.Lock()`, `c.writer.header.set(...)`). Stored in
/// [`ProductState::chain_proxies`] keyed by the joined chain text
/// (e.g. `"c.mu"`, `"c.writer.header"`) so distinct field projections
/// of the same chain root are tracked independently.
///
/// Chain-keyed proxy state is the Phase 3 replacement for the single-dot
/// band-aid that conservatively dropped chain receivers entirely — chain
/// receivers are now first-class, semantically distinct from their root.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ChainProxyState {
    pub lifecycle: ResourceLifecycle,
    pub class_group: crate::cfg::BodyId,
    pub acquire_span: (usize, usize),
}

/// Composable product of resource and auth domains.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ProductState {
@@ -173,6 +191,20 @@ pub struct ProductState {
    /// Used by `extract_findings` to attribute leaks to the original resource
    /// operation (e.g., fs.openSync at line 7) rather than the proxy call.
    pub proxy_acquire_spans: HashMap<SymbolId, (usize, usize)>,
    /// Per-chain-receiver proxy tracking, keyed by joined chain text
    /// (`"c.mu"`, `"c.writer.header"`). Each chain receiver has its own
    /// lifecycle, class group, and acquire span — independent of both the
    /// chain root and any other chain. Phase 3 of the field-projections
    /// rollout introduces this map; consumers that previously used
    /// [`receiver_class_group`] for chain receivers (via the deleted
    /// single-dot band-aid) now route through here for 2+ dot callees.
    ///
    /// Phase 3 ships chain_proxies in tracking-only mode: chain receivers
    /// that remain OPEN at exit are NOT promoted to leak findings (so the
    /// addition is strictly behaviour-preserving against the existing
    /// benchmark). Phase 4 / a follow-up adds chain-rooted leak findings
    /// once the receiver-class detection is broad enough to avoid new FPs.
    pub chain_proxies: HashMap<String, ChainProxyState>,
}

impl ProductState {
@@ -182,6 +214,7 @@ impl ProductState {
            auth: AuthDomainState::new(),
            receiver_class_group: HashMap::new(),
            proxy_acquire_spans: HashMap::new(),
            chain_proxies: HashMap::new(),
        }
    }
}
@@ -193,6 +226,7 @@ impl Lattice for ProductState {
            auth: AuthDomainState::bot(),
            receiver_class_group: HashMap::new(),
            proxy_acquire_spans: HashMap::new(),
            chain_proxies: HashMap::new(),
        }
    }

@@ -202,11 +236,27 @@ impl Lattice for ProductState {
        class_group.extend(other.receiver_class_group.iter());
        let mut proxy_spans = self.proxy_acquire_spans.clone();
        proxy_spans.extend(other.proxy_acquire_spans.iter());
        // Chain proxies: union, with lifecycle joined per-key so an OPEN
        // entry on one path remains OPEN if joined with a missing entry
        // on another path (matches the existing receiver_class_group
        // semantics). On a key conflict the `self` entry's class_group /
        // acquire_span are kept: both are stable per chain receiver in
        // practice (a chain root + field path is monomorphic), so the
        // conflict cases collapse.
        let mut chain = self.chain_proxies.clone();
        for (key, other_state) in &other.chain_proxies {
            chain
                .entry(key.clone())
                .and_modify(|e| {
                    e.lifecycle = e.lifecycle.join(&other_state.lifecycle);
                })
                .or_insert_with(|| other_state.clone());
        }
        Self {
            resource: self.resource.join(&other.resource),
            auth: self.auth.join(&other.auth),
            receiver_class_group: class_group,
            proxy_acquire_spans: proxy_spans,
            chain_proxies: chain,
        }
    }

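The per-key merge used for `chain_proxies` in `join` above can be exercised in isolation. The sketch below is hypothetical: it models `ResourceLifecycle` as a plain `u8` bitset whose join is bitwise OR (an assumption consistent with the `OPEN | CLOSED` may-leak expectation in the tests), uses `u32` in place of `BodyId`, and drops `acquire_span`.

```rust
use std::collections::HashMap;

// Hypothetical lifecycle bits; join on a bitset is bitwise OR.
const OPEN: u8 = 0b0100;
const CLOSED: u8 = 0b1000;

#[derive(Clone, Debug, PartialEq, Eq)]
struct ChainProxy {
    lifecycle: u8,
    class_group: u32, // stand-in for BodyId; assumed stable per chain key
}

/// Union of two per-path maps. Shared keys join their lifecycle bits and
/// keep the left-hand class_group; keys present on one path only are
/// copied over unchanged, so an OPEN entry stays OPEN after the merge.
fn join_chain_proxies(
    a: &HashMap<String, ChainProxy>,
    b: &HashMap<String, ChainProxy>,
) -> HashMap<String, ChainProxy> {
    let mut out = a.clone();
    for (key, other) in b {
        out.entry(key.clone())
            .and_modify(|e| e.lifecycle |= other.lifecycle)
            .or_insert_with(|| other.clone());
    }
    out
}

fn main() {
    let mut a = HashMap::new();
    a.insert("c.mu".to_string(), ChainProxy { lifecycle: OPEN, class_group: 1 });
    let mut b = HashMap::new();
    b.insert("c.mu".to_string(), ChainProxy { lifecycle: CLOSED, class_group: 1 });
    b.insert("c.writer".to_string(), ChainProxy { lifecycle: OPEN, class_group: 2 });

    let j = join_chain_proxies(&a, &b);
    assert_eq!(j["c.mu"].lifecycle, OPEN | CLOSED); // may-leak
    assert_eq!(j["c.mu"].class_group, 1);
    assert_eq!(j["c.writer"].lifecycle, OPEN); // one-path entry survives
    // Same result with the arguments swapped on this input.
    assert_eq!(join_chain_proxies(&b, &a), j);
    println!("ok");
}
```

Because the lifecycle join is a bitwise OR, a chain receiver that is OPEN on any incoming path keeps its OPEN bit after the merge.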
@@ -329,4 +379,185 @@ mod tests {
        let s2 = s;
        assert_eq!(s, s2);
    }

    // ── Lattice law checks on the real domains ─────────────────────
    //
    // The trait-level `lattice.rs` tests use a synthetic `Three` lattice;
    // the laws also need to hold on the *actual* impls used by the
    // engine. A change to ResourceLifecycle's bitset semantics or to
    // AuthLevel's ordering could quietly break commutativity /
    // associativity / idempotence — these tests pin those properties.

    #[test]
    fn resource_lifecycle_join_laws() {
        let vals = [
            ResourceLifecycle::empty(),
            ResourceLifecycle::UNINIT,
            ResourceLifecycle::OPEN,
            ResourceLifecycle::CLOSED,
            ResourceLifecycle::MOVED,
            ResourceLifecycle::OPEN | ResourceLifecycle::CLOSED,
            ResourceLifecycle::all(),
        ];
        for a in &vals {
            // Idempotence: a ⊔ a = a
            assert_eq!(a.join(a), *a, "idempotence broken on {a:?}");
            // Bot identity: a ⊔ ⊥ = a
            assert_eq!(a.join(&ResourceLifecycle::bot()), *a);
            for b in &vals {
                // Commutativity: a ⊔ b = b ⊔ a
                assert_eq!(a.join(b), b.join(a), "commutativity broken ({a:?},{b:?})");
                // leq consistent with join: a ⊑ b iff a ⊔ b = b
                let consistent = a.leq(b) == (a.join(b) == *b);
                assert!(consistent, "leq/join consistency broken ({a:?} ⊑ {b:?})");
                for c in &vals {
                    // Associativity
                    assert_eq!(
                        a.join(b).join(c),
                        a.join(&b.join(c)),
                        "associativity broken ({a:?},{b:?},{c:?})"
                    );
                }
            }
        }
    }

    /// `AuthLevel` satisfies idempotence, commutativity, and associativity
    /// of `join` (which is `min` of the privilege ordering). It does NOT
    /// satisfy the `Lattice` trait's bot-identity law — see the explicit
    /// `auth_level_bot_is_absorbing_not_identity` test below for a
    /// rationale and a regression guard.
    #[test]
    fn auth_level_join_associative_commutative_idempotent() {
        let vals = [AuthLevel::Unauthed, AuthLevel::Authed, AuthLevel::Admin];
        for a in &vals {
            assert_eq!(a.join(a), *a, "AuthLevel idempotence broken on {a:?}");
            for b in &vals {
                assert_eq!(
                    a.join(b),
                    b.join(a),
                    "AuthLevel commutativity ({a:?},{b:?})"
                );
                for c in &vals {
                    assert_eq!(
                        a.join(b).join(c),
                        a.join(&b.join(c)),
                        "AuthLevel associativity ({a:?},{b:?},{c:?})"
                    );
                }
            }
        }
    }

    /// **Audit finding pinned as a regression guard.**
    ///
    /// `AuthLevel` deliberately violates the `Lattice` trait's bot-identity
    /// law (`a ⊔ ⊥ = a`). The trait says `bot()` is the join identity, but:
    ///
    /// * `bot()` returns `Unauthed`
    /// * `join` is `min` over the ordering `Unauthed < Authed < Admin`
    /// * therefore `Admin.join(Unauthed) == Unauthed`, not `Admin`
    ///
    /// In other words, `Unauthed` is the *absorbing* element of the join,
    /// not the identity — the algebraic dual of what the trait expects.
    ///
    /// This is intentional for security: if any incoming path is unauthed,
    /// the merged state must be unauthed (the conservative baseline). The
    /// trait contract violation matters only if the dataflow engine ever
    /// joins `bot()` with a non-bot reachable state from a different path
    /// (e.g. for an unreachable predecessor); in the current engine such
    /// nodes are skipped, so the violation is observably benign — but
    /// documenting it here prevents an accidental "fix" that flips
    /// `bot()` to `Admin` and silently elevates auth across all merges.
    #[test]
    fn auth_level_bot_is_absorbing_not_identity() {
        assert_eq!(AuthLevel::bot(), AuthLevel::Unauthed);
        // Absorbing: Admin ⊔ Unauthed = Unauthed (conservative).
        assert_eq!(
            AuthLevel::Admin.join(&AuthLevel::Unauthed),
            AuthLevel::Unauthed,
            "Unauthed must absorb Admin under min-join (conservative security)"
        );
        // NOT identity: Admin ⊔ bot ≠ Admin (would be the trait law).
        assert_ne!(
            AuthLevel::Admin.join(&AuthLevel::bot()),
            AuthLevel::Admin,
            "if this passes, AuthLevel::bot() was changed — re-audit security implications"
        );
    }

    /// `leq` for AuthLevel is "at least as privileged": Admin ⊑ Authed ⊑
    /// Unauthed in the privilege ordering. The trait law `a.leq(b) iff
    /// a.join(b) == b` therefore must read `b absorbs a`, since join is
    /// min. Verify the consistency on every pair.
    #[test]
    fn auth_level_leq_consistent_with_join() {
        let vals = [AuthLevel::Unauthed, AuthLevel::Authed, AuthLevel::Admin];
        for a in &vals {
            for b in &vals {
                assert_eq!(
                    a.leq(b),
                    a.join(b) == *b,
                    "leq/join inconsistent on ({a:?}, {b:?})"
                );
            }
        }
    }

    /// `AuthDomainState::join` keeps a variable as `validated` only if
    /// it was validated on *every* incoming path. A variable validated
    /// on one branch but not the other must be dropped — otherwise an
    /// auth bypass on one path silently authorises sinks on the merge
    /// path.
    #[test]
    fn auth_domain_join_drops_partially_validated() {
        let sym_only_a = SymbolId(10);
        let sym_only_b = SymbolId(11);

        let a = AuthDomainState {
            auth_level: AuthLevel::Authed,
            validated: [sym_only_a].into_iter().collect(),
        };
        let b = AuthDomainState {
            auth_level: AuthLevel::Authed,
            validated: [sym_only_b].into_iter().collect(),
        };
        let j = a.join(&b);
        assert!(
            j.validated.is_empty(),
            "validated set must drop vars not validated on all paths"
        );
    }

    /// ProductState join must combine resource OPEN | CLOSED across
    /// branches (may-leak), keep min-auth, and union the proxy maps.
    /// This exercises the non-trivial join (the existing test only
    /// joins two identical initial states).
    #[test]
    fn product_state_join_non_trivial() {
        let sym_x = SymbolId(20);
        let sym_y = SymbolId(21);

        let mut a = ProductState::initial();
        a.resource.set(sym_x, ResourceLifecycle::OPEN);
        a.auth.auth_level = AuthLevel::Admin;
        a.auth.validated.insert(sym_y);

        let mut b = ProductState::initial();
        b.resource.set(sym_x, ResourceLifecycle::CLOSED);
        b.auth.auth_level = AuthLevel::Authed;
        b.auth.validated.insert(sym_y);

        let j = a.join(&b);
        assert_eq!(
            j.resource.get(sym_x),
            ResourceLifecycle::OPEN | ResourceLifecycle::CLOSED,
            "may-leak: OPEN on one path, CLOSED on the other"
        );
        assert_eq!(j.auth.auth_level, AuthLevel::Authed, "join takes min auth");
        assert!(
            j.auth.validated.contains(&sym_y),
            "var validated on both paths must survive"
        );
    }
}

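The two auth-domain properties pinned by the tests above can be reproduced with a small standalone model: a `min`-join whose `bot()` absorbs rather than acts as identity, and a `validated` set that survives a merge only by intersection. `AuthLevel` here is a hypothetical re-declaration with the documented `Unauthed < Authed < Admin` ordering, and it uses `u32` symbols in place of `SymbolId`. It is not the crate's type.

```rust
use std::collections::HashSet;

// Hypothetical re-declaration; derive(Ord) follows variant order,
// giving the documented Unauthed < Authed < Admin.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum AuthLevel {
    Unauthed,
    Authed,
    Admin,
}

impl AuthLevel {
    fn bot() -> Self {
        AuthLevel::Unauthed
    }
    /// Join is `min`: a merge is only as privileged as its weakest path.
    fn join(self, other: Self) -> Self {
        self.min(other)
    }
    /// a ⊑ b iff a ⊔ b == b, i.e. a is at least as privileged as b.
    fn leq(self, other: Self) -> bool {
        self.join(other) == other
    }
}

struct AuthState {
    level: AuthLevel,
    validated: HashSet<u32>,
}

impl AuthState {
    fn join(&self, other: &Self) -> Self {
        AuthState {
            level: self.level.join(other.level),
            // A variable stays validated only if it was validated on both paths.
            validated: self.validated.intersection(&other.validated).copied().collect(),
        }
    }
}

fn main() {
    // bot() absorbs under join instead of acting as the identity.
    assert_eq!(AuthLevel::Admin.join(AuthLevel::bot()), AuthLevel::Unauthed);
    assert!(AuthLevel::Admin.leq(AuthLevel::Authed));
    assert!(AuthLevel::Authed.leq(AuthLevel::Unauthed));

    let a = AuthState { level: AuthLevel::Admin, validated: [1, 2].into_iter().collect() };
    let b = AuthState { level: AuthLevel::Authed, validated: [2, 3].into_iter().collect() };
    let j = a.join(&b);
    assert_eq!(j.level, AuthLevel::Authed);
    assert_eq!(j.validated, HashSet::from([2]));
    println!("ok");
}
```

The same derived ordering makes `leq` read "at least as privileged", which matches the `Admin ⊑ Authed ⊑ Unauthed` chain described in the test doc comments.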
@@ -253,6 +253,7 @@ mod tests {
            resource_pairs: rules::resource_pairs(Lang::C),
            interner: &interner,
            resource_method_summaries: &[],
            ptr_proxy_hints: None,
        };

        let result = run_forward(&cfg, entry, &transfer, ProductState::initial());

@@ -323,6 +324,7 @@ mod tests {
            resource_pairs: rules::resource_pairs(Lang::C),
            interner: &interner,
            resource_method_summaries: &[],
            ptr_proxy_hints: None,
        };

        let result = run_forward(&cfg, entry, &transfer, ProductState::initial());

@@ -508,4 +510,168 @@ mod tests {
        assert_eq!(wl.len(), 3);
        assert_eq!(in_wl.len(), 3);
    }

    // ── CFG-shape robustness ─────────────────────────────────────────────
    //
    // The audit flagged that `run_forward` had only linear/diamond test
    // shapes. These tests exercise edge cases that can trip up the
    // worklist algorithm: nodes the entry can't reach, a CFG with only
    // an entry node, irreducible flow with multiple paths into the
    // same loop body, and a self-loop. Each must terminate without
    // panicking and produce a sensible converged state.

    /// A node disconnected from the entry must NOT receive any state
    /// (it's unreachable). The engine processes only nodes reachable
    /// from the worklist seed; a quiescent unreachable node should
    /// stay absent from the result map.
    #[test]
    fn unreachable_nodes_get_no_state() {
        use crate::state::domain::ProductState;

        let mut cfg: Cfg = Graph::new();
        let entry = cfg.add_node(make_node(StmtKind::Entry));
        let reachable = cfg.add_node(make_node(StmtKind::Seq));
        let exit = cfg.add_node(make_node(StmtKind::Exit));
        // Unreachable island: no edge from entry leads here.
        let orphan = cfg.add_node(make_node(StmtKind::Seq));
        let orphan_exit = cfg.add_node(make_node(StmtKind::Exit));

        cfg.add_edge(entry, reachable, EdgeKind::Seq);
        cfg.add_edge(reachable, exit, EdgeKind::Seq);
        cfg.add_edge(orphan, orphan_exit, EdgeKind::Seq);

        let interner = SymbolInterner::from_cfg(&cfg);
        let transfer = DefaultTransfer {
            lang: Lang::C,
            resource_pairs: rules::resource_pairs(Lang::C),
            interner: &interner,
            resource_method_summaries: &[],
            ptr_proxy_hints: None,
        };

        let result = run_forward(&cfg, entry, &transfer, ProductState::initial());
        assert!(result.converged);
        assert!(
            result.states.contains_key(&entry),
            "entry must have a state"
        );
        assert!(
            result.states.contains_key(&reachable),
            "reachable node must have a state"
        );
        assert!(
            !result.states.contains_key(&orphan),
            "orphan island must NOT receive any state"
        );
        assert!(
            !result.states.contains_key(&orphan_exit),
            "orphan exit must NOT receive any state"
        );
    }

    /// A single-node graph (entry only, no edges) is the minimal case.
    /// The engine must terminate immediately, mark converged, and leave
    /// the entry's initial state untouched.
    #[test]
    fn single_node_graph_terminates_immediately() {
        use crate::state::domain::ProductState;

        let mut cfg: Cfg = Graph::new();
        let entry = cfg.add_node(make_node(StmtKind::Entry));

        let interner = SymbolInterner::from_cfg(&cfg);
        let transfer = DefaultTransfer {
            lang: Lang::C,
            resource_pairs: rules::resource_pairs(Lang::C),
            interner: &interner,
            resource_method_summaries: &[],
            ptr_proxy_hints: None,
        };

        let result = run_forward(&cfg, entry, &transfer, ProductState::initial());
        assert!(result.converged);
        assert!(
            result.states.contains_key(&entry),
            "single-node graph still seeds the entry state"
        );
    }

    /// Self-loop on a single node: `entry → A → A → … → exit`. The
    /// worklist must not livelock — once A's state is stable, the
    /// back-edge stops re-enqueueing it.
    #[test]
    fn self_loop_terminates() {
        use crate::state::domain::ProductState;

        let mut cfg: Cfg = Graph::new();
        let entry = cfg.add_node(make_node(StmtKind::Entry));
        let a = cfg.add_node(make_node(StmtKind::Seq));
        let exit = cfg.add_node(make_node(StmtKind::Exit));

        cfg.add_edge(entry, a, EdgeKind::Seq);
        cfg.add_edge(a, a, EdgeKind::Back); // self-loop
        cfg.add_edge(a, exit, EdgeKind::Seq);

        let interner = SymbolInterner::from_cfg(&cfg);
        let transfer = DefaultTransfer {
            lang: Lang::C,
            resource_pairs: rules::resource_pairs(Lang::C),
            interner: &interner,
            resource_method_summaries: &[],
            ptr_proxy_hints: None,
        };

        let result = run_forward(&cfg, entry, &transfer, ProductState::initial());
        assert!(result.converged, "self-loop must converge");
        assert!(result.states.contains_key(&exit));
    }

    /// Irreducible CFG: the cycle `a ⇄ b` can be entered from `entry` at
    /// either node, so neither node dominates the other and the loop has
    /// no unique header. This is the classic shape that breaks
    /// structured-loop assumptions (e.g., "every loop has a unique
    /// header"). The forward worklist must still terminate.
    ///
    /// Shape:
    ///     entry → a → b → exit
    ///     entry → b
    ///     b → a        (back-edge closing the a ⇄ b cycle)
    #[test]
    fn irreducible_cfg_terminates() {
        use crate::state::domain::ProductState;

        let mut cfg: Cfg = Graph::new();
        let entry = cfg.add_node(make_node(StmtKind::Entry));
        let a = cfg.add_node(make_node(StmtKind::Loop));
        let b = cfg.add_node(make_node(StmtKind::Seq));
        let exit = cfg.add_node(make_node(StmtKind::Exit));

        // Two entries into the same cycle: neither a nor b is a unique header.
        cfg.add_edge(entry, a, EdgeKind::Seq);
        cfg.add_edge(entry, b, EdgeKind::Seq);
        cfg.add_edge(a, b, EdgeKind::Seq);
        cfg.add_edge(b, a, EdgeKind::Back);
        cfg.add_edge(b, exit, EdgeKind::Seq);

        let interner = SymbolInterner::from_cfg(&cfg);
        let transfer = DefaultTransfer {
            lang: Lang::C,
            resource_pairs: rules::resource_pairs(Lang::C),
            interner: &interner,
            resource_method_summaries: &[],
            ptr_proxy_hints: None,
        };

        let result = run_forward(&cfg, entry, &transfer, ProductState::initial());
        assert!(
            result.converged,
            "irreducible CFG must still converge under run_forward"
        );
        // Every reachable node must have a state.
        for n in [entry, a, b, exit] {
            assert!(result.states.contains_key(&n), "node {n:?} must be visited");
        }
    }
}

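The termination properties these tests pin can be illustrated with a minimal forward worklist. This is a generic sketch, not `run_forward` itself. The graph is a hypothetical adjacency list of `usize` node ids, the state is a `u8` bitset joined by OR, and a successor is re-enqueued only when its state changes. Nodes the entry never reaches never get a state, and the self-loop stops once the bitset stops growing.

```rust
use std::collections::{HashMap, VecDeque};

/// Generic forward worklist. `succs[n]` lists n's successors; the state
/// is a u8 bitset joined by OR. Only nodes reached from `entry` get a state.
fn run_forward(
    succs: &[Vec<usize>],
    entry: usize,
    init: u8,
    transfer: impl Fn(usize, u8) -> u8,
) -> HashMap<usize, u8> {
    let mut states: HashMap<usize, u8> = HashMap::new();
    states.insert(entry, init);
    let mut worklist = VecDeque::from([entry]);
    while let Some(n) = worklist.pop_front() {
        let out = transfer(n, states[&n]);
        for &s in &succs[n] {
            let old = states.get(&s).copied();
            let new = old.unwrap_or(0) | out;
            // Re-enqueue only on change. States only grow (OR) within a
            // finite bitset, so each node is re-queued a bounded number of times.
            if old != Some(new) {
                states.insert(s, new);
                worklist.push_back(s);
            }
        }
    }
    states
}

fn main() {
    // 0 = entry, 1 = A (self-loop), 2 = exit; 3 -> 4 is an unreachable island.
    let succs = vec![vec![1], vec![1, 2], vec![], vec![4], vec![]];
    // Node 1 sets bit 0; every other node is the identity.
    let states = run_forward(&succs, 0, 0, |n, s| if n == 1 { s | 0b01 } else { s });
    assert_eq!(states[&2], 0b01);
    assert!(!states.contains_key(&3) && !states.contains_key(&4));
    println!("{states:?}");
}
```

The change check on the successor is what prevents a livelock on the self-loop: once node 1's state already contains bit 0, the back-edge produces no change and nothing is re-enqueued.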
@@ -292,46 +292,58 @@ pub fn extract_findings(
    // CLOSED at function exit (no OPEN paths), check whether there are
    // intervening calls between the proxy acquire and release nodes that
    // could throw and bypass the release. If so, emit a possible leak.
    //
    // **Language gate**: this heuristic is JS/TS-specific. Other
    // languages (Go, Java, C, C++, Python, Rust, Ruby, PHP) use
    // explicit error returns / try-catch with deterministic control
    // flow — an intervening call does NOT silently bypass a release.
    // Firing this on Go gave the gin/context.go FP where any method
    // calling another method (`c.Set`, `c.Get`) was flagged as a
    // possible leak on the receiver. Skip the section but continue
    // to section 3 (auth-required sinks) which is independent of the
    // resource state machine.
    if matches!(lang, Lang::JavaScript | Lang::TypeScript) {
        for (idx, info) in cfg.node_references() {
            if !is_terminal_function_exit(idx, info, cfg) {
                continue;
            }
            let Some(state) = result.states.get(&idx) else {
                continue;
            };
            for (&sym, &lifecycle) in &state.resource.vars {
                // Only for proxy-acquired resources that are fully CLOSED at exit
                if !state.proxy_acquire_spans.contains_key(&sym) {
                    continue;
                }
                if lifecycle.contains(ResourceLifecycle::OPEN) {
                    continue; // Already handled by the normal leak detection above
                }
                if !lifecycle.contains(ResourceLifecycle::CLOSED) {
                    continue;
                }
                // Check if there are intervening Call nodes between acquire and release
                // in the CFG (these could throw and bypass the release)
                let has_intervening_calls = cfg.node_references().any(|(_, ni)| {
                    ni.kind == StmtKind::Call
                        && ni.ast.enclosing_func == info.ast.enclosing_func
                        && ni.call.callee.is_some()
                        // Not the acquire or release proxy itself
                        && !state.proxy_acquire_spans.values().any(|s| *s == ni.ast.span)
                });
                if has_intervening_calls {
                    let var_name = interner.resolve(sym);
                    let acquire_span = state.proxy_acquire_spans.get(&sym).copied();
                    findings.push(StateFinding {
                        rule_id: "state-resource-leak-possible".into(),
                        severity: Severity::Low,
                        span: acquire_span.unwrap_or(info.ast.span),
                        message: format!("resource `{var_name}` may not be closed on all paths"),
                        machine: "resource",
                        subject: Some(var_name.to_string()),
                        from_state: "open",
                        to_state: "possibly_leaked",
                    });
                }
            }
        }
    }

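The language-gated check above reduces to a small predicate. The sketch below models it with hypothetical stand-in types: a `Node` struct, a three-variant `Lang`, and `u8` lifecycle bits. As in the code it mirrors, "intervening" means any named call in the same enclosing function whose span is not a recorded proxy span. The predicate does not consult CFG order between the acquire and the release.

```rust
// Hypothetical lifecycle bits, matching the OPEN/CLOSED checks above.
const OPEN: u8 = 0b01;
const CLOSED: u8 = 0b10;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Lang {
    JavaScript,
    TypeScript,
    Go,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Call,
    Seq,
}

struct Node {
    kind: Kind,
    func: &'static str,
    span: (usize, usize),
    has_callee: bool,
}

/// Any named call in `func` that is not itself a recorded proxy span.
fn has_intervening_calls(nodes: &[Node], func: &str, proxy_spans: &[(usize, usize)]) -> bool {
    nodes.iter().any(|n| {
        n.kind == Kind::Call && n.func == func && n.has_callee && !proxy_spans.contains(&n.span)
    })
}

/// JS/TS only; the lifecycle must be CLOSED with no OPEN bit.
fn possibly_leaked(
    lang: Lang,
    lifecycle: u8,
    nodes: &[Node],
    func: &str,
    proxy_spans: &[(usize, usize)],
) -> bool {
    if !matches!(lang, Lang::JavaScript | Lang::TypeScript) {
        return false;
    }
    if (lifecycle & OPEN) != 0 || (lifecycle & CLOSED) == 0 {
        return false;
    }
    has_intervening_calls(nodes, func, proxy_spans)
}

fn main() {
    let nodes = [
        Node { kind: Kind::Call, func: "f", span: (10, 20), has_callee: true }, // proxy acquire
        Node { kind: Kind::Call, func: "f", span: (30, 40), has_callee: true }, // e.g. c.Set(...)
        Node { kind: Kind::Seq, func: "f", span: (50, 60), has_callee: false },
    ];
    let proxy_spans = [(10, 20)];
    assert!(possibly_leaked(Lang::TypeScript, CLOSED, &nodes, "f", &proxy_spans));
    // Gated off for Go (the gin/context.go FP class).
    assert!(!possibly_leaked(Lang::Go, CLOSED, &nodes, "f", &proxy_spans));
    // OPEN on some path: left to the ordinary leak detector.
    assert!(!possibly_leaked(Lang::JavaScript, OPEN | CLOSED, &nodes, "f", &proxy_spans));
    println!("ok");
}
```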
@@ -533,6 +545,7 @@ mod tests {
            resource_pairs: rules::resource_pairs(Lang::C),
            interner: &interner,
            resource_method_summaries: &[],
            ptr_proxy_hints: None,
        };

        let result = engine::run_forward(&cfg, entry, &transfer, ProductState::initial());

@@ -592,6 +605,7 @@ mod tests {
            resource_pairs: rules::resource_pairs(Lang::C),
            interner: &interner,
            resource_method_summaries: &[],
            ptr_proxy_hints: None,
        };

        let result = engine::run_forward(&cfg, entry, &transfer, ProductState::initial());

@@ -725,6 +739,7 @@ mod tests {
            resource_pairs: rules::resource_pairs(Lang::C),
            interner: &interner,
            resource_method_summaries: &[],
            ptr_proxy_hints: None,
        };

        let result = engine::run_forward(&cfg, entry, &transfer, ProductState::initial());

@@ -789,6 +804,7 @@ mod tests {
            resource_pairs: rules::resource_pairs(Lang::C),
            interner: &interner,
            resource_method_summaries: &[],
            ptr_proxy_hints: None,
        };

        let result = engine::run_forward(&cfg, entry, &transfer, ProductState::initial());

@@ -69,6 +69,12 @@ pub fn run_state_analysis(
    resource_method_summaries: &[transfer::ResourceMethodSummary],
    auth_decorators: &[String],
    path_safe_suppressed_sink_spans: &std::collections::HashSet<(usize, usize)>,
    // Optional `var_name → PtrProxyHint` map derived from the body's
    // PointsToFacts. When present, the proxy-acquire transfer suppresses
    // SymbolId attribution on field-aliased receivers (`m := c.mu;
    // m.Lock()`) and routes them through `chain_proxies` instead. Pass
    // `None` to disable — strict-additive.
    ptr_proxy_hints: Option<&std::collections::HashMap<String, crate::pointer::PtrProxyHint>>,
) -> Vec<StateFinding> {
    let _span = tracing::debug_span!("run_state_analysis").entered();

@@ -88,6 +94,7 @@ pub fn run_state_analysis(
        resource_pairs,
        interner: &interner,
        resource_method_summaries,
        ptr_proxy_hints,
    };

    // Seed initial auth level from decorator-based authorization markers.

@@ -1,6 +1,6 @@
#![allow(clippy::collapsible_if)]

use super::domain::{AuthLevel, ChainProxyState, ProductState, ResourceLifecycle};
use super::engine::Transfer;
use super::symbol::{SymbolId, SymbolInterner};
use crate::cfg::{EdgeKind, NodeInfo, StmtKind};

@@ -8,6 +8,47 @@ use crate::cfg_analysis::rules::{self, ResourcePair};
use crate::symbol::Lang;
use petgraph::graph::NodeIndex;

/// Decompose a textual callee like `"c.mu.Lock"` into
/// `(chain_receiver_text, method_suffix)`. Returns `None` when the
/// callee isn't a clean dotted member chain (parens, brackets, `::`,
/// arrow operators, whitespace, or other complex tokens disqualify it).
///
/// Phase 3 of the field-projections rollout: this is the textual mirror
/// of `try_lower_field_proj_chain` in `src/ssa/lower.rs`. The state
/// engine doesn't yet read SSA bodies (would require threading SSA
/// through the lattice run), so the same parse rules are duplicated
/// here. Both helpers share the contract: a success here implies a
/// FieldProj chain at SSA level (or a direct receiver for the 1-dot
/// case).
///
/// **Returns** `Some(("c", "Close"))` for `"c.Close"` (1 dot — the
/// receiver is a bare ident); `Some(("c.mu", "Lock"))` for
/// `"c.mu.Lock"` (2 dots — receiver is a 1-element chain);
/// `Some(("c.writer.header", "set"))` for `"c.writer.header.set"`
/// (3 dots — receiver is a 2-element chain). Returns `None` for any
/// callee shape we can't safely decompose textually.
fn try_chain_decompose(callee: &str) -> Option<(&str, &str)> {
    for ch in callee.chars() {
        match ch {
            '(' | ')' | '[' | ']' | '<' | '>' | '?' | '*' | '&' | ':' | ' ' | '\t' | '\n' | '-'
            | '!' | ',' | ';' | '"' | '\'' | '\\' => return None,
            _ => {}
        }
    }
    let last_dot = callee.rfind('.')?;
    let receiver_text = &callee[..last_dot];
    let method_suffix = &callee[last_dot + 1..];
    if receiver_text.is_empty() || method_suffix.is_empty() {
        return None;
    }
    // Reject if any segment in the receiver is empty (leading dot,
    // double dots) — same discipline as the SSA-side helper.
    if receiver_text.split('.').any(str::is_empty) {
        return None;
    }
    Some((receiver_text, method_suffix))
}

/// Events emitted during transfer for illegal state transitions.
/// These are NOT lattice values — they become findings in `facts.rs`.
#[derive(Debug, Clone)]

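The documented contract of `try_chain_decompose` can be checked with a standalone harness. The function body is copied from the hunk above. The `main` and the rejection inputs (`"Close"`, `"c.f().x"`, `"a..b.c"`, `"std::fs::read"`) are illustrative additions, not fixtures from the repository.

```rust
/// Copied from the hunk above so the contract can be checked standalone.
fn try_chain_decompose(callee: &str) -> Option<(&str, &str)> {
    for ch in callee.chars() {
        match ch {
            '(' | ')' | '[' | ']' | '<' | '>' | '?' | '*' | '&' | ':' | ' ' | '\t' | '\n' | '-'
            | '!' | ',' | ';' | '"' | '\'' | '\\' => return None,
            _ => {}
        }
    }
    let last_dot = callee.rfind('.')?;
    let receiver_text = &callee[..last_dot];
    let method_suffix = &callee[last_dot + 1..];
    if receiver_text.is_empty() || method_suffix.is_empty() {
        return None;
    }
    if receiver_text.split('.').any(str::is_empty) {
        return None;
    }
    Some((receiver_text, method_suffix))
}

fn main() {
    // The three documented examples.
    assert_eq!(try_chain_decompose("c.Close"), Some(("c", "Close")));
    assert_eq!(try_chain_decompose("c.mu.Lock"), Some(("c.mu", "Lock")));
    assert_eq!(
        try_chain_decompose("c.writer.header.set"),
        Some(("c.writer.header", "set"))
    );
    // Illustrative rejections.
    assert_eq!(try_chain_decompose("Close"), None); // no receiver at all
    assert_eq!(try_chain_decompose("c.f().x"), None); // call inside the chain
    assert_eq!(try_chain_decompose("a..b.c"), None); // empty receiver segment
    assert_eq!(try_chain_decompose("std::fs::read"), None); // path separator
    println!("ok");
}
```

A caller can then distinguish the single-dot path from the chain-receiver path by checking `receiver_text.contains('.')`, which is what `try_apply_field_alias_proxy` and the chain-receiver block in `apply_call` do.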
@@ -130,6 +171,20 @@ pub struct DefaultTransfer<'a> {
    pub interner: &'a SymbolInterner,
    /// Resource method summaries for cross-body proxy resolution.
    pub resource_method_summaries: &'a [ResourceMethodSummary],
    /// Optional per-body field-only points-to hints — names that resolve
    /// to a value whose entire abstract heap identity is one or more
    /// [`crate::pointer::AbsLoc::Field`] locations (e.g. `m := c.mu`).
    ///
    /// Populated only when `NYX_POINTER_ANALYSIS=1` is set and the
    /// state-analysis caller built the body's
    /// [`crate::pointer::PointsToFacts`]. When present, the proxy-acquire
    /// logic routes single-dot calls on field-aliased receivers
    /// (e.g. `m.Lock()` after `m := c.mu`) into `chain_proxies` instead
    /// of marking the local with a `SymbolId` that would later be flagged
    /// as a leak. Strict-additive: when `None`, behaviour matches the
    /// pointer-unaware fallback exactly.
    pub ptr_proxy_hints:
        Option<&'a std::collections::HashMap<String, crate::pointer::PtrProxyHint>>,
}

impl Transfer<ProductState> for DefaultTransfer<'_> {

@@ -170,6 +225,77 @@ impl DefaultTransfer<'_> {
            .get_scoped(info.ast.enclosing_func.as_deref(), name)
    }

    /// Pointer-Phase 2 hook. Returns `true` when the call has been
    /// fully handled as a field-aliased receiver proxy and the rest of
    /// `apply_call` should bail.
    ///
    /// Activates only on single-dot calls (`<recv>.<method>`) whose
    /// receiver name is recorded with [`crate::pointer::PtrProxyHint::FieldOnly`]
    /// in the per-body hint map AND for which a matching
    /// [`ResourceMethodSummary`] exists. The acquire/release effect
    /// is recorded against `state.chain_proxies` keyed by the receiver
    /// name — chain_proxies is a tracking-only lattice today, so leak
    /// detection (which only inspects `state.resource`) is suppressed
    /// for the alias. Strict-additive: when no hint map is supplied,
    /// when the receiver isn't `FieldOnly`, or when no method summary
    /// matches, the function returns `false` and the legacy branches
    /// run unchanged.
    fn try_apply_field_alias_proxy(
        &self,
        info: &NodeInfo,
        callee: &str,
        state: &mut ProductState,
    ) -> bool {
        let Some(hints) = self.ptr_proxy_hints else {
            return false;
        };
        // Only single-dot callees: `m.Lock`, not `c.mu.Lock` (which the
        // chain-receiver block already handles textually) and not zero-
        // dot (no receiver to alias).
        let Some((receiver_text, method_suffix)) = try_chain_decompose(callee) else {
            return false;
        };
        if receiver_text.contains('.') {
            return false;
        }
        let recv_name: &str = match info.call.receiver.as_deref() {
            Some(r) if !r.contains('.') && !r.contains('(') => r,
            _ => receiver_text,
        };
        if hints.get(recv_name).copied() != Some(crate::pointer::PtrProxyHint::FieldOnly) {
            return false;
        }
        let mut handled = false;
        for summary in self.resource_method_summaries {
            if !summary.method_name.eq_ignore_ascii_case(method_suffix) {
                continue;
            }
            handled = true;
            match summary.effect {
                ResourceEffect::Acquire => {
                    state.chain_proxies.insert(
                        recv_name.to_string(),
                        ChainProxyState {
                            lifecycle: ResourceLifecycle::OPEN,
                            class_group: summary.class_group,
                            acquire_span: summary.original_span,
                        },
                    );
                }
                ResourceEffect::Release => {
                    if let Some(entry) = state.chain_proxies.get_mut(recv_name) {
                        if entry.class_group == summary.class_group
                            && entry.lifecycle.contains(ResourceLifecycle::OPEN)
                        {
                            entry.lifecycle = ResourceLifecycle::CLOSED;
                        }
                    }
                }
            }
        }
        handled
    }

    fn apply_call(
        &self,
        node_idx: NodeIndex,

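The acquire/release routing in `try_apply_field_alias_proxy` can be sketched with simplified, hypothetical types: a two-variant `Hint` enum standing in for `PtrProxyHint`, a `Summary` struct standing in for `ResourceMethodSummary`, and `u8` lifecycle bits. For brevity it splits the callee with `split_once('.')` instead of calling `try_chain_decompose`, so it skips that helper's character filter.

```rust
use std::collections::HashMap;

// Hypothetical lifecycle bits; the real code uses ResourceLifecycle.
const OPEN: u8 = 0b01;
const CLOSED: u8 = 0b10;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Hint {
    FieldOnly, // value is purely a field location, e.g. `m := c.mu`
    Other,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Effect {
    Acquire,
    Release,
}

struct Summary {
    method: &'static str,
    effect: Effect,
    class_group: u32,
}

struct Proxy {
    lifecycle: u8,
    class_group: u32,
}

/// Returns true when the call was consumed as a field-alias proxy, in
/// which case the caller skips its SymbolId-based branches.
fn apply_field_alias(
    hints: Option<&HashMap<String, Hint>>,
    summaries: &[Summary],
    callee: &str,
    proxies: &mut HashMap<String, Proxy>,
) -> bool {
    let Some(hints) = hints else { return false };
    // Single-dot callees only: `m.Lock`, not `c.mu.Lock` or bare `Lock`.
    let Some((recv, method)) = callee.split_once('.') else { return false };
    if recv.is_empty() || method.is_empty() || method.contains('.') {
        return false;
    }
    if hints.get(recv).copied() != Some(Hint::FieldOnly) {
        return false;
    }
    let mut handled = false;
    for s in summaries.iter().filter(|s| s.method.eq_ignore_ascii_case(method)) {
        handled = true;
        match s.effect {
            Effect::Acquire => {
                let p = Proxy { lifecycle: OPEN, class_group: s.class_group };
                proxies.insert(recv.to_string(), p);
            }
            Effect::Release => {
                // Close only an OPEN entry from the same class group.
                if let Some(p) = proxies.get_mut(recv) {
                    if p.class_group == s.class_group && (p.lifecycle & OPEN) != 0 {
                        p.lifecycle = CLOSED;
                    }
                }
            }
        }
    }
    handled
}

fn main() {
    let hints: HashMap<String, Hint> =
        [("m".to_string(), Hint::FieldOnly), ("f".to_string(), Hint::Other)]
            .into_iter()
            .collect();
    let summaries = [
        Summary { method: "Lock", effect: Effect::Acquire, class_group: 7 },
        Summary { method: "Unlock", effect: Effect::Release, class_group: 7 },
    ];
    let mut proxies = HashMap::new();

    // `m := c.mu; m.Lock(); m.Unlock()`: tracked under "m", never as a leak.
    assert!(apply_field_alias(Some(&hints), &summaries, "m.Lock", &mut proxies));
    assert_eq!(proxies["m"].lifecycle, OPEN);
    assert!(apply_field_alias(Some(&hints), &summaries, "m.Unlock", &mut proxies));
    assert_eq!(proxies["m"].lifecycle, CLOSED);

    // Not FieldOnly, or no hint map at all: fall through to the legacy path.
    assert!(!apply_field_alias(Some(&hints), &summaries, "f.Lock", &mut proxies));
    assert!(!apply_field_alias(None, &summaries, "m.Lock", &mut proxies));
    println!("ok");
}
```

Returning `false` on every early exit keeps the hook strictly additive: the caller's existing branches see exactly the same inputs as before whenever no hint applies.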
@@ -182,6 +308,23 @@ impl DefaultTransfer<'_> {
            None => return,
        };

        // ── Pointer-Phase 2: field-aliased receiver fast-path ───────────
        // When the receiver name resolves through points-to to a value
        // whose abstract heap identity is purely `Field(_, _)` (e.g.
        // `m := c.mu` followed by `m.Lock()`), the receiver is a
        // sub-object alias rather than a standalone resource handle.
        // Routing the entire call into `chain_proxies` here — *before*
        // the SymbolId-based direct-acquire/release/proxy branches —
        // suppresses the FP class where the local `m` would otherwise
        // be flagged as a leakable resource at function exit.
        //
        // Strict-additive: when `ptr_proxy_hints` is `None` or the
        // receiver name is absent from the map, this returns early and
        // the legacy branches run unchanged.
        if self.try_apply_field_alias_proxy(info, &callee, state) {
            return;
        }

        // ── Resource acquire ─────────────────────────────────────────────
        let mut direct_acquire = false;
        for pair in self.resource_pairs {

@@ -240,47 +383,92 @@ impl DefaultTransfer<'_> {
 
         // ── Resource method proxy ────────────────────────────────────────
         // When no direct resource pair matched, check if the callee is a
-        // method wrapper for a known resource operation. Only fires when:
-        // 1. The callee is a method call (contains `.`)
-        // 2. An explicit receiver is identified
-        // 3. The method suffix matches a ResourceMethodSummary
-        // 4. For Release: the receiver was previously acquired by the same class group
-        if !direct_acquire && !direct_release && callee.contains('.') {
-            // Extract receiver: prefer explicit NodeInfo.call.receiver, fall back
-            // to everything before the last `.` in the callee string.
-            let recv_from_callee: Option<String>;
-            let recv_name: Option<&str> = if let Some(ref r) = info.call.receiver {
-                Some(r.as_str())
-            } else {
-                recv_from_callee = callee.rsplit_once('.').map(|(prefix, _)| {
-                    // For multi-segment paths like "a.b.c", use the root receiver
-                    prefix.split('.').next().unwrap_or(prefix).to_string()
-                });
-                recv_from_callee.as_deref()
-            };
-            if let Some(recv) = recv_name {
-                let method_suffix = callee.rsplit('.').next().unwrap_or("");
+        // method wrapper for a known resource operation.
+        //
+        // Phase 3 (field-projections rollout, 2026-04-25): the previous
+        // single-dot band-aid (`callee.matches('.').count() == 1 &&
+        // !callee.contains('(')`) silently dropped chained receivers
+        // because the original textual extractor took the chain root as
+        // receiver — collapsing `c.writer.header().set` to `c` and
+        // marking `c` as proxy-acquired (the gin/context.go FP class).
+        //
+        // The band-aid is now deleted. Chained-receiver method calls
+        // are routed to a *separate* state map (`chain_proxies`) keyed by
+        // the joined receiver chain text — so `c.mu.Lock()` acquires
+        // `c.mu` (a chain-receiver entity), not `c`. The chain receiver
+        // is independent of the chain root: leaks/double-closes are
+        // tracked per chain, never propagated up to the root.
+        //
+        // The single-dot case (`<recv>.<method>`) keeps the original
+        // SymbolId-based path so existing fixtures' lifecycle tracking,
+        // leak detection, and finding attribution stay bit-for-bit
+        // identical.
+        // Chain-receiver proxy path runs independently of the direct
+        // acquire/release flags: it touches a *separate* state map
+        // (`chain_proxies`) that doesn't overlap with the SymbolId-based
+        // `state.resource` / `receiver_class_group` lattice. This is
+        // important for callees like `c.mu.Unlock()` where the textual
+        // direct-release matcher (`.Unlock`) fires (sets `direct_release`
+        // even without a SymbolId state change), but the chain receiver
+        // (`c.mu`) is still the semantically meaningful target.
+        if let Some((receiver_text, method_suffix)) = try_chain_decompose(&callee) {
+            let receiver_is_chain = receiver_text.contains('.');
+            if receiver_is_chain {
                 for summary in self.resource_method_summaries {
-                    if summary.method_name.eq_ignore_ascii_case(method_suffix) {
-                        if let Some(sym) = self.get_sym(info, recv) {
-                            match summary.effect {
-                                ResourceEffect::Acquire => {
-                                    state.resource.set(sym, ResourceLifecycle::OPEN);
-                                    // Track class group for release matching
-                                    state.receiver_class_group.insert(sym, summary.class_group);
-                                    // Store original acquire span for finding attribution
-                                    state.proxy_acquire_spans.insert(sym, summary.original_span);
+                    if !summary.method_name.eq_ignore_ascii_case(method_suffix) {
+                        continue;
+                    }
+                    match summary.effect {
+                        ResourceEffect::Acquire => {
+                            state.chain_proxies.insert(
+                                receiver_text.to_string(),
+                                ChainProxyState {
+                                    lifecycle: ResourceLifecycle::OPEN,
+                                    class_group: summary.class_group,
+                                    acquire_span: summary.original_span,
+                                },
+                            );
+                        }
+                        ResourceEffect::Release => {
+                            if let Some(entry) = state.chain_proxies.get_mut(receiver_text) {
+                                if entry.class_group == summary.class_group
+                                    && entry.lifecycle.contains(ResourceLifecycle::OPEN)
+                                {
+                                    entry.lifecycle = ResourceLifecycle::CLOSED;
                                 }
-                                ResourceEffect::Release => {
-                                    // Only release if receiver was acquired by same class group
-                                    if state.receiver_class_group.get(&sym)
-                                        == Some(&summary.class_group)
-                                    {
-                                        let current = state.resource.get(sym);
-                                        if current.contains(ResourceLifecycle::OPEN) {
-                                            state.resource.set(sym, ResourceLifecycle::CLOSED);
-                                        }
-                                    }
+                            }
+                        }
+                    }
+                }
+            } else if !direct_acquire && !direct_release {
+                // Single-dot receiver (`<recv>.<method>`): existing
+                // SymbolId-based path. Gated on direct_acquire/release
+                // because it shares state with the direct paths above —
+                // running both would double-transition. Honour the
+                // explicit `info.call.receiver` when it's the same bare
+                // ident, otherwise fall back to the parsed receiver text.
+                let recv_name: &str = match info.call.receiver.as_deref() {
+                    Some(r) if !r.contains('.') && !r.contains('(') => r,
+                    _ => receiver_text,
+                };
+                for summary in self.resource_method_summaries {
+                    if !summary.method_name.eq_ignore_ascii_case(method_suffix) {
+                        continue;
+                    }
+                    let Some(sym) = self.get_sym(info, recv_name) else {
+                        continue;
+                    };
+                    match summary.effect {
+                        ResourceEffect::Acquire => {
+                            state.resource.set(sym, ResourceLifecycle::OPEN);
+                            state.receiver_class_group.insert(sym, summary.class_group);
+                            state.proxy_acquire_spans.insert(sym, summary.original_span);
+                        }
+                        ResourceEffect::Release => {
+                            if state.receiver_class_group.get(&sym) == Some(&summary.class_group) {
+                                let current = state.resource.get(sym);
+                                if current.contains(ResourceLifecycle::OPEN) {
+                                    state.resource.set(sym, ResourceLifecycle::CLOSED);
                                 }
                             }
                         }
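The rewritten proxy block depends entirely on `try_chain_decompose`, whose body is not shown in this excerpt. The tests added to `mod tests` pin its contract:

- It splits a dotted callee at the last `.`.
- It returns `None` when the callee has no dot or has an empty segment.
- It returns `None` when the callee contains any token outside the plain `<ident>.<ident>...` shape (`::`, `->`, parentheses, brackets, `?`, whitespace).

A minimal sketch that satisfies those cases follows. The names `try_chain_decompose_sketch` and `is_chain_receiver` are hypothetical. Restricting identifiers to ASCII `[A-Za-z0-9_]` is an assumption, not something the diff states.

```rust
/// Hypothetical sketch of the contract pinned by the `try_chain_decompose_*`
/// tests; not the crate's implementation. Assumes ASCII identifiers.
pub fn try_chain_decompose_sketch(callee: &str) -> Option<(&str, &str)> {
    // Bail on anything outside `[A-Za-z0-9_.]`: this rejects `::`, `->`,
    // `(`, `)`, `[`, `]`, `?` and whitespace in one pass.
    let simple = callee
        .chars()
        .all(|ch| ch.is_ascii_alphanumeric() || ch == '_' || ch == '.');
    if !simple {
        return None;
    }
    // Every dot-separated segment must be non-empty: rejects ".x.f",
    // "x..f", "x.f." and ".".
    if callee.split('.').any(str::is_empty) {
        return None;
    }
    // Receiver is everything before the last dot, method everything after.
    // A callee with no dot (e.g. "fopen") yields `None` here.
    callee.rsplit_once('.')
}

/// The routing split used in the diff: a receiver that itself contains a
/// dot goes to `chain_proxies`; a bare identifier keeps the SymbolId path.
pub fn is_chain_receiver(receiver_text: &str) -> bool {
    receiver_text.contains('.')
}
```

Splitting at the last dot is what makes `c.mu.Lock` acquire `c.mu` rather than the chain root `c`. That is the distinction the Phase 3 comment draws against the old root-receiver extraction.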
@@ -658,6 +846,7 @@ mod tests {
             resource_pairs: rules::resource_pairs(Lang::C),
             interner: &interner,
             resource_method_summaries: &[],
+            ptr_proxy_hints: None,
         };
 
         let info = NodeInfo {
@@ -693,6 +882,7 @@ mod tests {
             resource_pairs: rules::resource_pairs(Lang::C),
             interner: &interner,
             resource_method_summaries: &[],
+            ptr_proxy_hints: None,
         };
 
         let mut state = ProductState::initial();
@@ -730,6 +920,7 @@ mod tests {
             resource_pairs: rules::resource_pairs(Lang::C),
             interner: &interner,
             resource_method_summaries: &[],
+            ptr_proxy_hints: None,
         };
 
         let mut state = ProductState::initial();
@@ -768,6 +959,7 @@ mod tests {
             resource_pairs: rules::resource_pairs(Lang::C),
             interner: &interner,
             resource_method_summaries: &[],
+            ptr_proxy_hints: None,
         };
 
         let mut state = ProductState::initial();
@@ -840,6 +1032,7 @@ mod tests {
             resource_pairs: rules::resource_pairs(Lang::C),
             interner: &interner,
             resource_method_summaries: &[],
+            ptr_proxy_hints: None,
         };
 
         let mut state = ProductState::initial();
@@ -894,6 +1087,7 @@ mod tests {
             resource_pairs: rules::resource_pairs(Lang::C),
             interner: &interner,
             resource_method_summaries: &[],
+            ptr_proxy_hints: None,
         };
 
         let mut state = ProductState::initial();
@@ -1156,4 +1350,638 @@ mod tests {
             "is_authenticated"
         ));
     }
+
+    // ─────────────────────────────────────────────────────────────────
+    // Phase 3: chain-receiver decomposition + chain_proxies tracking
+    // ─────────────────────────────────────────────────────────────────
+    //
+    // These tests pin the contract that:
+    // 1. `try_chain_decompose` parses dotted callees into receiver +
+    // method, bailing on complex tokens.
+    // 2. The proxy-method routing in `apply_call` records chained
+    // receivers in `state.chain_proxies` (keyed by joined chain
+    // text) — independent from the chain root's `SymbolId`-based
+    // `state.receiver_class_group` entries.
+    // 3. Single-dot callees still flow through the existing SymbolId
+    // path (regression guard).
+    // 4. The deleted single-dot band-aid no longer suppresses chain
+    // cases — `c.mu.Lock()` now fires the chain-proxies path
+    // instead of being silently dropped.
+
+    #[test]
+    fn try_chain_decompose_basic_two_dots() {
+        // `c.mu.Lock` → receiver "c.mu", method "Lock". The receiver
+        // is a 1-element chain (one FieldProj at the SSA level).
+        let (recv, method) = try_chain_decompose("c.mu.Lock").unwrap();
+        assert_eq!(recv, "c.mu");
+        assert_eq!(method, "Lock");
+    }
+
+    #[test]
+    fn try_chain_decompose_three_dots() {
+        // `c.writer.header.set` → receiver "c.writer.header", method "set".
+        let (recv, method) = try_chain_decompose("c.writer.header.set").unwrap();
+        assert_eq!(recv, "c.writer.header");
+        assert_eq!(method, "set");
+    }
+
+    #[test]
+    fn try_chain_decompose_one_dot_keeps_bare_receiver() {
+        // `f.Close` → receiver "f" (bare ident), method "Close". The
+        // single-dot case still decomposes; apply_call routes it through
+        // the existing SymbolId-based path (not chain_proxies).
+        let (recv, method) = try_chain_decompose("f.Close").unwrap();
+        assert_eq!(recv, "f");
+        assert_eq!(method, "Close");
+    }
+
+    #[test]
+    fn try_chain_decompose_no_dot_returns_none() {
+        assert!(try_chain_decompose("Close").is_none());
+        assert!(try_chain_decompose("fopen").is_none());
+    }
+
+    #[test]
+    fn try_chain_decompose_complex_tokens_returns_none() {
+        // Each of these contains a token signaling complexity that breaks
+        // the simple `<ident>.<ident>...` shape; helper must bail to
+        // preserve the conservative behaviour the band-aid established.
+        for s in [
+            "Foo::bar::baz", // Rust path — `::` rules it out
+            "ptr->field.f", // C arrow operator
+            "obj.f().g", // intermediate call
+            "vec[0].field", // index expression
+            "obj?.f.g", // optional chain
+            "obj.f g", // whitespace
+            "c.writer.header()", // trailing parens (the gin/context shape)
+        ] {
+            assert!(
+                try_chain_decompose(s).is_none(),
+                "expected bail on complex callee {s}"
+            );
+        }
+    }
+
+    #[test]
+    fn try_chain_decompose_rejects_empty_segments() {
+        for s in [".x.f", "x..f", "x.f.", "."] {
+            assert!(try_chain_decompose(s).is_none(), "expected bail on {s}");
+        }
+    }
+
+    #[test]
+    fn chain_proxy_acquire_records_chain_text_not_root() {
+        // Phase 3 key behaviour: a chained-receiver acquire (`c.mu.Lock()`)
+        // records `c.mu` in `state.chain_proxies` and DOES NOT touch the
+        // SymbolId-keyed `receiver_class_group` for the chain root `c`.
+        let mut interner = SymbolInterner::new();
+        let _sym_c = interner.intern_scoped(None, "c");
+
+        let lock = ResourceMethodSummary {
+            method_name: "Lock".into(),
+            effect: ResourceEffect::Acquire,
+            class_group: crate::cfg::BodyId(7),
+            original_span: (10, 20),
+        };
+
+        let transfer = DefaultTransfer {
+            lang: Lang::Go,
+            resource_pairs: rules::resource_pairs(Lang::Go),
+            interner: &interner,
+            resource_method_summaries: std::slice::from_ref(&lock),
+            ptr_proxy_hints: None,
+        };
+
+        let info = NodeInfo {
+            kind: StmtKind::Call,
+            ast: AstMeta {
+                span: (0, 30),
+                ..Default::default()
+            },
+            taint: TaintMeta::default(),
+            call: CallMeta {
+                callee: Some("c.mu.Lock".into()),
+                ..Default::default()
+            },
+            ..Default::default()
+        };
+
+        let (state, events) =
+            transfer.apply(NodeIndex::new(0), &info, None, ProductState::initial());
+        assert!(events.is_empty());
+
+        // chain_proxies has the chain text entry.
+        assert!(
+            state.chain_proxies.contains_key("c.mu"),
+            "expected chain_proxies['c.mu'] entry; got {:?}",
+            state.chain_proxies.keys().collect::<Vec<_>>()
+        );
+        let entry = &state.chain_proxies["c.mu"];
+        assert_eq!(entry.lifecycle, ResourceLifecycle::OPEN);
+        assert_eq!(entry.class_group, crate::cfg::BodyId(7));
+        assert_eq!(entry.acquire_span, (10, 20));
+
+        // Root `c` is NOT marked in receiver_class_group — the gin/context FP
+        // the band-aid was guarding against can no longer reappear.
+        assert!(
+            state.receiver_class_group.is_empty(),
+            "chain root must not inherit proxy state; receiver_class_group was {:?}",
+            state.receiver_class_group
+        );
+    }
+
+    #[test]
+    fn chain_proxy_release_after_acquire_transitions_to_closed() {
+        // Acquire + matching Release on the same chain receiver +
+        // class group should transition the chain entry to CLOSED.
+        let mut interner = SymbolInterner::new();
+        let _sym_c = interner.intern_scoped(None, "c");
+        let class_group = crate::cfg::BodyId(11);
+
+        let summaries = vec![
+            ResourceMethodSummary {
+                method_name: "Lock".into(),
+                effect: ResourceEffect::Acquire,
+                class_group,
+                original_span: (0, 10),
+            },
+            ResourceMethodSummary {
+                method_name: "Unlock".into(),
+                effect: ResourceEffect::Release,
+                class_group,
+                original_span: (20, 30),
+            },
+        ];
+
+        let transfer = DefaultTransfer {
+            lang: Lang::Go,
+            resource_pairs: rules::resource_pairs(Lang::Go),
+            interner: &interner,
+            resource_method_summaries: &summaries,
+            ptr_proxy_hints: None,
+        };
+
+        let lock_info = NodeInfo {
+            kind: StmtKind::Call,
+            ast: AstMeta {
+                span: (0, 10),
+                ..Default::default()
+            },
+            taint: TaintMeta::default(),
+            call: CallMeta {
+                callee: Some("c.mu.Lock".into()),
+                ..Default::default()
+            },
+            ..Default::default()
+        };
+        let (state, _) =
+            transfer.apply(NodeIndex::new(0), &lock_info, None, ProductState::initial());
+        assert_eq!(
+            state.chain_proxies["c.mu"].lifecycle,
+            ResourceLifecycle::OPEN
+        );
+
+        let unlock_info = NodeInfo {
+            kind: StmtKind::Call,
+            ast: AstMeta {
+                span: (20, 30),
+                ..Default::default()
+            },
+            taint: TaintMeta::default(),
+            call: CallMeta {
+                callee: Some("c.mu.Unlock".into()),
+                ..Default::default()
+            },
+            ..Default::default()
+        };
+        let (state, _) = transfer.apply(NodeIndex::new(1), &unlock_info, None, state);
+        assert_eq!(
+            state.chain_proxies["c.mu"].lifecycle,
+            ResourceLifecycle::CLOSED
+        );
+    }
+
+    #[test]
+    fn chain_proxy_distinct_chains_dont_collide() {
+        // `c.mu.Lock()` and `c.other.Lock()` are independent chain
+        // receivers — each gets its own entry in chain_proxies.
+        let interner = SymbolInterner::new();
+        let class_group = crate::cfg::BodyId(3);
+
+        let lock = ResourceMethodSummary {
+            method_name: "Lock".into(),
+            effect: ResourceEffect::Acquire,
+            class_group,
+            original_span: (0, 0),
+        };
+        let transfer = DefaultTransfer {
+            lang: Lang::Go,
+            resource_pairs: rules::resource_pairs(Lang::Go),
+            interner: &interner,
+            resource_method_summaries: std::slice::from_ref(&lock),
+            ptr_proxy_hints: None,
+        };
+
+        let mk_call = |callee: &str| NodeInfo {
+            kind: StmtKind::Call,
+            ast: AstMeta {
+                span: (0, 0),
+                ..Default::default()
+            },
+            taint: TaintMeta::default(),
+            call: CallMeta {
+                callee: Some(callee.into()),
+                ..Default::default()
+            },
+            ..Default::default()
+        };
+        let (state, _) = transfer.apply(
+            NodeIndex::new(0),
+            &mk_call("c.mu.Lock"),
+            None,
+            ProductState::initial(),
+        );
+        let (state, _) = transfer.apply(NodeIndex::new(1), &mk_call("c.other.Lock"), None, state);
+        assert!(state.chain_proxies.contains_key("c.mu"));
+        assert!(state.chain_proxies.contains_key("c.other"));
+        assert_eq!(state.chain_proxies.len(), 2);
+    }
+
+    #[test]
+    fn single_dot_proxy_acquire_uses_symbol_id_path() {
+        // REGRESSION: single-dot callees keep the existing SymbolId-based
+        // path — `f.acquireMine()` records against
+        // `receiver_class_group[sym_f]`, NOT `chain_proxies["f"]`. This
+        // preserves all existing 1-dot proxy semantics (leak detection,
+        // finding attribution).
+        //
+        // We use an unusual method name so the direct-pair matcher
+        // doesn't fire first (Go's resource_pairs cover `.Close`,
+        // `.close`, etc., which would short-circuit before the proxy
+        // routing).
+        let mut interner = SymbolInterner::new();
+        let sym_f = interner.intern_scoped(None, "f");
+        let class_group = crate::cfg::BodyId(2);
+
+        let acquire = ResourceMethodSummary {
+            method_name: "acquireMine".into(),
+            effect: ResourceEffect::Acquire,
+            class_group,
+            original_span: (0, 0),
+        };
+        let transfer = DefaultTransfer {
+            lang: Lang::Go,
+            resource_pairs: rules::resource_pairs(Lang::Go),
+            interner: &interner,
+            resource_method_summaries: std::slice::from_ref(&acquire),
+            ptr_proxy_hints: None,
+        };
+        let info = NodeInfo {
+            kind: StmtKind::Call,
+            ast: AstMeta {
+                span: (0, 0),
+                ..Default::default()
+            },
+            taint: TaintMeta::default(),
+            call: CallMeta {
+                callee: Some("f.acquireMine".into()),
+                receiver: Some("f".into()),
+                ..Default::default()
+            },
+            ..Default::default()
+        };
+        let (state, _) = transfer.apply(NodeIndex::new(0), &info, None, ProductState::initial());
+
+        // SymbolId path fired: receiver_class_group has the SymbolId entry.
+        assert_eq!(
+            state.receiver_class_group.get(&sym_f),
+            Some(&class_group),
+            "single-dot must use SymbolId path"
+        );
+        // chain_proxies stays empty: this is NOT a chain receiver.
+        assert!(
+            state.chain_proxies.is_empty(),
+            "single-dot must not populate chain_proxies; got {:?}",
+            state.chain_proxies.keys().collect::<Vec<_>>()
+        );
+    }
+
+    #[test]
+    fn complex_callee_does_not_record_proxy() {
+        // REGRESSION: callees with parens / `::` / `[` / `?` are
+        // unparseable as chain receivers. The helper bails, no proxy
+        // entry is recorded anywhere. Matches the conservative behaviour
+        // the band-aid established.
+        let interner = SymbolInterner::new();
+        let class_group = crate::cfg::BodyId(0);
+        let lock = ResourceMethodSummary {
+            method_name: "Lock".into(),
+            effect: ResourceEffect::Acquire,
+            class_group,
+            original_span: (0, 0),
+        };
+        let transfer = DefaultTransfer {
+            lang: Lang::Go,
+            resource_pairs: rules::resource_pairs(Lang::Go),
+            interner: &interner,
+            resource_method_summaries: std::slice::from_ref(&lock),
+            ptr_proxy_hints: None,
+        };
+        for callee in ["c.writer.header().Lock", "Foo::bar::Lock", "c[i].mu.Lock"] {
+            let info = NodeInfo {
+                kind: StmtKind::Call,
+                ast: AstMeta {
+                    span: (0, 0),
+                    ..Default::default()
+                },
+                taint: TaintMeta::default(),
+                call: CallMeta {
+                    callee: Some(callee.into()),
+                    ..Default::default()
+                },
+                ..Default::default()
+            };
+            let (state, _) =
+                transfer.apply(NodeIndex::new(0), &info, None, ProductState::initial());
+            assert!(
+                state.chain_proxies.is_empty() && state.receiver_class_group.is_empty(),
+                "complex callee {callee} should not record any proxy state; chain={:?} root={:?}",
+                state.chain_proxies.keys().collect::<Vec<_>>(),
+                state.receiver_class_group.keys().collect::<Vec<_>>()
+            );
+        }
+    }
+
+    #[test]
+    fn chain_proxy_lattice_join_unions_keys() {
+        // Sanity check: the lattice join unions chain_proxies keys.
+        // Branch A: `c.mu` OPEN. Branch B: `c.other` OPEN. Join must
+        // contain both — this is the dataflow-correctness invariant
+        // for chain tracking across branches.
+        use crate::state::lattice::Lattice;
+        let mut a = ProductState::initial();
+        let mut b = ProductState::initial();
+        a.chain_proxies.insert(
+            "c.mu".into(),
+            ChainProxyState {
+                lifecycle: ResourceLifecycle::OPEN,
+                class_group: crate::cfg::BodyId(1),
+                acquire_span: (0, 0),
+            },
+        );
+        b.chain_proxies.insert(
+            "c.other".into(),
+            ChainProxyState {
+                lifecycle: ResourceLifecycle::OPEN,
+                class_group: crate::cfg::BodyId(2),
+                acquire_span: (10, 20),
+            },
+        );
+        let joined = a.join(&b);
+        assert_eq!(joined.chain_proxies.len(), 2);
+        assert!(joined.chain_proxies.contains_key("c.mu"));
+        assert!(joined.chain_proxies.contains_key("c.other"));
+    }
+
+    #[test]
+    fn chain_proxy_lattice_join_merges_lifecycle() {
+        // Same chain key on two branches — the lifecycle is OR-joined
+        // (OPEN ∪ CLOSED). Mirrors the `ResourceLifecycle::join`
+        // bitflag-or semantics already used for SymbolId-based tracking.
+        use crate::state::lattice::Lattice;
+        let mut a = ProductState::initial();
+        let mut b = ProductState::initial();
+        a.chain_proxies.insert(
+            "c.mu".into(),
+            ChainProxyState {
+                lifecycle: ResourceLifecycle::OPEN,
+                class_group: crate::cfg::BodyId(1),
+                acquire_span: (0, 0),
+            },
+        );
+        b.chain_proxies.insert(
+            "c.mu".into(),
+            ChainProxyState {
+                lifecycle: ResourceLifecycle::CLOSED,
+                class_group: crate::cfg::BodyId(1),
+                acquire_span: (0, 0),
+            },
+        );
+        let joined = a.join(&b);
+        assert_eq!(joined.chain_proxies.len(), 1);
+        let lc = joined.chain_proxies["c.mu"].lifecycle;
+        assert!(lc.contains(ResourceLifecycle::OPEN));
+        assert!(lc.contains(ResourceLifecycle::CLOSED));
+    }
+
+    // ─────────────────────────────────────────────────────────────────
+    // Pointer-analysis Phase 2: PtrProxyHint::FieldOnly routes
+    // single-dot proxy-acquire to chain_proxies, suppressing the
+    // SymbolId path that would otherwise mark the field-aliased local
+    // as a leakable resource.
+    // ─────────────────────────────────────────────────────────────────
+
+    #[test]
+    fn field_only_hint_routes_single_dot_acquire_to_chain_proxies() {
+        // Models `m := c.mu; m.Lock()` — `m`'s pt set is `{Field(SelfParam, mu)}`,
+        // so PtrProxyHint::FieldOnly applies. The acquire must record
+        // `m` in chain_proxies, NOT in receiver_class_group, so the
+        // leak detector does not later flag `m` as an OPEN-at-exit
+        // resource (it lives inside the function and never escapes).
+        let mut interner = SymbolInterner::new();
+        let _sym_m = interner.intern_scoped(None, "m");
+        let class_group = crate::cfg::BodyId(2);
+
+        let acquire = ResourceMethodSummary {
+            method_name: "Lock".into(),
+            effect: ResourceEffect::Acquire,
+            class_group,
+            original_span: (0, 10),
+        };
+
+        let mut hints = std::collections::HashMap::new();
+        hints.insert("m".to_string(), crate::pointer::PtrProxyHint::FieldOnly);
+
+        let transfer = DefaultTransfer {
+            lang: Lang::Go,
+            resource_pairs: rules::resource_pairs(Lang::Go),
+            interner: &interner,
+            resource_method_summaries: std::slice::from_ref(&acquire),
+            ptr_proxy_hints: Some(&hints),
+        };
+
+        let info = NodeInfo {
+            kind: StmtKind::Call,
+            ast: AstMeta {
+                span: (0, 10),
+                ..Default::default()
+            },
+            taint: TaintMeta::default(),
+            call: CallMeta {
+                callee: Some("m.Lock".into()),
+                receiver: Some("m".into()),
+                ..Default::default()
+            },
+            ..Default::default()
+        };
+        let (state, events) =
+            transfer.apply(NodeIndex::new(0), &info, None, ProductState::initial());
+        assert!(events.is_empty());
+        assert!(
+            state.chain_proxies.contains_key("m"),
+            "FieldOnly hint should route `m.Lock()` into chain_proxies; got {:?}",
+            state.chain_proxies.keys().collect::<Vec<_>>()
+        );
+        assert!(
+            state.receiver_class_group.is_empty(),
+            "FieldOnly hint must not record SymbolId proxy entry; got {:?}",
+            state.receiver_class_group.keys().collect::<Vec<_>>()
+        );
+        let entry = &state.chain_proxies["m"];
+        assert_eq!(entry.lifecycle, ResourceLifecycle::OPEN);
+        assert_eq!(entry.class_group, class_group);
+    }
+
+    #[test]
+    fn field_only_hint_release_transitions_chain_entry_to_closed() {
+        // Acquire + Release pair on the field-aliased local both route
+        // through chain_proxies — the entry transitions OPEN → CLOSED
+        // exactly as the existing chain-receiver path does.
+        let mut interner = SymbolInterner::new();
+        let _sym_m = interner.intern_scoped(None, "m");
+        let class_group = crate::cfg::BodyId(11);
+
+        let summaries = vec![
+            ResourceMethodSummary {
+                method_name: "Lock".into(),
+                effect: ResourceEffect::Acquire,
+                class_group,
+                original_span: (0, 10),
+            },
+            ResourceMethodSummary {
+                method_name: "Unlock".into(),
+                effect: ResourceEffect::Release,
+                class_group,
+                original_span: (20, 30),
+            },
+        ];
+
+        let mut hints = std::collections::HashMap::new();
+        hints.insert("m".to_string(), crate::pointer::PtrProxyHint::FieldOnly);
+
+        let transfer = DefaultTransfer {
+            lang: Lang::Go,
+            resource_pairs: rules::resource_pairs(Lang::Go),
+            interner: &interner,
+            resource_method_summaries: &summaries,
+            ptr_proxy_hints: Some(&hints),
+        };
+
+        let lock_info = NodeInfo {
+            kind: StmtKind::Call,
+            call: CallMeta {
+                callee: Some("m.Lock".into()),
+                receiver: Some("m".into()),
+                ..Default::default()
+            },
+            ..Default::default()
+        };
+        let (state, _) =
+            transfer.apply(NodeIndex::new(0), &lock_info, None, ProductState::initial());
+        assert_eq!(state.chain_proxies["m"].lifecycle, ResourceLifecycle::OPEN);
+
+        let unlock_info = NodeInfo {
+            kind: StmtKind::Call,
+            call: CallMeta {
+                callee: Some("m.Unlock".into()),
+                receiver: Some("m".into()),
+                ..Default::default()
+            },
+            ..Default::default()
+        };
+        let (state, _) = transfer.apply(NodeIndex::new(1), &unlock_info, None, state);
+        assert_eq!(
+            state.chain_proxies["m"].lifecycle,
+            ResourceLifecycle::CLOSED
+        );
+    }
+
+    #[test]
+    fn no_hint_falls_through_to_existing_symbol_id_path() {
+        // REGRESSION: when `ptr_proxy_hints` is `None`, the single-dot
+        // proxy-acquire branch behaves exactly as today — the SymbolId
+        // path fires, `chain_proxies` stays empty. Strict-additive
+        // contract: pointer analysis disabled ⇒ no behavioural change.
+        let mut interner = SymbolInterner::new();
+        let sym_f = interner.intern_scoped(None, "f");
+        let class_group = crate::cfg::BodyId(3);
+
+        let acquire = ResourceMethodSummary {
+            method_name: "acquireMine".into(),
+            effect: ResourceEffect::Acquire,
+            class_group,
+            original_span: (0, 0),
+        };
+        let transfer = DefaultTransfer {
+            lang: Lang::Go,
+            resource_pairs: rules::resource_pairs(Lang::Go),
+            interner: &interner,
+            resource_method_summaries: std::slice::from_ref(&acquire),
+            ptr_proxy_hints: None,
+        };
+        let info = NodeInfo {
+            kind: StmtKind::Call,
+            call: CallMeta {
+                callee: Some("f.acquireMine".into()),
+                receiver: Some("f".into()),
+                ..Default::default()
+            },
+            ..Default::default()
+        };
+        let (state, _) = transfer.apply(NodeIndex::new(0), &info, None, ProductState::initial());
+        assert_eq!(
+            state.receiver_class_group.get(&sym_f),
+            Some(&class_group),
+            "no hint ⇒ SymbolId path"
+        );
+        assert!(state.chain_proxies.is_empty());
+    }
+
+    #[test]
+    fn empty_hint_map_does_not_redirect() {
+        // REGRESSION: an empty hint map means "every name resolves to
+        // PtrProxyHint::Other". The single-dot branch must fall
+        // through to the SymbolId path — not silently route to
+        // chain_proxies because the map happened to be empty.
+        let mut interner = SymbolInterner::new();
+        let sym_f = interner.intern_scoped(None, "f");
+        let class_group = crate::cfg::BodyId(3);
+        let acquire = ResourceMethodSummary {
+            method_name: "acquireMine".into(),
+            effect: ResourceEffect::Acquire,
+            class_group,
+            original_span: (0, 0),
+        };
+        let hints: std::collections::HashMap<String, crate::pointer::PtrProxyHint> =
+            std::collections::HashMap::new();
+        let transfer = DefaultTransfer {
+            lang: Lang::Go,
+            resource_pairs: rules::resource_pairs(Lang::Go),
+            interner: &interner,
+            resource_method_summaries: std::slice::from_ref(&acquire),
+            ptr_proxy_hints: Some(&hints),
+        };
+        let info = NodeInfo {
+            kind: StmtKind::Call,
+            call: CallMeta {
+                callee: Some("f.acquireMine".into()),
+                receiver: Some("f".into()),
+                ..Default::default()
+            },
+            ..Default::default()
+        };
+        let (state, _) = transfer.apply(NodeIndex::new(0), &info, None, ProductState::initial());
+        assert_eq!(state.receiver_class_group.get(&sym_f), Some(&class_group));
+        assert!(state.chain_proxies.is_empty());
+    }
 }
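The two lattice tests above require only two properties of `join`:

- It unions the `chain_proxies` keys.
- It ORs the lifecycle bits when the same key appears on both branches.

The `ProductState` join itself is not in this excerpt, so the following is a self-contained sketch of a join with exactly those two properties. `Lifecycle`, `ChainEntry` and `join_chain_proxies` are hypothetical stand-ins. On a key collision the sketch keeps the left-hand `class_group` and `acquire_span`; that is an assumption, and the tests do not pin it down.

```rust
use std::collections::HashMap;

/// Bitflag stand-in for `ResourceLifecycle` (assumption: OPEN and CLOSED
/// are distinct bits, so a joined state can hold both at once).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Lifecycle(u8);

impl Lifecycle {
    pub const OPEN: Lifecycle = Lifecycle(0b01);
    pub const CLOSED: Lifecycle = Lifecycle(0b10);

    pub fn contains(self, other: Lifecycle) -> bool {
        (self.0 & other.0) == other.0
    }

    pub fn union(self, other: Lifecycle) -> Lifecycle {
        Lifecycle(self.0 | other.0)
    }
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ChainEntry {
    pub lifecycle: Lifecycle,
    pub class_group: u32,
    pub acquire_span: (usize, usize),
}

/// Join two branch states: union of keys, bitwise-OR of lifecycles on
/// shared keys. On a collision the left entry's metadata is kept.
pub fn join_chain_proxies(
    a: &HashMap<String, ChainEntry>,
    b: &HashMap<String, ChainEntry>,
) -> HashMap<String, ChainEntry> {
    let mut out = a.clone();
    for (key, rhs) in b {
        out.entry(key.clone())
            .and_modify(|lhs| lhs.lifecycle = lhs.lifecycle.union(rhs.lifecycle))
            .or_insert_with(|| rhs.clone());
    }
    out
}
```

Because the OR-join can produce `OPEN | CLOSED`, a downstream leak check has to decide how to treat a "maybe still open" chain. The sketch leaves that decision to the consumer.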
@@ -112,32 +112,7 @@ impl<'a> SinkSiteLocator<'a> {
     }
 }
 
-/// Extract the source line containing `byte_offset`, trimmed and capped at
-/// 120 chars. Returns `None` when the offset is out of range or the line
-/// is entirely blank after trimming.
-pub(crate) fn line_snippet(src: &[u8], byte_offset: usize) -> Option<String> {
-    if byte_offset >= src.len() {
-        return None;
-    }
-    let line_start = src[..byte_offset]
-        .iter()
-        .rposition(|&b| b == b'\n')
-        .map_or(0, |p| p + 1);
-    let line_end = src[byte_offset..]
-        .iter()
-        .position(|&b| b == b'\n')
-        .map_or(src.len(), |p| byte_offset + p);
-    let line = std::str::from_utf8(&src[line_start..line_end]).ok()?;
-    let trimmed = line.trim();
-    if trimmed.is_empty() {
-        return None;
-    }
-    if trimmed.len() > 120 {
-        Some(format!("{}...", &trimmed[..120]))
-    } else {
-        Some(trimmed.to_string())
-    }
-}
+pub(crate) use crate::utils::snippet::line_snippet;
 
 /// Union two `SmallVec<[SinkSite; 1]>` lists with `(file_rel, line, col,
 /// cap)` dedup. Preserves insertion order of `existing` then appends any
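The removed `line_snippet` documents a "120 chars" cap but measures and slices in bytes. `trimmed.len()` is a byte count, and `&trimmed[..120]` panics if byte 120 falls inside a multi-byte UTF-8 character. The relocated `crate::utils::snippet::line_snippet` is not shown here, so it is unknown whether it already handles this. What follows is a minimal sketch of a char-boundary-safe cut that keeps the same 120-byte budget; `line_snippet_sketch` is a hypothetical name.

```rust
/// Hypothetical variant of `line_snippet` that never splits a UTF-8
/// character when truncating. Same contract otherwise: `None` for an
/// out-of-range offset or a line that is blank after trimming.
pub fn line_snippet_sketch(src: &[u8], byte_offset: usize) -> Option<String> {
    const MAX_BYTES: usize = 120;
    if byte_offset >= src.len() {
        return None;
    }
    // Start of the line: one past the previous '\n', or 0.
    let line_start = src[..byte_offset]
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |p| p + 1);
    // End of the line: the next '\n' at or after the offset, or EOF.
    let line_end = src[byte_offset..]
        .iter()
        .position(|&b| b == b'\n')
        .map_or(src.len(), |p| byte_offset + p);
    let line = std::str::from_utf8(&src[line_start..line_end]).ok()?;
    let trimmed = line.trim();
    if trimmed.is_empty() {
        return None;
    }
    if trimmed.len() > MAX_BYTES {
        // Back off to the nearest char boundary at or below the budget;
        // index 0 is always a boundary, so this loop terminates.
        let mut cut = MAX_BYTES;
        while !trimmed.is_char_boundary(cut) {
            cut -= 1;
        }
        Some(format!("{}...", &trimmed[..cut]))
    } else {
        Some(trimmed.to_string())
    }
}
```

For example, an `a` followed by sixty `é` characters is 121 bytes, and byte 120 is the second half of the last `é`. The byte slice in the removed code would panic on that input, while the sketch cuts at byte 119.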
@@ -403,6 +378,31 @@ pub struct FuncSummary {
    /// alias.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub rust_wildcards: Option<Vec<String>>,

    /// Per-file class / trait / interface hierarchy edges captured at
    /// CFG-construction time. Each entry is
    /// `(sub_container, super_container)` after language-specific
    /// normalisation:
    ///
    /// * Java `class X extends Y` → `(X, Y)`; `implements I, J` → `(X, I)`, `(X, J)`
    /// * Rust `impl Trait for Type` → `(Type, Trait)`
    /// * TypeScript `class X extends Y implements I` → `(X, Y)`, `(X, I)`
    /// * Python `class X(A, B)` → `(X, A)`, `(X, B)`
    /// * PHP `class X extends Y implements I` → `(X, Y)`, `(X, I)`
    /// * Ruby `class X < Y` → `(X, Y)`
    /// * C++ `class X : public Y` → `(X, Y)`
    ///
    /// Empty for files with no declared inheritance / impl
    /// relationships and for Go (which uses implicit interface
    /// satisfaction — Phase 6 does not try to compute it).
    ///
    /// **Per-file duplication.** Every `FuncSummary` produced from a
    /// given file carries the **same** `hierarchy_edges` vector so the
    /// information survives summary-by-summary persistence to SQLite.
    /// `merge_summaries` deduplicates downstream when building
    /// [`crate::callgraph::TypeHierarchyIndex`].
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub hierarchy_edges: Vec<(String, String)>,
}

// ── Cap conversion helpers ──────────────────────────────────────────────

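Because every summary from a file carries the same edge vector, whatever builds the hierarchy index has to absorb those duplicates. The following is a minimal sketch of that step, collapsing `(sub_container, super_container)` pairs into a super-to-subs lookup. `HierarchyIndex`, `build`, and `subs_of` are hypothetical stand-ins, not the crate's `TypeHierarchyIndex` API. Keying by a `(lang, super)` string pair and returning sub-types transitively in breadth-first order (alphabetical within each level) are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical stand-in for a type-hierarchy index: maps
/// `(lang, super_container)` to the set of direct sub-containers.
#[derive(Default)]
struct HierarchyIndex {
    direct_subs: BTreeMap<(String, String), BTreeSet<String>>,
}

impl HierarchyIndex {
    /// Build from per-summary edge vectors. Every summary from one file
    /// carries the same edges; the sets absorb the duplicates.
    fn build<'a>(
        per_summary: impl IntoIterator<Item = (&'a str, &'a [(String, String)])>,
    ) -> Self {
        let mut idx = Self::default();
        for (lang, edges) in per_summary {
            for (sub, sup) in edges {
                idx.direct_subs
                    .entry((lang.to_string(), sup.clone()))
                    .or_default()
                    .insert(sub.clone());
            }
        }
        idx
    }

    /// Transitive sub-containers of `sup`, breadth-first; `seen` guards
    /// against cycles and keeps `sup` itself out of the result.
    fn subs_of(&self, lang: &str, sup: &str) -> Vec<String> {
        let mut out = Vec::new();
        let mut seen = BTreeSet::from([sup.to_string()]);
        let mut queue = VecDeque::from([sup.to_string()]);
        while let Some(cur) = queue.pop_front() {
            if let Some(subs) = self.direct_subs.get(&(lang.to_string(), cur)) {
                for s in subs {
                    if seen.insert(s.clone()) {
                        out.push(s.clone());
                        queue.push_back(s.clone());
                    }
                }
            }
        }
        out
    }
}

fn main() {
    // Two summaries from the same Java file carry identical edge lists.
    let edges = vec![
        ("Dog".to_string(), "Animal".to_string()),
        ("Puppy".to_string(), "Dog".to_string()),
    ];
    let idx = HierarchyIndex::build([("java", edges.as_slice()), ("java", edges.as_slice())]);
    println!("{:?}", idx.subs_of("java", "Animal")); // ["Dog", "Puppy"]
}
```

Sets are used because the duplication is expected and harmless. Dedup happens once at build time, so lookups never have to account for it.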
@@ -562,6 +562,20 @@ pub struct GlobalSummaries {
    /// pass 1 and consumed by
    /// [`crate::auth_analysis::run_auth_analysis`] during pass 2.
    auth_by_key: HashMap<FuncKey, crate::auth_analysis::model::AuthCheckSummary>,
    /// Phase 6 type hierarchy index for runtime virtual-dispatch fan-out.
    ///
    /// Installed by [`Self::install_hierarchy`] after pass 1 from the
    /// merged `FuncSummary::hierarchy_edges` vectors. Consumed by
    /// [`Self::resolve_callee_widened`] during pass 2 so the taint
    /// engine sees every concrete implementer of a method when the
    /// receiver is statically typed as a super-class / trait /
    /// interface — recovering the dispatch precision that today's
    /// single-result [`Self::resolve_callee`] discards.
    ///
    /// `None` until installed: every consumer treats `None` as
    /// "fall through to today's bare resolution", so the index is
    /// strictly additive.
    hierarchy: Option<crate::callgraph::TypeHierarchyIndex>,
}

impl GlobalSummaries {

@@ -858,6 +872,13 @@ impl GlobalSummaries {
        for (key, auth_sum) in other.auth_by_key {
            self.auth_by_key.insert(key, auth_sum);
        }
        // Hierarchy index: invalidate after a merge so the next consumer
        // sees a freshly built view that includes `other`'s edges. The
        // alternative, point-merging two indexes, is order-dependent when
        // the same `(lang, super)` key carries different sub-orderings in
        // each input; a rebuild is O(n) over `by_key.iter()` and keeps the
        // summaries as the single source of truth.
        self.hierarchy = None;
    }

    /// Insert an SSA summary.

@@ -873,8 +894,74 @@ impl GlobalSummaries {
    /// functions — we synthesize a disambig so both are kept. Silent
    /// replacement in that case would drop one function's cross-file
    /// taint signal entirely, which the caller cannot recover.
    ///
    /// Summaries with parameter-index references at or above
    /// `key.arity` need special handling. Such indices come from
    /// synthetic SSA `Param` ops emitted by scoped lowering for
    /// **external captures** (free identifiers like `this`, module
    /// imports, or unresolved method names). They are useful for
    /// *intra-file* pass-2 analysis, because the caller's implicit-uses
    /// argument group at the same index aligns with the synthetic Param.
    /// Left alone, however, they make `ssa_summary_fits_arity` reject
    /// the summary, and `reconcile_ssa_summary_key` would then synthesise
    /// a disambig that uncouples the SSA FuncKey from the matching
    /// FuncSummary FuncKey (audit gap A.2.1.G1,
    /// `project_typed_callgraph_audit_gap_ssa_disambig.md`). Such a
    /// summary is therefore kept, untrimmed, at the original key unless
    /// a well-formed entry already occupies it; the resolution rule is
    /// spelled out in the body.
    pub fn insert_ssa(&mut self, key: FuncKey, summary: SsaFuncSummary) {
        // The summary may reference a parameter index ≥ `key.arity` when
        // scoped SSA lowering synthesised `Param` ops for **external
        // captures** (free identifiers like `this`, module imports,
        // unresolved method names) — see audit gap A.2.1.G1
        // (`project_typed_callgraph_audit_gap_ssa_disambig.md`). These
        // synthetic refs are useful inside the file they were extracted
        // in (the caller's implicit-uses argument group at the same
        // index aligns with the synthetic Param) and stay useful when
        // resolved cross-file by name from this map (the same
        // implicit-uses alignment applies). But they would trip
        // [`ssa_summary_fits_arity`] inside [`reconcile_ssa_summary_key`],
        // forcing a synthetic disambig that uncouples the SSA FuncKey
        // from the matching FuncSummary FuncKey — and Phase 3's
        // `summaries.get_ssa(caller_key)` lookup (consuming
        // `typed_call_receivers` at the FuncSummary-aligned key) would
        // miss.
        //
        // Resolution rule (applies only when `summary` does not fit
        // arity):
        //
        // * **No existing entry, or existing entry also has out-of-range
        //   refs** — keep the (untrimmed) summary at the original key,
        //   bypassing the disambig synthesis. Phase 3 finds the entry
        //   under the FuncSummary's own disambig; cross-file resolvers
        //   find the same entry with its full per-param signal
        //   (closures, lambdas, captured-var sinks). The "existing also
        //   has out-of-range refs" branch covers the iterative-rescan
        //   case where round 2's incoming summary lands on top of round
        //   1's already-installed copy of the same function.
        //
        // * **Existing entry fits arity (legit) but new doesn't** — fall
        //   back to the disambig synthesis. This preserves the
        //   `insert_ssa_arity_overflow_rekeys` invariant: a structurally
        //   incompatible incoming summary (different function sharing
        //   name + container + arity, with param refs at indices that
        //   don't even exist in the legitimate function) cannot
        //   dethrone the existing entry by silent overwrite. Both
        //   summaries survive — the existing one at the original key,
        //   the new one at the synthesised disambig.
        let key = if key.arity.is_some() && !ssa_summary_fits_arity(&summary, key.arity) {
            let existing_also_overflows = self
                .ssa_by_key
                .get(&key)
                .is_some_and(|existing| !ssa_summary_fits_arity(existing, key.arity));
            let existing_present = self.ssa_by_key.contains_key(&key);
            if !existing_present || existing_also_overflows {
                key
            } else {
                self.reconcile_ssa_summary_key(key, &summary)
            }
        } else {
            self.reconcile_ssa_summary_key(key, &summary)
        };
        self.ssa_by_key.insert(key, summary);
    }

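The key-selection rule in `insert_ssa` reduces to a small decision table. The sketch below models it as a pure function over precomputed flags. `KeyChoice` and `choose_key` are hypothetical names, and the real method consults `ssa_summary_fits_arity` and `reconcile_ssa_summary_key` directly instead of taking booleans.

```rust
/// Hypothetical, simplified model of the key decision in `insert_ssa`.
#[derive(Debug, PartialEq)]
enum KeyChoice {
    /// Keep the incoming summary at the caller-supplied key.
    Original,
    /// Defer to normal reconciliation, which may synthesise a disambig.
    Reconcile,
}

/// * `arity_known`: the key carries `Some(arity)`.
/// * `incoming_fits`: every param index in the incoming summary is below the arity.
/// * `existing_fits`: `None` when the key is vacant, otherwise whether the
///   already-installed summary fits the arity.
fn choose_key(arity_known: bool, incoming_fits: bool, existing_fits: Option<bool>) -> KeyChoice {
    if !arity_known || incoming_fits {
        return KeyChoice::Reconcile;
    }
    match existing_fits {
        // Vacant slot, or an earlier round of the same overflowing function.
        None | Some(false) => KeyChoice::Original,
        // A well-formed entry owns the key: do not overwrite it.
        Some(true) => KeyChoice::Reconcile,
    }
}

fn main() {
    for (label, existing) in [("vacant", None), ("overflowing", Some(false)), ("legit", Some(true))] {
        println!("{label}: {:?}", choose_key(true, false, existing));
    }
}
```

Only the `Some(true)` row can split an overflowing summary onto a synthesised disambig. This is the case the `insert_ssa_arity_overflow_rekeys` invariant protects.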
@@ -1363,6 +1450,148 @@ impl GlobalSummaries {
            _ => CalleeResolution::Ambiguous(same_ns.into_iter().cloned().collect()),
        }
    }

    /// Install / refresh the type-hierarchy index from the currently
    /// loaded summaries. Idempotent — calling twice rebuilds.
    ///
    /// Call this once after pass-1 merge (and again whenever
    /// summary state changes in a way that could affect virtual
    /// dispatch — typically: after the call-graph is rebuilt mid-fixed-point).
    /// `merge()` automatically invalidates so a forgotten reinstall
    /// degrades to today's behaviour rather than a stale lookup.
    pub fn install_hierarchy(&mut self) {
        let h = crate::callgraph::TypeHierarchyIndex::build(self);
        self.hierarchy = Some(h);
    }

    /// Borrow the installed hierarchy index, if any.
    pub fn hierarchy(&self) -> Option<&crate::callgraph::TypeHierarchyIndex> {
        self.hierarchy.as_ref()
    }

    /// Hard cap on hierarchy fan-out from a single call site — see
    /// [`Self::resolve_callee_widened`] for rationale. Public for tests
    /// that need to assert cap behaviour without hard-coding the value.
    pub const MAX_HIERARCHY_FANOUT: usize = 8;

    /// Resolve a call site to *every* candidate FuncKey reachable
    /// through type-hierarchy fan-out. This is the runtime counterpart
    /// of the [`crate::callgraph::TypeHierarchyIndex::resolve_with_hierarchy`]
    /// step that the call-graph builder applies at edge-construction time.
    ///
    /// Behaviour:
    ///
    /// * `receiver_type = None` → falls through to
    ///   [`Self::resolve_callee`]; returns `[k]` on `Resolved`, `[]`
    ///   otherwise.
    /// * `receiver_type = Some(rt)` and either no hierarchy is installed
    ///   or `rt` has no recorded sub-types → identical fall-through;
    ///   the hierarchy lookup is a no-op.
    /// * `receiver_type = Some(rt)` with sub-types `s1, s2, …` →
    ///   union of `lookup_qualified` for `(rt, s1, s2, …)` after arity
    ///   filtering. Result is dedup'd in insertion order
    ///   (direct-receiver match first, then each sub-type's match).
    ///
    /// Hard cap: at most [`Self::MAX_HIERARCHY_FANOUT`] keys are
    /// returned. When the cap fires, the cap-hit is logged at `debug`
    /// and the tail impls are silently dropped — over-fanning is a
    /// precision-tax knob, not a soundness one.
    ///
    /// Empty result + non-empty `subs` triggers a
    /// secondary fall-through to [`Self::resolve_callee`] so a
    /// type-fact misclassification (receiver typed as a super-class
    /// that has no method by this name on any sub) does not silently
    /// regress to "no resolution at all" — the leaf-name path can still
    /// pick up a match. This preserves the
    /// "subset of today's targets, never a superset" rule under
    /// hierarchy-aware resolution failure.
    pub fn resolve_callee_widened(&self, q: &CalleeQuery<'_>) -> Vec<FuncKey> {
        let arity_matches = |k: &FuncKey| match q.arity {
            Some(a) => k.arity == Some(a),
            None => true,
        };

        let single_fallback = || -> Vec<FuncKey> {
            match self.resolve_callee(q) {
                CalleeResolution::Resolved(k) => vec![k],
                _ => Vec::new(),
            }
        };

        // Hierarchy fan-out only fires when the call has an
        // authoritative receiver type AND the index is installed AND
        // the type has recorded sub-types. Every other case collapses
        // to today's resolver.
        let Some(rt) = q.receiver_type.filter(|s| !s.is_empty()) else {
            return single_fallback();
        };
        let Some(h) = self.hierarchy.as_ref() else {
            return single_fallback();
        };
        let subs = h.subs_of(q.caller_lang, rt);
        if subs.is_empty() {
            return single_fallback();
        }

        // Union direct + sub-type matches in insertion order. Dedup is
        // O(n²) over the cap (n ≤ 8) so a HashSet would be wasted
        // overhead; linear scan is faster and order-preserving.
        let mut out: Vec<FuncKey> = Vec::new();
        let push_unique = |out: &mut Vec<FuncKey>, k: FuncKey| -> bool {
            if !out.iter().any(|e| e == &k) {
                out.push(k);
                true
            } else {
                false
            }
        };
        let qualified_lookup = |container: &str| -> Vec<FuncKey> {
            let qual = format!("{container}::{}", q.name);
            self.lookup_qualified(q.caller_lang, &qual)
                .into_iter()
                .map(|(k, _)| k.clone())
                .filter(|k| arity_matches(k))
                .collect()
        };
        for k in qualified_lookup(rt) {
            push_unique(&mut out, k);
            if out.len() >= Self::MAX_HIERARCHY_FANOUT {
                tracing::debug!(
                    receiver = rt,
                    method = q.name,
                    cap = Self::MAX_HIERARCHY_FANOUT,
                    "hierarchy fan-out cap reached on direct receiver match"
                );
                return out;
            }
        }
        for sub in subs {
            for k in qualified_lookup(sub.as_str()) {
                push_unique(&mut out, k);
                if out.len() >= Self::MAX_HIERARCHY_FANOUT {
                    tracing::debug!(
                        receiver = rt,
                        method = q.name,
                        cap = Self::MAX_HIERARCHY_FANOUT,
                        "hierarchy fan-out cap reached; tail impls dropped"
                    );
                    return out;
                }
            }
        }

        if out.is_empty() {
            // Hierarchy widening produced nothing (e.g., none of the
            // recorded sub-types declare this method). Fall back to
            // today's qualified-first resolver so the misclassified-
            // type case still finds a leaf match — the same
            // "preserve today's behaviour on miss" rule the call-graph
            // builder applies.
            return single_fallback();
        }

        out
    }
}

impl std::fmt::Debug for GlobalSummaries {

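The widening strategy described for `resolve_callee_widened` works as follows: try the direct receiver first, then each recorded sub-type, filter by arity, dedup while preserving order, stop at a hard cap, and fall back to the single-result resolver when the fan-out yields nothing. The minimal sketch below captures that strategy. `Key`, `resolve_widened`, and the plain `HashMap` tables are hypothetical simplifications of `FuncKey`, `lookup_qualified`, and `TypeHierarchyIndex`. The cap value mirrors `MAX_HIERARCHY_FANOUT`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `FuncKey`: container, method name, arity.
#[derive(Clone, Debug, PartialEq)]
struct Key {
    container: String,
    name: String,
    arity: usize,
}

/// Mirrors `MAX_HIERARCHY_FANOUT`.
const MAX_FANOUT: usize = 8;

/// `defs` maps `"Container::method"` to candidate keys; `hierarchy` maps a
/// super-type to its recorded sub-types (`None` = index not installed);
/// `fallback` stands in for the single-result resolver.
fn resolve_widened(
    defs: &HashMap<String, Vec<Key>>,
    hierarchy: Option<&HashMap<String, Vec<String>>>,
    receiver: Option<&str>,
    method: &str,
    arity: Option<usize>,
    fallback: impl Fn() -> Option<Key>,
) -> Vec<Key> {
    let single = || fallback().into_iter().collect::<Vec<Key>>();
    let Some(rt) = receiver.filter(|s| !s.is_empty()) else { return single() };
    let Some(h) = hierarchy else { return single() };
    let subs = h.get(rt).map(Vec::as_slice).unwrap_or(&[]);
    if subs.is_empty() {
        return single();
    }

    let mut out: Vec<Key> = Vec::new();
    // Direct receiver first, then each sub-type in recorded order.
    for container in std::iter::once(rt).chain(subs.iter().map(String::as_str)) {
        let qual = format!("{container}::{method}");
        for k in defs.get(&qual).into_iter().flatten() {
            // Arity filter plus order-preserving dedup (linear scan; n <= cap).
            if arity.map_or(true, |a| k.arity == a) && !out.contains(k) {
                out.push(k.clone());
                if out.len() >= MAX_FANOUT {
                    return out; // cap reached: remaining impls are dropped
                }
            }
        }
    }
    // Nothing matched through the hierarchy: fall back to the single resolver.
    if out.is_empty() { single() } else { out }
}

fn main() {
    let key = |c: &str| Key { container: c.to_string(), name: "speak".to_string(), arity: 0 };
    let mut defs = HashMap::new();
    defs.insert("Dog::speak".to_string(), vec![key("Dog")]);
    defs.insert("Cat::speak".to_string(), vec![key("Cat")]);
    let mut h = HashMap::new();
    h.insert("Animal".to_string(), vec!["Dog".to_string(), "Cat".to_string()]);

    let got = resolve_widened(&defs, Some(&h), Some("Animal"), "speak", Some(0), || None);
    let names: Vec<&str> = got.iter().map(|k| k.container.as_str()).collect();
    println!("{names:?}"); // ["Dog", "Cat"]
}
```

Here `Animal::speak` has no definition of its own, so both results come from sub-types. If neither sub-type declared `speak`, the closure passed as `fallback` would decide the answer.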
@@ -336,3 +336,208 @@ mod tests {
        assert_eq!(s, back);
    }
}

// ── Pointer-Phase 5: field-granularity points-to summary ──────────────

/// Maximum field names retained per parameter in [`FieldPointsToSummary`].
///
/// Mirror of [`MAX_ALIAS_EDGES`]. Bounds on-disk + cross-file work
/// while leaving room for typical helpers (a handful of fields each).
pub const MAX_FIELDS_PER_PARAM: usize = 8;

/// Pointer-Phase 5: field-granularity per-parameter points-to summary.
///
/// Records, for each positional parameter index, the set of field
/// **names** read from and written to inside the callee body. Names
/// (not [`crate::ssa::ir::FieldId`]) are persisted because field IDs
/// are body-local — the per-body [`crate::ssa::ir::FieldInterner`]
/// reassigns IDs across files. Callers re-intern through their own
/// body's interner before consulting `field_taint` cells.
///
/// The receiver (`self` / `this`) uses the sentinel index `u32::MAX`
/// in the outer `Vec` (parameter indices are stored as `u32`), so
/// positional params and the receiver share the same indexing
/// convention as `SsaFuncSummary::receiver_to_*` (separate channel).
///
/// Empty by default — functions that don't read or write any field on
/// their parameters carry no entries and cost nothing on disk.
#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize)]
pub struct FieldPointsToSummary {
    /// `(param_index, field_names_read)` — the callee projected each
    /// listed field on a value derived from `param_index` somewhere
    /// in its body. Sorted, deduped per-entry.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub param_field_reads: Vec<(u32, SmallVec<[String; 2]>)>,
    /// `(param_index, field_names_written)` — the callee assigned to
    /// each listed field on a value derived from `param_index`.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub param_field_writes: Vec<(u32, SmallVec<[String; 2]>)>,
    /// Set when the read or write list of any parameter hit
    /// [`MAX_FIELDS_PER_PARAM`]. Callers seeing `overflow = true` treat
    /// every parameter as possibly reading and writing any field on any
    /// parameter: the conservative top element, which preserves
    /// soundness.
    #[serde(default, skip_serializing_if = "core::ops::Not::not")]
    pub overflow: bool,
}

impl FieldPointsToSummary {
    pub fn empty() -> Self {
        Self::default()
    }

    pub fn is_empty(&self) -> bool {
        self.param_field_reads.is_empty() && self.param_field_writes.is_empty() && !self.overflow
    }

    fn insert_into(
        list: &mut Vec<(u32, SmallVec<[String; 2]>)>,
        param: u32,
        field: &str,
        overflow: &mut bool,
    ) {
        let entry = match list.iter_mut().find(|(p, _)| *p == param) {
            Some(e) => &mut e.1,
            None => {
                list.push((param, SmallVec::new()));
                &mut list.last_mut().unwrap().1
            }
        };
        if entry.iter().any(|s| s == field) {
            return;
        }
        if entry.len() >= MAX_FIELDS_PER_PARAM {
            *overflow = true;
            return;
        }
        entry.push(field.to_string());
        entry.sort();
    }

    /// Record a field READ on parameter `param`. Bounded by
    /// [`MAX_FIELDS_PER_PARAM`] per parameter; over-cap inserts trip
    /// `overflow`.
    pub fn add_read(&mut self, param: u32, field: &str) {
        if self.overflow {
            return;
        }
        let mut overflow = false;
        Self::insert_into(&mut self.param_field_reads, param, field, &mut overflow);
        if overflow {
            self.overflow = true;
        }
    }

    /// Record a field WRITE on parameter `param`. Mirror of [`Self::add_read`].
    pub fn add_write(&mut self, param: u32, field: &str) {
        if self.overflow {
            return;
        }
        let mut overflow = false;
        Self::insert_into(&mut self.param_field_writes, param, field, &mut overflow);
        if overflow {
            self.overflow = true;
        }
    }

    /// Union with `other`. Overflow propagates per
    /// [`PointsToSummary::merge`]'s semantics — once a callee is
    /// "any field on any parameter", merging cannot recover precision.
    pub fn merge(&mut self, other: &Self) {
        if other.overflow {
            self.overflow = true;
            return;
        }
        for (p, fields) in &other.param_field_reads {
            for f in fields {
                self.add_read(*p, f);
            }
        }
        for (p, fields) in &other.param_field_writes {
            for f in fields {
                self.add_write(*p, f);
            }
        }
    }
}

#[cfg(test)]
mod field_summary_tests {
    use super::*;

    #[test]
    fn empty_summary_round_trips() {
        let s = FieldPointsToSummary::empty();
        assert!(s.is_empty());
        let json = serde_json::to_string(&s).unwrap();
        let back: FieldPointsToSummary = serde_json::from_str(&json).unwrap();
        assert_eq!(s, back);
    }

    #[test]
    fn add_read_dedupes_and_sorts() {
        let mut s = FieldPointsToSummary::empty();
        s.add_read(0, "name");
        s.add_read(0, "id");
        s.add_read(0, "name"); // duplicate
        let entry = s.param_field_reads.iter().find(|(p, _)| *p == 0).unwrap();
        assert_eq!(entry.1.as_slice(), &["id".to_string(), "name".to_string()]);
    }

    #[test]
    fn distinct_params_get_distinct_entries() {
        let mut s = FieldPointsToSummary::empty();
        s.add_write(0, "cache");
        s.add_write(1, "log");
        assert_eq!(s.param_field_writes.len(), 2);
    }

    #[test]
    fn overflow_trips_at_cap() {
        let mut s = FieldPointsToSummary::empty();
        for i in 0..(MAX_FIELDS_PER_PARAM + 4) {
            s.add_read(0, &format!("field{i}"));
        }
        assert!(s.overflow);
    }

    #[test]
    fn merge_unions_disjoint_keys() {
        let mut a = FieldPointsToSummary::empty();
        let mut b = FieldPointsToSummary::empty();
        a.add_read(0, "alpha");
        b.add_read(1, "beta");
        a.merge(&b);
        assert!(a.param_field_reads.iter().any(|(p, _)| *p == 0));
        assert!(a.param_field_reads.iter().any(|(p, _)| *p == 1));
    }

    #[test]
    fn merge_propagates_overflow() {
        let mut a = FieldPointsToSummary::empty();
        let mut b = FieldPointsToSummary::empty();
        b.overflow = true;
        a.merge(&b);
        assert!(a.overflow);
    }

    #[test]
    fn round_trip_preserves_entries() {
        let mut s = FieldPointsToSummary::empty();
        s.add_read(0, "name");
        s.add_write(1, "cache");
        s.add_write(1, "log");
        let json = serde_json::to_string(&s).unwrap();
        let back: FieldPointsToSummary = serde_json::from_str(&json).unwrap();
        assert_eq!(s, back);
    }

    #[test]
    fn empty_serializes_as_empty_object() {
        let s = FieldPointsToSummary::empty();
        let json = serde_json::to_string(&s).unwrap();
        assert_eq!(json, "{}");
        let back: FieldPointsToSummary = serde_json::from_str("{}").unwrap();
        assert!(back.is_empty());
    }
}

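`FieldPointsToSummary` persists field names rather than body-local IDs, so a caller has to map each name back through its own interner before consulting its `field_taint` cells. The sketch below illustrates why, using a hypothetical `FieldInterner` that hands out `u32` IDs in first-seen order. The crate's `crate::ssa::ir::FieldInterner` and `FieldId` may be implemented differently.

```rust
use std::collections::HashMap;

/// Hypothetical body-local field interner: IDs are assigned in first-seen
/// order, so the same name can get different IDs in different bodies.
#[derive(Default)]
struct FieldInterner {
    ids: HashMap<String, u32>,
    names: Vec<String>,
}

impl FieldInterner {
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }
}

/// Re-intern a callee summary's persisted field names through the
/// caller's own interner, yielding caller-local IDs.
fn reintern(caller: &mut FieldInterner, names: &[String]) -> Vec<u32> {
    names.iter().map(|n| caller.intern(n)).collect()
}

fn main() {
    let mut callee = FieldInterner::default();
    let callee_id = callee.intern("name"); // callee body: "name" -> 0
    let mut caller = FieldInterner::default();
    caller.intern("id"); // caller body: "id" -> 0
    // Persisting the callee's raw ID 0 would alias the caller's "id" field;
    // persisting the name and re-interning gives the caller's own ID.
    println!("callee id {callee_id}, caller ids {:?}", reintern(&mut caller, &["name".to_string()]));
    // prints "callee id 0, caller ids [1]"
}
```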
@@ -2,7 +2,7 @@ use crate::abstract_interp::{AbstractTransfer, AbstractValue, PathFact};
use crate::labels::Cap;
use crate::ssa::type_facts::TypeKind;
use crate::summary::SinkSite;
use crate::summary::points_to::{FieldPointsToSummary, PointsToSummary};
use serde::{Deserialize, Serialize};
use smallvec::SmallVec;

@@ -268,6 +268,20 @@ pub struct SsaFuncSummary {
    /// each other or the return value.
    #[serde(default, skip_serializing_if = "PointsToSummary::is_empty")]
    pub points_to: PointsToSummary,
    /// Pointer-Phase 5: field-granularity per-parameter points-to
    /// summary. Records which fields the callee reads from / writes
    /// to on each parameter, so cross-file resolution can spread
    /// taint through field-level mutations the callee performs on
    /// caller-supplied objects.
    ///
    /// Default-empty (most functions don't field-mutate their params).
    /// `#[serde(default)]` lets pre-Phase-5 summaries deserialise
    /// without migration, and `skip_serializing_if` elides the empty
    /// value from serialised output.
    /// Built by extraction in `summary_extract.rs` when the per-body
    /// [`crate::pointer::PointsToFacts`] are available
    /// (`NYX_POINTER_ANALYSIS=1`); empty otherwise.
    #[serde(default, skip_serializing_if = "FieldPointsToSummary::is_empty")]
    pub field_points_to: FieldPointsToSummary,
    /// Per-return-path abstract [`PathFact`] decomposition.
    ///
    /// When non-empty, supplies per-predicate-gate facts finer than the

@@ -285,6 +299,25 @@ pub struct SsaFuncSummary {
    /// behaviour.
    #[serde(default, skip_serializing_if = "SmallVec::is_empty")]
    pub return_path_facts: SmallVec<[PathFactReturnEntry; 2]>,
    /// Per-call-site receiver-type info: `(call_ordinal, container_name)`.
    ///
    /// Populated during SSA lowering (`lower_all_functions_from_bodies`)
    /// when type-fact analysis can resolve a method call's receiver SSA
    /// value to a concrete [`crate::ssa::type_facts::TypeKind`] with a
    /// non-empty [`crate::ssa::type_facts::TypeKind::container_name`].
    ///
    /// Consumed by [`crate::callgraph::build_call_graph`] to feed
    /// `CalleeQuery.receiver_type` for the matching ordinal — letting
    /// the call graph narrow indirect method-call edges to only those
    /// targets whose defining container matches the inferred type.
    /// Strictly additive: an empty list means today's name-only
    /// resolution applies unchanged.
    ///
    /// Ordinal here is the per-function `CallMeta.call_ordinal` shared
    /// with [`crate::summary::CalleeSite::ordinal`] so the two tables
    /// can be joined by ordinal at call-graph build time.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub typed_call_receivers: Vec<(u32, String)>,
}

/// A per-return-path [`PathFact`] entry.

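The ordinal join described for `typed_call_receivers` can be sketched as a lookup keyed by ordinal. `CallSite` and `join_receivers` are hypothetical, and the real `CalleeSite` and `CalleeQuery` carry more fields than shown. Call sites without a recorded receiver get `None`, which corresponds to the name-only fallback described above.

```rust
use std::collections::HashMap;

/// Hypothetical call-site record; only the fields needed for the join.
struct CallSite {
    ordinal: u32,
    callee: String,
}

/// Pair each call site with the receiver container recorded for its
/// ordinal, or `None` when type facts resolved nothing.
fn join_receivers<'a>(
    sites: &'a [CallSite],
    typed_call_receivers: &'a [(u32, String)],
) -> Vec<(&'a str, Option<&'a str>)> {
    let by_ordinal: HashMap<u32, &str> = typed_call_receivers
        .iter()
        .map(|(ord, ty)| (*ord, ty.as_str()))
        .collect();
    sites
        .iter()
        .map(|s| (s.callee.as_str(), by_ordinal.get(&s.ordinal).copied()))
        .collect()
}

fn main() {
    let sites = vec![
        CallSite { ordinal: 0, callee: "read".into() },
        CallSite { ordinal: 1, callee: "close".into() },
    ];
    let receivers = vec![(1, "FileHandle".to_string())];
    println!("{:?}", join_receivers(&sites, &receivers));
    // prints [("read", None), ("close", Some("FileHandle"))]
}
```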
@@ -438,7 +438,9 @@ fn ssa_summary_serde_round_trip_identity() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };
    let json = serde_json::to_string(&summary).unwrap();
    let back: SsaFuncSummary = serde_json::from_str(&json).unwrap();

@@ -468,7 +470,9 @@ fn ssa_summary_serde_round_trip_strip_bits() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };
    let json = serde_json::to_string(&summary).unwrap();
    let back: SsaFuncSummary = serde_json::from_str(&json).unwrap();

@@ -495,7 +499,9 @@ fn ssa_summary_serde_round_trip_add_bits() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };
    let json = serde_json::to_string(&summary).unwrap();
    let back: SsaFuncSummary = serde_json::from_str(&json).unwrap();

@@ -529,7 +535,9 @@ fn ssa_summary_serde_round_trip_all_variants() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };
    let json = serde_json::to_string(&summary).unwrap();
    let back: SsaFuncSummary = serde_json::from_str(&json).unwrap();

@@ -565,7 +573,9 @@ fn global_summaries_insert_ssa_exact_key_replacement() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };
    gs.insert_ssa(key.clone(), v1.clone());
    assert_eq!(gs.get_ssa(&key), Some(&v1));

@@ -589,7 +599,9 @@ fn global_summaries_insert_ssa_exact_key_replacement() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };
    gs.insert_ssa(key.clone(), v2.clone());
    assert_eq!(gs.get_ssa(&key), Some(&v2));

@@ -633,7 +645,9 @@ fn global_summaries_merge_with_ssa_entries() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };
    let sum_b = SsaFuncSummary {
        param_to_return: vec![],

@@ -653,7 +667,9 @@ fn global_summaries_merge_with_ssa_entries() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };

    gs1.insert_ssa(key_a.clone(), sum_a.clone());

@@ -697,7 +713,9 @@ fn global_summaries_is_empty_considers_ssa() {
            abstract_transfer: vec![],
            param_return_paths: vec![],
            points_to: Default::default(),
            field_points_to: Default::default(),
            return_path_facts: smallvec::SmallVec::new(),
            typed_call_receivers: vec![],
        },
    );

@@ -724,7 +742,9 @@ fn ssa_summary_serde_round_trip_param_to_sink_param() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };
    let json = serde_json::to_string(&summary).unwrap();
    let back: SsaFuncSummary = serde_json::from_str(&json).unwrap();

@@ -766,7 +786,9 @@ fn ssa_summary_serde_round_trip_container_fields() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };
    let json = serde_json::to_string(&summary).unwrap();
    let back: SsaFuncSummary = serde_json::from_str(&json).unwrap();

@@ -818,7 +840,9 @@ fn ssa_summary_serde_round_trip_return_abstract() {
        abstract_transfer: vec![],
        param_return_paths: vec![],
        points_to: Default::default(),
        field_points_to: Default::default(),
        return_path_facts: smallvec::SmallVec::new(),
        typed_call_receivers: vec![],
    };
    let json = serde_json::to_string(&summary).unwrap();
    let back: SsaFuncSummary = serde_json::from_str(&json).unwrap();

@@ -890,6 +914,8 @@ fn make_callee_body(
            value_defs,
            cfg_node_map: std::collections::HashMap::new(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        },
        opt: crate::ssa::OptimizeResult {
            const_values: std::collections::HashMap::new(),

@@ -1046,6 +1072,7 @@ fn callee_body_serde_with_all_ssa_op_variants() {
                value: SsaValue(7),
                op: SsaOp::Call {
                    callee: "foo".into(),
                    callee_text: None,
                    args: vec![smallvec![SsaValue(0)], smallvec![SsaValue(1)]],
                    receiver: Some(SsaValue(2)),
                },

@@ -1077,6 +1104,7 @@ fn callee_body_serde_with_all_ssa_op_variants() {
            callee,
            args,
            receiver,
            ..
        } => {
            assert_eq!(callee, "foo");
            assert_eq!(args.len(), 2);

@@ -1330,7 +1358,9 @@ fn global_summaries_resolve_body_requires_body_present() {
            abstract_transfer: vec![],
            param_return_paths: vec![],
            points_to: Default::default(),
            field_points_to: Default::default(),
            return_path_facts: smallvec::SmallVec::new(),
            typed_call_receivers: vec![],
        },
    );
    // Don't insert body

@@ -3169,6 +3199,95 @@ fn insert_ssa_arity_overflow_rekeys() {
    assert!(kept.param_to_sink.is_empty());
}

/// Audit gap A.2.1.G1 reproducer: a summary whose only param-index
/// references come from synthetic SSA `Param` ops for external
/// captures (free identifiers, module imports, unresolved method
/// names) lands at the original key when no existing entry occupies
/// it.
///
/// This is the case `lower_to_ssa` produces for Java instance/static
/// methods that reference free identifiers (e.g. `f.close()` where
/// `close` is treated as an external capture — the synthetic Param 0
/// then leaks into `param_to_return`/`param_to_sink`). Without the
/// audit-gap fix, `reconcile_ssa_summary_key` would synthesise a
/// disambig and Phase 3's `summaries.get_ssa(caller_key)` lookup
/// (consuming `typed_call_receivers` at the FuncSummary-aligned key)
/// would miss.
#[test]
fn insert_ssa_arity_overflow_keeps_original_key_when_no_collision() {
    // Single-file fresh insert: no prior entry at `key` to protect, so
    // the synthetic-Param overflow is treated as the function's own
    // signal and lands at the original FuncKey.
    let mut gs = GlobalSummaries::new();
    let key = FuncKey {
        lang: Lang::Java,
        namespace: "Reader.java".into(),
        container: "Reader".into(),
        name: "read".into(),
        arity: Some(0),
        ..Default::default()
    };
    let summary = SsaFuncSummary {
        // Synthetic Param-0 for the external `close` identifier inside
        // the static `read()` body — `param_count == 0` per the
        // source-level signature.
        param_to_return: vec![(0, TaintTransform::Identity)],
        typed_call_receivers: vec![(1, "FileHandle".to_string())],
        ..Default::default()
    };
    gs.insert_ssa(key.clone(), summary.clone());

    let kept = gs
        .get_ssa(&key)
        .expect("Reader::read SSA must be reachable at the FuncSummary-aligned key");
    assert_eq!(kept.typed_call_receivers, summary.typed_call_receivers);
    // The synthetic Param-0 reference is preserved verbatim — pass-2
    // analysis still aligns it with the caller's implicit-uses
    // argument group at the same index.
    assert_eq!(kept.param_to_return, summary.param_to_return);
}

/// Companion to `insert_ssa_arity_overflow_keeps_original_key_when_no_collision`:
/// when both rounds of an iterative scan produce summaries whose
/// param-index references overflow the FuncKey arity (the same
/// synthetic-Param signal each round), the second-round insert must
/// land at the original key (last-writer-wins for the same function),
/// not split off into a synthetic disambig.
#[test]
fn insert_ssa_arity_overflow_iterative_rescan_stays_at_original_key() {
    let mut gs = GlobalSummaries::new();
    let key = FuncKey {
        lang: Lang::Java,
        namespace: "Reader.java".into(),
        container: "Reader".into(),
        name: "read".into(),
        arity: Some(0),
        ..Default::default()
    };
    let round1 = SsaFuncSummary {
        param_to_return: vec![(0, TaintTransform::Identity)],
        typed_call_receivers: vec![(1, "FileHandle".to_string())],
        ..Default::default()
    };
    gs.insert_ssa(key.clone(), round1);

    // Iteration 2 of the scan loop produces the same shape with
    // refined typed_call_receivers (e.g. a new constructor type
    // discovered cross-file).
    let round2 = SsaFuncSummary {
        param_to_return: vec![(0, TaintTransform::Identity)],
        typed_call_receivers: vec![(1, "FileHandle".to_string()), (2, "Cache".to_string())],
        ..Default::default()
    };
    gs.insert_ssa(key.clone(), round2.clone());

    let kept = gs
        .get_ssa(&key)
        .expect("iterative-rescan summary must stay at the original key");
    assert_eq!(kept.typed_call_receivers, round2.typed_call_receivers);
    assert_eq!(kept.param_to_return, round2.param_to_return);
}

// ── Primary sink-location attribution — SinkSite round-trips ────────────

#[test]

@@ -3382,7 +3501,9 @@ fn cf4_return_path_transform_serde_round_trip() {
],
)],
points_to: Default::default(),
field_points_to: Default::default(),
return_path_facts: smallvec::SmallVec::new(),
typed_call_receivers: vec![],
};
let json = serde_json::to_string(&summary).unwrap();
let back: SsaFuncSummary = serde_json::from_str(&json).unwrap();
@@ -3503,8 +3624,15 @@ fn cf4_union_param_return_paths_by_index() {
}

#[test]
fn cf4_ssa_summary_fits_arity_rejects_out_of_range_path_idx() {
// A path whose param index exceeds the key's arity is incompatible.
fn cf4_ssa_summary_fits_arity_keeps_out_of_range_path_idx_at_original_key() {
// A path whose param index exceeds the key's arity is treated as a
// synthetic external-capture artefact (audit gap A.2.1.G1 — see
// `project_typed_callgraph_audit_gap_ssa_disambig.md`). When no
// existing entry sits at the key, `insert_ssa` keeps the (untrimmed)
// summary at the original key so the SSA FuncKey stays aligned with
// the matching FuncSummary FuncKey — Phase 3's
// `summaries.get_ssa(caller_key)` lookup (consuming
// `typed_call_receivers`) depends on this alignment.
let bad = SsaFuncSummary {
param_return_paths: vec![(5, smallvec![rpt(TaintTransform::Identity, 1, 0, 0)])],
..Default::default()
@@ -3513,14 +3641,16 @@ fn cf4_ssa_summary_fits_arity_rejects_out_of_range_path_idx() {
lang: Lang::Rust,
namespace: "test.rs".into(),
name: "helper".into(),
arity: Some(2), // too small for idx 5
arity: Some(2), // too small for idx 5 — synthetic-Param marker
..Default::default()
};
let mut gs = GlobalSummaries::new();
gs.insert_ssa(key.clone(), bad);
// Reconciliation synthesises a disambig to keep the bad entry under a
// different key; the original key stays empty.
assert!(gs.get_ssa(&key).is_none());
let kept = gs
.get_ssa(&key)
.expect("synthetic-Param summary inserted at original key");
assert_eq!(kept.param_return_paths.len(), 1);
assert_eq!(kept.param_return_paths[0].0, 5);
}

// ── Parameter-granularity points-to summary ─────────────────────────────
@@ -3568,10 +3698,14 @@ fn cf6_ssa_summary_legacy_json_without_points_to_deserialises() {
}

#[test]
fn cf6_ssa_summary_fits_arity_rejects_out_of_range_points_to_idx() {
fn cf6_ssa_summary_fits_arity_keeps_out_of_range_points_to_idx_at_original_key() {
// Same arity-overflow handling as `cf4_ssa_summary_fits_arity_*`
// for the points-to channel: when the summary references a
// synthetic-Param index beyond `key.arity` and no existing entry
// occupies the key, `insert_ssa` preserves the FuncKey-aligned
// identity by inserting at the original key (audit gap A.2.1.G1).
use crate::summary::points_to::{AliasKind, AliasPosition, PointsToSummary};
let mut pts = PointsToSummary::empty();
// Index 7 exceeds arity 2 below.
pts.insert(
AliasPosition::Param(7),
AliasPosition::Return,
@@ -3590,6 +3724,499 @@ fn cf6_ssa_summary_fits_arity_rejects_out_of_range_points_to_idx() {
};
let mut gs = GlobalSummaries::new();
gs.insert_ssa(key.clone(), bad);
// Reconciliation rekeys the bad entry; the original key is empty.
assert!(gs.get_ssa(&key).is_none());
let kept = gs
.get_ssa(&key)
.expect("synthetic-Param points_to summary inserted at original key");
assert_eq!(kept.points_to.max_param_index(), Some(7));
}

/// Phase 4 (typed call-graph devirtualisation): two `findById`
/// definitions on different containers must remain structurally
/// disjoint after [`merge_summaries`] — no cap union may leak
/// across them. The FuncKey identity model already keys on
/// `(lang, namespace, container, name, arity, ...)` so this is
/// supposed to be true today; the test pins it down so a future
/// refactor can't silently widen the merge granularity.
///
/// Concretely: `Repository::findById` is parameterised (no
/// `SQL_QUERY` sink cap), `UnsafeCache::findById` runs a string-
/// concatenated query (carries `Cap::SQL_QUERY`). After merge,
/// each FuncKey must own only its own caps — Repository must NOT
/// inherit Cache's `SQL_QUERY` bit.
#[test]
fn cross_file_devirt_does_not_union_unrelated_findbyids() {
use crate::labels::Cap;
use crate::symbol::FuncKey;

fn method_summary(name: &str, container: &str, file: &str, sink_caps: u16) -> FuncSummary {
FuncSummary {
name: name.into(),
file_path: file.into(),
lang: "rust".into(),
param_count: 1,
param_names: vec!["id".into()],
source_caps: 0,
sanitizer_caps: 0,
sink_caps,
propagating_params: vec![],
propagates_taint: false,
tainted_sink_params: if sink_caps != 0 { vec![0] } else { vec![] },
callees: vec![],
container: container.into(),
..Default::default()
}
}

let safe_repo = method_summary("findById", "Repository", "src/repo.rs", 0);
let unsafe_cache = method_summary(
"findById",
"UnsafeCache",
"src/cache.rs",
Cap::SQL_QUERY.bits(),
);

let gs = merge_summaries(vec![safe_repo, unsafe_cache], None);

// Two distinct keys must coexist — no merge collision.
let repo_key = FuncKey {
lang: Lang::Rust,
namespace: "src/repo.rs".into(),
container: "Repository".into(),
name: "findById".into(),
arity: Some(1),
..Default::default()
};
let cache_key = FuncKey {
lang: Lang::Rust,
namespace: "src/cache.rs".into(),
container: "UnsafeCache".into(),
name: "findById".into(),
arity: Some(1),
..Default::default()
};

let repo_sum = gs.get(&repo_key).expect("Repository::findById missing");
let cache_sum = gs.get(&cache_key).expect("UnsafeCache::findById missing");

// Sink caps stay on their own owner — the whole point of
// devirtualisation. Repository must not have inherited the
// SQL_QUERY bit from UnsafeCache.
assert_eq!(
repo_sum.sink_caps, 0,
"Repository::findById inherited a sink cap from UnsafeCache::findById — \
the per-FuncKey identity model has been broken (sink_caps bits = {:#x})",
repo_sum.sink_caps,
);
assert_eq!(
cache_sum.sink_caps,
Cap::SQL_QUERY.bits(),
"UnsafeCache::findById lost its own sink cap during merge"
);
// Same invariant on tainted_sink_params — must not bleed across.
assert!(
repo_sum.tainted_sink_params.is_empty(),
"Repository::findById inherited tainted_sink_params from UnsafeCache: {:?}",
repo_sum.tainted_sink_params,
);
assert_eq!(cache_sum.tainted_sink_params, vec![0]);
}

// ── Phase 6 hierarchy fan-out at runtime resolution ────────────────────
//
// `GlobalSummaries::resolve_callee_widened` is the runtime counterpart of
// the call-graph builder's `TypeHierarchyIndex::resolve_with_hierarchy`.
// These tests pin the contract that *every* concrete implementer is
// reachable when the receiver type is statically a super-class / trait /
// interface, with the explicit fall-throughs that preserve today's
// behaviour when no fan-out applies.
mod hierarchy_widened_tests {
use super::*;

/// Build a minimal `(FuncKey, FuncSummary)` for a method on the
/// given container with optional `hierarchy_edges` carried through.
fn java_method(
namespace: &str,
container: &str,
name: &str,
arity: usize,
sink_bits: u16,
hierarchy_edges: Vec<(String, String)>,
) -> (FuncKey, FuncSummary) {
let (key, mut summary) = fs_with(
namespace,
container,
name,
arity,
FuncKind::Method,
Some((namespace.len() + container.len() + name.len()) as u32),
sink_bits,
);
summary.hierarchy_edges = hierarchy_edges;
(key, summary)
}

/// A1 — no hierarchy installed. Widening collapses to today's
/// single-result behaviour: one key in / one key out.
#[test]
fn widened_without_hierarchy_returns_single_resolved() {
let mut gs = GlobalSummaries::new();
let (k, s) = java_method("src/http.java", "HttpClient", "send", 1, 0x01, vec![]);
gs.insert(k.clone(), s);

// Hierarchy is intentionally NOT installed.
let widened = gs.resolve_callee_widened(&CalleeQuery {
name: "send",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: Some("HttpClient"),
namespace_qualifier: None,
receiver_var: None,
arity: Some(1),
});
assert_eq!(widened, vec![k]);
}

/// A2 — hierarchy installed but the receiver type has no recorded
/// sub-types. Falls through to today's single-result behaviour.
#[test]
fn widened_no_subtypes_returns_single() {
let mut gs = GlobalSummaries::new();
let (k, s) = java_method("src/http.java", "HttpClient", "send", 1, 0x01, vec![]);
gs.insert(k.clone(), s);
gs.install_hierarchy();

let widened = gs.resolve_callee_widened(&CalleeQuery {
name: "send",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: Some("HttpClient"),
namespace_qualifier: None,
receiver_var: None,
arity: Some(1),
});
assert_eq!(widened, vec![k]);
}

/// A3 — hierarchy with one sub-type implementer. Widening returns
/// both the direct receiver match and the sub-type's match.
#[test]
fn widened_one_subtype_returns_two_keys() {
let mut gs = GlobalSummaries::new();
// Carrier: ILogger -> ConsoleLogger edge.
let (k_iface, s_iface) = java_method(
"src/logger.java",
"ILogger",
"log",
1,
0x00,
vec![("ConsoleLogger".to_string(), "ILogger".to_string())],
);
let (k_impl, s_impl) =
java_method("src/logger.java", "ConsoleLogger", "log", 1, 0x01, vec![]);
gs.insert(k_iface.clone(), s_iface);
gs.insert(k_impl.clone(), s_impl);
gs.install_hierarchy();

let widened = gs.resolve_callee_widened(&CalleeQuery {
name: "log",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: Some("ILogger"),
namespace_qualifier: None,
receiver_var: None,
arity: Some(1),
});
assert_eq!(
widened.len(),
2,
"expected ILogger + ConsoleLogger fan-out, got {widened:?}"
);
assert!(widened.contains(&k_iface));
assert!(widened.contains(&k_impl));
}

/// A4 — hierarchy with multiple sub-types: every implementer's
/// matching method is in the result, deduplicated.
#[test]
fn widened_multiple_subtypes_returns_all() {
let mut gs = GlobalSummaries::new();
// Three impls + one interface. The interface itself has no
// body so we omit a method on it (that is the more common
// shape — a pure interface plus concrete classes).
let edges = vec![
("FileLogger".to_string(), "ILogger".to_string()),
("NetLogger".to_string(), "ILogger".to_string()),
("StdLogger".to_string(), "ILogger".to_string()),
];
let (k_file, s_file) = java_method(
"src/file_logger.java",
"FileLogger",
"log",
1,
0x01,
edges.clone(),
);
let (k_net, s_net) =
java_method("src/net_logger.java", "NetLogger", "log", 1, 0x02, vec![]);
let (k_std, s_std) =
java_method("src/std_logger.java", "StdLogger", "log", 1, 0x04, vec![]);
gs.insert(k_file.clone(), s_file);
gs.insert(k_net.clone(), s_net);
gs.insert(k_std.clone(), s_std);
gs.install_hierarchy();

let widened = gs.resolve_callee_widened(&CalleeQuery {
name: "log",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: Some("ILogger"),
namespace_qualifier: None,
receiver_var: None,
arity: Some(1),
});
assert_eq!(widened.len(), 3, "expected three impls, got {widened:?}");
assert!(widened.contains(&k_file));
assert!(widened.contains(&k_net));
assert!(widened.contains(&k_std));
}

/// A5 — the arity filter must apply across the whole fan-out, not
/// just the direct-receiver leg. An implementer with a different
/// arity must not leak into the result.
#[test]
fn widened_arity_filter_applies_across_fanout() {
let mut gs = GlobalSummaries::new();
let edges = vec![
("OneArg".to_string(), "IBase".to_string()),
("TwoArg".to_string(), "IBase".to_string()),
];
let (k_one, s_one) = java_method("src/one.java", "OneArg", "do_it", 1, 0x01, edges.clone());
let (k_two, s_two) = java_method("src/two.java", "TwoArg", "do_it", 2, 0x02, vec![]);
gs.insert(k_one.clone(), s_one);
gs.insert(k_two.clone(), s_two);
gs.install_hierarchy();

let widened = gs.resolve_callee_widened(&CalleeQuery {
name: "do_it",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: Some("IBase"),
namespace_qualifier: None,
receiver_var: None,
arity: Some(1),
});
assert_eq!(widened, vec![k_one], "arity-2 impl must be filtered out");
}

/// A6 — fan-out is bounded at `MAX_HIERARCHY_FANOUT`. Build a
/// hierarchy with more impls than the cap allows and assert the
/// result is exactly capped (and that early impls are preserved
/// — the cap drops the *tail*, not the head).
#[test]
fn widened_caps_at_max_hierarchy_fanout() {
let cap = GlobalSummaries::MAX_HIERARCHY_FANOUT;
let mut gs = GlobalSummaries::new();

// Build cap+3 impls so we can assert the tail truncates and a
// deterministic prefix remains.
let extra = 3;
let total = cap + extra;
let edges: Vec<(String, String)> = (0..total)
.map(|i| (format!("Impl{i:02}"), "IBase".to_string()))
.collect();

// Carrier — first impl carries every edge so the index is
// populated in one shot.
let (k0, s0) = java_method("src/impl00.java", "Impl00", "run", 0, 0x01, edges);
gs.insert(k0.clone(), s0);
for i in 1..total {
let (k, s) = java_method(
&format!("src/impl{i:02}.java"),
&format!("Impl{i:02}"),
"run",
0,
0x01,
vec![],
);
gs.insert(k, s);
}
gs.install_hierarchy();

let widened = gs.resolve_callee_widened(&CalleeQuery {
name: "run",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: Some("IBase"),
namespace_qualifier: None,
receiver_var: None,
arity: Some(0),
});
assert_eq!(
widened.len(),
cap,
"fan-out must cap at MAX_HIERARCHY_FANOUT={cap}, got {}",
widened.len()
);
}

/// A7 — when hierarchy widening produces no candidates AND the
/// receiver_type lookup is authoritative (Step 1), the secondary
/// fall-through goes through `resolve_callee` which returns
/// Ambiguous/NotFound rather than silently picking an unrelated
/// leaf — exactly the "subset of today's targets, never a
/// superset" rule. Test asserts the empty result is preserved.
#[test]
fn widened_empty_does_not_silently_pick_unrelated_leaf() {
let mut gs = GlobalSummaries::new();
// Edge: IUnused has a sub Used, but neither declares
// `something`. An unrelated free function `something` exists
// in the same namespace — under today's authoritative
// receiver_type rules, that function MUST NOT be picked when
// the call is annotated with receiver_type "IUnused".
let edges = vec![("Used".to_string(), "IUnused".to_string())];
let (k_carrier, s_carrier) =
java_method("src/util.java", "Used", "carrier", 0, 0x00, edges);
let (k_free, s_free) = free_summary("src/app.java", "something", 0, 0x01);
gs.insert(k_carrier, s_carrier);
gs.insert(k_free, s_free);
gs.install_hierarchy();

let widened = gs.resolve_callee_widened(&CalleeQuery {
name: "something",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: Some("IUnused"),
namespace_qualifier: None,
receiver_var: None,
arity: Some(0),
});
assert!(
widened.is_empty(),
"receiver_type IUnused with no matching method must NOT silently \
pick an unrelated free function — got {widened:?}"
);
}

/// A7b — when hierarchy widening produces nothing AND today's
/// `resolve_callee` *does* resolve (no receiver_type, just bare
/// leaf or qualifier hint), the fallback returns the single key.
/// This pins the secondary-fallback contract on the path where it
/// actually matters (no authoritative receiver_type).
#[test]
fn widened_falls_through_when_resolve_callee_resolves() {
let mut gs = GlobalSummaries::new();
let (k_free, s_free) = free_summary("src/app.java", "helper", 0, 0x01);
gs.insert(k_free.clone(), s_free);
gs.install_hierarchy();

// No receiver_type → first branch of `resolve_callee_widened`
// is the single-result fallback path.
let widened = gs.resolve_callee_widened(&CalleeQuery {
name: "helper",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: None,
namespace_qualifier: None,
receiver_var: None,
arity: Some(0),
});
assert_eq!(widened, vec![k_free]);
}

/// A8 — receiver_type is None → no widening; behaves identically
/// to `resolve_callee` (single-result wrap).
#[test]
fn widened_no_receiver_type_collapses_to_resolve_callee() {
let mut gs = GlobalSummaries::new();
let (k_free, s_free) = free_summary("src/app.java", "helper", 0, 0x01);
gs.insert(k_free.clone(), s_free);
gs.install_hierarchy();

let widened = gs.resolve_callee_widened(&CalleeQuery {
name: "helper",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: None,
namespace_qualifier: None,
receiver_var: None,
arity: Some(0),
});
assert_eq!(widened, vec![k_free]);
}

/// A9 — `merge()` must invalidate the cached hierarchy index so a
/// post-merge call to `resolve_callee_widened` doesn't look up a
/// stale view. Since `install_hierarchy` is required after merges,
/// the test asserts: post-merge, before reinstall, fan-out must
/// fall through to single-result behaviour.
#[test]
fn merge_invalidates_hierarchy_cache() {
let mut gs_a = GlobalSummaries::new();
let edges = vec![("Sub".to_string(), "Super".to_string())];
let (k_super, s_super) = java_method("src/super.java", "Super", "m", 0, 0x00, edges);
let (k_sub, s_sub) = java_method("src/sub.java", "Sub", "m", 0, 0x01, vec![]);
gs_a.insert(k_super.clone(), s_super);
gs_a.insert(k_sub.clone(), s_sub);
gs_a.install_hierarchy();
// Before merge: fan-out works.
let pre_merge = gs_a.resolve_callee_widened(&CalleeQuery {
name: "m",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: Some("Super"),
namespace_qualifier: None,
receiver_var: None,
arity: Some(0),
});
assert_eq!(pre_merge.len(), 2);

// Merge in an empty `gs_b` — should invalidate the cached
// hierarchy.
gs_a.merge(GlobalSummaries::new());
assert!(
gs_a.hierarchy().is_none(),
"merge() must clear the cached hierarchy"
);

// After merge, before reinstall: the resolver must fall back
// to single-result behaviour (no fan-out).
let post_merge_no_install = gs_a.resolve_callee_widened(&CalleeQuery {
name: "m",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: Some("Super"),
namespace_qualifier: None,
receiver_var: None,
arity: Some(0),
});
assert_eq!(post_merge_no_install.len(), 1);
assert_eq!(post_merge_no_install[0], k_super);

// After reinstall: fan-out is restored.
gs_a.install_hierarchy();
let post_merge_reinstalled = gs_a.resolve_callee_widened(&CalleeQuery {
name: "m",
caller_lang: Lang::Java,
caller_namespace: "src/app.java",
caller_container: None,
receiver_type: Some("Super"),
namespace_qualifier: None,
receiver_var: None,
arity: Some(0),
});
assert_eq!(post_merge_reinstalled.len(), 2);
assert!(post_merge_reinstalled.contains(&k_super));
assert!(post_merge_reinstalled.contains(&k_sub));
}
}
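The A1 through A9 tests above pin the runtime contract of `GlobalSummaries::resolve_callee_widened`. The following is a minimal, self-contained sketch of that contract under stated assumptions, not nyx's implementation: `Summaries`, `resolve_widened`, the string-keyed maps, and the cap value of 4 are all hypothetical. It models direct-receiver-first ordering, an arity filter applied to every leg of the fan-out, deduplication, head-preserving truncation at the cap, and a `merge` that drops the cached index until `install_hierarchy` runs again. The receiver-less fallback through `resolve_callee` (A7b, A8) is deliberately left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical cap standing in for `GlobalSummaries::MAX_HIERARCHY_FANOUT`;
/// the real value is not shown in this diff.
const MAX_FANOUT: usize = 4;

#[derive(Default)]
struct Summaries {
    /// (container, method) -> arity for every known method.
    methods: HashMap<(String, String), usize>,
    /// (sub, super) edges, analogous to the `hierarchy_edges` carried on summaries.
    edges: Vec<(String, String)>,
    /// Cached super -> subs index; `None` until `install_hierarchy` runs.
    hierarchy: Option<HashMap<String, Vec<String>>>,
}

impl Summaries {
    /// Build the super -> subs index from every recorded edge.
    fn install_hierarchy(&mut self) {
        let mut idx: HashMap<String, Vec<String>> = HashMap::new();
        for (sub, sup) in &self.edges {
            idx.entry(sup.clone()).or_default().push(sub.clone());
        }
        self.hierarchy = Some(idx);
    }

    /// A merge can add methods or edges, so the cached index is dropped
    /// and must be rebuilt with `install_hierarchy`.
    fn merge(&mut self, other: Summaries) {
        self.methods.extend(other.methods);
        self.edges.extend(other.edges);
        self.hierarchy = None;
    }

    /// Direct receiver first, then recorded sub-types in edge order. The arity
    /// filter applies to every leg, duplicates are skipped, and the result is
    /// truncated at MAX_FANOUT (keeping the head). With no index installed this
    /// collapses to the single direct match, if any.
    fn resolve_widened(&self, receiver: &str, method: &str, arity: usize) -> Vec<String> {
        let mut containers = vec![receiver.to_string()];
        if let Some(subs) = self.hierarchy.as_ref().and_then(|h| h.get(receiver)) {
            containers.extend(subs.iter().cloned());
        }
        let mut seen = HashSet::new();
        let mut out = Vec::new();
        for c in containers {
            let key = (c.clone(), method.to_string());
            if self.methods.get(&key) == Some(&arity) && seen.insert(c.clone()) {
                out.push(format!("{c}::{method}"));
                if out.len() == MAX_FANOUT {
                    break;
                }
            }
        }
        out
    }
}

fn main() {
    let mut gs = Summaries::default();
    for c in ["Super", "Sub"] {
        gs.methods.insert((c.to_string(), "m".to_string()), 0);
    }
    gs.edges.push(("Sub".to_string(), "Super".to_string()));

    // Mirrors A9: fan-out, then a merge clears the index, then reinstall.
    gs.install_hierarchy();
    assert_eq!(gs.resolve_widened("Super", "m", 0).len(), 2);
    gs.merge(Summaries::default());
    assert_eq!(gs.resolve_widened("Super", "m", 0), vec!["Super::m".to_string()]);
    gs.install_hierarchy();
    assert_eq!(gs.resolve_widened("Super", "m", 0).len(), 2);
    println!("ok");
}
```

Clearing the index on `merge` instead of patching it means a caller that forgets to reinstall gets A9's single-result behaviour rather than a fan-out computed from an outdated edge set.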
@@ -1379,6 +1379,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let empty_succs = HashMap::new();
@@ -1436,6 +1438,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let empty_succs = HashMap::new();
@@ -1566,6 +1570,8 @@ mod tests {
value_defs: vec![make_value_def(b0, n0), make_value_def(b1, n1)],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let finding = make_finding(n0, n1);
@@ -1671,6 +1677,8 @@ mod tests {
.into_iter()
.collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

// Finding path goes through B0 → B1 → B3
@@ -1814,6 +1822,8 @@ mod tests {
.into_iter()
.collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let finding = Finding {
@@ -1923,6 +1933,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![(b0, b2)],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let mut exc_succs: HashMap<BlockId, SmallVec<[BlockId; 2]>> = HashMap::new();
@@ -1987,6 +1999,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![(b0, b2)],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let mut exc_succs: HashMap<BlockId, SmallVec<[BlockId; 2]>> = HashMap::new();
@@ -2041,6 +2055,7 @@ mod tests {
value: SsaValue(1),
op: SsaOp::Call {
callee: "JSON.parse".into(),
callee_text: None,
args: vec![smallvec![SsaValue(0)]],
receiver: None,
},
@@ -2091,6 +2106,8 @@ mod tests {
.into_iter()
.collect(),
exception_edges: vec![(b1, b2)],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let finding = Finding {
@@ -1094,6 +1094,7 @@ fn handle_nested_calls(
callee,
args,
receiver,
..
} = &inst.op
{
// Only attempt if the current result is opaque
@@ -387,6 +387,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -430,6 +432,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -509,6 +513,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -569,6 +575,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -647,6 +655,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -716,6 +726,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -748,6 +760,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -802,6 +816,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -880,6 +896,8 @@ mod tests {
],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -929,6 +947,7 @@ mod tests {
2,
SsaOp::Call {
callee: "f".into(),
callee_text: None,
args: vec![smallvec![SsaValue(1)]],
receiver: None,
},
@@ -955,6 +974,8 @@ mod tests {
],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -988,6 +1009,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let info = analyse_loops(&ssa);
@@ -377,6 +377,8 @@ mod tests {
value_defs: vec![make_value_def(b0, n0), make_value_def(b1, n1)],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let finding = Finding {
@@ -447,6 +449,8 @@ mod tests {
value_defs: vec![make_value_def(b0, n0), make_value_def(b1, n1)],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let finding = Finding {
@@ -545,6 +549,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let ctx = SymexContext {
@@ -602,6 +608,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};

let ctx = SymexContext {
@@ -350,6 +350,8 @@ mod tests {
 value_defs: vec![],
 cfg_node_map: [(node, SsaValue(5))].into_iter().collect(),
 exception_edges: vec![],
+field_interner: crate::ssa::ir::FieldInterner::default(),
+field_writes: std::collections::HashMap::new(),
 };

 let witness = state.get_sink_witness(&finding, &ssa);
@@ -387,6 +389,8 @@ mod tests {
 value_defs: vec![],
 cfg_node_map: [(node, SsaValue(5))].into_iter().collect(),
 exception_edges: vec![],
+field_interner: crate::ssa::ir::FieldInterner::default(),
+field_writes: std::collections::HashMap::new(),
 };

 assert_eq!(state.get_sink_witness(&finding, &ssa), None);
@@ -421,6 +425,8 @@ mod tests {
 value_defs: vec![],
 cfg_node_map: HashMap::new(),
 exception_edges: vec![],
+field_interner: crate::ssa::ir::FieldInterner::default(),
+field_writes: std::collections::HashMap::new(),
 };

 assert_eq!(state.get_sink_witness(&finding, &ssa), None);
@@ -459,6 +465,8 @@ mod tests {
 value_defs: vec![],
 cfg_node_map: HashMap::new(),
 exception_edges: vec![],
+field_interner: crate::ssa::ir::FieldInterner::default(),
+field_writes: std::collections::HashMap::new(),
 };

 state.widen_at_loop_head(BlockId(0), &ssa);
@@ -500,6 +508,8 @@ mod tests {
 value_defs: vec![],
 cfg_node_map: HashMap::new(),
 exception_edges: vec![],
+field_interner: crate::ssa::ir::FieldInterner::default(),
+field_writes: std::collections::HashMap::new(),
 };

 state.widen_at_loop_head(BlockId(0), &ssa);
@@ -541,6 +551,8 @@ mod tests {
 value_defs: vec![],
 cfg_node_map: HashMap::new(),
 exception_edges: vec![],
+field_interner: crate::ssa::ir::FieldInterner::default(),
+field_writes: std::collections::HashMap::new(),
 };

 state.widen_at_loop_head(BlockId(0), &ssa);
@@ -8,7 +8,7 @@
 //! etc.) for witness enrichment and heuristic mismatch diagnostics. They do
 //! NOT affect taint semantics.

-use crate::labels::Cap;
+use crate::labels::{Cap, bare_method_name};
 use crate::symbol::Lang;

 use super::value::SymbolicValue;
@@ -155,7 +155,7 @@ pub fn classify_string_method(
 args: &[SymbolicValue],
 lang: Lang,
 ) -> Option<StringMethodInfo> {
-let method = callee.rsplit('.').next().unwrap_or(callee);
+let method = bare_method_name(callee);

 match lang {
 Lang::JavaScript | Lang::TypeScript => classify_js(method, args),
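The `classify_string_method` change above swaps the inline `callee.rsplit('.').next().unwrap_or(callee)` for the shared `bare_method_name(callee)` helper imported from `crate::labels`. The helper's body is not part of this diff, so the following is only a hypothetical sketch that assumes it keeps the old behavior: return the text after the last `.`, or the whole callee when there is no dot. The real helper may handle more cases.

```rust
/// Hypothetical stand-in for `crate::labels::bare_method_name`.
/// Assumption: it reproduces the inline expression it replaced in this diff.
fn bare_method_name(callee: &str) -> &str {
    // `rsplit` always yields at least one piece (even for ""), so
    // `unwrap_or` is only a defensive fallback.
    callee.rsplit('.').next().unwrap_or(callee)
}
```

Under that assumption, `"java.net.URLEncoder.encode"` maps to `"encode"` and a dotless `"encodeURIComponent"` comes back unchanged. Routing the four call sites shown here through one helper means any later change to how a leaf name is extracted only has to be made in one place.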
@@ -506,7 +506,7 @@ fn classify_transform_js(callee: &str) -> Option<TransformMethodInfo> {
 use StringOperandSource::*;
 use TransformKind::*;

-let method = callee.rsplit('.').next().unwrap_or(callee);
+let method = bare_method_name(callee);
 match method {
 // URL encoding/decoding
 "encodeURIComponent" | "encodeURI" => Some(TransformMethodInfo {
@@ -622,7 +622,7 @@ fn classify_transform_java(callee: &str) -> Option<TransformMethodInfo> {
 // `URLEncoder.encode`, `Base64.getEncoder.encodeToString`). Match on
 // the suffix after the last `.` for the leaf method name, but also
 // examine the dotted callee for receiver-qualified disambiguation.
-let method = callee.rsplit('.').next().unwrap_or(callee);
+let method = bare_method_name(callee);

 // URL encoding/decoding — `java.net.URLEncoder.encode` / `URLDecoder.decode`.
 if callee.ends_with("URLEncoder.encode") {
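The comment in the `classify_transform_java` hunk describes a two-level match: the leaf name after the last `.` for cheap dispatch, plus the full dotted callee to tell apart receivers that share a leaf name (many classes have an `encode`). Below is a hypothetical, simplified sketch of that idea for the URL encode/decode case only; `UrlTransform` and `classify_java_url` are illustrative names, not part of the codebase.

```rust
/// Illustrative result type (hypothetical, not the crate's `TransformKind`).
#[derive(Debug, PartialEq)]
enum UrlTransform {
    Encode,
    Decode,
}

/// Hypothetical sketch: dispatch on the leaf method name, then require a
/// receiver-qualified suffix so unrelated `encode`/`decode` calls are ignored.
fn classify_java_url(callee: &str) -> Option<UrlTransform> {
    // Leaf name after the last `.`, as the removed inline expression computed it.
    let method = callee.rsplit('.').next().unwrap_or(callee);
    match method {
        "encode" if callee.ends_with("URLEncoder.encode") => Some(UrlTransform::Encode),
        "decode" if callee.ends_with("URLDecoder.decode") => Some(UrlTransform::Decode),
        // Same leaf name on a different receiver, or an unrelated method.
        _ => None,
    }
}
```

With this sketch, `java.net.URLEncoder.encode` is classified as `Encode`, while `Base64.getEncoder.encode` shares the leaf name but fails the suffix check and yields `None`.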
@@ -1039,7 +1039,7 @@ pub fn detect_replace_sanitizer(

 /// Determine whether a replace call is global (replaces all occurrences).
 fn is_global_replace(callee: &str, lang: Lang) -> bool {
-let method = callee.rsplit('.').next().unwrap_or(callee);
+let method = bare_method_name(callee);
 match lang {
 // JS: replace() is NOT global; replaceAll() IS global
 Lang::JavaScript | Lang::TypeScript => method == "replaceAll",
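The `is_global_replace` hunk decides the JavaScript/TypeScript case from the method name alone: `replaceAll` is global, `replace` is not. In JavaScript, `String.prototype.replace` also replaces every match when its pattern is a regular expression carrying the `g` flag. The hypothetical sketch below shows where that case could fit; `js_replace_is_global` and its `regex_has_g_flag` parameter are assumptions, and the signature in the diff has no such input.

```rust
/// Hypothetical sketch of the JavaScript/TypeScript arm of `is_global_replace`.
/// `regex_has_g_flag` is an assumed input (e.g. derived from a regex literal).
fn js_replace_is_global(callee: &str, regex_has_g_flag: bool) -> bool {
    // Leaf method name after the last `.`.
    let method = callee.rsplit('.').next().unwrap_or(callee);
    match method {
        // `replaceAll` always replaces every occurrence.
        "replaceAll" => true,
        // `replace` replaces only the first match unless given a `/g` regex.
        "replace" => regex_has_g_flag,
        // Not a replace call at all.
        _ => false,
    }
}
```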
Some files were not shown because too many files have changed in this diff.