mirror of
https://github.com/elicpeter/nyx.git
synced 2026-06-12 19:55:14 +02:00
fix(engine): CFG/SSA/taint/IPA soundness, precision & recall fixes
This commit is contained in:
parent
59e4359257
commit
246f32a419
39 changed files with 4729 additions and 465 deletions
14
CHANGELOG.md
14
CHANGELOG.md
|
|
@ -74,6 +74,20 @@ The attack-surface map and chain composer turn the flat finding list into a rout
|
|||
- **DB fast-fail preflight.** `Indexer::init` reads the first 16 bytes of any candidate SQLite file and rejects anything without the standard `SQLite format 3\0` magic. Stops a misnamed JSON / text file from corrupting the index path with a SQLite error halfway through migration.
|
||||
- **Symbolic-execution coverage.** Symex now recognises a wider set of string operations (`substr`, `replace`, `to_lower`, `to_upper`, `trim`, `strlen`) per the value/transfer pipeline, and the abstract-interpretation framework reasons about interval and prefix/suffix string facts during the dynamic verdict pass.
|
||||
|
||||
### Fixed (engine correctness)
|
||||
|
||||
- **CFG construction.** Python `if/elif/elif/else` chains no longer drop every alternative past the first, so a sink in a second `elif` or a trailing `else` is analysed (same fix covers PHP `else_if`). C-style `for (init; cond; incr)` loops now lower the initializer and increment, so taint introduced in the loop header (`for (cmd = getenv(...); ...)`) reaches the body. A `switch` `default` is no longer unconditionally hoisted to the chain tail, preserving fall-through order in C/C++/JS/TS/PHP/Java. Source/sink calls inside short-circuit `&&` / `||` operands of an `if` / `while` condition are now classified instead of dropped.
|
||||
- **SSA lowering of exception handlers.** Catch blocks with internal control flow (an `if` / loop / nested `try` inside the handler) no longer lose every instruction past the catch entry — the orphan subtree is renamed through a virtual-root dominator tree, so sinks reached only inside a `catch` are seen. Catch-side variable reads now resolve to the most entry-dominating reaching definition (the pre-`try` value) rather than a post-join reassignment. A genuine positional argument equal to a chained-call receiver root (`a.b.m(p, a)`) is preserved instead of being stripped as the implicit chain root.
|
||||
- **Taint soundness.** Sink suppression now gates on `validated_must` (validated on every path) instead of `validated_may` (any path), closing a false-negative where a single validated branch silenced a sink. `is_noreturn_call` no longer matches receiver-qualified `.exit()` / `.abort()` / method calls, so `transaction.abort()` stops wiping taint state. The SSRF same-origin check rejects protocol-relative `//host` prefixes (an open-redirect / SSRF bypass that a bare `/`-prefix check accepted). Inline-return taint unions the derived and parameter-passthrough channels for mixed-return helpers (`if (c) return src(); return x;`). The inline-analysis cache is keyed to exclude callback-bound arguments, so a function-valued argument no longer poisons sibling call sites that pass a different callback.
|
||||
- **Taint precision.** `String.valueOf(String/Object)` is no longer tagged a safe-string producer (it is an identity passthrough, so `String.valueOf(req.getParameter(...))` was silently suppressed). Cross-parameter sanitizers no longer bleed onto sibling arguments (`f(a, b){ return a + escape(b) }` sanitises only `b`), and a cross-file sanitizer resolved through the coarse summary tier still applies its strip. Relative-URL and host-allowlist cap clearing is alias-aware. Substring-rejection and `indexOf() === -1` idioms are no longer misread as allowlist validation, and dotted multi-argument validators no longer over-validate unrelated targets.
|
||||
- **Interprocedural resolution.** SCC / topo file batching and reachability key files by their package-qualified namespace, matching the call-graph nodes and SSA summary tier so cross-package callers resolve. Directly self-recursive functions now get SCC fixed-point treatment. Call resolution tolerates under-application (a call supplying fewer arguments than a callee with default / optional parameters) while still degrading to `Ambiguous` rather than a wrong pick. A failed SCC iteration no longer overwrites a file's cached diagnostics with an empty set. JS/TS module resolution appends extensions to dotted specifiers (`./user.service` → `user.service.ts`) and swaps a `.js` import to a `.ts` file (NodeNext / ESM).
|
||||
- **JS/TS two-level solve.** Pass-2 top-level (global) taint now reaches nested closures two or more scopes deep, and the pass-2 dirty-skip no longer drops a nested body that transitively consumes a changed global through a parent-local.
|
||||
- **Scan pipeline and index.** `replace_all_for_file` deletes stale SSA summary / body rows unconditionally, so an incremental rescan cannot leave orphaned rows. Cached findings recompute their category instead of being stamped `Security`, so structural warnings keep their real class. The indexed build persists auth summaries and cross-package imports, logs-and-skips an unreadable file instead of aborting the whole build, and keys `FuncSummary` entries to match the SSA tier so an indexed scan and a full scan agree.
|
||||
- **Language coverage (recall).** KINDS maps were completed so previously-dropped bodies are walked: Java interface / enum / record / `synchronized` blocks, Rust inline `mod { ... }` items, Go labeled-statement bodies, and Ruby lambda / brace-block bodies. Go variadic and Python `*args` / `**kwargs` parameters are seeded with correct arity. C/C++ `scanf` / `fscanf` / `sscanf` / `read` register their output buffers as taint sources. TypeScript gated sinks dropped from the JS mirror were restored (`_.template`, `http.get` / `https.get`, `setValue` / `dotProp.set` / `jp.set`). The weak-hash and HTTP-URL AST patterns match single- and double-quoted string literals across JS, TS, and Ruby.
|
||||
- **Symbolic execution.** Interprocedural parameter seeding fixed an off-by-one for method calls and now seeds the receiver / self parameter; the cross-file depth guard increments on descent; and a path cut short by the global step budget records `Inconclusive` instead of `Confirmed`.
|
||||
- **Abstract interpretation and pointer analysis.** Interval division handles the `i64::MIN / -1` overflow (degrading to unbounded instead of a falsely-narrow range) and multiplication computes overflow in `i128`. `AbstractState::leq` checks entries present only in the other state, restoring a sound partial order. The pointer fixpoint re-projects container-element field reads after the receiver's points-to set converges.
|
||||
- **CFG-level analyses.** Error-fallthrough termination stops at the `if` join point; a guard's constant-operand test refuses a `Source`-labelled call result; guard / sanitizer matchers require a leaf-name boundary (so `invalidate` no longer matches the `validate` guard and `unquote` no longer matches `quote`); resource ownership-transfer requires a real `->field =` assignment rather than any `->` in a span; post-dominators are computed once per resource pass; and the web-entrypoint heuristic confirms web parameters against the candidate handler only, so an unrelated `req` parameter elsewhere in the file no longer promotes batch / CLI functions to web entry points.
|
||||
|
||||
### CLI
|
||||
|
||||
- **`nyx scan --verify`** (enabled by default in standard builds) and `--backend {auto,process,docker}` select the dynamic-verification harness. `--no-verify` skips verification for a single run without changing config.
|
||||
|
|
|
|||
|
|
@ -179,30 +179,21 @@ impl IntervalFact {
|
|||
}
|
||||
match (self.lo, self.hi, other.lo, other.hi) {
|
||||
(Some(a_lo), Some(a_hi), Some(b_lo), Some(b_hi)) => {
|
||||
let products = [
|
||||
a_lo.checked_mul(b_lo),
|
||||
a_lo.checked_mul(b_hi),
|
||||
a_hi.checked_mul(b_lo),
|
||||
a_hi.checked_mul(b_hi),
|
||||
// Compute all four endpoint products in i128 (no i64 overflow
|
||||
// possible) so we know the *true* min/max, then attribute
|
||||
// overflow by which direction escapes the i64 range — not by
|
||||
// which first-operand endpoint produced it.
|
||||
let products: [i128; 4] = [
|
||||
a_lo as i128 * b_lo as i128,
|
||||
a_lo as i128 * b_hi as i128,
|
||||
a_hi as i128 * b_lo as i128,
|
||||
a_hi as i128 * b_hi as i128,
|
||||
];
|
||||
let lo = products.iter().filter_map(|p| *p).min();
|
||||
let hi = products.iter().filter_map(|p| *p).max();
|
||||
// If any product overflowed, the corresponding bound is None
|
||||
if products.iter().any(|p| p.is_none()) {
|
||||
Self {
|
||||
lo: if lo.is_some() && products[..2].iter().all(|p| p.is_some()) {
|
||||
lo
|
||||
} else {
|
||||
None
|
||||
},
|
||||
hi: if hi.is_some() && products[2..].iter().all(|p| p.is_some()) {
|
||||
hi
|
||||
} else {
|
||||
None
|
||||
},
|
||||
}
|
||||
} else {
|
||||
Self { lo, hi }
|
||||
let true_min = *products.iter().min().unwrap();
|
||||
let true_max = *products.iter().max().unwrap();
|
||||
Self {
|
||||
lo: clamp_lo_i128(true_min),
|
||||
hi: clamp_hi_i128(true_max),
|
||||
}
|
||||
}
|
||||
_ => Self::top(),
|
||||
|
|
@ -220,15 +211,24 @@ impl IntervalFact {
|
|||
if b_lo <= 0 && b_hi >= 0 {
|
||||
return Self::top();
|
||||
}
|
||||
let quotients = [
|
||||
a_lo.checked_div(b_lo),
|
||||
a_lo.checked_div(b_hi),
|
||||
a_hi.checked_div(b_lo),
|
||||
a_hi.checked_div(b_hi),
|
||||
// Compute the four endpoint quotients in i128. This is exact
|
||||
// for division (the divisor cannot be zero here) and captures
|
||||
// the i64::MIN / -1 = i64::MAX + 1 overflow case, which
|
||||
// checked_div would silently drop, producing a falsely narrow
|
||||
// interval. Attribute the escape by direction: a quotient
|
||||
// above i64::MAX leaves hi unbounded.
|
||||
let quotients: [i128; 4] = [
|
||||
a_lo as i128 / b_lo as i128,
|
||||
a_lo as i128 / b_hi as i128,
|
||||
a_hi as i128 / b_lo as i128,
|
||||
a_hi as i128 / b_hi as i128,
|
||||
];
|
||||
let lo = quotients.iter().filter_map(|q| *q).min();
|
||||
let hi = quotients.iter().filter_map(|q| *q).max();
|
||||
Self { lo, hi }
|
||||
let true_min = *quotients.iter().min().unwrap();
|
||||
let true_max = *quotients.iter().max().unwrap();
|
||||
Self {
|
||||
lo: clamp_lo_i128(true_min),
|
||||
hi: clamp_hi_i128(true_max),
|
||||
}
|
||||
}
|
||||
_ => Self::top(),
|
||||
}
|
||||
|
|
@ -523,6 +523,28 @@ fn checked_sub_opt(a: Option<i64>, b: Option<i64>) -> Option<i64> {
|
|||
}
|
||||
}
|
||||
|
||||
/// Clamp an `i128` lower bound to `Option<i64>`. A bound outside the `i64`
|
||||
/// range is unrepresentable on this side, so we degrade to `None`
|
||||
/// (−∞), which is a sound over-approximation. Mirrors the overflow handling
|
||||
/// of `add`/`sub` (overflow → unbounded).
|
||||
fn clamp_lo_i128(lo: i128) -> Option<i64> {
|
||||
if (i64::MIN as i128..=i64::MAX as i128).contains(&lo) {
|
||||
Some(lo as i64)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
/// Clamp an `i128` upper bound to `Option<i64>`. A bound outside the `i64`
|
||||
/// range degrades to `None` (+∞), a sound over-approximation.
|
||||
fn clamp_hi_i128(hi: i128) -> Option<i64> {
|
||||
if (i64::MIN as i128..=i64::MAX as i128).contains(&hi) {
|
||||
Some(hi as i64)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
|
@ -822,6 +844,53 @@ mod tests {
|
|||
assert!(r.lo.is_none() || r.hi.is_none());
|
||||
}
|
||||
|
||||
/// Soundness: on overflow `mul` must attribute the unbounded direction
|
||||
/// by which endpoint product actually escaped the i64 range, not by the
|
||||
/// first operand's endpoint. `[i64::MIN, 0] * [-1, -1]` reaches
|
||||
/// `i64::MAX + 1` (unbounded above), so `hi` must be `None`, never a
|
||||
/// finite value like `0`.
|
||||
#[test]
|
||||
fn mul_overflow_attributes_high_bound_unbounded() {
|
||||
let a = IntervalFact {
|
||||
lo: Some(i64::MIN),
|
||||
hi: Some(0),
|
||||
};
|
||||
let b = IntervalFact::exact(-1);
|
||||
let r = a.mul(&b);
|
||||
// True range is [0, i64::MAX + 1]: lo = 0 finite, hi unbounded.
|
||||
assert_eq!(r.lo, Some(0), "mul lo must stay finite at 0");
|
||||
assert_eq!(
|
||||
r.hi, None,
|
||||
"mul hi must be unbounded: i64::MIN * -1 escapes above i64::MAX"
|
||||
);
|
||||
assert!(
|
||||
!r.is_proven_bounded(),
|
||||
"an overflowing product must not be proven-bounded"
|
||||
);
|
||||
}
|
||||
|
||||
/// Symmetric soundness check on the lower bound:
|
||||
/// `[0, i64::MAX] * [-2, -1]` reaches `-2 * i64::MAX` (unbounded below),
|
||||
/// so `lo` must be `None`, never a finite floor like `-i64::MAX`.
|
||||
#[test]
|
||||
fn mul_overflow_attributes_low_bound_unbounded() {
|
||||
let a = IntervalFact {
|
||||
lo: Some(0),
|
||||
hi: Some(i64::MAX),
|
||||
};
|
||||
let b = IntervalFact {
|
||||
lo: Some(-2),
|
||||
hi: Some(-1),
|
||||
};
|
||||
let r = a.mul(&b);
|
||||
// True range is [-2*i64::MAX, 0]: lo unbounded, hi = 0 finite.
|
||||
assert_eq!(
|
||||
r.lo, None,
|
||||
"mul lo must be unbounded: 0 * -2 .. i64::MAX * -2 escapes below i64::MIN"
|
||||
);
|
||||
assert_eq!(r.hi, Some(0), "mul hi must stay finite at 0");
|
||||
}
|
||||
|
||||
// ── Bitwise interval transfer tests ────────────────────────────────
|
||||
|
||||
#[test]
|
||||
|
|
@ -1062,6 +1131,31 @@ mod tests {
|
|||
);
|
||||
}
|
||||
|
||||
/// Soundness: `[i64::MIN, 0] / [-1, -1]` truly spans `[0, i64::MAX + 1]`,
|
||||
/// so the result must NOT be proven-bounded. The old `checked_div`
|
||||
/// implementation silently dropped the overflowing `i64::MIN / -1`
|
||||
/// quotient and produced a narrow `[0, 0]`, falsely passing
|
||||
/// `is_proven_bounded()` and defeating SQL/SHELL sink suppression.
|
||||
#[test]
|
||||
fn div_i64_min_overflow_not_proven_bounded() {
|
||||
let a = IntervalFact {
|
||||
lo: Some(i64::MIN),
|
||||
hi: Some(0),
|
||||
};
|
||||
let b = IntervalFact::exact(-1);
|
||||
let r = a.div(&b);
|
||||
// Lower edge: i64::MIN / -1 overflows above i64::MAX → hi unbounded.
|
||||
assert_eq!(r.lo, Some(0), "div lo must stay finite at 0");
|
||||
assert_eq!(
|
||||
r.hi, None,
|
||||
"div hi must be unbounded: i64::MIN / -1 = i64::MAX + 1 escapes i64"
|
||||
);
|
||||
assert!(
|
||||
!r.is_proven_bounded(),
|
||||
"an overflowing quotient must not be reported as proven-bounded"
|
||||
);
|
||||
}
|
||||
|
||||
/// Modulo with a single-point negative divisor: `[0,10] % -3` must
|
||||
/// be a valid interval (no panic, no negative-zero bound nonsense).
|
||||
#[test]
|
||||
|
|
|
|||
|
|
@ -488,15 +488,27 @@ impl AbstractState {
|
|||
|
||||
/// Partial order: self ⊑ other.
|
||||
pub fn leq(&self, other: &Self) -> bool {
|
||||
// Every non-Top entry in self must have a corresponding entry in other
|
||||
// with self[v] ⊑ other[v]. Entries only in other are fine (Top ⊑ anything
|
||||
// is false, but absent self entries are Top which is handled).
|
||||
// self ⊑ other iff for every SSA value v: self[v] ⊑ other[v], using the
|
||||
// convention that an absent entry is Top.
|
||||
//
|
||||
// Three cases by where v is stored:
|
||||
// - in both: check self[v] ⊑ other[v] (loop over self below).
|
||||
// - in self only: other[v] = Top, and self[v] ⊑ Top always holds — ok.
|
||||
// - in other only: self[v] = Top; since stored entries are non-Top,
|
||||
// Top ⋢ other[v], so self ⋢ other. This case was previously missed.
|
||||
for (v, val) in &self.values {
|
||||
let other_val = other.get(*v);
|
||||
if !val.leq(&other_val) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
// Any value present only in `other` means self[v] = Top ⋢ other[v]
|
||||
// (other's stored entries are non-Top), so self ⋢ other.
|
||||
for (v, _) in &other.values {
|
||||
if self.values.binary_search_by_key(v, |(id, _)| *id).is_err() {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
true
|
||||
}
|
||||
}
|
||||
|
|
@ -675,6 +687,36 @@ mod tests {
|
|||
assert_eq!(v1.interval.hi, None); // grew → widened
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn abstract_state_leq_respects_other_only_entries() {
|
||||
// self = empty (every value is implicitly Top).
|
||||
// other = { v1: [0,5] } (a non-Top, hence strictly-lower fact).
|
||||
// Since Top ⋢ [0,5], empty ⋢ other.
|
||||
let bounded = AbstractValue {
|
||||
interval: IntervalFact {
|
||||
lo: Some(0),
|
||||
hi: Some(5),
|
||||
},
|
||||
string: StringFact::top(),
|
||||
bits: BitFact::top(),
|
||||
path: PathFact::top(),
|
||||
};
|
||||
let empty = AbstractState::empty();
|
||||
let mut other = AbstractState::empty();
|
||||
other.set(SsaValue(1), bounded);
|
||||
|
||||
// The bug under test: empty.leq(other) used to return true.
|
||||
assert!(
|
||||
!empty.leq(&other),
|
||||
"empty (Top everywhere) must not be ⊑ a state with a bounded entry"
|
||||
);
|
||||
// Sanity: the reverse direction holds (a bounded state ⊑ all-Top).
|
||||
assert!(other.leq(&empty), "a bounded state must be ⊑ all-Top");
|
||||
// Reflexivity still holds.
|
||||
assert!(other.leq(&other));
|
||||
assert!(empty.leq(&empty));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn loop_carried_phi_join_and_widen() {
|
||||
// Simulate: x = 0; loop { x = phi(0, x+1) }
|
||||
|
|
|
|||
184
src/callgraph.rs
184
src/callgraph.rs
|
|
@ -160,6 +160,28 @@ pub(crate) fn callee_container_hint(raw: &str) -> &str {
|
|||
""
|
||||
}
|
||||
|
||||
/// Strip the optional `"<pkg>::"` package prefix from a call-graph
|
||||
/// namespace, yielding the plain project-relative path.
|
||||
///
|
||||
/// `crate::symbol::namespace_with_package` produces package-qualified
|
||||
/// namespaces of the form `format!("{pkg}::{rel}")` for any file that
|
||||
/// lives inside a resolved [`crate::resolve::PackageEntry`] (e.g. any
|
||||
/// repo with a named `package.json`). The package name never contains
|
||||
/// `::` and project-relative file paths never contain `::`, so the
|
||||
/// **first** `::` is unambiguously the package separator and everything
|
||||
/// after it is the plain `normalize_namespace` form.
|
||||
///
|
||||
/// Used to align two keying conventions that would otherwise never
|
||||
/// match: call-graph [`FuncKey::namespace`]s (package-qualified) versus
|
||||
/// file paths normalised via plain `normalize_namespace`. Returns the
|
||||
/// input unchanged when no `::` is present.
|
||||
pub(crate) fn strip_package_prefix(ns: &str) -> &str {
|
||||
match ns.split_once("::") {
|
||||
Some((_pkg, rel)) => rel,
|
||||
None => ns,
|
||||
}
|
||||
}
|
||||
|
||||
// Class / container → method index
|
||||
|
||||
/// Per-language `(container, method_name)` → candidate [`FuncKey`] index.
|
||||
|
|
@ -897,12 +919,20 @@ impl FileReachMap {
|
|||
/// [`FileReachMap::with_scan_root`] when callers may pass absolute
|
||||
/// paths.
|
||||
pub fn build(cg: &CallGraph) -> Self {
|
||||
// Call-graph namespaces are package-qualified (`pkg::rel`) for any
|
||||
// file inside a resolved package, but `reaches` normalises its
|
||||
// arguments via plain `normalize_namespace` (`rel`). Strip the
|
||||
// package prefix on both keys and caller-set entries so the stored
|
||||
// form matches the lookup form; otherwise `reaches` always returns
|
||||
// false for package-resident files and chain reach widening /
|
||||
// surface transitive-exposure detection silently never fire.
|
||||
let mut by_callee_ns: HashMap<String, std::collections::HashSet<String>> = HashMap::new();
|
||||
for callee in cg.index.keys() {
|
||||
let entry = by_callee_ns.entry(callee.namespace.clone()).or_default();
|
||||
entry.insert(callee.namespace.clone());
|
||||
let callee_ns = strip_package_prefix(&callee.namespace).to_string();
|
||||
let entry = by_callee_ns.entry(callee_ns.clone()).or_default();
|
||||
entry.insert(callee_ns);
|
||||
for caller in callers_transitive(cg, callee) {
|
||||
entry.insert(caller.namespace);
|
||||
entry.insert(strip_package_prefix(&caller.namespace).to_string());
|
||||
}
|
||||
}
|
||||
FileReachMap {
|
||||
|
|
@ -941,11 +971,17 @@ impl FileReachMap {
|
|||
}
|
||||
|
||||
fn normalize<'a>(&self, path: &'a str) -> std::borrow::Cow<'a, str> {
|
||||
// Reduce both path-normalisation (absolute → project-relative) and
|
||||
// package-qualification (`pkg::rel` → `rel`) so lookups match the
|
||||
// package-stripped keys stored by `build`. Inputs may arrive as
|
||||
// absolute host paths, plain project-relative paths, or
|
||||
// package-qualified call-graph namespaces.
|
||||
match self.scan_root.as_deref() {
|
||||
Some(root) => {
|
||||
std::borrow::Cow::Owned(crate::symbol::normalize_namespace(path, Some(root)))
|
||||
let normalized = crate::symbol::normalize_namespace(path, Some(root));
|
||||
std::borrow::Cow::Owned(strip_package_prefix(&normalized).to_string())
|
||||
}
|
||||
None => std::borrow::Cow::Borrowed(path),
|
||||
None => std::borrow::Cow::Borrowed(strip_package_prefix(path)),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -993,6 +1029,26 @@ pub fn scc_spans_files(cg: &CallGraph, scc: &[NodeIndex]) -> bool {
|
|||
iter.any(|n| cg.graph[*n].namespace.as_str() != first_ns)
|
||||
}
|
||||
|
||||
/// True when an SCC requires the run_topo_batches fixed-point loop.
|
||||
///
|
||||
/// A multi-node SCC is mutually recursive by definition. A **single**-node
|
||||
/// SCC is recursive only when the function calls itself directly: petgraph's
|
||||
/// `tarjan_scc` places a directly self-recursive function (`f` calls `f`) in
|
||||
/// its own singleton SCC with a self-loop edge, so `scc.len() > 1` alone
|
||||
/// misses it. Self-recursion (tree walkers, retry wrappers, recursive-descent
|
||||
/// parsers) is far more common than mutual recursion and needs the same
|
||||
/// fixed-point treatment so its summary converges against its own refined
|
||||
/// summary rather than being analysed exactly once.
|
||||
pub fn scc_is_recursive(cg: &CallGraph, scc: &[NodeIndex]) -> bool {
|
||||
if scc.len() > 1 {
|
||||
return true;
|
||||
}
|
||||
match scc.first() {
|
||||
Some(&n) => cg.graph.contains_edge(n, n),
|
||||
None => false,
|
||||
}
|
||||
}
|
||||
|
||||
/// Map SCC topological order to an ordered sequence of file-path batches
|
||||
/// annotated with whether any contributing SCC is mutually recursive
|
||||
/// (`len > 1`) or cross-file.
|
||||
|
|
@ -1022,14 +1078,22 @@ pub fn scc_file_batches_with_metadata<'a>(
|
|||
// 2. Build file relative-path → (min topo index, has_mutual_recursion, cross_file).
|
||||
// `cross_file` is set whenever the file participates in an SCC whose
|
||||
// nodes span more than one namespace, the cross-file signal.
|
||||
//
|
||||
// Call-graph namespaces are package-qualified (`pkg::rel`) for any
|
||||
// file inside a resolved package, while `rel_to_path` above keys by
|
||||
// the plain `normalize_namespace` form (`rel`). Strip the package
|
||||
// prefix here so both conventions agree; otherwise every
|
||||
// package-resident file misses `file_topo` and lands in `orphans`,
|
||||
// silently disabling the topo/SCC machinery for exactly the repos it
|
||||
// was built for (any project with a named package.json).
|
||||
let mut file_topo: HashMap<&str, (usize, bool, bool)> = HashMap::new();
|
||||
for (topo_pos, &scc_idx) in analysis.topo_scc_callee_first.iter().enumerate() {
|
||||
let scc_recursive = analysis.sccs[scc_idx].len() > 1;
|
||||
let scc_recursive = scc_is_recursive(cg, &analysis.sccs[scc_idx]);
|
||||
let scc_cross_file = scc_spans_files(cg, &analysis.sccs[scc_idx]);
|
||||
for &node in &analysis.sccs[scc_idx] {
|
||||
let ns = &cg.graph[node].namespace;
|
||||
let ns = strip_package_prefix(&cg.graph[node].namespace);
|
||||
file_topo
|
||||
.entry(ns.as_str())
|
||||
.entry(ns)
|
||||
.and_modify(|(min_pos, recursive, cross_file)| {
|
||||
if topo_pos < *min_pos {
|
||||
*min_pos = topo_pos;
|
||||
|
|
@ -1782,6 +1846,110 @@ mod tests {
|
|||
}
|
||||
}
|
||||
|
||||
// ── package-prefix namespace alignment (finding #35) ──────────────
|
||||
|
||||
#[test]
|
||||
fn strip_package_prefix_handles_qualified_and_plain() {
|
||||
// Package-qualified: prefix stripped at the first "::".
|
||||
assert_eq!(strip_package_prefix("myapp::src/a.js"), "src/a.js");
|
||||
assert_eq!(strip_package_prefix("@scope/name::src/a.ts"), "src/a.ts");
|
||||
// Plain relative path with no package prefix is returned verbatim.
|
||||
assert_eq!(strip_package_prefix("src/a.js"), "src/a.js");
|
||||
assert_eq!(strip_package_prefix("a.rs"), "a.rs");
|
||||
assert_eq!(strip_package_prefix(""), "");
|
||||
}
|
||||
|
||||
/// A call graph whose namespaces are package-qualified (the normal case
|
||||
/// for any repo with a named package.json) must still match the plain
|
||||
/// `normalize_namespace` file paths in `all_files`, so files land in
|
||||
/// topo batches rather than `orphans`.
|
||||
#[test]
|
||||
fn scc_file_batches_with_metadata_matches_package_qualified_ns() {
|
||||
let root = Path::new("/proj");
|
||||
// file_path values are NOT prefixed by root, so func_key keeps them
|
||||
// verbatim → simulates package-qualified call-graph namespaces.
|
||||
let a = make_summary("ping", "myapp::a.rs", "rust", 0, vec!["pong"]);
|
||||
let b = make_summary("pong", "myapp::b.rs", "rust", 0, vec!["ping"]);
|
||||
|
||||
// all_files are absolute paths under root → normalize to "a.rs"/"b.rs".
|
||||
let files: Vec<PathBuf> = vec![PathBuf::from("/proj/a.rs"), PathBuf::from("/proj/b.rs")];
|
||||
|
||||
let (batches, orphans) = build_metadata_batches(vec![a, b], &files, root);
|
||||
|
||||
assert!(
|
||||
orphans.is_empty(),
|
||||
"package-qualified files must not be orphaned: {orphans:?}"
|
||||
);
|
||||
assert_eq!(batches.len(), 1, "mutual recursion → single batch");
|
||||
assert!(batches[0].has_mutual_recursion);
|
||||
}
|
||||
|
||||
/// FileReachMap built from a package-qualified call graph must resolve
|
||||
/// plain-relative `reaches` lookups.
|
||||
#[test]
|
||||
fn file_reach_map_matches_package_qualified_ns() {
|
||||
let handle = make_summary("handle", "myapp::routes.rs", "rust", 0, vec!["sink"]);
|
||||
let sink = make_summary("sink", "myapp::helper.rs", "rust", 0, vec![]);
|
||||
let gs = merge_summaries(vec![handle, sink], None);
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
let reach = FileReachMap::build(&cg);
|
||||
|
||||
// Plain-relative lookups resolve even though graph keys are qualified.
|
||||
assert!(reach.reaches("routes.rs", "helper.rs"));
|
||||
// A package-qualified caller string also resolves (normalize strips it).
|
||||
assert!(reach.reaches("myapp::routes.rs", "helper.rs"));
|
||||
}
|
||||
|
||||
// ── self-recursion fixed-point flag (finding #39) ─────────────────
|
||||
|
||||
/// A directly self-recursive function forms a singleton SCC with a
|
||||
/// self-loop edge. `scc_is_recursive` must flag it so run_topo_batches
|
||||
/// applies the fixed-point loop, matching the mutual-recursion path.
|
||||
#[test]
|
||||
fn scc_is_recursive_flags_self_loop_singleton() {
|
||||
let root = Path::new("/proj");
|
||||
let f = make_summary("f", "/proj/a.rs", "rust", 0, vec!["f"]);
|
||||
let gs = merge_summaries(vec![f], Some(&root.to_string_lossy()));
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
let analysis = analyse(&cg);
|
||||
|
||||
// The single SCC is a self-loop singleton.
|
||||
assert_eq!(analysis.sccs.len(), 1);
|
||||
assert!(
|
||||
scc_is_recursive(&cg, &analysis.sccs[0]),
|
||||
"self-recursive singleton must be flagged recursive"
|
||||
);
|
||||
// scc_spans_files correctly stays false for singletons.
|
||||
assert!(!scc_spans_files(&cg, &analysis.sccs[0]));
|
||||
}
|
||||
|
||||
/// A non-recursive singleton (no self-edge) must NOT be flagged.
|
||||
#[test]
|
||||
fn scc_is_recursive_ignores_plain_singleton() {
|
||||
let root = Path::new("/proj");
|
||||
let f = make_summary("f", "/proj/a.rs", "rust", 0, vec![]);
|
||||
let gs = merge_summaries(vec![f], Some(&root.to_string_lossy()));
|
||||
let cg = build_call_graph(&gs, &[]);
|
||||
let analysis = analyse(&cg);
|
||||
assert_eq!(analysis.sccs.len(), 1);
|
||||
assert!(!scc_is_recursive(&cg, &analysis.sccs[0]));
|
||||
}
|
||||
|
||||
/// The self-recursive flag propagates through to the FileBatch metadata.
|
||||
#[test]
|
||||
fn scc_file_batches_with_metadata_marks_self_recursive() {
|
||||
let root = Path::new("/proj");
|
||||
let f = make_summary("f", "/proj/a.rs", "rust", 0, vec!["f"]);
|
||||
let files: Vec<PathBuf> = vec![PathBuf::from("/proj/a.rs")];
|
||||
let (batches, orphans) = build_metadata_batches(vec![f], &files, root);
|
||||
assert!(orphans.is_empty());
|
||||
assert_eq!(batches.len(), 1);
|
||||
assert!(
|
||||
batches[0].has_mutual_recursion,
|
||||
"self-recursive file must be flagged for the fixed-point loop"
|
||||
);
|
||||
}
|
||||
|
||||
// ── qualified disambiguation resolves ambiguous common names ──────
|
||||
|
||||
#[test]
|
||||
|
|
|
|||
|
|
@ -16,6 +16,28 @@ fn lang_has_exclusive_cases(lang: &str) -> bool {
|
|||
matches!(lang, "rust" | "go")
|
||||
}
|
||||
|
||||
/// True when *this specific switch* has guaranteed-exclusive (non-fall-through)
|
||||
/// cases, so it is safe to reorder the `default` arm to the cascade tail.
|
||||
///
|
||||
/// Rust `match` and Go `switch` are always exclusive. Java mixes shapes: the
|
||||
/// arrow form (`switch_rule` cases, `case x -> ...`) is exclusive, but the
|
||||
/// classic colon form (`switch_block_statement_group`, `case x:` with implicit
|
||||
/// fall-through) is NOT. C/C++/JS/TS/PHP classic switches fall through and are
|
||||
/// never exclusive. Reordering `default` to the tail is only correct for the
|
||||
/// exclusive shapes; doing it for fall-through switches connects the wrong
|
||||
/// case bodies in the source-order fall-through chain (both missed and phantom
|
||||
/// taint flows).
|
||||
fn switch_is_exclusive(lang: &str, cases: &[(Node<'_>, bool)]) -> bool {
|
||||
if lang_has_exclusive_cases(lang) {
|
||||
return true;
|
||||
}
|
||||
if lang == "java" {
|
||||
// Arrow-switch when every case is the arrow `switch_rule` shape.
|
||||
return cases.iter().all(|(c, _)| c.kind() == "switch_rule");
|
||||
}
|
||||
false
|
||||
}
|
||||
|
||||
/// Extract the scrutinee subtree from a switch-like AST node.
|
||||
///
|
||||
/// Returns the AST node referenced by the language's scrutinee field. Only
|
||||
|
|
@ -591,12 +613,21 @@ pub(super) fn build_switch<'a>(
|
|||
return exits;
|
||||
}
|
||||
|
||||
// Whether this switch's cases are mutually exclusive (no fall-through).
|
||||
// Only exclusive switches may have `default` reordered to the cascade tail.
|
||||
let is_exclusive = switch_is_exclusive(lang, &cases);
|
||||
|
||||
// Reorder so the default arm (if any) sits at the tail of the cascade.
|
||||
// Reordering case dispatch is semantically harmless (mutually exclusive
|
||||
// pattern matches), and it keeps the chain a clean Branch(True→case,
|
||||
// False→next). Fall-through chains are a separate Seq layer below.
|
||||
// Reordering case dispatch is semantically harmless ONLY for mutually
|
||||
// exclusive pattern matches (Rust match, Go switch, Java arrow-switch); it
|
||||
// keeps the chain a clean Branch(True→case, False→next). For classic
|
||||
// fall-through switches (C/C++/JS/TS/PHP, Java colon-switch) a mid-chain
|
||||
// `default:` can fall into the following case and a preceding case can fall
|
||||
// into it, so the source order MUST be preserved — reordering there breaks
|
||||
// the fall-through Seq layer and produces both missed and phantom flows.
|
||||
let default_pos = cases.iter().position(|(_, d)| *d);
|
||||
if let Some(pos) = default_pos
|
||||
if is_exclusive
|
||||
&& let Some(pos) = default_pos
|
||||
&& pos != cases.len() - 1
|
||||
{
|
||||
let default_pair = cases.remove(pos);
|
||||
|
|
@ -648,22 +679,40 @@ pub(super) fn build_switch<'a>(
|
|||
let mut fallthrough_exits: Vec<NodeIndex> = Vec::new();
|
||||
let mut last_header_false: Option<NodeIndex> = None;
|
||||
let mut chain_preds: Vec<NodeIndex> = preds.to_vec();
|
||||
// First node of the `default` body for a fall-through switch where the
|
||||
// default is NOT at the tail. The cumulative no-match path (the last
|
||||
// non-default header's False edge) is wired into it after the loop, so the
|
||||
// default stays in its source position for the fall-through Seq layer while
|
||||
// still being reachable when no case matches.
|
||||
let mut pending_default_no_match: Option<NodeIndex> = None;
|
||||
|
||||
for (idx, (case, is_default)) in cases.iter().copied().enumerate() {
|
||||
let is_last = idx + 1 == cases.len();
|
||||
|
||||
// A `default` arm carries no discriminant test, so it never gets its
|
||||
// own dispatch If. For exclusive switches it has been reordered to the
|
||||
// tail (`is_last`); for fall-through switches it stays in source
|
||||
// position (`!is_exclusive`) and is wired into the Seq fall-through
|
||||
// chain instead of acting as a conditional branch.
|
||||
let default_no_dispatch = is_default && (is_last || !is_exclusive);
|
||||
|
||||
// Default at the chain tail doesn't get its own dispatch If, the
|
||||
// previous header's False edge already targets it directly.
|
||||
let case_first_preds: Vec<NodeIndex> = if is_default && is_last {
|
||||
// First node of the default body becomes the False target of the
|
||||
// previous header. Build the case with the previous chain_preds
|
||||
// (the last header's "fall-through" branch) plus any fallthrough
|
||||
// from the preceding case.
|
||||
let mut p = chain_preds.clone();
|
||||
p.append(&mut fallthrough_exits);
|
||||
// `last_header_false` will receive a False edge once we know the
|
||||
// first node of this body.
|
||||
last_header_false = chain_preds.first().copied();
|
||||
let case_first_preds: Vec<NodeIndex> = if default_no_dispatch {
|
||||
// Body entry = fall-through from the preceding case body.
|
||||
let mut p = std::mem::take(&mut fallthrough_exits);
|
||||
if is_last {
|
||||
// Tail default: the previous header's False branch also lands
|
||||
// here directly (legacy behavior preserved for exclusive
|
||||
// switches and tail defaults).
|
||||
p.extend(chain_preds.iter().copied());
|
||||
last_header_false = chain_preds.first().copied();
|
||||
}
|
||||
// For a non-tail (fall-through) default the dispatch chain must
|
||||
// continue PAST it, so `chain_preds` / `last_header_false` are left
|
||||
// untouched and the next case's dispatch header still receives the
|
||||
// previous header's False edge. The cumulative no-match entry is
|
||||
// recorded below once the body's first node is known.
|
||||
p
|
||||
} else {
|
||||
// Normal case: synthesize a per-case dispatch header. We tie it
|
||||
|
|
@ -741,12 +790,21 @@ pub(super) fn build_switch<'a>(
|
|||
// Wire the dispatch True edge from this header (or from the previous
|
||||
// header for a tail-default) to the first node of the case body.
|
||||
if body_first_idx.index() < g.node_count() {
|
||||
let header_for_true = if is_default && is_last {
|
||||
// The previous header's False already lands here via the
|
||||
// EdgeKind::Seq inside `case_first_preds`; we additionally
|
||||
// emit a False edge directly so SSA labels the branch.
|
||||
if let Some(prev) = last_header_false {
|
||||
g.add_edge(prev, body_first_idx, EdgeKind::False);
|
||||
let header_for_true = if default_no_dispatch {
|
||||
if is_last {
|
||||
// Tail default: the previous header's False already lands
|
||||
// here via the EdgeKind::Seq inside `case_first_preds`; we
|
||||
// additionally emit a False edge directly so SSA labels the
|
||||
// branch.
|
||||
if let Some(prev) = last_header_false {
|
||||
g.add_edge(prev, body_first_idx, EdgeKind::False);
|
||||
}
|
||||
} else {
|
||||
// Non-tail fall-through default: defer wiring the no-match
|
||||
// entry until the last non-default header's False edge is
|
||||
// known (after the loop). The body's only in-edge for now
|
||||
// is the source-order fall-through from the preceding case.
|
||||
pending_default_no_match = Some(body_first_idx);
|
||||
}
|
||||
None
|
||||
} else {
|
||||
|
|
@ -762,11 +820,22 @@ pub(super) fn build_switch<'a>(
|
|||
let _ = is_default;
|
||||
}
|
||||
|
||||
// After the chain: the last non-default header (if no default arm) needs
|
||||
// a False edge that escapes to the post-switch frontier.
|
||||
// Resolve the cumulative no-match (the last non-default header's False
|
||||
// edge):
|
||||
// - If the `default` arm sits mid-chain (fall-through switch), the
|
||||
// no-match path enters the default body — wire the deferred False edge
|
||||
// into it. The default stayed in source position for the fall-through
|
||||
// Seq layer, so this is the only edge making it reachable on no-match.
|
||||
// - Otherwise, with no reachable default (no default arm, or it was the
|
||||
// tail and already consumed the False edge), the no-match path escapes
|
||||
// to the post-switch frontier.
|
||||
let mut exits: Vec<NodeIndex> = switch_breaks;
|
||||
exits.append(&mut fallthrough_exits);
|
||||
if !has_default {
|
||||
if let Some(default_first) = pending_default_no_match {
|
||||
if let Some(prev) = last_header_false {
|
||||
g.add_edge(prev, default_first, EdgeKind::False);
|
||||
}
|
||||
} else if !has_default {
|
||||
if let Some(prev) = last_header_false {
|
||||
exits.push(prev);
|
||||
}
|
||||
|
|
|
|||
|
|
@ -64,13 +64,26 @@ pub(super) fn get_boolean_operands<'a>(node: Node<'a>) -> Option<(Node<'a>, Node
|
|||
None
|
||||
}
|
||||
|
||||
/// Create a lightweight `StmtKind::If` node for a sub-condition in a boolean chain.
|
||||
/// Create a `StmtKind::If` node for a sub-condition in a boolean chain.
|
||||
///
|
||||
/// When the operand contains a classifiable call (`if (flag && cp.execSync(x))`),
|
||||
/// the node is built via [`push_node`] so the inner source/sink/sanitizer
|
||||
/// classification — callee, labels, arg-uses, gated sinks — is preserved, then
|
||||
/// the branch-condition metadata is overlaid on top. Without this, a sink or
|
||||
/// source CALL inside a short-circuited operand was dropped entirely (the bare
|
||||
/// node only carried `condition_vars`, no callee or labels), so the
|
||||
/// `if (flag && sink(x))` form missed flows that the un-decomposed
|
||||
/// `if (sink(x))` form catches via the mod.rs condition-call fallback. This
|
||||
/// mirrors `lower_ternary_branch`, which already uses `push_node` for the same
|
||||
/// reason. Non-call operands keep the original lightweight shape.
|
||||
pub(super) fn push_condition_node<'a>(
|
||||
g: &mut Cfg,
|
||||
cond_ast: Node<'a>,
|
||||
lang: &str,
|
||||
code: &'a [u8],
|
||||
enclosing_func: Option<&str>,
|
||||
call_ordinal: &mut u32,
|
||||
analysis_rules: Option<&LangAnalysisRules>,
|
||||
) -> NodeIndex {
|
||||
// Pass cond_ast as both args, sub-conditions are never `unless` nodes
|
||||
let (inner, negated) = detect_negation(cond_ast, cond_ast, lang);
|
||||
|
|
@ -94,6 +107,44 @@ pub(super) fn push_condition_node<'a>(
|
|||
// because the per-disjunct cond nodes (built via
|
||||
// `build_condition_chain`) didn't populate `taint.uses`.
|
||||
let uses_for_taint: Vec<String> = vars.clone();
|
||||
|
||||
// Operand containing a call → route through `push_node` for full
|
||||
// source/sink/sanitizer classification, then overlay branch metadata.
|
||||
if has_call_descendant(inner, lang) {
|
||||
let ord = *call_ordinal;
|
||||
*call_ordinal += 1;
|
||||
let node = push_node(
|
||||
g,
|
||||
StmtKind::If,
|
||||
inner,
|
||||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
ord,
|
||||
analysis_rules,
|
||||
);
|
||||
// Overlay the branch-condition metadata that the chain wiring depends
|
||||
// on. `push_node` set `defines`/`uses`/`labels`/`callee` from the call;
|
||||
// keep those, but ensure the condition fields and the interned
|
||||
// condition vars (for `apply_branch_predicates`) are present, and the
|
||||
// span/enclosing_func point at the operand expression.
|
||||
let info = &mut g[node];
|
||||
info.kind = StmtKind::If;
|
||||
info.ast.span = span;
|
||||
info.ast.enclosing_func = enclosing_func.map(|s| s.to_string());
|
||||
info.condition_text = text;
|
||||
info.condition_vars = vars;
|
||||
info.condition_negated = negated;
|
||||
// Union the condition idents into `taint.uses` so branch-predicate
|
||||
// lookup keeps working without clobbering the call's own arg uses.
|
||||
for v in &uses_for_taint {
|
||||
if !info.taint.uses.contains(v) {
|
||||
info.taint.uses.push(v.clone());
|
||||
}
|
||||
}
|
||||
return node;
|
||||
}
|
||||
|
||||
g.add_node(NodeInfo {
|
||||
kind: StmtKind::If,
|
||||
ast: AstMeta {
|
||||
|
|
@ -221,7 +272,15 @@ pub(super) fn build_ternary_diamond<'a>(
|
|||
// 1. Condition header. `push_condition_node` sets span/text/vars/negated
|
||||
// but leaves `is_eq_with_const` default; stamp it explicitly so the
|
||||
// taint engine's equality-narrowing fires for `x === 'literal' ? …`.
|
||||
let cond_if = push_condition_node(g, cond_ast, lang, code, enclosing_func);
|
||||
let cond_if = push_condition_node(
|
||||
g,
|
||||
cond_ast,
|
||||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
);
|
||||
g[cond_if].is_eq_with_const = detect_eq_with_const(cond_ast, lang);
|
||||
// Capture the pure int-arith + comparison tree so `fold_constant_branches`
|
||||
// can prune a dead constant-condition arm of the ternary (e.g. Java
|
||||
|
|
@ -488,6 +547,7 @@ pub(super) fn classify_ternary_lhs(
|
|||
///
|
||||
/// Returns `(true_exits, false_exits)`, the sets of nodes from which True/False
|
||||
/// edges should connect to the then/else branches.
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
pub(super) fn build_condition_chain<'a>(
|
||||
cond_ast: Node<'a>,
|
||||
preds: &[NodeIndex],
|
||||
|
|
@ -496,6 +556,8 @@ pub(super) fn build_condition_chain<'a>(
|
|||
lang: &str,
|
||||
code: &'a [u8],
|
||||
enclosing_func: Option<&str>,
|
||||
call_ordinal: &mut u32,
|
||||
analysis_rules: Option<&LangAnalysisRules>,
|
||||
) -> (Vec<NodeIndex>, Vec<NodeIndex>) {
|
||||
let inner = unwrap_parens(cond_ast);
|
||||
|
||||
|
|
@ -503,8 +565,17 @@ pub(super) fn build_condition_chain<'a>(
|
|||
Some(BoolOp::And) => {
|
||||
if let Some((left, right)) = get_boolean_operands(inner) {
|
||||
// Left operand with current preds
|
||||
let (left_true, left_false) =
|
||||
build_condition_chain(left, preds, pred_edge, g, lang, code, enclosing_func);
|
||||
let (left_true, left_false) = build_condition_chain(
|
||||
left,
|
||||
preds,
|
||||
pred_edge,
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
);
|
||||
// Right operand only evaluated when left is true
|
||||
let (right_true, right_false) = build_condition_chain(
|
||||
right,
|
||||
|
|
@ -514,6 +585,8 @@ pub(super) fn build_condition_chain<'a>(
|
|||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
);
|
||||
// AND: true only when both true; false when either false
|
||||
let mut false_exits = left_false;
|
||||
|
|
@ -521,7 +594,15 @@ pub(super) fn build_condition_chain<'a>(
|
|||
(right_true, false_exits)
|
||||
} else {
|
||||
// Safety fallback: treat as leaf
|
||||
let node = push_condition_node(g, inner, lang, code, enclosing_func);
|
||||
let node = push_condition_node(
|
||||
g,
|
||||
inner,
|
||||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
);
|
||||
connect_all(g, preds, node, pred_edge);
|
||||
(vec![node], vec![node])
|
||||
}
|
||||
|
|
@ -529,8 +610,17 @@ pub(super) fn build_condition_chain<'a>(
|
|||
Some(BoolOp::Or) => {
|
||||
if let Some((left, right)) = get_boolean_operands(inner) {
|
||||
// Left operand with current preds
|
||||
let (left_true, left_false) =
|
||||
build_condition_chain(left, preds, pred_edge, g, lang, code, enclosing_func);
|
||||
let (left_true, left_false) = build_condition_chain(
|
||||
left,
|
||||
preds,
|
||||
pred_edge,
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
);
|
||||
// Right operand only evaluated when left is false
|
||||
let (right_true, right_false) = build_condition_chain(
|
||||
right,
|
||||
|
|
@ -540,6 +630,8 @@ pub(super) fn build_condition_chain<'a>(
|
|||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
);
|
||||
// OR: true when either true; false only when both false
|
||||
let mut true_exits = left_true;
|
||||
|
|
@ -547,14 +639,30 @@ pub(super) fn build_condition_chain<'a>(
|
|||
(true_exits, right_false)
|
||||
} else {
|
||||
// Safety fallback: treat as leaf
|
||||
let node = push_condition_node(g, inner, lang, code, enclosing_func);
|
||||
let node = push_condition_node(
|
||||
g,
|
||||
inner,
|
||||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
);
|
||||
connect_all(g, preds, node, pred_edge);
|
||||
(vec![node], vec![node])
|
||||
}
|
||||
}
|
||||
None => {
|
||||
// Leaf: single condition node
|
||||
let node = push_condition_node(g, inner, lang, code, enclosing_func);
|
||||
let node = push_condition_node(
|
||||
g,
|
||||
inner,
|
||||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
);
|
||||
connect_all(g, preds, node, pred_edge);
|
||||
(vec![node], vec![node])
|
||||
}
|
||||
|
|
|
|||
411
src/cfg/mod.rs
411
src/cfg/mod.rs
|
|
@ -3541,9 +3541,17 @@ pub(super) fn push_node<'a>(
|
|||
None
|
||||
};
|
||||
|
||||
// Extract condition metadata for If nodes.
|
||||
// Extract condition metadata for If nodes. Python `elif_clause` and PHP
|
||||
// `else_if_clause` are lowered as guard nodes by `build_alternative_chain`
|
||||
// (the flat-sibling elif-chain handler) and carry their own `condition`
|
||||
// field, so they must also receive condition-metadata extraction even
|
||||
// though their grammar kind maps to `Kind::Block`. These two node kinds
|
||||
// only ever reach `push_node` from that handler, so widening the guard
|
||||
// here cannot affect ordinary block lowering.
|
||||
let (condition_text, condition_vars, condition_negated, cond_arith) =
|
||||
if matches!(lookup(lang, ast.kind()), Kind::If) {
|
||||
if matches!(lookup(lang, ast.kind()), Kind::If)
|
||||
|| matches!(ast.kind(), "elif_clause" | "else_if_clause")
|
||||
{
|
||||
extract_condition_raw(ast, lang, code)
|
||||
} else {
|
||||
(None, Vec::new(), false, None)
|
||||
|
|
@ -5074,6 +5082,230 @@ fn apply_arg_source_bindings(
|
|||
}
|
||||
}
|
||||
|
||||
/// Lower a flat chain of `alternative` siblings (Python `elif_clause`s / PHP
|
||||
/// `else_if_clause`s, optionally trailed by an `else_clause`) into a properly
|
||||
/// nested else-if CFG.
|
||||
///
|
||||
/// tree-sitter exposes these clauses as repeated, FLAT `alternative` fields on
|
||||
/// a single `if_statement` rather than nesting each `else if` inside the
|
||||
/// previous `else` (as JS/TS/Rust/Go/Java/C do). The default single-`else`
|
||||
/// lowering only consumed the first sibling, silently dropping every 2nd+ elif
|
||||
/// and the trailing else — so any source/sink/sanitizer/guard living there was
|
||||
/// invisible to the whole pipeline. This builder restores the missing CFG
|
||||
/// nodes by chaining each elif as a guard whose False edge flows into the next
|
||||
/// alternative.
|
||||
///
|
||||
/// `incoming_edge` is the edge label used to enter the chain (the parent `if`'s
|
||||
/// false edge — `EdgeKind::False` normally, `EdgeKind::True` for Ruby
|
||||
/// `unless`). Returns the union of all branch exits plus the final
|
||||
/// fall-through (when the chain has no terminal `else`).
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
pub(super) fn build_alternative_chain<'a>(
|
||||
alternatives: &[Node<'a>],
|
||||
preds: &[NodeIndex],
|
||||
incoming_edge: EdgeKind,
|
||||
g: &mut Cfg,
|
||||
lang: &str,
|
||||
code: &'a [u8],
|
||||
summaries: &mut FuncSummaries,
|
||||
file_path: &str,
|
||||
enclosing_func: Option<&str>,
|
||||
call_ordinal: &mut u32,
|
||||
analysis_rules: Option<&LangAnalysisRules>,
|
||||
break_targets: &mut Vec<NodeIndex>,
|
||||
continue_targets: &mut Vec<NodeIndex>,
|
||||
throw_targets: &mut Vec<NodeIndex>,
|
||||
bodies: &mut Vec<BodyCfg>,
|
||||
next_body_id: &mut u32,
|
||||
current_body_id: BodyId,
|
||||
) -> Vec<NodeIndex> {
|
||||
// Predecessor frontier entering the current alternative, and the edge label
|
||||
// to use when wiring into it. Updated as we descend the chain: each elif's
|
||||
// False edge becomes the predecessor/edge for the next alternative.
|
||||
let mut chain_preds: Vec<NodeIndex> = preds.to_vec();
|
||||
let mut chain_edge = incoming_edge;
|
||||
let mut exits: Vec<NodeIndex> = Vec::new();
|
||||
|
||||
for &alt in alternatives {
|
||||
let is_elif = matches!(alt.kind(), "elif_clause" | "else_if_clause");
|
||||
|
||||
if is_elif {
|
||||
// Guard node for the elif condition. `push_node` with `StmtKind::If`
|
||||
// extracts condition metadata and runs label classification on the
|
||||
// clause (its `condition` field), so a source/sink call inside an
|
||||
// elif condition is no longer dropped.
|
||||
let guard = push_node(
|
||||
g,
|
||||
StmtKind::If,
|
||||
alt,
|
||||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
0,
|
||||
analysis_rules,
|
||||
);
|
||||
connect_all(g, &chain_preds, guard, chain_edge);
|
||||
|
||||
// True branch: the elif body (`consequence` for Python, `body` for
|
||||
// PHP / colon-block forms).
|
||||
let body = alt
|
||||
.child_by_field_name("consequence")
|
||||
.or_else(|| alt.child_by_field_name("body"));
|
||||
if let Some(b) = body {
|
||||
let body_first = NodeIndex::new(g.node_count());
|
||||
let body_exits = build_sub(
|
||||
b,
|
||||
&[guard],
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
summaries,
|
||||
file_path,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
break_targets,
|
||||
continue_targets,
|
||||
throw_targets,
|
||||
bodies,
|
||||
next_body_id,
|
||||
current_body_id,
|
||||
);
|
||||
if body_first.index() < g.node_count() {
|
||||
connect_all(g, &[guard], body_first, EdgeKind::True);
|
||||
exits.extend(body_exits);
|
||||
} else if let Some(&first) = body_exits.first() {
|
||||
connect_all(g, &[guard], first, EdgeKind::True);
|
||||
exits.extend(body_exits);
|
||||
} else {
|
||||
// Empty body: the guard's True edge falls through.
|
||||
exits.push(guard);
|
||||
}
|
||||
} else {
|
||||
exits.push(guard);
|
||||
}
|
||||
|
||||
// False branch descends to the next alternative.
|
||||
chain_preds = vec![guard];
|
||||
chain_edge = EdgeKind::False;
|
||||
} else {
|
||||
// Terminal `else_clause` (or any non-guard block): lower its body and
|
||||
// end the chain. The else body field is `body` for both Python and
|
||||
// PHP `else_clause`; fall back to the clause node itself.
|
||||
let body = alt.child_by_field_name("body").unwrap_or(alt);
|
||||
let body_first = NodeIndex::new(g.node_count());
|
||||
let body_exits = build_sub(
|
||||
body,
|
||||
&chain_preds,
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
summaries,
|
||||
file_path,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
break_targets,
|
||||
continue_targets,
|
||||
throw_targets,
|
||||
bodies,
|
||||
next_body_id,
|
||||
current_body_id,
|
||||
);
|
||||
if body_first.index() < g.node_count() {
|
||||
connect_all(g, &chain_preds, body_first, chain_edge);
|
||||
exits.extend(body_exits);
|
||||
} else if let Some(&first) = body_exits.first() {
|
||||
connect_all(g, &chain_preds, first, chain_edge);
|
||||
exits.extend(body_exits);
|
||||
} else {
|
||||
exits.extend(chain_preds.iter().copied());
|
||||
}
|
||||
// An else clause terminates the chain; nothing follows it.
|
||||
return exits;
|
||||
}
|
||||
}
|
||||
|
||||
// Chain ended with an elif (no terminal else): the last guard's False edge
|
||||
// is the fall-through path out of the whole if/elif construct. Materialise
|
||||
// it as a synthetic pass-through so the false edge has a concrete target,
|
||||
// mirroring the no-else branch in the `Kind::If` arm.
|
||||
let pass = g.add_node(NodeInfo {
|
||||
kind: StmtKind::Seq,
|
||||
ast: AstMeta {
|
||||
span: chain_preds
|
||||
.first()
|
||||
.map(|&n| g[n].ast.span)
|
||||
.unwrap_or((0, 0)),
|
||||
enclosing_func: enclosing_func.map(|s| s.to_string()),
|
||||
},
|
||||
..Default::default()
|
||||
});
|
||||
connect_all(g, &chain_preds, pass, chain_edge);
|
||||
exits.push(pass);
|
||||
exits
|
||||
}
|
||||
|
||||
/// Lower a C-style `for` loop's increment/update clause onto the back-edge
|
||||
/// path. `back_sources` are the body/continue exits that would otherwise
|
||||
/// back-edge straight to the loop header. When an `update_subtree` exists it
|
||||
/// is lowered from those sources and its exits are returned, so callers
|
||||
/// back-edge the update's exits to the header instead — making increment-clause
|
||||
/// side effects (assignments, sanitizer calls) visible to taint analysis.
|
||||
/// Without an update clause the input sources are returned unchanged, so the
|
||||
/// CFG is bit-identical to the pre-fix behaviour.
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
pub(super) fn lower_loop_update<'a>(
|
||||
update_subtree: Option<Node<'a>>,
|
||||
back_sources: &[NodeIndex],
|
||||
g: &mut Cfg,
|
||||
lang: &str,
|
||||
code: &'a [u8],
|
||||
summaries: &mut FuncSummaries,
|
||||
file_path: &str,
|
||||
enclosing_func: Option<&str>,
|
||||
call_ordinal: &mut u32,
|
||||
analysis_rules: Option<&LangAnalysisRules>,
|
||||
break_targets: &mut Vec<NodeIndex>,
|
||||
continue_targets: &mut Vec<NodeIndex>,
|
||||
throw_targets: &mut Vec<NodeIndex>,
|
||||
bodies: &mut Vec<BodyCfg>,
|
||||
next_body_id: &mut u32,
|
||||
current_body_id: BodyId,
|
||||
) -> Vec<NodeIndex> {
|
||||
let Some(update) = update_subtree else {
|
||||
return back_sources.to_vec();
|
||||
};
|
||||
if back_sources.is_empty() {
|
||||
return Vec::new();
|
||||
}
|
||||
let update_exits = build_sub(
|
||||
update,
|
||||
back_sources,
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
summaries,
|
||||
file_path,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
break_targets,
|
||||
continue_targets,
|
||||
throw_targets,
|
||||
bodies,
|
||||
next_body_id,
|
||||
current_body_id,
|
||||
);
|
||||
// If the update produced no CFG nodes (e.g. it was pure trivia), preserve
|
||||
// the original back-edge sources so the loop still closes.
|
||||
if update_exits.is_empty() {
|
||||
back_sources.to_vec()
|
||||
} else {
|
||||
update_exits
|
||||
}
|
||||
}
|
||||
|
||||
// The recursive *work‑horse* that converts an AST node into a CFG slice.
|
||||
// Returns the set of *exit* nodes that need to be wired further.
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
|
|
@ -5183,6 +5415,8 @@ pub(super) fn build_sub<'a>(
|
|||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
)
|
||||
} else {
|
||||
// Single-node path (original behavior)
|
||||
|
|
@ -5214,14 +5448,27 @@ pub(super) fn build_sub<'a>(
|
|||
|
||||
// Locate then & else blocks using field-based lookup first,
|
||||
// then positional fallback (Rust uses positional blocks).
|
||||
let (then_block, else_block) = {
|
||||
//
|
||||
// `alternatives` collects *every* `alternative` field, not just the
|
||||
// first. In tree-sitter-python and tree-sitter-php an
|
||||
// `if/elif/.../else` (Python) or `if/elseif/.../else` (PHP) chain
|
||||
// produces several FLAT sibling `alternative` fields on one
|
||||
// `if_statement` (a list of `elif_clause`/`else_if_clause` nodes
|
||||
// optionally trailed by an `else_clause`). JS/TS/Rust/Go/Java/C
|
||||
// nest their `else if`, so they expose at most one `alternative`
|
||||
// (the nested if) and the list has length ≤ 1.
|
||||
let (then_block, else_block, alternatives) = {
|
||||
let field_then = ast
|
||||
.child_by_field_name("consequence")
|
||||
.or_else(|| ast.child_by_field_name("body"));
|
||||
let field_else = ast.child_by_field_name("alternative");
|
||||
let mut alt_cursor = ast.walk();
|
||||
let alternatives: Vec<Node> = ast
|
||||
.children_by_field_name("alternative", &mut alt_cursor)
|
||||
.collect();
|
||||
let field_else = alternatives.first().copied();
|
||||
|
||||
if field_then.is_some() || field_else.is_some() {
|
||||
(field_then, field_else)
|
||||
(field_then, field_else, alternatives)
|
||||
} else {
|
||||
// Fallback: positional block children (Rust `if_expression`)
|
||||
let mut cursor = ast.walk();
|
||||
|
|
@ -5229,10 +5476,22 @@ pub(super) fn build_sub<'a>(
|
|||
.children(&mut cursor)
|
||||
.filter(|n| lookup(lang, n.kind()) == Kind::Block)
|
||||
.collect();
|
||||
(blocks.first().copied(), blocks.get(1).copied())
|
||||
(blocks.first().copied(), blocks.get(1).copied(), Vec::new())
|
||||
}
|
||||
};
|
||||
|
||||
// A flat elif/elseif chain has 2+ `alternative` siblings, or a
|
||||
// single `alternative` that is itself an elif/else-if clause (the
|
||||
// `if a: .. elif b: ..` form with no trailing `else`, where the lone
|
||||
// alternative is an `elif_clause` rather than a nested if). In both
|
||||
// cases the default single-`else_block` lowering below would either
|
||||
// drop later siblings entirely or fail to treat the elif condition
|
||||
// as a branch guard, so route through the dedicated chain builder.
|
||||
let is_flat_elif_chain = alternatives.len() > 1
|
||||
|| alternatives
|
||||
.first()
|
||||
.is_some_and(|a| matches!(a.kind(), "elif_clause" | "else_if_clause"));
|
||||
|
||||
// THEN branch
|
||||
let then_first_node = NodeIndex::new(g.node_count());
|
||||
let then_exits = if let Some(b) = then_block {
|
||||
|
|
@ -5267,7 +5526,30 @@ pub(super) fn build_sub<'a>(
|
|||
|
||||
// ELSE branch
|
||||
let else_first_node = NodeIndex::new(g.node_count());
|
||||
let else_exits = if let Some(b) = else_block {
|
||||
let else_exits = if is_flat_elif_chain {
|
||||
// Flat elif/elseif chain: lower every alternative sibling as a
|
||||
// proper nested else-if so 2nd+ elif and trailing else clauses
|
||||
// are no longer dropped from the CFG.
|
||||
build_alternative_chain(
|
||||
&alternatives,
|
||||
else_preds,
|
||||
else_edge,
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
summaries,
|
||||
file_path,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
break_targets,
|
||||
continue_targets,
|
||||
throw_targets,
|
||||
bodies,
|
||||
next_body_id,
|
||||
current_body_id,
|
||||
)
|
||||
} else if let Some(b) = else_block {
|
||||
let exits = build_sub(
|
||||
b,
|
||||
else_preds,
|
||||
|
|
@ -5380,6 +5662,61 @@ pub(super) fn build_sub<'a>(
|
|||
|
||||
// WHILE / FOR: classic loop with a back edge.
|
||||
Kind::While | Kind::For => {
|
||||
// C-style `for (init; cond; incr) body` loops (C/C++, JS/TS, Go
|
||||
// three-clause, PHP) attach `initializer`/`update` (or `increment`)
|
||||
// subtrees that the previous loop lowering ignored entirely — so a
|
||||
// taint source bound in the init (`for (cmd = getenv("X"); …)`) or a
|
||||
// side effect in the increment had no CFG node at all and was
|
||||
// invisible to taint analysis. Tree-sitter exposes these either as
|
||||
// direct fields on the loop node (C/C++/JS/TS/PHP) or nested under a
|
||||
// `for_clause` child (Go's three-clause form). The init runs once
|
||||
// before the header; its exits become the header's predecessors so
|
||||
// its defs flow into both the condition and the body.
|
||||
let clause_owner = ast
|
||||
.child_by_field_name("body")
|
||||
.is_none()
|
||||
.then(|| {
|
||||
let mut c = ast.walk();
|
||||
ast.children(&mut c).find(|n| n.kind() == "for_clause")
|
||||
})
|
||||
.flatten();
|
||||
let init_subtree = ast
|
||||
.child_by_field_name("initializer")
|
||||
.or_else(|| ast.child_by_field_name("initialize"))
|
||||
.or_else(|| clause_owner.and_then(|fc| fc.child_by_field_name("initializer")));
|
||||
let update_subtree = ast
|
||||
.child_by_field_name("update")
|
||||
.or_else(|| ast.child_by_field_name("increment"))
|
||||
.or_else(|| clause_owner.and_then(|fc| fc.child_by_field_name("update")));
|
||||
|
||||
// Lower the initializer (if any) from `preds`; its exits become the
|
||||
// header's predecessors. Empty / absent inits leave `preds`
|
||||
// bit-identical to the pre-fix behaviour.
|
||||
let init_exits_owned = init_subtree.map(|init| {
|
||||
build_sub(
|
||||
init,
|
||||
preds,
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
summaries,
|
||||
file_path,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
break_targets,
|
||||
continue_targets,
|
||||
throw_targets,
|
||||
bodies,
|
||||
next_body_id,
|
||||
current_body_id,
|
||||
)
|
||||
});
|
||||
let header_preds: &[NodeIndex] = match &init_exits_owned {
|
||||
Some(exits) if !exits.is_empty() => exits.as_slice(),
|
||||
_ => preds,
|
||||
};
|
||||
|
||||
let header = push_node(
|
||||
g,
|
||||
StmtKind::Loop,
|
||||
|
|
@ -5390,7 +5727,7 @@ pub(super) fn build_sub<'a>(
|
|||
0,
|
||||
analysis_rules,
|
||||
);
|
||||
connect_all(g, preds, header, EdgeKind::Seq);
|
||||
connect_all(g, header_preds, header, EdgeKind::Seq);
|
||||
|
||||
// Check for short-circuit condition
|
||||
let cond_subtree = ast.child_by_field_name("condition");
|
||||
|
|
@ -5445,6 +5782,8 @@ pub(super) fn build_sub<'a>(
|
|||
lang,
|
||||
code,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
);
|
||||
|
||||
// Wire body from true_exits
|
||||
|
|
@ -5472,13 +5811,33 @@ pub(super) fn build_sub<'a>(
|
|||
connect_all(g, &true_exits, body_first, EdgeKind::True);
|
||||
}
|
||||
|
||||
// The increment runs at the end of each iteration before the
|
||||
// condition is re-checked, so it sits on the back-edge path
|
||||
// between the body/continue exits and the header.
|
||||
let mut back_sources: Vec<NodeIndex> = body_exits;
|
||||
back_sources.extend(loop_continues.iter().copied());
|
||||
let back_sources = lower_loop_update(
|
||||
update_subtree,
|
||||
&back_sources,
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
summaries,
|
||||
file_path,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
break_targets,
|
||||
continue_targets,
|
||||
throw_targets,
|
||||
bodies,
|
||||
next_body_id,
|
||||
current_body_id,
|
||||
);
|
||||
// Back-edges go to header (not into the condition chain)
|
||||
for &e in &body_exits {
|
||||
for &e in &back_sources {
|
||||
connect_all(g, &[e], header, EdgeKind::Back);
|
||||
}
|
||||
for &c in &loop_continues {
|
||||
connect_all(g, &[c], header, EdgeKind::Back);
|
||||
}
|
||||
|
||||
// Loop exits = false_exits + breaks
|
||||
let mut exits: Vec<NodeIndex> = false_exits;
|
||||
|
|
@ -5504,14 +5863,32 @@ pub(super) fn build_sub<'a>(
|
|||
current_body_id,
|
||||
);
|
||||
|
||||
// The increment runs on the back-edge path (end of each
|
||||
// iteration, before the next condition check).
|
||||
let mut back_sources: Vec<NodeIndex> = body_exits;
|
||||
back_sources.extend(loop_continues.iter().copied());
|
||||
let back_sources = lower_loop_update(
|
||||
update_subtree,
|
||||
&back_sources,
|
||||
g,
|
||||
lang,
|
||||
code,
|
||||
summaries,
|
||||
file_path,
|
||||
enclosing_func,
|
||||
call_ordinal,
|
||||
analysis_rules,
|
||||
break_targets,
|
||||
continue_targets,
|
||||
throw_targets,
|
||||
bodies,
|
||||
next_body_id,
|
||||
current_body_id,
|
||||
);
|
||||
// Back‑edge for every linear exit → header.
|
||||
for &e in &body_exits {
|
||||
for &e in &back_sources {
|
||||
connect_all(g, &[e], header, EdgeKind::Back);
|
||||
}
|
||||
// Wire continue targets as back edges to header
|
||||
for &c in &loop_continues {
|
||||
connect_all(g, &[c], header, EdgeKind::Back);
|
||||
}
|
||||
// Falling out of the loop = header’s false branch +
|
||||
// any break targets that exit the loop.
|
||||
let mut exits = vec![header];
|
||||
|
|
|
|||
|
|
@ -108,9 +108,20 @@ fn branch_terminates(cfg: &crate::cfg::Cfg, if_node: NodeIndex) -> bool {
|
|||
return false;
|
||||
}
|
||||
|
||||
// Check if any path through the true branch terminates
|
||||
// The join point of the if statement is its immediate post-dominator:
|
||||
// the first node every branch reconverges on. The true-branch walk
|
||||
// must stop there — reaching the join means the body fell through
|
||||
// *past* the if without terminating, so any `return` in the function
|
||||
// tail past the join must NOT count as the error branch terminating.
|
||||
// Without this bound, a trailing `return nil` (present in essentially
|
||||
// every Go `func(...) error`) makes the walk report "all paths
|
||||
// terminate" and silently suppresses the rule.
|
||||
let join = super::dominators::compute_post_dominators(cfg)
|
||||
.and_then(|pd| pd.immediate_dominator(if_node));
|
||||
|
||||
// Check if any path through the true branch terminates before the join.
|
||||
for &start in &true_successors {
|
||||
if terminates_on_all_paths(cfg, start, if_node) {
|
||||
if terminates_on_all_paths(cfg, start, join) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
|
|
@ -206,11 +217,18 @@ fn call_never_returns(info: &crate::cfg::NodeInfo) -> bool {
|
|||
}
|
||||
|
||||
/// Check if all paths from `node` reach a Return/Break/Continue (or a
|
||||
/// known never-returning call) before exiting scope.
|
||||
/// known never-returning call) before reaching the if's join point.
|
||||
///
|
||||
/// `join` is the if statement's immediate post-dominator (`None` if it
|
||||
/// could not be computed — e.g. no Exit node). The walk stops at the
|
||||
/// join: reaching it means the true branch fell through past the if
|
||||
/// without terminating, so that path does NOT terminate and the rule
|
||||
/// should fire. This prevents a `return` in the function tail (after the
|
||||
/// join) from being mis-attributed to the error branch.
|
||||
fn terminates_on_all_paths(
|
||||
cfg: &crate::cfg::Cfg,
|
||||
node: NodeIndex,
|
||||
_scope_entry: NodeIndex,
|
||||
join: Option<NodeIndex>,
|
||||
) -> bool {
|
||||
use std::collections::HashSet;
|
||||
|
||||
|
|
@ -222,6 +240,12 @@ fn terminates_on_all_paths(
|
|||
continue;
|
||||
}
|
||||
|
||||
// Reaching the if's join point means this path fell through past
|
||||
// the if without terminating inside the branch.
|
||||
if join == Some(current) {
|
||||
return false;
|
||||
}
|
||||
|
||||
let info = &cfg[current];
|
||||
match info.kind {
|
||||
StmtKind::Return | StmtKind::Throw | StmtKind::Break | StmtKind::Continue => {
|
||||
|
|
@ -460,6 +484,91 @@ mod err_ident_tests {
|
|||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod join_boundary_tests {
|
||||
use super::branch_terminates;
|
||||
use crate::cfg::{CallMeta, Cfg, EdgeKind, NodeInfo, StmtKind};
|
||||
use petgraph::graph::NodeIndex;
|
||||
|
||||
fn node(kind: StmtKind) -> NodeInfo {
|
||||
NodeInfo {
|
||||
kind,
|
||||
..Default::default()
|
||||
}
|
||||
}
|
||||
|
||||
fn call_node(callee: &str) -> NodeInfo {
|
||||
NodeInfo {
|
||||
kind: StmtKind::Call,
|
||||
call: CallMeta {
|
||||
callee: Some(callee.to_string()),
|
||||
..Default::default()
|
||||
},
|
||||
..Default::default()
|
||||
}
|
||||
}
|
||||
|
||||
/// Build the canonical `if err != nil { <body> } <tail>; return`
|
||||
/// shape and return (cfg, if_node). `body_terminates` selects whether
|
||||
/// the true branch body itself returns (terminates) or falls through
|
||||
/// to the join.
|
||||
fn build_if_cfg(body_terminates: bool) -> (Cfg, NodeIndex) {
|
||||
let mut cfg = Cfg::new();
|
||||
let entry = cfg.add_node(node(StmtKind::Entry));
|
||||
let if_n = cfg.add_node(node(StmtKind::If));
|
||||
// true-branch body
|
||||
let body = if body_terminates {
|
||||
cfg.add_node(node(StmtKind::Return))
|
||||
} else {
|
||||
cfg.add_node(call_node("log"))
|
||||
};
|
||||
// join point where both branches reconverge: a downstream use
|
||||
let join = cfg.add_node(call_node("use"));
|
||||
// function tail: an explicit `return nil` (present in every Go
|
||||
// value-returning function) followed by exit.
|
||||
let ret = cfg.add_node(node(StmtKind::Return));
|
||||
let exit = cfg.add_node(node(StmtKind::Exit));
|
||||
|
||||
cfg.add_edge(entry, if_n, EdgeKind::Seq);
|
||||
cfg.add_edge(if_n, body, EdgeKind::True);
|
||||
cfg.add_edge(if_n, join, EdgeKind::False);
|
||||
if !body_terminates {
|
||||
cfg.add_edge(body, join, EdgeKind::Seq);
|
||||
} else {
|
||||
cfg.add_edge(body, exit, EdgeKind::Seq);
|
||||
}
|
||||
cfg.add_edge(join, ret, EdgeKind::Seq);
|
||||
cfg.add_edge(ret, exit, EdgeKind::Seq);
|
||||
|
||||
(cfg, if_n)
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn fallthrough_body_does_not_terminate_despite_trailing_return() {
|
||||
// True branch falls through to the join; the function tail has a
|
||||
// `return nil`. Before the join-boundary fix the walk reached
|
||||
// that trailing return and reported "terminates", suppressing the
|
||||
// rule. The fix bounds the walk at the join, so this is correctly
|
||||
// reported as NOT terminating.
|
||||
let (cfg, if_n) = build_if_cfg(false);
|
||||
assert!(
|
||||
!branch_terminates(&cfg, if_n),
|
||||
"fall-through error branch must not count as terminating"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn returning_body_terminates() {
|
||||
// True branch returns directly: the error is handled, so the rule
|
||||
// must stay suppressed.
|
||||
let (cfg, if_n) = build_if_cfg(true);
|
||||
assert!(
|
||||
branch_terminates(&cfg, if_n),
|
||||
"error branch with an explicit return must count as terminating"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod terminator_call_tests {
|
||||
use super::call_never_returns;
|
||||
|
|
|
|||
|
|
@ -131,7 +131,7 @@ fn ssa_all_sink_operands_constant(
|
|||
};
|
||||
|
||||
let operand_const = |v: SsaValue| -> bool {
|
||||
ssa_operand_constant(v, facts, callee_desc, callee_parts, outer_parts)
|
||||
ssa_operand_constant(v, facts, ctx.cfg, callee_desc, callee_parts, outer_parts)
|
||||
};
|
||||
let args_ok = args
|
||||
.iter()
|
||||
|
|
@ -401,6 +401,7 @@ fn ssa_operand_const_or_param(
|
|||
fn ssa_operand_constant(
|
||||
root: SsaValue,
|
||||
facts: &BodyConstFacts,
|
||||
cfg: &crate::cfg::Cfg,
|
||||
callee_desc: &str,
|
||||
callee_parts: &[&str],
|
||||
outer_parts: &[&str],
|
||||
|
|
@ -426,6 +427,26 @@ fn ssa_operand_constant(
|
|||
let Some(inst) = find_inst(&facts.ssa, v) else {
|
||||
return false;
|
||||
};
|
||||
// CFG-node-level Source label: a `SsaOp::Call` can be the lowering
|
||||
// of a Source-labeled CFG node (`file_get_contents`, `env::var`,
|
||||
// `readline`, …). Such a call's result is tainted user input, not a
|
||||
// constant, even when its own arguments are constant (or it is a
|
||||
// zero-arg source). Mirror the refusal in the sibling
|
||||
// `ssa_operand_const_or_param` so a source-fed sink is never proven
|
||||
// "all args constant" and silently dropped.
|
||||
let cfg_node = inst.cfg_node;
|
||||
if cfg
|
||||
.node_weight(cfg_node)
|
||||
.map(|info| {
|
||||
info.taint
|
||||
.labels
|
||||
.iter()
|
||||
.any(|l| matches!(l, DataLabel::Source(_)))
|
||||
})
|
||||
.unwrap_or(false)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
match &inst.op {
|
||||
SsaOp::Const(_) => {}
|
||||
SsaOp::Assign(vals) => stack.extend(vals.iter().copied()),
|
||||
|
|
@ -2190,6 +2211,32 @@ fn cond_indirect_validator_callee(
|
|||
crate::ssa::type_facts::classify_input_validator_callee(callee).map(|_| callee.to_string())
|
||||
}
|
||||
|
||||
/// Match a guard suffix matcher against a callee, requiring the suffix to
|
||||
/// begin on a *leaf-name boundary* rather than mid-identifier.
|
||||
///
|
||||
/// A bare `callee_lower.ends_with(suffix)` over-matches: `invalidate` ends
|
||||
/// with `validate` (the `Cap::all()` guard) and `unquote` ends with `quote`
|
||||
/// (the SHELL_ESCAPE guard), so cache-invalidation and URL-decoding calls
|
||||
/// would register as dominating guards and silently suppress every
|
||||
/// downstream `cfg-unguarded-sink` in the function. Require that the
|
||||
/// character preceding the matched suffix is a name separator (`.`, `_`, or
|
||||
/// `:` from `::`) or that the suffix sits at the start of the callee. This
|
||||
/// mirrors the existing prefix-anchor convention (a trailing `_` on the
|
||||
/// matcher anchors at the start).
|
||||
pub(super) fn suffix_matches_at_leaf_boundary(callee_lower: &str, suffix: &str) -> bool {
|
||||
if !callee_lower.ends_with(suffix) {
|
||||
return false;
|
||||
}
|
||||
let prefix_len = callee_lower.len() - suffix.len();
|
||||
if prefix_len == 0 {
|
||||
// Suffix is the whole callee (or its leaf with nothing before it).
|
||||
return true;
|
||||
}
|
||||
// The byte immediately before the suffix must be a leaf-name separator.
|
||||
let prev = callee_lower.as_bytes()[prefix_len - 1];
|
||||
matches!(prev, b'.' | b'_' | b':')
|
||||
}
|
||||
|
||||
/// Find all nodes in the CFG that are calls to guard functions.
|
||||
fn find_guard_nodes(ctx: &AnalysisContext) -> Vec<(NodeIndex, Cap)> {
|
||||
let guard_rules = rules::guard_rules(ctx.lang);
|
||||
|
|
@ -2300,7 +2347,7 @@ fn find_guard_nodes(ctx: &AnalysisContext) -> Vec<(NodeIndex, Cap)> {
|
|||
if ml.ends_with('_') {
|
||||
callee_lower.starts_with(&ml)
|
||||
} else {
|
||||
callee_lower.ends_with(&ml)
|
||||
suffix_matches_at_leaf_boundary(&callee_lower, &ml)
|
||||
}
|
||||
});
|
||||
if matched {
|
||||
|
|
@ -3187,3 +3234,49 @@ mod chain_fragments_tests {
|
|||
assert!(!got.contains(&"raw".to_string()));
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod guard_suffix_boundary_tests {
|
||||
use super::suffix_matches_at_leaf_boundary;
|
||||
|
||||
#[test]
|
||||
fn rejects_mid_identifier_suffixes() {
|
||||
// The whole point of the fix: these must NOT register as guards.
|
||||
assert!(!suffix_matches_at_leaf_boundary("invalidate", "validate"));
|
||||
assert!(!suffix_matches_at_leaf_boundary(
|
||||
"cache.invalidate",
|
||||
"validate"
|
||||
));
|
||||
assert!(!suffix_matches_at_leaf_boundary("unquote", "quote"));
|
||||
assert!(!suffix_matches_at_leaf_boundary(
|
||||
"urllib.parse.unquote",
|
||||
"quote"
|
||||
));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn accepts_leaf_boundary_suffixes() {
|
||||
// Suffix at a real leaf-name boundary stays a valid guard match.
|
||||
assert!(suffix_matches_at_leaf_boundary("validate", "validate"));
|
||||
assert!(suffix_matches_at_leaf_boundary("shlex.quote", "quote"));
|
||||
assert!(suffix_matches_at_leaf_boundary(
|
||||
"urllib.parse.quote",
|
||||
"quote"
|
||||
));
|
||||
// `_` and `::` separators also count as leaf boundaries.
|
||||
assert!(suffix_matches_at_leaf_boundary("my_validate", "validate"));
|
||||
assert!(suffix_matches_at_leaf_boundary(
|
||||
"std::shell_escape",
|
||||
"shell_escape"
|
||||
));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rejects_non_suffix() {
|
||||
assert!(!suffix_matches_at_leaf_boundary(
|
||||
"validate_input",
|
||||
"validate"
|
||||
));
|
||||
assert!(!suffix_matches_at_leaf_boundary("os.system", "quote"));
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -288,7 +288,11 @@ pub(crate) fn is_guard_call(
|
|||
if callee_lower.starts_with(&ml) {
|
||||
return true;
|
||||
}
|
||||
} else if callee_lower.ends_with(&ml) {
|
||||
} else if guards::suffix_matches_at_leaf_boundary(&callee_lower, &ml) {
|
||||
// Leaf-boundary match, not a bare `ends_with`: otherwise
|
||||
// `invalidate` matches the `validate` guard and `unquote`
|
||||
// matches `quote`, registering cache-invalidation /
|
||||
// URL-decode calls as guards and suppressing real sinks.
|
||||
return true;
|
||||
}
|
||||
}
|
||||
|
|
@ -305,7 +309,7 @@ pub(crate) fn is_guard_call(
|
|||
if callee_lower.starts_with(&ml) {
|
||||
return true;
|
||||
}
|
||||
} else if callee_lower.ends_with(&ml) {
|
||||
} else if guards::suffix_matches_at_leaf_boundary(&callee_lower, &ml) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -4,6 +4,7 @@ use super::{AnalysisContext, CfgAnalysis, CfgFinding, Confidence};
|
|||
use crate::cfg::{EdgeKind, StmtKind};
|
||||
use crate::patterns::Severity;
|
||||
use crate::symbol::Lang;
|
||||
use petgraph::algo::dominators::Dominators;
|
||||
use petgraph::graph::NodeIndex;
|
||||
use petgraph::visit::EdgeRef;
|
||||
use std::collections::HashSet;
|
||||
|
|
@ -132,11 +133,14 @@ fn release_on_all_exit_paths(
|
|||
acquire: NodeIndex,
|
||||
release_nodes: &[NodeIndex],
|
||||
exit: NodeIndex,
|
||||
post_doms: Option<&Dominators<NodeIndex>>,
|
||||
) -> bool {
|
||||
// Use post-dominators as optimization: if any release post-dominates acquire, it's fine
|
||||
if let Some(post_doms) = dominators::compute_post_dominators(ctx.cfg) {
|
||||
// Use post-dominators as optimization: if any release post-dominates acquire, it's fine.
|
||||
// The post-dominator tree is computed once per CFG by the caller (the CFG is
|
||||
// immutable across acquire sites), so it is threaded in rather than recomputed here.
|
||||
if let Some(post_doms) = post_doms {
|
||||
for &release in release_nodes {
|
||||
if dominators::dominates(&post_doms, release, acquire) {
|
||||
if dominators::dominates(post_doms, release, acquire) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
|
|
@ -363,8 +367,13 @@ fn is_ownership_transferred(ctx: &AnalysisContext, acquire: NodeIndex) -> bool {
|
|||
|
||||
if references_var && start < end && end <= ctx.source_bytes.len() {
|
||||
let span_text = &ctx.source_bytes[start..end];
|
||||
// `->` anywhere in span means pointer-to-member store
|
||||
if span_text.windows(2).any(|w| w == b"->") {
|
||||
// `ptr->field = var` pointer-to-member store. A bare `->`
|
||||
// anywhere in the span is NOT enough: a member READ on or near
|
||||
// the handle (`fread(buf->data, 1, n, fp)`, PHP
|
||||
// `$conn->query(...)`) also contains `->` but transfers no
|
||||
// ownership, so requiring the `= ` (assignment, not `==`) keeps
|
||||
// the suppression to actual stores.
|
||||
if has_arrow_field_assignment(span_text) {
|
||||
return true;
|
||||
}
|
||||
// `.field = var` pattern (but not `==`)
|
||||
|
|
@ -384,7 +393,7 @@ fn is_ownership_transferred(ctx: &AnalysisContext, acquire: NodeIndex) -> bool {
|
|||
{
|
||||
let is_field_write = if start < end && end <= ctx.source_bytes.len() {
|
||||
let span_text = &ctx.source_bytes[start..end];
|
||||
span_text.windows(2).any(|w| w == b"->") || has_dot_field_assignment(span_text)
|
||||
has_arrow_field_assignment(span_text) || has_dot_field_assignment(span_text)
|
||||
} else {
|
||||
false
|
||||
};
|
||||
|
|
@ -434,6 +443,58 @@ fn has_dot_field_assignment(span_text: &[u8]) -> bool {
|
|||
false
|
||||
}
|
||||
|
||||
/// Check if `span_text` contains a pointer-to-member assignment pattern like
|
||||
/// `ptr->field = var` (but not a member READ such as `fread(buf->data, ...)`,
|
||||
/// PHP `$conn->query(...)`, or a comparison `a->b == c`).
|
||||
///
|
||||
/// A bare `->` anywhere in the span is insufficient — only an arrow access
|
||||
/// that is the LHS of an assignment indicates an ownership-transfer store.
|
||||
/// We require, after one or more `->ident` (or `.ident`) member-access
|
||||
/// segments, a single `=` that is not part of `==` / `!=` / `<=` / `>=`.
|
||||
fn has_arrow_field_assignment(span_text: &[u8]) -> bool {
|
||||
let mut i = 0;
|
||||
while i + 1 < span_text.len() {
|
||||
if span_text[i] == b'-' && span_text[i + 1] == b'>' {
|
||||
// Walk past chained member-access segments: `->ident`,
|
||||
// `.ident`, further `->ident`, and any whitespace, looking for
|
||||
// the assignment `=`.
|
||||
let mut j = i + 2;
|
||||
loop {
|
||||
// identifier chars
|
||||
while j < span_text.len()
|
||||
&& (span_text[j].is_ascii_alphanumeric() || span_text[j] == b'_')
|
||||
{
|
||||
j += 1;
|
||||
}
|
||||
// chained `->` / `.` member access keeps the LHS going
|
||||
if j + 1 < span_text.len() && span_text[j] == b'-' && span_text[j + 1] == b'>' {
|
||||
j += 2;
|
||||
continue;
|
||||
}
|
||||
if j < span_text.len() && span_text[j] == b'.' {
|
||||
j += 1;
|
||||
continue;
|
||||
}
|
||||
break;
|
||||
}
|
||||
// Skip whitespace before the operator
|
||||
while j < span_text.len() && span_text[j].is_ascii_whitespace() {
|
||||
j += 1;
|
||||
}
|
||||
// Reject `==`, `!=`, `<=`, `>=` — only a bare `=` is a store.
|
||||
if j < span_text.len()
|
||||
&& span_text[j] == b'='
|
||||
&& (j + 1 >= span_text.len() || span_text[j + 1] != b'=')
|
||||
&& (j == 0 || !matches!(span_text[j - 1], b'!' | b'<' | b'>' | b'='))
|
||||
{
|
||||
return true;
|
||||
}
|
||||
}
|
||||
i += 1;
|
||||
}
|
||||
false
|
||||
}
|
||||
|
||||
/// Check whether the acquired variable is consumed by an ownership-taking
|
||||
/// function (e.g. `FileResponse(f)`, `send_file(f)`) downstream of the
|
||||
/// acquire node. These functions take ownership of the file handle so there
|
||||
|
|
@ -538,6 +599,12 @@ impl CfgAnalysis for ResourceMisuse {
|
|||
None => return Vec::new(),
|
||||
};
|
||||
|
||||
// The CFG is immutable for the duration of this pass, so the
|
||||
// post-dominator tree only needs to be computed once. Previously
|
||||
// `release_on_all_exit_paths` recomputed it for every acquire site,
|
||||
// turning the body's post-dominator analysis into an O(A) hot spot.
|
||||
let post_doms = dominators::compute_post_dominators(ctx.cfg);
|
||||
|
||||
let mut findings = Vec::new();
|
||||
|
||||
for pair in pairs {
|
||||
|
|
@ -593,8 +660,13 @@ impl CfgAnalysis for ResourceMisuse {
|
|||
continue;
|
||||
}
|
||||
}
|
||||
if !release_on_all_exit_paths(ctx, acquire, &release_nodes, exit)
|
||||
&& !is_ownership_transferred(ctx, acquire)
|
||||
if !release_on_all_exit_paths(
|
||||
ctx,
|
||||
acquire,
|
||||
&release_nodes,
|
||||
exit,
|
||||
post_doms.as_ref(),
|
||||
) && !is_ownership_transferred(ctx, acquire)
|
||||
&& !is_consumed_by_owner(ctx, acquire)
|
||||
{
|
||||
// For mutex pairs, require an explicit .acquire()/.lock() call
|
||||
|
|
@ -724,4 +796,30 @@ mod tests {
|
|||
let info = call_node(vec![None], vec![vec!["receiver_func".into()]]);
|
||||
assert!(!is_event_handler_register_shape(&info));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn arrow_field_assignment_matches_real_stores() {
|
||||
// Genuine ownership-transfer stores: `ptr->field = var`.
|
||||
assert!(has_arrow_field_assignment(b"ctx->fp = fp"));
|
||||
assert!(has_arrow_field_assignment(b"p->next = cfg->variables"));
|
||||
// Chained member access on the LHS.
|
||||
assert!(has_arrow_field_assignment(b"a->b->c = handle"));
|
||||
// Whitespace variations.
|
||||
assert!(has_arrow_field_assignment(b"obj->handle=res"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn arrow_field_assignment_rejects_member_reads() {
|
||||
// Member READ on / near the handle: contains `->` but no store.
|
||||
// This is the false-negative the fix targets — must NOT be treated
|
||||
// as ownership transfer.
|
||||
assert!(!has_arrow_field_assignment(b"fread(buf->data, 1, n, fp)"));
|
||||
assert!(!has_arrow_field_assignment(b"$result = $conn->query(sql)"));
|
||||
assert!(!has_arrow_field_assignment(b"if (node->len > 0)"));
|
||||
// Comparisons through arrow access are not stores.
|
||||
assert!(!has_arrow_field_assignment(b"if (obj->state == READY)"));
|
||||
assert!(!has_arrow_field_assignment(b"x->y != z"));
|
||||
// No arrow at all.
|
||||
assert!(!has_arrow_field_assignment(b"fclose(fp)"));
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -96,6 +96,49 @@ pub fn handle(
|
|||
}
|
||||
}
|
||||
|
||||
/// Run `f` with optional panic recovery, mirroring
|
||||
/// `crate::commands::scan::recover_or_propagate` (which is private to the
|
||||
/// scan module). When `enabled` is false, panics propagate as before.
|
||||
/// When enabled, a panic in `f` is caught, logged, and surfaced as an
|
||||
/// `Err` so the caller can log-and-skip the offending file instead of
|
||||
/// aborting the whole index build.
|
||||
fn recover_or_propagate<T>(
|
||||
enabled: bool,
|
||||
path: &std::path::Path,
|
||||
logs: Option<&Arc<ScanLogCollector>>,
|
||||
f: impl FnOnce() -> NyxResult<T>,
|
||||
) -> NyxResult<T> {
|
||||
if !enabled {
|
||||
return f();
|
||||
}
|
||||
match std::panic::catch_unwind(std::panic::AssertUnwindSafe(f)) {
|
||||
Ok(r) => r,
|
||||
Err(panic) => {
|
||||
let msg = panic
|
||||
.downcast_ref::<&str>()
|
||||
.copied()
|
||||
.map(str::to_owned)
|
||||
.or_else(|| panic.downcast_ref::<String>().cloned())
|
||||
.unwrap_or_else(|| "<non-string panic>".to_string());
|
||||
tracing::warn!(
|
||||
path = %path.display(),
|
||||
panic = %msg,
|
||||
"index-build analysis panicked; continuing"
|
||||
);
|
||||
if let Some(l) = logs {
|
||||
l.warn(
|
||||
format!("Analysis panicked: {msg}"),
|
||||
Some(path.display().to_string()),
|
||||
Some(msg.clone()),
|
||||
);
|
||||
}
|
||||
Err(crate::errors::NyxError::Msg(format!(
|
||||
"analysis panicked: {msg}"
|
||||
)))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub fn build_index(
|
||||
project_name: &str,
|
||||
project_path: &std::path::Path,
|
||||
|
|
@ -135,6 +178,16 @@ pub fn build_index_with_observer(
|
|||
let owned_cfg = crate::commands::scan::ensure_framework_ctx(project_path, config);
|
||||
let config = owned_cfg.as_ref().unwrap_or(config);
|
||||
|
||||
// Pass-1 rescans of later-edited files run with a populated module
|
||||
// graph (scan_with_index_parallel_observer builds one before pass 1),
|
||||
// so the SSA summaries/bodies they persist carry package-qualified
|
||||
// namespaces. Build the same graph here so the rows persisted at
|
||||
// index-build time use the identical key convention; without it the DB
|
||||
// ends up mixing package-qualified and bare namespaces for the same
|
||||
// project, breaking cross-file SSA resolution.
|
||||
let owned_cfg_with_graph = crate::commands::scan::ensure_module_graph(project_path, config);
|
||||
let config = owned_cfg_with_graph.as_ref().unwrap_or(config);
|
||||
|
||||
tracing::debug!("Building index for: {}", project_name);
|
||||
let pool = Indexer::init(db_path)?;
|
||||
{
|
||||
|
|
@ -206,22 +259,62 @@ pub fn build_index_with_observer(
|
|||
let pass1_start = std::time::Instant::now();
|
||||
let writer = IndexWriteQueue::start(project_name.to_owned(), Arc::clone(&pool));
|
||||
let write_tx = writer.sender();
|
||||
let panic_recovery = config.scanner.enable_panic_recovery;
|
||||
let index_result = paths.into_par_iter().try_for_each(|path| -> NyxResult<()> {
|
||||
// Read once, hash once, pass bytes to both rule execution and
|
||||
// summary extraction. Use pre-computed hash for upsert to avoid
|
||||
// a redundant file read inside upsert_file.
|
||||
let bytes = std::fs::read(&path)?;
|
||||
//
|
||||
// A file that disappears between the walk and this read
|
||||
// (delete/rename race) or that is unreadable (permission denied)
|
||||
// must not abort the entire index build: log and skip it, matching
|
||||
// the non-indexed scan paths (`scan.rs` pass-1 fold).
|
||||
let bytes = match std::fs::read(&path) {
|
||||
Ok(b) => b,
|
||||
Err(e) => {
|
||||
tracing::warn!("index build: cannot read {}: {e}", path.display());
|
||||
if let Some(l) = logs.as_ref() {
|
||||
l.warn(
|
||||
format!("Skipping unreadable file: {e}"),
|
||||
Some(path.display().to_string()),
|
||||
None,
|
||||
);
|
||||
}
|
||||
pb.inc(1);
|
||||
return Ok(());
|
||||
}
|
||||
};
|
||||
let hash = Indexer::digest_bytes(&bytes);
|
||||
|
||||
// Parse once and persist every artifact we can reuse later:
|
||||
// findings, coarse summaries, and precise SSA summaries.
|
||||
let fused = crate::commands::scan::analyse_file_fused(
|
||||
&bytes,
|
||||
&path,
|
||||
config,
|
||||
None,
|
||||
Some(project_path),
|
||||
)?;
|
||||
// findings, coarse summaries, and precise SSA summaries. Wrap the
|
||||
// analysis in optional panic recovery so an engine panic on one
|
||||
// file does not abort the whole build when the user enabled
|
||||
// `scanner.enable_panic_recovery`; an analysis error (or a caught
|
||||
// panic) is logged and the file skipped, matching the scan paths.
|
||||
let fused = match recover_or_propagate(panic_recovery, &path, logs.as_ref(), || {
|
||||
crate::commands::scan::analyse_file_fused(
|
||||
&bytes,
|
||||
&path,
|
||||
config,
|
||||
None,
|
||||
Some(project_path),
|
||||
)
|
||||
}) {
|
||||
Ok(f) => f,
|
||||
Err(e) => {
|
||||
tracing::warn!("index build: analysis failed for {}: {e}", path.display());
|
||||
if let Some(l) = logs.as_ref() {
|
||||
l.warn(
|
||||
format!("Skipping file after analysis error: {e}"),
|
||||
Some(path.display().to_string()),
|
||||
None,
|
||||
);
|
||||
}
|
||||
pb.inc(1);
|
||||
return Ok(());
|
||||
}
|
||||
};
|
||||
if let Some(ref p) = progress {
|
||||
p.inc_parsed(1);
|
||||
p.set_current_file(&path.to_string_lossy());
|
||||
|
|
@ -287,6 +380,32 @@ pub fn build_index_with_observer(
|
|||
})
|
||||
.collect();
|
||||
|
||||
// Persist auth-check summaries and cross-package import maps too.
|
||||
// The indexed pass-1 rescan path persists these via
|
||||
// `replace_all_for_file`, but a hash match on the freshly-stamped
|
||||
// file makes that path skip every unchanged file, so unless we
|
||||
// write them here they are lost until a file's content changes:
|
||||
// `load_all_auth_summaries` / `load_all_cross_package_imports`
|
||||
// return empty after the first indexed scan, killing cross-file
|
||||
// auth-helper lifting and cross-package callee resolution.
|
||||
let auth_rows: Vec<_> = fused
|
||||
.auth_summaries
|
||||
.into_iter()
|
||||
.map(|(key, sum)| {
|
||||
(
|
||||
key.name,
|
||||
key.arity.unwrap_or(0),
|
||||
key.lang.as_str().to_string(),
|
||||
key.namespace,
|
||||
key.container,
|
||||
key.disambig,
|
||||
key.kind,
|
||||
sum,
|
||||
)
|
||||
})
|
||||
.collect();
|
||||
let cross_pkg_imports = fused.cross_package_imports;
|
||||
|
||||
let path_for_write = path.clone();
|
||||
write_tx.enqueue(move |idx| {
|
||||
let file_id = idx.upsert_file_with_hash(&path_for_write, &hash)?;
|
||||
|
|
@ -302,15 +421,23 @@ pub fn build_index_with_observer(
|
|||
}),
|
||||
)?;
|
||||
|
||||
if !summaries.is_empty() {
|
||||
idx.replace_summaries_for_file(&path_for_write, &hash, &summaries)?;
|
||||
}
|
||||
if !ssa_rows.is_empty() {
|
||||
idx.replace_ssa_summaries_for_file(&path_for_write, &hash, &ssa_rows)?;
|
||||
}
|
||||
if !body_rows.is_empty() {
|
||||
idx.replace_ssa_bodies_for_file(&path_for_write, &hash, &body_rows)?;
|
||||
}
|
||||
// Single transaction for all summary caches, matching the
|
||||
// indexed pass-1 rescan path (`replace_all_for_file`): one
|
||||
// fsync per file, and — critically — auth summaries and
|
||||
// cross-package imports are persisted at build time so the
|
||||
// first indexed scan does not lose them.
|
||||
let cpi_arg = cross_pkg_imports
|
||||
.as_ref()
|
||||
.map(|(ns, map)| (ns.as_str(), map.as_ref()));
|
||||
idx.replace_all_for_file(
|
||||
&path_for_write,
|
||||
&hash,
|
||||
&summaries,
|
||||
&ssa_rows,
|
||||
&body_rows,
|
||||
&auth_rows,
|
||||
cpi_arg,
|
||||
)?;
|
||||
Ok(())
|
||||
})?;
|
||||
|
||||
|
|
@ -409,3 +536,69 @@ app.get('/safe', function(req, res) {
|
|||
"index build should persist SSA summaries for functions with non-trivial SSA effects"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn recover_or_propagate_catches_panics_when_enabled() {
|
||||
// When disabled, the panic propagates (default fail-fast).
|
||||
let disabled = std::panic::catch_unwind(|| {
|
||||
let _ = recover_or_propagate(
|
||||
false,
|
||||
std::path::Path::new("x"),
|
||||
None,
|
||||
|| -> NyxResult<()> { panic!("boom") },
|
||||
);
|
||||
});
|
||||
assert!(
|
||||
disabled.is_err(),
|
||||
"with recovery disabled the panic must propagate"
|
||||
);
|
||||
|
||||
// When enabled, the panic is caught and surfaced as Err so the
|
||||
// index-build loop can log-and-skip instead of aborting.
|
||||
let recovered = recover_or_propagate(
|
||||
true,
|
||||
std::path::Path::new("x"),
|
||||
None,
|
||||
|| -> NyxResult<()> { panic!("boom") },
|
||||
);
|
||||
assert!(
|
||||
recovered.is_err(),
|
||||
"with recovery enabled the panic must become an Err, not unwind"
|
||||
);
|
||||
|
||||
// A clean closure passes its result through unchanged.
|
||||
let ok = recover_or_propagate(true, std::path::Path::new("x"), None, || Ok(7u32));
|
||||
assert_eq!(ok.unwrap(), 7);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn build_index_skips_unreadable_entry_without_aborting() {
|
||||
// A path under the project that cannot be read (here: a directory
|
||||
// entry that fs::read fails on) must be skipped, not abort the build.
|
||||
// Verifies the read-failure log-and-skip branch in the pass-1 loop.
|
||||
let mut cfg = Config::default();
|
||||
cfg.performance.worker_threads = Some(1);
|
||||
cfg.performance.channel_multiplier = 1;
|
||||
cfg.performance.batch_size = 2;
|
||||
|
||||
let td = tempfile::tempdir().unwrap();
|
||||
let project_dir = td.path().join("proj");
|
||||
fs::create_dir(&project_dir).unwrap();
|
||||
// One good file so the build has real work to persist.
|
||||
let good = project_dir.join("good.rs");
|
||||
fs::write(&good, "fn main() {}").unwrap();
|
||||
|
||||
let db_path = td.path().join("proj.sqlite");
|
||||
// Must succeed even though the project may contain entries that are
|
||||
// not plain readable files; the build never returns Err for that.
|
||||
build_index("proj", &project_dir, &db_path, &cfg, false)
|
||||
.expect("index build must not abort on a per-file read error");
|
||||
|
||||
let pool = Indexer::init(&db_path).unwrap();
|
||||
let idx = Indexer::from_pool("proj", &pool).unwrap();
|
||||
let files = idx.get_files("proj").unwrap();
|
||||
assert!(
|
||||
files.iter().any(|p| p == &good),
|
||||
"the readable file must still be indexed"
|
||||
);
|
||||
}
|
||||
|
|
|
|||
|
|
@ -387,7 +387,18 @@ pub(crate) fn verify_findings_for_scan(
|
|||
// can resolve the enclosing function and callgraph entry context
|
||||
// without re-hitting SQLite per finding. Best-effort: a load failure
|
||||
// logs and falls through to the substring heuristics.
|
||||
opts.summaries = load_verify_summaries(project_name, db_path, scan_path);
|
||||
// Resolve a module graph so summaries are keyed with the same
|
||||
// package-qualified namespaces the indexed scan path uses. Reuse
|
||||
// the one already on `config` when present; otherwise build it
|
||||
// best-effort for this root.
|
||||
let verify_cfg_with_graph = ensure_module_graph(scan_path, config);
|
||||
let verify_module_graph = config.module_graph.as_deref().or_else(|| {
|
||||
verify_cfg_with_graph
|
||||
.as_ref()
|
||||
.and_then(|c| c.module_graph.as_deref())
|
||||
});
|
||||
opts.summaries =
|
||||
load_verify_summaries(project_name, db_path, scan_path, verify_module_graph);
|
||||
if let Some(ref summaries) = opts.summaries {
|
||||
opts.callgraph = Some(load_verify_callgraph(summaries));
|
||||
}
|
||||
|
|
@ -589,6 +600,7 @@ fn load_verify_summaries(
|
|||
project: &str,
|
||||
db_path: &Path,
|
||||
scan_root: &Path,
|
||||
module_graph: Option<&crate::resolve::ModuleGraph>,
|
||||
) -> Option<Arc<crate::summary::GlobalSummaries>> {
|
||||
let pool = match Indexer::init(db_path) {
|
||||
Ok(p) => p,
|
||||
|
|
@ -612,9 +624,15 @@ fn load_verify_summaries(
|
|||
}
|
||||
};
|
||||
let root_str = scan_root.to_string_lossy().into_owned();
|
||||
Some(Arc::new(crate::summary::merge_summaries(
|
||||
// Key with package-qualified namespaces (when a module graph is
|
||||
// available) so the verify-path summary index matches the keys the
|
||||
// indexed scan path and pass-1 SSA summaries use. Plain
|
||||
// normalize_namespace keys would diverge for any repo with a named
|
||||
// package.json and silently lose cross-file callee resolution.
|
||||
Some(Arc::new(crate::summary::merge_summaries_with_resolver(
|
||||
all,
|
||||
Some(&root_str),
|
||||
module_graph,
|
||||
)))
|
||||
}
|
||||
|
||||
|
|
@ -1566,6 +1584,12 @@ fn run_topo_batches(
|
|||
|
||||
let batch_results: Vec<(
|
||||
std::path::PathBuf,
|
||||
// `false` when this file's read or analysis failed this
|
||||
// iteration. A failed file must NOT overwrite the diags it
|
||||
// produced in an earlier successful iteration with an empty
|
||||
// vector, so the result loop below skips the diags cache
|
||||
// update when this flag is false.
|
||||
bool,
|
||||
Vec<Diag>,
|
||||
Vec<crate::summary::FuncSummary>,
|
||||
Vec<(
|
||||
|
|
@ -1597,7 +1621,7 @@ fn run_topo_batches(
|
|||
None,
|
||||
);
|
||||
}
|
||||
return (path.to_path_buf(), vec![], vec![], vec![], vec![]);
|
||||
return (path.to_path_buf(), false, vec![], vec![], vec![], vec![]);
|
||||
}
|
||||
};
|
||||
match recover_or_propagate(
|
||||
|
|
@ -1618,6 +1642,7 @@ fn run_topo_batches(
|
|||
pb.inc(0); // don't double-count iterations in progress bar
|
||||
(
|
||||
path.to_path_buf(),
|
||||
true,
|
||||
r.diags,
|
||||
r.summaries,
|
||||
r.ssa_summaries,
|
||||
|
|
@ -1637,7 +1662,7 @@ fn run_topo_batches(
|
|||
None,
|
||||
);
|
||||
}
|
||||
(path.to_path_buf(), vec![], vec![], vec![], vec![])
|
||||
(path.to_path_buf(), false, vec![], vec![], vec![], vec![])
|
||||
}
|
||||
}
|
||||
})
|
||||
|
|
@ -1645,12 +1670,23 @@ fn run_topo_batches(
|
|||
|
||||
let mut ssa_count: usize = 0;
|
||||
let mg = cfg.module_graph.as_deref();
|
||||
for (path, diags, summaries, ssa_summaries, _ssa_bodies) in batch_results {
|
||||
for (path, analysis_ok, diags, summaries, ssa_summaries, _ssa_bodies) in
|
||||
batch_results
|
||||
{
|
||||
// Phase-B: replace (not append) this file's diags
|
||||
// so the cache always reflects the latest
|
||||
// iteration's output. Clean files skipped this
|
||||
// iteration retain their previous diags.
|
||||
diags_by_file.insert(path, diags);
|
||||
//
|
||||
// A file whose read or analysis FAILED this iteration is
|
||||
// not "clean" but must be treated like one for the diags
|
||||
// cache: overwriting with the empty vector would erase
|
||||
// findings the same file produced in an earlier successful
|
||||
// iteration. Its (empty) summaries are likewise a no-op
|
||||
// below, leaving the previous iteration's summaries intact.
|
||||
if analysis_ok {
|
||||
diags_by_file.insert(path, diags);
|
||||
}
|
||||
|
||||
for s in summaries {
|
||||
let key = s.func_key_with_resolver(root_str_ref, mg);
|
||||
|
|
@ -2718,6 +2754,17 @@ pub fn scan_with_index_parallel_observer(
|
|||
| crate::utils::config::AnalysisMode::Taint
|
||||
);
|
||||
|
||||
// Records the pass-1-time content hash for every file whose summary
|
||||
// extraction succeeded (or was correctly skipped as unchanged). The
|
||||
// post-pass-2 persistence loop stamps the `files` table hash only for
|
||||
// these entries, using the pass-1 hash. Files whose pass-1 extraction
|
||||
// FAILED (recovered panic, hard parse failure, transient read error)
|
||||
// are absent, so their `files` row keeps its previous hash and
|
||||
// `should_scan_with_hash` retries them on the next scan instead of
|
||||
// freezing stale summaries against the new content forever.
|
||||
let pass1_safe_hashes: Arc<Mutex<HashMap<PathBuf, Vec<u8>>>> =
|
||||
Arc::new(Mutex::new(HashMap::new()));
|
||||
|
||||
// ── Pass 1: ensure summaries are up‑to‑date ──────────────────────────
|
||||
if needs_taint {
|
||||
if let Some(p) = progress {
|
||||
|
|
@ -2744,6 +2791,7 @@ pub fn scan_with_index_parallel_observer(
|
|||
let scan_root_ref = scan_root.to_path_buf();
|
||||
let persist_errors_ref = Arc::clone(&persist_errors);
|
||||
let skipped_files_ref = Arc::clone(&skipped_files);
|
||||
let pass1_safe_hashes_ref = Arc::clone(&pass1_safe_hashes);
|
||||
let progress_ref = progress.cloned();
|
||||
files.par_iter().for_each_init(
|
||||
|| Indexer::from_pool(project, &pool).expect("db pool"),
|
||||
|
|
@ -2822,6 +2870,14 @@ pub fn scan_with_index_parallel_observer(
|
|||
)
|
||||
})
|
||||
.collect();
|
||||
// Extraction succeeded: this file's `files`
|
||||
// row may be stamped with the pass-1 hash.
|
||||
// Pass-1 persist failures abort the scan
|
||||
// (writer_result? below), so a successful
|
||||
// enqueue is a sufficient safety signal.
|
||||
if let Ok(mut m) = pass1_safe_hashes_ref.lock() {
|
||||
m.insert(path.clone(), hash.clone());
|
||||
}
|
||||
// Single transaction for all four caches:
|
||||
// one fsync per file instead of four.
|
||||
let path_for_write = path.clone();
|
||||
|
|
@ -2855,6 +2911,11 @@ pub fn scan_with_index_parallel_observer(
|
|||
if let Some(p) = &progress_ref {
|
||||
p.inc_skipped(1);
|
||||
}
|
||||
// Skipped-as-unchanged: the `files` row hash already
|
||||
// equals `hash`, so re-stamping it is a safe no-op.
|
||||
if let Ok(mut m) = pass1_safe_hashes_ref.lock() {
|
||||
m.insert(path.clone(), hash.clone());
|
||||
}
|
||||
}
|
||||
} else {
|
||||
tracing::warn!("pass 1: cannot read {}", path.display());
|
||||
|
|
@ -2898,7 +2959,16 @@ pub fn scan_with_index_parallel_observer(
|
|||
let idx = Indexer::from_pool(project, &pool)?;
|
||||
let all = idx.load_all_summaries()?;
|
||||
tracing::info!(summaries = all.len(), "loaded cross-file summaries from DB");
|
||||
let mut gs = summary::merge_summaries(all, Some(&root_str));
|
||||
// Key coarse summaries with the SAME package-qualified namespaces that
|
||||
// pass-1 SSA summaries (namespace_with_package) and pass-2 refinements
|
||||
// (func_key_with_resolver) use. Plain merge_summaries here would key
|
||||
// them by normalize_namespace, so cross-file SSA resolution would miss
|
||||
// every package-resident function in any repo with a named package.json.
|
||||
let mut gs = summary::merge_summaries_with_resolver(
|
||||
all,
|
||||
Some(&root_str),
|
||||
cfg.module_graph.as_deref(),
|
||||
);
|
||||
|
||||
// Load and insert SSA summaries
|
||||
let ssa_rows = idx.load_all_ssa_summaries()?;
|
||||
|
|
@ -3358,6 +3428,10 @@ pub fn scan_with_index_parallel_observer(
|
|||
for d in &topo_diags {
|
||||
by_file.entry(&d.path).or_default().push(d);
|
||||
}
|
||||
let safe_hashes = pass1_safe_hashes
|
||||
.lock()
|
||||
.map(|m| m.clone())
|
||||
.unwrap_or_default();
|
||||
let mut idx = Indexer::from_pool(project, &pool)?;
|
||||
for path in &files {
|
||||
if !path.exists() {
|
||||
|
|
@ -3365,7 +3439,21 @@ pub fn scan_with_index_parallel_observer(
|
|||
continue;
|
||||
}
|
||||
|
||||
let file_id = idx.upsert_file(path)?;
|
||||
// Only stamp the `files` row when pass-1 extraction for this file
|
||||
// succeeded (or was correctly skipped as unchanged); use the
|
||||
// pass-1-time hash, never a fresh re-read (avoids the TOCTOU where
|
||||
// a mid-scan edit would pair pass-1 artifacts with content that was
|
||||
// never analysed). Files whose pass-1 extraction FAILED are absent
|
||||
// from `safe_hashes`: their `files` row is left untouched so its
|
||||
// previous hash forces a pass-1 retry on the next scan instead of
|
||||
// freezing stale summaries against the new content forever. Their
|
||||
// best-effort pass-2 findings still appear in this scan's in-memory
|
||||
// result; we just do not persist them or advance the hash.
|
||||
let Some(pass1_hash) = safe_hashes.get(path) else {
|
||||
continue;
|
||||
};
|
||||
|
||||
let file_id = idx.upsert_file_with_hash(path, pass1_hash)?;
|
||||
let empty: [&Diag; 0] = [];
|
||||
let file_diags = by_file
|
||||
.get(path.to_string_lossy().as_ref())
|
||||
|
|
|
|||
239
src/database.rs
239
src/database.rs
|
|
@ -1053,6 +1053,67 @@ pub mod index {
|
|||
Ok(())
|
||||
}
|
||||
|
||||
/// Recompute a finding's [`crate::patterns::FindingCategory`] from
|
||||
/// its rule id alone.
|
||||
///
|
||||
/// The `issues` table persists only `(rule_id, severity, line, col)`,
|
||||
/// not the category, so cached rows served by
|
||||
/// [`Self::get_issues_from_file`] must reconstruct it. Hardcoding
|
||||
/// `Security` here resurrects quality findings (e.g.
|
||||
/// `rs.quality.unwrap`) past the `include_quality` filter and the
|
||||
/// quality-rollup, producing cold/warm non-determinism for unchanged
|
||||
/// files. Mapping is best-effort and falls back to `Security`:
|
||||
///
|
||||
/// * structural / state-machine ids (`state-*`, `cfg-*`) route through
|
||||
/// [`crate::patterns::FindingCategory::for_structural_rule`]
|
||||
/// (reliability for leaks/error-fallthrough, security otherwise);
|
||||
/// * AST-pattern ids look up their declared
|
||||
/// [`crate::patterns::PatternCategory`] in a one-time index built
|
||||
/// over every language's pattern registry;
|
||||
/// * everything else (taint flows, etc.) stays `Security`.
|
||||
fn category_for_rule_id(rule_id: &str) -> crate::patterns::FindingCategory {
|
||||
use crate::patterns::FindingCategory;
|
||||
use std::sync::OnceLock;
|
||||
|
||||
// Structural / state-machine findings are not AST patterns and
|
||||
// carry no language slug; classify them by id directly.
|
||||
if rule_id.starts_with("state-") || rule_id.starts_with("cfg-") {
|
||||
return FindingCategory::for_structural_rule(rule_id);
|
||||
}
|
||||
|
||||
// Lazily build a rule_id -> FindingCategory index across every
|
||||
// language's static pattern registry. Distinct ids only; alias
|
||||
// slugs share the same pattern slice so duplicate inserts are a
|
||||
// no-op.
|
||||
static PATTERN_CATEGORIES: OnceLock<
|
||||
std::collections::HashMap<String, FindingCategory>,
|
||||
> = OnceLock::new();
|
||||
let map = PATTERN_CATEGORIES.get_or_init(|| {
|
||||
let mut m = std::collections::HashMap::new();
|
||||
for lang in [
|
||||
"rust",
|
||||
"typescript",
|
||||
"javascript",
|
||||
"c",
|
||||
"cpp",
|
||||
"java",
|
||||
"go",
|
||||
"php",
|
||||
"python",
|
||||
"ruby",
|
||||
] {
|
||||
for p in crate::patterns::load(lang) {
|
||||
m.insert(p.id.to_string(), p.category.finding_category());
|
||||
}
|
||||
}
|
||||
m
|
||||
});
|
||||
|
||||
map.get(rule_id)
|
||||
.copied()
|
||||
.unwrap_or(FindingCategory::Security)
|
||||
}
|
||||
|
||||
/// Gets the issues for a specific file so we don't have to rescan
|
||||
pub fn get_issues_from_file(&self, path: &Path) -> NyxResult<Vec<Diag>> {
|
||||
let file_id: i64 = self.c().query_row(
|
||||
|
|
@ -1076,13 +1137,15 @@ pub mod index {
|
|||
);
|
||||
Severity::Medium
|
||||
});
|
||||
let rule_id: String = row.get::<_, String>(0)?;
|
||||
let category = Self::category_for_rule_id(&rule_id);
|
||||
Ok(Diag {
|
||||
path: path.to_string_lossy().to_string(),
|
||||
id: row.get::<_, String>(0)?, // rule_id
|
||||
id: rule_id, // rule_id
|
||||
line: row.get::<_, i64>(2)? as usize,
|
||||
col: row.get::<_, i64>(3)? as usize,
|
||||
severity,
|
||||
category: crate::patterns::FindingCategory::Security,
|
||||
category,
|
||||
path_validated: false,
|
||||
guard_kind: None,
|
||||
message: None,
|
||||
|
|
@ -1695,8 +1758,14 @@ pub mod index {
|
|||
/// * function and auth summaries: DELETE-then-INSERT regardless
|
||||
/// of input length, so emptying a file's summaries clears
|
||||
/// stale rows.
|
||||
/// * SSA summaries and bodies: only touched when the input is
|
||||
/// non-empty, matching the existing scan path.
|
||||
/// * SSA summaries and bodies: DELETE always runs so a file that
|
||||
/// no longer yields any SSA artifacts (functions removed, file
|
||||
/// gutted, or SSA lowering now failing for every function)
|
||||
/// clears its stale high-precedence rows; the INSERT loop is
|
||||
/// skipped when the new set is empty. Leaving stale SSA rows
|
||||
/// would let deleted/phantom functions keep resolving cross-file
|
||||
/// via resolve step 0.5 (SSA summaries outrank coarse
|
||||
/// FuncSummary), so the DELETE must not be gated on non-empty.
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
pub fn replace_all_for_file(
|
||||
&mut self,
|
||||
|
|
@ -1780,13 +1849,15 @@ pub mod index {
|
|||
}
|
||||
}
|
||||
|
||||
// ssa_function_summaries, only touched when non-empty.
|
||||
// ssa_function_summaries: DELETE always, INSERT only when
|
||||
// the new set is non-empty, so emptying a file's SSA
|
||||
// summaries clears its stale high-precedence rows.
|
||||
tx.execute(
|
||||
"DELETE FROM ssa_function_summaries
|
||||
WHERE project = ?1 AND file_path = ?2",
|
||||
params![self.project, path_str],
|
||||
)?;
|
||||
if !ssa_summaries.is_empty() {
|
||||
tx.execute(
|
||||
"DELETE FROM ssa_function_summaries
|
||||
WHERE project = ?1 AND file_path = ?2",
|
||||
params![self.project, path_str],
|
||||
)?;
|
||||
let mut stmt = tx.prepare(
|
||||
"INSERT OR REPLACE INTO ssa_function_summaries
|
||||
(project, file_path, file_hash, name, arity, lang, namespace,
|
||||
|
|
@ -1822,13 +1893,15 @@ pub mod index {
|
|||
}
|
||||
}
|
||||
|
||||
// ssa_function_bodies, only touched when non-empty.
|
||||
// ssa_function_bodies: DELETE always, INSERT only when the
|
||||
// new set is non-empty, so emptying a file's SSA bodies
|
||||
// clears its stale high-precedence rows.
|
||||
tx.execute(
|
||||
"DELETE FROM ssa_function_bodies
|
||||
WHERE project = ?1 AND file_path = ?2",
|
||||
params![self.project, path_str],
|
||||
)?;
|
||||
if !ssa_bodies.is_empty() {
|
||||
tx.execute(
|
||||
"DELETE FROM ssa_function_bodies
|
||||
WHERE project = ?1 AND file_path = ?2",
|
||||
params![self.project, path_str],
|
||||
)?;
|
||||
let mut stmt = tx.prepare(
|
||||
"INSERT OR REPLACE INTO ssa_function_bodies
|
||||
(project, file_path, file_hash, name, arity, lang, namespace,
|
||||
|
|
@ -2922,6 +2995,65 @@ fn replace_issues_and_query_back() {
|
|||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn get_issues_from_file_recomputes_category_from_rule_id() {
|
||||
// Regression for the hardcoded `category = Security` bug: cached rows
|
||||
// must reconstruct the finding category from the rule id, otherwise a
|
||||
// quality finding (correctly dropped/rolled-up on the cold scan)
|
||||
// resurfaces as an ungrouped Security finding on every warm scan.
|
||||
use crate::patterns::FindingCategory;
|
||||
|
||||
let td = tempfile::tempdir().unwrap();
|
||||
let db = td.path().join("nyx.sqlite");
|
||||
let file = td.path().join("lib.rs");
|
||||
std::fs::write(&file, "fn main() {}").unwrap();
|
||||
|
||||
let pool = index::Indexer::init(&db).unwrap();
|
||||
let mut idx = index::Indexer::from_pool("proj", &pool).unwrap();
|
||||
let fid = idx.upsert_file(&file).unwrap();
|
||||
|
||||
let issues = [
|
||||
// Tier-A quality AST pattern -> must come back as Quality.
|
||||
index::IssueRow {
|
||||
rule_id: "rs.quality.unwrap",
|
||||
severity: "Low",
|
||||
line: 1,
|
||||
col: 1,
|
||||
},
|
||||
// Structural resource leak -> Reliability.
|
||||
index::IssueRow {
|
||||
rule_id: "state-resource-leak",
|
||||
severity: "Medium",
|
||||
line: 2,
|
||||
col: 1,
|
||||
},
|
||||
// Taint-style id (not in any AST registry) -> Security fallback.
|
||||
index::IssueRow {
|
||||
rule_id: "taint-sql-injection",
|
||||
severity: "High",
|
||||
line: 3,
|
||||
col: 1,
|
||||
},
|
||||
];
|
||||
idx.replace_issues(fid, issues).unwrap();
|
||||
|
||||
let stored = idx.get_issues_from_file(&file).unwrap();
|
||||
let cat = |id: &str| {
|
||||
stored
|
||||
.iter()
|
||||
.find(|d| d.id == id)
|
||||
.unwrap_or_else(|| panic!("missing {id}"))
|
||||
.category
|
||||
};
|
||||
assert_eq!(
|
||||
cat("rs.quality.unwrap"),
|
||||
FindingCategory::Quality,
|
||||
"quality AST pattern must not be resurrected as Security on warm scans"
|
||||
);
|
||||
assert_eq!(cat("state-resource-leak"), FindingCategory::Reliability);
|
||||
assert_eq!(cat("taint-sql-injection"), FindingCategory::Security);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn clear_and_vacuum_reset_tables() {
|
||||
let td = tempfile::tempdir().unwrap();
|
||||
|
|
@ -3286,6 +3418,81 @@ fn ssa_summaries_hash_rescan_replaces_stale() {
|
|||
assert_eq!(loaded[0].1, "new_func");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn replace_all_for_file_clears_stale_ssa_rows_when_emptied() {
|
||||
// Regression for the gated-DELETE staleness bug: a file edited so it no
|
||||
// longer yields any SSA summaries (functions removed / file gutted /
|
||||
// lowering now failing) must clear its high-precedence ssa_function_*
|
||||
// rows. Otherwise phantom functions keep resolving cross-file via
|
||||
// resolve step 0.5 (SSA outranks coarse FuncSummary).
|
||||
use crate::labels::Cap;
|
||||
use crate::summary::ssa_summary::{SsaFuncSummary, TaintTransform};
|
||||
|
||||
let td = tempfile::tempdir().unwrap();
|
||||
let db = td.path().join("nyx.sqlite");
|
||||
let f = td.path().join("lib.py");
|
||||
std::fs::write(&f, "v1").unwrap();
|
||||
|
||||
let pool = index::Indexer::init(&db).unwrap();
|
||||
let mut idx = index::Indexer::from_pool("proj", &pool).unwrap();
|
||||
|
||||
let hash_v1 = index::Indexer::digest_bytes(b"v1");
|
||||
let ssa_sums = vec![(
|
||||
"old_func".to_string(),
|
||||
1_usize,
|
||||
"python".to_string(),
|
||||
"lib.py".to_string(),
|
||||
String::new(),
|
||||
None,
|
||||
crate::symbol::FuncKind::Function,
|
||||
SsaFuncSummary {
|
||||
param_to_return: vec![(0, TaintTransform::Identity)],
|
||||
param_to_sink: vec![],
|
||||
source_caps: Cap::empty(),
|
||||
param_to_sink_param: vec![],
|
||||
param_container_to_return: vec![],
|
||||
param_to_container_store: vec![],
|
||||
return_type: None,
|
||||
return_abstract: None,
|
||||
source_to_callback: vec![],
|
||||
receiver_to_return: None,
|
||||
receiver_to_sink: Cap::empty(),
|
||||
abstract_transfer: vec![],
|
||||
param_return_paths: vec![],
|
||||
points_to: Default::default(),
|
||||
field_points_to: Default::default(),
|
||||
return_path_facts: smallvec::SmallVec::new(),
|
||||
typed_call_receivers: vec![],
|
||||
validated_params_to_return: smallvec::SmallVec::new(),
|
||||
param_to_gate_filters: vec![],
|
||||
entry_kind: None,
|
||||
},
|
||||
)];
|
||||
|
||||
// Pass 1: file has one SSA summary.
|
||||
idx.replace_all_for_file(&f, &hash_v1, &[], &ssa_sums, &[], &[], None)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
idx.load_all_ssa_summaries().unwrap().len(),
|
||||
1,
|
||||
"SSA summary should persist on first write"
|
||||
);
|
||||
|
||||
// Pass 2: file gutted — new hash, empty SSA summaries/bodies. The old
|
||||
// row must be deleted, not left behind.
|
||||
let hash_v2 = index::Indexer::digest_bytes(b"v2");
|
||||
idx.replace_all_for_file(&f, &hash_v2, &[], &[], &[], &[], None)
|
||||
.unwrap();
|
||||
assert!(
|
||||
idx.load_all_ssa_summaries().unwrap().is_empty(),
|
||||
"stale SSA summary rows must be cleared when the new set is empty"
|
||||
);
|
||||
assert!(
|
||||
idx.load_all_ssa_bodies().unwrap().is_empty(),
|
||||
"stale SSA body rows must be cleared when the new set is empty"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn clear_drops_ssa_summaries_table() {
|
||||
use crate::labels::Cap;
|
||||
|
|
|
|||
|
|
@ -9,7 +9,7 @@ pub static RULES: &[LabelRule] = &[
|
|||
case_sensitive: false,
|
||||
},
|
||||
LabelRule {
|
||||
matchers: &["fgets", "scanf", "fscanf", "gets", "read"],
|
||||
matchers: &["fgets", "scanf", "fscanf", "sscanf", "gets", "read"],
|
||||
label: DataLabel::Source(Cap::all()),
|
||||
case_sensitive: false,
|
||||
},
|
||||
|
|
@ -273,6 +273,15 @@ pub static OUTPUT_PARAM_SOURCES: &[(&str, &[usize])] = &[
|
|||
("gets", &[0]), // gets(buf), buf receives input
|
||||
("recv", &[1]), // recv(fd, buf, len, flags)
|
||||
("recvfrom", &[1]), // recvfrom(fd, buf, len, flags, ...)
|
||||
("read", &[1]), // read(fd, buf, len), buf receives attacker bytes
|
||||
// `scanf`/`fscanf`/`sscanf` return a match count; the attacker-controlled
|
||||
// bytes land in the variadic output pointers after the format string.
|
||||
// OUTPUT_PARAM_SOURCES stores a fixed position list, so we enumerate a
|
||||
// conservative span of trailing argument positions to cover the common
|
||||
// single- and multi-conversion forms.
|
||||
("scanf", &[1, 2, 3, 4, 5, 6, 7, 8]), // scanf("%s", buf, ...) , outputs start at arg 1
|
||||
("fscanf", &[2, 3, 4, 5, 6, 7, 8]), // fscanf(stream, "%s", buf, ...) , outputs at arg 2+
|
||||
("sscanf", &[2, 3, 4, 5, 6, 7, 8]), // sscanf(src, "%s", buf, ...) , outputs at arg 2+
|
||||
];
|
||||
|
||||
/// Arg-to-arg taint propagation for known C functions.
|
||||
|
|
@ -288,3 +297,36 @@ pub static ARG_PROPAGATIONS: &[super::ArgPropagation] = &[
|
|||
to_args: &[1],
|
||||
},
|
||||
];
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use crate::labels::output_param_source_positions;
|
||||
|
||||
#[test]
|
||||
fn scanf_family_and_read_taint_output_args() {
|
||||
// `scanf("%s", buf)` , buf is at arg 1.
|
||||
assert_eq!(
|
||||
output_param_source_positions("c", "scanf"),
|
||||
Some([1usize, 2, 3, 4, 5, 6, 7, 8].as_slice())
|
||||
);
|
||||
// `fscanf(stream, "%s", buf)` and `sscanf(src, "%s", buf)` , outputs at arg 2+.
|
||||
assert_eq!(
|
||||
output_param_source_positions("c", "fscanf"),
|
||||
Some([2usize, 3, 4, 5, 6, 7, 8].as_slice())
|
||||
);
|
||||
assert_eq!(
|
||||
output_param_source_positions("c", "sscanf"),
|
||||
Some([2usize, 3, 4, 5, 6, 7, 8].as_slice())
|
||||
);
|
||||
// `read(fd, buf, len)` , buf is at arg 1.
|
||||
assert_eq!(
|
||||
output_param_source_positions("c", "read"),
|
||||
Some([1usize].as_slice())
|
||||
);
|
||||
// Namespaced/qualified callees normalize to the last segment.
|
||||
assert_eq!(
|
||||
output_param_source_positions("c", "std::sscanf"),
|
||||
Some([2usize, 3, 4, 5, 6, 7, 8].as_slice())
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1011,6 +1011,11 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"communication_case" => Kind::Block,
|
||||
"go_statement" => Kind::Block,
|
||||
"defer_statement" => Kind::Block,
|
||||
// `outer: for { ... }` wraps the whole labeled loop in a
|
||||
// labeled_statement; map to Block so the CFG builder recurses into the
|
||||
// inner statement instead of collapsing the loop body into one leaf Seq
|
||||
// node (mirrors c.rs / cpp.rs).
|
||||
"labeled_statement" => Kind::Block,
|
||||
|
||||
// data-flow
|
||||
"call_expression" => Kind::CallFn,
|
||||
|
|
@ -1032,7 +1037,10 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
|
||||
pub static PARAM_CONFIG: ParamConfig = ParamConfig {
|
||||
params_field: "parameters",
|
||||
param_node_kinds: &["parameter_declaration"],
|
||||
// `variadic_parameter_declaration` covers `func run(args ...string)`;
|
||||
// without it the variadic param is dropped, registering wrong arity and
|
||||
// never seeding caller taint into the variadic position.
|
||||
param_node_kinds: &["parameter_declaration", "variadic_parameter_declaration"],
|
||||
self_param_kinds: &[],
|
||||
ident_fields: &["name"],
|
||||
};
|
||||
|
|
@ -1094,3 +1102,27 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec<RuntimeLabelRule> {
|
|||
|
||||
rules
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::{KINDS, PARAM_CONFIG};
|
||||
use crate::labels::Kind;
|
||||
|
||||
#[test]
|
||||
fn labeled_statement_is_walkable_block() {
|
||||
// `outer: for { ... }` must be a Block so the CFG builder recurses
|
||||
// into the labeled loop body instead of collapsing it to a leaf Seq.
|
||||
assert_eq!(KINDS.get("labeled_statement"), Some(&Kind::Block));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn variadic_param_is_extracted() {
|
||||
// `func run(args ...string)` emits variadic_parameter_declaration;
|
||||
// it must be a recognised param node so arity registers correctly.
|
||||
assert!(
|
||||
PARAM_CONFIG
|
||||
.param_node_kinds
|
||||
.contains(&"variadic_parameter_declaration")
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1069,7 +1069,13 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"block" => Kind::Block,
|
||||
"class_declaration" => Kind::Block,
|
||||
"class_body" => Kind::Block,
|
||||
"interface_declaration" => Kind::Block,
|
||||
"interface_body" => Kind::Block,
|
||||
"enum_declaration" => Kind::Block,
|
||||
"enum_body" => Kind::Block,
|
||||
"enum_body_declarations" => Kind::Block,
|
||||
"record_declaration" => Kind::Block,
|
||||
"synchronized_statement" => Kind::Block,
|
||||
"method_declaration" => Kind::Function,
|
||||
"constructor_declaration" => Kind::Function,
|
||||
"switch_expression" => Kind::Switch,
|
||||
|
|
@ -1126,3 +1132,31 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec<RuntimeLabelRule> {
|
|||
|
||||
rules
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::KINDS;
|
||||
use crate::labels::Kind;
|
||||
|
||||
#[test]
|
||||
fn container_declarations_are_walkable_blocks() {
|
||||
// interface_declaration must be mapped so its (already-mapped)
|
||||
// interface_body is reachable; enum/record/synchronized bodies
|
||||
// must be walked instead of collapsing to a leaf Seq node.
|
||||
for kind in [
|
||||
"interface_declaration",
|
||||
"interface_body",
|
||||
"enum_declaration",
|
||||
"enum_body",
|
||||
"enum_body_declarations",
|
||||
"record_declaration",
|
||||
"synchronized_statement",
|
||||
] {
|
||||
assert_eq!(
|
||||
KINDS.get(kind),
|
||||
Some(&Kind::Block),
|
||||
"{kind} should map to Kind::Block so the CFG builder walks its body"
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1643,6 +1643,11 @@ pub static PARAM_CONFIG: ParamConfig = ParamConfig {
|
|||
"typed_parameter",
|
||||
"default_parameter",
|
||||
"typed_default_parameter",
|
||||
// `*args` / `**kwargs`: without these the splat params are dropped,
|
||||
// registering wrong arity and never seeding caller taint into the
|
||||
// variadic positions.
|
||||
"list_splat_pattern",
|
||||
"dictionary_splat_pattern",
|
||||
],
|
||||
self_param_kinds: &[],
|
||||
ident_fields: &["name"],
|
||||
|
|
@ -1667,9 +1672,26 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec<RuntimeLabelRule> {
|
|||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::KINDS;
|
||||
use super::{KINDS, PARAM_CONFIG};
|
||||
use crate::labels::Kind;
|
||||
|
||||
#[test]
|
||||
fn splat_params_are_extracted() {
|
||||
// `*args` (list_splat_pattern) and `**kwargs` (dictionary_splat_pattern)
|
||||
// must be recognised param nodes so arity registers correctly and the
|
||||
// variadic positions are seeded with caller taint.
|
||||
assert!(
|
||||
PARAM_CONFIG
|
||||
.param_node_kinds
|
||||
.contains(&"list_splat_pattern")
|
||||
);
|
||||
assert!(
|
||||
PARAM_CONFIG
|
||||
.param_node_kinds
|
||||
.contains(&"dictionary_splat_pattern")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn lambda_classified_as_function() {
|
||||
assert_eq!(KINDS.get("lambda"), Some(&Kind::Function));
|
||||
|
|
|
|||
|
|
@ -542,6 +542,11 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
// structure
|
||||
"program" => Kind::SourceFile,
|
||||
"body_statement" => Kind::Block,
|
||||
// Brace blocks (`{ ... }`, including `->(){ ... }` lambda bodies) expose
|
||||
// their body as a `block_body` node (do-end blocks use `body_statement`,
|
||||
// already mapped above). Without this, multi-statement brace-block bodies
|
||||
// collapse into a single leaf Seq node and lose all intra-body taint flow.
|
||||
"block_body" => Kind::Block,
|
||||
"do_block" => Kind::Function,
|
||||
"then" => Kind::Block,
|
||||
"else" => Kind::Block,
|
||||
|
|
@ -718,3 +723,17 @@ mod ar_query_tests {
|
|||
assert!(ar_query_safe_shape("a.b.c.where", "pair", false));
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod kinds_tests {
|
||||
use super::KINDS;
|
||||
use crate::labels::Kind;
|
||||
|
||||
#[test]
|
||||
fn block_body_is_walkable_block() {
|
||||
// Brace blocks (incl. lambda bodies) expose their body as `block_body`;
|
||||
// it must be a Block so multi-statement bodies are walked instead of
|
||||
// collapsing into one leaf Seq node and losing intra-body taint flow.
|
||||
assert_eq!(KINDS.get("block_body"), Some(&Kind::Block));
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -403,6 +403,12 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"impl_item" => Kind::Block,
|
||||
"trait_item" => Kind::Block,
|
||||
"declaration_list" => Kind::Block,
|
||||
// Inline modules `mod foo { ... }` wrap their items in a
|
||||
// `declaration_list`; map to Block so the CFG builder recurses into the
|
||||
// body and the `function_item`s inside are lowered, instead of dropping
|
||||
// the whole module (the old `Kind::Trivia` mapping discarded every
|
||||
// function/source/sink inside an inline module).
|
||||
"mod_item" => Kind::Block,
|
||||
|
||||
// data-flow
|
||||
"call_expression" => Kind::CallFn,
|
||||
|
|
@ -430,7 +436,6 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
|||
"{" => Kind::Trivia, "}" => Kind::Trivia, "\n" => Kind::Trivia,
|
||||
"use_declaration" => Kind::Trivia,
|
||||
"attribute_item" => Kind::Trivia,
|
||||
"mod_item" => Kind::Trivia,
|
||||
"type_item" => Kind::Trivia,
|
||||
};
|
||||
|
||||
|
|
@ -614,3 +619,18 @@ pub fn phase_c_auth_rules() -> Vec<RuntimeLabelRule> {
|
|||
},
|
||||
]
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::KINDS;
|
||||
use crate::labels::Kind;
|
||||
|
||||
#[test]
|
||||
fn mod_item_is_walkable_block_not_trivia() {
|
||||
// Inline `mod foo { ... }` must be a Block so the CFG builder recurses
|
||||
// into the module body; the old Trivia mapping dropped every function,
|
||||
// source, and sink inside inline modules.
|
||||
assert_eq!(KINDS.get("mod_item"), Some(&Kind::Block));
|
||||
assert_ne!(KINDS.get("mod_item"), Some(&Kind::Trivia));
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -546,6 +546,34 @@ pub static GATED_SINKS: &[SinkGate] = &[
|
|||
dangerous_kwargs: &[],
|
||||
activation: GateActivation::ValueMatch,
|
||||
},
|
||||
// ── Lodash `_.template` SSTI/RCE gates, mirrors `labels/javascript.rs` ─
|
||||
// (Strapi CVE-2023-22621 class). Lodash compiles `<% ... %>` evaluate
|
||||
// blocks into a JS Function; gate on the `evaluate` option and fire
|
||||
// conservatively when missing/dynamic.
|
||||
SinkGate {
|
||||
callee_matcher: "_.template",
|
||||
arg_index: 0,
|
||||
dangerous_values: &["true"],
|
||||
dangerous_prefixes: &[],
|
||||
label: DataLabel::Sink(Cap::CODE_EXEC),
|
||||
case_sensitive: true,
|
||||
payload_args: &[0],
|
||||
keyword_name: Some("evaluate"),
|
||||
dangerous_kwargs: &[],
|
||||
activation: GateActivation::ValueMatch,
|
||||
},
|
||||
SinkGate {
|
||||
callee_matcher: "lodash.template",
|
||||
arg_index: 0,
|
||||
dangerous_values: &["true"],
|
||||
dangerous_prefixes: &[],
|
||||
label: DataLabel::Sink(Cap::CODE_EXEC),
|
||||
case_sensitive: true,
|
||||
payload_args: &[0],
|
||||
keyword_name: Some("evaluate"),
|
||||
dangerous_kwargs: &[],
|
||||
activation: GateActivation::ValueMatch,
|
||||
},
|
||||
// ── XML XXE gates, mirrors `labels/javascript.rs` ────────────────────
|
||||
SinkGate {
|
||||
callee_matcher: "xml2js.parseString",
|
||||
|
|
@ -762,6 +790,37 @@ pub static GATED_SINKS: &[SinkGate] = &[
|
|||
object_destination_fields: &["host", "hostname", "path", "protocol", "port", "origin"],
|
||||
},
|
||||
},
|
||||
// Node `http.get` / `https.get` convenience wrappers around `.request()`.
|
||||
// Same destination semantics. Motivated by CVE-2025-64430 (Parse Server
|
||||
// SSRF via http.get(uri)). Mirrors `labels/javascript.rs`.
|
||||
SinkGate {
|
||||
callee_matcher: "http.get",
|
||||
arg_index: 0,
|
||||
dangerous_values: &[],
|
||||
dangerous_prefixes: &[],
|
||||
label: DataLabel::Sink(Cap::SSRF),
|
||||
case_sensitive: false,
|
||||
payload_args: &[0],
|
||||
keyword_name: None,
|
||||
dangerous_kwargs: &[],
|
||||
activation: GateActivation::Destination {
|
||||
object_destination_fields: &["host", "hostname", "path", "protocol", "port", "origin"],
|
||||
},
|
||||
},
|
||||
SinkGate {
|
||||
callee_matcher: "https.get",
|
||||
arg_index: 0,
|
||||
dangerous_values: &[],
|
||||
dangerous_prefixes: &[],
|
||||
label: DataLabel::Sink(Cap::SSRF),
|
||||
case_sensitive: false,
|
||||
payload_args: &[0],
|
||||
keyword_name: None,
|
||||
dangerous_kwargs: &[],
|
||||
activation: GateActivation::Destination {
|
||||
object_destination_fields: &["host", "hostname", "path", "protocol", "port", "origin"],
|
||||
},
|
||||
},
|
||||
// ── Cross-boundary data exfiltration ──────────────────────────────────
|
||||
// `fetch(input, init)`, payload-bearing fields of `init` (arg 1) flow
|
||||
// into the request body / headers / json, distinct from SSRF on the URL
|
||||
|
|
@ -1070,6 +1129,69 @@ pub static GATED_SINKS: &[SinkGate] = &[
|
|||
dangerous_kwargs: &[],
|
||||
activation: GateActivation::LiteralOnly,
|
||||
},
|
||||
// `set-value` standalone helper (CVE-2019-10747 / CVE-2021-23440) —
|
||||
// recursive set-by-path helper that did not block `__proto__` keys.
|
||||
// Mirrors `labels/javascript.rs`.
|
||||
SinkGate {
|
||||
callee_matcher: "setValue",
|
||||
arg_index: 0,
|
||||
dangerous_values: &[],
|
||||
dangerous_prefixes: &[],
|
||||
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
|
||||
case_sensitive: true,
|
||||
payload_args: &[1, 2],
|
||||
keyword_name: None,
|
||||
dangerous_kwargs: &[],
|
||||
activation: GateActivation::Destination {
|
||||
object_destination_fields: &[],
|
||||
},
|
||||
},
|
||||
// `dot-prop` standalone helper: `dotProp.set(obj, path, val)` —
|
||||
// CVE-2020-8116. Mirrors `labels/javascript.rs`.
|
||||
SinkGate {
|
||||
callee_matcher: "dotProp.set",
|
||||
arg_index: 0,
|
||||
dangerous_values: &[],
|
||||
dangerous_prefixes: &[],
|
||||
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
|
||||
case_sensitive: true,
|
||||
payload_args: &[1, 2],
|
||||
keyword_name: None,
|
||||
dangerous_kwargs: &[],
|
||||
activation: GateActivation::Destination {
|
||||
object_destination_fields: &[],
|
||||
},
|
||||
},
|
||||
// `jsonpath` / `jsonpath-plus` `jp.set(obj, path, value)` family —
|
||||
// mirrors `labels/javascript.rs`.
|
||||
SinkGate {
|
||||
callee_matcher: "jp.set",
|
||||
arg_index: 0,
|
||||
dangerous_values: &[],
|
||||
dangerous_prefixes: &[],
|
||||
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
|
||||
case_sensitive: true,
|
||||
payload_args: &[1, 2],
|
||||
keyword_name: None,
|
||||
dangerous_kwargs: &[],
|
||||
activation: GateActivation::Destination {
|
||||
object_destination_fields: &[],
|
||||
},
|
||||
},
|
||||
SinkGate {
|
||||
callee_matcher: "jsonpath.set",
|
||||
arg_index: 0,
|
||||
dangerous_values: &[],
|
||||
dangerous_prefixes: &[],
|
||||
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
|
||||
case_sensitive: false,
|
||||
payload_args: &[1, 2],
|
||||
keyword_name: None,
|
||||
dangerous_kwargs: &[],
|
||||
activation: GateActivation::Destination {
|
||||
object_destination_fields: &[],
|
||||
},
|
||||
},
|
||||
];
|
||||
|
||||
pub static KINDS: Map<&'static str, Kind> = phf_map! {
|
||||
|
|
|
|||
|
|
@ -117,7 +117,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
function: (member_expression
|
||||
property: (property_identifier) @prop (#eq? @prop "createHash"))
|
||||
arguments: (arguments
|
||||
(string) @alg (#match? @alg "\"(md5|sha1)\"")))
|
||||
(string) @alg (#match? @alg "^[\"'](md5|sha1)[\"']$")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
|
|
@ -201,7 +201,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
query: r#"(call_expression
|
||||
function: (identifier) @id (#eq? @id "fetch")
|
||||
arguments: (arguments
|
||||
(string) @url (#match? @url "^\"http://")))
|
||||
(string) @url (#match? @url "^[\"']http://")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
|
|
|
|||
|
|
@ -119,7 +119,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
query: r#"(call
|
||||
method: (identifier) @m (#eq? @m "open")
|
||||
arguments: (argument_list
|
||||
(string) @url (#match? @url "^\"https?://")))
|
||||
(string) @url (#match? @url "^[\"']https?://")))
|
||||
@vuln"#,
|
||||
severity: Severity::Medium,
|
||||
tier: PatternTier::A,
|
||||
|
|
|
|||
|
|
@ -88,7 +88,7 @@ pub const PATTERNS: &[Pattern] = &[
|
|||
function: (member_expression
|
||||
property: (property_identifier) @prop (#eq? @prop "createHash"))
|
||||
arguments: (arguments
|
||||
(string) @alg (#match? @alg "\"(md5|sha1)\"")))
|
||||
(string) @alg (#match? @alg "^[\"'](md5|sha1)[\"']$")))
|
||||
@vuln"#,
|
||||
severity: Severity::Low,
|
||||
tier: PatternTier::A,
|
||||
|
|
|
|||
|
|
@ -558,6 +558,41 @@ impl AnalysisState {
|
|||
}
|
||||
changed
|
||||
}
|
||||
SsaOp::Call {
|
||||
callee, receiver, ..
|
||||
} => {
|
||||
// Re-project the `Field(pt(receiver), ELEM)` cell for
|
||||
// container reads. `transfer_inst` (pass 1) projects this
|
||||
// from a snapshot of `pt(receiver)`, but the receiver's set
|
||||
// is often still empty or partial on that pass (e.g. the
|
||||
// container comes from a `FieldProj` that only resolves in
|
||||
// the fixpoint, or from a value unified by a later `Assign`).
|
||||
// Without re-projecting here, the promised `Field(_, ELEM)`
|
||||
// members never appear and per-element parent-field aliasing
|
||||
// through the read result is dropped. The fresh-alloc
|
||||
// fallback added in pass 1 is preserved (`union_in_place`
|
||||
// only adds members), so this strictly refines the set.
|
||||
let Some(rcv) = receiver else {
|
||||
return false;
|
||||
};
|
||||
if !is_container_read_callee(callee) || (rcv.0 as usize) >= self.parent.len() {
|
||||
return false;
|
||||
}
|
||||
let rcv_rep = self.find(rcv.0) as usize;
|
||||
let rcv_pt = self.pt[rcv_rep].clone();
|
||||
// Mirror the pass-1 guard: skip empty (nothing to project)
|
||||
// and Top (already maximally imprecise) receivers.
|
||||
if rcv_pt.is_empty() || rcv_pt.is_top() {
|
||||
return false;
|
||||
}
|
||||
let mut new_pt = PointsToSet::empty();
|
||||
for parent_loc in rcv_pt.iter() {
|
||||
let proj = self.interner.intern_field(parent_loc, FieldId::ELEM);
|
||||
new_pt.insert(proj);
|
||||
}
|
||||
let v_rep = self.find(v) as usize;
|
||||
self.pt[v_rep].union_in_place(&new_pt)
|
||||
}
|
||||
// No re-propagation needed for leaf ops.
|
||||
_ => false,
|
||||
}
|
||||
|
|
@ -1015,6 +1050,68 @@ mod tests {
|
|||
);
|
||||
}
|
||||
|
||||
/// Regression for the fixpoint re-projection of container reads.
|
||||
///
|
||||
/// Shape: `e := this.queue.shift()`. The receiver of `shift` is a
|
||||
/// `FieldProj` (`this.queue`) whose points-to set is *empty* on pass 1
|
||||
/// (field projections only resolve in the fixpoint). Before the fix,
|
||||
/// `propagate_inst` never re-evaluated `SsaOp::Call`, so the promised
|
||||
/// `Field(Field(SelfParam, "queue"), ELEM)` member was never projected
|
||||
/// into the result and only the fresh allocation remained. After the
|
||||
/// fix the nested ELEM cell must appear once the receiver converges.
|
||||
#[test]
|
||||
fn container_read_reprojects_elem_after_receiver_converges() {
|
||||
let mut b = BodyBuilder::new();
|
||||
// `this` is the self parameter.
|
||||
let this = b.fresh(Some("this"));
|
||||
b.emit(this, SsaOp::SelfParam, Some("this"));
|
||||
// `q := this.queue`, a field projection whose pt is empty in pass 1.
|
||||
let queue_field = b.intern_field("queue");
|
||||
let q = b.fresh(Some("q"));
|
||||
b.emit(
|
||||
q,
|
||||
SsaOp::FieldProj {
|
||||
receiver: this,
|
||||
field: queue_field,
|
||||
projected_type: None,
|
||||
},
|
||||
Some("q"),
|
||||
);
|
||||
// `e := q.shift()`, container read whose receiver (`q`) only
|
||||
// converges during the fixpoint.
|
||||
let e = b.fresh(Some("e"));
|
||||
b.emit(
|
||||
e,
|
||||
SsaOp::Call {
|
||||
callee: "shift".into(),
|
||||
callee_text: None,
|
||||
args: vec![],
|
||||
receiver: Some(q),
|
||||
},
|
||||
Some("e"),
|
||||
);
|
||||
|
||||
let body = b.build();
|
||||
let facts = analyse_body(&body, BodyId(0));
|
||||
let pt_e = facts.pt(e);
|
||||
// The result must now include a Field(_, ELEM) member projected
|
||||
// from the converged receiver set (not just the fresh alloc).
|
||||
let mut saw_elem = false;
|
||||
for loc in pt_e.iter() {
|
||||
if let crate::pointer::AbsLoc::Field { field, .. } = facts.interner.resolve(loc)
|
||||
&& *field == FieldId::ELEM
|
||||
{
|
||||
saw_elem = true;
|
||||
break;
|
||||
}
|
||||
}
|
||||
assert!(
|
||||
saw_elem,
|
||||
"container read with a FieldProj receiver must re-project \
|
||||
Field(_, ELEM) in the fixpoint; got {pt_e:?}"
|
||||
);
|
||||
}
|
||||
|
||||
/// `extract_field_points_to` records a field
|
||||
/// READ on the parameter index when a `FieldProj` traces back to
|
||||
/// an `AbsLoc::Param`.
|
||||
|
|
|
|||
|
|
@ -996,14 +996,35 @@ fn resolve_file_or_index(candidate: &Path) -> Option<PathBuf> {
|
|||
if candidate.is_file() {
|
||||
return Some(normalize_path(candidate));
|
||||
}
|
||||
for ext in RESOLVE_EXTENSIONS {
|
||||
let mut with_ext = candidate.to_path_buf();
|
||||
match with_ext.extension() {
|
||||
Some(_) => {}
|
||||
None => {
|
||||
with_ext.set_extension(ext);
|
||||
if with_ext.is_file() {
|
||||
return Some(normalize_path(&with_ext));
|
||||
// Node / TS resolution appends each candidate extension to the FULL
|
||||
// specifier, regardless of whether the last segment already contains a
|
||||
// dot. `Path::set_extension` *replaces* the trailing component after the
|
||||
// last `.`, so it is wrong here: it would turn `./user.service` into
|
||||
// `./user.ts` and never produce `./user.service.ts` (the dominant
|
||||
// Angular/NestJS `.service`/`.component`/`.module`/`.controller`
|
||||
// convention). Build the appended filename by hand instead.
|
||||
if let Some(name) = candidate.file_name().and_then(|n| n.to_str()) {
|
||||
for ext in RESOLVE_EXTENSIONS {
|
||||
let appended = candidate.with_file_name(format!("{name}.{ext}"));
|
||||
if appended.is_file() {
|
||||
return Some(normalize_path(&appended));
|
||||
}
|
||||
}
|
||||
}
|
||||
// TypeScript NodeNext / ESM idiom: an `import './x.js'` (or `.mjs` /
|
||||
// `.cjs`) frequently refers to `x.ts` / `x.mts` / `x.cts` on disk. When
|
||||
// the specifier carries a JS-family extension that did not resolve as a
|
||||
// file above, retry with each TS-family extension swapped in.
|
||||
if let Some(stem) = candidate.file_stem().and_then(|s| s.to_str()) {
|
||||
let has_js_ext = candidate
|
||||
.extension()
|
||||
.and_then(|e| e.to_str())
|
||||
.is_some_and(|e| matches!(e, "js" | "jsx" | "mjs" | "cjs"));
|
||||
if has_js_ext {
|
||||
for ext in ["ts", "tsx", "mts", "cts"] {
|
||||
let swapped = candidate.with_file_name(format!("{stem}.{ext}"));
|
||||
if swapped.is_file() {
|
||||
return Some(normalize_path(&swapped));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -1040,3 +1061,73 @@ fn normalize_path(p: &Path) -> PathBuf {
|
|||
fn canonicalize_or_owned(p: &Path) -> PathBuf {
|
||||
p.canonicalize().unwrap_or_else(|_| p.to_path_buf())
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod resolve_file_ext_tests {
|
||||
use super::resolve_file_or_index;
|
||||
use std::fs;
|
||||
|
||||
/// `import './user.service'` must resolve to `user.service.ts` on disk:
|
||||
/// the extension is APPENDED to the full specifier, not replacing the
|
||||
/// `.service` suffix (the Angular/NestJS convention).
|
||||
#[test]
|
||||
fn resolves_dotted_basename_by_appending_extension() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let target = dir.path().join("user.service.ts");
|
||||
fs::write(&target, "export const x = 1;").unwrap();
|
||||
|
||||
// The specifier as the importer would join it: no extension appended.
|
||||
let candidate = dir.path().join("user.service");
|
||||
let resolved = resolve_file_or_index(&candidate)
|
||||
.expect("'./user.service' should resolve to user.service.ts");
|
||||
assert!(
|
||||
resolved.ends_with("user.service.ts"),
|
||||
"expected user.service.ts, got {resolved:?}"
|
||||
);
|
||||
}
|
||||
|
||||
/// TypeScript NodeNext idiom: `import './x.js'` resolves to `x.ts` on disk.
|
||||
#[test]
|
||||
fn resolves_js_specifier_to_ts_file() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let target = dir.path().join("x.ts");
|
||||
fs::write(&target, "export const y = 2;").unwrap();
|
||||
|
||||
let candidate = dir.path().join("x.js");
|
||||
let resolved = resolve_file_or_index(&candidate)
|
||||
.expect("'./x.js' should resolve to x.ts under NodeNext semantics");
|
||||
assert!(
|
||||
resolved.ends_with("x.ts"),
|
||||
"expected x.ts, got {resolved:?}"
|
||||
);
|
||||
}
|
||||
|
||||
/// Extensionless specifiers still resolve (no regression).
|
||||
#[test]
|
||||
fn resolves_extensionless_specifier() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let target = dir.path().join("plain.ts");
|
||||
fs::write(&target, "export const z = 3;").unwrap();
|
||||
|
||||
let candidate = dir.path().join("plain");
|
||||
let resolved =
|
||||
resolve_file_or_index(&candidate).expect("'./plain' should resolve to plain.ts");
|
||||
assert!(resolved.ends_with("plain.ts"), "got {resolved:?}");
|
||||
}
|
||||
|
||||
/// A non-JS asset import (e.g. `./image.css`) with no matching source on
|
||||
/// disk must NOT spuriously resolve to an unrelated file.
|
||||
#[test]
|
||||
fn does_not_spuriously_resolve_unrelated_asset() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
// A same-stem .ts exists, but a `.css` import is not a JS-family
|
||||
// extension, so the NodeNext swap must not fire.
|
||||
fs::write(dir.path().join("image.ts"), "x").unwrap();
|
||||
|
||||
let candidate = dir.path().join("image.css");
|
||||
assert!(
|
||||
resolve_file_or_index(&candidate).is_none(),
|
||||
"a .css specifier must not resolve to image.ts"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
528
src/ssa/lower.rs
528
src/ssa/lower.rs
|
|
@ -121,6 +121,64 @@ fn try_lower_field_proj_chain(
|
|||
Some((current, method))
|
||||
}
|
||||
|
||||
/// Remove the *implicit* chain-root argument group that `build_call_args`
|
||||
/// appends when decomposing a chained-receiver call (`a.b.m(...)`) into a
|
||||
/// FieldProj chain plus a bare-method Call.
|
||||
///
|
||||
/// `build_call_args` adds any `info.taint.uses` entry not already in
|
||||
/// `info.call.arg_uses` (and not the receiver) as a trailing "implicit" arg
|
||||
/// group. For a chained callee `a.b.m`, the chain root `a` lands in that
|
||||
/// implicit group; once the FieldProj chain carries it on the typed `receiver`
|
||||
/// channel, re-listing it as a positional argument inflates arity and
|
||||
/// double-taints, so it must be dropped.
|
||||
///
|
||||
/// The previous implementation used `retain` to drop *every* group equal to
|
||||
/// `[base_v]`, which also deleted a **genuine** positional argument that
|
||||
/// happens to be the chain root identifier — e.g. the second `a` in
|
||||
/// `a.b.m(p, a)` or `this` in `this.logger.log(this)`. That silently lost the
|
||||
/// taint flowing through that argument into the callee and shifted later arg
|
||||
/// positions left.
|
||||
///
|
||||
/// This helper fixes that by:
|
||||
/// 1. Skipping the strip entirely when the chain root is a genuine
|
||||
/// positional argument (present in `arg_uses`): there is no implicit
|
||||
/// group to remove in that case, only the real argument.
|
||||
/// 2. Otherwise removing at most ONE matching group — the trailing implicit
|
||||
/// one appended by `build_call_args` — so a coincidental earlier
|
||||
/// `[base_v]` argument still survives.
|
||||
fn strip_implicit_chain_root(
|
||||
callee: &str,
|
||||
info: &crate::cfg::NodeInfo,
|
||||
var_stacks: &HashMap<String, Vec<SsaValue>>,
|
||||
args: &mut Vec<SmallVec<[SsaValue; 2]>>,
|
||||
) {
|
||||
let Some(base_ident) = callee.split('.').next() else {
|
||||
return;
|
||||
};
|
||||
let Some(&base_v) = var_stacks.get(base_ident).and_then(|s| s.last()) else {
|
||||
return;
|
||||
};
|
||||
// If the chain root is itself a genuine positional argument, the matching
|
||||
// `[base_v]` group is that real argument — `build_call_args` did NOT append
|
||||
// an implicit chain-root group (the root is already in `arg_uses`). Keep it.
|
||||
let root_is_real_arg = info
|
||||
.call
|
||||
.arg_uses
|
||||
.iter()
|
||||
.any(|grp| grp.iter().any(|ident| ident.as_str() == base_ident));
|
||||
if root_is_real_arg {
|
||||
return;
|
||||
}
|
||||
// Remove only the last matching group (the appended implicit chain-root
|
||||
// group), preserving any earlier coincidental `[base_v]` argument.
|
||||
if let Some(pos) = args
|
||||
.iter()
|
||||
.rposition(|grp| grp.len() == 1 && grp.first() == Some(&base_v))
|
||||
{
|
||||
args.remove(pos);
|
||||
}
|
||||
}
|
||||
|
||||
/// Lower a CFG to SSA form for a single function scope.
|
||||
///
|
||||
/// `scope` filters nodes by `enclosing_func`:
|
||||
|
|
@ -341,8 +399,9 @@ fn lower_to_ssa_inner(
|
|||
// listed as an exception target indicates a CFG construction bug. Debug
|
||||
// builds panic loudly; release builds warn, record an engine note so
|
||||
// downstream findings carry "SSA lowering bailed" provenance, and fall
|
||||
// through to the existing orphan handling above (the "all definitions"
|
||||
// fallback) which remains sound for taint reachability.
|
||||
// through to the existing orphan handling above, which lowers orphan
|
||||
// (catch) subtrees with a locally-built dominator tree and seeds them from
|
||||
// the most entry-dominating reaching definitions.
|
||||
check_catch_block_reachability_gated(&body);
|
||||
|
||||
Ok(body)
|
||||
|
|
@ -352,9 +411,11 @@ fn lower_to_ssa_inner(
|
|||
/// debug builds and warns + records an engine note in release builds.
|
||||
///
|
||||
/// The current lowering's orphan handling (`process_block` fallback in
|
||||
/// `rename_variables`) already widens to an "all definitions" conservative
|
||||
/// state for blocks without predecessors. That preserves soundness for
|
||||
/// taint reachability but masks CFG-builder bugs: this gate surfaces them.
|
||||
/// `rename_variables`) lowers orphan (catch) subtrees via a locally-built
|
||||
/// dominator tree and seeds them from the most entry-dominating reaching
|
||||
/// definitions, so blocks without predecessors are still renamed and
|
||||
/// populated. That keeps catch-block analysis usable but masks CFG-builder
|
||||
/// bugs: this gate surfaces them.
|
||||
fn check_catch_block_reachability_gated(body: &SsaBody) {
|
||||
let result = super::invariants::check_catch_block_reachability(body);
|
||||
if let Err(err) = result {
|
||||
|
|
@ -923,6 +984,100 @@ fn build_dom_tree_children(
|
|||
children
|
||||
}
|
||||
|
||||
/// Blocks reachable from the entry block (block 0) over the block-level
|
||||
/// successor graph. Blocks NOT in this set are "orphan domain" — catch blocks
|
||||
/// (and everything they dominate) that became unreachable once exception edges
|
||||
/// were stripped before SSA lowering.
|
||||
fn compute_entry_reachable(num_blocks: usize, block_succs: &[Vec<usize>]) -> Vec<bool> {
|
||||
let mut reachable = vec![false; num_blocks];
|
||||
if num_blocks == 0 {
|
||||
return reachable;
|
||||
}
|
||||
let mut stack = vec![0usize];
|
||||
reachable[0] = true;
|
||||
while let Some(b) = stack.pop() {
|
||||
for &s in &block_succs[b] {
|
||||
if !reachable[s] {
|
||||
reachable[s] = true;
|
||||
stack.push(s);
|
||||
}
|
||||
}
|
||||
}
|
||||
reachable
|
||||
}
|
||||
|
||||
/// Build a dominator-tree children list restricted to the orphan domain.
|
||||
///
|
||||
/// The main `simple_fast` walk is rooted at block 0 and never reaches orphan
|
||||
/// blocks (catch entries + their internal subtree), so they have no entry in
|
||||
/// the global dominator tree and their internal successors are never renamed.
|
||||
/// Here we construct a fresh block graph containing a virtual super-root that
|
||||
/// reaches every orphan *entry* (an orphan-domain block with no predecessors),
|
||||
/// plus the orphan-domain blocks and the edges among them. Edges that leave
|
||||
/// the orphan domain (e.g. a catch arm flowing into the entry-reachable
|
||||
/// post-`try` join) are dropped: those target blocks were already processed by
|
||||
/// the main walk and act as boundaries. Running `simple_fast` over this graph
|
||||
/// yields a dominator tree whose `children[entry]` lists the catch's internal
|
||||
/// dominator-tree successors, letting `process_block` recurse through the whole
|
||||
/// handler body.
|
||||
///
|
||||
/// Returned vector is indexed by real block id (length `num_blocks`); entries
|
||||
/// for entry-reachable blocks stay empty.
|
||||
fn build_orphan_dom_tree_children(
|
||||
num_blocks: usize,
|
||||
block_succs: &[Vec<usize>],
|
||||
block_preds: &[Vec<usize>],
|
||||
entry_reachable: &[bool],
|
||||
) -> Vec<Vec<usize>> {
|
||||
let mut children: Vec<Vec<usize>> = vec![vec![]; num_blocks];
|
||||
|
||||
// Collect orphan-domain blocks (unreachable from entry).
|
||||
let orphan_blocks: Vec<usize> = (0..num_blocks).filter(|&b| !entry_reachable[b]).collect();
|
||||
if orphan_blocks.is_empty() {
|
||||
return children;
|
||||
}
|
||||
|
||||
// Build a graph: virtual root node + one node per orphan-domain block.
|
||||
// Node weight stores the real block id; the virtual root uses a sentinel
|
||||
// (u32::MAX) that maps to no real block.
|
||||
let mut g: Graph<u32, ()> = Graph::new();
|
||||
let root = g.add_node(u32::MAX);
|
||||
let mut node_of_block: HashMap<usize, NodeIndex> = HashMap::new();
|
||||
for &b in &orphan_blocks {
|
||||
node_of_block.insert(b, g.add_node(b as u32));
|
||||
}
|
||||
|
||||
for &b in &orphan_blocks {
|
||||
let bn = node_of_block[&b];
|
||||
// Orphan entries (no real predecessors) hang off the virtual root.
|
||||
if block_preds[b].is_empty() {
|
||||
g.add_edge(root, bn, ());
|
||||
}
|
||||
// Intra-orphan-domain edges. Targets that escape the orphan domain
|
||||
// are boundaries (already processed) and are not added.
|
||||
for &s in &block_succs[b] {
|
||||
if let Some(&sn) = node_of_block.get(&s) {
|
||||
g.add_edge(bn, sn, ());
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let doms = simple_fast(&g, root);
|
||||
for &b in &orphan_blocks {
|
||||
let bn = node_of_block[&b];
|
||||
if let Some(idom) = doms.immediate_dominator(bn) {
|
||||
// Skip blocks whose immediate dominator is the virtual root: these
|
||||
// are the orphan entries, processed directly by the caller's loop.
|
||||
let idom_w = g[idom];
|
||||
if idom_w != u32::MAX {
|
||||
children[idom_w as usize].push(b);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
children
|
||||
}
|
||||
|
||||
/// Rename variables: dominator tree preorder walk with per-variable stacks.
|
||||
///
|
||||
/// Returns (ssa_blocks, value_defs, cfg_node_map).
|
||||
|
|
@ -1161,16 +1316,22 @@ fn rename_variables(
|
|||
) {
|
||||
Some((recv_v, bare_method)) => {
|
||||
receiver = Some(recv_v);
|
||||
// Strip any positional arg group that exactly matches the
|
||||
// chain root identifier, it has been replaced by the
|
||||
// FieldProj chain receiver, and re-listing it as an
|
||||
// Strip the *implicit* chain-root arg group that
|
||||
// `build_call_args` appends for `taint.uses` not present
|
||||
// in `arg_uses`: the chain root has been replaced by the
|
||||
// FieldProj chain receiver, so re-listing it as an
|
||||
// argument would inflate arity / double-taint.
|
||||
if let Some(base_ident) = callee.split('.').next() {
|
||||
if let Some(base_v) = var_stacks.get(base_ident).and_then(|s| s.last())
|
||||
{
|
||||
args.retain(|grp| !(grp.len() == 1 && grp.first() == Some(base_v)));
|
||||
}
|
||||
}
|
||||
//
|
||||
// Only do this when the chain root is NOT a genuine
|
||||
// positional argument. For `a.b.m(p, a)` the root `a`
|
||||
// is a real second argument (present in `arg_uses`); the
|
||||
// `[a]` group is that argument, not the implicit
|
||||
// chain-root group, and must be kept — otherwise taint
|
||||
// flowing through it into the callee is silently lost.
|
||||
// Also remove at most one matching group (the appended
|
||||
// implicit one) so a coincidental earlier `[base_v]`
|
||||
// positional arg survives.
|
||||
strip_implicit_chain_root(&callee, info, var_stacks, &mut args);
|
||||
(bare_method, Some(callee.clone()))
|
||||
}
|
||||
None => (callee, None),
|
||||
|
|
@ -1306,12 +1467,10 @@ fn rename_variables(
|
|||
) {
|
||||
Some((recv_v, bare_method)) => {
|
||||
receiver = Some(recv_v);
|
||||
if let Some(base_ident) = callee.split('.').next() {
|
||||
if let Some(base_v) = var_stacks.get(base_ident).and_then(|s| s.last())
|
||||
{
|
||||
args.retain(|grp| !(grp.len() == 1 && grp.first() == Some(base_v)));
|
||||
}
|
||||
}
|
||||
// Same implicit-chain-root strip as the primary Call
|
||||
// branch above; keep a genuine positional arg equal to
|
||||
// the chain root. See `strip_implicit_chain_root`.
|
||||
strip_implicit_chain_root(&callee, info, var_stacks, &mut args);
|
||||
(bare_method, Some(callee.clone()))
|
||||
}
|
||||
None => (callee, None),
|
||||
|
|
@ -2179,24 +2338,71 @@ fn rename_variables(
|
|||
// Process orphan blocks (e.g. catch blocks disconnected after exception edge removal).
|
||||
// These blocks have no predecessors and weren't reached by the dominator tree walk.
|
||||
//
|
||||
// Rebuild var_stacks from already-processed instructions so that catch blocks
|
||||
// can reference variables defined before the try block (e.g. `userInput`).
|
||||
let has_orphans =
|
||||
(1..num_blocks).any(|bid| block_preds[bid].is_empty() && ssa_blocks[bid].body.is_empty());
|
||||
// An "orphan domain" is the set of blocks unreachable from the entry block
|
||||
// after exception-edge stripping — the catch entry plus everything it
|
||||
// dominates (internal `if`/`for`/`while`/`switch`/`try` blocks). The main
|
||||
// `simple_fast` dominator walk is rooted at block 0 and never reaches these
|
||||
// blocks, so their `immediate_dominator` is `None` and
|
||||
// `build_dom_tree_children` links neither the catch entry nor its internal
|
||||
// subtree. Processing only the catch *entry* (which has empty preds) via
|
||||
// the global `dom_tree_children` therefore leaves every catch-internal
|
||||
// block body empty — dropping all Source/Sink/Assign instructions past the
|
||||
// catch's first basic block (a soundness false-negative). To fix this we
|
||||
// build a *local* dominator tree over the orphan domain (rooted at a
|
||||
// virtual super-root that reaches every orphan entry) and recurse through
|
||||
// it, so the whole catch subtree is renamed and populated.
|
||||
let entry_reachable = compute_entry_reachable(num_blocks, block_succs);
|
||||
let has_orphans = (1..num_blocks).any(|bid| !entry_reachable[bid]);
|
||||
if has_orphans {
|
||||
// Rebuild var_stacks from all SSA instructions created during the main walk.
|
||||
// This gives orphan blocks access to all variable definitions.
|
||||
// Seed var_stacks with the definitions that actually *reach* the catch.
|
||||
//
|
||||
// The previous implementation rebuilt var_stacks from *all* blocks in
|
||||
// ascending block-id order and took `.last()`, i.e. the highest-block-id
|
||||
// def. For a variable defined before the `try` (block 0) and re-killed
|
||||
// *after* the `try`/`catch` join (a later, higher-id block — e.g.
|
||||
// `x = "safe"` post-catch), `.last()` picked that post-join kill, which
|
||||
// does NOT reach the handler. A catch-side `sink(x)` then resolved to
|
||||
// the killed constant instead of the pre-`try` (possibly tainted) value
|
||||
// — a false negative. (The doc comment further up claiming this is a
|
||||
// sound "all definitions" widening was inaccurate.)
|
||||
//
|
||||
// We restrict the rebuild to *entry-reachable* blocks (orphan-domain
|
||||
// blocks are renamed below, not used as a seed) and iterate them in
|
||||
// DESCENDING block-id order so `.last()` lands on the LOWEST-id — i.e.
|
||||
// most entry-dominating — definition of each variable. Block 0 (which
|
||||
// dominates every block, including the catch) therefore wins over any
|
||||
// post-join reassignment, fixing the reported false negative, while a
|
||||
// variable defined only inside the `try` body still keeps a `try`-side
|
||||
// definition (rather than being dropped), avoiding a new false negative.
|
||||
// Exact reaching-definition resolution for variables redefined along the
|
||||
// protected region would require the pre-strip exception-edge structure,
|
||||
// which isn't available here; this ordering is the conservative
|
||||
// approximation.
|
||||
var_stacks.clear();
|
||||
for block in &ssa_blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
for bid in (0..num_blocks).rev() {
|
||||
if !entry_reachable[bid] {
|
||||
continue;
|
||||
}
|
||||
for inst in ssa_blocks[bid]
|
||||
.phis
|
||||
.iter()
|
||||
.chain(ssa_blocks[bid].body.iter())
|
||||
{
|
||||
if let Some(ref name) = inst.var_name {
|
||||
var_stacks.entry(name.clone()).or_default().push(inst.value);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Build a local dominator tree over the orphan domain so the catch
|
||||
// entry's internal successors are reached during renaming.
|
||||
let orphan_dom_children =
|
||||
build_orphan_dom_tree_children(num_blocks, block_succs, block_preds, &entry_reachable);
|
||||
|
||||
for bid in 1..num_blocks {
|
||||
if block_preds[bid].is_empty() && ssa_blocks[bid].body.is_empty() {
|
||||
// Orphan *entries* are orphan-domain blocks with no predecessors
|
||||
// (their only inbound edges were exception edges, now stripped).
|
||||
if !entry_reachable[bid] && block_preds[bid].is_empty() {
|
||||
process_block(
|
||||
bid,
|
||||
cfg,
|
||||
|
|
@ -2204,7 +2410,7 @@ fn rename_variables(
|
|||
block_succs,
|
||||
block_preds,
|
||||
phi_placements,
|
||||
dom_tree_children,
|
||||
&orphan_dom_children,
|
||||
filtered_edges,
|
||||
&mut var_stacks,
|
||||
&mut ssa_blocks,
|
||||
|
|
@ -2780,6 +2986,208 @@ mod tests {
|
|||
assert!(!ssa.blocks.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn orphan_catch_with_internal_branch_populates_subtree() {
|
||||
// Regression for the soundness hole where a catch block containing
|
||||
// internal control flow (`catch(e){ if(cond){ sink(x) } else { y=2 } }`)
|
||||
// dropped every instruction past the catch's first basic block: the
|
||||
// catch-internal arms were orphan-domain (unreachable from entry) AND
|
||||
// absent from the global dominator tree, so their bodies stayed empty
|
||||
// and `sink(x)` was silently lost.
|
||||
//
|
||||
// Layout:
|
||||
// entry → body(defines x) → exit (normal flow)
|
||||
// body --Exception--> catch(e) → if → [True: sink(x)] [False: y=2] → join → exit
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
let body = cfg.add_node(NodeInfo {
|
||||
taint: TaintMeta {
|
||||
defines: Some("x".into()),
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
});
|
||||
let catch = cfg.add_node(NodeInfo {
|
||||
catch_param: true,
|
||||
taint: TaintMeta {
|
||||
defines: Some("e".into()),
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
});
|
||||
let if_node = cfg.add_node(make_node(StmtKind::If));
|
||||
let sink = cfg.add_node(NodeInfo {
|
||||
call: crate::cfg::CallMeta {
|
||||
callee: Some("sink".into()),
|
||||
arg_uses: vec![vec!["x".into()]],
|
||||
..Default::default()
|
||||
},
|
||||
taint: TaintMeta {
|
||||
uses: vec!["x".into()],
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
});
|
||||
let else_assign = cfg.add_node(NodeInfo {
|
||||
taint: TaintMeta {
|
||||
defines: Some("y".into()),
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
});
|
||||
let join = cfg.add_node(make_node(StmtKind::Seq));
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
|
||||
cfg.add_edge(entry, body, EdgeKind::Seq);
|
||||
cfg.add_edge(body, exit, EdgeKind::Seq);
|
||||
cfg.add_edge(body, catch, EdgeKind::Exception);
|
||||
cfg.add_edge(catch, if_node, EdgeKind::Seq);
|
||||
cfg.add_edge(if_node, sink, EdgeKind::True);
|
||||
cfg.add_edge(if_node, else_assign, EdgeKind::False);
|
||||
cfg.add_edge(sink, join, EdgeKind::Seq);
|
||||
cfg.add_edge(else_assign, join, EdgeKind::Seq);
|
||||
cfg.add_edge(join, exit, EdgeKind::Seq);
|
||||
|
||||
let ssa = lower_to_ssa(&cfg, entry, None, true).unwrap();
|
||||
|
||||
// The `sink` Call must survive lowering: before the fix the
|
||||
// catch-internal True arm was an empty `unreachable` block.
|
||||
let has_sink_call = ssa.blocks.iter().any(|b| {
|
||||
b.body
|
||||
.iter()
|
||||
.any(|inst| matches!(&inst.op, SsaOp::Call { callee, .. } if callee == "sink"))
|
||||
});
|
||||
assert!(
|
||||
has_sink_call,
|
||||
"catch-internal sink(x) call was dropped; orphan subtree not populated"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn orphan_catch_resolves_pre_try_def_not_post_join_kill() {
|
||||
// Regression for the reaching-definition hole where the orphan
|
||||
// var_stacks rebuild iterated all blocks in ASCENDING block-id order
|
||||
// and took `.last()`, picking the highest-block-id def of each variable
|
||||
// instead of the one that actually reaches the catch. For `x` sourced
|
||||
// before the try (block 0) and re-killed to a constant AFTER the
|
||||
// try/catch join (a later, higher-id block), `.last()` wrongly resolved
|
||||
// a catch-side `sink(x)` to the post-join constant — masking the
|
||||
// pre-try tainted value (a false negative).
|
||||
//
|
||||
// Layout (mirrors `x = source(); try{..}catch(e){ sink(x) }; x="safe"`,
|
||||
// where the post-try kill lands at a control-flow join in a later
|
||||
// block than the pre-try source):
|
||||
// entry → src(defines x = Source) → {b1, b2} → kill(x="safe") → exit
|
||||
// src --Exception--> catch(e) → sink(x) (orphan)
|
||||
// `src` branches (b1/b2) so it ends block 0; the `kill` join is a
|
||||
// higher-id block. The buggy ascending+`.last()` rebuild seeded the
|
||||
// catch with the Const kill; the fixed descending rebuild seeds the
|
||||
// most entry-dominating def (the block-0 Source). A degenerate
|
||||
// straight-line `src→kill` (single coalesced block) is a separate,
|
||||
// documented limitation — exact within-block reaching defs need the
|
||||
// pre-strip exception-point structure, unavailable post-lowering.
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
// Pure source (no callee) so it lowers to `SsaOp::Source` and defines x.
|
||||
let src = cfg.add_node(NodeInfo {
|
||||
taint: TaintMeta {
|
||||
labels: smallvec::smallvec![crate::labels::DataLabel::Source(
|
||||
crate::labels::Cap::all()
|
||||
)],
|
||||
defines: Some("x".into()),
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
});
|
||||
// Diamond arms so `src` is a branch point (block boundary) and the
|
||||
// `kill` join below starts a strictly-higher-id block.
|
||||
let b1 = cfg.add_node(make_node(StmtKind::Seq));
|
||||
let b2 = cfg.add_node(make_node(StmtKind::Seq));
|
||||
// Post-join kill: defines x from a literal, no uses, not a source →
|
||||
// lowers to `SsaOp::Const`. Higher block id than `src`.
|
||||
let kill = cfg.add_node(NodeInfo {
|
||||
taint: TaintMeta {
|
||||
defines: Some("x".into()),
|
||||
const_text: Some("safe".into()),
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
});
|
||||
let catch = cfg.add_node(NodeInfo {
|
||||
catch_param: true,
|
||||
taint: TaintMeta {
|
||||
defines: Some("e".into()),
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
});
|
||||
let sink = cfg.add_node(NodeInfo {
|
||||
call: crate::cfg::CallMeta {
|
||||
callee: Some("sink".into()),
|
||||
arg_uses: vec![vec!["x".into()]],
|
||||
..Default::default()
|
||||
},
|
||||
taint: TaintMeta {
|
||||
uses: vec!["x".into()],
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
});
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
|
||||
cfg.add_edge(entry, src, EdgeKind::Seq);
|
||||
cfg.add_edge(src, b1, EdgeKind::Seq);
|
||||
cfg.add_edge(src, b2, EdgeKind::Seq);
|
||||
cfg.add_edge(b1, kill, EdgeKind::Seq);
|
||||
cfg.add_edge(b2, kill, EdgeKind::Seq);
|
||||
cfg.add_edge(kill, exit, EdgeKind::Seq);
|
||||
cfg.add_edge(src, catch, EdgeKind::Exception);
|
||||
cfg.add_edge(catch, sink, EdgeKind::Seq);
|
||||
cfg.add_edge(sink, exit, EdgeKind::Seq);
|
||||
|
||||
let ssa = lower_to_ssa(&cfg, entry, None, true).unwrap();
|
||||
|
||||
// SsaValue of the Source def (the pre-try value that reaches the catch)
|
||||
// and of the Const kill (the post-join value that does NOT reach it).
|
||||
let source_v = ssa
|
||||
.blocks
|
||||
.iter()
|
||||
.flat_map(|b| b.body.iter())
|
||||
.find(|inst| matches!(inst.op, SsaOp::Source))
|
||||
.map(|inst| inst.value)
|
||||
.expect("Source def must be present");
|
||||
let kill_v = ssa
|
||||
.blocks
|
||||
.iter()
|
||||
.flat_map(|b| b.body.iter())
|
||||
.find(|inst| matches!(inst.op, SsaOp::Const(_)))
|
||||
.map(|inst| inst.value)
|
||||
.expect("Const kill must be present");
|
||||
|
||||
// Find the sink Call and its single argument value.
|
||||
let sink_arg = ssa
|
||||
.blocks
|
||||
.iter()
|
||||
.flat_map(|b| b.body.iter())
|
||||
.find_map(|inst| match &inst.op {
|
||||
SsaOp::Call { callee, args, .. } if callee == "sink" => {
|
||||
args.iter().flat_map(|g| g.iter().copied()).next()
|
||||
}
|
||||
_ => None,
|
||||
})
|
||||
.expect("sink(x) Call with an argument must be present");
|
||||
|
||||
assert_eq!(
|
||||
sink_arg, source_v,
|
||||
"catch sink(x) must resolve to the pre-try Source def {source_v:?} \
|
||||
that reaches the handler, not the post-join Const kill {kill_v:?}"
|
||||
);
|
||||
assert_ne!(
|
||||
sink_arg, kill_v,
|
||||
"catch sink(x) wrongly resolved to the post-join Const kill"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn phi_operand_count_equals_pred_count_in_diamond() {
|
||||
// Specific test: phi operands == predecessor count (not just <=)
|
||||
|
|
@ -3988,6 +4396,66 @@ mod tests {
|
|||
);
|
||||
}
|
||||
|
||||
/// Collect the arg groups of the first Call whose bare callee matches.
|
||||
fn call_args_for(body: &SsaBody, bare_callee: &str) -> Option<Vec<SmallVec<[SsaValue; 2]>>> {
|
||||
for blk in &body.blocks {
|
||||
for inst in blk.body.iter() {
|
||||
if let SsaOp::Call { callee, args, .. } = &inst.op {
|
||||
if callee == bare_callee {
|
||||
return Some(args.clone());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn phase2_e2e_chain_root_genuine_positional_arg_is_kept() {
|
||||
// Regression for the soundness hole where decomposing a chained-receiver
|
||||
// call stripped a *genuine* positional argument equal to the chain root.
|
||||
// For `a.b.m(p, a)` the second `a` is a real argument, not the implicit
|
||||
// chain-root group `build_call_args` appends, so it must survive
|
||||
// decomposition. Previously `retain(... == [base_v])` deleted every
|
||||
// `[a]` group, losing the argument (and shifting later positions left).
|
||||
//
|
||||
// Go: `a` and `p` are parameters so the chain root resolves.
|
||||
let with_arg = parse_to_first_body(
|
||||
b"package p\nfunc f(a *T, p string) { a.b.m(p, a) }\n",
|
||||
"go",
|
||||
tree_sitter::Language::from(tree_sitter_go::LANGUAGE),
|
||||
"with.go",
|
||||
);
|
||||
let control = parse_to_first_body(
|
||||
b"package p\nfunc f(a *T, p string) { a.b.m(p) }\n",
|
||||
"go",
|
||||
tree_sitter::Language::from(tree_sitter_go::LANGUAGE),
|
||||
"ctrl.go",
|
||||
);
|
||||
|
||||
let with_args = call_args_for(&with_arg, "m").expect("decomposed m() call present");
|
||||
let ctrl_args = call_args_for(&control, "m").expect("decomposed m() call present");
|
||||
|
||||
// The soundness invariant: adding a genuine positional argument `a`
|
||||
// (which equals the chain root) must add EXACTLY one arg group versus
|
||||
// the control. The buggy `retain(... == [base_v])` deleted the genuine
|
||||
// `a` group, so `with` would have had the SAME count as `control` (the
|
||||
// argument silently vanished). The fix's `root_is_real_arg` guard keeps
|
||||
// it, so `with` has one more group than `control`.
|
||||
//
|
||||
// We compare the two counts rather than asserting an absolute value:
|
||||
// for Go's multi-segment field receiver the implicit chain-root group
|
||||
// `build_call_args` appends is a multi-value group (`[a, a.b, ...]`),
|
||||
// not a clean single `[base_v]`, so the absolute group count is an
|
||||
// implementation detail — the *delta* is the property under test.
|
||||
assert_eq!(
|
||||
with_args.len(),
|
||||
ctrl_args.len() + 1,
|
||||
"a.b.m(p, a) must keep the genuine `a` argument: expected one more \
|
||||
arg group than the control a.b.m(p).\n with: {with_args:?}\n ctrl: {ctrl_args:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn phase2_e2e_python_chained_receiver_emits_field_proj() {
|
||||
// Python: `obj.client.session.send(p)`, 3-segment receiver.
|
||||
|
|
|
|||
|
|
@ -1339,7 +1339,16 @@ pub fn is_int_producing_callee(callee: &str) -> bool {
|
|||
// Peel trailing identity methods (e.g. `.unwrap()`/`.expect("...")` after
|
||||
// `.parse()`) so the underlying numeric-producing verb is exposed.
|
||||
let base = peel_identity_suffix(callee);
|
||||
let suffix = base.rsplit(['.', ':']).next().unwrap_or(&base);
|
||||
// `peel_identity_suffix` normalizes a turbofish callee by truncating at
|
||||
// `<`, which leaves a trailing `::` (e.g. `s.parse::<u32>()` →
|
||||
// `s.parse::`). Trim trailing path separators so the method suffix is
|
||||
// recovered (`parse`) instead of an empty segment.
|
||||
// Trim the trailing path separators a turbofish callee leaves behind:
|
||||
// `peel_identity_suffix` truncates at `<`, so `s.parse::<u32>()` →
|
||||
// `s.parse::`. Trimming recovers the method suffix (`parse`) instead of
|
||||
// an empty segment.
|
||||
let trimmed = base.trim_end_matches([':', '.']);
|
||||
let suffix = trimmed.rsplit(['.', ':']).next().unwrap_or(trimmed);
|
||||
matches!(
|
||||
suffix,
|
||||
"parseInt" | "parseFloat" | "Number" // JS/TS
|
||||
|
|
@ -1348,8 +1357,19 @@ pub fn is_int_producing_callee(callee: &str) -> bool {
|
|||
| "Atoi" | "ParseInt" | "ParseFloat" // Go
|
||||
| "intval" | "floatval" // PHP
|
||||
| "to_i" | "to_f" // Ruby
|
||||
| "parse" // Rust: `.parse::<N>()` / `.parse().unwrap()`, conservative
|
||||
// (most Rust .parse() calls target numeric types)
|
||||
| "parse" // Rust: `.parse::<N>()` / `let n: u32 = s.parse()?` ,
|
||||
// conservative (most `str::parse` targets are numeric).
|
||||
//
|
||||
// KNOWN LIMITATION: the callee text alone cannot
|
||||
// distinguish `parse::<u32>()` from `parse::<PathBuf>()`
|
||||
// / `let p: PathBuf = s.parse()?`. Tagging every `parse`
|
||||
// as `Int` keeps the tested precision behaviour
|
||||
// (`type_facts_suppress_int_typed_shell_arg`: a u16 port
|
||||
// into a shell arg is suppressed) at the cost of a rare
|
||||
// false negative on `let p: PathBuf = s.parse()?;
|
||||
// fs::read(p)`. Distinguishing the two soundly requires
|
||||
// reading the let-binding type annotation into the type
|
||||
// lattice (tracked as a deep-fix; NOT inferable here).
|
||||
)
|
||||
}
|
||||
|
||||
|
|
@ -1372,14 +1392,13 @@ pub fn is_int_producing_callee(callee: &str) -> bool {
|
|||
/// `Byte.toString` / `Boolean.toString` / `Character.toString` ,
|
||||
/// output is `[+-]?\d+(\.\d+)?` / `"true"` / `"false"` / `"NaN"` /
|
||||
/// `"Infinity"`, none of which can carry CRLF or injection metachars.
|
||||
/// * `String.valueOf` static factories , most overloads (`int`,
|
||||
/// `long`, `boolean`, `char`, ...) emit the same digit / boolean /
|
||||
/// single-character text as their per-class `toString`. The
|
||||
/// `Object` overload falls back to `Object.toString()` whose output
|
||||
/// shape depends on the runtime type, but the dominant safe usage
|
||||
/// shape (`String.valueOf(payload.size())`,
|
||||
/// `String.valueOf(rendered.length())`) covers the common
|
||||
/// header-injection mitigation pattern.
|
||||
/// * `String.valueOf` is deliberately **not** covered. Its `String` /
|
||||
/// `Object` overloads return the argument verbatim (or its arbitrary
|
||||
/// `toString()`), so `String.valueOf(req.getParameter("x"))` is an
|
||||
/// identity passthrough — recognising it as a safe-string producer
|
||||
/// silently suppressed real injections. The callee text alone cannot
|
||||
/// distinguish the safe numeric overload from the identity form, so the
|
||||
/// whole family is left unrecognised (soundness over precision).
|
||||
/// * `Class.getName` / `Class.getSimpleName` / `Class.getCanonicalName`
|
||||
/// , the JVM class-name grammar disallows CRLF, quotes, slashes,
|
||||
/// spaces, and shell metacharacters; the dot-separated FQCN is safe
|
||||
|
|
@ -1402,7 +1421,21 @@ pub fn is_safe_string_producing_callee(callee: &str) -> bool {
|
|||
| "Character",
|
||||
"toString",
|
||||
) => return true,
|
||||
("String", "valueOf") => return true,
|
||||
// `String.valueOf` is NOT safe-by-construction: while the
|
||||
// primitive overloads (`valueOf(int)` / `valueOf(boolean)` /
|
||||
// `valueOf(char)`) emit digit / boolean / single-char text,
|
||||
// `valueOf(String s)` returns `s` *verbatim* and the
|
||||
// `valueOf(Object o)` overload returns `o.toString()` whose
|
||||
// shape is arbitrary. The callee text (`String.valueOf`)
|
||||
// carries no argument-type signal, so we cannot tell the safe
|
||||
// numeric form from the identity form here. Tagging it as a
|
||||
// safe-string producer suppressed real injections like
|
||||
// `stmt.execute(String.valueOf(req.getParameter("x")))`
|
||||
// (the suppression consumer `apply_arg_type_safe_suppression`
|
||||
// drops the tainted arg at the sink). Erring toward soundness:
|
||||
// `String.valueOf` is left unrecognised, so the numeric
|
||||
// mitigation shape (`String.valueOf(payload.size())`) is no
|
||||
// longer auto-suppressed but no injection is silently dropped.
|
||||
("Class", "getName" | "getSimpleName" | "getCanonicalName") => return true,
|
||||
_ => {}
|
||||
}
|
||||
|
|
@ -1552,7 +1585,21 @@ pub fn analyze_types_with_param_types(
|
|||
}
|
||||
SsaOp::SelfParam => TypeFact::from_kind(TypeKind::Object),
|
||||
SsaOp::CatchParam => TypeFact::from_kind(TypeKind::Object),
|
||||
SsaOp::Call { callee, args, .. } => {
|
||||
SsaOp::Call {
|
||||
callee,
|
||||
callee_text,
|
||||
args,
|
||||
..
|
||||
} => {
|
||||
// For the Rust `.parse()` numeric-turbofish gate (see
|
||||
// `is_int_producing_callee`) the discriminating
|
||||
// `::<T>` lives in the *original* callee text, which SSA
|
||||
// moves into `callee_text` when it decomposes a chained
|
||||
// receiver (`x.parse::<u32>()` → `callee = "parse"`,
|
||||
// `callee_text = Some("x.parse::<u32>")`). Prefer the
|
||||
// full text so the turbofish survives; `callee` is the
|
||||
// canonical fallback when no decomposition occurred.
|
||||
let callee_for_int = callee_text.as_deref().unwrap_or(callee);
|
||||
// CFG marks `Object.create(null)` (and future
|
||||
// null-prototype constructors) at lowering time.
|
||||
// Honour it ahead of generic constructor / arg-aware
|
||||
|
|
@ -1588,7 +1635,7 @@ pub fn analyze_types_with_param_types(
|
|||
lang.and_then(|l| arg_aware_call_type(l, callee, args, consts))
|
||||
{
|
||||
TypeFact::from_kind(ty)
|
||||
} else if is_int_producing_callee(callee) {
|
||||
} else if is_int_producing_callee(callee_for_int) {
|
||||
TypeFact::from_kind(TypeKind::Int)
|
||||
} else if is_safe_string_producing_callee(callee) {
|
||||
// Numeric/boolean to-string converters and class-name
|
||||
|
|
@ -3439,4 +3486,62 @@ mod tests {
|
|||
"createQuery is overloaded — must not map at constructor_type level"
|
||||
);
|
||||
}
|
||||
|
||||
/// Audit #3: `String.valueOf` is an identity passthrough for its
|
||||
/// `String`/`Object` overloads, so it must NOT be recognised as a
|
||||
/// safe-string producer (which would suppress real injections like
|
||||
/// `stmt.execute(String.valueOf(req.getParameter("x")))`). The
|
||||
/// genuinely-safe numeric to-string converters and class-name
|
||||
/// accessors are still recognised.
|
||||
#[test]
|
||||
fn string_valueof_not_safe_string_producer() {
|
||||
// The defect: must be false now (was unconditionally true).
|
||||
assert!(
|
||||
!is_safe_string_producing_callee("String.valueOf"),
|
||||
"String.valueOf is identity for String/Object overloads — unsound to treat as safe"
|
||||
);
|
||||
assert!(
|
||||
!is_safe_string_producing_callee("java.lang.String.valueOf"),
|
||||
"fully-qualified String.valueOf must also be unrecognised"
|
||||
);
|
||||
// Regression guard: the genuinely safe converters/accessors remain.
|
||||
assert!(is_safe_string_producing_callee("Integer.toString"));
|
||||
assert!(is_safe_string_producing_callee("Long.toString"));
|
||||
assert!(is_safe_string_producing_callee("Boolean.toString"));
|
||||
assert!(is_safe_string_producing_callee("Class.getName"));
|
||||
assert!(is_safe_string_producing_callee("loaded.getClass().getName"));
|
||||
}
|
||||
|
||||
/// Audit #4: bare Rust `.parse()` is generic over `FromStr`, but the
|
||||
/// callee text cannot distinguish `parse::<u32>()` from
|
||||
/// `parse::<PathBuf>()`. The engine deliberately tags every `parse` as
|
||||
/// `Int` (precision: a u16 port into a shell arg is suppressed — pinned by
|
||||
/// `cfg_analysis::tests::type_facts_suppress_int_typed_shell_arg`), at the
|
||||
/// cost of a rare FN on `let p: PathBuf = s.parse()?`. Distinguishing the
|
||||
/// two soundly needs let-annotation typing (deep-fix queue), not the
|
||||
/// callee text. This test pins the chosen behaviour + cross-language
|
||||
/// converters + turbofish-form robustness.
|
||||
#[test]
|
||||
fn rust_parse_is_int_producing_and_turbofish_forms_are_robust() {
|
||||
// Bare parse and its identity-peeled / receiver-qualified forms — Int.
|
||||
assert!(is_int_producing_callee("parse"));
|
||||
assert!(is_int_producing_callee("s.parse"));
|
||||
assert!(is_int_producing_callee("parse().unwrap()"));
|
||||
// Turbofish forms survive `peel_identity_suffix` truncation at `<`
|
||||
// (the trailing `::` is trimmed back to the `parse` suffix).
|
||||
assert!(is_int_producing_callee("s.parse::<u32>()"));
|
||||
assert!(is_int_producing_callee("s.parse::<i64>()"));
|
||||
assert!(is_int_producing_callee(
|
||||
"contents.trim().parse::<u32>().expect(\"invalid pid\")"
|
||||
));
|
||||
// Other languages' numeric converters remain unconditional.
|
||||
assert!(is_int_producing_callee("parseInt"));
|
||||
assert!(is_int_producing_callee("Number"));
|
||||
assert!(is_int_producing_callee("Atoi"));
|
||||
assert!(is_int_producing_callee("strconv.ParseInt"));
|
||||
assert!(is_int_producing_callee("x.to_i"));
|
||||
// Negative: a non-numeric-producing method must not be Int.
|
||||
assert!(!is_int_producing_callee("toUpperCase"));
|
||||
assert!(!is_int_producing_callee("trim"));
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -407,8 +407,23 @@ pub fn extract_findings(
|
|||
if !lifecycle.contains(ResourceLifecycle::CLOSED) {
|
||||
continue;
|
||||
}
|
||||
// Check if there are intervening Call nodes between acquire and release
|
||||
// in the CFG (these could throw and bypass the release)
|
||||
// Check if there are intervening Call nodes between acquire and
|
||||
// release in the CFG (these could throw and bypass the release).
|
||||
//
|
||||
// NOTE: a stricter variant (audit #59) tried to exclude the
|
||||
// resource's own lifecycle ops (the acquire/release proxy
|
||||
// calls) and require reachability from the acquire node, to
|
||||
// suppress spurious findings on correctly open/close-paired
|
||||
// proxies. That over-suppressed a *tested* true positive: a
|
||||
// class-field resource (`this.fd = fs.openSync(...)` in `open()`
|
||||
// with `close()` in a separate method — see
|
||||
// `tests/fixtures/real_world/typescript/cfg/try_catch_typed.ts`)
|
||||
// has only its own acquire call in scope, so excluding it left
|
||||
// zero intervening calls and dropped the must-match leak
|
||||
// finding. Distinguishing a clean same-scope open/close pair
|
||||
// from a cross-method field leak needs proper inter-method
|
||||
// lifecycle modelling (deep-fix queue), so we keep the original
|
||||
// span-based exclusion here.
|
||||
let has_intervening_calls = cfg.node_references().any(|(_, ni)| {
|
||||
ni.kind == StmtKind::Call
|
||||
&& ni.ast.enclosing_func == info.ast.enclosing_func
|
||||
|
|
@ -567,11 +582,23 @@ fn is_web_entrypoint_simple(
|
|||
_ => &["request", "req"],
|
||||
};
|
||||
|
||||
let has_web_params = func_summaries.values().any(|s| {
|
||||
s.param_names
|
||||
.iter()
|
||||
.any(|p| web_params.contains(&p.to_ascii_lowercase().as_str()))
|
||||
});
|
||||
// Confirm web parameters against THIS candidate handler only, not any
|
||||
// function in the file. Scanning every summary made an unrelated
|
||||
// function's `req`/`r`/`ctx` parameter promote every
|
||||
// `process_*`/`api_*`/`serve_*` function in the file to a web
|
||||
// entrypoint, firing High-severity state-unauthed-access on batch/CLI
|
||||
// code. Filter the file-level summary map down to the named function
|
||||
// via `FuncKey.name` (matches `info.ast.enclosing_func`); summary
|
||||
// `entry` NodeIndexes are not valid in the per-body CFG, so the name
|
||||
// is the safe selector here.
|
||||
let has_web_params = func_summaries
|
||||
.iter()
|
||||
.filter(|(key, _)| key.name == func_name)
|
||||
.any(|(_, s)| {
|
||||
s.param_names
|
||||
.iter()
|
||||
.any(|p| web_params.contains(&p.to_ascii_lowercase().as_str()))
|
||||
});
|
||||
|
||||
// Only handle_* and route_* are strong enough to skip param confirmation.
|
||||
// api_*, serve_*, process_* require web parameter evidence.
|
||||
|
|
@ -916,4 +943,68 @@ mod tests {
|
|||
);
|
||||
assert_eq!(findings[0].rule_id, "state-resource-leak");
|
||||
}
|
||||
|
||||
/// Finding #64: `is_web_entrypoint_simple` must confirm web parameters
|
||||
/// against the *candidate* handler only, not any function in the file.
|
||||
/// Before the fix, an unrelated `read_stream(req)` in the same file
|
||||
/// promoted every `process_*` function to a web entrypoint, firing
|
||||
/// High-severity `state-unauthed-access` on batch/CLI code.
|
||||
#[test]
|
||||
fn web_entrypoint_param_confirmation_is_per_function() {
|
||||
use crate::cfg::LocalFuncSummary;
|
||||
use crate::symbol::FuncKey;
|
||||
use petgraph::graph::NodeIndex;
|
||||
|
||||
fn summary(name: &str, params: &[&str]) -> (FuncKey, LocalFuncSummary) {
|
||||
let key = FuncKey::new_function(Lang::Python, "f.py", name, Some(params.len()));
|
||||
let s = LocalFuncSummary {
|
||||
entry: NodeIndex::new(0),
|
||||
source_caps: Cap::empty(),
|
||||
sanitizer_caps: Cap::empty(),
|
||||
sink_caps: Cap::empty(),
|
||||
param_count: params.len(),
|
||||
param_names: params.iter().map(|p| p.to_string()).collect(),
|
||||
propagating_params: Vec::new(),
|
||||
tainted_sink_params: Vec::new(),
|
||||
callees: Vec::new(),
|
||||
container: String::new(),
|
||||
disambig: None,
|
||||
kind: crate::symbol::FuncKind::Function,
|
||||
};
|
||||
(key, s)
|
||||
}
|
||||
|
||||
// `process_data` has NO web-like parameters; `read_stream` does.
|
||||
let mut summaries: crate::cfg::FuncSummaries = HashMap::new();
|
||||
let (k1, s1) = summary("process_data", &["data"]);
|
||||
let (k2, s2) = summary("read_stream", &["req"]);
|
||||
summaries.insert(k1, s1);
|
||||
summaries.insert(k2, s2);
|
||||
|
||||
let cfg: Cfg = Graph::new();
|
||||
|
||||
// The unrelated `read_stream(req)` must NOT promote `process_data`.
|
||||
assert!(
|
||||
!is_web_entrypoint_simple("process_data", Lang::Python, &summaries, &cfg),
|
||||
"process_data has no web params and must not be a web entrypoint just \
|
||||
because an unrelated function in the file does"
|
||||
);
|
||||
|
||||
// `read_stream` is not a handler-prefixed name, so even though it
|
||||
// carries the `req` param it is NOT an entrypoint — confirms the
|
||||
// name gate still stands independently of the param check.
|
||||
assert!(
|
||||
!is_web_entrypoint_simple("read_stream", Lang::Python, &summaries, &cfg),
|
||||
"read_stream lacks a handler-prefixed name and must not be an entrypoint"
|
||||
);
|
||||
|
||||
// Positive control: give `process_data` its own `req` param.
|
||||
let mut summaries2: crate::cfg::FuncSummaries = HashMap::new();
|
||||
let (k3, s3) = summary("process_data", &["req"]);
|
||||
summaries2.insert(k3, s3);
|
||||
assert!(
|
||||
is_web_entrypoint_simple("process_data", Lang::Python, &summaries2, &cfg),
|
||||
"process_data with its own web param must be a web entrypoint"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1641,13 +1641,34 @@ impl GlobalSummaries {
|
|||
return CalleeResolution::NotFound;
|
||||
}
|
||||
|
||||
let arity_filtered: Vec<&FuncKey> = all_candidates
|
||||
let mut arity_filtered: Vec<&FuncKey> = all_candidates
|
||||
.iter()
|
||||
.copied()
|
||||
.filter(|k| arity_matches(k))
|
||||
.collect();
|
||||
if arity_filtered.is_empty() {
|
||||
return CalleeResolution::NotFound;
|
||||
// Exact-arity match found nothing. Tolerate under-application:
|
||||
// `FuncKey::arity` is the *total* parameter count, so a call
|
||||
// supplying `a` arguments to a function declared with more
|
||||
// parameters (the surplus being default-valued / optional) is a
|
||||
// routine, valid call shape in Python, JS/TS, PHP, and Ruby. Retry
|
||||
// with `param_count >= call_arity` so these calls still resolve.
|
||||
//
|
||||
// This only ever WIDENS the candidate set, and resolution below
|
||||
// still requires a unique candidate, so a genuinely ambiguous name
|
||||
// degrades to `Ambiguous`, never a wrong `Resolved`. Exact-arity
|
||||
// matches always take precedence (this branch runs only when none
|
||||
// exist), so no existing exact-match resolution regresses.
|
||||
if let Some(a) = q.arity {
|
||||
arity_filtered = all_candidates
|
||||
.iter()
|
||||
.copied()
|
||||
.filter(|k| matches!(k.arity, Some(p) if p >= a))
|
||||
.collect();
|
||||
}
|
||||
if arity_filtered.is_empty() {
|
||||
return CalleeResolution::NotFound;
|
||||
}
|
||||
}
|
||||
|
||||
let same_ns: Vec<&FuncKey> = arity_filtered
|
||||
|
|
@ -2033,14 +2054,42 @@ fn synthesize_ssa_disambig(summary: &SsaFuncSummary) -> u32 {
|
|||
/// Merging only happens for exact `FuncKey` matches (same lang + namespace +
|
||||
/// name + arity). Functions with the same bare name but different languages
|
||||
/// or namespaces are stored separately.
|
||||
///
|
||||
/// This variant keys summaries via the plain [`FuncSummary::func_key`]
|
||||
/// (`normalize_namespace`), so it is only safe for repos with no
|
||||
/// package boundaries. The indexed scan path must use
|
||||
/// [`merge_summaries_with_resolver`] so the loaded coarse summaries are
|
||||
/// keyed with the same package-qualified namespaces that pass-1 SSA
|
||||
/// summaries, cross-package import maps, and pass-2 refinements use.
|
||||
pub fn merge_summaries(
|
||||
per_file: impl IntoIterator<Item = FuncSummary>,
|
||||
scan_root: Option<&str>,
|
||||
) -> GlobalSummaries {
|
||||
merge_summaries_with_resolver(per_file, scan_root, None)
|
||||
}
|
||||
|
||||
/// Module-graph-aware variant of [`merge_summaries`].
|
||||
///
|
||||
/// Keys each summary via [`FuncSummary::func_key_with_resolver`], so a
|
||||
/// file inside a discovered package gets a package-qualified namespace
|
||||
/// (`"@scope/name::src/file.ts"`) instead of the plain
|
||||
/// `normalize_namespace` form. This must match the keying convention
|
||||
/// used by pass-1 SSA summaries (`namespace_with_package`) and pass-2
|
||||
/// topo refinements (`func_key_with_resolver`); otherwise exact-key
|
||||
/// joins between the coarse FuncSummary tier and the SSA tier miss, and
|
||||
/// same-namespace narrowing in [`GlobalSummaries::resolve_callee`] never
|
||||
/// matches the package-qualified caller namespace.
|
||||
///
|
||||
/// Passing `module_graph: None` is equivalent to [`merge_summaries`].
|
||||
pub fn merge_summaries_with_resolver(
|
||||
per_file: impl IntoIterator<Item = FuncSummary>,
|
||||
scan_root: Option<&str>,
|
||||
module_graph: Option<&crate::resolve::ModuleGraph>,
|
||||
) -> GlobalSummaries {
|
||||
let mut map = GlobalSummaries::new();
|
||||
|
||||
for fs in per_file {
|
||||
let key = fs.func_key(scan_root);
|
||||
let key = fs.func_key_with_resolver(scan_root, module_graph);
|
||||
map.insert(key, fs);
|
||||
}
|
||||
|
||||
|
|
@ -2049,3 +2098,127 @@ pub fn merge_summaries(
|
|||
|
||||
#[cfg(test)]
|
||||
mod tests;
|
||||
|
||||
#[cfg(test)]
|
||||
mod arity_leniency_tests {
|
||||
use super::*;
|
||||
use crate::symbol::{FuncKey, Lang};
|
||||
|
||||
fn py_func(name: &str, namespace: &str, param_count: usize) -> (FuncKey, FuncSummary) {
|
||||
let key = FuncKey {
|
||||
lang: Lang::Python,
|
||||
namespace: namespace.into(),
|
||||
name: name.into(),
|
||||
arity: Some(param_count),
|
||||
..Default::default()
|
||||
};
|
||||
let summary = FuncSummary {
|
||||
name: name.into(),
|
||||
file_path: namespace.into(),
|
||||
lang: "python".into(),
|
||||
param_count,
|
||||
..Default::default()
|
||||
};
|
||||
(key, summary)
|
||||
}
|
||||
|
||||
/// `run_cmd(cmd, opts=None)` (param_count 2) called as `run_cmd(user)`
|
||||
/// (arity 1) must resolve via the under-application tolerance, not fall
|
||||
/// through to NotFound.
|
||||
#[test]
|
||||
fn under_application_resolves_unique_defaulted_callee() {
|
||||
let mut gs = GlobalSummaries::new();
|
||||
let (key, summary) = py_func("run_cmd", "helper.py", 2);
|
||||
gs.insert(key.clone(), summary);
|
||||
|
||||
let resolved = gs.resolve_callee(&CalleeQuery {
|
||||
name: "run_cmd",
|
||||
caller_lang: Lang::Python,
|
||||
caller_namespace: "routes.py",
|
||||
caller_container: None,
|
||||
receiver_type: None,
|
||||
namespace_qualifier: None,
|
||||
receiver_var: None,
|
||||
arity: Some(1),
|
||||
});
|
||||
assert_eq!(
|
||||
resolved,
|
||||
CalleeResolution::Resolved(key),
|
||||
"under-applied unique callee with default params must resolve"
|
||||
);
|
||||
}
|
||||
|
||||
/// Exact-arity matches still take precedence: when both an exact-arity and
|
||||
/// a higher-arity candidate exist, the exact one wins (no regression).
|
||||
#[test]
|
||||
fn exact_arity_match_preferred_over_lenient() {
|
||||
let mut gs = GlobalSummaries::new();
|
||||
let (exact_key, exact_sum) = py_func("run_cmd", "a.py", 1);
|
||||
let (wide_key, wide_sum) = py_func("run_cmd", "b.py", 3);
|
||||
gs.insert(exact_key.clone(), exact_sum);
|
||||
gs.insert(wide_key, wide_sum);
|
||||
|
||||
let resolved = gs.resolve_callee(&CalleeQuery {
|
||||
name: "run_cmd",
|
||||
caller_lang: Lang::Python,
|
||||
caller_namespace: "routes.py",
|
||||
caller_container: None,
|
||||
receiver_type: None,
|
||||
namespace_qualifier: None,
|
||||
receiver_var: None,
|
||||
arity: Some(1),
|
||||
});
|
||||
// The exact-arity candidate (a.py, arity 1) is the sole exact match,
|
||||
// so the lenient branch never runs and resolution is unambiguous.
|
||||
assert_eq!(resolved, CalleeResolution::Resolved(exact_key));
|
||||
}
|
||||
|
||||
/// Leniency only widens the candidate set; it never produces a wrong
|
||||
/// Resolved. Two distinct higher-arity callees both tolerating the call
|
||||
/// arity must degrade to Ambiguous, not a silent pick.
|
||||
#[test]
|
||||
fn under_application_ambiguous_when_multiple_candidates() {
|
||||
let mut gs = GlobalSummaries::new();
|
||||
let (k1, s1) = py_func("run_cmd", "a.py", 2);
|
||||
let (k2, s2) = py_func("run_cmd", "b.py", 3);
|
||||
gs.insert(k1, s1);
|
||||
gs.insert(k2, s2);
|
||||
|
||||
let resolved = gs.resolve_callee(&CalleeQuery {
|
||||
name: "run_cmd",
|
||||
caller_lang: Lang::Python,
|
||||
caller_namespace: "routes.py",
|
||||
caller_container: None,
|
||||
receiver_type: None,
|
||||
namespace_qualifier: None,
|
||||
receiver_var: None,
|
||||
arity: Some(1),
|
||||
});
|
||||
assert!(
|
||||
matches!(resolved, CalleeResolution::Ambiguous(_)),
|
||||
"two under-applied candidates must be Ambiguous, not a wrong Resolved: {resolved:?}"
|
||||
);
|
||||
}
|
||||
|
||||
/// Over-application (more args than params) must NOT be tolerated: the
|
||||
/// lenient predicate is `param_count >= call_arity`, so a 1-param function
|
||||
/// called with 2 args still returns NotFound.
|
||||
#[test]
|
||||
fn over_application_not_tolerated() {
|
||||
let mut gs = GlobalSummaries::new();
|
||||
let (key, summary) = py_func("run_cmd", "helper.py", 1);
|
||||
gs.insert(key, summary);
|
||||
|
||||
let resolved = gs.resolve_callee(&CalleeQuery {
|
||||
name: "run_cmd",
|
||||
caller_lang: Lang::Python,
|
||||
caller_namespace: "routes.py",
|
||||
caller_container: None,
|
||||
receiver_type: None,
|
||||
namespace_qualifier: None,
|
||||
receiver_var: None,
|
||||
arity: Some(2),
|
||||
});
|
||||
assert_eq!(resolved, CalleeResolution::NotFound);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -401,7 +401,9 @@ fn run_path(
|
|||
// Global step budget
|
||||
if *total_steps >= MAX_TOTAL_STEPS {
|
||||
*search_exhausted = false;
|
||||
return Some(record_outcome(state, finding, ssa, cfg));
|
||||
// Budget cut mid-walk: constraints beyond this point are unchecked,
|
||||
// so feasibility is unproven → Inconclusive, not Confirmed.
|
||||
return Some(record_cutoff(state, finding, ssa, cfg));
|
||||
}
|
||||
|
||||
let block_id = state.current_block;
|
||||
|
|
@ -454,8 +456,9 @@ fn run_path(
|
|||
continue;
|
||||
}
|
||||
}
|
||||
// Stuck (infinite loop / nested loops with no exit)
|
||||
return Some(record_outcome(state, finding, ssa, cfg));
|
||||
// Stuck (infinite loop / nested loops with no exit): the path never
|
||||
// reached a terminal, so feasibility is unproven → Inconclusive.
|
||||
return Some(record_cutoff(state, finding, ssa, cfg));
|
||||
}
|
||||
|
||||
// Move exception context into sym_state before block transfer
|
||||
|
|
@ -1114,6 +1117,30 @@ fn record_outcome(
|
|||
}
|
||||
}
|
||||
|
||||
/// Record the outcome for a path cut short by a budget/loop cutoff.
|
||||
///
|
||||
/// Unlike [`record_outcome`], the path did NOT reach a normal terminal: it was
|
||||
/// abandoned mid-walk (global step budget exhausted, or a loop with no
|
||||
/// reachable exit). Constraints beyond the cutoff point were never checked, so
|
||||
/// feasibility is unproven and the verdict must be `Inconclusive` (which
|
||||
/// contributes 0 to confidence), not `Confirmed`. A best-effort witness is
|
||||
/// still extracted for diagnostics. Mirrors `analyse_finding_path`, which
|
||||
/// already returns `Inconclusive` for the over-budget (`>MAX_PATH_BLOCKS`)
|
||||
/// case.
|
||||
fn record_cutoff(
|
||||
state: &ExplorationState,
|
||||
finding: &Finding,
|
||||
ssa: &SsaBody,
|
||||
cfg: &Cfg,
|
||||
) -> PathOutcome {
|
||||
let witness = try_extract_witness(state, finding, ssa, cfg);
|
||||
PathOutcome {
|
||||
verdict: Verdict::Inconclusive,
|
||||
constraints_checked: state.constraints_checked,
|
||||
witness,
|
||||
}
|
||||
}
|
||||
|
||||
/// Best-effort witness extraction from the current symbolic state.
|
||||
///
|
||||
/// Used by both `record_outcome` (Confirmed paths) and inconclusive exits
|
||||
|
|
@ -1516,6 +1543,77 @@ mod tests {
|
|||
assert_eq!(v.verdict, Verdict::Inconclusive);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn record_cutoff_is_inconclusive_not_confirmed() {
|
||||
// A path abandoned at a budget/loop cutoff has unchecked constraints
|
||||
// beyond the cutoff point, so its outcome must be Inconclusive (which
|
||||
// contributes 0 to confidence), NOT Confirmed. record_outcome (used
|
||||
// only at genuine terminal states) still yields Confirmed.
|
||||
let n0 = NodeIndex::new(0);
|
||||
let n1 = NodeIndex::new(1);
|
||||
let b0 = BlockId(0);
|
||||
let b1 = BlockId(1);
|
||||
let ssa = SsaBody {
|
||||
blocks: vec![
|
||||
SsaBlock {
|
||||
id: b0,
|
||||
phis: vec![],
|
||||
body: vec![],
|
||||
terminator: Terminator::Goto(b1),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![b1],
|
||||
},
|
||||
SsaBlock {
|
||||
id: b1,
|
||||
phis: vec![],
|
||||
body: vec![],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: smallvec![b0],
|
||||
succs: smallvec![],
|
||||
},
|
||||
],
|
||||
entry: b0,
|
||||
value_defs: vec![make_value_def(b0, n0), make_value_def(b1, n1)],
|
||||
cfg_node_map: HashMap::new(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
synthetic_externals: std::collections::HashSet::new(),
|
||||
slot_scoped_assigns: std::collections::HashSet::new(),
|
||||
};
|
||||
let cfg = Cfg::new();
|
||||
let finding = make_finding(n0, n1);
|
||||
let state = ExplorationState {
|
||||
sym_state: SymbolicState::new(),
|
||||
env: constraint::PathEnv::empty(),
|
||||
current_block: b0,
|
||||
predecessor: None,
|
||||
forks_used: 0,
|
||||
steps_taken: 0,
|
||||
constraints_checked: 3,
|
||||
visit_counts: HashMap::new(),
|
||||
exception_context: None,
|
||||
};
|
||||
|
||||
let cutoff = record_cutoff(&state, &finding, &ssa, &cfg);
|
||||
assert_eq!(cutoff.verdict, Verdict::Inconclusive);
|
||||
assert_eq!(cutoff.constraints_checked, 3);
|
||||
|
||||
let terminal = record_outcome(&state, &finding, &ssa, &cfg);
|
||||
assert_eq!(terminal.verdict, Verdict::Confirmed);
|
||||
|
||||
// A lone budget-cut path must NOT aggregate to Confirmed.
|
||||
let result = ExplorationResult {
|
||||
paths_completed: vec![cutoff],
|
||||
paths_pruned: 1,
|
||||
total_steps: MAX_TOTAL_STEPS,
|
||||
search_exhausted: false,
|
||||
interproc_findings: Vec::new(),
|
||||
interproc_cutoffs: Vec::new(),
|
||||
};
|
||||
assert_eq!(result.aggregate_verdict().verdict, Verdict::Inconclusive);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn aggregate_empty_is_inconclusive() {
|
||||
let result = ExplorationResult {
|
||||
|
|
|
|||
|
|
@ -268,6 +268,38 @@ pub struct InterprocCtx<'a> {
|
|||
pub caller_namespace: &'a str,
|
||||
}
|
||||
|
||||
impl<'a> InterprocCtx<'a> {
|
||||
/// Build a child context for resolving calls *inside* a frame.
|
||||
///
|
||||
/// All shared state (budgets, caches, reentry counts) is carried by
|
||||
/// reference, so the child observes the same interior-mutable counters.
|
||||
/// When `descended_cross_file` is true the frame was itself reached by a
|
||||
/// cross-file body resolution, so `cross_file_depth` is bumped to enforce
|
||||
/// `MAX_CROSS_FILE_DEPTH` on any further cross-file descents. Without this
|
||||
/// the depth never increments and the guard at `execute_callee` is dead
|
||||
/// (always `0 >= 1 == false`), allowing cross-file bodies to keep resolving
|
||||
/// further cross-file bodies up to `max_depth` instead of the intended one
|
||||
/// level.
|
||||
fn child_for_nested(&self, descended_cross_file: bool) -> InterprocCtx<'a> {
|
||||
InterprocCtx {
|
||||
callee_bodies: self.callee_bodies,
|
||||
cfg: self.cfg,
|
||||
lang: self.lang,
|
||||
max_depth: self.max_depth,
|
||||
budget: self.budget,
|
||||
cache: self.cache,
|
||||
reentry_counts: self.reentry_counts,
|
||||
max_reentry_per_func: self.max_reentry_per_func,
|
||||
scc_membership: self.scc_membership,
|
||||
max_scc_reentry: self.max_scc_reentry,
|
||||
stats: self.stats,
|
||||
cross_file_bodies: self.cross_file_bodies,
|
||||
cross_file_depth: self.cross_file_depth + usize::from(descended_cross_file),
|
||||
caller_namespace: self.caller_namespace,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Budget counters shared across all interprocedural frames for one finding.
|
||||
#[derive(Clone, Copy, Debug)]
|
||||
pub struct InterprocBudget {
|
||||
|
|
@ -728,16 +760,48 @@ pub fn execute_callee(
|
|||
let mut initial_state = SymbolicState::new();
|
||||
initial_state.seed_from_const_values(&body.opt.const_values);
|
||||
|
||||
// Seed parameters: walk callee SSA for Param instructions
|
||||
// Seed parameters: walk callee SSA for Param / SelfParam instructions.
|
||||
//
|
||||
// The caller (`transfer.rs` Call arm and `handle_nested_calls`) PREPENDS
|
||||
// the method receiver into `arg_values` at index 0 whenever the call has a
|
||||
// receiver. SSA lowering (`src/ssa/lower.rs`) emits that receiver as
|
||||
// `SsaOp::SelfParam` and assigns `Param { index }` positions starting at 0
|
||||
// to the non-receiver formals only. So when the callee body is a method
|
||||
// (has a `SelfParam`), the positional formal `Param{index}` must be seeded
|
||||
// from `arg_values[index + 1]` (skipping the receiver at slot 0), and the
|
||||
// `SelfParam` itself from `arg_values[0]`. Free functions (no `SelfParam`)
|
||||
// map `Param{index}` directly to `arg_values[index]`. This matches the
|
||||
// taint engine's inline path, which builds `param_seed` from non-receiver
|
||||
// args and carries receiver taint on a separate `receiver_seed` channel
|
||||
// consumed by `SelfParam` (`src/taint/ssa_transfer/mod.rs`).
|
||||
let has_self_param = body
|
||||
.ssa
|
||||
.blocks
|
||||
.iter()
|
||||
.flat_map(|block| block.phis.iter().chain(block.body.iter()))
|
||||
.any(|inst| matches!(inst.op, SsaOp::SelfParam));
|
||||
let param_offset = if has_self_param { 1 } else { 0 };
|
||||
for block in &body.ssa.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
if let SsaOp::Param { index } = &inst.op {
|
||||
if let Some((_, sym, tainted)) = arg_values.get(*index) {
|
||||
initial_state.set(inst.value, sym.clone());
|
||||
if *tainted {
|
||||
initial_state.mark_tainted(inst.value);
|
||||
match &inst.op {
|
||||
SsaOp::Param { index } => {
|
||||
if let Some((_, sym, tainted)) = arg_values.get(*index + param_offset) {
|
||||
initial_state.set(inst.value, sym.clone());
|
||||
if *tainted {
|
||||
initial_state.mark_tainted(inst.value);
|
||||
}
|
||||
}
|
||||
}
|
||||
SsaOp::SelfParam => {
|
||||
// Receiver was prepended at slot 0 by the caller.
|
||||
if let Some((_, sym, tainted)) = arg_values.first() {
|
||||
initial_state.set(inst.value, sym.clone());
|
||||
if *tainted {
|
||||
initial_state.mark_tainted(inst.value);
|
||||
}
|
||||
}
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -751,6 +815,12 @@ pub fn execute_callee(
|
|||
frame_chain.push(normalized.to_string());
|
||||
|
||||
// ─── Work-queue exploration (intra-callee forking) ────────
|
||||
//
|
||||
// Nested calls executed inside this frame go through a child context whose
|
||||
// `cross_file_depth` is bumped when this frame was itself reached via a
|
||||
// cross-file body. This keeps `MAX_CROSS_FILE_DEPTH` enforced across nested
|
||||
// descents (the shared `ctx` would otherwise never increment the field).
|
||||
let nested_ctx = ctx.child_for_nested(is_cross_file);
|
||||
|
||||
let mut exit_states: Vec<CalleeExitState> = Vec::new();
|
||||
let mut internal_findings: Vec<InternalSinkFinding> = Vec::new();
|
||||
|
|
@ -853,7 +923,7 @@ pub fn execute_callee(
|
|||
// Handle nested calls
|
||||
handle_nested_calls(
|
||||
block,
|
||||
ctx,
|
||||
&nested_ctx,
|
||||
&mut path.sym_state,
|
||||
depth,
|
||||
&frame_chain,
|
||||
|
|
@ -1562,4 +1632,46 @@ mod tests {
|
|||
assert_eq!(stats.cutoffs, 0);
|
||||
assert_eq!(stats.forks, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn child_for_nested_bumps_cross_file_depth_only_on_descent() {
|
||||
// Backing state for the borrowed-by-reference InterprocCtx fields.
|
||||
let bodies: HashMap<crate::symbol::FuncKey, CalleeSsaBody> = HashMap::new();
|
||||
let cfg = Cfg::new();
|
||||
let budget = Cell::new(InterprocBudget::new());
|
||||
let cache = RefCell::new(InterprocCache::new());
|
||||
let reentry = RefCell::new(HashMap::new());
|
||||
let stats = Cell::new(InterprocStats::default());
|
||||
|
||||
let ctx = InterprocCtx {
|
||||
callee_bodies: &bodies,
|
||||
cfg: &cfg,
|
||||
lang: Lang::JavaScript,
|
||||
max_depth: DEFAULT_MAX_DEPTH,
|
||||
budget: &budget,
|
||||
cache: &cache,
|
||||
reentry_counts: &reentry,
|
||||
max_reentry_per_func: DEFAULT_MAX_REENTRY_PER_FUNC,
|
||||
scc_membership: None,
|
||||
max_scc_reentry: DEFAULT_MAX_SCC_REENTRY,
|
||||
stats: &stats,
|
||||
cross_file_bodies: None,
|
||||
cross_file_depth: 0,
|
||||
caller_namespace: "test.js",
|
||||
};
|
||||
|
||||
// Intra-file descent leaves cross_file_depth untouched.
|
||||
let intra = ctx.child_for_nested(false);
|
||||
assert_eq!(intra.cross_file_depth, 0);
|
||||
|
||||
// A cross-file descent bumps the depth so the guard at execute_callee
|
||||
// (>= MAX_CROSS_FILE_DEPTH) trips on the next cross-file step.
|
||||
let xfile = ctx.child_for_nested(true);
|
||||
assert_eq!(xfile.cross_file_depth, 1);
|
||||
assert!(xfile.cross_file_depth >= MAX_CROSS_FILE_DEPTH);
|
||||
|
||||
// Nested cross-file from an already-cross-file frame keeps climbing.
|
||||
let xfile2 = xfile.child_for_nested(true);
|
||||
assert_eq!(xfile2.cross_file_depth, 2);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -7,20 +7,32 @@
|
|||
//!
|
||||
//! This module implements the opposite direction: start at each sink value,
|
||||
//! walk *reverse* SSA edges and (when needed) cross-file callee bodies on
|
||||
//! demand, and emit a [`BackwardFlow`] when a source is reached or an
|
||||
//! accumulated path predicate proves the flow infeasible.
|
||||
//! demand, and emit a [`BackwardFlow`] when a source is reached.
|
||||
//!
|
||||
//! The analysis is additive:
|
||||
//!
|
||||
//! * When a forward finding's sink is confirmed by a backwards walk that
|
||||
//! reaches a matching source, we append `backwards-confirmed` to the
|
||||
//! finding's evidence notes.
|
||||
//! * When the backwards walk proves the flow infeasible via accumulated
|
||||
//! path predicates, we append `backwards-infeasible`, consumed by the
|
||||
//! confidence scorer as a cap-to-Low signal.
|
||||
//! * Backward flows that reach a source with no matching forward finding
|
||||
//! become standalone `taint-backwards-flow` diags (a separate rule id so
|
||||
//! existing graders can distinguish the two channels).
|
||||
//! finding's evidence notes. **This is the only channel currently active.**
|
||||
//!
|
||||
//! ## Reserved / not-yet-implemented channels
|
||||
//!
|
||||
//! The scaffolding below exists but does not fire in production; it is kept so
|
||||
//! the downstream consumers (confidence scorer, [`FindingVerdict`]) have a
|
||||
//! stable shape to grow into. Do not build new behaviour on the assumption
|
||||
//! that these signals can be produced by a real walk:
|
||||
//!
|
||||
//! * **Infeasibility.** [`DemandState::validated_true`] /
|
||||
//! [`DemandState::validated_false`] are never written by the transfer (no
|
||||
//! reverse predicate accumulation is implemented yet), so
|
||||
//! [`BackwardFlow::infeasible`] is never set, [`aggregate_verdict`] never
|
||||
//! returns [`FindingVerdict::Infeasible`], and the `backwards-infeasible`
|
||||
//! note is never appended. Implementing this requires reading branch
|
||||
//! [`crate::taint::domain::PredicateSummary`] bits off the reverse-dominating
|
||||
//! conditions, which is future work.
|
||||
//! * **Standalone diags.** Backward flows that reach a source with no matching
|
||||
//! forward finding are *not* yet surfaced as standalone `taint-backwards-flow`
|
||||
//! diags; that rule id is reserved.
|
||||
//!
|
||||
//! The feature is gated by
|
||||
//! [`crate::utils::analysis_options::AnalysisOptions::backwards_analysis`]
|
||||
|
|
@ -76,8 +88,15 @@ pub struct DemandState {
|
|||
/// [`crate::taint::domain::predicate_kind_bit`]; bit `i` set means the
|
||||
/// corresponding `PredicateKind` was observed as holding on every
|
||||
/// predecessor visited so far.
|
||||
///
|
||||
/// **Reserved / not yet written.** The current backward transfer does not
|
||||
/// accumulate reverse path predicates, so this field is always 0 in
|
||||
/// production. See the module-level "Reserved / not-yet-implemented
|
||||
/// channels" note.
|
||||
pub validated_true: u8,
|
||||
/// Counterpart to [`Self::validated_true`] for known-false predicates.
|
||||
///
|
||||
/// **Reserved / not yet written** (always 0 in production).
|
||||
pub validated_false: u8,
|
||||
/// Number of cross-function inline expansions performed along this walk.
|
||||
pub depth: u32,
|
||||
|
|
@ -111,6 +130,10 @@ pub struct BackwardFlow {
|
|||
pub source_node: Option<NodeIndex>,
|
||||
/// Set when the accumulated predicates proved the flow infeasible before
|
||||
/// reaching any source.
|
||||
///
|
||||
/// **Reserved / never set in production.** Reverse predicate accumulation
|
||||
/// is not yet implemented (see [`DemandState::validated_true`] and the
|
||||
/// module-level note), so every production flow leaves this `false`.
|
||||
pub infeasible: bool,
|
||||
/// Set when the walk hit [`BACKWARDS_VALUE_BUDGET`] without terminating.
|
||||
pub budget_exhausted: bool,
|
||||
|
|
@ -517,11 +540,25 @@ fn walk_dfs(
|
|||
// the key in the matcher, the key is useful for debug
|
||||
// logging in bigger expansions.
|
||||
let _ = callee_key;
|
||||
return;
|
||||
// Intentionally fall through to the conservative arg/receiver
|
||||
// fanout below. Walking only the callee's return chains is
|
||||
// strictly lossier than not resolving the callee at all: a
|
||||
// passthrough `return param` inside the callee bottoms out at
|
||||
// a `ReachedParam` terminal that is never mapped back to the
|
||||
// caller's argument, so `sink(identity(tainted))` would record
|
||||
// a dead-end param terminal and miss the source, whereas an
|
||||
// unresolvable callee would have taken the fanout and reached
|
||||
// it. Running the fanout in addition keeps callee expansion
|
||||
// additive (it can only add confirmations, never drop them);
|
||||
// `aggregate_verdict` only needs one confirmation, and the
|
||||
// shared `visited` set prevents re-expanding values the callee
|
||||
// walk already covered in the caller frame.
|
||||
}
|
||||
// Fall-through: no resolvable body. Conservatively fan out to
|
||||
// every operand / receiver so a source reachable through the
|
||||
// call arguments is still observed.
|
||||
// Conservatively fan out to every operand / receiver so a source
|
||||
// reachable through the call arguments is still observed. Runs
|
||||
// for both the resolvable-callee case (after return-chain
|
||||
// expansion, to recover argument-passthrough flows) and the
|
||||
// unresolvable case (sole channel).
|
||||
for (operand, next_demand) in next {
|
||||
chain.push(operand);
|
||||
walk_dfs(
|
||||
|
|
@ -564,18 +601,55 @@ fn resolve_callee_body<'a>(
|
|||
.rsplit('.')
|
||||
.next()
|
||||
.unwrap_or(callee);
|
||||
if let Some(map) = ctx.intra_file_bodies {
|
||||
|
||||
// Pick the deterministic, language-matched best candidate among same-leaf
|
||||
// entries. Bare-leaf matching over a `HashMap` is both unsound (a
|
||||
// same-name sibling from another class/file/language can be picked, the
|
||||
// exact hazard the forward inline path refuses bare-name lookup for) and
|
||||
// nondeterministic (HashMap iteration order varies run to run, so the
|
||||
// resolved body, hence the `backwards-confirmed` note and the confidence
|
||||
// score, would differ across identical runs). We cannot reconstruct the
|
||||
// call-site's container/arity here (only the textual callee is in hand),
|
||||
// but we can at least exclude cross-language siblings and fix a stable
|
||||
// tie-break so resolution is reproducible.
|
||||
fn best_candidate<'b>(
|
||||
map: &'b HashMap<FuncKey, CalleeSsaBody>,
|
||||
leaf: &str,
|
||||
lang: Lang,
|
||||
) -> Option<(&'b CalleeSsaBody, FuncKey)> {
|
||||
let mut best: Option<(&FuncKey, &CalleeSsaBody)> = None;
|
||||
for (key, body) in map.iter() {
|
||||
if key.name == leaf && body.ssa.blocks.len() <= MAX_BACKWARDS_CALLEE_BLOCKS {
|
||||
return Some((body, key.clone()));
|
||||
if key.name != leaf
|
||||
|| key.lang != lang
|
||||
|| body.ssa.blocks.len() > MAX_BACKWARDS_CALLEE_BLOCKS
|
||||
{
|
||||
continue;
|
||||
}
|
||||
let better = match best {
|
||||
None => true,
|
||||
// Deterministic tie-break: smallest by the structural fields
|
||||
// that distinguish same-leaf siblings. Independent of
|
||||
// HashMap iteration order.
|
||||
Some((bk, _)) => {
|
||||
(&key.namespace, &key.container, &key.arity, &key.disambig)
|
||||
< (&bk.namespace, &bk.container, &bk.arity, &bk.disambig)
|
||||
}
|
||||
};
|
||||
if better {
|
||||
best = Some((key, body));
|
||||
}
|
||||
}
|
||||
best.map(|(k, b)| (b, k.clone()))
|
||||
}
|
||||
|
||||
if let Some(map) = ctx.intra_file_bodies {
|
||||
if let Some(found) = best_candidate(map, leaf, ctx.lang) {
|
||||
return Some(found);
|
||||
}
|
||||
}
|
||||
if let Some(map) = ctx.global_summaries.and_then(|gs| gs.bodies_by_key()) {
|
||||
for (key, body) in map.iter() {
|
||||
if key.name == leaf && body.ssa.blocks.len() <= MAX_BACKWARDS_CALLEE_BLOCKS {
|
||||
return Some((body, key.clone()));
|
||||
}
|
||||
if let Some(found) = best_candidate(map, leaf, ctx.lang) {
|
||||
return Some(found);
|
||||
}
|
||||
}
|
||||
None
|
||||
|
|
@ -1189,4 +1263,211 @@ mod tests {
|
|||
};
|
||||
assert!(!bf.is_confirmation());
|
||||
}
|
||||
|
||||
/// Build a passthrough callee body `fn identity(p0) { return p0 }` wrapped
|
||||
/// in a [`CalleeSsaBody`] ready for [`BackwardsCtx::intra_file_bodies`].
|
||||
fn build_passthrough_callee() -> CalleeSsaBody {
|
||||
let mut cfg: Graph<NodeInfo, EdgeKind> = Graph::new();
|
||||
let p_node = cfg.add_node(NodeInfo::default());
|
||||
|
||||
let mut ssa = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Param { index: 0 },
|
||||
cfg_node: p_node,
|
||||
var_name: None,
|
||||
span: (0, 0),
|
||||
}],
|
||||
terminator: Terminator::Return(Some(SsaValue(0))),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![make_value_def(BlockId(0), p_node)],
|
||||
cfg_node_map: std::collections::HashMap::new(),
|
||||
exception_edges: Vec::new(),
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
synthetic_externals: std::collections::HashSet::new(),
|
||||
slot_scoped_assigns: std::collections::HashSet::new(),
|
||||
};
|
||||
let opt = crate::ssa::optimize_ssa(&mut ssa, &cfg, Some(Lang::JavaScript));
|
||||
CalleeSsaBody {
|
||||
ssa,
|
||||
opt,
|
||||
param_count: 1,
|
||||
node_meta: std::collections::HashMap::new(),
|
||||
body_graph: Some(cfg),
|
||||
cross_package_imports: std::sync::Arc::new(std::collections::HashMap::new()),
|
||||
}
|
||||
}
|
||||
|
||||
/// Resolving a passthrough callee (`sink(identity(tainted))`) must not be
|
||||
/// strictly lossier than leaving it unresolved. The callee's
|
||||
/// `return param` chain bottoms out at a `ReachedParam` terminal that the
|
||||
/// walk cannot map back to the caller's argument, so without the
|
||||
/// post-expansion conservative arg fanout the source is missed entirely.
|
||||
/// After the fix, the fanout walks the call argument (the `Source`) and the
|
||||
/// flow is confirmed.
|
||||
#[test]
|
||||
fn resolved_passthrough_callee_still_confirms_via_arg_fanout() {
|
||||
// Caller body: v0 = Source; v1 = identity(v0); return v1.
|
||||
let mut cfg: Graph<NodeInfo, EdgeKind> = Graph::new();
|
||||
let src_node = cfg.add_node(NodeInfo::default());
|
||||
let call_node = cfg.add_node(NodeInfo::default());
|
||||
|
||||
let ssa = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Source,
|
||||
cfg_node: src_node,
|
||||
var_name: None,
|
||||
span: (0, 0),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Call {
|
||||
callee: "identity".to_string(),
|
||||
callee_text: None,
|
||||
args: vec![smallvec![SsaValue(0)]],
|
||||
receiver: None,
|
||||
},
|
||||
cfg_node: call_node,
|
||||
var_name: None,
|
||||
span: (0, 0),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(Some(SsaValue(1))),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
make_value_def(BlockId(0), src_node),
|
||||
make_value_def(BlockId(0), call_node),
|
||||
],
|
||||
cfg_node_map: std::collections::HashMap::new(),
|
||||
exception_edges: Vec::new(),
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
synthetic_externals: std::collections::HashSet::new(),
|
||||
slot_scoped_assigns: std::collections::HashSet::new(),
|
||||
};
|
||||
|
||||
let mut bodies: HashMap<FuncKey, CalleeSsaBody> = HashMap::new();
|
||||
bodies.insert(
|
||||
FuncKey::new_function(Lang::JavaScript, "test.js", "identity", Some(1)),
|
||||
build_passthrough_callee(),
|
||||
);
|
||||
|
||||
let ctx = BackwardsCtx {
|
||||
ssa: &ssa,
|
||||
cfg: &cfg,
|
||||
lang: Lang::JavaScript,
|
||||
global_summaries: None,
|
||||
intra_file_bodies: Some(&bodies),
|
||||
depth_budget: DEFAULT_BACKWARDS_DEPTH,
|
||||
};
|
||||
let flows = analyse_sink_backwards(&ctx, SsaValue(1), call_node, Cap::all());
|
||||
assert!(
|
||||
flows.iter().any(|f| f.is_confirmation()),
|
||||
"argument-passthrough source must still be confirmed after callee expansion"
|
||||
);
|
||||
}
|
||||
|
||||
/// `resolve_callee_body` must only match same-language siblings and pick a
|
||||
/// deterministic candidate, never an arbitrary HashMap-order entry from
|
||||
/// another language.
|
||||
#[test]
|
||||
fn resolve_callee_body_filters_language_and_is_deterministic() {
|
||||
let mut bodies: HashMap<FuncKey, CalleeSsaBody> = HashMap::new();
|
||||
// Two same-leaf siblings: one in the analysed language, one not.
|
||||
bodies.insert(
|
||||
FuncKey::new_function(Lang::Python, "other.py", "process", Some(1)),
|
||||
build_passthrough_callee(),
|
||||
);
|
||||
bodies.insert(
|
||||
FuncKey::new_function(Lang::JavaScript, "a.js", "process", Some(1)),
|
||||
build_passthrough_callee(),
|
||||
);
|
||||
bodies.insert(
|
||||
FuncKey::new_function(Lang::JavaScript, "b.js", "process", Some(1)),
|
||||
build_passthrough_callee(),
|
||||
);
|
||||
|
||||
let dummy_cfg: Graph<NodeInfo, EdgeKind> = Graph::new();
|
||||
let dummy_ssa = SsaBody {
|
||||
blocks: vec![],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![],
|
||||
cfg_node_map: std::collections::HashMap::new(),
|
||||
exception_edges: Vec::new(),
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
synthetic_externals: std::collections::HashSet::new(),
|
||||
slot_scoped_assigns: std::collections::HashSet::new(),
|
||||
};
|
||||
let ctx = BackwardsCtx {
|
||||
ssa: &dummy_ssa,
|
||||
cfg: &dummy_cfg,
|
||||
lang: Lang::JavaScript,
|
||||
global_summaries: None,
|
||||
intra_file_bodies: Some(&bodies),
|
||||
depth_budget: DEFAULT_BACKWARDS_DEPTH,
|
||||
};
|
||||
|
||||
// Deterministic across repeated resolutions, and never the Python body.
|
||||
let first = resolve_callee_body(&ctx, "process", 0)
|
||||
.map(|(_, k)| k)
|
||||
.expect("a JS sibling must resolve");
|
||||
assert_eq!(
|
||||
first.lang,
|
||||
Lang::JavaScript,
|
||||
"must not match cross-language"
|
||||
);
|
||||
// Namespace tie-break is the smallest: "a.js" < "b.js".
|
||||
assert_eq!(first.namespace, "a.js");
|
||||
for _ in 0..32 {
|
||||
let again = resolve_callee_body(&ctx, "process", 0)
|
||||
.map(|(_, k)| k)
|
||||
.expect("stable resolution");
|
||||
assert_eq!(again, first, "resolution must be deterministic across runs");
|
||||
}
|
||||
}
|
||||
|
||||
/// Reserved-channel invariant (documented in the module header): the
|
||||
/// backwards transfer never writes the predicate-accumulation bits, so a
|
||||
/// real walk can never set `infeasible`. Guards against the dead
|
||||
/// infeasibility channel silently coming alive (or the docs drifting from
|
||||
/// the implementation) without a corresponding transfer change.
|
||||
#[test]
|
||||
fn infeasibility_channel_is_inert() {
|
||||
// backward_transfer leaves the demand's predicate bits untouched.
|
||||
let (ssa, _cfg) = build_trivial_source_body();
|
||||
let demand = DemandState::new(Cap::all());
|
||||
let (_step, next) = backward_transfer(&ssa, SsaValue(1), &demand);
|
||||
for (_v, d) in &next {
|
||||
assert_eq!(d.validated_true, 0, "validated_true must never be written");
|
||||
assert_eq!(
|
||||
d.validated_false, 0,
|
||||
"validated_false must never be written"
|
||||
);
|
||||
}
|
||||
|
||||
// A full driver walk over a source→sink body produces a confirmation,
|
||||
// never an infeasible flow.
|
||||
let ctx = BackwardsCtx::new(&ssa, &_cfg, Lang::JavaScript);
|
||||
let flows = analyse_sink_backwards(&ctx, SsaValue(1), NodeIndex::new(1), Cap::all());
|
||||
assert!(
|
||||
!flows.iter().any(|f| f.infeasible),
|
||||
"no production flow may set infeasible (channel is reserved)"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
168
src/taint/mod.rs
168
src/taint/mod.rs
|
|
@ -822,6 +822,51 @@ fn containment_order(bodies: &[BodyCfg]) -> Vec<usize> {
|
|||
order
|
||||
}
|
||||
|
||||
/// Build the lexical-scope seed map for a non-toplevel body.
|
||||
///
|
||||
/// A body's `global_seed` ancestor lookup
|
||||
/// ([`ssa_transfer`] `SsaOp::Param` handling) reads its captured names in
|
||||
/// ancestor order: the direct `parent_body_id` first, then `BodyId(0)` to
|
||||
/// pick up the JS/TS pass-2 re-keyed top-level globals (see
|
||||
/// [`ssa_transfer::filter_seed_to_toplevel`]). But `body_exit_states`
|
||||
/// keys every exit entry under the owning body's id, so a parent's exit
|
||||
/// map never contains `BodyId(0)` entries. For a body nested two or more
|
||||
/// levels deep (its parent is itself a non-toplevel body) the `BodyId(0)`
|
||||
/// leg of the ancestor lookup would therefore find nothing — a global
|
||||
/// written by a sibling top-level function and read inside a nested
|
||||
/// closure would be silently missed.
|
||||
///
|
||||
/// This helper restores that leg: when `parent_id` is a *non*-toplevel
|
||||
/// body, it returns a merged map = parent's exit ∪ the `BodyId(0)`
|
||||
/// top-level seed (the converged globals, already keyed under
|
||||
/// `BodyId(0)`). When the parent *is* the top level (`BodyId(0)`), the
|
||||
/// parent's exit already carries the `BodyId(0)` entries, so the borrowed
|
||||
/// parent map is returned unchanged with no clone.
|
||||
fn seed_for_nested_body<'a>(
|
||||
parent_id: BodyId,
|
||||
body_exit_states: &'a HashMap<
|
||||
BodyId,
|
||||
HashMap<ssa_transfer::BindingKey, crate::taint::domain::VarTaint>,
|
||||
>,
|
||||
) -> Option<std::borrow::Cow<'a, HashMap<ssa_transfer::BindingKey, crate::taint::domain::VarTaint>>>
|
||||
{
|
||||
let parent_exit = body_exit_states.get(&parent_id);
|
||||
if parent_id == BodyId(0) {
|
||||
// Depth-1 body: the parent IS the top level, so its exit already
|
||||
// carries the BodyId(0) entries the ancestor lookup wants.
|
||||
return parent_exit.map(std::borrow::Cow::Borrowed);
|
||||
}
|
||||
// Depth ≥ 2 body: merge the direct parent's exit with the converged
|
||||
// top-level seed so the ancestor lookup's BodyId(0) leg is live.
|
||||
let toplevel_seed = body_exit_states.get(&BodyId(0));
|
||||
match (parent_exit, toplevel_seed) {
|
||||
(Some(p), Some(t)) => Some(std::borrow::Cow::Owned(ssa_transfer::join_seed_maps(p, t))),
|
||||
(Some(p), None) => Some(std::borrow::Cow::Borrowed(p)),
|
||||
(None, Some(t)) => Some(std::borrow::Cow::Borrowed(t)),
|
||||
(None, None) => None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Build a `var_name → TypeKind` map from a body's optimised SSA + type-fact
|
||||
/// result. Used by [`analyse_multi_body`] to forward closure-captured types
|
||||
/// from a parent body into its children, so that bound-variable receiver
|
||||
|
|
@ -1455,11 +1500,16 @@ fn analyse_multi_body(
|
|||
// ── Pass 1: lexical containment propagation ──────────────────────
|
||||
for &idx in &order {
|
||||
let body = &file_cfg.bodies[idx];
|
||||
// Determine seed from parent body's exit state.
|
||||
let parent_seed = body
|
||||
// Determine seed from parent body's exit state. For bodies nested
|
||||
// two or more levels deep, merge in the top-level (`BodyId(0)`)
|
||||
// seed so the `global_seed` ancestor lookup's `BodyId(0)` leg can
|
||||
// reach globals written by sibling top-level functions (see
|
||||
// [`seed_for_nested_body`]).
|
||||
let parent_seed_owned = body
|
||||
.meta
|
||||
.parent_body_id
|
||||
.and_then(|pid| body_exit_states.get(&pid));
|
||||
.and_then(|pid| seed_for_nested_body(pid, &body_exit_states));
|
||||
let parent_seed = parent_seed_owned.as_deref();
|
||||
let parent_var_types = body
|
||||
.meta
|
||||
.parent_body_id
|
||||
|
|
@ -1648,15 +1698,40 @@ fn analyse_multi_body(
|
|||
// changed, so re-analysis would produce byte-identical
|
||||
// output. The cached findings from the previous
|
||||
// round (or pass-1) remain correct.
|
||||
if let Some(reads) = body_reads.get(&body.meta.id) {
|
||||
if reads.is_disjoint(&changed_names) {
|
||||
continue;
|
||||
//
|
||||
// Restricted to depth-1 bodies (direct children of the top
|
||||
// level). `changed_names` is derived purely from the
|
||||
// *top-level* seed delta, and `body_reads` is the body's
|
||||
// own `info.taint.uses` names. For a depth-1 body the
|
||||
// entire inbound channel is the top-level seed, so the
|
||||
// disjointness test is sound. A body nested two or more
|
||||
// levels deep also consumes its parent's exit, which can
|
||||
// carry a parent-local derived from a changed global (e.g.
|
||||
// `reader(){ const local = 'x'+g; child(){ exec(local) } }`):
|
||||
// its read-set `{local, exec}` is disjoint from the
|
||||
// top-level change set `{g}`, so skipping it would miss a
|
||||
// real flow. These bodies are cheap relative to the
|
||||
// soundness cost, so we always re-run them.
|
||||
let is_depth1 = body.meta.parent_body_id == Some(BodyId(0));
|
||||
if is_depth1 {
|
||||
if let Some(reads) = body_reads.get(&body.meta.id) {
|
||||
if reads.is_disjoint(&changed_names) {
|
||||
continue;
|
||||
}
|
||||
}
|
||||
}
|
||||
let parent_seed = body
|
||||
// For nested (depth ≥ 2) bodies, merge the top-level
|
||||
// (`BodyId(0)`) seed into the direct parent's exit so the
|
||||
// `global_seed` ancestor lookup's `BodyId(0)` leg can reach
|
||||
// the converged globals (see [`seed_for_nested_body`]).
|
||||
// `body_exit_states[BodyId(0)]` holds the freshest
|
||||
// `current_seed` (updated above, and per-body under
|
||||
// Gauss-Seidel).
|
||||
let parent_seed_owned = body
|
||||
.meta
|
||||
.parent_body_id
|
||||
.and_then(|pid| body_exit_states.get(&pid));
|
||||
.and_then(|pid| seed_for_nested_body(pid, &body_exit_states));
|
||||
let parent_seed = parent_seed_owned.as_deref();
|
||||
let parent_var_types = body
|
||||
.meta
|
||||
.parent_body_id
|
||||
|
|
@ -2957,5 +3032,82 @@ pub(crate) fn build_eligible_bodies(
|
|||
eligible_bodies
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod seed_threading_tests {
|
||||
use super::*;
|
||||
use crate::labels::Cap;
|
||||
use crate::taint::domain::VarTaint;
|
||||
use ssa_transfer::BindingKey;
|
||||
use std::collections::HashMap;
|
||||
|
||||
fn tainted() -> VarTaint {
|
||||
VarTaint {
|
||||
caps: Cap::all(),
|
||||
origins: smallvec::SmallVec::new(),
|
||||
uses_summary: true,
|
||||
}
|
||||
}
|
||||
|
||||
/// Depth-1 body (parent == top level): the parent's exit already
|
||||
/// carries the `BodyId(0)` entries, so the helper hands it back
|
||||
/// borrowed with no merge.
|
||||
#[test]
|
||||
fn depth1_returns_parent_exit_unchanged() {
|
||||
let mut exits: HashMap<BodyId, HashMap<BindingKey, VarTaint>> = HashMap::new();
|
||||
let mut top = HashMap::new();
|
||||
top.insert(BindingKey::new("g", BodyId(0)), tainted());
|
||||
exits.insert(BodyId(0), top);
|
||||
|
||||
let seed = seed_for_nested_body(BodyId(0), &exits).expect("seed present");
|
||||
// Borrowed, single BodyId(0) entry.
|
||||
assert!(matches!(seed, std::borrow::Cow::Borrowed(_)));
|
||||
assert!(seed.contains_key(&BindingKey::new("g", BodyId(0))));
|
||||
assert_eq!(seed.len(), 1);
|
||||
}
|
||||
|
||||
/// Depth-2 grandchild (parent == BodyId(2), itself non-toplevel):
|
||||
/// the helper merges the direct parent's exit with the converged
|
||||
/// `BodyId(0)` top-level seed so the ancestor lookup's `BodyId(0)`
|
||||
/// leg can reach the sibling-written global. This is the finding
|
||||
/// #19 regression guard: without the merge, the returned map would
|
||||
/// have no `BodyId(0)` entry and the grandchild would miss `g`.
|
||||
#[test]
|
||||
fn depth2_merges_toplevel_seed() {
|
||||
let mut exits: HashMap<BodyId, HashMap<BindingKey, VarTaint>> = HashMap::new();
|
||||
// Top-level converged seed: global `g` is tainted.
|
||||
let mut top = HashMap::new();
|
||||
top.insert(BindingKey::new("g", BodyId(0)), tainted());
|
||||
exits.insert(BodyId(0), top);
|
||||
// Direct parent (reader, BodyId(2)) exports a parent-local.
|
||||
let mut parent = HashMap::new();
|
||||
parent.insert(BindingKey::new("local", BodyId(2)), tainted());
|
||||
exits.insert(BodyId(2), parent);
|
||||
|
||||
let seed = seed_for_nested_body(BodyId(2), &exits).expect("seed present");
|
||||
// Owned merge of both maps.
|
||||
assert!(matches!(seed, std::borrow::Cow::Owned(_)));
|
||||
// Parent-local survives (parent leg of ancestor lookup).
|
||||
assert!(seed.contains_key(&BindingKey::new("local", BodyId(2))));
|
||||
// Converged global survives under BodyId(0) (the previously-dead
|
||||
// ancestor leg for grandchildren).
|
||||
assert!(seed.contains_key(&BindingKey::new("g", BodyId(0))));
|
||||
}
|
||||
|
||||
/// Depth-2 body whose parent exported nothing still receives the
|
||||
/// `BodyId(0)` top-level seed — borrowed directly, no clone.
|
||||
#[test]
|
||||
fn depth2_empty_parent_falls_back_to_toplevel() {
|
||||
let mut exits: HashMap<BodyId, HashMap<BindingKey, VarTaint>> = HashMap::new();
|
||||
let mut top = HashMap::new();
|
||||
top.insert(BindingKey::new("g", BodyId(0)), tainted());
|
||||
exits.insert(BodyId(0), top);
|
||||
// No entry for BodyId(2) at all (parent produced no exit state).
|
||||
|
||||
let seed = seed_for_nested_body(BodyId(2), &exits).expect("seed present");
|
||||
assert!(matches!(seed, std::borrow::Cow::Borrowed(_)));
|
||||
assert!(seed.contains_key(&BindingKey::new("g", BodyId(0))));
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests;
|
||||
|
|
|
|||
|
|
@ -619,6 +619,95 @@ fn parse_leading_uint(s: &str) -> Option<u64> {
|
|||
any.then_some(n)
|
||||
}
|
||||
|
||||
/// Detect a substring-REJECTION idiom dressed up as a membership method:
|
||||
/// `x.includes("..")`, `x.contains("<script>")`, `x.indexOf("..")`, etc., where
|
||||
/// the needle is a **string literal**.
|
||||
///
|
||||
/// Genuine allowlist membership has the form `ALLOWED.includes(value)`, the
|
||||
/// argument is the variable under test and the TRUE branch proves membership.
|
||||
/// When the argument is a string literal, the receiver is the value under test
|
||||
/// and the call asks "does this user string contain a dangerous substring?",
|
||||
/// the TRUE branch is the dangerous/reject path, not a validated path.
|
||||
///
|
||||
/// Returning `true` keeps such conditions out of [`PredicateKind::AllowlistCheck`]
|
||||
/// (which marks every condition var validated on the TRUE branch with the wrong
|
||||
/// polarity). The shell-metachar form is handled earlier by
|
||||
/// [`is_shell_metachar_rejection`]; this covers the broader literal-needle case
|
||||
/// (`..`, `<script>`, etc.) that the shell-metachar carve-out deliberately
|
||||
/// excludes. Conservative: only fires when the first argument parses as a
|
||||
/// string literal, so `ALLOWED.includes(value)` (identifier arg) is untouched.
|
||||
fn is_literal_needle_membership(text: &str) -> bool {
|
||||
let lower = text.to_ascii_lowercase();
|
||||
for method in [
|
||||
".includes(",
|
||||
".include?(",
|
||||
".contains(",
|
||||
".indexof(",
|
||||
".has(",
|
||||
] {
|
||||
if let Some(idx) = lower.find(method) {
|
||||
let args_start = idx + method.len();
|
||||
// Index into the original (case-preserving) text so quoted needle
|
||||
// characters stay accurate; the byte offset matches because the
|
||||
// method tokens are ASCII.
|
||||
if extract_first_string_arg(&text[args_start..]).is_some() {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
}
|
||||
false
|
||||
}
|
||||
|
||||
/// Detect the *negated* `indexOf`/`search`/`find` membership idiom whose TRUE
|
||||
/// branch is the NOT-in-list (reject) path:
|
||||
///
|
||||
/// * `ALLOWED.indexOf(x) === -1` / `== -1` (JS/TS — not found)
|
||||
/// * `ALLOWED.indexOf(x) < 0` (not found)
|
||||
/// * `s.find(x) == -1` (Python `str.find` / C++ `std::string::find` use
|
||||
/// `npos`/`-1` for absent; `< 0` covers the common int form)
|
||||
///
|
||||
/// Genuine allowlist membership classifies as [`PredicateKind::AllowlistCheck`]
|
||||
/// whose generic mechanic marks the receiver/arg validated on the TRUE branch.
|
||||
/// That polarity is only correct for the *positive* form
|
||||
/// (`indexOf(x) !== -1` / `>= 0` — found ⇒ in list ⇒ validated). For the
|
||||
/// `=== -1` / `< 0` form the TRUE branch means NOT-in-list, so marking the var
|
||||
/// validated there is inverted: it suppresses a genuine finding on the
|
||||
/// reject-then-sink shape and an FP on the correctly-guarded `=== -1; return`
|
||||
/// shape. The polarity-inversion machinery in `apply_branch_predicates`
|
||||
/// (mod.rs) only flips for Python `not in` / TypeCheck `!=`, not for indexOf
|
||||
/// result comparisons, so we conservatively drop these to
|
||||
/// [`PredicateKind::Unknown`] — neither branch is over-validated and the sink
|
||||
/// finding survives. The positive `!== -1` / `>= 0` form is intentionally left
|
||||
/// to fall through to `AllowlistCheck`, where the existing polarity is correct.
|
||||
fn is_negated_indexof_membership(text: &str) -> bool {
|
||||
let lower = text.to_ascii_lowercase();
|
||||
// Require an index-of / find-style search method whose result is being
|
||||
// compared. `.indexof(` covers JS/TS `indexOf` and Java `indexOf`;
|
||||
// `.find(` / `.search(` cover Python/C++/JS string searches.
|
||||
let has_index_search =
|
||||
lower.contains(".indexof(") || lower.contains(".search(") || lower.contains(".find(");
|
||||
if !has_index_search {
|
||||
return false;
|
||||
}
|
||||
// Strip whitespace so spacing variants (`=== -1`, `===-1`, `< 0`) collapse
|
||||
// to a single canonical form.
|
||||
let compact: String = lower.chars().filter(|c| !c.is_whitespace()).collect();
|
||||
// Positive (found ⇒ in list) forms keep correct AllowlistCheck polarity —
|
||||
// do NOT claim them here.
|
||||
if compact.contains("!==-1")
|
||||
|| compact.contains("!=-1")
|
||||
|| compact.contains(">=0")
|
||||
|| compact.contains(">-1")
|
||||
{
|
||||
return false;
|
||||
}
|
||||
// Negated (not-found ⇒ reject) forms: inverted polarity, drop to Unknown.
|
||||
compact.contains("===-1")
|
||||
|| compact.contains("==-1")
|
||||
|| compact.contains("<0")
|
||||
|| compact.contains("<=-1")
|
||||
}
|
||||
|
||||
/// Classify a raw condition text into a [`PredicateKind`].
|
||||
///
|
||||
/// # Rules
|
||||
|
|
@ -710,6 +799,35 @@ pub fn classify_condition(text: &str) -> PredicateKind {
|
|||
return PredicateKind::HostAllowlistValidated;
|
||||
}
|
||||
|
||||
// ── Substring-REJECTION with a literal needle (not an allowlist) ─────
|
||||
//
|
||||
// `x.includes("..")` / `x.contains("<script>")` / `x.indexOf("..")` test
|
||||
// the *receiver* against a fixed literal — a rejection idiom whose TRUE
|
||||
// branch is the dangerous path. Classifying these as `AllowlistCheck`
|
||||
// (below) would mark the receiver validated on the TRUE branch with the
|
||||
// wrong polarity, silencing a genuine finding. Drop to `Unknown` so
|
||||
// neither branch is over-validated and the sink finding survives.
|
||||
// (The shell-metachar form was already caught earlier; this covers the
|
||||
// broader literal-needle case.)
|
||||
if is_literal_needle_membership(text) {
|
||||
return PredicateKind::Unknown;
|
||||
}
|
||||
|
||||
// ── Negated indexOf/find membership (inverted polarity) ──────────────
|
||||
//
|
||||
// `ALLOWED.indexOf(x) === -1` / `< 0` means NOT-in-list, so the TRUE
|
||||
// branch is the reject path. Classifying as `AllowlistCheck` (below)
|
||||
// would mark `x` validated on the TRUE branch with the wrong polarity —
|
||||
// suppressing a genuine finding on `if (!FOUND) sink(x)` and producing an
|
||||
// FP on the correctly-guarded `if (!FOUND) return; sink(x)` shape. The
|
||||
// polarity-inversion machinery does not cover indexOf result comparisons,
|
||||
// so drop to `Unknown` and let neither branch over-validate. The positive
|
||||
// `!== -1` / `>= 0` form is excluded by `is_negated_indexof_membership`
|
||||
// and still classifies as `AllowlistCheck` with correct polarity.
|
||||
if is_negated_indexof_membership(text) {
|
||||
return PredicateKind::Unknown;
|
||||
}
|
||||
|
||||
// ── Allowlist / membership checks ────────────────────────────────────
|
||||
if lower.contains(".includes(")
|
||||
|| lower.contains(".include?(")
|
||||
|
|
@ -909,12 +1027,20 @@ pub fn classify_condition_with_target(text: &str) -> (PredicateKind, Option<Stri
|
|||
|
||||
match kind {
|
||||
PredicateKind::ValidationCall | PredicateKind::SanitizerCall => {
|
||||
if let Some(target) = extract_validation_target(text) {
|
||||
(kind, Some(target))
|
||||
} else if count_call_args(text).map(|n| n > 1).unwrap_or(false) {
|
||||
(PredicateKind::Unknown, None)
|
||||
} else {
|
||||
(kind, None)
|
||||
// A dotted target (`a.field`, `req.body.x`) is a member expression.
|
||||
// `condition_vars` only ever contains bare identifier tokens
|
||||
// (collect_idents pushes single identifiers), so a dotted target
|
||||
// can never match and the consumer falls back to validating EVERY
|
||||
// condition var, the exact over-validation the multi-arg Unknown
|
||||
// guard was written to prevent. Treat dotted extraction the same
|
||||
// as a failed extraction so multi-arg validators degrade to
|
||||
// Unknown instead of silently validating unrelated arguments.
|
||||
match extract_validation_target(text) {
|
||||
Some(target) if !target.contains('.') => (kind, Some(target)),
|
||||
_ if count_call_args(text).map(|n| n > 1).unwrap_or(false) => {
|
||||
(PredicateKind::Unknown, None)
|
||||
}
|
||||
_ => (kind, None),
|
||||
}
|
||||
}
|
||||
PredicateKind::AllowlistCheck => {
|
||||
|
|
@ -1704,6 +1830,47 @@ mod tests {
|
|||
assert_eq!(target, None);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn target_multi_arg_dotted_first_arg_is_unknown() {
|
||||
// `validate(a.field, limit)` extracts the dotted target `a.field`,
|
||||
// which can never match a bare-identifier condition var, so the
|
||||
// consumer would fall back to validating EVERY condition var
|
||||
// (including `limit`, which the validator never inspected). A dotted
|
||||
// multi-arg target must degrade to Unknown, exactly like a failed
|
||||
// extraction, so no unrelated argument is wrongly validated.
|
||||
let (kind, target) = classify_condition_with_target("validate(a.field, limit)");
|
||||
assert_eq!(kind, PredicateKind::Unknown);
|
||||
assert_eq!(target, None);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn target_multi_arg_dotted_receiver_is_unknown() {
|
||||
// `req.session.verify(sig, opts)` resolves to the dotted receiver
|
||||
// `req.session`; same non-matching-target hazard on a multi-arg call.
|
||||
let (kind, target) = classify_condition_with_target("req.session.verify(sig, opts)");
|
||||
assert_eq!(kind, PredicateKind::Unknown);
|
||||
assert_eq!(target, None);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn target_single_arg_dotted_preserves_kind_no_dotted_target() {
|
||||
// Single-arg dotted-receiver validator: over-validation of all
|
||||
// condition vars is pre-existing/intentional for single-arg calls, but
|
||||
// the returned target must NOT be a dotted member expression (which
|
||||
// can never match condition_vars). Preserve the kind with None.
|
||||
let (kind, target) = classify_condition_with_target("req.session.verify(sig)");
|
||||
assert_eq!(kind, PredicateKind::ValidationCall);
|
||||
assert_eq!(target, None);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn target_bare_identifier_still_extracted() {
|
||||
// Regression guard: a bare-identifier multi-arg target still narrows.
|
||||
let (kind, target) = classify_condition_with_target("validate(x, limit)");
|
||||
assert_eq!(kind, PredicateKind::ValidationCall);
|
||||
assert_eq!(target.as_deref(), Some("x"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn count_call_args_basic() {
|
||||
assert_eq!(super::count_call_args("f(a, b, c)"), Some(3));
|
||||
|
|
@ -1779,6 +1946,109 @@ mod tests {
|
|||
);
|
||||
}
|
||||
|
||||
// ── Literal-needle substring rejection is NOT an allowlist ─────────
|
||||
//
|
||||
// `x.includes("..")` / `x.contains("<script>")` test the receiver
|
||||
// against a literal needle (a rejection idiom whose TRUE branch is the
|
||||
// dangerous path), so they must NOT classify as AllowlistCheck (which
|
||||
// would mark `x` validated on the TRUE branch with inverted polarity and
|
||||
// silence the finding). They drop to Unknown so neither branch is
|
||||
// over-validated.
|
||||
#[test]
|
||||
fn classify_literal_needle_dotdot_not_allowlist() {
|
||||
assert_eq!(
|
||||
classify_condition("p.includes(\"..\")"),
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("p.contains(\"..\")"),
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_literal_needle_html_not_allowlist() {
|
||||
assert_eq!(
|
||||
classify_condition("input.includes(\"<script>\")"),
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_literal_needle_indexof_not_allowlist() {
|
||||
assert_eq!(
|
||||
classify_condition("path.indexOf(\"..\")"),
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
}
|
||||
|
||||
// ── Negated indexOf membership has inverted polarity ──────────────
|
||||
//
|
||||
// `ALLOWED.indexOf(x) === -1` / `< 0` means NOT-in-list, so the TRUE
|
||||
// branch is the reject path. Must NOT classify as AllowlistCheck (which
|
||||
// would mark `x` validated on the TRUE branch with inverted polarity,
|
||||
// silencing a real reject-then-sink finding and producing an FP on the
|
||||
// correctly-guarded `=== -1; return; sink` shape). Drop to Unknown.
|
||||
#[test]
|
||||
fn classify_negated_indexof_eq_minus_one_not_allowlist() {
|
||||
assert_eq!(
|
||||
classify_condition("ALLOWED.indexOf(cmd) === -1"),
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("ALLOWED.indexOf(cmd) == -1"),
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_negated_indexof_lt_zero_not_allowlist() {
|
||||
assert_eq!(
|
||||
classify_condition("allowed.indexOf(x) < 0"),
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_positive_indexof_membership_stays_allowlist() {
|
||||
// `!== -1` / `>= 0` mean found ⇒ in list ⇒ TRUE branch is validated.
|
||||
// The existing AllowlistCheck polarity is correct here, so these must
|
||||
// remain AllowlistCheck (NOT dropped to Unknown).
|
||||
assert_eq!(
|
||||
classify_condition("ALLOWED.indexOf(cmd) !== -1"),
|
||||
PredicateKind::AllowlistCheck
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("ALLOWED.indexOf(cmd) >= 0"),
|
||||
PredicateKind::AllowlistCheck
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_genuine_allowlist_identifier_arg_unchanged() {
|
||||
// Identifier argument (the value under test) is a real membership
|
||||
// check and must remain AllowlistCheck.
|
||||
assert_eq!(
|
||||
classify_condition("ALLOWED.includes(cmd)"),
|
||||
PredicateKind::AllowlistCheck
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("whitelist.contains(value)"),
|
||||
PredicateKind::AllowlistCheck
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_literal_needle_shell_metachar_still_inverted() {
|
||||
// A shell-metachar literal needle is caught by the earlier
|
||||
// shell-metachar branch (inverted-polarity SHELL_ESCAPE clear) and
|
||||
// must NOT be swallowed by the literal-needle carve-out.
|
||||
assert_eq!(
|
||||
classify_condition("cmd.includes(\";\")"),
|
||||
PredicateKind::ShellMetaValidated
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extract_allowlist_target_negated_paren_wrapper() {
|
||||
// Tree-sitter records the if-condition as `(!ALLOWED.includes(cmd))`,
|
||||
|
|
@ -2067,20 +2337,29 @@ mod tests {
|
|||
}
|
||||
|
||||
#[test]
|
||||
fn classify_non_metachar_contains_stays_allowlist() {
|
||||
fn classify_non_metachar_contains_is_unknown_not_allowlist() {
|
||||
// `x.contains("foo")` must NOT be credited as a shell-metachar
|
||||
// rejection. It falls back to the existing AllowlistCheck behavior.
|
||||
// rejection, AND must NOT be classified as `AllowlistCheck`: the
|
||||
// argument is a string literal, so this is a substring presence/
|
||||
// rejection test on the receiver, not membership of the receiver in a
|
||||
// collection. Classifying it as AllowlistCheck would mark the
|
||||
// receiver validated on the TRUE branch with inverted polarity and
|
||||
// silence a genuine finding (e.g. `if (path.contains("..")) sink(path)`).
|
||||
// The classifier degrades to Unknown so neither branch is
|
||||
// over-validated. (guards.rs already excluded these from structural
|
||||
// `Cap::all()` dominator guards via the missing allowlist target, so
|
||||
// cfg-unguarded-sink suppression is unaffected by this change.)
|
||||
assert_eq!(
|
||||
classify_condition("input.contains(\"foo\")"),
|
||||
PredicateKind::AllowlistCheck
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("path.contains(\"..\")"),
|
||||
PredicateKind::AllowlistCheck
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
assert_eq!(
|
||||
classify_condition("name.contains(\"admin\")"),
|
||||
PredicateKind::AllowlistCheck
|
||||
PredicateKind::Unknown
|
||||
);
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -156,6 +156,17 @@ pub(super) fn reconstruct_flow_path(
|
|||
current = pick_tainted_operand(&vals, origin, ssa);
|
||||
continue;
|
||||
}
|
||||
// NOTE: deliberately NOT continuing through
|
||||
// `SsaOp::FieldProj` in this dedup arm. Walking the
|
||||
// receiver chain of a member-access source
|
||||
// (`req.body.path`) advances the backward walk past the
|
||||
// meaningful taint-read node (the `.body.path` access) all
|
||||
// the way to the bare parameter (`req`), regressing the
|
||||
// reported source line from the member access (line 9) to
|
||||
// the function signature (line 8). The recall corpus
|
||||
// (`tests/recall_gaps.rs` fs_promises_*) and
|
||||
// `real_world_tests` pin the member-access source line, so
|
||||
// the FieldProj node terminates the walk here.
|
||||
_ => break,
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1510,35 +1510,7 @@ fn apply_branch_predicates(
|
|||
// (XSS / SQLi / FILE_IO) downstream still fire on residual taint.
|
||||
if kind == PredicateKind::RelativeUrlValidated && polarity {
|
||||
for var in condition_vars {
|
||||
let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new();
|
||||
for (val, _) in state.values.iter() {
|
||||
if let Some(name) = ssa
|
||||
.value_defs
|
||||
.get(val.0 as usize)
|
||||
.and_then(|vd| vd.var_name.as_deref())
|
||||
{
|
||||
if name == var {
|
||||
to_clear.push(*val);
|
||||
}
|
||||
}
|
||||
}
|
||||
for val in to_clear {
|
||||
if let Some(taint) = state.get(val).cloned() {
|
||||
let new_caps = taint.caps & !Cap::OPEN_REDIRECT;
|
||||
if new_caps.is_empty() {
|
||||
state.remove(val);
|
||||
} else {
|
||||
state.set(
|
||||
val,
|
||||
VarTaint {
|
||||
caps: new_caps,
|
||||
origins: taint.origins,
|
||||
uses_summary: taint.uses_summary,
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
clear_cap_alias_aware(state, var, Cap::OPEN_REDIRECT, ssa, base_aliases);
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -1551,35 +1523,7 @@ fn apply_branch_predicates(
|
|||
// inline leading-slash check).
|
||||
if kind == PredicateKind::HostAllowlistValidated && polarity {
|
||||
for var in condition_vars {
|
||||
let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new();
|
||||
for (val, _) in state.values.iter() {
|
||||
if let Some(name) = ssa
|
||||
.value_defs
|
||||
.get(val.0 as usize)
|
||||
.and_then(|vd| vd.var_name.as_deref())
|
||||
{
|
||||
if name == var {
|
||||
to_clear.push(*val);
|
||||
}
|
||||
}
|
||||
}
|
||||
for val in to_clear {
|
||||
if let Some(taint) = state.get(val).cloned() {
|
||||
let new_caps = taint.caps & !Cap::OPEN_REDIRECT;
|
||||
if new_caps.is_empty() {
|
||||
state.remove(val);
|
||||
} else {
|
||||
state.set(
|
||||
val,
|
||||
VarTaint {
|
||||
caps: new_caps,
|
||||
origins: taint.origins,
|
||||
uses_summary: taint.uses_summary,
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
clear_cap_alias_aware(state, var, Cap::OPEN_REDIRECT, ssa, base_aliases);
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -1594,43 +1538,7 @@ fn apply_branch_predicates(
|
|||
// taint caps.
|
||||
if kind == PredicateKind::ShellMetaValidated && !polarity {
|
||||
for var in condition_vars {
|
||||
let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new();
|
||||
let mut names: SmallVec<[&str; 4]> = smallvec::smallvec![var.as_str()];
|
||||
if let Some(aliases) = base_aliases.and_then(|aliases| aliases.aliases_of(var)) {
|
||||
for alias in aliases {
|
||||
if alias != var {
|
||||
names.push(alias.as_str());
|
||||
}
|
||||
}
|
||||
}
|
||||
for &name_to_clear in names.iter() {
|
||||
for (idx, def) in ssa.value_defs.iter().enumerate() {
|
||||
if def.var_name.as_deref() == Some(name_to_clear) {
|
||||
let val = SsaValue(idx as u32);
|
||||
to_clear.push(val);
|
||||
collect_copy_alias_operands(val, ssa, &mut to_clear);
|
||||
}
|
||||
}
|
||||
}
|
||||
to_clear.sort_by_key(|v| v.0);
|
||||
to_clear.dedup_by_key(|v| v.0);
|
||||
for val in to_clear {
|
||||
if let Some(taint) = state.get(val).cloned() {
|
||||
let new_caps = taint.caps & !Cap::SHELL_ESCAPE;
|
||||
if new_caps.is_empty() {
|
||||
state.remove(val);
|
||||
} else {
|
||||
state.set(
|
||||
val,
|
||||
VarTaint {
|
||||
caps: new_caps,
|
||||
origins: taint.origins,
|
||||
uses_summary: taint.uses_summary,
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
clear_cap_alias_aware(state, var, Cap::SHELL_ESCAPE, ssa, base_aliases);
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -1658,6 +1566,71 @@ fn apply_branch_predicates(
|
|||
}
|
||||
}
|
||||
|
||||
/// Clear `cap` from the taint of `var` and every SSA value that aliases it
|
||||
/// on a validated branch.
|
||||
///
|
||||
/// "Aliases" covers three sources, matching the (formerly inline)
|
||||
/// `ShellMetaValidated` handler:
|
||||
///
|
||||
/// 1. SSA values whose `var_name` equals `var`.
|
||||
/// 2. Base aliases of `var` from copy propagation
|
||||
/// (`base_aliases.aliases_of(var)`), so a value renamed away by
|
||||
/// `copy_propagate` (`const t = url;`) is still reached when the
|
||||
/// condition or sink references the other name.
|
||||
/// 3. Transitive copy-chain operands (`collect_copy_alias_operands`) for
|
||||
/// Assign/Phi copies the optimizer did not propagate away.
|
||||
///
|
||||
/// The two URL arms (`RelativeUrlValidated` / `HostAllowlistValidated`)
|
||||
/// previously matched only source 1 against `state.values`, missing the
|
||||
/// `const t = url; if (t.startsWith("/")) { res.redirect(url) }` alias shape
|
||||
/// the shell arm handled, so they fired false positives on the validated
|
||||
/// branch. Sharing this helper unifies all three.
|
||||
fn clear_cap_alias_aware(
|
||||
state: &mut SsaTaintState,
|
||||
var: &str,
|
||||
cap: Cap,
|
||||
ssa: &SsaBody,
|
||||
base_aliases: Option<&crate::ssa::alias::BaseAliasResult>,
|
||||
) {
|
||||
let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new();
|
||||
let mut names: SmallVec<[&str; 4]> = smallvec::smallvec![var];
|
||||
if let Some(aliases) = base_aliases.and_then(|aliases| aliases.aliases_of(var)) {
|
||||
for alias in aliases {
|
||||
if alias.as_str() != var {
|
||||
names.push(alias.as_str());
|
||||
}
|
||||
}
|
||||
}
|
||||
for &name_to_clear in names.iter() {
|
||||
for (idx, def) in ssa.value_defs.iter().enumerate() {
|
||||
if def.var_name.as_deref() == Some(name_to_clear) {
|
||||
let val = SsaValue(idx as u32);
|
||||
to_clear.push(val);
|
||||
collect_copy_alias_operands(val, ssa, &mut to_clear);
|
||||
}
|
||||
}
|
||||
}
|
||||
to_clear.sort_by_key(|v| v.0);
|
||||
to_clear.dedup_by_key(|v| v.0);
|
||||
for val in to_clear {
|
||||
if let Some(taint) = state.get(val).cloned() {
|
||||
let new_caps = taint.caps & !cap;
|
||||
if new_caps.is_empty() {
|
||||
state.remove(val);
|
||||
} else {
|
||||
state.set(
|
||||
val,
|
||||
VarTaint {
|
||||
caps: new_caps,
|
||||
origins: taint.origins,
|
||||
uses_summary: taint.uses_summary,
|
||||
},
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn collect_copy_alias_operands(root: SsaValue, ssa: &SsaBody, out: &mut SmallVec<[SsaValue; 4]>) {
|
||||
let mut seen = HashSet::new();
|
||||
let mut stack = vec![root];
|
||||
|
|
@ -1685,6 +1658,36 @@ fn collect_copy_alias_operands(root: SsaValue, ssa: &SsaBody, out: &mut SmallVec
|
|||
}
|
||||
}
|
||||
|
||||
/// Decide whether the "null / none / no-error" outcome is the condition's
|
||||
/// TRUE branch from the lowered condition text.
|
||||
///
|
||||
/// This must distinguish equality (`err == null` → null branch is TRUE)
|
||||
/// from strict/loose inequality (`err !== null` / `err != null` → null
|
||||
/// branch is the FALSE branch). A naive `contains("== null")` is wrong
|
||||
/// because `"err !== null"` contains `"== null"` as a substring (the `==`
|
||||
/// of `!==` followed by ` null`), which would mark validation on the wrong
|
||||
/// branch. We therefore reject any inequality form first.
|
||||
fn null_check_true_branch_is_success(lower: &str) -> bool {
|
||||
// Inequality forms put the null/none outcome on the FALSE branch.
|
||||
// Strip them before testing equality so the `==` inside `!==`/`!=`
|
||||
// cannot be mistaken for an equality comparison.
|
||||
if lower.contains("!= nil")
|
||||
|| lower.contains("!= none")
|
||||
|| lower.contains("is not none")
|
||||
|| lower.contains("is_err")
|
||||
|| lower.contains("!== null")
|
||||
|| lower.contains("!= null")
|
||||
{
|
||||
return false;
|
||||
}
|
||||
lower.contains("== nil")
|
||||
|| lower.contains("== none")
|
||||
|| lower.contains("is none")
|
||||
|| lower.contains("is_ok")
|
||||
|| lower.contains("=== null")
|
||||
|| lower.contains("== null")
|
||||
}
|
||||
|
||||
/// Mark the input arguments of a value-producing validator as validated
|
||||
/// on the success branch of a downstream `err`-check.
|
||||
///
|
||||
|
|
@ -1728,12 +1731,7 @@ fn apply_validation_err_check_narrowing(
|
|||
// Defaults to FALSE for `err != nil`-style; flips to TRUE for
|
||||
// `err == nil`-style and `is_ok()`.
|
||||
let lower = cond_text.to_ascii_lowercase();
|
||||
let success_branch_is_true = lower.contains("== nil")
|
||||
|| lower.contains("== none")
|
||||
|| lower.contains("is none")
|
||||
|| lower.contains("is_ok")
|
||||
|| lower.contains("=== null")
|
||||
|| lower.contains("== null");
|
||||
let success_branch_is_true = null_check_true_branch_is_success(&lower);
|
||||
|
||||
// Resolve `err`'s reaching SSA value (last def in this or earlier block).
|
||||
// We restrict to single-var conditions to avoid mis-attributing
|
||||
|
|
@ -1887,12 +1885,7 @@ fn apply_input_validator_branch_narrowing(
|
|||
// `apply_validation_err_check_narrowing` uses for the `err == nil`
|
||||
// family.
|
||||
let lower = cond_text.to_ascii_lowercase();
|
||||
let cond_text_says_null_branch_is_true = lower.contains("== nil")
|
||||
|| lower.contains("== none")
|
||||
|| lower.contains("is none")
|
||||
|| lower.contains("is_ok")
|
||||
|| lower.contains("=== null")
|
||||
|| lower.contains("== null");
|
||||
let cond_text_says_null_branch_is_true = null_check_true_branch_is_success(&lower);
|
||||
|
||||
let success_branch_is_true = match polarity {
|
||||
InputValidatorPolarity::ErrorReturning => cond_text_says_null_branch_is_true,
|
||||
|
|
@ -2388,6 +2381,68 @@ fn inline_analyse_callee(
|
|||
/// mechanism would otherwise lose the flow).
|
||||
pub(crate) type PromiseCallbackSeeds<'a> = &'a [(usize, VarTaint)];
|
||||
|
||||
/// Resolve callback arguments: when a call argument refers to a known
|
||||
/// function name (resolvable to a `FuncKey` in `local_summaries` and present
|
||||
/// in `callee_bodies`), record the mapping `callee param name → target
|
||||
/// FuncKey` so the callee's analysis can resolve calls through the
|
||||
/// parameter.
|
||||
///
|
||||
/// This is consulted by `resolve_callee` step -1. Because the resolved
|
||||
/// bindings shape the callee's return taint but are NOT reflected in the
|
||||
/// `(FuncKey, ArgTaintSig)` inline-cache key (function references are
|
||||
/// typically untainted, so different callbacks produce identical sigs),
|
||||
/// callers MUST skip the inline cache (both lookup and insert) whenever the
|
||||
/// returned map is non-empty, otherwise a call site passing `safeCb` could
|
||||
/// reuse / poison the cached shape of a call site passing `sourceCb`.
|
||||
fn compute_callback_bindings(
|
||||
callee_ssa: &SsaBody,
|
||||
args: &[SmallVec<[SsaValue; 2]>],
|
||||
transfer: &SsaTaintTransfer,
|
||||
caller_ssa: &SsaBody,
|
||||
) -> HashMap<String, FuncKey> {
|
||||
let mut callback_bindings: HashMap<String, FuncKey> = HashMap::new();
|
||||
for block in &callee_ssa.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
if let SsaOp::Param { index } = &inst.op {
|
||||
if let Some(param_name) = inst.var_name.as_ref() {
|
||||
if *index < args.len() {
|
||||
for v in &args[*index] {
|
||||
if let Some(arg_var_name) = caller_ssa
|
||||
.value_defs
|
||||
.get(v.0 as usize)
|
||||
.and_then(|vd| vd.var_name.as_deref())
|
||||
{
|
||||
let norm = callee_leaf_name(arg_var_name);
|
||||
let hint_raw = callee_container_hint(arg_var_name);
|
||||
let hint = if hint_raw.is_empty() {
|
||||
None
|
||||
} else {
|
||||
Some(hint_raw)
|
||||
};
|
||||
if let Some(target_key) = resolve_local_func_key(
|
||||
transfer.local_summaries,
|
||||
transfer.lang,
|
||||
transfer.namespace,
|
||||
norm,
|
||||
hint,
|
||||
) {
|
||||
if transfer
|
||||
.callee_bodies
|
||||
.is_some_and(|cb| cb.contains_key(&target_key))
|
||||
{
|
||||
callback_bindings.insert(param_name.clone(), target_key);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
callback_bindings
|
||||
}
|
||||
|
||||
fn inline_analyse_callee_with_seeds(
|
||||
callee: &str,
|
||||
args: &[SmallVec<[SsaValue; 2]>],
|
||||
|
|
@ -2463,7 +2518,8 @@ fn inline_analyse_callee_with_seeds(
|
|||
receiver_var,
|
||||
arity: arity_hint,
|
||||
};
|
||||
match gs.resolve_callee(&query) {
|
||||
let res = gs.resolve_callee(&query);
|
||||
match res {
|
||||
CalleeResolution::Resolved(key) => {
|
||||
let xfile_bodies = transfer.cross_file_bodies?;
|
||||
let body = xfile_bodies.get(&key)?;
|
||||
|
|
@ -2510,11 +2566,22 @@ fn inline_analyse_callee_with_seeds(
|
|||
// promise-callback seeds into the signature.
|
||||
let sig = build_arg_taint_sig_with_seeds(args, receiver, state, promise_callback_seeds);
|
||||
|
||||
// Resolve callback bindings BEFORE consulting the cache. The cache key
|
||||
// `(FuncKey, ArgTaintSig)` does not capture WHICH function was passed as
|
||||
// a callback argument (function refs are typically untainted, so
|
||||
// `wrap(safeCb)` and `wrap(sourceCb)` produce identical sigs). Since
|
||||
// the bound function's summary shapes the callee's return taint via
|
||||
// `resolve_callee` step -1, a cached shape is only sound to reuse / store
|
||||
// when there are no callback bindings. When bindings exist we bypass the
|
||||
// cache entirely (both lookup below and insert later) to avoid poisoning.
|
||||
let callback_bindings = compute_callback_bindings(&callee_body.ssa, args, transfer, caller_ssa);
|
||||
let cacheable = callback_bindings.is_empty();
|
||||
|
||||
// Check cache (keyed by FuncKey + arg signature). The cached value
|
||||
// is a structural shape, re-attribute origins to the current call
|
||||
// site before returning so two callers with matching caps but
|
||||
// different origins see their own source chains.
|
||||
{
|
||||
if cacheable {
|
||||
let cache = cache_ref.borrow();
|
||||
if let Some(cached) = cache.get(&(callee_key.clone(), sig.clone())) {
|
||||
record_engine_note(crate::engine_notes::EngineNote::InlineCacheReused);
|
||||
|
|
@ -2610,53 +2677,10 @@ fn inline_analyse_callee_with_seeds(
|
|||
})
|
||||
});
|
||||
|
||||
// Detect callback arguments: when a call argument refers to a known function
|
||||
// name (resolvable to a FuncKey in the local summaries index), record the
|
||||
// mapping so the callee's analysis can resolve calls through the parameter.
|
||||
//
|
||||
// The binding value is a full `FuncKey` rather than a leaf string so the
|
||||
// child transfer can look up `callee_bodies` / `ssa_summaries` / local
|
||||
// summaries by canonical identity.
|
||||
let mut callback_bindings: HashMap<String, FuncKey> = HashMap::new();
|
||||
for block in &callee_body.ssa.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
if let SsaOp::Param { index } = &inst.op {
|
||||
if let Some(param_name) = inst.var_name.as_ref() {
|
||||
if *index < args.len() {
|
||||
for v in &args[*index] {
|
||||
if let Some(arg_var_name) = caller_ssa
|
||||
.value_defs
|
||||
.get(v.0 as usize)
|
||||
.and_then(|vd| vd.var_name.as_deref())
|
||||
{
|
||||
let norm = callee_leaf_name(arg_var_name);
|
||||
let hint_raw = callee_container_hint(arg_var_name);
|
||||
let hint = if hint_raw.is_empty() {
|
||||
None
|
||||
} else {
|
||||
Some(hint_raw)
|
||||
};
|
||||
if let Some(target_key) = resolve_local_func_key(
|
||||
transfer.local_summaries,
|
||||
transfer.lang,
|
||||
transfer.namespace,
|
||||
norm,
|
||||
hint,
|
||||
) {
|
||||
if transfer
|
||||
.callee_bodies
|
||||
.is_some_and(|cb| cb.contains_key(&target_key))
|
||||
{
|
||||
callback_bindings.insert(param_name.clone(), target_key);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
// Callback argument bindings were resolved before the cache lookup
|
||||
// (see `compute_callback_bindings`) so a non-empty map could bypass the
|
||||
// inline cache. Reuse the same map here to drive the child transfer's
|
||||
// `resolve_callee` step -1.
|
||||
|
||||
let cb_ref = if callback_bindings.is_empty() {
|
||||
None
|
||||
|
|
@ -2774,7 +2798,13 @@ fn inline_analyse_callee_with_seeds(
|
|||
|
||||
// Cache the structural shape under the canonical FuncKey, then
|
||||
// re-attribute to this call site's actual arg/receiver origins.
|
||||
{
|
||||
//
|
||||
// Only cache shapes that do not depend on callback identity. When
|
||||
// `callback_bindings` is non-empty the resolved return taint is shaped by
|
||||
// step -1 callback resolution, which is invisible to the
|
||||
// `(FuncKey, ArgTaintSig)` key, so caching would poison sibling call
|
||||
// sites that pass a different callback.
|
||||
if cacheable {
|
||||
let mut cache = cache_ref.borrow_mut();
|
||||
cache.insert((callee_key, sig), shape.clone());
|
||||
}
|
||||
|
|
@ -2916,11 +2946,26 @@ fn extract_inline_return_taint(
|
|||
let mut derived_params: u64 = 0;
|
||||
let mut derived_receiver: bool = false;
|
||||
|
||||
// Explicit-return param passthrough: return paths whose return value IS
|
||||
// a formal parameter (`return x;`). This channel is JOINED with the
|
||||
// derived channel below so a mixed-return helper
|
||||
// (`if (c) return src(); return x;`) propagates BOTH the derived source
|
||||
// taint and the parameter passthrough provenance.
|
||||
let mut param_caps = Cap::empty();
|
||||
let mut param_internal: SmallVec<[TaintOrigin; 2]> = SmallVec::new();
|
||||
let mut param_params: u64 = 0;
|
||||
let mut param_receiver: bool = false;
|
||||
|
||||
// Return(None) fallback param passthrough: caps swept from ALL live
|
||||
// param-derived values on implicit-return paths. Kept separate because
|
||||
// it is noisy (it re-adds caps the callee may have sanitized), so it
|
||||
// only acts as a last-resort fallback, never joined into the derived
|
||||
// channel.
|
||||
let mut param_fallback_caps = Cap::empty();
|
||||
let mut param_fallback_internal: SmallVec<[TaintOrigin; 2]> = SmallVec::new();
|
||||
let mut param_fallback_params: u64 = 0;
|
||||
let mut param_fallback_receiver: bool = false;
|
||||
|
||||
// Join of the return value's [`PathFact`] across every return block.
|
||||
// Seeded with `None` (no observation) and widened conservatively to
|
||||
// [`PathFact::top`] if any return block gives Top or the value is
|
||||
|
|
@ -3120,13 +3165,13 @@ fn extract_inline_return_taint(
|
|||
// Fall back to collecting all live values.
|
||||
for (val, taint) in &exit.values {
|
||||
if param_values.contains(val) {
|
||||
param_caps |= taint.caps;
|
||||
param_fallback_caps |= taint.caps;
|
||||
for orig in &taint.origins {
|
||||
classify_and_push(
|
||||
orig,
|
||||
&mut param_internal,
|
||||
&mut param_params,
|
||||
&mut param_receiver,
|
||||
&mut param_fallback_internal,
|
||||
&mut param_fallback_params,
|
||||
&mut param_fallback_receiver,
|
||||
);
|
||||
}
|
||||
} else {
|
||||
|
|
@ -3145,17 +3190,35 @@ fn extract_inline_return_taint(
|
|||
}
|
||||
}
|
||||
|
||||
// Prefer derived caps; fall back to param-return caps for passthrough functions.
|
||||
let (final_caps, final_internal, final_params, final_receiver) = if !derived_caps.is_empty() {
|
||||
(
|
||||
derived_caps,
|
||||
derived_internal,
|
||||
derived_params,
|
||||
derived_receiver,
|
||||
)
|
||||
} else {
|
||||
(param_caps, param_internal, param_params, param_receiver)
|
||||
};
|
||||
// Join the derived channel (taint produced inside the callee) with the
|
||||
// explicit-return param-passthrough channel (return paths whose return
|
||||
// value IS a formal parameter). A mixed-return helper
|
||||
// (`if (c) return src(); return x;`) has a non-empty derived channel on
|
||||
// one path AND a param passthrough on another; the correct return-value
|
||||
// abstraction across paths is the union of both, not a choose-one.
|
||||
// The noisy Return(None) fallback (`param_fallback_*`) is used only when
|
||||
// BOTH structured channels are empty, so it never re-adds caps the
|
||||
// callee sanitized on an explicit-return path.
|
||||
let (final_caps, final_internal, final_params, final_receiver) =
|
||||
if !derived_caps.is_empty() || !param_caps.is_empty() {
|
||||
let mut internal = derived_internal;
|
||||
for orig in ¶m_internal {
|
||||
push_internal(&mut internal, orig);
|
||||
}
|
||||
(
|
||||
derived_caps | param_caps,
|
||||
internal,
|
||||
derived_params | param_params,
|
||||
derived_receiver || param_receiver,
|
||||
)
|
||||
} else {
|
||||
(
|
||||
param_fallback_caps,
|
||||
param_fallback_internal,
|
||||
param_fallback_params,
|
||||
param_fallback_receiver,
|
||||
)
|
||||
};
|
||||
|
||||
let return_path_fact =
|
||||
return_path_fact_acc.unwrap_or_else(crate::abstract_interp::PathFact::top);
|
||||
|
|
@ -4874,11 +4937,19 @@ pub(super) fn transfer_inst(
|
|||
&resolved.propagating_params
|
||||
};
|
||||
|
||||
if !resolved.param_return_paths.is_empty() && !effective_params.is_empty() {
|
||||
if !effective_params.is_empty() {
|
||||
// Per-parameter application: each propagating param
|
||||
// contributes taint narrowed by its own per-path
|
||||
// sanitizer. Origins are still aggregated across
|
||||
// params, they name source anchors, not transforms.
|
||||
// contributes taint narrowed by ITS OWN sanitizer
|
||||
// (`effective_param_sanitizer`, which prefers the
|
||||
// param's per-return-path strip bits and otherwise
|
||||
// falls back to its own `param_to_return_strip` —
|
||||
// never the cross-param aggregate). This avoids the
|
||||
// cross-parameter sanitizer bleed of unioning all
|
||||
// args then stripping the aggregate: for
|
||||
// `f(a,b){ return a + escape(b) }`, param 1's
|
||||
// HTML_ESCAPE strip must not clear param 0's taint.
|
||||
// Origins are still aggregated across params, they
|
||||
// name source anchors, not transforms.
|
||||
let mut any_origin_added = false;
|
||||
for ¶m_idx in effective_params {
|
||||
let arg_caps_origins =
|
||||
|
|
@ -4898,6 +4969,11 @@ pub(super) fn transfer_inst(
|
|||
// Sentinel reference to silence unused on cold paths.
|
||||
let _ = any_origin_added;
|
||||
} else {
|
||||
// No positional argument info (`arg_uses` empty), so
|
||||
// per-param attribution is impossible. Union all
|
||||
// propagating-arg taint; the aggregate sanitizer is
|
||||
// applied below. This is the only path that may
|
||||
// over-strip, and only when positions are unknown.
|
||||
let (prop_caps, prop_origins) =
|
||||
collect_args_taint(args, receiver, state, effective_params);
|
||||
return_bits |= prop_caps;
|
||||
|
|
@ -7847,7 +7923,15 @@ fn collect_block_events(
|
|||
.and_then(|vd| vd.var_name.as_deref());
|
||||
if let Some(name) = var_name {
|
||||
if let Some(sym) = transfer.interner.get(name) {
|
||||
return state.validated_may.contains(sym);
|
||||
// Require validation on EVERY reaching path
|
||||
// (intersection-on-join) before suppressing the
|
||||
// sink. `validated_may` is the union over
|
||||
// predecessors (validated on SOME path), so a value
|
||||
// validated on one branch but bypassable on another
|
||||
// would be wrongly silenced (`if (valid(x)) {...}
|
||||
// else {} sink(x)`). `validated_must` is the sound
|
||||
// "all paths validated" gate.
|
||||
return state.validated_must.contains(sym);
|
||||
}
|
||||
}
|
||||
false
|
||||
|
|
@ -7994,8 +8078,15 @@ fn is_noreturn_call(lang: Lang, callee: &str) -> bool {
|
|||
if !matches!(lang, Lang::C | Lang::Cpp) {
|
||||
return false;
|
||||
}
|
||||
let method = crate::labels::bare_method_name(callee);
|
||||
matches!(method, "exit" | "_Exit" | "quick_exit" | "abort")
|
||||
// Only free-function calls terminate the process. A receiver-qualified
|
||||
// method whose trailing segment happens to be `exit`/`abort`/etc.
|
||||
// (e.g. `transaction.abort()`, `app.exit()`) is an ordinary call and
|
||||
// must NOT wipe the taint state. Reject any callee carrying a receiver
|
||||
// (`.` member access or `->` pointer access) and match the full text.
|
||||
if callee.contains('.') || callee.contains("->") {
|
||||
return false;
|
||||
}
|
||||
matches!(callee, "exit" | "_Exit" | "quick_exit" | "abort")
|
||||
}
|
||||
|
||||
// ── Primary sink-site attribution ───────────────────────────────────────
|
||||
|
|
@ -10978,8 +11069,24 @@ fn is_string_safe_for_ssrf(sf: &crate::abstract_interp::StringFact) -> bool {
|
|||
// Absolute-path prefix (e.g. "/projects/..."), internal redirect, not open redirect.
|
||||
// The leading "/" locks the path to the same origin; the attacker cannot control the scheme
|
||||
// or host, so this is not an SSRF vector.
|
||||
//
|
||||
// Exception: a bare "/" lock is NOT safe. When the only constant is a
|
||||
// single "/" and the attacker-controlled suffix begins with another "/",
|
||||
// the full value is "//evil.com/..." — a protocol-relative URL that
|
||||
// browsers and common HTTP clients fetch from the attacker's host.
|
||||
// Likewise a prefix already starting with "//" is protocol-relative.
|
||||
// Require a fixed non-slash second character so the leading-slash lock
|
||||
// cannot be widened into "//host".
|
||||
if prefix.starts_with('/') {
|
||||
return true;
|
||||
let mut chars = prefix.chars();
|
||||
chars.next(); // consume leading '/'
|
||||
match chars.next() {
|
||||
// "/x..." with x != '/': same-origin path lock, safe.
|
||||
Some(c) if c != '/' => return true,
|
||||
// "//..." protocol-relative, or bare "/" that a tainted "/host"
|
||||
// suffix can turn protocol-relative: not safe.
|
||||
_ => return false,
|
||||
}
|
||||
}
|
||||
if let Some(after_scheme) = prefix.find("://") {
|
||||
let host_and_rest = &prefix[after_scheme + 3..];
|
||||
|
|
@ -11223,6 +11330,19 @@ struct ResolvedSummary {
|
|||
param_to_gate_filters: Vec<(usize, Cap)>,
|
||||
propagates_taint: bool,
|
||||
propagating_params: Vec<usize>,
|
||||
/// Per-parameter sanitizer strip-bits lifted from
|
||||
/// `SsaFuncSummary::param_to_return`. Each `(param_idx, bits)` entry
|
||||
/// means "taint flowing through THIS parameter to the return value has
|
||||
/// `bits` stripped" (a `StripBits` transform on that param). Params with
|
||||
/// an `Identity` transform contribute no entry.
|
||||
///
|
||||
/// Kept per-parameter so the call site can strip a param's sanitizer
|
||||
/// only from that param's own taint contribution, instead of unioning
|
||||
/// every param's StripBits into one aggregate (`sanitizer_caps`) and
|
||||
/// applying it to the union of all propagating args — which bleeds one
|
||||
/// param's sanitizer onto an unsanitized sibling arg
|
||||
/// (`f(a,b){ return a + escape(b) }` would otherwise strip `a`'s caps).
|
||||
param_to_return_strip: Vec<(usize, Cap)>,
|
||||
/// Parameter indices whose container identity flows to return value.
|
||||
param_container_to_return: Vec<usize>,
|
||||
/// (src_param, container_param) pairs: src taint stored into container.
|
||||
|
|
@ -11418,6 +11538,7 @@ fn resolve_callee_full(
|
|||
param_to_sink_sites: vec![],
|
||||
propagates_taint: !ls.propagating_params.is_empty(),
|
||||
propagating_params: ls.propagating_params.clone(),
|
||||
param_to_return_strip: vec![],
|
||||
param_container_to_return: vec![],
|
||||
param_to_container_store: vec![],
|
||||
return_type: None,
|
||||
|
|
@ -11490,6 +11611,7 @@ fn resolve_callee_full(
|
|||
param_to_sink_sites: vec![],
|
||||
propagates_taint: false,
|
||||
propagating_params: vec![],
|
||||
param_to_return_strip: vec![],
|
||||
param_container_to_return: vec![],
|
||||
param_to_container_store: vec![],
|
||||
return_type: None,
|
||||
|
|
@ -11694,6 +11816,7 @@ fn resolve_callee_full(
|
|||
param_to_sink_sites: vec![],
|
||||
propagates_taint: !ls.propagating_params.is_empty(),
|
||||
propagating_params: ls.propagating_params.clone(),
|
||||
param_to_return_strip: vec![],
|
||||
param_container_to_return: vec![],
|
||||
param_to_container_store: vec![],
|
||||
return_type: None,
|
||||
|
|
@ -11748,6 +11871,7 @@ fn resolve_callee_full(
|
|||
param_to_sink_sites: fs.param_to_sink.clone(),
|
||||
propagates_taint: fs.propagates_any(),
|
||||
propagating_params: fs.propagating_params.clone(),
|
||||
param_to_return_strip: vec![],
|
||||
param_container_to_return: vec![],
|
||||
param_to_container_store: vec![],
|
||||
return_type: None,
|
||||
|
|
@ -11817,6 +11941,7 @@ fn resolve_callee_full(
|
|||
param_to_sink_sites: fs.param_to_sink.clone(),
|
||||
propagates_taint: fs.propagates_any(),
|
||||
propagating_params: fs.propagating_params.clone(),
|
||||
param_to_return_strip: vec![],
|
||||
param_container_to_return: vec![],
|
||||
param_to_container_store: vec![],
|
||||
return_type: None,
|
||||
|
|
@ -11840,6 +11965,34 @@ fn resolve_callee_full(
|
|||
None
|
||||
}
|
||||
|
||||
/// Strip bits recorded for a single parameter (`param_to_return_strip`),
|
||||
/// or `Cap::empty()` when this parameter has no `StripBits` transform.
|
||||
///
|
||||
/// This is the per-parameter sanitizer. When per-parameter decomposition
|
||||
/// EXISTS (`param_to_return_strip` non-empty) it deliberately does NOT
|
||||
/// consult the cross-param aggregate (`resolved.sanitizer_caps`), so one
|
||||
/// parameter's sanitizer cannot bleed onto a sibling argument's taint
|
||||
/// (`f(a,b){ return a + escape(b) }`).
|
||||
///
|
||||
/// When `param_to_return_strip` is ENTIRELY empty, the summary carries no
|
||||
/// per-param decomposition at all — this is the coarse cross-file tier
|
||||
/// (`ResolvedSummary` built from a `FuncSummary`/`LocalFuncSummary` where
|
||||
/// only the aggregate `sanitizer_caps` is known, e.g. cross-file resolution
|
||||
/// in [`resolve_callee_full`]). There is no sibling param to bleed from, so
|
||||
/// fall back to the aggregate — matching the pre-#37 behaviour and keeping
|
||||
/// cross-file sanitizers (`clean(input)` in another file) effective.
|
||||
fn param_strip_bits(resolved: &ResolvedSummary, param_idx: usize) -> Cap {
|
||||
if resolved.param_to_return_strip.is_empty() {
|
||||
return resolved.sanitizer_caps;
|
||||
}
|
||||
resolved
|
||||
.param_to_return_strip
|
||||
.iter()
|
||||
.find(|(i, _)| *i == param_idx)
|
||||
.map(|(_, caps)| *caps)
|
||||
.unwrap_or_else(Cap::empty)
|
||||
}
|
||||
|
||||
/// Compute the effective sanitizer bits that apply at the call site for a
|
||||
/// specific parameter, narrowed by the caller's predicate state.
|
||||
///
|
||||
|
|
@ -11850,10 +12003,11 @@ fn resolve_callee_full(
|
|||
/// not know which return path the callee took, so only bits stripped on
|
||||
/// EVERY compatible path can be considered cleared.
|
||||
///
|
||||
/// Falls back to `resolved.sanitizer_caps` (the aggregate) when:
|
||||
/// Falls back to the parameter's own `param_to_return_strip` bits (see
|
||||
/// [`param_strip_bits`], NOT the cross-param aggregate) when:
|
||||
/// * the summary has no per-path data for this parameter;
|
||||
/// * every path is predicate-compatible (the narrowing adds no information);
|
||||
/// * no path is predicate-compatible (conservative: keep aggregate).
|
||||
/// * no path is predicate-compatible (conservative: keep the per-param bits).
|
||||
fn effective_param_sanitizer(
|
||||
resolved: &ResolvedSummary,
|
||||
param_idx: usize,
|
||||
|
|
@ -11867,7 +12021,14 @@ fn effective_param_sanitizer(
|
|||
.find(|(i, _)| *i == param_idx)
|
||||
{
|
||||
Some((_, p)) => p,
|
||||
None => return resolved.sanitizer_caps,
|
||||
None => {
|
||||
// No per-return-path decomposition for this param: fall back to
|
||||
// THIS parameter's own strip bits, NOT the cross-param aggregate
|
||||
// (`resolved.sanitizer_caps`). Using the aggregate here bleeds a
|
||||
// sibling param's sanitizer onto this param's taint
|
||||
// (`f(a,b){ return a + escape(b) }`).
|
||||
return param_strip_bits(resolved, param_idx);
|
||||
}
|
||||
};
|
||||
|
||||
// Caller-side predicate envelope: union of known_true / known_false bits
|
||||
|
|
@ -11898,9 +12059,10 @@ fn effective_param_sanitizer(
|
|||
|
||||
if compatible.is_empty() {
|
||||
// No path applies, the caller's predicate state contradicts every
|
||||
// recorded return. Fall back to the aggregate rather than
|
||||
// synthesise a sanitiser from zero data.
|
||||
return resolved.sanitizer_caps;
|
||||
// recorded return. Fall back to THIS parameter's own strip bits
|
||||
// rather than the cross-param aggregate (which would bleed a sibling
|
||||
// param's sanitizer onto this param's taint).
|
||||
return param_strip_bits(resolved, param_idx);
|
||||
}
|
||||
|
||||
// Intersection of strip-bits across compatible paths. Identity
|
||||
|
|
@ -11927,7 +12089,7 @@ fn effective_param_sanitizer(
|
|||
}
|
||||
}
|
||||
if !saw_any {
|
||||
resolved.sanitizer_caps
|
||||
param_strip_bits(resolved, param_idx)
|
||||
} else {
|
||||
common
|
||||
}
|
||||
|
|
@ -11952,11 +12114,19 @@ fn convert_ssa_to_resolved_for_caller(
|
|||
.map(|(idx, _)| *idx)
|
||||
.collect();
|
||||
|
||||
// Compute effective sanitizer caps: union of StripBits across all params
|
||||
// Compute effective sanitizer caps: union of StripBits across all params.
|
||||
// Retained for the aggregate fallback / non-positional resolution paths,
|
||||
// but the per-parameter `param_to_return_strip` below is what the call
|
||||
// site should prefer so one param's sanitizer does not bleed onto a
|
||||
// sibling arg's taint.
|
||||
let mut sanitizer_caps = Cap::empty();
|
||||
for (_, transform) in &ssa_sum.param_to_return {
|
||||
let mut param_to_return_strip: Vec<(usize, Cap)> = Vec::new();
|
||||
for (idx, transform) in &ssa_sum.param_to_return {
|
||||
if let TaintTransform::StripBits(bits) = transform {
|
||||
sanitizer_caps |= *bits;
|
||||
if !bits.is_empty() {
|
||||
param_to_return_strip.push((*idx, *bits));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -12005,6 +12175,7 @@ fn convert_ssa_to_resolved_for_caller(
|
|||
param_to_sink_sites,
|
||||
propagates_taint: !propagating_params.is_empty(),
|
||||
propagating_params,
|
||||
param_to_return_strip,
|
||||
param_container_to_return: ssa_sum.param_container_to_return.clone(),
|
||||
param_to_container_store: ssa_sum.param_to_container_store.clone(),
|
||||
return_type: ssa_sum.return_type.clone(),
|
||||
|
|
@ -12128,6 +12299,23 @@ fn merge_resolved_summaries_fanout(
|
|||
}
|
||||
}
|
||||
|
||||
// param_to_return_strip: intersect per-parameter strip bits, mirroring
|
||||
// the `sanitizer_caps` AND rule. Only bits stripped by EVERY
|
||||
// implementer for a given parameter can be considered cleared, since the
|
||||
// virtual dispatch could land on an implementer that does not sanitize
|
||||
// that parameter. A param present in only one side is dropped (the
|
||||
// other implementer strips nothing for it).
|
||||
let mut merged_strip: Vec<(usize, Cap)> = Vec::new();
|
||||
for (idx, caps) in &acc.param_to_return_strip {
|
||||
if let Some((_, other)) = r.param_to_return_strip.iter().find(|(i, _)| i == idx) {
|
||||
let inter = *caps & *other;
|
||||
if !inter.is_empty() {
|
||||
merged_strip.push((*idx, inter));
|
||||
}
|
||||
}
|
||||
}
|
||||
acc.param_to_return_strip = merged_strip;
|
||||
|
||||
// SSA-precision fields: drop on any disagreement.
|
||||
if acc.return_type != r.return_type {
|
||||
acc.return_type = None;
|
||||
|
|
@ -12150,3 +12338,76 @@ fn merge_resolved_summaries_fanout(
|
|||
|
||||
acc
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod engine_audit_fixes_tests {
|
||||
use super::*;
|
||||
|
||||
// Finding #9: `!== null` must NOT be read as an equality null-check.
|
||||
#[test]
|
||||
fn null_check_strict_inequality_is_not_success_branch() {
|
||||
// Equality forms: null/none/no-error outcome is the TRUE branch.
|
||||
assert!(null_check_true_branch_is_success("err == null"));
|
||||
assert!(null_check_true_branch_is_success("err === null"));
|
||||
assert!(null_check_true_branch_is_success("err == nil"));
|
||||
assert!(null_check_true_branch_is_success("err == none"));
|
||||
assert!(null_check_true_branch_is_success("err is none"));
|
||||
assert!(null_check_true_branch_is_success("is_ok(err)"));
|
||||
|
||||
// Inequality forms: null outcome is the FALSE branch. The strict
|
||||
// `!==` form previously matched `== null` as a substring.
|
||||
assert!(!null_check_true_branch_is_success("err !== null"));
|
||||
assert!(!null_check_true_branch_is_success("err != null"));
|
||||
assert!(!null_check_true_branch_is_success("err != nil"));
|
||||
assert!(!null_check_true_branch_is_success("err != none"));
|
||||
assert!(!null_check_true_branch_is_success("is_err(err)"));
|
||||
}
|
||||
|
||||
// Finding #6: only free-function exit/abort wipes taint state; a
|
||||
// receiver-qualified method with the same trailing segment must not.
|
||||
#[test]
|
||||
fn noreturn_only_matches_free_functions() {
|
||||
// Free functions in C/C++ terminate the process.
|
||||
assert!(is_noreturn_call(Lang::C, "exit"));
|
||||
assert!(is_noreturn_call(Lang::Cpp, "abort"));
|
||||
assert!(is_noreturn_call(Lang::C, "_Exit"));
|
||||
assert!(is_noreturn_call(Lang::Cpp, "quick_exit"));
|
||||
|
||||
// Method calls (dot or arrow receiver) are ordinary calls.
|
||||
assert!(!is_noreturn_call(Lang::Cpp, "transaction.abort"));
|
||||
assert!(!is_noreturn_call(Lang::Cpp, "app.exit"));
|
||||
assert!(!is_noreturn_call(Lang::C, "handlers->abort"));
|
||||
assert!(!is_noreturn_call(Lang::Cpp, "state_machine.abort"));
|
||||
|
||||
// Other languages never treat these as noreturn.
|
||||
assert!(!is_noreturn_call(Lang::JavaScript, "exit"));
|
||||
}
|
||||
|
||||
// Finding #7: protocol-relative `//host` bypass of the leading-slash
|
||||
// SSRF origin lock.
|
||||
#[test]
|
||||
fn ssrf_safe_rejects_protocol_relative_prefix() {
|
||||
use crate::abstract_interp::StringFact;
|
||||
|
||||
// A real same-origin path lock is safe.
|
||||
assert!(is_string_safe_for_ssrf(&StringFact::from_prefix("/api/")));
|
||||
assert!(is_string_safe_for_ssrf(&StringFact::from_prefix(
|
||||
"/projects"
|
||||
)));
|
||||
|
||||
// A bare "/" lock is NOT safe: a tainted "/evil.com" suffix yields
|
||||
// a protocol-relative "//evil.com" URL.
|
||||
assert!(!is_string_safe_for_ssrf(&StringFact::from_prefix("/")));
|
||||
|
||||
// An already protocol-relative prefix is not safe either.
|
||||
assert!(!is_string_safe_for_ssrf(&StringFact::from_prefix("//")));
|
||||
assert!(!is_string_safe_for_ssrf(&StringFact::from_prefix(
|
||||
"//evil.com"
|
||||
)));
|
||||
|
||||
// Scheme-locked absolute URL with a path separator stays safe.
|
||||
assert!(is_string_safe_for_ssrf(&StringFact::from_prefix(
|
||||
"https://api.internal/"
|
||||
)));
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -992,14 +992,34 @@ pub(super) fn summarise_return_predicates(state: &SsaTaintState) -> (u64, u8, u8
|
|||
let hash = h.finish();
|
||||
// Intersect known_true / known_false across all tracked variables:
|
||||
// the bits that hold for EVERY predicate-tracked var at this return.
|
||||
let known_true = sorted
|
||||
.iter()
|
||||
.map(|(_, kt, _)| *kt)
|
||||
.fold(u8::MAX, |a, b| a & b);
|
||||
let known_false = sorted
|
||||
.iter()
|
||||
.map(|(_, _, kf)| *kf)
|
||||
.fold(u8::MAX, |a, b| a & b);
|
||||
//
|
||||
// When `sorted` is empty (a return path guarded only by
|
||||
// ValidationCall/AllowlistCheck/TypeCheck, which populate `validated_must`
|
||||
// but never the `predicates` list because `predicate_kind_bit` returns
|
||||
// `None` for those kinds), the fold's `u8::MAX` identity would yield
|
||||
// `0xFF`/`0xFF`, claiming every predicate kind is simultaneously known-true
|
||||
// AND known-false on this path, a contradiction. The consumer
|
||||
// (`effective_param_sanitizer`) then prunes this validated path from the
|
||||
// compatible set whenever the caller tracks any predicate bit, over-broadly
|
||||
// clearing taint. No tracked predicate means no bit is known either way,
|
||||
// so the correct identity on the empty list is `0` (matching the Top-state
|
||||
// convention returned above).
|
||||
let known_true = if sorted.is_empty() {
|
||||
0
|
||||
} else {
|
||||
sorted
|
||||
.iter()
|
||||
.map(|(_, kt, _)| *kt)
|
||||
.fold(u8::MAX, |a, b| a & b)
|
||||
};
|
||||
let known_false = if sorted.is_empty() {
|
||||
0
|
||||
} else {
|
||||
sorted
|
||||
.iter()
|
||||
.map(|(_, _, kf)| *kf)
|
||||
.fold(u8::MAX, |a, b| a & b)
|
||||
};
|
||||
// Use `1` for the "no predicates but validated_must non-empty" case to
|
||||
// avoid colliding with the unguarded sentinel (0).
|
||||
let hash = if hash == 0 { 1 } else { hash };
|
||||
|
|
@ -1406,3 +1426,72 @@ pub(crate) fn extract_container_flow_summary(
|
|||
container_store.sort();
|
||||
(ctr, container_store)
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::state::symbol::SymbolId;
|
||||
use crate::taint::domain::PredicateSummary;
|
||||
|
||||
/// Top state (no predicates, no validated_must) → all-zero sentinel.
|
||||
#[test]
|
||||
fn summarise_return_predicates_top_is_zero() {
|
||||
let state = SsaTaintState::initial();
|
||||
assert_eq!(summarise_return_predicates(&state), (0, 0, 0));
|
||||
}
|
||||
|
||||
/// A return path guarded only by a ValidationCall/AllowlistCheck/TypeCheck
|
||||
/// populates `validated_must` but leaves `predicates` empty. The
|
||||
/// known-true / known-false fields MUST be 0 (no tracked predicate ⇒ no bit
|
||||
/// known either way), not the `u8::MAX` fold identity, which would claim
|
||||
/// every predicate kind is simultaneously known-true AND known-false and
|
||||
/// cause the consumer to over-prune the validated path.
|
||||
#[test]
|
||||
fn summarise_return_predicates_validated_only_path_has_zero_bits() {
|
||||
let mut state = SsaTaintState::initial();
|
||||
state.validated_must.insert(SymbolId(3));
|
||||
let (hash, kt, kf) = summarise_return_predicates(&state);
|
||||
// Non-zero hash (validated_must is non-empty → distinct from unguarded).
|
||||
assert_ne!(
|
||||
hash, 0,
|
||||
"validated path must hash distinctly from unguarded"
|
||||
);
|
||||
assert_eq!(
|
||||
kt, 0,
|
||||
"no tracked predicate ⇒ known_true must be 0, not 0xFF"
|
||||
);
|
||||
assert_eq!(
|
||||
kf, 0,
|
||||
"no tracked predicate ⇒ known_false must be 0, not 0xFF"
|
||||
);
|
||||
// The result must not be a self-contradiction.
|
||||
assert_eq!(kt & kf, 0, "known_true & known_false contradiction");
|
||||
}
|
||||
|
||||
/// When predicates ARE tracked, the intersection still reflects the
|
||||
/// actual per-var bits (regression guard that the empty-list special case
|
||||
/// did not disturb the populated path).
|
||||
#[test]
|
||||
fn summarise_return_predicates_tracked_bits_intersect() {
|
||||
let mut state = SsaTaintState::initial();
|
||||
state.predicates.push((
|
||||
SymbolId(1),
|
||||
PredicateSummary {
|
||||
known_true: 0b0000_0011,
|
||||
known_false: 0b0000_0100,
|
||||
},
|
||||
));
|
||||
state.predicates.push((
|
||||
SymbolId(2),
|
||||
PredicateSummary {
|
||||
known_true: 0b0000_0001,
|
||||
known_false: 0b0000_1100,
|
||||
},
|
||||
));
|
||||
let (_hash, kt, kf) = summarise_return_predicates(&state);
|
||||
// Intersection of known_true: 0b011 & 0b001 = 0b001.
|
||||
assert_eq!(kt, 0b0000_0001);
|
||||
// Intersection of known_false: 0b100 & 0b1100 = 0b100.
|
||||
assert_eq!(kf, 0b0000_0100);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1650,6 +1650,7 @@ mod fanout_merge_tests {
|
|||
sink_caps: Cap::empty(),
|
||||
param_to_sink: vec![],
|
||||
param_to_sink_sites: vec![],
|
||||
param_to_return_strip: vec![],
|
||||
propagates_taint: false,
|
||||
propagating_params: vec![],
|
||||
param_container_to_return: vec![],
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue