fix(engine): CFG/SSA/taint/IPA soundness, precision & recall fixes

elipeter 2026-06-11 16:46:01 -05:00
parent 59e4359257
commit 246f32a419
39 changed files with 4729 additions and 465 deletions


@ -74,6 +74,20 @@ The attack-surface map and chain composer turn the flat finding list into a rout
- **DB fast-fail preflight.** `Indexer::init` reads the first 16 bytes of any candidate SQLite file and rejects anything without the standard `SQLite format 3\0` magic. Stops a misnamed JSON / text file from corrupting the index path with a SQLite error halfway through migration.
- **Symbolic-execution coverage.** Symex now recognises a wider set of string operations (`substr`, `replace`, `to_lower`, `to_upper`, `trim`, `strlen`) per the value/transfer pipeline, and the abstract-interpretation framework reasons about interval and prefix/suffix string facts during the dynamic verdict pass.
### Fixed (engine correctness)
- **CFG construction.** Python `if/elif/elif/else` chains no longer drop every alternative past the first, so a sink in a second `elif` or a trailing `else` is analysed (same fix covers PHP `else_if`). C-style `for (init; cond; incr)` loops now lower the initializer and increment, so taint introduced in the loop header (`for (cmd = getenv(...); ...)`) reaches the body. A `switch` `default` is no longer unconditionally hoisted to the chain tail, preserving fall-through order in C/C++/JS/TS/PHP/Java. Source/sink calls inside short-circuit `&&` / `||` operands of an `if` / `while` condition are now classified instead of dropped.
- **SSA lowering of exception handlers.** Catch blocks with internal control flow (an `if` / loop / nested `try` inside the handler) no longer lose every instruction past the catch entry — the orphan subtree is renamed through a virtual-root dominator tree, so sinks reached only inside a `catch` are seen. Catch-side variable reads now resolve to the most entry-dominating reaching definition (the pre-`try` value) rather than a post-join reassignment. A genuine positional argument equal to a chained-call receiver root (`a.b.m(p, a)`) is preserved instead of being stripped as the implicit chain root.
- **Taint soundness.** Sink suppression now gates on `validated_must` (validated on every path) instead of `validated_may` (any path), closing a false-negative where a single validated branch silenced a sink. `is_noreturn_call` no longer matches receiver-qualified `.exit()` / `.abort()` / method calls, so `transaction.abort()` stops wiping taint state. The SSRF same-origin check rejects protocol-relative `//host` prefixes (an open-redirect / SSRF bypass that a bare `/`-prefix check accepted). Inline-return taint unions the derived and parameter-passthrough channels for mixed-return helpers (`if (c) return src(); return x;`). The inline-analysis cache is keyed to exclude callback-bound arguments, so a function-valued argument no longer poisons sibling call sites that pass a different callback.
- **Taint precision.** `String.valueOf(String/Object)` is no longer tagged a safe-string producer (it is an identity passthrough, so `String.valueOf(req.getParameter(...))` was silently suppressed). Cross-parameter sanitizers no longer bleed onto sibling arguments (`f(a, b){ return a + escape(b) }` sanitises only `b`), and a cross-file sanitizer resolved through the coarse summary tier still applies its strip. Relative-URL and host-allowlist cap clearing is alias-aware. Substring-rejection and `indexOf() === -1` idioms are no longer misread as allowlist validation, and dotted multi-argument validators no longer over-validate unrelated targets.
- **Interprocedural resolution.** SCC / topo file batching and reachability key files by their package-qualified namespace, matching the call-graph nodes and SSA summary tier so cross-package callers resolve. Directly self-recursive functions now get SCC fixed-point treatment. Call resolution tolerates under-application (a call supplying fewer arguments than a callee with default / optional parameters) while still degrading to `Ambiguous` rather than a wrong pick. A failed SCC iteration no longer overwrites a file's cached diagnostics with an empty set. JS/TS module resolution appends extensions to dotted specifiers (`./user.service` → `user.service.ts`) and swaps a `.js` import to a `.ts` file (NodeNext / ESM).
- **JS/TS two-level solve.** Pass-2 top-level (global) taint now reaches nested closures two or more scopes deep, and the pass-2 dirty-skip no longer drops a nested body that transitively consumes a changed global through a parent-local.
- **Scan pipeline and index.** `replace_all_for_file` deletes stale SSA summary / body rows unconditionally, so an incremental rescan cannot leave orphaned rows. Cached findings recompute their category instead of being stamped `Security`, so structural warnings keep their real class. The indexed build persists auth summaries and cross-package imports, logs-and-skips an unreadable file instead of aborting the whole build, and keys `FuncSummary` entries to match the SSA tier so an indexed scan and a full scan agree.
- **Language coverage (recall).** KINDS maps were completed so previously-dropped bodies are walked: Java interface / enum / record / `synchronized` blocks, Rust inline `mod { ... }` items, Go labeled-statement bodies, and Ruby lambda / brace-block bodies. Go variadic and Python `*args` / `**kwargs` parameters are seeded with correct arity. C/C++ `scanf` / `fscanf` / `sscanf` / `read` register their output buffers as taint sources. TypeScript gated sinks dropped from the JS mirror were restored (`_.template`, `http.get` / `https.get`, `setValue` / `dotProp.set` / `jp.set`). The weak-hash and HTTP-URL AST patterns match single- and double-quoted string literals across JS, TS, and Ruby.
- **Symbolic execution.** Interprocedural parameter seeding fixed an off-by-one for method calls and now seeds the receiver / self parameter; the cross-file depth guard increments on descent; and a path cut short by the global step budget records `Inconclusive` instead of `Confirmed`.
- **Abstract interpretation and pointer analysis.** Interval division handles the `i64::MIN / -1` overflow (degrading to unbounded instead of a falsely-narrow range) and multiplication computes overflow in `i128`. `AbstractState::leq` checks entries present only in the other state, restoring a sound partial order. The pointer fixpoint re-projects container-element field reads after the receiver's points-to set converges.
- **CFG-level analyses.** Error-fallthrough termination stops at the `if` join point; a guard's constant-operand test refuses a `Source`-labelled call result; guard / sanitizer matchers require a leaf-name boundary (so `invalidate` no longer matches the `validate` guard and `unquote` no longer matches `quote`); resource ownership-transfer requires a real `->field =` assignment rather than any `->` in a span; post-dominators are computed once per resource pass; and the web-entrypoint heuristic confirms web parameters against the candidate handler only, so an unrelated `req` parameter elsewhere in the file no longer promotes batch / CLI functions to web entry points.
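
To make the protocol-relative case from the taint-soundness entry concrete, here is a minimal sketch of a same-origin path test. The name `is_same_origin_path` is hypothetical and is not taken from the engine; the sketch only shows why a bare `/`-prefix check is too weak.

```rust
/// Hypothetical helper (not the engine's API): a URL counts as same-origin
/// only when it is a rooted path (`/...`) that is not protocol-relative
/// (`//host/...`).
fn is_same_origin_path(url: &str) -> bool {
    // A bare `starts_with('/')` test would also accept `//evil.example`,
    // which a browser resolves against the current scheme to another host.
    url.starts_with('/') && !url.starts_with("//")
}
```

With this check, `/account/settings` is accepted while `//evil.example/login` and `https://evil.example/` are both rejected.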
### CLI
- **`nyx scan --verify`** (enabled by default in standard builds) and `--backend {auto,process,docker}` select the dynamic-verification harness. `--no-verify` skips verification for a single run without changing config.


@ -179,30 +179,21 @@ impl IntervalFact {
}
match (self.lo, self.hi, other.lo, other.hi) {
(Some(a_lo), Some(a_hi), Some(b_lo), Some(b_hi)) => {
let products = [
a_lo.checked_mul(b_lo),
a_lo.checked_mul(b_hi),
a_hi.checked_mul(b_lo),
a_hi.checked_mul(b_hi),
// Compute all four endpoint products in i128 (no i64 overflow
// possible) so we know the *true* min/max, then attribute
// overflow by which direction escapes the i64 range — not by
// which first-operand endpoint produced it.
let products: [i128; 4] = [
a_lo as i128 * b_lo as i128,
a_lo as i128 * b_hi as i128,
a_hi as i128 * b_lo as i128,
a_hi as i128 * b_hi as i128,
];
let lo = products.iter().filter_map(|p| *p).min();
let hi = products.iter().filter_map(|p| *p).max();
// If any product overflowed, the corresponding bound is None
if products.iter().any(|p| p.is_none()) {
Self {
lo: if lo.is_some() && products[..2].iter().all(|p| p.is_some()) {
lo
} else {
None
},
hi: if hi.is_some() && products[2..].iter().all(|p| p.is_some()) {
hi
} else {
None
},
}
} else {
Self { lo, hi }
let true_min = *products.iter().min().unwrap();
let true_max = *products.iter().max().unwrap();
Self {
lo: clamp_lo_i128(true_min),
hi: clamp_hi_i128(true_max),
}
}
_ => Self::top(),
@ -220,15 +211,24 @@ impl IntervalFact {
if b_lo <= 0 && b_hi >= 0 {
return Self::top();
}
let quotients = [
a_lo.checked_div(b_lo),
a_lo.checked_div(b_hi),
a_hi.checked_div(b_lo),
a_hi.checked_div(b_hi),
// Compute the four endpoint quotients in i128. This is exact
// for division (the divisor cannot be zero here) and captures
// the i64::MIN / -1 = i64::MAX + 1 overflow case, which
// checked_div would silently drop, producing a falsely narrow
// interval. Attribute the escape by direction: a quotient
// above i64::MAX leaves hi unbounded.
let quotients: [i128; 4] = [
a_lo as i128 / b_lo as i128,
a_lo as i128 / b_hi as i128,
a_hi as i128 / b_lo as i128,
a_hi as i128 / b_hi as i128,
];
let lo = quotients.iter().filter_map(|q| *q).min();
let hi = quotients.iter().filter_map(|q| *q).max();
Self { lo, hi }
let true_min = *quotients.iter().min().unwrap();
let true_max = *quotients.iter().max().unwrap();
Self {
lo: clamp_lo_i128(true_min),
hi: clamp_hi_i128(true_max),
}
}
_ => Self::top(),
}
@ -523,6 +523,28 @@ fn checked_sub_opt(a: Option<i64>, b: Option<i64>) -> Option<i64> {
}
}
/// Clamp an `i128` lower bound to `Option<i64>`. A bound outside the `i64`
/// range is unrepresentable on this side, so we degrade to `None`
/// (−∞), which is a sound over-approximation. Mirrors the overflow handling
/// of `add`/`sub` (overflow → unbounded).
fn clamp_lo_i128(lo: i128) -> Option<i64> {
if (i64::MIN as i128..=i64::MAX as i128).contains(&lo) {
Some(lo as i64)
} else {
None
}
}
/// Clamp an `i128` upper bound to `Option<i64>`. A bound outside the `i64`
/// range degrades to `None` (+∞), a sound over-approximation.
fn clamp_hi_i128(hi: i128) -> Option<i64> {
if (i64::MIN as i128..=i64::MAX as i128).contains(&hi) {
Some(hi as i64)
} else {
None
}
}
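
The widening-to-`i128` approach used by `mul` and `div` above can be tried in isolation. The sketch below is a simplified stand-in, not the engine's `IntervalFact`: the `Interval` struct, `clamp`, `from_corners`, `mul_bounded` and `div_bounded` are hypothetical names, and only the fully bounded case is modelled.

```rust
/// Hypothetical stand-in for the interval type: `None` means unbounded.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Interval {
    lo: Option<i64>,
    hi: Option<i64>,
}

/// Narrow an `i128` bound back to `i64`; out-of-range values become unbounded.
fn clamp(bound: i128) -> Option<i64> {
    i64::try_from(bound).ok()
}

/// Take the true min/max of the four corner results, then clamp each side.
fn from_corners(corners: [i128; 4]) -> Interval {
    let min = corners.iter().copied().min().unwrap(); // array is never empty
    let max = corners.iter().copied().max().unwrap();
    Interval { lo: clamp(min), hi: clamp(max) }
}

/// Multiply two bounded intervals; i64 * i64 always fits in i128.
fn mul_bounded(a: (i64, i64), b: (i64, i64)) -> Interval {
    let (a_lo, a_hi, b_lo, b_hi) = (a.0 as i128, a.1 as i128, b.0 as i128, b.1 as i128);
    from_corners([a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi])
}

/// Divide two bounded intervals; returns `None` when the divisor may be zero
/// (a caller would fall back to the unbounded "top" interval in that case).
fn div_bounded(a: (i64, i64), b: (i64, i64)) -> Option<Interval> {
    if b.0 <= 0 && b.1 >= 0 {
        return None;
    }
    let (a_lo, a_hi, b_lo, b_hi) = (a.0 as i128, a.1 as i128, b.0 as i128, b.1 as i128);
    Some(from_corners([a_lo / b_lo, a_lo / b_hi, a_hi / b_lo, a_hi / b_hi]))
}
```

Under these assumptions, `div_bounded((i64::MIN, 0), (-1, -1))` yields a finite `lo` of 0 and an unbounded `hi`, because the corner `i64::MIN / -1` is kept as an `i128` value above `i64::MAX` rather than being dropped.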
#[cfg(test)]
mod tests {
use super::*;
@ -822,6 +844,53 @@ mod tests {
assert!(r.lo.is_none() || r.hi.is_none());
}
/// Soundness: on overflow `mul` must attribute the unbounded direction
/// by which endpoint product actually escaped the i64 range, not by the
/// first operand's endpoint. `[i64::MIN, 0] * [-1, -1]` reaches
/// `i64::MAX + 1` (unbounded above), so `hi` must be `None`, never a
/// finite value like `0`.
#[test]
fn mul_overflow_attributes_high_bound_unbounded() {
let a = IntervalFact {
lo: Some(i64::MIN),
hi: Some(0),
};
let b = IntervalFact::exact(-1);
let r = a.mul(&b);
// True range is [0, i64::MAX + 1]: lo = 0 finite, hi unbounded.
assert_eq!(r.lo, Some(0), "mul lo must stay finite at 0");
assert_eq!(
r.hi, None,
"mul hi must be unbounded: i64::MIN * -1 escapes above i64::MAX"
);
assert!(
!r.is_proven_bounded(),
"an overflowing product must not be proven-bounded"
);
}
/// Symmetric soundness check on the lower bound:
/// `[0, i64::MAX] * [-2, -1]` reaches `-2 * i64::MAX` (unbounded below),
/// so `lo` must be `None`, never a finite floor like `-i64::MAX`.
#[test]
fn mul_overflow_attributes_low_bound_unbounded() {
let a = IntervalFact {
lo: Some(0),
hi: Some(i64::MAX),
};
let b = IntervalFact {
lo: Some(-2),
hi: Some(-1),
};
let r = a.mul(&b);
// True range is [-2*i64::MAX, 0]: lo unbounded, hi = 0 finite.
assert_eq!(
r.lo, None,
"mul lo must be unbounded: 0 * -2 .. i64::MAX * -2 escapes below i64::MIN"
);
assert_eq!(r.hi, Some(0), "mul hi must stay finite at 0");
}
// ── Bitwise interval transfer tests ────────────────────────────────
#[test]
@ -1062,6 +1131,31 @@ mod tests {
);
}
/// Soundness: `[i64::MIN, 0] / [-1, -1]` truly spans `[0, i64::MAX + 1]`,
/// so the result must NOT be proven-bounded. The old `checked_div`
/// implementation silently dropped the overflowing `i64::MIN / -1`
/// quotient and produced a narrow `[0, 0]`, falsely passing
/// `is_proven_bounded()` and defeating SQL/SHELL sink suppression.
#[test]
fn div_i64_min_overflow_not_proven_bounded() {
let a = IntervalFact {
lo: Some(i64::MIN),
hi: Some(0),
};
let b = IntervalFact::exact(-1);
let r = a.div(&b);
// Lower edge: i64::MIN / -1 overflows above i64::MAX → hi unbounded.
assert_eq!(r.lo, Some(0), "div lo must stay finite at 0");
assert_eq!(
r.hi, None,
"div hi must be unbounded: i64::MIN / -1 = i64::MAX + 1 escapes i64"
);
assert!(
!r.is_proven_bounded(),
"an overflowing quotient must not be reported as proven-bounded"
);
}
/// Modulo with a single-point negative divisor: `[0,10] % -3` must
/// be a valid interval (no panic, no negative-zero bound nonsense).
#[test]


@ -488,15 +488,27 @@ impl AbstractState {
/// Partial order: self ⊑ other.
pub fn leq(&self, other: &Self) -> bool {
// Every non-Top entry in self must have a corresponding entry in other
// with self[v] ⊑ other[v]. Entries only in other are fine (Top ⊑ anything
// is false, but absent self entries are Top which is handled).
// self ⊑ other iff for every SSA value v: self[v] ⊑ other[v], using the
// convention that an absent entry is Top.
//
// Three cases by where v is stored:
// - in both: check self[v] ⊑ other[v] (loop over self below).
// - in self only: other[v] = Top, and self[v] ⊑ Top always holds — ok.
// - in other only: self[v] = Top; since stored entries are non-Top,
// Top ⋢ other[v], so self ⋢ other. This case was previously missed.
for (v, val) in &self.values {
let other_val = other.get(*v);
if !val.leq(&other_val) {
return false;
}
}
// Any value present only in `other` means self[v] = Top ⋢ other[v]
// (other's stored entries are non-Top), so self ⋢ other.
for (v, _) in &other.values {
if self.values.binary_search_by_key(v, |(id, _)| *id).is_err() {
return false;
}
}
true
}
}
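
The three cases in the `leq` comment can be reproduced on a toy lattice. This is a sketch under stated assumptions: `Fact`, `State` and `state_leq` are hypothetical, a `BTreeMap` replaces the engine's sorted vector, and stored facts are assumed never to be `Top` (the same invariant the comment relies on).

```rust
use std::collections::BTreeMap;

/// Hypothetical toy lattice: `Top` (nothing known) or an inclusive range.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Fact {
    Top,
    Range(i64, i64),
}

impl Fact {
    /// `self ⊑ other`: `self` is at least as precise as `other`.
    fn leq(self, other: Fact) -> bool {
        match (self, other) {
            (_, Fact::Top) => true,
            (Fact::Top, Fact::Range(..)) => false,
            (Fact::Range(a, b), Fact::Range(c, d)) => c <= a && b <= d,
        }
    }
}

/// Sparse state: a missing key means `Top`; stored facts are assumed non-`Top`.
type State = BTreeMap<u32, Fact>;

fn state_leq(a: &State, b: &State) -> bool {
    // Keys stored in `a`: compare against b's fact, defaulting to Top.
    let stored_ok = a.iter().all(|(k, f)| f.leq(*b.get(k).unwrap_or(&Fact::Top)));
    // Keys stored only in `b`: a's implicit Top is not below a non-Top fact.
    let other_only_ok = b.keys().all(|k| a.contains_key(k));
    stored_ok && other_only_ok
}
```

Dropping the second check reproduces the old behaviour, where an empty state compared as `⊑` every other state.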
@ -675,6 +687,36 @@ mod tests {
assert_eq!(v1.interval.hi, None); // grew → widened
}
#[test]
fn abstract_state_leq_respects_other_only_entries() {
// self = empty (every value is implicitly Top).
// other = { v1: [0,5] } (a non-Top, hence strictly-lower fact).
// Since Top ⋢ [0,5], empty ⋢ other.
let bounded = AbstractValue {
interval: IntervalFact {
lo: Some(0),
hi: Some(5),
},
string: StringFact::top(),
bits: BitFact::top(),
path: PathFact::top(),
};
let empty = AbstractState::empty();
let mut other = AbstractState::empty();
other.set(SsaValue(1), bounded);
// The bug under test: empty.leq(other) used to return true.
assert!(
!empty.leq(&other),
"empty (Top everywhere) must not be ⊑ a state with a bounded entry"
);
// Sanity: the reverse direction holds (a bounded state ⊑ all-Top).
assert!(other.leq(&empty), "a bounded state must be ⊑ all-Top");
// Reflexivity still holds.
assert!(other.leq(&other));
assert!(empty.leq(&empty));
}
#[test]
fn loop_carried_phi_join_and_widen() {
// Simulate: x = 0; loop { x = phi(0, x+1) }


@ -160,6 +160,28 @@ pub(crate) fn callee_container_hint(raw: &str) -> &str {
""
}
/// Strip the optional `"<pkg>::"` package prefix from a call-graph
/// namespace, yielding the plain project-relative path.
///
/// `crate::symbol::namespace_with_package` produces package-qualified
/// namespaces of the form `format!("{pkg}::{rel}")` for any file that
/// lives inside a resolved [`crate::resolve::PackageEntry`] (e.g. any
/// repo with a named `package.json`). The package name never contains
/// `::` and project-relative file paths never contain `::`, so the
/// **first** `::` is unambiguously the package separator and everything
/// after it is the plain `normalize_namespace` form.
///
/// Used to align two keying conventions that would otherwise never
/// match: call-graph [`FuncKey::namespace`]s (package-qualified) versus
/// file paths normalised via plain `normalize_namespace`. Returns the
/// input unchanged when no `::` is present.
pub(crate) fn strip_package_prefix(ns: &str) -> &str {
match ns.split_once("::") {
Some((_pkg, rel)) => rel,
None => ns,
}
}
// Class / container → method index
/// Per-language `(container, method_name)` → candidate [`FuncKey`] index.
@ -897,12 +919,20 @@ impl FileReachMap {
/// [`FileReachMap::with_scan_root`] when callers may pass absolute
/// paths.
pub fn build(cg: &CallGraph) -> Self {
// Call-graph namespaces are package-qualified (`pkg::rel`) for any
// file inside a resolved package, but `reaches` normalises its
// arguments via plain `normalize_namespace` (`rel`). Strip the
// package prefix on both keys and caller-set entries so the stored
// form matches the lookup form; otherwise `reaches` always returns
// false for package-resident files and chain reach widening /
// surface transitive-exposure detection silently never fire.
let mut by_callee_ns: HashMap<String, std::collections::HashSet<String>> = HashMap::new();
for callee in cg.index.keys() {
let entry = by_callee_ns.entry(callee.namespace.clone()).or_default();
entry.insert(callee.namespace.clone());
let callee_ns = strip_package_prefix(&callee.namespace).to_string();
let entry = by_callee_ns.entry(callee_ns.clone()).or_default();
entry.insert(callee_ns);
for caller in callers_transitive(cg, callee) {
entry.insert(caller.namespace);
entry.insert(strip_package_prefix(&caller.namespace).to_string());
}
}
FileReachMap {
@ -941,11 +971,17 @@ impl FileReachMap {
}
fn normalize<'a>(&self, path: &'a str) -> std::borrow::Cow<'a, str> {
// Apply both path normalisation (absolute → project-relative) and
// package-prefix stripping (`pkg::rel` → `rel`) so lookups match the
// package-stripped keys stored by `build`. Inputs may arrive as
// absolute host paths, plain project-relative paths, or
// package-qualified call-graph namespaces.
match self.scan_root.as_deref() {
Some(root) => {
std::borrow::Cow::Owned(crate::symbol::normalize_namespace(path, Some(root)))
let normalized = crate::symbol::normalize_namespace(path, Some(root));
std::borrow::Cow::Owned(strip_package_prefix(&normalized).to_string())
}
None => std::borrow::Cow::Borrowed(path),
None => std::borrow::Cow::Borrowed(strip_package_prefix(path)),
}
}
}
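
The key-alignment problem fixed in `build` and `normalize` can be shown with plain maps. In this sketch `strip_package_prefix` has the same shape as the helper in the diff, while `build_reach` and `reaches` are hypothetical miniatures that record direct edges only (the real map uses transitive callers).

```rust
use std::collections::{HashMap, HashSet};

/// Same shape as the helper in the diff: drop everything up to the first `::`.
fn strip_package_prefix(ns: &str) -> &str {
    ns.split_once("::").map_or(ns, |(_pkg, rel)| rel)
}

/// Hypothetical miniature reach map: callee file -> files that call into it.
/// Both sides are stored in the stripped (plain relative) form.
fn build_reach(edges: &[(&str, &str)]) -> HashMap<String, HashSet<String>> {
    let mut map: HashMap<String, HashSet<String>> = HashMap::new();
    for (caller_ns, callee_ns) in edges {
        let callee = strip_package_prefix(callee_ns).to_string();
        let entry = map.entry(callee.clone()).or_default();
        entry.insert(callee); // a file trivially reaches itself
        entry.insert(strip_package_prefix(caller_ns).to_string());
    }
    map
}

/// Lookups strip the prefix too, so qualified and plain inputs both match.
fn reaches(map: &HashMap<String, HashSet<String>>, from: &str, to: &str) -> bool {
    map.get(strip_package_prefix(to))
        .is_some_and(|callers| callers.contains(strip_package_prefix(from)))
}
```

If either the stored keys or the lookup arguments kept the `myapp::` prefix, every query for a package-resident file would miss, which is the silent failure the comment in `build` describes.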
@ -993,6 +1029,26 @@ pub fn scc_spans_files(cg: &CallGraph, scc: &[NodeIndex]) -> bool {
iter.any(|n| cg.graph[*n].namespace.as_str() != first_ns)
}
/// True when an SCC requires the run_topo_batches fixed-point loop.
///
/// A multi-node SCC is mutually recursive by definition. A **single**-node
/// SCC is recursive only when the function calls itself directly: petgraph's
/// `tarjan_scc` places a directly self-recursive function (`f` calls `f`) in
/// its own singleton SCC with a self-loop edge, so `scc.len() > 1` alone
/// misses it. Self-recursion (tree walkers, retry wrappers, recursive-descent
/// parsers) is far more common than mutual recursion and needs the same
/// fixed-point treatment so its summary converges against its own refined
/// summary rather than being analysed exactly once.
pub fn scc_is_recursive(cg: &CallGraph, scc: &[NodeIndex]) -> bool {
if scc.len() > 1 {
return true;
}
match scc.first() {
Some(&n) => cg.graph.contains_edge(n, n),
None => false,
}
}
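
The same decision can be sketched without petgraph. The adjacency-list representation and the name `scc_is_recursive_sketch` are assumptions made for illustration; the real function takes a `CallGraph` and `NodeIndex` values.

```rust
/// Hypothetical adjacency-list call graph: `calls[i]` lists the callees of
/// node `i`. `scc` holds the node ids of one strongly connected component.
fn scc_is_recursive_sketch(calls: &[Vec<usize>], scc: &[usize]) -> bool {
    match scc {
        [] => false,
        // Singleton: recursive only if the node has an edge to itself.
        [n] => calls[*n].contains(n),
        // Two or more nodes in one SCC are mutually recursive by definition.
        _ => true,
    }
}
```

A `len() > 1` test alone corresponds to the last arm only, so a function such as a recursive tree walker that sits in a singleton SCC would be analysed once instead of to a fixed point.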
/// Map SCC topological order to an ordered sequence of file-path batches
/// annotated with whether any contributing SCC is mutually recursive
/// (`len > 1`) or cross-file.
@ -1022,14 +1078,22 @@ pub fn scc_file_batches_with_metadata<'a>(
// 2. Build file relative-path → (min topo index, has_mutual_recursion, cross_file).
// `cross_file` is set whenever the file participates in an SCC whose
// nodes span more than one namespace, the cross-file signal.
//
// Call-graph namespaces are package-qualified (`pkg::rel`) for any
// file inside a resolved package, while `rel_to_path` above keys by
// the plain `normalize_namespace` form (`rel`). Strip the package
// prefix here so both conventions agree; otherwise every
// package-resident file misses `file_topo` and lands in `orphans`,
// silently disabling the topo/SCC machinery for exactly the repos it
// was built for (any project with a named package.json).
let mut file_topo: HashMap<&str, (usize, bool, bool)> = HashMap::new();
for (topo_pos, &scc_idx) in analysis.topo_scc_callee_first.iter().enumerate() {
let scc_recursive = analysis.sccs[scc_idx].len() > 1;
let scc_recursive = scc_is_recursive(cg, &analysis.sccs[scc_idx]);
let scc_cross_file = scc_spans_files(cg, &analysis.sccs[scc_idx]);
for &node in &analysis.sccs[scc_idx] {
let ns = &cg.graph[node].namespace;
let ns = strip_package_prefix(&cg.graph[node].namespace);
file_topo
.entry(ns.as_str())
.entry(ns)
.and_modify(|(min_pos, recursive, cross_file)| {
if topo_pos < *min_pos {
*min_pos = topo_pos;
@ -1782,6 +1846,110 @@ mod tests {
}
}
// ── package-prefix namespace alignment (finding #35) ──────────────
#[test]
fn strip_package_prefix_handles_qualified_and_plain() {
// Package-qualified: prefix stripped at the first "::".
assert_eq!(strip_package_prefix("myapp::src/a.js"), "src/a.js");
assert_eq!(strip_package_prefix("@scope/name::src/a.ts"), "src/a.ts");
// Plain relative path with no package prefix is returned verbatim.
assert_eq!(strip_package_prefix("src/a.js"), "src/a.js");
assert_eq!(strip_package_prefix("a.rs"), "a.rs");
assert_eq!(strip_package_prefix(""), "");
}
/// A call graph whose namespaces are package-qualified (the normal case
/// for any repo with a named package.json) must still match the plain
/// `normalize_namespace` file paths in `all_files`, so files land in
/// topo batches rather than `orphans`.
#[test]
fn scc_file_batches_with_metadata_matches_package_qualified_ns() {
let root = Path::new("/proj");
// file_path values are NOT prefixed by root, so func_key keeps them
// verbatim → simulates package-qualified call-graph namespaces.
let a = make_summary("ping", "myapp::a.rs", "rust", 0, vec!["pong"]);
let b = make_summary("pong", "myapp::b.rs", "rust", 0, vec!["ping"]);
// all_files are absolute paths under root → normalize to "a.rs"/"b.rs".
let files: Vec<PathBuf> = vec![PathBuf::from("/proj/a.rs"), PathBuf::from("/proj/b.rs")];
let (batches, orphans) = build_metadata_batches(vec![a, b], &files, root);
assert!(
orphans.is_empty(),
"package-qualified files must not be orphaned: {orphans:?}"
);
assert_eq!(batches.len(), 1, "mutual recursion → single batch");
assert!(batches[0].has_mutual_recursion);
}
/// FileReachMap built from a package-qualified call graph must resolve
/// plain-relative `reaches` lookups.
#[test]
fn file_reach_map_matches_package_qualified_ns() {
let handle = make_summary("handle", "myapp::routes.rs", "rust", 0, vec!["sink"]);
let sink = make_summary("sink", "myapp::helper.rs", "rust", 0, vec![]);
let gs = merge_summaries(vec![handle, sink], None);
let cg = build_call_graph(&gs, &[]);
let reach = FileReachMap::build(&cg);
// Plain-relative lookups resolve even though graph keys are qualified.
assert!(reach.reaches("routes.rs", "helper.rs"));
// A package-qualified caller string also resolves (normalize strips it).
assert!(reach.reaches("myapp::routes.rs", "helper.rs"));
}
// ── self-recursion fixed-point flag (finding #39) ─────────────────
/// A directly self-recursive function forms a singleton SCC with a
/// self-loop edge. `scc_is_recursive` must flag it so run_topo_batches
/// applies the fixed-point loop, matching the mutual-recursion path.
#[test]
fn scc_is_recursive_flags_self_loop_singleton() {
let root = Path::new("/proj");
let f = make_summary("f", "/proj/a.rs", "rust", 0, vec!["f"]);
let gs = merge_summaries(vec![f], Some(&root.to_string_lossy()));
let cg = build_call_graph(&gs, &[]);
let analysis = analyse(&cg);
// The single SCC is a self-loop singleton.
assert_eq!(analysis.sccs.len(), 1);
assert!(
scc_is_recursive(&cg, &analysis.sccs[0]),
"self-recursive singleton must be flagged recursive"
);
// scc_spans_files correctly stays false for singletons.
assert!(!scc_spans_files(&cg, &analysis.sccs[0]));
}
/// A non-recursive singleton (no self-edge) must NOT be flagged.
#[test]
fn scc_is_recursive_ignores_plain_singleton() {
let root = Path::new("/proj");
let f = make_summary("f", "/proj/a.rs", "rust", 0, vec![]);
let gs = merge_summaries(vec![f], Some(&root.to_string_lossy()));
let cg = build_call_graph(&gs, &[]);
let analysis = analyse(&cg);
assert_eq!(analysis.sccs.len(), 1);
assert!(!scc_is_recursive(&cg, &analysis.sccs[0]));
}
/// The self-recursive flag propagates through to the FileBatch metadata.
#[test]
fn scc_file_batches_with_metadata_marks_self_recursive() {
let root = Path::new("/proj");
let f = make_summary("f", "/proj/a.rs", "rust", 0, vec!["f"]);
let files: Vec<PathBuf> = vec![PathBuf::from("/proj/a.rs")];
let (batches, orphans) = build_metadata_batches(vec![f], &files, root);
assert!(orphans.is_empty());
assert_eq!(batches.len(), 1);
assert!(
batches[0].has_mutual_recursion,
"self-recursive file must be flagged for the fixed-point loop"
);
}
// ── qualified disambiguation resolves ambiguous common names ──────
#[test]


@ -16,6 +16,28 @@ fn lang_has_exclusive_cases(lang: &str) -> bool {
matches!(lang, "rust" | "go")
}
/// True when *this specific switch* has guaranteed-exclusive (non-fall-through)
/// cases, so it is safe to reorder the `default` arm to the cascade tail.
///
/// Rust `match` and Go `switch` are always exclusive. Java mixes shapes: the
/// arrow form (`switch_rule` cases, `case x -> ...`) is exclusive, but the
/// classic colon form (`switch_block_statement_group`, `case x:` with implicit
/// fall-through) is NOT. C/C++/JS/TS/PHP classic switches fall through and are
/// never exclusive. Reordering `default` to the tail is only correct for the
/// exclusive shapes; doing it for fall-through switches connects the wrong
/// case bodies in the source-order fall-through chain (both missed and phantom
/// taint flows).
fn switch_is_exclusive(lang: &str, cases: &[(Node<'_>, bool)]) -> bool {
if lang_has_exclusive_cases(lang) {
return true;
}
if lang == "java" {
// Arrow-switch when every case is the arrow `switch_rule` shape.
return cases.iter().all(|(c, _)| c.kind() == "switch_rule");
}
false
}
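
Why reordering `default` is unsafe for fall-through switches can be seen with a small interpreter. This is an illustrative model only: `Arm` and `run_switch` are hypothetical and have no counterpart in the CFG builder; they just execute case bodies the way a classic C-style switch does.

```rust
/// Hypothetical model of one arm of a classic (fall-through) switch.
struct Arm {
    label: Option<i32>,  // `None` marks the `default:` arm
    body: &'static str,  // a name standing in for the arm's statements
    breaks: bool,        // whether the body ends in `break`
}

/// Return the bodies executed for `value`, honouring source-order fall-through.
fn run_switch(arms: &[Arm], value: i32) -> Vec<&'static str> {
    // Dispatch: the first matching case, else the default arm wherever it sits.
    let start = arms
        .iter()
        .position(|a| a.label == Some(value))
        .or_else(|| arms.iter().position(|a| a.label.is_none()));
    let mut executed = Vec::new();
    if let Some(start) = start {
        // Fall through the following arms until one of them breaks.
        for arm in &arms[start..] {
            executed.push(arm.body);
            if arm.breaks {
                break;
            }
        }
    }
    executed
}
```

For the source order `case 1` (no break), `default` (no break), `case 2` (break), an unmatched value runs the default body and then falls into `case 2`. Moving `default` to the tail makes the same value run only the default body, so a sink in `case 2` would be missed, and a value of 1 would skip the default body entirely.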
/// Extract the scrutinee subtree from a switch-like AST node.
///
/// Returns the AST node referenced by the language's scrutinee field. Only
@ -591,12 +613,21 @@ pub(super) fn build_switch<'a>(
return exits;
}
// Whether this switch's cases are mutually exclusive (no fall-through).
// Only exclusive switches may have `default` reordered to the cascade tail.
let is_exclusive = switch_is_exclusive(lang, &cases);
// Reorder so the default arm (if any) sits at the tail of the cascade.
// Reordering case dispatch is semantically harmless (mutually exclusive
// pattern matches), and it keeps the chain a clean Branch(True→case,
// False→next). Fall-through chains are a separate Seq layer below.
// Reordering case dispatch is semantically harmless ONLY for mutually
// exclusive pattern matches (Rust match, Go switch, Java arrow-switch); it
// keeps the chain a clean Branch(True→case, False→next). For classic
// fall-through switches (C/C++/JS/TS/PHP, Java colon-switch) a mid-chain
// `default:` can fall into the following case and a preceding case can fall
// into it, so the source order MUST be preserved — reordering there breaks
// the fall-through Seq layer and produces both missed and phantom flows.
let default_pos = cases.iter().position(|(_, d)| *d);
if let Some(pos) = default_pos
if is_exclusive
&& let Some(pos) = default_pos
&& pos != cases.len() - 1
{
let default_pair = cases.remove(pos);
@ -648,22 +679,40 @@ pub(super) fn build_switch<'a>(
let mut fallthrough_exits: Vec<NodeIndex> = Vec::new();
let mut last_header_false: Option<NodeIndex> = None;
let mut chain_preds: Vec<NodeIndex> = preds.to_vec();
// First node of the `default` body for a fall-through switch where the
// default is NOT at the tail. The cumulative no-match path (the last
// non-default header's False edge) is wired into it after the loop, so the
// default stays in its source position for the fall-through Seq layer while
// still being reachable when no case matches.
let mut pending_default_no_match: Option<NodeIndex> = None;
for (idx, (case, is_default)) in cases.iter().copied().enumerate() {
let is_last = idx + 1 == cases.len();
// A `default` arm carries no discriminant test, so it never gets its
// own dispatch If. For exclusive switches it has been reordered to the
// tail (`is_last`); for fall-through switches it stays in source
// position (`!is_exclusive`) and is wired into the Seq fall-through
// chain instead of acting as a conditional branch.
let default_no_dispatch = is_default && (is_last || !is_exclusive);
// Default at the chain tail doesn't get its own dispatch If, the
// previous header's False edge already targets it directly.
let case_first_preds: Vec<NodeIndex> = if is_default && is_last {
// First node of the default body becomes the False target of the
// previous header. Build the case with the previous chain_preds
// (the last header's "fall-through" branch) plus any fallthrough
// from the preceding case.
let mut p = chain_preds.clone();
p.append(&mut fallthrough_exits);
// `last_header_false` will receive a False edge once we know the
// first node of this body.
last_header_false = chain_preds.first().copied();
let case_first_preds: Vec<NodeIndex> = if default_no_dispatch {
// Body entry = fall-through from the preceding case body.
let mut p = std::mem::take(&mut fallthrough_exits);
if is_last {
// Tail default: the previous header's False branch also lands
// here directly (legacy behavior preserved for exclusive
// switches and tail defaults).
p.extend(chain_preds.iter().copied());
last_header_false = chain_preds.first().copied();
}
// For a non-tail (fall-through) default the dispatch chain must
// continue PAST it, so `chain_preds` / `last_header_false` are left
// untouched and the next case's dispatch header still receives the
// previous header's False edge. The cumulative no-match entry is
// recorded below once the body's first node is known.
p
} else {
// Normal case: synthesize a per-case dispatch header. We tie it
@ -741,12 +790,21 @@ pub(super) fn build_switch<'a>(
// Wire the dispatch True edge from this header (or from the previous
// header for a tail-default) to the first node of the case body.
if body_first_idx.index() < g.node_count() {
let header_for_true = if is_default && is_last {
// The previous header's False already lands here via the
// EdgeKind::Seq inside `case_first_preds`; we additionally
// emit a False edge directly so SSA labels the branch.
if let Some(prev) = last_header_false {
g.add_edge(prev, body_first_idx, EdgeKind::False);
let header_for_true = if default_no_dispatch {
if is_last {
// Tail default: the previous header's False already lands
// here via the EdgeKind::Seq inside `case_first_preds`; we
// additionally emit a False edge directly so SSA labels the
// branch.
if let Some(prev) = last_header_false {
g.add_edge(prev, body_first_idx, EdgeKind::False);
}
} else {
// Non-tail fall-through default: defer wiring the no-match
// entry until the last non-default header's False edge is
// known (after the loop). The body's only in-edge for now
// is the source-order fall-through from the preceding case.
pending_default_no_match = Some(body_first_idx);
}
None
} else {
@ -762,11 +820,22 @@ pub(super) fn build_switch<'a>(
let _ = is_default;
}
// After the chain: the last non-default header (if no default arm) needs
// a False edge that escapes to the post-switch frontier.
// Resolve the cumulative no-match (the last non-default header's False
// edge):
// - If the `default` arm sits mid-chain (fall-through switch), the
// no-match path enters the default body — wire the deferred False edge
// into it. The default stayed in source position for the fall-through
// Seq layer, so this is the only edge making it reachable on no-match.
// - Otherwise, with no reachable default (no default arm, or it was the
// tail and already consumed the False edge), the no-match path escapes
// to the post-switch frontier.
let mut exits: Vec<NodeIndex> = switch_breaks;
exits.append(&mut fallthrough_exits);
if !has_default {
if let Some(default_first) = pending_default_no_match {
if let Some(prev) = last_header_false {
g.add_edge(prev, default_first, EdgeKind::False);
}
} else if !has_default {
if let Some(prev) = last_header_false {
exits.push(prev);
}


@ -64,13 +64,26 @@ pub(super) fn get_boolean_operands<'a>(node: Node<'a>) -> Option<(Node<'a>, Node
None
}
/// Create a lightweight `StmtKind::If` node for a sub-condition in a boolean chain.
/// Create a `StmtKind::If` node for a sub-condition in a boolean chain.
///
/// When the operand contains a classifiable call (`if (flag && cp.execSync(x))`),
/// the node is built via [`push_node`] so the inner source/sink/sanitizer
/// classification — callee, labels, arg-uses, gated sinks — is preserved, then
/// the branch-condition metadata is overlaid on top. Without this, a sink or
/// source CALL inside a short-circuited operand was dropped entirely (the bare
/// node only carried `condition_vars`, no callee or labels), so the
/// `if (flag && sink(x))` form missed flows that the un-decomposed
/// `if (sink(x))` form catches via the mod.rs condition-call fallback. This
/// mirrors `lower_ternary_branch`, which already uses `push_node` for the same
/// reason. Non-call operands keep the original lightweight shape.
pub(super) fn push_condition_node<'a>(
g: &mut Cfg,
cond_ast: Node<'a>,
lang: &str,
code: &'a [u8],
enclosing_func: Option<&str>,
call_ordinal: &mut u32,
analysis_rules: Option<&LangAnalysisRules>,
) -> NodeIndex {
// Pass cond_ast as both args; sub-conditions are never `unless` nodes.
let (inner, negated) = detect_negation(cond_ast, cond_ast, lang);
@ -94,6 +107,44 @@ pub(super) fn push_condition_node<'a>(
// because the per-disjunct cond nodes (built via
// `build_condition_chain`) didn't populate `taint.uses`.
let uses_for_taint: Vec<String> = vars.clone();
// Operand containing a call → route through `push_node` for full
// source/sink/sanitizer classification, then overlay branch metadata.
if has_call_descendant(inner, lang) {
let ord = *call_ordinal;
*call_ordinal += 1;
let node = push_node(
g,
StmtKind::If,
inner,
lang,
code,
enclosing_func,
ord,
analysis_rules,
);
// Overlay the branch-condition metadata that the chain wiring depends
// on. `push_node` set `defines`/`uses`/`labels`/`callee` from the call;
// keep those, but ensure the condition fields and the interned
// condition vars (for `apply_branch_predicates`) are present, and the
// span/enclosing_func point at the operand expression.
let info = &mut g[node];
info.kind = StmtKind::If;
info.ast.span = span;
info.ast.enclosing_func = enclosing_func.map(|s| s.to_string());
info.condition_text = text;
info.condition_vars = vars;
info.condition_negated = negated;
// Union the condition idents into `taint.uses` so branch-predicate
// lookup keeps working without clobbering the call's own arg uses.
for v in &uses_for_taint {
if !info.taint.uses.contains(v) {
info.taint.uses.push(v.clone());
}
}
return node;
}
g.add_node(NodeInfo {
kind: StmtKind::If,
ast: AstMeta {
@ -221,7 +272,15 @@ pub(super) fn build_ternary_diamond<'a>(
// 1. Condition header. `push_condition_node` sets span/text/vars/negated
// but leaves `is_eq_with_const` default; stamp it explicitly so the
// taint engine's equality-narrowing fires for `x === 'literal' ? …`.
let cond_if = push_condition_node(g, cond_ast, lang, code, enclosing_func);
let cond_if = push_condition_node(
g,
cond_ast,
lang,
code,
enclosing_func,
call_ordinal,
analysis_rules,
);
g[cond_if].is_eq_with_const = detect_eq_with_const(cond_ast, lang);
// Capture the pure int-arith + comparison tree so `fold_constant_branches`
// can prune a dead constant-condition arm of the ternary (e.g. Java
@ -488,6 +547,7 @@ pub(super) fn classify_ternary_lhs(
///
/// Returns `(true_exits, false_exits)`, the sets of nodes from which True/False
/// edges should connect to the then/else branches.
#[allow(clippy::too_many_arguments)]
pub(super) fn build_condition_chain<'a>(
cond_ast: Node<'a>,
preds: &[NodeIndex],
@ -496,6 +556,8 @@ pub(super) fn build_condition_chain<'a>(
lang: &str,
code: &'a [u8],
enclosing_func: Option<&str>,
call_ordinal: &mut u32,
analysis_rules: Option<&LangAnalysisRules>,
) -> (Vec<NodeIndex>, Vec<NodeIndex>) {
let inner = unwrap_parens(cond_ast);
@ -503,8 +565,17 @@ pub(super) fn build_condition_chain<'a>(
Some(BoolOp::And) => {
if let Some((left, right)) = get_boolean_operands(inner) {
// Left operand with current preds
let (left_true, left_false) =
build_condition_chain(left, preds, pred_edge, g, lang, code, enclosing_func);
let (left_true, left_false) = build_condition_chain(
left,
preds,
pred_edge,
g,
lang,
code,
enclosing_func,
call_ordinal,
analysis_rules,
);
// Right operand only evaluated when left is true
let (right_true, right_false) = build_condition_chain(
right,
@ -514,6 +585,8 @@ pub(super) fn build_condition_chain<'a>(
lang,
code,
enclosing_func,
call_ordinal,
analysis_rules,
);
// AND: true only when both true; false when either false
let mut false_exits = left_false;
@ -521,7 +594,15 @@ pub(super) fn build_condition_chain<'a>(
(right_true, false_exits)
} else {
// Safety fallback: treat as leaf
let node = push_condition_node(g, inner, lang, code, enclosing_func);
let node = push_condition_node(
g,
inner,
lang,
code,
enclosing_func,
call_ordinal,
analysis_rules,
);
connect_all(g, preds, node, pred_edge);
(vec![node], vec![node])
}
@ -529,8 +610,17 @@ pub(super) fn build_condition_chain<'a>(
Some(BoolOp::Or) => {
if let Some((left, right)) = get_boolean_operands(inner) {
// Left operand with current preds
let (left_true, left_false) =
build_condition_chain(left, preds, pred_edge, g, lang, code, enclosing_func);
let (left_true, left_false) = build_condition_chain(
left,
preds,
pred_edge,
g,
lang,
code,
enclosing_func,
call_ordinal,
analysis_rules,
);
// Right operand only evaluated when left is false
let (right_true, right_false) = build_condition_chain(
right,
@ -540,6 +630,8 @@ pub(super) fn build_condition_chain<'a>(
lang,
code,
enclosing_func,
call_ordinal,
analysis_rules,
);
// OR: true when either true; false only when both false
let mut true_exits = left_true;
@ -547,14 +639,30 @@ pub(super) fn build_condition_chain<'a>(
(true_exits, right_false)
} else {
// Safety fallback: treat as leaf
let node = push_condition_node(g, inner, lang, code, enclosing_func);
let node = push_condition_node(
g,
inner,
lang,
code,
enclosing_func,
call_ordinal,
analysis_rules,
);
connect_all(g, preds, node, pred_edge);
(vec![node], vec![node])
}
}
None => {
// Leaf: single condition node
let node = push_condition_node(g, inner, lang, code, enclosing_func);
let node = push_condition_node(
g,
inner,
lang,
code,
enclosing_func,
call_ordinal,
analysis_rules,
);
connect_all(g, preds, node, pred_edge);
(vec![node], vec![node])
}
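
The short-circuit wiring in `build_condition_chain` can be illustrated with a toy model. This is a minimal sketch, not the engine's code: `Cond`, `Graph`, and `build_chain` are hypothetical stand-ins for the tree-sitter node, the `Cfg`, and the real builder, and the edge labels passed to the right operand (`True` for `&&`, `False` for `||`) are assumed from the comments in the hunk, since those arguments are elided in the diff.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Edge {
    Seq,
    True,
    False,
}

/// Toy condition AST (hypothetical; stands in for the tree-sitter node).
#[allow(dead_code)]
enum Cond {
    Leaf(&'static str),
    And(Box<Cond>, Box<Cond>),
    Or(Box<Cond>, Box<Cond>),
}

/// Toy graph (hypothetical; stands in for the real `Cfg`).
#[derive(Default)]
struct Graph {
    labels: Vec<&'static str>,
    edges: Vec<(usize, usize, Edge)>,
}

impl Graph {
    /// Add one condition node and wire every predecessor into it.
    fn leaf(&mut self, label: &'static str, preds: &[usize], edge: Edge) -> usize {
        self.labels.push(label);
        let id = self.labels.len() - 1;
        for &p in preds {
            self.edges.push((p, id, edge));
        }
        id
    }
}

/// Returns `(true_exits, false_exits)` for the condition.
fn build_chain(c: &Cond, preds: &[usize], edge: Edge, g: &mut Graph) -> (Vec<usize>, Vec<usize>) {
    match c {
        Cond::Leaf(label) => {
            let n = g.leaf(*label, preds, edge);
            (vec![n], vec![n])
        }
        Cond::And(l, r) => {
            // Right operand is only evaluated when the left one is true.
            let (l_true, mut falses) = build_chain(l, preds, edge, g);
            let (r_true, r_false) = build_chain(r, &l_true, Edge::True, g);
            falses.extend(r_false);
            (r_true, falses)
        }
        Cond::Or(l, r) => {
            // Right operand is only evaluated when the left one is false.
            let (mut trues, l_false) = build_chain(l, preds, edge, g);
            let (r_true, r_false) = build_chain(r, &l_false, Edge::False, g);
            trues.extend(r_true);
            (trues, r_false)
        }
    }
}

fn main() {
    let mut g = Graph::default();
    let entry = g.leaf("entry", &[], Edge::Seq);
    let cond = Cond::And(
        Box::new(Cond::Leaf("flag")),
        Box::new(Cond::Leaf("cp.execSync(x)")),
    );
    let (t, f) = build_chain(&cond, &[entry], Edge::Seq, &mut g);
    println!("true exits {t:?}, false exits {f:?}, edges {:?}", g.edges);
}
```

The point for this change is that `cp.execSync(x)` gets its own node on the `True` edge of `flag`. Once that node carries callee and label metadata (the `push_node` route above), a sink in the right operand is visible to classification.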


@ -3541,9 +3541,17 @@ pub(super) fn push_node<'a>(
None
};
// Extract condition metadata for If nodes.
// Extract condition metadata for If nodes. Python `elif_clause` and PHP
// `else_if_clause` are lowered as guard nodes by `build_alternative_chain`
// (the flat-sibling elif-chain handler) and carry their own `condition`
// field, so they must also receive condition-metadata extraction even
// though their grammar kind maps to `Kind::Block`. These two node kinds
// only ever reach `push_node` from that handler, so widening the guard
// here cannot affect ordinary block lowering.
let (condition_text, condition_vars, condition_negated, cond_arith) =
if matches!(lookup(lang, ast.kind()), Kind::If) {
if matches!(lookup(lang, ast.kind()), Kind::If)
|| matches!(ast.kind(), "elif_clause" | "else_if_clause")
{
extract_condition_raw(ast, lang, code)
} else {
(None, Vec::new(), false, None)
@ -5074,6 +5082,230 @@ fn apply_arg_source_bindings(
}
}
/// Lower a flat chain of `alternative` siblings (Python `elif_clause`s / PHP
/// `else_if_clause`s, optionally trailed by an `else_clause`) into a properly
/// nested else-if CFG.
///
/// tree-sitter exposes these clauses as repeated, FLAT `alternative` fields on
/// a single `if_statement` rather than nesting each `else if` inside the
/// previous `else` (as JS/TS/Rust/Go/Java/C do). The default single-`else`
/// lowering only consumed the first sibling, silently dropping every 2nd+ elif
/// and the trailing else — so any source/sink/sanitizer/guard living there was
/// invisible to the whole pipeline. This builder restores the missing CFG
/// nodes by chaining each elif as a guard whose False edge flows into the next
/// alternative.
///
/// `incoming_edge` is the edge label used to enter the chain (the parent `if`'s
/// false edge — `EdgeKind::False` normally, `EdgeKind::True` for Ruby
/// `unless`). Returns the union of all branch exits plus the final
/// fall-through (when the chain has no terminal `else`).
#[allow(clippy::too_many_arguments)]
pub(super) fn build_alternative_chain<'a>(
alternatives: &[Node<'a>],
preds: &[NodeIndex],
incoming_edge: EdgeKind,
g: &mut Cfg,
lang: &str,
code: &'a [u8],
summaries: &mut FuncSummaries,
file_path: &str,
enclosing_func: Option<&str>,
call_ordinal: &mut u32,
analysis_rules: Option<&LangAnalysisRules>,
break_targets: &mut Vec<NodeIndex>,
continue_targets: &mut Vec<NodeIndex>,
throw_targets: &mut Vec<NodeIndex>,
bodies: &mut Vec<BodyCfg>,
next_body_id: &mut u32,
current_body_id: BodyId,
) -> Vec<NodeIndex> {
// Predecessor frontier entering the current alternative, and the edge label
// to use when wiring into it. Updated as we descend the chain: each elif's
// False edge becomes the predecessor/edge for the next alternative.
let mut chain_preds: Vec<NodeIndex> = preds.to_vec();
let mut chain_edge = incoming_edge;
let mut exits: Vec<NodeIndex> = Vec::new();
for &alt in alternatives {
let is_elif = matches!(alt.kind(), "elif_clause" | "else_if_clause");
if is_elif {
// Guard node for the elif condition. `push_node` with `StmtKind::If`
// extracts condition metadata and runs label classification on the
// clause (its `condition` field), so a source/sink call inside an
// elif condition is no longer dropped.
let guard = push_node(
g,
StmtKind::If,
alt,
lang,
code,
enclosing_func,
0,
analysis_rules,
);
connect_all(g, &chain_preds, guard, chain_edge);
// True branch: the elif body (`consequence` for Python, `body` for
// PHP / colon-block forms).
let body = alt
.child_by_field_name("consequence")
.or_else(|| alt.child_by_field_name("body"));
if let Some(b) = body {
let body_first = NodeIndex::new(g.node_count());
let body_exits = build_sub(
b,
&[guard],
g,
lang,
code,
summaries,
file_path,
enclosing_func,
call_ordinal,
analysis_rules,
break_targets,
continue_targets,
throw_targets,
bodies,
next_body_id,
current_body_id,
);
if body_first.index() < g.node_count() {
connect_all(g, &[guard], body_first, EdgeKind::True);
exits.extend(body_exits);
} else if let Some(&first) = body_exits.first() {
connect_all(g, &[guard], first, EdgeKind::True);
exits.extend(body_exits);
} else {
// Empty body: the guard's True edge falls through.
exits.push(guard);
}
} else {
exits.push(guard);
}
// False branch descends to the next alternative.
chain_preds = vec![guard];
chain_edge = EdgeKind::False;
} else {
// Terminal `else_clause` (or any non-guard block): lower its body and
// end the chain. The else body field is `body` for both Python and
// PHP `else_clause`; fall back to the clause node itself.
let body = alt.child_by_field_name("body").unwrap_or(alt);
let body_first = NodeIndex::new(g.node_count());
let body_exits = build_sub(
body,
&chain_preds,
g,
lang,
code,
summaries,
file_path,
enclosing_func,
call_ordinal,
analysis_rules,
break_targets,
continue_targets,
throw_targets,
bodies,
next_body_id,
current_body_id,
);
if body_first.index() < g.node_count() {
connect_all(g, &chain_preds, body_first, chain_edge);
exits.extend(body_exits);
} else if let Some(&first) = body_exits.first() {
connect_all(g, &chain_preds, first, chain_edge);
exits.extend(body_exits);
} else {
exits.extend(chain_preds.iter().copied());
}
// An else clause terminates the chain; nothing follows it.
return exits;
}
}
// Chain ended with an elif (no terminal else): the last guard's False edge
// is the fall-through path out of the whole if/elif construct. Materialise
// it as a synthetic pass-through so the false edge has a concrete target,
// mirroring the no-else branch in the `Kind::If` arm.
let pass = g.add_node(NodeInfo {
kind: StmtKind::Seq,
ast: AstMeta {
span: chain_preds
.first()
.map(|&n| g[n].ast.span)
.unwrap_or((0, 0)),
enclosing_func: enclosing_func.map(|s| s.to_string()),
},
..Default::default()
});
connect_all(g, &chain_preds, pass, chain_edge);
exits.push(pass);
exits
}
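
The chaining rule in `build_alternative_chain` (each elif guard's False edge becomes the entry to the next alternative) can be sketched on a toy graph. Everything here is hypothetical scaffolding: `Alt`, `Graph`, and `lower_alternatives` are illustrative names, bodies are collapsed to a single node, and the incoming edge is fixed to `False` rather than taken as a parameter.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Edge {
    True,
    False,
}

/// Flat sibling alternatives as tree-sitter exposes them for Python / PHP.
enum Alt {
    Elif(&'static str),
    Else,
}

#[derive(Default)]
struct Graph {
    labels: Vec<String>,
    edges: Vec<(usize, usize, Edge)>,
}

impl Graph {
    fn node(&mut self, label: String) -> usize {
        self.labels.push(label);
        self.labels.len() - 1
    }
}

/// Lower the flat list into a nested else-if chain; returns the exit frontier.
fn lower_alternatives(alts: &[Alt], if_node: usize, g: &mut Graph) -> Vec<usize> {
    let mut pred = if_node; // node whose False edge enters the next alternative
    let mut exits = Vec::new();
    for alt in alts {
        match alt {
            Alt::Elif(cond) => {
                let guard = g.node(format!("elif {cond}"));
                g.edges.push((pred, guard, Edge::False));
                let body = g.node(format!("body of {cond}"));
                g.edges.push((guard, body, Edge::True));
                exits.push(body);
                pred = guard;
            }
            Alt::Else => {
                // A terminal else ends the chain; nothing follows it.
                let body = g.node("else body".to_string());
                g.edges.push((pred, body, Edge::False));
                exits.push(body);
                return exits;
            }
        }
    }
    // No terminal else: give the last guard's False edge a concrete target.
    let pass = g.node("pass-through".to_string());
    g.edges.push((pred, pass, Edge::False));
    exits.push(pass);
    exits
}

fn main() {
    let mut g = Graph::default();
    let if_node = g.node("if a".to_string());
    let exits = lower_alternatives(&[Alt::Elif("b"), Alt::Elif("c"), Alt::Else], if_node, &mut g);
    println!("exits {exits:?}, edges {:?}", g.edges);
}
```

With the old single-`else` lowering, only the first sibling would have produced nodes. In this sketch the second `elif` and the trailing `else` each get a node reachable through the previous guard's `False` edge.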
/// Lower a C-style `for` loop's increment/update clause onto the back-edge
/// path. `back_sources` are the body/continue exits that would otherwise
/// back-edge straight to the loop header. When an `update_subtree` exists it
/// is lowered from those sources and its exits are returned, so callers
/// back-edge the update's exits to the header instead — making increment-clause
/// side effects (assignments, sanitizer calls) visible to taint analysis.
/// Without an update clause the input sources are returned unchanged, so the
/// CFG is bit-identical to the pre-fix behaviour.
#[allow(clippy::too_many_arguments)]
pub(super) fn lower_loop_update<'a>(
update_subtree: Option<Node<'a>>,
back_sources: &[NodeIndex],
g: &mut Cfg,
lang: &str,
code: &'a [u8],
summaries: &mut FuncSummaries,
file_path: &str,
enclosing_func: Option<&str>,
call_ordinal: &mut u32,
analysis_rules: Option<&LangAnalysisRules>,
break_targets: &mut Vec<NodeIndex>,
continue_targets: &mut Vec<NodeIndex>,
throw_targets: &mut Vec<NodeIndex>,
bodies: &mut Vec<BodyCfg>,
next_body_id: &mut u32,
current_body_id: BodyId,
) -> Vec<NodeIndex> {
let Some(update) = update_subtree else {
return back_sources.to_vec();
};
if back_sources.is_empty() {
return Vec::new();
}
let update_exits = build_sub(
update,
back_sources,
g,
lang,
code,
summaries,
file_path,
enclosing_func,
call_ordinal,
analysis_rules,
break_targets,
continue_targets,
throw_targets,
bodies,
next_body_id,
current_body_id,
);
// If the update produced no CFG nodes (e.g. it was pure trivia), preserve
// the original back-edge sources so the loop still closes.
if update_exits.is_empty() {
back_sources.to_vec()
} else {
update_exits
}
}
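
A rough picture of where the initializer and update clauses land in the loop CFG, under simplifying assumptions: `Cfg`, `lower_for`, and the single-node body are hypothetical, and `continue` / `break` handling is left out. The sketch only shows the edge shape the comments describe: the init feeds the header, and the update sits on the back-edge path.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Edge {
    Seq,
    True,
    Back,
}

#[derive(Default)]
struct Cfg {
    labels: Vec<&'static str>,
    edges: Vec<(usize, usize, Edge)>,
}

impl Cfg {
    fn node(&mut self, label: &'static str) -> usize {
        self.labels.push(label);
        self.labels.len() - 1
    }
    fn edge(&mut self, from: usize, to: usize, kind: Edge) {
        self.edges.push((from, to, kind));
    }
}

/// Lower `for (init; cond; update) body` with a one-node body; returns the header.
fn lower_for(
    cfg: &mut Cfg,
    pred: usize,
    init: Option<&'static str>,
    update: Option<&'static str>,
) -> usize {
    // The init runs once; its exit replaces `pred` as the header's predecessor.
    let header_pred = match init {
        Some(label) => {
            let n = cfg.node(label);
            cfg.edge(pred, n, Edge::Seq);
            n
        }
        None => pred,
    };
    let header = cfg.node("header");
    cfg.edge(header_pred, header, Edge::Seq);
    let body = cfg.node("body");
    cfg.edge(header, body, Edge::True);
    // The update runs after the body, before the condition is re-checked.
    let back_source = match update {
        Some(label) => {
            let n = cfg.node(label);
            cfg.edge(body, n, Edge::Seq);
            n
        }
        None => body,
    };
    cfg.edge(back_source, header, Edge::Back);
    header
}

fn main() {
    let mut cfg = Cfg::default();
    let entry = cfg.node("entry");
    lower_for(&mut cfg, entry, Some("cmd = getenv(\"X\")"), Some("i++"));
    for (from, to, kind) in &cfg.edges {
        println!("{} -[{:?}]-> {}", cfg.labels[*from], kind, cfg.labels[*to]);
    }
}
```

When both clauses are absent the sketch produces the same three edges as a plain `while` loop, which matches the "unchanged without an update clause" behaviour described above.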
// The recursive *workhorse* that converts an AST node into a CFG slice.
// Returns the set of *exit* nodes that need to be wired further.
#[allow(clippy::too_many_arguments)]
@ -5183,6 +5415,8 @@ pub(super) fn build_sub<'a>(
lang,
code,
enclosing_func,
call_ordinal,
analysis_rules,
)
} else {
// Single-node path (original behavior)
@ -5214,14 +5448,27 @@ pub(super) fn build_sub<'a>(
// Locate then & else blocks using field-based lookup first,
// then positional fallback (Rust uses positional blocks).
let (then_block, else_block) = {
//
// `alternatives` collects *every* `alternative` field, not just the
// first. In tree-sitter-python and tree-sitter-php an
// `if/elif/.../else` (Python) or `if/elseif/.../else` (PHP) chain
// produces several FLAT sibling `alternative` fields on one
// `if_statement` (a list of `elif_clause`/`else_if_clause` nodes
// optionally trailed by an `else_clause`). JS/TS/Rust/Go/Java/C
// nest their `else if`, so they expose at most one `alternative`
// (the nested if) and the list has length ≤ 1.
let (then_block, else_block, alternatives) = {
let field_then = ast
.child_by_field_name("consequence")
.or_else(|| ast.child_by_field_name("body"));
let field_else = ast.child_by_field_name("alternative");
let mut alt_cursor = ast.walk();
let alternatives: Vec<Node> = ast
.children_by_field_name("alternative", &mut alt_cursor)
.collect();
let field_else = alternatives.first().copied();
if field_then.is_some() || field_else.is_some() {
(field_then, field_else)
(field_then, field_else, alternatives)
} else {
// Fallback: positional block children (Rust `if_expression`)
let mut cursor = ast.walk();
@ -5229,10 +5476,22 @@ pub(super) fn build_sub<'a>(
.children(&mut cursor)
.filter(|n| lookup(lang, n.kind()) == Kind::Block)
.collect();
(blocks.first().copied(), blocks.get(1).copied())
(blocks.first().copied(), blocks.get(1).copied(), Vec::new())
}
};
// A flat elif/elseif chain has 2+ `alternative` siblings, or a
// single `alternative` that is itself an elif/else-if clause (the
// `if a: .. elif b: ..` form with no trailing `else`, where the lone
// alternative is an `elif_clause` rather than a nested if). In both
// cases the default single-`else_block` lowering below would either
// drop later siblings entirely or fail to treat the elif condition
// as a branch guard, so route through the dedicated chain builder.
let is_flat_elif_chain = alternatives.len() > 1
|| alternatives
.first()
.is_some_and(|a| matches!(a.kind(), "elif_clause" | "else_if_clause"));
// THEN branch
let then_first_node = NodeIndex::new(g.node_count());
let then_exits = if let Some(b) = then_block {
@ -5267,7 +5526,30 @@ pub(super) fn build_sub<'a>(
// ELSE branch
let else_first_node = NodeIndex::new(g.node_count());
let else_exits = if let Some(b) = else_block {
let else_exits = if is_flat_elif_chain {
// Flat elif/elseif chain: lower every alternative sibling as a
// proper nested else-if so 2nd+ elif and trailing else clauses
// are no longer dropped from the CFG.
build_alternative_chain(
&alternatives,
else_preds,
else_edge,
g,
lang,
code,
summaries,
file_path,
enclosing_func,
call_ordinal,
analysis_rules,
break_targets,
continue_targets,
throw_targets,
bodies,
next_body_id,
current_body_id,
)
} else if let Some(b) = else_block {
let exits = build_sub(
b,
else_preds,
@ -5380,6 +5662,61 @@ pub(super) fn build_sub<'a>(
// WHILE / FOR: classic loop with a back edge.
Kind::While | Kind::For => {
// C-style `for (init; cond; incr) body` loops (C/C++, JS/TS, Go
// three-clause, PHP) attach `initializer`/`update` (or `increment`)
// subtrees that the previous loop lowering ignored entirely — so a
// taint source bound in the init (`for (cmd = getenv("X"); …)`) or a
// side effect in the increment had no CFG node at all and was
// invisible to taint analysis. Tree-sitter exposes these either as
// direct fields on the loop node (C/C++/JS/TS/PHP) or nested under a
// `for_clause` child (Go's three-clause form). The init runs once
// before the header; its exits become the header's predecessors so
// its defs flow into both the condition and the body.
let clause_owner = ast
.child_by_field_name("body")
.is_none()
.then(|| {
let mut c = ast.walk();
ast.children(&mut c).find(|n| n.kind() == "for_clause")
})
.flatten();
let init_subtree = ast
.child_by_field_name("initializer")
.or_else(|| ast.child_by_field_name("initialize"))
.or_else(|| clause_owner.and_then(|fc| fc.child_by_field_name("initializer")));
let update_subtree = ast
.child_by_field_name("update")
.or_else(|| ast.child_by_field_name("increment"))
.or_else(|| clause_owner.and_then(|fc| fc.child_by_field_name("update")));
// Lower the initializer (if any) from `preds`; its exits become the
// header's predecessors. Empty / absent inits leave `preds`
// bit-identical to the pre-fix behaviour.
let init_exits_owned = init_subtree.map(|init| {
build_sub(
init,
preds,
g,
lang,
code,
summaries,
file_path,
enclosing_func,
call_ordinal,
analysis_rules,
break_targets,
continue_targets,
throw_targets,
bodies,
next_body_id,
current_body_id,
)
});
let header_preds: &[NodeIndex] = match &init_exits_owned {
Some(exits) if !exits.is_empty() => exits.as_slice(),
_ => preds,
};
let header = push_node(
g,
StmtKind::Loop,
@ -5390,7 +5727,7 @@ pub(super) fn build_sub<'a>(
0,
analysis_rules,
);
connect_all(g, preds, header, EdgeKind::Seq);
connect_all(g, header_preds, header, EdgeKind::Seq);
// Check for short-circuit condition
let cond_subtree = ast.child_by_field_name("condition");
@ -5445,6 +5782,8 @@ pub(super) fn build_sub<'a>(
lang,
code,
enclosing_func,
call_ordinal,
analysis_rules,
);
// Wire body from true_exits
@ -5472,13 +5811,33 @@ pub(super) fn build_sub<'a>(
connect_all(g, &true_exits, body_first, EdgeKind::True);
}
// The increment runs at the end of each iteration before the
// condition is re-checked, so it sits on the back-edge path
// between the body/continue exits and the header.
let mut back_sources: Vec<NodeIndex> = body_exits;
back_sources.extend(loop_continues.iter().copied());
let back_sources = lower_loop_update(
update_subtree,
&back_sources,
g,
lang,
code,
summaries,
file_path,
enclosing_func,
call_ordinal,
analysis_rules,
break_targets,
continue_targets,
throw_targets,
bodies,
next_body_id,
current_body_id,
);
// Back-edges go to header (not into the condition chain)
for &e in &body_exits {
for &e in &back_sources {
connect_all(g, &[e], header, EdgeKind::Back);
}
for &c in &loop_continues {
connect_all(g, &[c], header, EdgeKind::Back);
}
// Loop exits = false_exits + breaks
let mut exits: Vec<NodeIndex> = false_exits;
@ -5504,14 +5863,32 @@ pub(super) fn build_sub<'a>(
current_body_id,
);
// The increment runs on the back-edge path (end of each
// iteration, before the next condition check).
let mut back_sources: Vec<NodeIndex> = body_exits;
back_sources.extend(loop_continues.iter().copied());
let back_sources = lower_loop_update(
update_subtree,
&back_sources,
g,
lang,
code,
summaries,
file_path,
enclosing_func,
call_ordinal,
analysis_rules,
break_targets,
continue_targets,
throw_targets,
bodies,
next_body_id,
current_body_id,
);
// Backedge for every linear exit → header.
for &e in &body_exits {
for &e in &back_sources {
connect_all(g, &[e], header, EdgeKind::Back);
}
// Wire continue targets as back edges to header
for &c in &loop_continues {
connect_all(g, &[c], header, EdgeKind::Back);
}
// Falling out of the loop = header's false branch +
// any break targets that exit the loop.
let mut exits = vec![header];


@ -108,9 +108,20 @@ fn branch_terminates(cfg: &crate::cfg::Cfg, if_node: NodeIndex) -> bool {
return false;
}
// Check if any path through the true branch terminates
// The join point of the if statement is its immediate post-dominator:
// the first node every branch reconverges on. The true-branch walk
// must stop there — reaching the join means the body fell through
// *past* the if without terminating, so any `return` in the function
// tail past the join must NOT count as the error branch terminating.
// Without this bound, a trailing `return nil` (present in essentially
// every Go `func(...) error`) makes the walk report "all paths
// terminate" and silently suppresses the rule.
let join = super::dominators::compute_post_dominators(cfg)
.and_then(|pd| pd.immediate_dominator(if_node));
// Check if any path through the true branch terminates before the join.
for &start in &true_successors {
if terminates_on_all_paths(cfg, start, if_node) {
if terminates_on_all_paths(cfg, start, join) {
return true;
}
}
@ -206,11 +217,18 @@ fn call_never_returns(info: &crate::cfg::NodeInfo) -> bool {
}
/// Check if all paths from `node` reach a Return/Break/Continue (or a
/// known never-returning call) before exiting scope.
/// known never-returning call) before reaching the if's join point.
///
/// `join` is the if statement's immediate post-dominator (`None` if it
/// could not be computed — e.g. no Exit node). The walk stops at the
/// join: reaching it means the true branch fell through past the if
/// without terminating, so that path does NOT terminate and the rule
/// should fire. This prevents a `return` in the function tail (after the
/// join) from being mis-attributed to the error branch.
fn terminates_on_all_paths(
cfg: &crate::cfg::Cfg,
node: NodeIndex,
_scope_entry: NodeIndex,
join: Option<NodeIndex>,
) -> bool {
use std::collections::HashSet;
@ -222,6 +240,12 @@ fn terminates_on_all_paths(
continue;
}
// Reaching the if's join point means this path fell through past
// the if without terminating inside the branch.
if join == Some(current) {
return false;
}
let info = &cfg[current];
match info.kind {
StmtKind::Return | StmtKind::Throw | StmtKind::Break | StmtKind::Continue => {
@ -460,6 +484,91 @@ mod err_ident_tests {
}
}
#[cfg(test)]
mod join_boundary_tests {
use super::branch_terminates;
use crate::cfg::{CallMeta, Cfg, EdgeKind, NodeInfo, StmtKind};
use petgraph::graph::NodeIndex;
fn node(kind: StmtKind) -> NodeInfo {
NodeInfo {
kind,
..Default::default()
}
}
fn call_node(callee: &str) -> NodeInfo {
NodeInfo {
kind: StmtKind::Call,
call: CallMeta {
callee: Some(callee.to_string()),
..Default::default()
},
..Default::default()
}
}
/// Build the canonical `if err != nil { <body> } <tail>; return`
/// shape and return (cfg, if_node). `body_terminates` selects whether
/// the true branch body itself returns (terminates) or falls through
/// to the join.
fn build_if_cfg(body_terminates: bool) -> (Cfg, NodeIndex) {
let mut cfg = Cfg::new();
let entry = cfg.add_node(node(StmtKind::Entry));
let if_n = cfg.add_node(node(StmtKind::If));
// true-branch body
let body = if body_terminates {
cfg.add_node(node(StmtKind::Return))
} else {
cfg.add_node(call_node("log"))
};
// join point where both branches reconverge: a downstream use
let join = cfg.add_node(call_node("use"));
// function tail: an explicit `return nil` (present in every Go
// value-returning function) followed by exit.
let ret = cfg.add_node(node(StmtKind::Return));
let exit = cfg.add_node(node(StmtKind::Exit));
cfg.add_edge(entry, if_n, EdgeKind::Seq);
cfg.add_edge(if_n, body, EdgeKind::True);
cfg.add_edge(if_n, join, EdgeKind::False);
if !body_terminates {
cfg.add_edge(body, join, EdgeKind::Seq);
} else {
cfg.add_edge(body, exit, EdgeKind::Seq);
}
cfg.add_edge(join, ret, EdgeKind::Seq);
cfg.add_edge(ret, exit, EdgeKind::Seq);
(cfg, if_n)
}
#[test]
fn fallthrough_body_does_not_terminate_despite_trailing_return() {
// True branch falls through to the join; the function tail has a
// `return nil`. Before the join-boundary fix the walk reached
// that trailing return and reported "terminates", suppressing the
// rule. The fix bounds the walk at the join, so this is correctly
// reported as NOT terminating.
let (cfg, if_n) = build_if_cfg(false);
assert!(
!branch_terminates(&cfg, if_n),
"fall-through error branch must not count as terminating"
);
}
#[test]
fn returning_body_terminates() {
// True branch returns directly: the error is handled, so the rule
// must stay suppressed.
let (cfg, if_n) = build_if_cfg(true);
assert!(
branch_terminates(&cfg, if_n),
"error branch with an explicit return must count as terminating"
);
}
}
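
The effect of bounding the walk at the join can be reproduced without petgraph. The sketch below is an assumption-laden toy: the graph is a plain adjacency list, the join is passed in directly rather than derived from a post-dominator tree, only `Return` counts as a terminator, and treating a dead end as "not terminating" is a choice made for this sketch, not something the diff shows.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq)]
enum Kind {
    Plain,
    Return,
}

/// True when every path from `start` hits a `Return` before reaching `join`.
fn terminates_before_join(
    succ: &[Vec<usize>],
    kinds: &[Kind],
    start: usize,
    join: Option<usize>,
) -> bool {
    let mut seen = HashSet::new();
    let mut stack = vec![start];
    while let Some(cur) = stack.pop() {
        if !seen.insert(cur) {
            continue;
        }
        // Reaching the join means the branch fell through past the `if`.
        if join == Some(cur) {
            return false;
        }
        if kinds[cur] == Kind::Return {
            continue; // this path terminated
        }
        if succ[cur].is_empty() {
            return false; // toy assumption: a dead end is not a terminator
        }
        stack.extend(succ[cur].iter().copied());
    }
    true
}

/// Nodes: 0 entry, 1 if, 2 body (falls through), 3 join, 4 `return nil`, 5 exit.
fn fallthrough_graph() -> (Vec<Vec<usize>>, Vec<Kind>) {
    let succ = vec![vec![1], vec![2, 3], vec![3], vec![4], vec![5], vec![]];
    let mut kinds = vec![Kind::Plain; 6];
    kinds[4] = Kind::Return;
    (succ, kinds)
}

fn main() {
    let (succ, kinds) = fallthrough_graph();
    // Unbounded walk (old behaviour): the trailing return is mis-attributed.
    println!("unbounded: {}", terminates_before_join(&succ, &kinds, 2, None));
    // Bounded at the join (new behaviour): the body is seen to fall through.
    println!("bounded:   {}", terminates_before_join(&succ, &kinds, 2, Some(3)));
}
```

Passing `None` mimics the case where no post-dominator could be computed, in which the walk degrades to the earlier unbounded behaviour.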
#[cfg(test)]
mod terminator_call_tests {
use super::call_never_returns;


@ -131,7 +131,7 @@ fn ssa_all_sink_operands_constant(
};
let operand_const = |v: SsaValue| -> bool {
ssa_operand_constant(v, facts, callee_desc, callee_parts, outer_parts)
ssa_operand_constant(v, facts, ctx.cfg, callee_desc, callee_parts, outer_parts)
};
let args_ok = args
.iter()
@ -401,6 +401,7 @@ fn ssa_operand_const_or_param(
fn ssa_operand_constant(
root: SsaValue,
facts: &BodyConstFacts,
cfg: &crate::cfg::Cfg,
callee_desc: &str,
callee_parts: &[&str],
outer_parts: &[&str],
@ -426,6 +427,26 @@ fn ssa_operand_constant(
let Some(inst) = find_inst(&facts.ssa, v) else {
return false;
};
// CFG-node-level Source label: a `SsaOp::Call` can be the lowering
// of a Source-labeled CFG node (`file_get_contents`, `env::var`,
// `readline`, …). Such a call's result is tainted user input, not a
// constant, even when its own arguments are constant (or it is a
// zero-arg source). Mirror the refusal in the sibling
// `ssa_operand_const_or_param` so a source-fed sink is never proven
// "all args constant" and silently dropped.
let cfg_node = inst.cfg_node;
if cfg
.node_weight(cfg_node)
.map(|info| {
info.taint
.labels
.iter()
.any(|l| matches!(l, DataLabel::Source(_)))
})
.unwrap_or(false)
{
return false;
}
match &inst.op {
SsaOp::Const(_) => {}
SsaOp::Assign(vals) => stack.extend(vals.iter().copied()),
@ -2190,6 +2211,32 @@ fn cond_indirect_validator_callee(
crate::ssa::type_facts::classify_input_validator_callee(callee).map(|_| callee.to_string())
}
/// Match a guard suffix matcher against a callee, requiring the suffix to
/// begin on a *leaf-name boundary* rather than mid-identifier.
///
/// A bare `callee_lower.ends_with(suffix)` over-matches: `invalidate` ends
/// with `validate` (the `Cap::all()` guard) and `unquote` ends with `quote`
/// (the SHELL_ESCAPE guard), so cache-invalidation and URL-decoding calls
/// would register as dominating guards and silently suppress every
/// downstream `cfg-unguarded-sink` in the function. Require that the
/// character preceding the matched suffix is a name separator (`.`, `_`, or
/// `:` from `::`) or that the suffix sits at the start of the callee. This
/// mirrors the existing prefix-anchor convention (a trailing `_` on the
/// matcher anchors at the start).
pub(super) fn suffix_matches_at_leaf_boundary(callee_lower: &str, suffix: &str) -> bool {
if !callee_lower.ends_with(suffix) {
return false;
}
let prefix_len = callee_lower.len() - suffix.len();
if prefix_len == 0 {
// Suffix is the whole callee (or its leaf with nothing before it).
return true;
}
// The byte immediately before the suffix must be a leaf-name separator.
let prev = callee_lower.as_bytes()[prefix_len - 1];
matches!(prev, b'.' | b'_' | b':')
}
/// Find all nodes in the CFG that are calls to guard functions.
fn find_guard_nodes(ctx: &AnalysisContext) -> Vec<(NodeIndex, Cap)> {
let guard_rules = rules::guard_rules(ctx.lang);
@ -2300,7 +2347,7 @@ fn find_guard_nodes(ctx: &AnalysisContext) -> Vec<(NodeIndex, Cap)> {
if ml.ends_with('_') {
callee_lower.starts_with(&ml)
} else {
callee_lower.ends_with(&ml)
suffix_matches_at_leaf_boundary(&callee_lower, &ml)
}
});
if matched {
@ -3187,3 +3234,49 @@ mod chain_fragments_tests {
assert!(!got.contains(&"raw".to_string()));
}
}
#[cfg(test)]
mod guard_suffix_boundary_tests {
use super::suffix_matches_at_leaf_boundary;
#[test]
fn rejects_mid_identifier_suffixes() {
// The whole point of the fix: these must NOT register as guards.
assert!(!suffix_matches_at_leaf_boundary("invalidate", "validate"));
assert!(!suffix_matches_at_leaf_boundary(
"cache.invalidate",
"validate"
));
assert!(!suffix_matches_at_leaf_boundary("unquote", "quote"));
assert!(!suffix_matches_at_leaf_boundary(
"urllib.parse.unquote",
"quote"
));
}
#[test]
fn accepts_leaf_boundary_suffixes() {
// Suffix at a real leaf-name boundary stays a valid guard match.
assert!(suffix_matches_at_leaf_boundary("validate", "validate"));
assert!(suffix_matches_at_leaf_boundary("shlex.quote", "quote"));
assert!(suffix_matches_at_leaf_boundary(
"urllib.parse.quote",
"quote"
));
// `_` and `::` separators also count as leaf boundaries.
assert!(suffix_matches_at_leaf_boundary("my_validate", "validate"));
assert!(suffix_matches_at_leaf_boundary(
"std::shell_escape",
"shell_escape"
));
}
#[test]
fn rejects_non_suffix() {
assert!(!suffix_matches_at_leaf_boundary(
"validate_input",
"validate"
));
assert!(!suffix_matches_at_leaf_boundary("os.system", "quote"));
}
}
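
For readers who want to try the matcher outside the crate, here is a standalone sketch. `suffix_matches_at_leaf_boundary` is copied from the hunk above; `matcher_hits` is a hypothetical wrapper that mirrors the two-way dispatch at the call sites (a trailing `_` anchors at the start, anything else is a leaf-boundary suffix), and the `sanitize_` matcher is an invented example, not a rule from the repository.

```rust
/// Copied from the change: suffix match that must start on a leaf-name boundary.
fn suffix_matches_at_leaf_boundary(callee_lower: &str, suffix: &str) -> bool {
    if !callee_lower.ends_with(suffix) {
        return false;
    }
    let prefix_len = callee_lower.len() - suffix.len();
    if prefix_len == 0 {
        return true;
    }
    // The byte before the suffix must be `.`, `_`, or `:` (from `::`).
    let prev = callee_lower.as_bytes()[prefix_len - 1];
    matches!(prev, b'.' | b'_' | b':')
}

/// Hypothetical wrapper mirroring the prefix/suffix dispatch at the call sites.
fn matcher_hits(callee_lower: &str, matcher_lower: &str) -> bool {
    if matcher_lower.ends_with('_') {
        callee_lower.starts_with(matcher_lower)
    } else {
        suffix_matches_at_leaf_boundary(callee_lower, matcher_lower)
    }
}

fn main() {
    let callees = [
        "shlex.quote",
        "urllib.parse.unquote",
        "cache.invalidate",
        "my_validate",
    ];
    for callee in callees {
        println!(
            "{callee}: quote={} validate={}",
            matcher_hits(callee, "quote"),
            matcher_hits(callee, "validate")
        );
    }
}
```

Both inputs are expected to be lowercased already, as at the call sites, so the boundary check works on ASCII separator bytes only.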


@ -288,7 +288,11 @@ pub(crate) fn is_guard_call(
if callee_lower.starts_with(&ml) {
return true;
}
} else if callee_lower.ends_with(&ml) {
} else if guards::suffix_matches_at_leaf_boundary(&callee_lower, &ml) {
// Leaf-boundary match, not a bare `ends_with`: otherwise
// `invalidate` matches the `validate` guard and `unquote`
// matches `quote`, registering cache-invalidation /
// URL-decode calls as guards and suppressing real sinks.
return true;
}
}
@ -305,7 +309,7 @@ pub(crate) fn is_guard_call(
if callee_lower.starts_with(&ml) {
return true;
}
} else if callee_lower.ends_with(&ml) {
} else if guards::suffix_matches_at_leaf_boundary(&callee_lower, &ml) {
return true;
}
}


@ -4,6 +4,7 @@ use super::{AnalysisContext, CfgAnalysis, CfgFinding, Confidence};
use crate::cfg::{EdgeKind, StmtKind};
use crate::patterns::Severity;
use crate::symbol::Lang;
use petgraph::algo::dominators::Dominators;
use petgraph::graph::NodeIndex;
use petgraph::visit::EdgeRef;
use std::collections::HashSet;
@ -132,11 +133,14 @@ fn release_on_all_exit_paths(
acquire: NodeIndex,
release_nodes: &[NodeIndex],
exit: NodeIndex,
post_doms: Option<&Dominators<NodeIndex>>,
) -> bool {
// Use post-dominators as optimization: if any release post-dominates acquire, it's fine
if let Some(post_doms) = dominators::compute_post_dominators(ctx.cfg) {
// Use post-dominators as optimization: if any release post-dominates acquire, it's fine.
// The post-dominator tree is computed once per CFG by the caller (the CFG is
// immutable across acquire sites), so it is threaded in rather than recomputed here.
if let Some(post_doms) = post_doms {
for &release in release_nodes {
if dominators::dominates(&post_doms, release, acquire) {
if dominators::dominates(post_doms, release, acquire) {
return true;
}
}
@ -363,8 +367,13 @@ fn is_ownership_transferred(ctx: &AnalysisContext, acquire: NodeIndex) -> bool {
if references_var && start < end && end <= ctx.source_bytes.len() {
let span_text = &ctx.source_bytes[start..end];
// `->` anywhere in span means pointer-to-member store
if span_text.windows(2).any(|w| w == b"->") {
// `ptr->field = var` pointer-to-member store. A bare `->`
// anywhere in the span is NOT enough: a member READ on or near
// the handle (`fread(buf->data, 1, n, fp)`, PHP
// `$conn->query(...)`) also contains `->` but transfers no
// ownership, so requiring a bare `=` (assignment, not `==`)
// after the member access keeps the suppression to actual stores.
if has_arrow_field_assignment(span_text) {
return true;
}
// `.field = var` pattern (but not `==`)
@ -384,7 +393,7 @@ fn is_ownership_transferred(ctx: &AnalysisContext, acquire: NodeIndex) -> bool {
{
let is_field_write = if start < end && end <= ctx.source_bytes.len() {
let span_text = &ctx.source_bytes[start..end];
span_text.windows(2).any(|w| w == b"->") || has_dot_field_assignment(span_text)
has_arrow_field_assignment(span_text) || has_dot_field_assignment(span_text)
} else {
false
};
@ -434,6 +443,58 @@ fn has_dot_field_assignment(span_text: &[u8]) -> bool {
false
}
/// Check if `span_text` contains a pointer-to-member assignment pattern like
/// `ptr->field = var` (but not a member READ such as `fread(buf->data, ...)`,
/// PHP `$conn->query(...)`, or a comparison `a->b == c`).
///
/// A bare `->` anywhere in the span is insufficient — only an arrow access
/// that is the LHS of an assignment indicates an ownership-transfer store.
/// We require, after one or more `->ident` (or `.ident`) member-access
/// segments, a single `=` that is not part of `==` / `!=` / `<=` / `>=`.
fn has_arrow_field_assignment(span_text: &[u8]) -> bool {
let mut i = 0;
while i + 1 < span_text.len() {
if span_text[i] == b'-' && span_text[i + 1] == b'>' {
// Walk past chained member-access segments: `->ident`,
// `.ident`, further `->ident`, and any whitespace, looking for
// the assignment `=`.
let mut j = i + 2;
loop {
// identifier chars
while j < span_text.len()
&& (span_text[j].is_ascii_alphanumeric() || span_text[j] == b'_')
{
j += 1;
}
// chained `->` / `.` member access keeps the LHS going
if j + 1 < span_text.len() && span_text[j] == b'-' && span_text[j + 1] == b'>' {
j += 2;
continue;
}
if j < span_text.len() && span_text[j] == b'.' {
j += 1;
continue;
}
break;
}
// Skip whitespace before the operator
while j < span_text.len() && span_text[j].is_ascii_whitespace() {
j += 1;
}
// Reject `==`, `!=`, `<=`, `>=` — only a bare `=` is a store.
if j < span_text.len()
&& span_text[j] == b'='
&& (j + 1 >= span_text.len() || span_text[j + 1] != b'=')
&& (j == 0 || !matches!(span_text[j - 1], b'!' | b'<' | b'>' | b'='))
{
return true;
}
}
i += 1;
}
false
}
/// Check whether the acquired variable is consumed by an ownership-taking
/// function (e.g. `FileResponse(f)`, `send_file(f)`) downstream of the
/// acquire node. These functions take ownership of the file handle so there
@ -538,6 +599,12 @@ impl CfgAnalysis for ResourceMisuse {
None => return Vec::new(),
};
// The CFG is immutable for the duration of this pass, so the
// post-dominator tree only needs to be computed once. Previously
// `release_on_all_exit_paths` recomputed it for every acquire site,
// so a body with A acquire sites paid for A full post-dominator
// computations over the same graph.
let post_doms = dominators::compute_post_dominators(ctx.cfg);
let mut findings = Vec::new();
for pair in pairs {
@ -593,8 +660,13 @@ impl CfgAnalysis for ResourceMisuse {
continue;
}
}
if !release_on_all_exit_paths(ctx, acquire, &release_nodes, exit)
&& !is_ownership_transferred(ctx, acquire)
if !release_on_all_exit_paths(
ctx,
acquire,
&release_nodes,
exit,
post_doms.as_ref(),
) && !is_ownership_transferred(ctx, acquire)
&& !is_consumed_by_owner(ctx, acquire)
{
// For mutex pairs, require an explicit .acquire()/.lock() call
@ -724,4 +796,30 @@ mod tests {
let info = call_node(vec![None], vec![vec!["receiver_func".into()]]);
assert!(!is_event_handler_register_shape(&info));
}
#[test]
fn arrow_field_assignment_matches_real_stores() {
// Genuine ownership-transfer stores: `ptr->field = var`.
assert!(has_arrow_field_assignment(b"ctx->fp = fp"));
assert!(has_arrow_field_assignment(b"p->next = cfg->variables"));
// Chained member access on the LHS.
assert!(has_arrow_field_assignment(b"a->b->c = handle"));
// Whitespace variations.
assert!(has_arrow_field_assignment(b"obj->handle=res"));
}
#[test]
fn arrow_field_assignment_rejects_member_reads() {
// Member READ on / near the handle: contains `->` but no store.
// This is the false-negative the fix targets — must NOT be treated
// as ownership transfer.
assert!(!has_arrow_field_assignment(b"fread(buf->data, 1, n, fp)"));
assert!(!has_arrow_field_assignment(b"$result = $conn->query(sql)"));
assert!(!has_arrow_field_assignment(b"if (node->len > 0)"));
// Comparisons through arrow access are not stores.
assert!(!has_arrow_field_assignment(b"if (obj->state == READY)"));
assert!(!has_arrow_field_assignment(b"x->y != z"));
// No arrow at all.
assert!(!has_arrow_field_assignment(b"fclose(fp)"));
}
}
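
The post-dominator change in this file computes the tree once and threads it in as `Option<&Dominators<NodeIndex>>`. The sketch below illustrates that shape on a toy graph. `PostDoms`, `post_dominates` and `released_on_all_paths` are hypothetical stand-ins rather than the petgraph or engine types, and the fallback path search the real function performs when no release post-dominates the acquire is omitted.

```rust
/// Hypothetical stand-in for a post-dominator tree: `ipdom[n]` is the
/// immediate post-dominator of node `n`; the exit node points to itself.
struct PostDoms {
    ipdom: Vec<usize>,
}

impl PostDoms {
    /// True when `a` lies on the post-dominator chain from `b` to the exit.
    fn post_dominates(&self, a: usize, b: usize) -> bool {
        let mut cur = b;
        loop {
            if cur == a {
                return true;
            }
            let next = self.ipdom[cur];
            if next == cur {
                return false; // reached the exit without meeting `a`
            }
            cur = next;
        }
    }
}

/// Fast path only: some release post-dominates the acquire.
/// (Assumption: the real code falls back to a path search; this sketch does not.)
fn released_on_all_paths(acquire: usize, releases: &[usize], post_doms: Option<&PostDoms>) -> bool {
    match post_doms {
        Some(pd) => releases.iter().any(|&r| pd.post_dominates(r, acquire)),
        None => false,
    }
}

fn main() {
    // Diamond CFG: 0 -> {1, 2} -> 3 (exit). Every node's ipdom is the exit.
    // Computed once, outside the loop over acquire sites.
    let post_doms = Some(PostDoms { ipdom: vec![3, 3, 3, 3] });

    for acquire in [0usize, 1, 2] {
        // A release at the join node 3 covers every path from `acquire`.
        assert!(released_on_all_paths(acquire, &[3], post_doms.as_ref()));
    }
    // A release on only one branch does not post-dominate the acquire at 0.
    assert!(!released_on_all_paths(0, &[1], post_doms.as_ref()));
}
```

Passing `post_doms.as_ref()` keeps ownership with the caller, so the same tree can serve every acquire site in the loop.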


@ -96,6 +96,49 @@ pub fn handle(
}
}
/// Run `f` with optional panic recovery, mirroring
/// `crate::commands::scan::recover_or_propagate` (which is private to the
/// scan module). When `enabled` is false, panics propagate as before.
/// When enabled, a panic in `f` is caught, logged, and surfaced as an
/// `Err` so the caller can log-and-skip the offending file instead of
/// aborting the whole index build.
fn recover_or_propagate<T>(
enabled: bool,
path: &std::path::Path,
logs: Option<&Arc<ScanLogCollector>>,
f: impl FnOnce() -> NyxResult<T>,
) -> NyxResult<T> {
if !enabled {
return f();
}
match std::panic::catch_unwind(std::panic::AssertUnwindSafe(f)) {
Ok(r) => r,
Err(panic) => {
let msg = panic
.downcast_ref::<&str>()
.copied()
.map(str::to_owned)
.or_else(|| panic.downcast_ref::<String>().cloned())
.unwrap_or_else(|| "<non-string panic>".to_string());
tracing::warn!(
path = %path.display(),
panic = %msg,
"index-build analysis panicked; continuing"
);
if let Some(l) = logs {
l.warn(
format!("Analysis panicked: {msg}"),
Some(path.display().to_string()),
Some(msg.clone()),
);
}
Err(crate::errors::NyxError::Msg(format!(
"analysis panicked: {msg}"
)))
}
}
}
pub fn build_index(
project_name: &str,
project_path: &std::path::Path,
@ -135,6 +178,16 @@ pub fn build_index_with_observer(
let owned_cfg = crate::commands::scan::ensure_framework_ctx(project_path, config);
let config = owned_cfg.as_ref().unwrap_or(config);
// Pass-1 rescans of later-edited files run with a populated module
// graph (scan_with_index_parallel_observer builds one before pass 1),
// so the SSA summaries/bodies they persist carry package-qualified
// namespaces. Build the same graph here so the rows persisted at
// index-build time use the identical key convention; without it the DB
// ends up mixing package-qualified and bare namespaces for the same
// project, breaking cross-file SSA resolution.
let owned_cfg_with_graph = crate::commands::scan::ensure_module_graph(project_path, config);
let config = owned_cfg_with_graph.as_ref().unwrap_or(config);
tracing::debug!("Building index for: {}", project_name);
let pool = Indexer::init(db_path)?;
{
@ -206,22 +259,62 @@ pub fn build_index_with_observer(
let pass1_start = std::time::Instant::now();
let writer = IndexWriteQueue::start(project_name.to_owned(), Arc::clone(&pool));
let write_tx = writer.sender();
let panic_recovery = config.scanner.enable_panic_recovery;
let index_result = paths.into_par_iter().try_for_each(|path| -> NyxResult<()> {
// Read once, hash once, pass bytes to both rule execution and
// summary extraction. Use pre-computed hash for upsert to avoid
// a redundant file read inside upsert_file.
let bytes = std::fs::read(&path)?;
//
// A file that disappears between the walk and this read
// (delete/rename race) or that is unreadable (permission denied)
// must not abort the entire index build: log and skip it, matching
// the non-indexed scan paths (`scan.rs` pass-1 fold).
let bytes = match std::fs::read(&path) {
Ok(b) => b,
Err(e) => {
tracing::warn!("index build: cannot read {}: {e}", path.display());
if let Some(l) = logs.as_ref() {
l.warn(
format!("Skipping unreadable file: {e}"),
Some(path.display().to_string()),
None,
);
}
pb.inc(1);
return Ok(());
}
};
let hash = Indexer::digest_bytes(&bytes);
// Parse once and persist every artifact we can reuse later:
// findings, coarse summaries, and precise SSA summaries.
let fused = crate::commands::scan::analyse_file_fused(
&bytes,
&path,
config,
None,
Some(project_path),
)?;
// findings, coarse summaries, and precise SSA summaries. Wrap the
// analysis in optional panic recovery so an engine panic on one
// file does not abort the whole build when the user enabled
// `scanner.enable_panic_recovery`; an analysis error (or a caught
// panic) is logged and the file skipped, matching the scan paths.
let fused = match recover_or_propagate(panic_recovery, &path, logs.as_ref(), || {
crate::commands::scan::analyse_file_fused(
&bytes,
&path,
config,
None,
Some(project_path),
)
}) {
Ok(f) => f,
Err(e) => {
tracing::warn!("index build: analysis failed for {}: {e}", path.display());
if let Some(l) = logs.as_ref() {
l.warn(
format!("Skipping file after analysis error: {e}"),
Some(path.display().to_string()),
None,
);
}
pb.inc(1);
return Ok(());
}
};
if let Some(ref p) = progress {
p.inc_parsed(1);
p.set_current_file(&path.to_string_lossy());
@ -287,6 +380,32 @@ pub fn build_index_with_observer(
})
.collect();
// Persist auth-check summaries and cross-package import maps too.
// The indexed pass-1 rescan path persists these via
// `replace_all_for_file`, but a hash match on the freshly-stamped
// file makes that path skip every unchanged file, so unless we
// write them here they are lost until a file's content changes:
// `load_all_auth_summaries` / `load_all_cross_package_imports`
// return empty after the first indexed scan, killing cross-file
// auth-helper lifting and cross-package callee resolution.
let auth_rows: Vec<_> = fused
.auth_summaries
.into_iter()
.map(|(key, sum)| {
(
key.name,
key.arity.unwrap_or(0),
key.lang.as_str().to_string(),
key.namespace,
key.container,
key.disambig,
key.kind,
sum,
)
})
.collect();
let cross_pkg_imports = fused.cross_package_imports;
let path_for_write = path.clone();
write_tx.enqueue(move |idx| {
let file_id = idx.upsert_file_with_hash(&path_for_write, &hash)?;
@ -302,15 +421,23 @@ pub fn build_index_with_observer(
}),
)?;
if !summaries.is_empty() {
idx.replace_summaries_for_file(&path_for_write, &hash, &summaries)?;
}
if !ssa_rows.is_empty() {
idx.replace_ssa_summaries_for_file(&path_for_write, &hash, &ssa_rows)?;
}
if !body_rows.is_empty() {
idx.replace_ssa_bodies_for_file(&path_for_write, &hash, &body_rows)?;
}
// Single transaction for all summary caches, matching the
// indexed pass-1 rescan path (`replace_all_for_file`): one
// fsync per file, and — critically — auth summaries and
// cross-package imports are persisted at build time so the
// first indexed scan does not lose them.
let cpi_arg = cross_pkg_imports
.as_ref()
.map(|(ns, map)| (ns.as_str(), map.as_ref()));
idx.replace_all_for_file(
&path_for_write,
&hash,
&summaries,
&ssa_rows,
&body_rows,
&auth_rows,
cpi_arg,
)?;
Ok(())
})?;
@ -409,3 +536,69 @@ app.get('/safe', function(req, res) {
"index build should persist SSA summaries for functions with non-trivial SSA effects"
);
}
#[test]
fn recover_or_propagate_catches_panics_when_enabled() {
// When disabled, the panic propagates (default fail-fast).
let disabled = std::panic::catch_unwind(|| {
let _ = recover_or_propagate(
false,
std::path::Path::new("x"),
None,
|| -> NyxResult<()> { panic!("boom") },
);
});
assert!(
disabled.is_err(),
"with recovery disabled the panic must propagate"
);
// When enabled, the panic is caught and surfaced as Err so the
// index-build loop can log-and-skip instead of aborting.
let recovered = recover_or_propagate(
true,
std::path::Path::new("x"),
None,
|| -> NyxResult<()> { panic!("boom") },
);
assert!(
recovered.is_err(),
"with recovery enabled the panic must become an Err, not unwind"
);
// A clean closure passes its result through unchanged.
let ok = recover_or_propagate(true, std::path::Path::new("x"), None, || Ok(7u32));
assert_eq!(ok.unwrap(), 7);
}
#[test]
fn build_index_skips_unreadable_entry_without_aborting() {
// A per-file read failure must be skipped rather than abort the build.
// Note: this fixture only creates one readable file, so it checks that
// the build succeeds and indexes it; it does not itself force a read error.
let mut cfg = Config::default();
cfg.performance.worker_threads = Some(1);
cfg.performance.channel_multiplier = 1;
cfg.performance.batch_size = 2;
let td = tempfile::tempdir().unwrap();
let project_dir = td.path().join("proj");
fs::create_dir(&project_dir).unwrap();
// One good file so the build has real work to persist.
let good = project_dir.join("good.rs");
fs::write(&good, "fn main() {}").unwrap();
let db_path = td.path().join("proj.sqlite");
// Must succeed even though the project may contain entries that are
// not plain readable files; the build never returns Err for that.
build_index("proj", &project_dir, &db_path, &cfg, false)
.expect("index build must not abort on a per-file read error");
let pool = Indexer::init(&db_path).unwrap();
let idx = Indexer::from_pool("proj", &pool).unwrap();
let files = idx.get_files("proj").unwrap();
assert!(
files.iter().any(|p| p == &good),
"the readable file must still be indexed"
);
}
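
The pass-1 loop above now logs and skips a file that cannot be read instead of propagating the error with `?`. As a minimal sketch of that pattern outside the indexer, the helper below reads a list of paths and collects failures instead of aborting. `read_skipping_failures` and the file names are hypothetical, and a path that does not exist stands in for the delete/rename race described in the comment.

```rust
use std::fs;
use std::path::PathBuf;

/// Read every path; on failure, log and record the path instead of
/// returning early. Hypothetical helper, not part of the indexer.
fn read_skipping_failures(paths: &[PathBuf]) -> (Vec<(PathBuf, Vec<u8>)>, Vec<PathBuf>) {
    let mut read = Vec::new();
    let mut skipped = Vec::new();
    for path in paths {
        match fs::read(path) {
            Ok(bytes) => read.push((path.clone(), bytes)),
            Err(e) => {
                eprintln!("skipping unreadable file {}: {e}", path.display());
                skipped.push(path.clone());
            }
        }
    }
    (read, skipped)
}

fn main() {
    // Scratch directory under the system temp dir (name is arbitrary).
    let dir = std::env::temp_dir().join(format!("skip_sketch_{}", std::process::id()));
    fs::create_dir_all(&dir).unwrap();

    let good = dir.join("good.rs");
    fs::write(&good, "fn main() {}").unwrap();
    // Never created: models a file deleted between the walk and the read.
    let missing = dir.join("deleted_after_walk.rs");

    let (read, skipped) = read_skipping_failures(&[good.clone(), missing.clone()]);
    assert_eq!(read.len(), 1);
    assert_eq!(read[0].0, good);
    assert_eq!(skipped, vec![missing]);

    let _ = fs::remove_dir_all(&dir);
}
```

A fixture along these lines could drive the read-failure branch directly, since a missing path makes `fs::read` fail regardless of platform permissions.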


@ -387,7 +387,18 @@ pub(crate) fn verify_findings_for_scan(
// can resolve the enclosing function and callgraph entry context
// without re-hitting SQLite per finding. Best-effort: a load failure
// logs and falls through to the substring heuristics.
opts.summaries = load_verify_summaries(project_name, db_path, scan_path);
// Resolve a module graph so summaries are keyed with the same
// package-qualified namespaces the indexed scan path uses. Reuse
// the one already on `config` when present; otherwise build it
// best-effort for this root.
let verify_cfg_with_graph = ensure_module_graph(scan_path, config);
let verify_module_graph = config.module_graph.as_deref().or_else(|| {
verify_cfg_with_graph
.as_ref()
.and_then(|c| c.module_graph.as_deref())
});
opts.summaries =
load_verify_summaries(project_name, db_path, scan_path, verify_module_graph);
if let Some(ref summaries) = opts.summaries {
opts.callgraph = Some(load_verify_callgraph(summaries));
}
@ -589,6 +600,7 @@ fn load_verify_summaries(
project: &str,
db_path: &Path,
scan_root: &Path,
module_graph: Option<&crate::resolve::ModuleGraph>,
) -> Option<Arc<crate::summary::GlobalSummaries>> {
let pool = match Indexer::init(db_path) {
Ok(p) => p,
@ -612,9 +624,15 @@ fn load_verify_summaries(
}
};
let root_str = scan_root.to_string_lossy().into_owned();
Some(Arc::new(crate::summary::merge_summaries(
// Key with package-qualified namespaces (when a module graph is
// available) so the verify-path summary index matches the keys the
// indexed scan path and pass-1 SSA summaries use. Plain
// normalize_namespace keys would diverge for any repo with a named
// package.json and silently lose cross-file callee resolution.
Some(Arc::new(crate::summary::merge_summaries_with_resolver(
all,
Some(&root_str),
module_graph,
)))
}
@ -1566,6 +1584,12 @@ fn run_topo_batches(
let batch_results: Vec<(
std::path::PathBuf,
// `false` when this file's read or analysis failed this
// iteration. A failed file must NOT overwrite the diags it
// produced in an earlier successful iteration with an empty
// vector, so the result loop below skips the diags cache
// update when this flag is false.
bool,
Vec<Diag>,
Vec<crate::summary::FuncSummary>,
Vec<(
@ -1597,7 +1621,7 @@ fn run_topo_batches(
None,
);
}
return (path.to_path_buf(), vec![], vec![], vec![], vec![]);
return (path.to_path_buf(), false, vec![], vec![], vec![], vec![]);
}
};
match recover_or_propagate(
@ -1618,6 +1642,7 @@ fn run_topo_batches(
pb.inc(0); // don't double-count iterations in progress bar
(
path.to_path_buf(),
true,
r.diags,
r.summaries,
r.ssa_summaries,
@ -1637,7 +1662,7 @@ fn run_topo_batches(
None,
);
}
(path.to_path_buf(), vec![], vec![], vec![], vec![])
(path.to_path_buf(), false, vec![], vec![], vec![], vec![])
}
}
})
@ -1645,12 +1670,23 @@ fn run_topo_batches(
let mut ssa_count: usize = 0;
let mg = cfg.module_graph.as_deref();
for (path, diags, summaries, ssa_summaries, _ssa_bodies) in batch_results {
for (path, analysis_ok, diags, summaries, ssa_summaries, _ssa_bodies) in
batch_results
{
// Phase-B: replace (not append) this file's diags
// so the cache always reflects the latest
// iteration's output. Clean files skipped this
// iteration retain their previous diags.
diags_by_file.insert(path, diags);
//
// A file whose read or analysis FAILED this iteration is
// not "clean" but must be treated like one for the diags
// cache: overwriting with the empty vector would erase
// findings the same file produced in an earlier successful
// iteration. Its (empty) summaries are likewise a no-op
// below, leaving the previous iteration's summaries intact.
if analysis_ok {
diags_by_file.insert(path, diags);
}
for s in summaries {
let key = s.func_key_with_resolver(root_str_ref, mg);
@ -2718,6 +2754,17 @@ pub fn scan_with_index_parallel_observer(
| crate::utils::config::AnalysisMode::Taint
);
// Records the pass-1-time content hash for every file whose summary
// extraction succeeded (or was correctly skipped as unchanged). The
// post-pass-2 persistence loop stamps the `files` table hash only for
// these entries, using the pass-1 hash. Files whose pass-1 extraction
// FAILED (recovered panic, hard parse failure, transient read error)
// are absent, so their `files` row keeps its previous hash and
// `should_scan_with_hash` retries them on the next scan instead of
// freezing stale summaries against the new content forever.
let pass1_safe_hashes: Arc<Mutex<HashMap<PathBuf, Vec<u8>>>> =
Arc::new(Mutex::new(HashMap::new()));
// ── Pass 1: ensure summaries are up to date ──────────────────────────
if needs_taint {
if let Some(p) = progress {
@ -2744,6 +2791,7 @@ pub fn scan_with_index_parallel_observer(
let scan_root_ref = scan_root.to_path_buf();
let persist_errors_ref = Arc::clone(&persist_errors);
let skipped_files_ref = Arc::clone(&skipped_files);
let pass1_safe_hashes_ref = Arc::clone(&pass1_safe_hashes);
let progress_ref = progress.cloned();
files.par_iter().for_each_init(
|| Indexer::from_pool(project, &pool).expect("db pool"),
@ -2822,6 +2870,14 @@ pub fn scan_with_index_parallel_observer(
)
})
.collect();
// Extraction succeeded: this file's `files`
// row may be stamped with the pass-1 hash.
// Pass-1 persist failures abort the scan
// (writer_result? below), so a successful
// enqueue is a sufficient safety signal.
if let Ok(mut m) = pass1_safe_hashes_ref.lock() {
m.insert(path.clone(), hash.clone());
}
// Single transaction for all four caches:
// one fsync per file instead of four.
let path_for_write = path.clone();
@ -2855,6 +2911,11 @@ pub fn scan_with_index_parallel_observer(
if let Some(p) = &progress_ref {
p.inc_skipped(1);
}
// Skipped-as-unchanged: the `files` row hash already
// equals `hash`, so re-stamping it is a safe no-op.
if let Ok(mut m) = pass1_safe_hashes_ref.lock() {
m.insert(path.clone(), hash.clone());
}
}
} else {
tracing::warn!("pass 1: cannot read {}", path.display());
@ -2898,7 +2959,16 @@ pub fn scan_with_index_parallel_observer(
let idx = Indexer::from_pool(project, &pool)?;
let all = idx.load_all_summaries()?;
tracing::info!(summaries = all.len(), "loaded cross-file summaries from DB");
let mut gs = summary::merge_summaries(all, Some(&root_str));
// Key coarse summaries with the SAME package-qualified namespaces that
// pass-1 SSA summaries (namespace_with_package) and pass-2 refinements
// (func_key_with_resolver) use. Plain merge_summaries here would key
// them by normalize_namespace, so cross-file SSA resolution would miss
// every package-resident function in any repo with a named package.json.
let mut gs = summary::merge_summaries_with_resolver(
all,
Some(&root_str),
cfg.module_graph.as_deref(),
);
// Load and insert SSA summaries
let ssa_rows = idx.load_all_ssa_summaries()?;
@ -3358,6 +3428,10 @@ pub fn scan_with_index_parallel_observer(
for d in &topo_diags {
by_file.entry(&d.path).or_default().push(d);
}
let safe_hashes = pass1_safe_hashes
.lock()
.map(|m| m.clone())
.unwrap_or_default();
let mut idx = Indexer::from_pool(project, &pool)?;
for path in &files {
if !path.exists() {
@ -3365,7 +3439,21 @@ pub fn scan_with_index_parallel_observer(
continue;
}
let file_id = idx.upsert_file(path)?;
// Only stamp the `files` row when pass-1 extraction for this file
// succeeded (or was correctly skipped as unchanged); use the
// pass-1-time hash, never a fresh re-read (avoids the TOCTOU where
// a mid-scan edit would pair pass-1 artifacts with content that was
// never analysed). Files whose pass-1 extraction FAILED are absent
// from `safe_hashes`: their `files` row is left untouched so its
// previous hash forces a pass-1 retry on the next scan instead of
// freezing stale summaries against the new content forever. Their
// best-effort pass-2 findings still appear in this scan's in-memory
// result; we just do not persist them or advance the hash.
let Some(pass1_hash) = safe_hashes.get(path) else {
continue;
};
let file_id = idx.upsert_file_with_hash(path, pass1_hash)?;
let empty: [&Diag; 0] = [];
let file_diags = by_file
.get(path.to_string_lossy().as_ref())
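
The `analysis_ok` flag added to `run_topo_batches` exists so that a failed iteration cannot erase diagnostics from an earlier successful one. The sketch below shows that rule on a plain `HashMap`. `BatchResult` and `apply_batch` are hypothetical simplifications of the tuple and result loop in the diff, with diagnostics reduced to strings.

```rust
use std::collections::HashMap;

/// One file's outcome for a single fixed-point iteration (simplified).
struct BatchResult {
    path: String,
    /// `false` when the read or the analysis failed this iteration.
    analysis_ok: bool,
    diags: Vec<String>,
}

/// Replace (not append) a file's cached diags, but only for files whose
/// analysis succeeded; failed files keep whatever they had before.
fn apply_batch(cache: &mut HashMap<String, Vec<String>>, batch: Vec<BatchResult>) {
    for r in batch {
        if r.analysis_ok {
            cache.insert(r.path, r.diags);
        }
    }
}

fn main() {
    let mut cache: HashMap<String, Vec<String>> = HashMap::new();

    // Iteration 1: analysis succeeds and reports one finding.
    apply_batch(
        &mut cache,
        vec![BatchResult {
            path: "a.py".into(),
            analysis_ok: true,
            diags: vec!["taint-sql-injection".into()],
        }],
    );

    // Iteration 2: the same file fails (e.g. a recovered panic).
    apply_batch(
        &mut cache,
        vec![BatchResult { path: "a.py".into(), analysis_ok: false, diags: vec![] }],
    );

    // The earlier finding survives the failed iteration.
    assert_eq!(cache["a.py"], vec!["taint-sql-injection".to_string()]);
}
```

A successful iteration that legitimately produces no diagnostics still overwrites the entry with an empty vector, which is the distinction the boolean carries that an empty `Vec` alone cannot.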


@ -1053,6 +1053,67 @@ pub mod index {
Ok(())
}
/// Recompute a finding's [`crate::patterns::FindingCategory`] from
/// its rule id alone.
///
/// The `issues` table persists only `(rule_id, severity, line, col)`,
/// not the category, so cached rows served by
/// [`Self::get_issues_from_file`] must reconstruct it. Hardcoding
/// `Security` (the previous behaviour) resurrected quality findings
/// (e.g. `rs.quality.unwrap`) past the `include_quality` filter and the
/// quality-rollup, producing cold/warm non-determinism for unchanged
/// files. Mapping is best-effort and falls back to `Security`:
///
/// * structural / state-machine ids (`state-*`, `cfg-*`) route through
/// [`crate::patterns::FindingCategory::for_structural_rule`]
/// (reliability for leaks/error-fallthrough, security otherwise);
/// * AST-pattern ids look up their declared
/// [`crate::patterns::PatternCategory`] in a one-time index built
/// over every language's pattern registry;
/// * everything else (taint flows, etc.) stays `Security`.
fn category_for_rule_id(rule_id: &str) -> crate::patterns::FindingCategory {
use crate::patterns::FindingCategory;
use std::sync::OnceLock;
// Structural / state-machine findings are not AST patterns and
// carry no language slug; classify them by id directly.
if rule_id.starts_with("state-") || rule_id.starts_with("cfg-") {
return FindingCategory::for_structural_rule(rule_id);
}
// Lazily build a rule_id -> FindingCategory index across every
// language's static pattern registry. Distinct ids only; alias
// slugs share the same pattern slice so duplicate inserts are a
// no-op.
static PATTERN_CATEGORIES: OnceLock<
std::collections::HashMap<String, FindingCategory>,
> = OnceLock::new();
let map = PATTERN_CATEGORIES.get_or_init(|| {
let mut m = std::collections::HashMap::new();
for lang in [
"rust",
"typescript",
"javascript",
"c",
"cpp",
"java",
"go",
"php",
"python",
"ruby",
] {
for p in crate::patterns::load(lang) {
m.insert(p.id.to_string(), p.category.finding_category());
}
}
m
});
map.get(rule_id)
.copied()
.unwrap_or(FindingCategory::Security)
}
/// Gets the issues for a specific file so we don't have to rescan
pub fn get_issues_from_file(&self, path: &Path) -> NyxResult<Vec<Diag>> {
let file_id: i64 = self.c().query_row(
@ -1076,13 +1137,15 @@ pub mod index {
);
Severity::Medium
});
let rule_id: String = row.get::<_, String>(0)?;
let category = Self::category_for_rule_id(&rule_id);
Ok(Diag {
path: path.to_string_lossy().to_string(),
id: row.get::<_, String>(0)?, // rule_id
id: rule_id, // rule_id
line: row.get::<_, i64>(2)? as usize,
col: row.get::<_, i64>(3)? as usize,
severity,
category: crate::patterns::FindingCategory::Security,
category,
path_validated: false,
guard_kind: None,
message: None,
@ -1695,8 +1758,14 @@ pub mod index {
/// * function and auth summaries: DELETE-then-INSERT regardless
/// of input length, so emptying a file's summaries clears
/// stale rows.
/// * SSA summaries and bodies: only touched when the input is
/// non-empty, matching the existing scan path.
/// * SSA summaries and bodies: DELETE always runs so a file that
/// no longer yields any SSA artifacts (functions removed, file
/// gutted, or SSA lowering now failing for every function)
/// clears its stale high-precedence rows; the INSERT loop is
/// skipped when the new set is empty. Leaving stale SSA rows
/// would let deleted/phantom functions keep resolving cross-file
/// via resolve step 0.5 (SSA summaries outrank coarse
/// FuncSummary), so the DELETE must not be gated on non-empty.
#[allow(clippy::too_many_arguments)]
pub fn replace_all_for_file(
&mut self,
@ -1780,13 +1849,15 @@ pub mod index {
}
}
// ssa_function_summaries, only touched when non-empty.
// ssa_function_summaries: DELETE always, INSERT only when
// the new set is non-empty, so emptying a file's SSA
// summaries clears its stale high-precedence rows.
tx.execute(
"DELETE FROM ssa_function_summaries
WHERE project = ?1 AND file_path = ?2",
params![self.project, path_str],
)?;
if !ssa_summaries.is_empty() {
tx.execute(
"DELETE FROM ssa_function_summaries
WHERE project = ?1 AND file_path = ?2",
params![self.project, path_str],
)?;
let mut stmt = tx.prepare(
"INSERT OR REPLACE INTO ssa_function_summaries
(project, file_path, file_hash, name, arity, lang, namespace,
@ -1822,13 +1893,15 @@ pub mod index {
}
}
// ssa_function_bodies, only touched when non-empty.
// ssa_function_bodies: DELETE always, INSERT only when the
// new set is non-empty, so emptying a file's SSA bodies
// clears its stale high-precedence rows.
tx.execute(
"DELETE FROM ssa_function_bodies
WHERE project = ?1 AND file_path = ?2",
params![self.project, path_str],
)?;
if !ssa_bodies.is_empty() {
tx.execute(
"DELETE FROM ssa_function_bodies
WHERE project = ?1 AND file_path = ?2",
params![self.project, path_str],
)?;
let mut stmt = tx.prepare(
"INSERT OR REPLACE INTO ssa_function_bodies
(project, file_path, file_hash, name, arity, lang, namespace,
@ -2922,6 +2995,65 @@ fn replace_issues_and_query_back() {
);
}
#[test]
fn get_issues_from_file_recomputes_category_from_rule_id() {
// Regression for the hardcoded `category = Security` bug: cached rows
// must reconstruct the finding category from the rule id, otherwise a
// quality finding (correctly dropped/rolled-up on the cold scan)
// resurfaces as an ungrouped Security finding on every warm scan.
use crate::patterns::FindingCategory;
let td = tempfile::tempdir().unwrap();
let db = td.path().join("nyx.sqlite");
let file = td.path().join("lib.rs");
std::fs::write(&file, "fn main() {}").unwrap();
let pool = index::Indexer::init(&db).unwrap();
let mut idx = index::Indexer::from_pool("proj", &pool).unwrap();
let fid = idx.upsert_file(&file).unwrap();
let issues = [
// Tier-A quality AST pattern -> must come back as Quality.
index::IssueRow {
rule_id: "rs.quality.unwrap",
severity: "Low",
line: 1,
col: 1,
},
// Structural resource leak -> Reliability.
index::IssueRow {
rule_id: "state-resource-leak",
severity: "Medium",
line: 2,
col: 1,
},
// Taint-style id (not in any AST registry) -> Security fallback.
index::IssueRow {
rule_id: "taint-sql-injection",
severity: "High",
line: 3,
col: 1,
},
];
idx.replace_issues(fid, issues).unwrap();
let stored = idx.get_issues_from_file(&file).unwrap();
let cat = |id: &str| {
stored
.iter()
.find(|d| d.id == id)
.unwrap_or_else(|| panic!("missing {id}"))
.category
};
assert_eq!(
cat("rs.quality.unwrap"),
FindingCategory::Quality,
"quality AST pattern must not be resurrected as Security on warm scans"
);
assert_eq!(cat("state-resource-leak"), FindingCategory::Reliability);
assert_eq!(cat("taint-sql-injection"), FindingCategory::Security);
}
#[test]
fn clear_and_vacuum_reset_tables() {
let td = tempfile::tempdir().unwrap();
@ -3286,6 +3418,81 @@ fn ssa_summaries_hash_rescan_replaces_stale() {
assert_eq!(loaded[0].1, "new_func");
}
#[test]
fn replace_all_for_file_clears_stale_ssa_rows_when_emptied() {
// Regression for the gated-DELETE staleness bug: a file edited so it no
// longer yields any SSA summaries (functions removed / file gutted /
// lowering now failing) must clear its high-precedence ssa_function_*
// rows. Otherwise phantom functions keep resolving cross-file via
// resolve step 0.5 (SSA outranks coarse FuncSummary).
use crate::labels::Cap;
use crate::summary::ssa_summary::{SsaFuncSummary, TaintTransform};
let td = tempfile::tempdir().unwrap();
let db = td.path().join("nyx.sqlite");
let f = td.path().join("lib.py");
std::fs::write(&f, "v1").unwrap();
let pool = index::Indexer::init(&db).unwrap();
let mut idx = index::Indexer::from_pool("proj", &pool).unwrap();
let hash_v1 = index::Indexer::digest_bytes(b"v1");
let ssa_sums = vec![(
"old_func".to_string(),
1_usize,
"python".to_string(),
"lib.py".to_string(),
String::new(),
None,
crate::symbol::FuncKind::Function,
SsaFuncSummary {
param_to_return: vec![(0, TaintTransform::Identity)],
param_to_sink: vec![],
source_caps: Cap::empty(),
param_to_sink_param: vec![],
param_container_to_return: vec![],
param_to_container_store: vec![],
return_type: None,
return_abstract: None,
source_to_callback: vec![],
receiver_to_return: None,
receiver_to_sink: Cap::empty(),
abstract_transfer: vec![],
param_return_paths: vec![],
points_to: Default::default(),
field_points_to: Default::default(),
return_path_facts: smallvec::SmallVec::new(),
typed_call_receivers: vec![],
validated_params_to_return: smallvec::SmallVec::new(),
param_to_gate_filters: vec![],
entry_kind: None,
},
)];
// Pass 1: file has one SSA summary.
idx.replace_all_for_file(&f, &hash_v1, &[], &ssa_sums, &[], &[], None)
.unwrap();
assert_eq!(
idx.load_all_ssa_summaries().unwrap().len(),
1,
"SSA summary should persist on first write"
);
// Pass 2: file gutted — new hash, empty SSA summaries/bodies. The old
// row must be deleted, not left behind.
let hash_v2 = index::Indexer::digest_bytes(b"v2");
idx.replace_all_for_file(&f, &hash_v2, &[], &[], &[], &[], None)
.unwrap();
assert!(
idx.load_all_ssa_summaries().unwrap().is_empty(),
"stale SSA summary rows must be cleared when the new set is empty"
);
assert!(
idx.load_all_ssa_bodies().unwrap().is_empty(),
"stale SSA body rows must be cleared when the new set is empty"
);
}
#[test]
fn clear_drops_ssa_summaries_table() {
use crate::labels::Cap;

View file

@ -9,7 +9,7 @@ pub static RULES: &[LabelRule] = &[
case_sensitive: false,
},
LabelRule {
matchers: &["fgets", "scanf", "fscanf", "gets", "read"],
matchers: &["fgets", "scanf", "fscanf", "sscanf", "gets", "read"],
label: DataLabel::Source(Cap::all()),
case_sensitive: false,
},
@ -273,6 +273,15 @@ pub static OUTPUT_PARAM_SOURCES: &[(&str, &[usize])] = &[
("gets", &[0]), // gets(buf), buf receives input
("recv", &[1]), // recv(fd, buf, len, flags)
("recvfrom", &[1]), // recvfrom(fd, buf, len, flags, ...)
("read", &[1]), // read(fd, buf, len), buf receives attacker bytes
// `scanf`/`fscanf`/`sscanf` return a match count; the attacker-controlled
// bytes land in the variadic output pointers after the format string.
// OUTPUT_PARAM_SOURCES stores a fixed position list, so we enumerate a
// conservative span of trailing argument positions to cover the common
// single- and multi-conversion forms.
("scanf", &[1, 2, 3, 4, 5, 6, 7, 8]), // scanf("%s", buf, ...): outputs start at arg 1
("fscanf", &[2, 3, 4, 5, 6, 7, 8]), // fscanf(stream, "%s", buf, ...): outputs at arg 2+
("sscanf", &[2, 3, 4, 5, 6, 7, 8]), // sscanf(src, "%s", buf, ...): outputs at arg 2+
];
/// Arg-to-arg taint propagation for known C functions.
@ -288,3 +297,36 @@ pub static ARG_PROPAGATIONS: &[super::ArgPropagation] = &[
to_args: &[1],
},
];
#[cfg(test)]
mod tests {
use crate::labels::output_param_source_positions;
#[test]
fn scanf_family_and_read_taint_output_args() {
// `scanf("%s", buf)`: buf is at arg 1.
assert_eq!(
output_param_source_positions("c", "scanf"),
Some([1usize, 2, 3, 4, 5, 6, 7, 8].as_slice())
);
// `fscanf(stream, "%s", buf)` and `sscanf(src, "%s", buf)`: outputs at arg 2+.
assert_eq!(
output_param_source_positions("c", "fscanf"),
Some([2usize, 3, 4, 5, 6, 7, 8].as_slice())
);
assert_eq!(
output_param_source_positions("c", "sscanf"),
Some([2usize, 3, 4, 5, 6, 7, 8].as_slice())
);
// `read(fd, buf, len)`: buf is at arg 1.
assert_eq!(
output_param_source_positions("c", "read"),
Some([1usize].as_slice())
);
// Namespaced/qualified callees normalize to the last segment.
assert_eq!(
output_param_source_positions("c", "std::sscanf"),
Some([2usize, 3, 4, 5, 6, 7, 8].as_slice())
);
}
}
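
The lookup exercised by the test above can be sketched in isolation. This is a minimal illustration, not the crate's implementation: the table is a trimmed copy of the entries in the diff, and `last_segment`, `lookup_output_positions` and `is_tainted_output` are hypothetical names. The only behaviour taken from the source is that qualified callees normalize to their last segment; splitting on `::` and `.` is an assumption about how that is done.

```rust
/// Trimmed, illustrative copy of the table in the diff: callee name mapped to
/// the argument positions that receive attacker-controlled bytes.
const OUTPUT_PARAM_SOURCES: &[(&str, &[usize])] = &[
    ("read", &[1]),
    ("scanf", &[1, 2, 3, 4, 5, 6, 7, 8]),
    ("fscanf", &[2, 3, 4, 5, 6, 7, 8]),
    ("sscanf", &[2, 3, 4, 5, 6, 7, 8]),
];

/// Assumed normalization: keep only the last `::`- or `.`-separated segment,
/// so `std::sscanf` is looked up as `sscanf`.
fn last_segment(callee: &str) -> &str {
    let after_colons = callee.rsplit("::").next().unwrap_or(callee);
    after_colons.rsplit('.').next().unwrap_or(after_colons)
}

/// Hypothetical lookup: the fixed position list for a callee, if it has one.
fn lookup_output_positions(callee: &str) -> Option<&'static [usize]> {
    let name = last_segment(callee);
    OUTPUT_PARAM_SOURCES
        .iter()
        .find(|(n, _)| *n == name)
        .map(|(_, positions)| *positions)
}

/// True when argument `arg_index` of `callee` is treated as a tainted output.
fn is_tainted_output(callee: &str, arg_index: usize) -> bool {
    lookup_output_positions(callee).is_some_and(|p| p.contains(&arg_index))
}
```

Under these assumptions `is_tainted_output("fscanf", 1)` is false because position 1 is the format string, while positions 2 through 8 are covered. A call with more conversions than the enumerated span would fall outside the list, which is the trade-off of a fixed position list for a variadic function.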

View file

@ -1011,6 +1011,11 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
"communication_case" => Kind::Block,
"go_statement" => Kind::Block,
"defer_statement" => Kind::Block,
// `outer: for { ... }` wraps the whole labeled loop in a
// labeled_statement; map to Block so the CFG builder recurses into the
// inner statement instead of collapsing the loop body into one leaf Seq
// node (mirrors c.rs / cpp.rs).
"labeled_statement" => Kind::Block,
// data-flow
"call_expression" => Kind::CallFn,
@ -1032,7 +1037,10 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
pub static PARAM_CONFIG: ParamConfig = ParamConfig {
params_field: "parameters",
param_node_kinds: &["parameter_declaration"],
// `variadic_parameter_declaration` covers `func run(args ...string)`;
// without it the variadic param is dropped, registering wrong arity and
// never seeding caller taint into the variadic position.
param_node_kinds: &["parameter_declaration", "variadic_parameter_declaration"],
self_param_kinds: &[],
ident_fields: &["name"],
};
@ -1094,3 +1102,27 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec<RuntimeLabelRule> {
rules
}
#[cfg(test)]
mod tests {
use super::{KINDS, PARAM_CONFIG};
use crate::labels::Kind;
#[test]
fn labeled_statement_is_walkable_block() {
// `outer: for { ... }` must be a Block so the CFG builder recurses
// into the labeled loop body instead of collapsing it to a leaf Seq.
assert_eq!(KINDS.get("labeled_statement"), Some(&Kind::Block));
}
#[test]
fn variadic_param_is_extracted() {
// `func run(args ...string)` emits variadic_parameter_declaration;
// it must be a recognised param node so arity registers correctly.
assert!(
PARAM_CONFIG
.param_node_kinds
.contains(&"variadic_parameter_declaration")
);
}
}

View file

@ -1069,7 +1069,13 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
"block" => Kind::Block,
"class_declaration" => Kind::Block,
"class_body" => Kind::Block,
"interface_declaration" => Kind::Block,
"interface_body" => Kind::Block,
"enum_declaration" => Kind::Block,
"enum_body" => Kind::Block,
"enum_body_declarations" => Kind::Block,
"record_declaration" => Kind::Block,
"synchronized_statement" => Kind::Block,
"method_declaration" => Kind::Function,
"constructor_declaration" => Kind::Function,
"switch_expression" => Kind::Switch,
@ -1126,3 +1132,31 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec<RuntimeLabelRule> {
rules
}
#[cfg(test)]
mod tests {
use super::KINDS;
use crate::labels::Kind;
#[test]
fn container_declarations_are_walkable_blocks() {
// interface_declaration must be mapped so its (already-mapped)
// interface_body is reachable; enum/record/synchronized bodies
// must be walked instead of collapsing to a leaf Seq node.
for kind in [
"interface_declaration",
"interface_body",
"enum_declaration",
"enum_body",
"enum_body_declarations",
"record_declaration",
"synchronized_statement",
] {
assert_eq!(
KINDS.get(kind),
Some(&Kind::Block),
"{kind} should map to Kind::Block so the CFG builder walks its body"
);
}
}
}

View file

@ -1643,6 +1643,11 @@ pub static PARAM_CONFIG: ParamConfig = ParamConfig {
"typed_parameter",
"default_parameter",
"typed_default_parameter",
// `*args` / `**kwargs`: without these the splat params are dropped,
// registering wrong arity and never seeding caller taint into the
// variadic positions.
"list_splat_pattern",
"dictionary_splat_pattern",
],
self_param_kinds: &[],
ident_fields: &["name"],
@ -1667,9 +1672,26 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec<RuntimeLabelRule> {
#[cfg(test)]
mod tests {
use super::KINDS;
use super::{KINDS, PARAM_CONFIG};
use crate::labels::Kind;
#[test]
fn splat_params_are_extracted() {
// `*args` (list_splat_pattern) and `**kwargs` (dictionary_splat_pattern)
// must be recognised param nodes so arity registers correctly and the
// variadic positions are seeded with caller taint.
assert!(
PARAM_CONFIG
.param_node_kinds
.contains(&"list_splat_pattern")
);
assert!(
PARAM_CONFIG
.param_node_kinds
.contains(&"dictionary_splat_pattern")
);
}
#[test]
fn lambda_classified_as_function() {
assert_eq!(KINDS.get("lambda"), Some(&Kind::Function));

View file

@ -542,6 +542,11 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
// structure
"program" => Kind::SourceFile,
"body_statement" => Kind::Block,
// Brace blocks (`{ ... }`, including `->(){ ... }` lambda bodies) expose
// their body as a `block_body` node (do-end blocks use `body_statement`,
// already mapped above). Without this, multi-statement brace-block bodies
// collapse into a single leaf Seq node and lose all intra-body taint flow.
"block_body" => Kind::Block,
"do_block" => Kind::Function,
"then" => Kind::Block,
"else" => Kind::Block,
@ -718,3 +723,17 @@ mod ar_query_tests {
assert!(ar_query_safe_shape("a.b.c.where", "pair", false));
}
}
#[cfg(test)]
mod kinds_tests {
use super::KINDS;
use crate::labels::Kind;
#[test]
fn block_body_is_walkable_block() {
// Brace blocks (incl. lambda bodies) expose their body as `block_body`;
// it must be a Block so multi-statement bodies are walked instead of
// collapsing into one leaf Seq node and losing intra-body taint flow.
assert_eq!(KINDS.get("block_body"), Some(&Kind::Block));
}
}

View file

@ -403,6 +403,12 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
"impl_item" => Kind::Block,
"trait_item" => Kind::Block,
"declaration_list" => Kind::Block,
// Inline modules `mod foo { ... }` wrap their items in a
// `declaration_list`; map to Block so the CFG builder recurses into the
// body and the `function_item`s inside are lowered, instead of dropping
// the whole module (the old `Kind::Trivia` mapping discarded every
// function/source/sink inside an inline module).
"mod_item" => Kind::Block,
// data-flow
"call_expression" => Kind::CallFn,
@ -430,7 +436,6 @@ pub static KINDS: Map<&'static str, Kind> = phf_map! {
"{" => Kind::Trivia, "}" => Kind::Trivia, "\n" => Kind::Trivia,
"use_declaration" => Kind::Trivia,
"attribute_item" => Kind::Trivia,
"mod_item" => Kind::Trivia,
"type_item" => Kind::Trivia,
};
@ -614,3 +619,18 @@ pub fn phase_c_auth_rules() -> Vec<RuntimeLabelRule> {
},
]
}
#[cfg(test)]
mod tests {
use super::KINDS;
use crate::labels::Kind;
#[test]
fn mod_item_is_walkable_block_not_trivia() {
// Inline `mod foo { ... }` must be a Block so the CFG builder recurses
// into the module body; the old Trivia mapping dropped every function,
// source, and sink inside inline modules.
assert_eq!(KINDS.get("mod_item"), Some(&Kind::Block));
assert_ne!(KINDS.get("mod_item"), Some(&Kind::Trivia));
}
}

View file

@ -546,6 +546,34 @@ pub static GATED_SINKS: &[SinkGate] = &[
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// ── Lodash `_.template` SSTI/RCE gates, mirrors `labels/javascript.rs` ─
// (Strapi CVE-2023-22621 class). Lodash compiles `<% ... %>` evaluate
// blocks into a JS Function; gate on the `evaluate` option and fire
// conservatively when missing/dynamic.
SinkGate {
callee_matcher: "_.template",
arg_index: 0,
dangerous_values: &["true"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::CODE_EXEC),
case_sensitive: true,
payload_args: &[0],
keyword_name: Some("evaluate"),
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
SinkGate {
callee_matcher: "lodash.template",
arg_index: 0,
dangerous_values: &["true"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::CODE_EXEC),
case_sensitive: true,
payload_args: &[0],
keyword_name: Some("evaluate"),
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// ── XML XXE gates, mirrors `labels/javascript.rs` ────────────────────
SinkGate {
callee_matcher: "xml2js.parseString",
@ -762,6 +790,37 @@ pub static GATED_SINKS: &[SinkGate] = &[
object_destination_fields: &["host", "hostname", "path", "protocol", "port", "origin"],
},
},
// Node `http.get` / `https.get` convenience wrappers around `.request()`.
// Same destination semantics. Motivated by CVE-2025-64430 (Parse Server
// SSRF via http.get(uri)). Mirrors `labels/javascript.rs`.
SinkGate {
callee_matcher: "http.get",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::SSRF),
case_sensitive: false,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &["host", "hostname", "path", "protocol", "port", "origin"],
},
},
SinkGate {
callee_matcher: "https.get",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::SSRF),
case_sensitive: false,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &["host", "hostname", "path", "protocol", "port", "origin"],
},
},
// ── Cross-boundary data exfiltration ──────────────────────────────────
// `fetch(input, init)`, payload-bearing fields of `init` (arg 1) flow
// into the request body / headers / json, distinct from SSRF on the URL
@ -1070,6 +1129,69 @@ pub static GATED_SINKS: &[SinkGate] = &[
dangerous_kwargs: &[],
activation: GateActivation::LiteralOnly,
},
// `set-value` standalone helper (CVE-2019-10747 / CVE-2021-23440) —
// recursive set-by-path helper that did not block `__proto__` keys.
// Mirrors `labels/javascript.rs`.
SinkGate {
callee_matcher: "setValue",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// `dot-prop` standalone helper: `dotProp.set(obj, path, val)` —
// CVE-2020-8116. Mirrors `labels/javascript.rs`.
SinkGate {
callee_matcher: "dotProp.set",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// `jsonpath` / `jsonpath-plus` `jp.set(obj, path, value)` family —
// mirrors `labels/javascript.rs`.
SinkGate {
callee_matcher: "jp.set",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "jsonpath.set",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
];
pub static KINDS: Map<&'static str, Kind> = phf_map! {

View file

@ -117,7 +117,7 @@ pub const PATTERNS: &[Pattern] = &[
function: (member_expression
property: (property_identifier) @prop (#eq? @prop "createHash"))
arguments: (arguments
(string) @alg (#match? @alg "\"(md5|sha1)\"")))
(string) @alg (#match? @alg "^[\"'](md5|sha1)[\"']$")))
@vuln"#,
severity: Severity::Low,
tier: PatternTier::A,
@ -201,7 +201,7 @@ pub const PATTERNS: &[Pattern] = &[
query: r#"(call_expression
function: (identifier) @id (#eq? @id "fetch")
arguments: (arguments
(string) @url (#match? @url "^\"http://")))
(string) @url (#match? @url "^[\"']http://")))
@vuln"#,
severity: Severity::Low,
tier: PatternTier::A,

View file

@ -119,7 +119,7 @@ pub const PATTERNS: &[Pattern] = &[
query: r#"(call
method: (identifier) @m (#eq? @m "open")
arguments: (argument_list
(string) @url (#match? @url "^\"https?://")))
(string) @url (#match? @url "^[\"']https?://")))
@vuln"#,
severity: Severity::Medium,
tier: PatternTier::A,

View file

@ -88,7 +88,7 @@ pub const PATTERNS: &[Pattern] = &[
function: (member_expression
property: (property_identifier) @prop (#eq? @prop "createHash"))
arguments: (arguments
(string) @alg (#match? @alg "\"(md5|sha1)\"")))
(string) @alg (#match? @alg "^[\"'](md5|sha1)[\"']$")))
@vuln"#,
severity: Severity::Low,
tier: PatternTier::A,

View file

@ -558,6 +558,41 @@ impl AnalysisState {
}
changed
}
SsaOp::Call {
callee, receiver, ..
} => {
// Re-project the `Field(pt(receiver), ELEM)` cell for
// container reads. `transfer_inst` (pass 1) projects this
// from a snapshot of `pt(receiver)`, but the receiver's set
// is often still empty or partial on that pass (e.g. the
// container comes from a `FieldProj` that only resolves in
// the fixpoint, or from a value unified by a later `Assign`).
// Without re-projecting here, the promised `Field(_, ELEM)`
// members never appear and per-element parent-field aliasing
// through the read result is dropped. The fresh-alloc
// fallback added in pass 1 is preserved (`union_in_place`
// only adds members), so this strictly refines the set.
let Some(rcv) = receiver else {
return false;
};
if !is_container_read_callee(callee) || (rcv.0 as usize) >= self.parent.len() {
return false;
}
let rcv_rep = self.find(rcv.0) as usize;
let rcv_pt = self.pt[rcv_rep].clone();
// Mirror the pass-1 guard: skip empty (nothing to project)
// and Top (already maximally imprecise) receivers.
if rcv_pt.is_empty() || rcv_pt.is_top() {
return false;
}
let mut new_pt = PointsToSet::empty();
for parent_loc in rcv_pt.iter() {
let proj = self.interner.intern_field(parent_loc, FieldId::ELEM);
new_pt.insert(proj);
}
let v_rep = self.find(v) as usize;
self.pt[v_rep].union_in_place(&new_pt)
}
// No re-propagation needed for leaf ops.
_ => false,
}
@ -1015,6 +1050,68 @@ mod tests {
);
}
/// Regression for the fixpoint re-projection of container reads.
///
/// Shape: `e := this.queue.shift()`. The receiver of `shift` is a
/// `FieldProj` (`this.queue`) whose points-to set is *empty* on pass 1
/// (field projections only resolve in the fixpoint). Before the fix,
/// `propagate_inst` never re-evaluated `SsaOp::Call`, so the promised
/// `Field(Field(SelfParam, "queue"), ELEM)` member was never projected
/// into the result and only the fresh allocation remained. After the
/// fix the nested ELEM cell must appear once the receiver converges.
#[test]
fn container_read_reprojects_elem_after_receiver_converges() {
let mut b = BodyBuilder::new();
// `this` is the self parameter.
let this = b.fresh(Some("this"));
b.emit(this, SsaOp::SelfParam, Some("this"));
// `q := this.queue`, a field projection whose pt is empty in pass 1.
let queue_field = b.intern_field("queue");
let q = b.fresh(Some("q"));
b.emit(
q,
SsaOp::FieldProj {
receiver: this,
field: queue_field,
projected_type: None,
},
Some("q"),
);
// `e := q.shift()`, container read whose receiver (`q`) only
// converges during the fixpoint.
let e = b.fresh(Some("e"));
b.emit(
e,
SsaOp::Call {
callee: "shift".into(),
callee_text: None,
args: vec![],
receiver: Some(q),
},
Some("e"),
);
let body = b.build();
let facts = analyse_body(&body, BodyId(0));
let pt_e = facts.pt(e);
// The result must now include a Field(_, ELEM) member projected
// from the converged receiver set (not just the fresh alloc).
let mut saw_elem = false;
for loc in pt_e.iter() {
if let crate::pointer::AbsLoc::Field { field, .. } = facts.interner.resolve(loc)
&& *field == FieldId::ELEM
{
saw_elem = true;
break;
}
}
assert!(
saw_elem,
"container read with a FieldProj receiver must re-project \
Field(_, ELEM) in the fixpoint; got {pt_e:?}"
);
}
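
The re-projection step this regression test covers can be modelled with plain sets. The sketch below is a simplified stand-in, not the engine's `PointsToSet` or interner: `Loc`, `ELEM`, `union_in_place` and `reproject_elem` are illustrative names, locations are compared structurally instead of being interned, and the Top case is left out.

```rust
use std::collections::BTreeSet;

/// Simplified abstract location: an allocation site or a field of a parent.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Loc {
    Alloc(u32),
    Field(Box<Loc>, &'static str),
}

/// Illustrative marker for the "any element" pseudo-field of a container.
const ELEM: &str = "<elem>";

/// Union `src` into `dst` and report whether `dst` grew. Members are only
/// ever added, so repeated calls can refine a set but never shrink it.
fn union_in_place(dst: &mut BTreeSet<Loc>, src: &BTreeSet<Loc>) -> bool {
    let before = dst.len();
    dst.extend(src.iter().cloned());
    dst.len() != before
}

/// Re-project `Field(parent, ELEM)` for every parent in the receiver's set
/// and merge the projections into the call result. Returns `true` when the
/// result changed, which is what keeps a fixpoint loop iterating.
fn reproject_elem(result: &mut BTreeSet<Loc>, receiver: &BTreeSet<Loc>) -> bool {
    if receiver.is_empty() {
        return false; // nothing to project yet
    }
    let projected: BTreeSet<Loc> = receiver
        .iter()
        .map(|parent| Loc::Field(Box::new(parent.clone()), ELEM))
        .collect();
    union_in_place(result, &projected)
}
```

Starting from a result that holds only a fresh allocation, a call with an empty receiver set leaves it unchanged. Once the receiver resolves to `Field(Alloc(0), "queue")`, the next call adds the nested ELEM cell and reports a change, and a further call reports none, which is the convergence the test asserts on.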
/// `extract_field_points_to` records a field
/// READ on the parameter index when a `FieldProj` traces back to
/// an `AbsLoc::Param`.

View file

@ -996,14 +996,35 @@ fn resolve_file_or_index(candidate: &Path) -> Option<PathBuf> {
if candidate.is_file() {
return Some(normalize_path(candidate));
}
for ext in RESOLVE_EXTENSIONS {
let mut with_ext = candidate.to_path_buf();
match with_ext.extension() {
Some(_) => {}
None => {
with_ext.set_extension(ext);
if with_ext.is_file() {
return Some(normalize_path(&with_ext));
// Node / TS resolution appends each candidate extension to the FULL
// specifier, regardless of whether the last segment already contains a
// dot. `Path::set_extension` *replaces* the trailing component after the
// last `.`, so it is wrong here: it would turn `./user.service` into
// `./user.ts` and never produce `./user.service.ts` (the dominant
// Angular/NestJS `.service`/`.component`/`.module`/`.controller`
// convention). Build the appended filename by hand instead.
if let Some(name) = candidate.file_name().and_then(|n| n.to_str()) {
for ext in RESOLVE_EXTENSIONS {
let appended = candidate.with_file_name(format!("{name}.{ext}"));
if appended.is_file() {
return Some(normalize_path(&appended));
}
}
}
// TypeScript NodeNext / ESM idiom: an `import './x.js'` (or `.mjs` /
// `.cjs`) frequently refers to `x.ts` / `x.mts` / `x.cts` on disk. When
// the specifier carries a JS-family extension that did not resolve as a
// file above, retry with each TS-family extension swapped in.
if let Some(stem) = candidate.file_stem().and_then(|s| s.to_str()) {
let has_js_ext = candidate
.extension()
.and_then(|e| e.to_str())
.is_some_and(|e| matches!(e, "js" | "jsx" | "mjs" | "cjs"));
if has_js_ext {
for ext in ["ts", "tsx", "mts", "cts"] {
let swapped = candidate.with_file_name(format!("{stem}.{ext}"));
if swapped.is_file() {
return Some(normalize_path(&swapped));
}
}
}
@ -1040,3 +1061,73 @@ fn normalize_path(p: &Path) -> PathBuf {
fn canonicalize_or_owned(p: &Path) -> PathBuf {
p.canonicalize().unwrap_or_else(|_| p.to_path_buf())
}
#[cfg(test)]
mod resolve_file_ext_tests {
use super::resolve_file_or_index;
use std::fs;
/// `import './user.service'` must resolve to `user.service.ts` on disk:
/// the extension is APPENDED to the full specifier, not replacing the
/// `.service` suffix (the Angular/NestJS convention).
#[test]
fn resolves_dotted_basename_by_appending_extension() {
let dir = tempfile::tempdir().unwrap();
let target = dir.path().join("user.service.ts");
fs::write(&target, "export const x = 1;").unwrap();
// The specifier as the importer would join it: no extension appended.
let candidate = dir.path().join("user.service");
let resolved = resolve_file_or_index(&candidate)
.expect("'./user.service' should resolve to user.service.ts");
assert!(
resolved.ends_with("user.service.ts"),
"expected user.service.ts, got {resolved:?}"
);
}
/// TypeScript NodeNext idiom: `import './x.js'` resolves to `x.ts` on disk.
#[test]
fn resolves_js_specifier_to_ts_file() {
let dir = tempfile::tempdir().unwrap();
let target = dir.path().join("x.ts");
fs::write(&target, "export const y = 2;").unwrap();
let candidate = dir.path().join("x.js");
let resolved = resolve_file_or_index(&candidate)
.expect("'./x.js' should resolve to x.ts under NodeNext semantics");
assert!(
resolved.ends_with("x.ts"),
"expected x.ts, got {resolved:?}"
);
}
/// Extensionless specifiers still resolve (no regression).
#[test]
fn resolves_extensionless_specifier() {
let dir = tempfile::tempdir().unwrap();
let target = dir.path().join("plain.ts");
fs::write(&target, "export const z = 3;").unwrap();
let candidate = dir.path().join("plain");
let resolved =
resolve_file_or_index(&candidate).expect("'./plain' should resolve to plain.ts");
assert!(resolved.ends_with("plain.ts"), "got {resolved:?}");
}
/// A non-JS asset import (e.g. `./image.css`) with no matching source on
/// disk must NOT spuriously resolve to an unrelated file.
#[test]
fn does_not_spuriously_resolve_unrelated_asset() {
let dir = tempfile::tempdir().unwrap();
// A same-stem .ts exists, but a `.css` import is not a JS-family
// extension, so the NodeNext swap must not fire.
fs::write(dir.path().join("image.ts"), "x").unwrap();
let candidate = dir.path().join("image.css");
assert!(
resolve_file_or_index(&candidate).is_none(),
"a .css specifier must not resolve to image.ts"
);
}
}
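
The difference between replacing and appending an extension, which motivates the resolver change above, shows up with `std::path` alone. A minimal sketch follows: `replace_extension` and `append_extension` are hypothetical helper names, and nothing here touches the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Replaces whatever follows the last `.` in the final path component.
/// This matches what `Path::set_extension` does in place.
fn replace_extension(candidate: &Path, ext: &str) -> PathBuf {
    candidate.with_extension(ext)
}

/// Appends `.ext` to the complete final component, keeping any dotted
/// suffix such as `.service` intact.
fn append_extension(candidate: &Path, ext: &str) -> Option<PathBuf> {
    let name = candidate.file_name()?.to_str()?;
    Some(candidate.with_file_name(format!("{name}.{ext}")))
}
```

For `src/user.service` the first helper yields `src/user.ts` and the second yields `src/user.service.ts`. For an undotted specifier such as `src/plain` both yield `src/plain.ts`, which is why extensionless imports kept resolving before the fix.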

View file

@ -121,6 +121,64 @@ fn try_lower_field_proj_chain(
Some((current, method))
}
/// Remove the *implicit* chain-root argument group that `build_call_args`
/// appends when decomposing a chained-receiver call (`a.b.m(...)`) into a
/// FieldProj chain plus a bare-method Call.
///
/// `build_call_args` adds any `info.taint.uses` entry not already in
/// `info.call.arg_uses` (and not the receiver) as a trailing "implicit" arg
/// group. For a chained callee `a.b.m`, the chain root `a` lands in that
/// implicit group; once the FieldProj chain carries it on the typed `receiver`
/// channel, re-listing it as a positional argument inflates arity and
/// double-taints, so it must be dropped.
///
/// The previous implementation used `retain` to drop *every* group equal to
/// `[base_v]`, which also deleted a **genuine** positional argument that
/// happens to be the chain root identifier — e.g. the second `a` in
/// `a.b.m(p, a)` or `this` in `this.logger.log(this)`. That silently lost the
/// taint flowing through that argument into the callee and shifted later arg
/// positions left.
///
/// This helper fixes that by:
/// 1. Skipping the strip entirely when the chain root is a genuine
/// positional argument (present in `arg_uses`): there is no implicit
/// group to remove in that case, only the real argument.
/// 2. Otherwise removing at most ONE matching group — the trailing implicit
/// one appended by `build_call_args` — so a coincidental earlier
/// `[base_v]` argument still survives.
fn strip_implicit_chain_root(
callee: &str,
info: &crate::cfg::NodeInfo,
var_stacks: &HashMap<String, Vec<SsaValue>>,
args: &mut Vec<SmallVec<[SsaValue; 2]>>,
) {
let Some(base_ident) = callee.split('.').next() else {
return;
};
let Some(&base_v) = var_stacks.get(base_ident).and_then(|s| s.last()) else {
return;
};
// If the chain root is itself a genuine positional argument, the matching
// `[base_v]` group is that real argument — `build_call_args` did NOT append
// an implicit chain-root group (the root is already in `arg_uses`). Keep it.
let root_is_real_arg = info
.call
.arg_uses
.iter()
.any(|grp| grp.iter().any(|ident| ident.as_str() == base_ident));
if root_is_real_arg {
return;
}
// Remove only the last matching group (the appended implicit chain-root
// group), preserving any earlier coincidental `[base_v]` argument.
if let Some(pos) = args
.iter()
.rposition(|grp| grp.len() == 1 && grp.first() == Some(&base_v))
{
args.remove(pos);
}
}
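
The behavioural difference between the old `retain`-based strip and the new one is easiest to see on bare vectors. The sketch below is a simplification, not the lowering code: SSA values are plain `u32`s, each argument group is a `Vec<u32>`, and the `root_is_real_arg` flag stands in for the `arg_uses` check. Both function names are illustrative.

```rust
/// Old behaviour: drop every single-element group equal to `root`.
fn strip_all(args: &mut Vec<Vec<u32>>, root: u32) {
    args.retain(|grp| !(grp.len() == 1 && grp[0] == root));
}

/// New behaviour: do nothing when the root is a genuine positional argument;
/// otherwise remove only the last matching group (the appended implicit one).
fn strip_last_implicit(args: &mut Vec<Vec<u32>>, root: u32, root_is_real_arg: bool) {
    if root_is_real_arg {
        return;
    }
    if let Some(pos) = args
        .iter()
        .rposition(|grp| grp.len() == 1 && grp[0] == root)
    {
        args.remove(pos);
    }
}
```

With `a` as value 0 and `p` as value 1, the call `a.b.m(p, a)` has the groups `[[1], [0]]`. `strip_all` reduces that to `[[1]]` and loses the real second argument, while `strip_last_implicit` with the flag set leaves it untouched. For `a.b.m(p)`, where `[0]` is the appended implicit group, both produce `[[1]]`.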
/// Lower a CFG to SSA form for a single function scope.
///
/// `scope` filters nodes by `enclosing_func`:
@ -341,8 +399,9 @@ fn lower_to_ssa_inner(
// listed as an exception target indicates a CFG construction bug. Debug
// builds panic loudly; release builds warn, record an engine note so
// downstream findings carry "SSA lowering bailed" provenance, and fall
// through to the existing orphan handling above (the "all definitions"
// fallback) which remains sound for taint reachability.
// through to the existing orphan handling above, which lowers orphan
// (catch) subtrees with a locally-built dominator tree and seeds them from
// the most entry-dominating reaching definitions.
check_catch_block_reachability_gated(&body);
Ok(body)
@ -352,9 +411,11 @@ fn lower_to_ssa_inner(
/// debug builds and warns + records an engine note in release builds.
///
/// The current lowering's orphan handling (`process_block` fallback in
/// `rename_variables`) already widens to an "all definitions" conservative
/// state for blocks without predecessors. That preserves soundness for
/// taint reachability but masks CFG-builder bugs: this gate surfaces them.
/// `rename_variables`) lowers orphan (catch) subtrees via a locally-built
/// dominator tree and seeds them from the most entry-dominating reaching
/// definitions, so blocks without predecessors are still renamed and
/// populated. That keeps catch-block analysis usable but masks CFG-builder
/// bugs: this gate surfaces them.
fn check_catch_block_reachability_gated(body: &SsaBody) {
let result = super::invariants::check_catch_block_reachability(body);
if let Err(err) = result {
@ -923,6 +984,100 @@ fn build_dom_tree_children(
children
}
/// Blocks reachable from the entry block (block 0) over the block-level
/// successor graph. Blocks NOT in this set are "orphan domain" — catch blocks
/// (and everything they dominate) that became unreachable once exception edges
/// were stripped before SSA lowering.
fn compute_entry_reachable(num_blocks: usize, block_succs: &[Vec<usize>]) -> Vec<bool> {
let mut reachable = vec![false; num_blocks];
if num_blocks == 0 {
return reachable;
}
let mut stack = vec![0usize];
reachable[0] = true;
while let Some(b) = stack.pop() {
for &s in &block_succs[b] {
if !reachable[s] {
reachable[s] = true;
stack.push(s);
}
}
}
reachable
}
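
To make the "orphan domain" concrete, here is a standalone sketch of the same reachability walk plus a helper that lists orphan entries. It assumes block 0 is the entry and takes only a successor list. `entry_reachable` mirrors the function above, while `orphan_entries` is a hypothetical helper that derives predecessor information from the successor list instead of taking `block_preds`.

```rust
/// Depth-first reachability from block 0 over the successor graph.
fn entry_reachable(block_succs: &[Vec<usize>]) -> Vec<bool> {
    let mut reachable = vec![false; block_succs.len()];
    if block_succs.is_empty() {
        return reachable;
    }
    let mut stack = vec![0usize];
    reachable[0] = true;
    while let Some(b) = stack.pop() {
        for &s in &block_succs[b] {
            if !reachable[s] {
                reachable[s] = true;
                stack.push(s);
            }
        }
    }
    reachable
}

/// Orphan entries: blocks that are unreachable from the entry block and have
/// no predecessor at all. These are the blocks a virtual root would point at.
fn orphan_entries(block_succs: &[Vec<usize>]) -> Vec<usize> {
    let reachable = entry_reachable(block_succs);
    let mut has_pred = vec![false; block_succs.len()];
    for succs in block_succs {
        for &s in succs {
            has_pred[s] = true;
        }
    }
    (0..block_succs.len())
        .filter(|&b| !reachable[b] && !has_pred[b])
        .collect()
}
```

Take a five-block function where `0 -> 1 -> 4` is the normal path and a catch handler `2 -> {3, 4}`, `3 -> 4` lost its incoming exception edge. Blocks 2 and 3 are unreachable, but only block 2 is an orphan entry. Block 3 is handler-internal, and the edges into block 4 leave the orphan domain, so block 4 acts as a boundary.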
/// Build a dominator-tree children list restricted to the orphan domain.
///
/// The main `simple_fast` walk is rooted at block 0 and never reaches orphan
/// blocks (catch entries + their internal subtree), so they have no entry in
/// the global dominator tree and their internal successors are never renamed.
/// Here we construct a fresh block graph containing a virtual super-root that
/// reaches every orphan *entry* (an orphan-domain block with no predecessors),
/// plus the orphan-domain blocks and the edges among them. Edges that leave
/// the orphan domain (e.g. a catch arm flowing into the entry-reachable
/// post-`try` join) are dropped: those target blocks were already processed by
/// the main walk and act as boundaries. Running `simple_fast` over this graph
/// yields a dominator tree whose `children[entry]` lists the catch's internal
/// dominator-tree successors, letting `process_block` recurse through the whole
/// handler body.
///
/// Returned vector is indexed by real block id (length `num_blocks`); entries
/// for entry-reachable blocks stay empty.
fn build_orphan_dom_tree_children(
num_blocks: usize,
block_succs: &[Vec<usize>],
block_preds: &[Vec<usize>],
entry_reachable: &[bool],
) -> Vec<Vec<usize>> {
let mut children: Vec<Vec<usize>> = vec![vec![]; num_blocks];
// Collect orphan-domain blocks (unreachable from entry).
let orphan_blocks: Vec<usize> = (0..num_blocks).filter(|&b| !entry_reachable[b]).collect();
if orphan_blocks.is_empty() {
return children;
}
// Build a graph: virtual root node + one node per orphan-domain block.
// Node weight stores the real block id; the virtual root uses a sentinel
// (u32::MAX) that maps to no real block.
let mut g: Graph<u32, ()> = Graph::new();
let root = g.add_node(u32::MAX);
let mut node_of_block: HashMap<usize, NodeIndex> = HashMap::new();
for &b in &orphan_blocks {
node_of_block.insert(b, g.add_node(b as u32));
}
for &b in &orphan_blocks {
let bn = node_of_block[&b];
// Orphan entries (no real predecessors) hang off the virtual root.
if block_preds[b].is_empty() {
g.add_edge(root, bn, ());
}
// Intra-orphan-domain edges. Targets that escape the orphan domain
// are boundaries (already processed) and are not added.
for &s in &block_succs[b] {
if let Some(&sn) = node_of_block.get(&s) {
g.add_edge(bn, sn, ());
}
}
}
let doms = simple_fast(&g, root);
for &b in &orphan_blocks {
let bn = node_of_block[&b];
if let Some(idom) = doms.immediate_dominator(bn) {
// Skip blocks whose immediate dominator is the virtual root: these
// are the orphan entries, processed directly by the caller's loop.
let idom_w = g[idom];
if idom_w != u32::MAX {
children[idom_w as usize].push(b);
}
}
}
children
}
/// Rename variables: dominator tree preorder walk with per-variable stacks.
///
/// Returns (ssa_blocks, value_defs, cfg_node_map).
@ -1161,16 +1316,22 @@ fn rename_variables(
) {
Some((recv_v, bare_method)) => {
receiver = Some(recv_v);
// Strip any positional arg group that exactly matches the
// chain root identifier, it has been replaced by the
// FieldProj chain receiver, and re-listing it as an
// Strip the *implicit* chain-root arg group that
// `build_call_args` appends for `taint.uses` not present
// in `arg_uses`: the chain root has been replaced by the
// FieldProj chain receiver, so re-listing it as an
// argument would inflate arity / double-taint.
if let Some(base_ident) = callee.split('.').next() {
if let Some(base_v) = var_stacks.get(base_ident).and_then(|s| s.last())
{
args.retain(|grp| !(grp.len() == 1 && grp.first() == Some(base_v)));
}
}
//
// Only do this when the chain root is NOT a genuine
// positional argument. For `a.b.m(p, a)` the root `a`
// is a real second argument (present in `arg_uses`); the
// `[a]` group is that argument, not the implicit
// chain-root group, and must be kept — otherwise taint
// flowing through it into the callee is silently lost.
// Also remove at most one matching group (the appended
// implicit one) so a coincidental earlier `[base_v]`
// positional arg survives.
strip_implicit_chain_root(&callee, info, var_stacks, &mut args);
(bare_method, Some(callee.clone()))
}
None => (callee, None),
@ -1306,12 +1467,10 @@ fn rename_variables(
) {
Some((recv_v, bare_method)) => {
receiver = Some(recv_v);
if let Some(base_ident) = callee.split('.').next() {
if let Some(base_v) = var_stacks.get(base_ident).and_then(|s| s.last())
{
args.retain(|grp| !(grp.len() == 1 && grp.first() == Some(base_v)));
}
}
// Same implicit-chain-root strip as the primary Call
// branch above; keep a genuine positional arg equal to
// the chain root. See `strip_implicit_chain_root`.
strip_implicit_chain_root(&callee, info, var_stacks, &mut args);
(bare_method, Some(callee.clone()))
}
None => (callee, None),
@ -2179,24 +2338,71 @@ fn rename_variables(
// Process orphan blocks (e.g. catch blocks disconnected after exception edge removal).
// These blocks have no predecessors and weren't reached by the dominator tree walk.
//
// Rebuild var_stacks from already-processed instructions so that catch blocks
// can reference variables defined before the try block (e.g. `userInput`).
let has_orphans =
(1..num_blocks).any(|bid| block_preds[bid].is_empty() && ssa_blocks[bid].body.is_empty());
// An "orphan domain" is the set of blocks unreachable from the entry block
// after exception-edge stripping — the catch entry plus everything it
// dominates (internal `if`/`for`/`while`/`switch`/`try` blocks). The main
// `simple_fast` dominator walk is rooted at block 0 and never reaches these
// blocks, so their `immediate_dominator` is `None` and
// `build_dom_tree_children` links neither the catch entry nor its internal
// subtree. Processing only the catch *entry* (which has empty preds) via
// the global `dom_tree_children` therefore leaves every catch-internal
// block body empty — dropping all Source/Sink/Assign instructions past the
// catch's first basic block (a soundness false-negative). To fix this we
// build a *local* dominator tree over the orphan domain (rooted at a
// virtual super-root that reaches every orphan entry) and recurse through
// it, so the whole catch subtree is renamed and populated.
let entry_reachable = compute_entry_reachable(num_blocks, block_succs);
let has_orphans = (1..num_blocks).any(|bid| !entry_reachable[bid]);
if has_orphans {
// Rebuild var_stacks from all SSA instructions created during the main walk.
// This gives orphan blocks access to all variable definitions.
// Seed var_stacks with the definitions that actually *reach* the catch.
//
// The previous implementation rebuilt var_stacks from *all* blocks in
// ascending block-id order and took `.last()`, i.e. the highest-block-id
// def. For a variable defined before the `try` (block 0) and re-killed
// *after* the `try`/`catch` join (a later, higher-id block — e.g.
// `x = "safe"` post-catch), `.last()` picked that post-join kill, which
// does NOT reach the handler. A catch-side `sink(x)` then resolved to
// the killed constant instead of the pre-`try` (possibly tainted) value
// — a false negative. (The doc comment further up claiming this is a
// sound "all definitions" widening was inaccurate.)
//
// We restrict the rebuild to *entry-reachable* blocks (orphan-domain
// blocks are renamed below, not used as a seed) and iterate them in
// DESCENDING block-id order so `.last()` lands on the LOWEST-id — i.e.
// most entry-dominating — definition of each variable. Block 0 (which
// dominates every block, including the catch) therefore wins over any
// post-join reassignment, fixing the reported false negative, while a
// variable defined only inside the `try` body still keeps a `try`-side
// definition (rather than being dropped), avoiding a new false negative.
// Exact reaching-definition resolution for variables redefined along the
// protected region would require the pre-strip exception-edge structure,
// which isn't available here; this ordering is the conservative
// approximation.
var_stacks.clear();
for block in &ssa_blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
for bid in (0..num_blocks).rev() {
if !entry_reachable[bid] {
continue;
}
for inst in ssa_blocks[bid]
.phis
.iter()
.chain(ssa_blocks[bid].body.iter())
{
if let Some(ref name) = inst.var_name {
var_stacks.entry(name.clone()).or_default().push(inst.value);
}
}
}
// Build a local dominator tree over the orphan domain so the catch
// entry's internal successors are reached during renaming.
let orphan_dom_children =
build_orphan_dom_tree_children(num_blocks, block_succs, block_preds, &entry_reachable);
for bid in 1..num_blocks {
if block_preds[bid].is_empty() && ssa_blocks[bid].body.is_empty() {
// Orphan *entries* are orphan-domain blocks with no predecessors
// (their only inbound edges were exception edges, now stripped).
if !entry_reachable[bid] && block_preds[bid].is_empty() {
process_block(
bid,
cfg,
@ -2204,7 +2410,7 @@ fn rename_variables(
block_succs,
block_preds,
phi_placements,
dom_tree_children,
&orphan_dom_children,
filtered_edges,
&mut var_stacks,
&mut ssa_blocks,
@ -2780,6 +2986,208 @@ mod tests {
assert!(!ssa.blocks.is_empty());
}
#[test]
fn orphan_catch_with_internal_branch_populates_subtree() {
// Regression for the soundness hole where a catch block containing
// internal control flow (`catch(e){ if(cond){ sink(x) } else { y=2 } }`)
// dropped every instruction past the catch's first basic block: the
// catch-internal arms were orphan-domain (unreachable from entry) AND
// absent from the global dominator tree, so their bodies stayed empty
// and `sink(x)` was silently lost.
//
// Layout:
// entry → body(defines x) → exit (normal flow)
// body --Exception--> catch(e) → if → [True: sink(x)] [False: y=2] → join → exit
let mut cfg: Cfg = Graph::new();
let entry = cfg.add_node(make_node(StmtKind::Entry));
let body = cfg.add_node(NodeInfo {
taint: TaintMeta {
defines: Some("x".into()),
..Default::default()
},
..make_node(StmtKind::Seq)
});
let catch = cfg.add_node(NodeInfo {
catch_param: true,
taint: TaintMeta {
defines: Some("e".into()),
..Default::default()
},
..make_node(StmtKind::Seq)
});
let if_node = cfg.add_node(make_node(StmtKind::If));
let sink = cfg.add_node(NodeInfo {
call: crate::cfg::CallMeta {
callee: Some("sink".into()),
arg_uses: vec![vec!["x".into()]],
..Default::default()
},
taint: TaintMeta {
uses: vec!["x".into()],
..Default::default()
},
..make_node(StmtKind::Seq)
});
let else_assign = cfg.add_node(NodeInfo {
taint: TaintMeta {
defines: Some("y".into()),
..Default::default()
},
..make_node(StmtKind::Seq)
});
let join = cfg.add_node(make_node(StmtKind::Seq));
let exit = cfg.add_node(make_node(StmtKind::Exit));
cfg.add_edge(entry, body, EdgeKind::Seq);
cfg.add_edge(body, exit, EdgeKind::Seq);
cfg.add_edge(body, catch, EdgeKind::Exception);
cfg.add_edge(catch, if_node, EdgeKind::Seq);
cfg.add_edge(if_node, sink, EdgeKind::True);
cfg.add_edge(if_node, else_assign, EdgeKind::False);
cfg.add_edge(sink, join, EdgeKind::Seq);
cfg.add_edge(else_assign, join, EdgeKind::Seq);
cfg.add_edge(join, exit, EdgeKind::Seq);
let ssa = lower_to_ssa(&cfg, entry, None, true).unwrap();
// The `sink` Call must survive lowering: before the fix the
// catch-internal True arm was an empty `unreachable` block.
let has_sink_call = ssa.blocks.iter().any(|b| {
b.body
.iter()
.any(|inst| matches!(&inst.op, SsaOp::Call { callee, .. } if callee == "sink"))
});
assert!(
has_sink_call,
"catch-internal sink(x) call was dropped; orphan subtree not populated"
);
}
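The orphan-domain detection this test exercises can be sketched in isolation. This is a minimal illustration, not the engine's code: `compute_reachable` and `orphan_entries` are hypothetical names, and plain `Vec<Vec<usize>>` adjacency lists are an assumed stand-in for `block_succs` / `block_preds` after exception edges have been stripped.

```rust
/// Hypothetical sketch: mark every block reachable from block 0 by following
/// successor edges (exception edges are assumed to be stripped already).
fn compute_reachable(num_blocks: usize, succs: &[Vec<usize>]) -> Vec<bool> {
    let mut seen = vec![false; num_blocks];
    if num_blocks == 0 {
        return seen;
    }
    let mut work = vec![0usize];
    seen[0] = true;
    while let Some(b) = work.pop() {
        for &s in &succs[b] {
            if !seen[s] {
                seen[s] = true;
                work.push(s);
            }
        }
    }
    seen
}

/// Orphan entries: unreachable blocks with no remaining predecessors.
/// Unreachable blocks that do have predecessors belong to an orphan subtree
/// and are expected to be visited through their entry.
fn orphan_entries(reachable: &[bool], preds: &[Vec<usize>]) -> Vec<usize> {
    (1..reachable.len())
        .filter(|&b| !reachable[b] && preds[b].is_empty())
        .collect()
}
```

With the layout from the test above (catch entry, `if`, two arms, join), only the catch entry comes back as an orphan entry; the arms and the join are unreachable but have predecessors, which is why they need the local dominator tree to be visited at all.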
#[test]
fn orphan_catch_resolves_pre_try_def_not_post_join_kill() {
// Regression for the reaching-definition hole where the orphan
// var_stacks rebuild iterated all blocks in ASCENDING block-id order
// and took `.last()`, picking the highest-block-id def of each variable
// instead of the one that actually reaches the catch. For `x` sourced
// before the try (block 0) and re-killed to a constant AFTER the
// try/catch join (a later, higher-id block), `.last()` wrongly resolved
// a catch-side `sink(x)` to the post-join constant — masking the
// pre-try tainted value (a false negative).
//
// Layout (mirrors `x = source(); try{..}catch(e){ sink(x) }; x="safe"`,
// where the post-try kill lands at a control-flow join in a later
// block than the pre-try source):
// entry → src(defines x = Source) → {b1, b2} → kill(x="safe") → exit
// src --Exception--> catch(e) → sink(x) (orphan)
// `src` branches (b1/b2) so it ends block 0; the `kill` join is a
// higher-id block. The buggy ascending+`.last()` rebuild seeded the
// catch with the Const kill; the fixed descending rebuild seeds the
// most entry-dominating def (the block-0 Source). A degenerate
// straight-line `src→kill` (single coalesced block) is a separate,
// documented limitation — exact within-block reaching defs need the
// pre-strip exception-point structure, unavailable post-lowering.
let mut cfg: Cfg = Graph::new();
let entry = cfg.add_node(make_node(StmtKind::Entry));
// Pure source (no callee) so it lowers to `SsaOp::Source` and defines x.
let src = cfg.add_node(NodeInfo {
taint: TaintMeta {
labels: smallvec::smallvec![crate::labels::DataLabel::Source(
crate::labels::Cap::all()
)],
defines: Some("x".into()),
..Default::default()
},
..make_node(StmtKind::Seq)
});
// Diamond arms so `src` is a branch point (block boundary) and the
// `kill` join below starts a strictly-higher-id block.
let b1 = cfg.add_node(make_node(StmtKind::Seq));
let b2 = cfg.add_node(make_node(StmtKind::Seq));
// Post-join kill: defines x from a literal, no uses, not a source →
// lowers to `SsaOp::Const`. Higher block id than `src`.
let kill = cfg.add_node(NodeInfo {
taint: TaintMeta {
defines: Some("x".into()),
const_text: Some("safe".into()),
..Default::default()
},
..make_node(StmtKind::Seq)
});
let catch = cfg.add_node(NodeInfo {
catch_param: true,
taint: TaintMeta {
defines: Some("e".into()),
..Default::default()
},
..make_node(StmtKind::Seq)
});
let sink = cfg.add_node(NodeInfo {
call: crate::cfg::CallMeta {
callee: Some("sink".into()),
arg_uses: vec![vec!["x".into()]],
..Default::default()
},
taint: TaintMeta {
uses: vec!["x".into()],
..Default::default()
},
..make_node(StmtKind::Seq)
});
let exit = cfg.add_node(make_node(StmtKind::Exit));
cfg.add_edge(entry, src, EdgeKind::Seq);
cfg.add_edge(src, b1, EdgeKind::Seq);
cfg.add_edge(src, b2, EdgeKind::Seq);
cfg.add_edge(b1, kill, EdgeKind::Seq);
cfg.add_edge(b2, kill, EdgeKind::Seq);
cfg.add_edge(kill, exit, EdgeKind::Seq);
cfg.add_edge(src, catch, EdgeKind::Exception);
cfg.add_edge(catch, sink, EdgeKind::Seq);
cfg.add_edge(sink, exit, EdgeKind::Seq);
let ssa = lower_to_ssa(&cfg, entry, None, true).unwrap();
// SsaValue of the Source def (the pre-try value that reaches the catch)
// and of the Const kill (the post-join value that does NOT reach it).
let source_v = ssa
.blocks
.iter()
.flat_map(|b| b.body.iter())
.find(|inst| matches!(inst.op, SsaOp::Source))
.map(|inst| inst.value)
.expect("Source def must be present");
let kill_v = ssa
.blocks
.iter()
.flat_map(|b| b.body.iter())
.find(|inst| matches!(inst.op, SsaOp::Const(_)))
.map(|inst| inst.value)
.expect("Const kill must be present");
// Find the sink Call and its single argument value.
let sink_arg = ssa
.blocks
.iter()
.flat_map(|b| b.body.iter())
.find_map(|inst| match &inst.op {
SsaOp::Call { callee, args, .. } if callee == "sink" => {
args.iter().flat_map(|g| g.iter().copied()).next()
}
_ => None,
})
.expect("sink(x) Call with an argument must be present");
assert_eq!(
sink_arg, source_v,
"catch sink(x) must resolve to the pre-try Source def {source_v:?} \
that reaches the handler, not the post-join Const kill {kill_v:?}"
);
assert_ne!(
sink_arg, kill_v,
"catch sink(x) wrongly resolved to the post-join Const kill"
);
}
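The effect of the descending-order seed can be shown on a toy model. This is a sketch under stated assumptions: `Def` and `seed_stacks` are hypothetical names, and a definition is reduced to a `(block id, variable, value id)` triple rather than a real SSA instruction.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an SSA definition: (block id, variable, value id).
type Def = (usize, &'static str, u32);

/// Seed per-variable stacks from entry-reachable blocks only, visiting blocks
/// in descending id order so that `.last()` on each stack is the definition
/// from the lowest-id block that defines the variable.
fn seed_stacks(
    defs: &[Def],
    num_blocks: usize,
    reachable: &[bool],
) -> HashMap<&'static str, Vec<u32>> {
    let mut stacks: HashMap<&'static str, Vec<u32>> = HashMap::new();
    for bid in (0..num_blocks).rev() {
        if !reachable[bid] {
            // Orphan-domain blocks are renamed later, not used as a seed.
            continue;
        }
        for &(_, name, value) in defs.iter().filter(|d| d.0 == bid) {
            stacks.entry(name).or_default().push(value);
        }
    }
    stacks
}
```

For `x` defined in block 0 and redefined in block 3, the stack ends up as `[block-3 value, block-0 value]`, so a catch-side read that takes `.last()` sees the block-0 definition. An ascending walk would produce the opposite order and pick the later reassignment.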
#[test]
fn phi_operand_count_equals_pred_count_in_diamond() {
// Specific test: phi operands == predecessor count (not just <=)
@ -3988,6 +4396,66 @@ mod tests {
);
}
/// Collect the arg groups of the first Call whose bare callee matches.
fn call_args_for(body: &SsaBody, bare_callee: &str) -> Option<Vec<SmallVec<[SsaValue; 2]>>> {
for blk in &body.blocks {
for inst in blk.body.iter() {
if let SsaOp::Call { callee, args, .. } = &inst.op {
if callee == bare_callee {
return Some(args.clone());
}
}
}
}
None
}
#[test]
fn phase2_e2e_chain_root_genuine_positional_arg_is_kept() {
// Regression for the soundness hole where decomposing a chained-receiver
// call stripped a *genuine* positional argument equal to the chain root.
// For `a.b.m(p, a)` the second `a` is a real argument, not the implicit
// chain-root group `build_call_args` appends, so it must survive
// decomposition. Previously `retain(... == [base_v])` deleted every
// `[a]` group, losing the argument (and shifting later positions left).
//
// Go: `a` and `p` are parameters so the chain root resolves.
let with_arg = parse_to_first_body(
b"package p\nfunc f(a *T, p string) { a.b.m(p, a) }\n",
"go",
tree_sitter::Language::from(tree_sitter_go::LANGUAGE),
"with.go",
);
let control = parse_to_first_body(
b"package p\nfunc f(a *T, p string) { a.b.m(p) }\n",
"go",
tree_sitter::Language::from(tree_sitter_go::LANGUAGE),
"ctrl.go",
);
let with_args = call_args_for(&with_arg, "m").expect("decomposed m() call present");
let ctrl_args = call_args_for(&control, "m").expect("decomposed m() call present");
// The soundness invariant: adding a genuine positional argument `a`
// (which equals the chain root) must add EXACTLY one arg group versus
// the control. The buggy `retain(... == [base_v])` deleted the genuine
// `a` group, so `with` would have had the SAME count as `control` (the
// argument silently vanished). The fix's `root_is_real_arg` guard keeps
// it, so `with` has one more group than `control`.
//
// We compare the two counts rather than asserting an absolute value:
// for Go's multi-segment field receiver the implicit chain-root group
// `build_call_args` appends is a multi-value group (`[a, a.b, ...]`),
// not a clean single `[base_v]`, so the absolute group count is an
// implementation detail — the *delta* is the property under test.
assert_eq!(
with_args.len(),
ctrl_args.len() + 1,
"a.b.m(p, a) must keep the genuine `a` argument: expected one more \
arg group than the control a.b.m(p).\n with: {with_args:?}\n ctrl: {ctrl_args:?}"
);
}
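The body of `strip_implicit_chain_root` is not part of this hunk, so the following is only a guess at its shape based on the comments at the call sites: skip the strip when the root is a genuine positional argument, and otherwise remove at most one single-value group equal to the root. `strip_implicit_root`, the `u32` value ids and the `root_is_real_arg` flag passed as a parameter are all assumptions made for illustration.

```rust
/// Hypothetical simplification: arg groups are `Vec<Vec<u32>>` and the chain
/// root is a single value id.
fn strip_implicit_root(args: &mut Vec<Vec<u32>>, root: u32, root_is_real_arg: bool) {
    // A root that also appears as a real positional argument must be kept.
    if root_is_real_arg {
        return;
    }
    // The implicit group is appended last, so remove only the last
    // single-value group equal to the root; earlier matches survive.
    if let Some(pos) = args.iter().rposition(|g| g.len() == 1 && g[0] == root) {
        args.remove(pos);
    }
}
```

Compared with the old `retain`, which deleted every `[root]` group, this removes one group at most and leaves multi-value groups such as `[root, field]` untouched.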
#[test]
fn phase2_e2e_python_chained_receiver_emits_field_proj() {
// Python: `obj.client.session.send(p)`, 3-segment receiver.


@ -1339,7 +1339,16 @@ pub fn is_int_producing_callee(callee: &str) -> bool {
// Peel trailing identity methods (e.g. `.unwrap()`/`.expect("...")` after
// `.parse()`) so the underlying numeric-producing verb is exposed.
let base = peel_identity_suffix(callee);
let suffix = base.rsplit(['.', ':']).next().unwrap_or(&base);
// `peel_identity_suffix` normalizes a turbofish callee by truncating at
// `<`, which leaves a trailing `::` (e.g. `s.parse::<u32>()` →
// `s.parse::`). Trim trailing path separators so the method suffix is
// recovered (`parse`) instead of an empty segment.
let trimmed = base.trim_end_matches([':', '.']);
let suffix = trimmed.rsplit(['.', ':']).next().unwrap_or(trimmed);
matches!(
suffix,
"parseInt" | "parseFloat" | "Number" // JS/TS
@ -1348,8 +1357,19 @@ pub fn is_int_producing_callee(callee: &str) -> bool {
| "Atoi" | "ParseInt" | "ParseFloat" // Go
| "intval" | "floatval" // PHP
| "to_i" | "to_f" // Ruby
| "parse" // Rust: `.parse::<N>()` / `.parse().unwrap()`, conservative
// (most Rust .parse() calls target numeric types)
| "parse" // Rust: `.parse::<N>()` / `let n: u32 = s.parse()?`;
// conservative (most `str::parse` targets are numeric).
//
// KNOWN LIMITATION: the callee text alone cannot
// distinguish `parse::<u32>()` from `parse::<PathBuf>()`
// / `let p: PathBuf = s.parse()?`. Tagging every `parse`
// as `Int` keeps the tested precision behaviour
// (`type_facts_suppress_int_typed_shell_arg`: a u16 port
// into a shell arg is suppressed) at the cost of a rare
// false negative on `let p: PathBuf = s.parse()?;
// fs::read(p)`. Distinguishing the two soundly requires
// reading the let-binding type annotation into the type
// lattice (tracked as a deep-fix; NOT inferable here).
)
}
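The trailing-separator trim can be exercised on its own. This is a minimal sketch, assuming the input is the already-peeled callee text; `method_suffix` and `method_suffix_untrimmed` are hypothetical helper names that do not appear in the diff.

```rust
/// Hypothetical helper: recover the final method segment from a callee
/// string that may end in a dangling `::` or `.`.
fn method_suffix(base: &str) -> &str {
    // Drop trailing path separators first, e.g. "s.parse::" -> "s.parse".
    let trimmed = base.trim_end_matches([':', '.']);
    // Then take the last segment after any `.` or `:` separator.
    trimmed.rsplit(['.', ':']).next().unwrap_or(trimmed)
}

/// The pre-fix shape, kept for comparison: split without trimming first.
fn method_suffix_untrimmed(base: &str) -> &str {
    base.rsplit(['.', ':']).next().unwrap_or(base)
}
```

For the truncated turbofish text `s.parse::`, the untrimmed variant yields an empty segment (nothing follows the last `:`), while the trimmed variant yields `parse`, which is what the `matches!` arm list expects.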
@ -1372,14 +1392,13 @@ pub fn is_int_producing_callee(callee: &str) -> bool {
/// `Byte.toString` / `Boolean.toString` / `Character.toString`:
/// output is `[+-]?\d+(\.\d+)?` / `"true"` / `"false"` / `"NaN"` /
/// `"Infinity"`, none of which can carry CRLF or injection metachars.
/// * `String.valueOf` static factories , most overloads (`int`,
/// `long`, `boolean`, `char`, ...) emit the same digit / boolean /
/// single-character text as their per-class `toString`. The
/// `Object` overload falls back to `Object.toString()` whose output
/// shape depends on the runtime type, but the dominant safe usage
/// shape (`String.valueOf(payload.size())`,
/// `String.valueOf(rendered.length())`) covers the common
/// header-injection mitigation pattern.
/// * `String.valueOf` is deliberately **not** covered. Its `String` /
/// `Object` overloads return the argument verbatim (or its arbitrary
/// `toString()`), so `String.valueOf(req.getParameter("x"))` is an
/// identity passthrough — recognising it as a safe-string producer
/// silently suppressed real injections. The callee text alone cannot
/// distinguish the safe numeric overload from the identity form, so the
/// whole family is left unrecognised (soundness over precision).
/// * `Class.getName` / `Class.getSimpleName` / `Class.getCanonicalName`:
/// the JVM class-name grammar disallows CRLF, quotes, slashes,
/// spaces, and shell metacharacters; the dot-separated FQCN is safe
@ -1402,7 +1421,21 @@ pub fn is_safe_string_producing_callee(callee: &str) -> bool {
| "Character",
"toString",
) => return true,
("String", "valueOf") => return true,
// `String.valueOf` is NOT safe-by-construction: while the
// primitive overloads (`valueOf(int)` / `valueOf(boolean)` /
// `valueOf(char)`) emit digit / boolean / single-char text,
// `valueOf(String s)` returns `s` *verbatim* and the
// `valueOf(Object o)` overload returns `o.toString()` whose
// shape is arbitrary. The callee text (`String.valueOf`)
// carries no argument-type signal, so we cannot tell the safe
// numeric form from the identity form here. Tagging it as a
// safe-string producer suppressed real injections like
// `stmt.execute(String.valueOf(req.getParameter("x")))`
// (the suppression consumer `apply_arg_type_safe_suppression`
// drops the tainted arg at the sink). Erring toward soundness:
// `String.valueOf` is left unrecognised, so the numeric
// mitigation shape (`String.valueOf(payload.size())`) is no
// longer auto-suppressed but no injection is silently dropped.
("Class", "getName" | "getSimpleName" | "getCanonicalName") => return true,
_ => {}
}
@ -1552,7 +1585,21 @@ pub fn analyze_types_with_param_types(
}
SsaOp::SelfParam => TypeFact::from_kind(TypeKind::Object),
SsaOp::CatchParam => TypeFact::from_kind(TypeKind::Object),
SsaOp::Call { callee, args, .. } => {
SsaOp::Call {
callee,
callee_text,
args,
..
} => {
// For the Rust `.parse()` numeric-turbofish gate (see
// `is_int_producing_callee`) the discriminating
// `::<T>` lives in the *original* callee text, which SSA
// moves into `callee_text` when it decomposes a chained
// receiver (`x.parse::<u32>()` → `callee = "parse"`,
// `callee_text = Some("x.parse::<u32>")`). Prefer the
// full text so the turbofish survives; `callee` is the
// canonical fallback when no decomposition occurred.
let callee_for_int = callee_text.as_deref().unwrap_or(callee);
// CFG marks `Object.create(null)` (and future
// null-prototype constructors) at lowering time.
// Honour it ahead of generic constructor / arg-aware
@ -1588,7 +1635,7 @@ pub fn analyze_types_with_param_types(
lang.and_then(|l| arg_aware_call_type(l, callee, args, consts))
{
TypeFact::from_kind(ty)
} else if is_int_producing_callee(callee) {
} else if is_int_producing_callee(callee_for_int) {
TypeFact::from_kind(TypeKind::Int)
} else if is_safe_string_producing_callee(callee) {
// Numeric/boolean to-string converters and class-name
@ -3439,4 +3486,62 @@ mod tests {
"createQuery is overloaded — must not map at constructor_type level"
);
}
/// Audit #3: `String.valueOf` is an identity passthrough for its
/// `String`/`Object` overloads, so it must NOT be recognised as a
/// safe-string producer (which would suppress real injections like
/// `stmt.execute(String.valueOf(req.getParameter("x")))`). The
/// genuinely-safe numeric to-string converters and class-name
/// accessors are still recognised.
#[test]
fn string_valueof_not_safe_string_producer() {
// The defect: must be false now (was unconditionally true).
assert!(
!is_safe_string_producing_callee("String.valueOf"),
"String.valueOf is identity for String/Object overloads — unsound to treat as safe"
);
assert!(
!is_safe_string_producing_callee("java.lang.String.valueOf"),
"fully-qualified String.valueOf must also be unrecognised"
);
// Regression guard: the genuinely safe converters/accessors remain.
assert!(is_safe_string_producing_callee("Integer.toString"));
assert!(is_safe_string_producing_callee("Long.toString"));
assert!(is_safe_string_producing_callee("Boolean.toString"));
assert!(is_safe_string_producing_callee("Class.getName"));
assert!(is_safe_string_producing_callee("loaded.getClass().getName"));
}
/// Audit #4: bare Rust `.parse()` is generic over `FromStr`, but the
/// callee text cannot distinguish `parse::<u32>()` from
/// `parse::<PathBuf>()`. The engine deliberately tags every `parse` as
/// `Int` (precision: a u16 port into a shell arg is suppressed — pinned by
/// `cfg_analysis::tests::type_facts_suppress_int_typed_shell_arg`), at the
/// cost of a rare FN on `let p: PathBuf = s.parse()?`. Distinguishing the
/// two soundly needs let-annotation typing (deep-fix queue), not the
/// callee text. This test pins the chosen behaviour + cross-language
/// converters + turbofish-form robustness.
#[test]
fn rust_parse_is_int_producing_and_turbofish_forms_are_robust() {
// Bare parse and its identity-peeled / receiver-qualified forms — Int.
assert!(is_int_producing_callee("parse"));
assert!(is_int_producing_callee("s.parse"));
assert!(is_int_producing_callee("parse().unwrap()"));
// Turbofish forms survive `peel_identity_suffix` truncation at `<`
// (the trailing `::` is trimmed back to the `parse` suffix).
assert!(is_int_producing_callee("s.parse::<u32>()"));
assert!(is_int_producing_callee("s.parse::<i64>()"));
assert!(is_int_producing_callee(
"contents.trim().parse::<u32>().expect(\"invalid pid\")"
));
// Other languages' numeric converters remain unconditional.
assert!(is_int_producing_callee("parseInt"));
assert!(is_int_producing_callee("Number"));
assert!(is_int_producing_callee("Atoi"));
assert!(is_int_producing_callee("strconv.ParseInt"));
assert!(is_int_producing_callee("x.to_i"));
// Negative: a non-numeric-producing method must not be Int.
assert!(!is_int_producing_callee("toUpperCase"));
assert!(!is_int_producing_callee("trim"));
}
}


@ -407,8 +407,23 @@ pub fn extract_findings(
if !lifecycle.contains(ResourceLifecycle::CLOSED) {
continue;
}
// Check if there are intervening Call nodes between acquire and release
// in the CFG (these could throw and bypass the release)
// Check if there are intervening Call nodes between acquire and
// release in the CFG (these could throw and bypass the release).
//
// NOTE: a stricter variant (audit #59) tried to exclude the
// resource's own lifecycle ops (the acquire/release proxy
// calls) and require reachability from the acquire node, to
// suppress spurious findings on correctly open/close-paired
// proxies. That over-suppressed a *tested* true positive: a
// class-field resource (`this.fd = fs.openSync(...)` in `open()`
// with `close()` in a separate method — see
// `tests/fixtures/real_world/typescript/cfg/try_catch_typed.ts`)
// has only its own acquire call in scope, so excluding it left
// zero intervening calls and dropped the must-match leak
// finding. Distinguishing a clean same-scope open/close pair
// from a cross-method field leak needs proper inter-method
// lifecycle modelling (deep-fix queue), so we keep the original
// span-based exclusion here.
let has_intervening_calls = cfg.node_references().any(|(_, ni)| {
ni.kind == StmtKind::Call
&& ni.ast.enclosing_func == info.ast.enclosing_func
@ -567,11 +582,23 @@ fn is_web_entrypoint_simple(
_ => &["request", "req"],
};
let has_web_params = func_summaries.values().any(|s| {
s.param_names
.iter()
.any(|p| web_params.contains(&p.to_ascii_lowercase().as_str()))
});
// Confirm web parameters against THIS candidate handler only, not any
// function in the file. Scanning every summary made an unrelated
// function's `req`/`r`/`ctx` parameter promote every
// `process_*`/`api_*`/`serve_*` function in the file to a web
// entrypoint, firing High-severity state-unauthed-access on batch/CLI
// code. Filter the file-level summary map down to the named function
// via `FuncKey.name` (matches `info.ast.enclosing_func`); summary
// `entry` NodeIndexes are not valid in the per-body CFG, so the name
// is the safe selector here.
let has_web_params = func_summaries
.iter()
.filter(|(key, _)| key.name == func_name)
.any(|(_, s)| {
s.param_names
.iter()
.any(|p| web_params.contains(&p.to_ascii_lowercase().as_str()))
});
// Only handle_* and route_* are strong enough to skip param confirmation.
// api_*, serve_*, process_* require web parameter evidence.
@ -916,4 +943,68 @@ mod tests {
);
assert_eq!(findings[0].rule_id, "state-resource-leak");
}
/// Finding #64: `is_web_entrypoint_simple` must confirm web parameters
/// against the *candidate* handler only, not any function in the file.
/// Before the fix, an unrelated `read_stream(req)` in the same file
/// promoted every `process_*` function to a web entrypoint, firing
/// High-severity `state-unauthed-access` on batch/CLI code.
#[test]
fn web_entrypoint_param_confirmation_is_per_function() {
use crate::cfg::LocalFuncSummary;
use crate::symbol::FuncKey;
use petgraph::graph::NodeIndex;
fn summary(name: &str, params: &[&str]) -> (FuncKey, LocalFuncSummary) {
let key = FuncKey::new_function(Lang::Python, "f.py", name, Some(params.len()));
let s = LocalFuncSummary {
entry: NodeIndex::new(0),
source_caps: Cap::empty(),
sanitizer_caps: Cap::empty(),
sink_caps: Cap::empty(),
param_count: params.len(),
param_names: params.iter().map(|p| p.to_string()).collect(),
propagating_params: Vec::new(),
tainted_sink_params: Vec::new(),
callees: Vec::new(),
container: String::new(),
disambig: None,
kind: crate::symbol::FuncKind::Function,
};
(key, s)
}
// `process_data` has NO web-like parameters; `read_stream` does.
let mut summaries: crate::cfg::FuncSummaries = HashMap::new();
let (k1, s1) = summary("process_data", &["data"]);
let (k2, s2) = summary("read_stream", &["req"]);
summaries.insert(k1, s1);
summaries.insert(k2, s2);
let cfg: Cfg = Graph::new();
// The unrelated `read_stream(req)` must NOT promote `process_data`.
assert!(
!is_web_entrypoint_simple("process_data", Lang::Python, &summaries, &cfg),
"process_data has no web params and must not be a web entrypoint just \
because an unrelated function in the file does"
);
// `read_stream` is not a handler-prefixed name, so even though it
// carries the `req` param it is NOT an entrypoint — confirms the
// name gate still stands independently of the param check.
assert!(
!is_web_entrypoint_simple("read_stream", Lang::Python, &summaries, &cfg),
"read_stream lacks a handler-prefixed name and must not be an entrypoint"
);
// Positive control: give `process_data` its own `req` param.
let mut summaries2: crate::cfg::FuncSummaries = HashMap::new();
let (k3, s3) = summary("process_data", &["req"]);
summaries2.insert(k3, s3);
assert!(
is_web_entrypoint_simple("process_data", Lang::Python, &summaries2, &cfg),
"process_data with its own web param must be a web entrypoint"
);
}
}
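The difference between the file-wide and per-function parameter check can be sketched with a simplified summary map. The types here are assumptions: `Summaries` maps a function name straight to its parameter names instead of `FuncKey` to `LocalFuncSummary`, and `WEB_PARAMS` copies only the default `["request", "req"]` arm visible in the hunk.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified summary map: function name -> parameter names.
type Summaries = HashMap<String, Vec<String>>;

/// Assumed web-parameter list (the default arm shown in the hunk).
const WEB_PARAMS: &[&str] = &["request", "req"];

fn is_web_param(p: &str) -> bool {
    WEB_PARAMS.contains(&p.to_ascii_lowercase().as_str())
}

/// Pre-fix shape: any function in the file with a web-like parameter counts.
fn has_web_params_any(summaries: &Summaries) -> bool {
    summaries
        .values()
        .any(|params| params.iter().any(|p| is_web_param(p)))
}

/// Post-fix shape: only the candidate handler's own parameters count.
fn has_web_params_for(summaries: &Summaries, func_name: &str) -> bool {
    summaries
        .iter()
        .filter(|(name, _)| name.as_str() == func_name)
        .any(|(_, params)| params.iter().any(|p| is_web_param(p)))
}
```

With `process_data(data)` and `read_stream(req)` in the same file, the file-wide check is true for every candidate, whereas the per-function check is false for `process_data`. The handler-name gate in `is_web_entrypoint_simple` is separate and is not modelled here.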


@ -1641,13 +1641,34 @@ impl GlobalSummaries {
return CalleeResolution::NotFound;
}
let arity_filtered: Vec<&FuncKey> = all_candidates
let mut arity_filtered: Vec<&FuncKey> = all_candidates
.iter()
.copied()
.filter(|k| arity_matches(k))
.collect();
if arity_filtered.is_empty() {
return CalleeResolution::NotFound;
// Exact-arity match found nothing. Tolerate under-application:
// `FuncKey::arity` is the *total* parameter count, so a call
// supplying `a` arguments to a function declared with more
// parameters (the surplus being default-valued / optional) is a
// routine, valid call shape in Python, JS/TS, PHP, and Ruby. Retry
// with `param_count >= call_arity` so these calls still resolve.
//
// This only ever WIDENS the candidate set, and resolution below
// still requires a unique candidate, so a genuinely ambiguous name
// degrades to `Ambiguous`, never a wrong `Resolved`. Exact-arity
// matches always take precedence (this branch runs only when none
// exist), so no existing exact-match resolution regresses.
if let Some(a) = q.arity {
arity_filtered = all_candidates
.iter()
.copied()
.filter(|k| matches!(k.arity, Some(p) if p >= a))
.collect();
}
if arity_filtered.is_empty() {
return CalleeResolution::NotFound;
}
}
let same_ns: Vec<&FuncKey> = arity_filtered
@ -2033,14 +2054,42 @@ fn synthesize_ssa_disambig(summary: &SsaFuncSummary) -> u32 {
/// Merging only happens for exact `FuncKey` matches (same lang + namespace +
/// name + arity). Functions with the same bare name but different languages
/// or namespaces are stored separately.
///
/// This variant keys summaries via the plain [`FuncSummary::func_key`]
/// (`normalize_namespace`), so it is only safe for repos with no
/// package boundaries. The indexed scan path must use
/// [`merge_summaries_with_resolver`] so the loaded coarse summaries are
/// keyed with the same package-qualified namespaces that pass-1 SSA
/// summaries, cross-package import maps, and pass-2 refinements use.
pub fn merge_summaries(
per_file: impl IntoIterator<Item = FuncSummary>,
scan_root: Option<&str>,
) -> GlobalSummaries {
merge_summaries_with_resolver(per_file, scan_root, None)
}
/// Module-graph-aware variant of [`merge_summaries`].
///
/// Keys each summary via [`FuncSummary::func_key_with_resolver`], so a
/// file inside a discovered package gets a package-qualified namespace
/// (`"@scope/name::src/file.ts"`) instead of the plain
/// `normalize_namespace` form. This must match the keying convention
/// used by pass-1 SSA summaries (`namespace_with_package`) and pass-2
/// topo refinements (`func_key_with_resolver`); otherwise exact-key
/// joins between the coarse FuncSummary tier and the SSA tier miss, and
/// same-namespace narrowing in [`GlobalSummaries::resolve_callee`] never
/// matches the package-qualified caller namespace.
///
/// Passing `module_graph: None` is equivalent to [`merge_summaries`].
pub fn merge_summaries_with_resolver(
per_file: impl IntoIterator<Item = FuncSummary>,
scan_root: Option<&str>,
module_graph: Option<&crate::resolve::ModuleGraph>,
) -> GlobalSummaries {
let mut map = GlobalSummaries::new();
for fs in per_file {
let key = fs.func_key(scan_root);
let key = fs.func_key_with_resolver(scan_root, module_graph);
map.insert(key, fs);
}
@ -2049,3 +2098,127 @@ pub fn merge_summaries(
#[cfg(test)]
mod tests;
#[cfg(test)]
mod arity_leniency_tests {
use super::*;
use crate::symbol::{FuncKey, Lang};
fn py_func(name: &str, namespace: &str, param_count: usize) -> (FuncKey, FuncSummary) {
let key = FuncKey {
lang: Lang::Python,
namespace: namespace.into(),
name: name.into(),
arity: Some(param_count),
..Default::default()
};
let summary = FuncSummary {
name: name.into(),
file_path: namespace.into(),
lang: "python".into(),
param_count,
..Default::default()
};
(key, summary)
}
/// `run_cmd(cmd, opts=None)` (param_count 2) called as `run_cmd(user)`
/// (arity 1) must resolve via the under-application tolerance, not fall
/// through to NotFound.
#[test]
fn under_application_resolves_unique_defaulted_callee() {
let mut gs = GlobalSummaries::new();
let (key, summary) = py_func("run_cmd", "helper.py", 2);
gs.insert(key.clone(), summary);
let resolved = gs.resolve_callee(&CalleeQuery {
name: "run_cmd",
caller_lang: Lang::Python,
caller_namespace: "routes.py",
caller_container: None,
receiver_type: None,
namespace_qualifier: None,
receiver_var: None,
arity: Some(1),
});
assert_eq!(
resolved,
CalleeResolution::Resolved(key),
"under-applied unique callee with default params must resolve"
);
}
/// Exact-arity matches still take precedence: when both an exact-arity and
/// a higher-arity candidate exist, the exact one wins (no regression).
#[test]
fn exact_arity_match_preferred_over_lenient() {
let mut gs = GlobalSummaries::new();
let (exact_key, exact_sum) = py_func("run_cmd", "a.py", 1);
let (wide_key, wide_sum) = py_func("run_cmd", "b.py", 3);
gs.insert(exact_key.clone(), exact_sum);
gs.insert(wide_key, wide_sum);
let resolved = gs.resolve_callee(&CalleeQuery {
name: "run_cmd",
caller_lang: Lang::Python,
caller_namespace: "routes.py",
caller_container: None,
receiver_type: None,
namespace_qualifier: None,
receiver_var: None,
arity: Some(1),
});
// The exact-arity candidate (a.py, arity 1) is the sole exact match,
// so the lenient branch never runs and resolution is unambiguous.
assert_eq!(resolved, CalleeResolution::Resolved(exact_key));
}
/// Leniency only widens the candidate set; it never produces a wrong
/// Resolved. Two distinct higher-arity callees both tolerating the call
/// arity must degrade to Ambiguous, not a silent pick.
#[test]
fn under_application_ambiguous_when_multiple_candidates() {
let mut gs = GlobalSummaries::new();
let (k1, s1) = py_func("run_cmd", "a.py", 2);
let (k2, s2) = py_func("run_cmd", "b.py", 3);
gs.insert(k1, s1);
gs.insert(k2, s2);
let resolved = gs.resolve_callee(&CalleeQuery {
name: "run_cmd",
caller_lang: Lang::Python,
caller_namespace: "routes.py",
caller_container: None,
receiver_type: None,
namespace_qualifier: None,
receiver_var: None,
arity: Some(1),
});
assert!(
matches!(resolved, CalleeResolution::Ambiguous(_)),
"two under-applied candidates must be Ambiguous, not a wrong Resolved: {resolved:?}"
);
}
/// Over-application (more args than params) must NOT be tolerated: the
/// lenient predicate is `param_count >= call_arity`, so a 1-param function
/// called with 2 args still returns NotFound.
#[test]
fn over_application_not_tolerated() {
let mut gs = GlobalSummaries::new();
let (key, summary) = py_func("run_cmd", "helper.py", 1);
gs.insert(key, summary);
let resolved = gs.resolve_callee(&CalleeQuery {
name: "run_cmd",
caller_lang: Lang::Python,
caller_namespace: "routes.py",
caller_container: None,
receiver_type: None,
namespace_qualifier: None,
receiver_var: None,
arity: Some(2),
});
assert_eq!(resolved, CalleeResolution::NotFound);
}
}
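
The four tests above pin down an exact-first, lenient-second lookup order. A minimal sketch of that policy follows, assuming simplified stand-in types (`Candidate`, `Resolution`, and `resolve_by_arity` are hypothetical names for illustration, not the engine's `FuncKey` / `CalleeResolution` / `resolve_callee`):

```rust
// Hypothetical, simplified stand-ins for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    namespace: String,
    param_count: usize,
}

#[derive(Debug, PartialEq, Eq)]
enum Resolution {
    Resolved(Candidate),
    Ambiguous(Vec<Candidate>),
    NotFound,
}

/// Resolve among same-name candidates given the call-site arity.
fn resolve_by_arity(candidates: &[Candidate], call_arity: usize) -> Resolution {
    // Pass 1: exact-arity matches take precedence over any lenient match.
    let exact: Vec<Candidate> = candidates
        .iter()
        .filter(|c| c.param_count == call_arity)
        .cloned()
        .collect();

    let pool: Vec<Candidate> = if !exact.is_empty() {
        exact
    } else {
        // Pass 2: tolerate under-application only (defaulted params).
        // Over-application (param_count < call_arity) never matches.
        candidates
            .iter()
            .filter(|c| c.param_count >= call_arity)
            .cloned()
            .collect()
    };

    // Leniency only widens the pool; more than one survivor degrades to
    // Ambiguous rather than a silent pick.
    match pool.len() {
        0 => Resolution::NotFound,
        1 => Resolution::Resolved(pool.into_iter().next().unwrap()),
        _ => Resolution::Ambiguous(pool),
    }
}
```

The sketch ignores language, namespace, and container filtering, which the real resolver presumably applies before arity is considered.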

@ -401,7 +401,9 @@ fn run_path(
// Global step budget
if *total_steps >= MAX_TOTAL_STEPS {
*search_exhausted = false;
return Some(record_outcome(state, finding, ssa, cfg));
// Budget cut mid-walk: constraints beyond this point are unchecked,
// so feasibility is unproven → Inconclusive, not Confirmed.
return Some(record_cutoff(state, finding, ssa, cfg));
}
let block_id = state.current_block;
@ -454,8 +456,9 @@ fn run_path(
continue;
}
}
// Stuck (infinite loop / nested loops with no exit)
return Some(record_outcome(state, finding, ssa, cfg));
// Stuck (infinite loop / nested loops with no exit): the path never
// reached a terminal, so feasibility is unproven → Inconclusive.
return Some(record_cutoff(state, finding, ssa, cfg));
}
// Move exception context into sym_state before block transfer
@ -1114,6 +1117,30 @@ fn record_outcome(
}
}
/// Record the outcome for a path cut short by a budget/loop cutoff.
///
/// Unlike [`record_outcome`], the path did NOT reach a normal terminal: it was
/// abandoned mid-walk (global step budget exhausted, or a loop with no
/// reachable exit). Constraints beyond the cutoff point were never checked, so
/// feasibility is unproven and the verdict must be `Inconclusive` (which
/// contributes 0 to confidence), not `Confirmed`. A best-effort witness is
/// still extracted for diagnostics. Mirrors `analyse_finding_path`, which
/// already returns `Inconclusive` for the over-budget (`>MAX_PATH_BLOCKS`)
/// case.
fn record_cutoff(
state: &ExplorationState,
finding: &Finding,
ssa: &SsaBody,
cfg: &Cfg,
) -> PathOutcome {
let witness = try_extract_witness(state, finding, ssa, cfg);
PathOutcome {
verdict: Verdict::Inconclusive,
constraints_checked: state.constraints_checked,
witness,
}
}
/// Best-effort witness extraction from the current symbolic state.
///
/// Used by both `record_outcome` (Confirmed paths) and inconclusive exits
@ -1516,6 +1543,77 @@ mod tests {
assert_eq!(v.verdict, Verdict::Inconclusive);
}
#[test]
fn record_cutoff_is_inconclusive_not_confirmed() {
// A path abandoned at a budget/loop cutoff has unchecked constraints
// beyond the cutoff point, so its outcome must be Inconclusive (which
// contributes 0 to confidence), NOT Confirmed. record_outcome (used
// only at genuine terminal states) still yields Confirmed.
let n0 = NodeIndex::new(0);
let n1 = NodeIndex::new(1);
let b0 = BlockId(0);
let b1 = BlockId(1);
let ssa = SsaBody {
blocks: vec![
SsaBlock {
id: b0,
phis: vec![],
body: vec![],
terminator: Terminator::Goto(b1),
preds: smallvec![],
succs: smallvec![b1],
},
SsaBlock {
id: b1,
phis: vec![],
body: vec![],
terminator: Terminator::Return(None),
preds: smallvec![b0],
succs: smallvec![],
},
],
entry: b0,
value_defs: vec![make_value_def(b0, n0), make_value_def(b1, n1)],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
synthetic_externals: std::collections::HashSet::new(),
slot_scoped_assigns: std::collections::HashSet::new(),
};
let cfg = Cfg::new();
let finding = make_finding(n0, n1);
let state = ExplorationState {
sym_state: SymbolicState::new(),
env: constraint::PathEnv::empty(),
current_block: b0,
predecessor: None,
forks_used: 0,
steps_taken: 0,
constraints_checked: 3,
visit_counts: HashMap::new(),
exception_context: None,
};
let cutoff = record_cutoff(&state, &finding, &ssa, &cfg);
assert_eq!(cutoff.verdict, Verdict::Inconclusive);
assert_eq!(cutoff.constraints_checked, 3);
let terminal = record_outcome(&state, &finding, &ssa, &cfg);
assert_eq!(terminal.verdict, Verdict::Confirmed);
// A lone budget-cut path must NOT aggregate to Confirmed.
let result = ExplorationResult {
paths_completed: vec![cutoff],
paths_pruned: 1,
total_steps: MAX_TOTAL_STEPS,
search_exhausted: false,
interproc_findings: Vec::new(),
interproc_cutoffs: Vec::new(),
};
assert_eq!(result.aggregate_verdict().verdict, Verdict::Inconclusive);
}
#[test]
fn aggregate_empty_is_inconclusive() {
let result = ExplorationResult {

@ -268,6 +268,38 @@ pub struct InterprocCtx<'a> {
pub caller_namespace: &'a str,
}
impl<'a> InterprocCtx<'a> {
/// Build a child context for resolving calls *inside* a frame.
///
/// All shared state (budgets, caches, reentry counts) is carried by
/// reference, so the child observes the same interior-mutable counters.
/// When `descended_cross_file` is true the frame was itself reached by a
/// cross-file body resolution, so `cross_file_depth` is bumped to enforce
/// `MAX_CROSS_FILE_DEPTH` on any further cross-file descents. Without this
/// the depth never increments and the guard at `execute_callee` is dead
/// (always `0 >= 1 == false`), allowing cross-file bodies to keep resolving
/// further cross-file bodies up to `max_depth` instead of the intended one
/// level.
fn child_for_nested(&self, descended_cross_file: bool) -> InterprocCtx<'a> {
InterprocCtx {
callee_bodies: self.callee_bodies,
cfg: self.cfg,
lang: self.lang,
max_depth: self.max_depth,
budget: self.budget,
cache: self.cache,
reentry_counts: self.reentry_counts,
max_reentry_per_func: self.max_reentry_per_func,
scc_membership: self.scc_membership,
max_scc_reentry: self.max_scc_reentry,
stats: self.stats,
cross_file_bodies: self.cross_file_bodies,
cross_file_depth: self.cross_file_depth + usize::from(descended_cross_file),
caller_namespace: self.caller_namespace,
}
}
}
/// Budget counters shared across all interprocedural frames for one finding.
#[derive(Clone, Copy, Debug)]
pub struct InterprocBudget {
@ -728,16 +760,48 @@ pub fn execute_callee(
let mut initial_state = SymbolicState::new();
initial_state.seed_from_const_values(&body.opt.const_values);
// Seed parameters: walk callee SSA for Param instructions
// Seed parameters: walk callee SSA for Param / SelfParam instructions.
//
// The caller (`transfer.rs` Call arm and `handle_nested_calls`) PREPENDS
// the method receiver into `arg_values` at index 0 whenever the call has a
// receiver. SSA lowering (`src/ssa/lower.rs`) emits that receiver as
// `SsaOp::SelfParam` and assigns `Param { index }` positions starting at 0
// to the non-receiver formals only. So when the callee body is a method
// (has a `SelfParam`), the positional formal `Param{index}` must be seeded
// from `arg_values[index + 1]` (skipping the receiver at slot 0), and the
// `SelfParam` itself from `arg_values[0]`. Free functions (no `SelfParam`)
// map `Param{index}` directly to `arg_values[index]`. This matches the
// taint engine's inline path, which builds `param_seed` from non-receiver
// args and carries receiver taint on a separate `receiver_seed` channel
// consumed by `SelfParam` (`src/taint/ssa_transfer/mod.rs`).
let has_self_param = body
.ssa
.blocks
.iter()
.flat_map(|block| block.phis.iter().chain(block.body.iter()))
.any(|inst| matches!(inst.op, SsaOp::SelfParam));
let param_offset = if has_self_param { 1 } else { 0 };
for block in &body.ssa.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
if let SsaOp::Param { index } = &inst.op {
if let Some((_, sym, tainted)) = arg_values.get(*index) {
initial_state.set(inst.value, sym.clone());
if *tainted {
initial_state.mark_tainted(inst.value);
match &inst.op {
SsaOp::Param { index } => {
if let Some((_, sym, tainted)) = arg_values.get(*index + param_offset) {
initial_state.set(inst.value, sym.clone());
if *tainted {
initial_state.mark_tainted(inst.value);
}
}
}
SsaOp::SelfParam => {
// Receiver was prepended at slot 0 by the caller.
if let Some((_, sym, tainted)) = arg_values.first() {
initial_state.set(inst.value, sym.clone());
if *tainted {
initial_state.mark_tainted(inst.value);
}
}
}
_ => {}
}
}
}
@ -751,6 +815,12 @@ pub fn execute_callee(
frame_chain.push(normalized.to_string());
// ─── Work-queue exploration (intra-callee forking) ────────
//
// Nested calls executed inside this frame go through a child context whose
// `cross_file_depth` is bumped when this frame was itself reached via a
// cross-file body. This keeps `MAX_CROSS_FILE_DEPTH` enforced across nested
// descents (the shared `ctx` would otherwise never increment the field).
let nested_ctx = ctx.child_for_nested(is_cross_file);
let mut exit_states: Vec<CalleeExitState> = Vec::new();
let mut internal_findings: Vec<InternalSinkFinding> = Vec::new();
@ -853,7 +923,7 @@ pub fn execute_callee(
// Handle nested calls
handle_nested_calls(
block,
ctx,
&nested_ctx,
&mut path.sym_state,
depth,
&frame_chain,
@ -1562,4 +1632,46 @@ mod tests {
assert_eq!(stats.cutoffs, 0);
assert_eq!(stats.forks, 0);
}
#[test]
fn child_for_nested_bumps_cross_file_depth_only_on_descent() {
// Backing state for the borrowed-by-reference InterprocCtx fields.
let bodies: HashMap<crate::symbol::FuncKey, CalleeSsaBody> = HashMap::new();
let cfg = Cfg::new();
let budget = Cell::new(InterprocBudget::new());
let cache = RefCell::new(InterprocCache::new());
let reentry = RefCell::new(HashMap::new());
let stats = Cell::new(InterprocStats::default());
let ctx = InterprocCtx {
callee_bodies: &bodies,
cfg: &cfg,
lang: Lang::JavaScript,
max_depth: DEFAULT_MAX_DEPTH,
budget: &budget,
cache: &cache,
reentry_counts: &reentry,
max_reentry_per_func: DEFAULT_MAX_REENTRY_PER_FUNC,
scc_membership: None,
max_scc_reentry: DEFAULT_MAX_SCC_REENTRY,
stats: &stats,
cross_file_bodies: None,
cross_file_depth: 0,
caller_namespace: "test.js",
};
// Intra-file descent leaves cross_file_depth untouched.
let intra = ctx.child_for_nested(false);
assert_eq!(intra.cross_file_depth, 0);
// A cross-file descent bumps the depth so the guard at execute_callee
// (>= MAX_CROSS_FILE_DEPTH) trips on the next cross-file step.
let xfile = ctx.child_for_nested(true);
assert_eq!(xfile.cross_file_depth, 1);
assert!(xfile.cross_file_depth >= MAX_CROSS_FILE_DEPTH);
// Nested cross-file from an already-cross-file frame keeps climbing.
let xfile2 = xfile.child_for_nested(true);
assert_eq!(xfile2.cross_file_depth, 2);
}
}
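
The receiver-offset rule described in the `execute_callee` comment can be sketched in isolation. This is an illustration under assumed, simplified types (`Formal` and `seed_formals` are hypothetical; the real code walks `SsaOp::Param` / `SsaOp::SelfParam` instructions and seeds a `SymbolicState`):

```rust
// Hypothetical stand-in for the callee's formal-parameter instructions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Formal {
    SelfParam,
    Param { index: usize },
}

/// Pair each formal with the caller-side argument that should seed it.
/// `arg_values` is assumed to have the receiver prepended at slot 0 whenever
/// the callee is a method, mirroring the convention in the comment above.
fn seed_formals<'a>(
    formals: &[Formal],
    arg_values: &[&'a str],
) -> Vec<(Formal, Option<&'a str>)> {
    // A callee is treated as a method iff it has a SelfParam.
    let has_self = formals.iter().any(|f| matches!(f, Formal::SelfParam));
    let offset = usize::from(has_self);

    formals
        .iter()
        .map(|f| {
            let arg = match f {
                // The receiver always lives at slot 0.
                Formal::SelfParam => arg_values.first().copied(),
                // Positional formals skip the receiver slot when present.
                Formal::Param { index } => arg_values.get(*index + offset).copied(),
            };
            (*f, arg)
        })
        .collect()
}
```

For a method with formals `[SelfParam, Param { index: 0 }]` and arguments `["recv", "cmd"]`, the positional formal is seeded from `"cmd"`; without the offset it would have been seeded from the receiver.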

@ -7,20 +7,32 @@
//!
//! This module implements the opposite direction: start at each sink value,
//! walk *reverse* SSA edges and (when needed) cross-file callee bodies on
//! demand, and emit a [`BackwardFlow`] when a source is reached or an
//! accumulated path predicate proves the flow infeasible.
//! demand, and emit a [`BackwardFlow`] when a source is reached.
//!
//! The analysis is additive:
//!
//! * When a forward finding's sink is confirmed by a backwards walk that
//! reaches a matching source, we append `backwards-confirmed` to the
//! finding's evidence notes.
//! * When the backwards walk proves the flow infeasible via accumulated
//! path predicates, we append `backwards-infeasible`, consumed by the
//! confidence scorer as a cap-to-Low signal.
//! * Backward flows that reach a source with no matching forward finding
//! become standalone `taint-backwards-flow` diags (a separate rule id so
//! existing graders can distinguish the two channels).
//! finding's evidence notes. **This is the only channel currently active.**
//!
//! ## Reserved / not-yet-implemented channels
//!
//! The scaffolding below exists but does not fire in production; it is kept so
//! the downstream consumers (confidence scorer, [`FindingVerdict`]) have a
//! stable shape to grow into. Do not build new behaviour on the assumption
//! that these signals can be produced by a real walk:
//!
//! * **Infeasibility.** [`DemandState::validated_true`] /
//! [`DemandState::validated_false`] are never written by the transfer (no
//! reverse predicate accumulation is implemented yet), so
//! [`BackwardFlow::infeasible`] is never set, [`aggregate_verdict`] never
//! returns [`FindingVerdict::Infeasible`], and the `backwards-infeasible`
//! note is never appended. Implementing this requires reading branch
//! [`crate::taint::domain::PredicateSummary`] bits off the reverse-dominating
//! conditions, which is future work.
//! * **Standalone diags.** Backward flows that reach a source with no matching
//! forward finding are *not* yet surfaced as standalone `taint-backwards-flow`
//! diags; that rule id is reserved.
//!
//! The feature is gated by
//! [`crate::utils::analysis_options::AnalysisOptions::backwards_analysis`]
@ -76,8 +88,15 @@ pub struct DemandState {
/// [`crate::taint::domain::predicate_kind_bit`]; bit `i` set means the
/// corresponding `PredicateKind` was observed as holding on every
/// predecessor visited so far.
///
/// **Reserved / not yet written.** The current backward transfer does not
/// accumulate reverse path predicates, so this field is always 0 in
/// production. See the module-level "Reserved / not-yet-implemented
/// channels" note.
pub validated_true: u8,
/// Counterpart to [`Self::validated_true`] for known-false predicates.
///
/// **Reserved / not yet written** (always 0 in production).
pub validated_false: u8,
/// Number of cross-function inline expansions performed along this walk.
pub depth: u32,
@ -111,6 +130,10 @@ pub struct BackwardFlow {
pub source_node: Option<NodeIndex>,
/// Set when the accumulated predicates proved the flow infeasible before
/// reaching any source.
///
/// **Reserved / never set in production.** Reverse predicate accumulation
/// is not yet implemented (see [`DemandState::validated_true`] and the
/// module-level note), so every production flow leaves this `false`.
pub infeasible: bool,
/// Set when the walk hit [`BACKWARDS_VALUE_BUDGET`] without terminating.
pub budget_exhausted: bool,
@ -517,11 +540,25 @@ fn walk_dfs(
// the key in the matcher, the key is useful for debug
// logging in bigger expansions.
let _ = callee_key;
return;
// Intentionally fall through to the conservative arg/receiver
// fanout below. Walking only the callee's return chains is
// strictly lossier than not resolving the callee at all: a
// passthrough `return param` inside the callee bottoms out at
// a `ReachedParam` terminal that is never mapped back to the
// caller's argument, so `sink(identity(tainted))` would record
// a dead-end param terminal and miss the source, whereas an
// unresolvable callee would have taken the fanout and reached
// it. Running the fanout in addition keeps callee expansion
// additive (it can only add confirmations, never drop them);
// `aggregate_verdict` only needs one confirmation, and the
// shared `visited` set prevents re-expanding values the callee
// walk already covered in the caller frame.
}
// Fall-through: no resolvable body. Conservatively fan out to
// every operand / receiver so a source reachable through the
// call arguments is still observed.
// Conservatively fan out to every operand / receiver so a source
// reachable through the call arguments is still observed. Runs
// for both the resolvable-callee case (after return-chain
// expansion, to recover argument-passthrough flows) and the
// unresolvable case (sole channel).
for (operand, next_demand) in next {
chain.push(operand);
walk_dfs(
@ -564,18 +601,55 @@ fn resolve_callee_body<'a>(
.rsplit('.')
.next()
.unwrap_or(callee);
if let Some(map) = ctx.intra_file_bodies {
// Pick the deterministic, language-matched best candidate among same-leaf
// entries. Bare-leaf matching over a `HashMap` is both unsound (a
// same-name sibling from another class/file/language can be picked, the
// exact hazard the forward inline path refuses bare-name lookup for) and
// nondeterministic (HashMap iteration order varies run to run, so the
// resolved body, hence the `backwards-confirmed` note and the confidence
// score, would differ across identical runs). We cannot reconstruct the
// call-site's container/arity here (only the textual callee is in hand),
// but we can at least exclude cross-language siblings and fix a stable
// tie-break so resolution is reproducible.
fn best_candidate<'b>(
map: &'b HashMap<FuncKey, CalleeSsaBody>,
leaf: &str,
lang: Lang,
) -> Option<(&'b CalleeSsaBody, FuncKey)> {
let mut best: Option<(&FuncKey, &CalleeSsaBody)> = None;
for (key, body) in map.iter() {
if key.name == leaf && body.ssa.blocks.len() <= MAX_BACKWARDS_CALLEE_BLOCKS {
return Some((body, key.clone()));
if key.name != leaf
|| key.lang != lang
|| body.ssa.blocks.len() > MAX_BACKWARDS_CALLEE_BLOCKS
{
continue;
}
let better = match best {
None => true,
// Deterministic tie-break: smallest by the structural fields
// that distinguish same-leaf siblings. Independent of
// HashMap iteration order.
Some((bk, _)) => {
(&key.namespace, &key.container, &key.arity, &key.disambig)
< (&bk.namespace, &bk.container, &bk.arity, &bk.disambig)
}
};
if better {
best = Some((key, body));
}
}
best.map(|(k, b)| (b, k.clone()))
}
if let Some(map) = ctx.intra_file_bodies {
if let Some(found) = best_candidate(map, leaf, ctx.lang) {
return Some(found);
}
}
if let Some(map) = ctx.global_summaries.and_then(|gs| gs.bodies_by_key()) {
for (key, body) in map.iter() {
if key.name == leaf && body.ssa.blocks.len() <= MAX_BACKWARDS_CALLEE_BLOCKS {
return Some((body, key.clone()));
}
if let Some(found) = best_candidate(map, leaf, ctx.lang) {
return Some(found);
}
}
None
@ -1189,4 +1263,211 @@ mod tests {
};
assert!(!bf.is_confirmation());
}
/// Build a passthrough callee body `fn identity(p0) { return p0 }` wrapped
/// in a [`CalleeSsaBody`] ready for [`BackwardsCtx::intra_file_bodies`].
fn build_passthrough_callee() -> CalleeSsaBody {
let mut cfg: Graph<NodeInfo, EdgeKind> = Graph::new();
let p_node = cfg.add_node(NodeInfo::default());
let mut ssa = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![SsaInst {
value: SsaValue(0),
op: SsaOp::Param { index: 0 },
cfg_node: p_node,
var_name: None,
span: (0, 0),
}],
terminator: Terminator::Return(Some(SsaValue(0))),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![make_value_def(BlockId(0), p_node)],
cfg_node_map: std::collections::HashMap::new(),
exception_edges: Vec::new(),
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
synthetic_externals: std::collections::HashSet::new(),
slot_scoped_assigns: std::collections::HashSet::new(),
};
let opt = crate::ssa::optimize_ssa(&mut ssa, &cfg, Some(Lang::JavaScript));
CalleeSsaBody {
ssa,
opt,
param_count: 1,
node_meta: std::collections::HashMap::new(),
body_graph: Some(cfg),
cross_package_imports: std::sync::Arc::new(std::collections::HashMap::new()),
}
}
/// Resolving a passthrough callee (`sink(identity(tainted))`) must not be
/// strictly lossier than leaving it unresolved. The callee's
/// `return param` chain bottoms out at a `ReachedParam` terminal that the
/// walk cannot map back to the caller's argument, so without the
/// post-expansion conservative arg fanout the source is missed entirely.
/// After the fix, the fanout walks the call argument (the `Source`) and the
/// flow is confirmed.
#[test]
fn resolved_passthrough_callee_still_confirms_via_arg_fanout() {
// Caller body: v0 = Source; v1 = identity(v0); return v1.
let mut cfg: Graph<NodeInfo, EdgeKind> = Graph::new();
let src_node = cfg.add_node(NodeInfo::default());
let call_node = cfg.add_node(NodeInfo::default());
let ssa = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Source,
cfg_node: src_node,
var_name: None,
span: (0, 0),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Call {
callee: "identity".to_string(),
callee_text: None,
args: vec![smallvec![SsaValue(0)]],
receiver: None,
},
cfg_node: call_node,
var_name: None,
span: (0, 0),
},
],
terminator: Terminator::Return(Some(SsaValue(1))),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
make_value_def(BlockId(0), src_node),
make_value_def(BlockId(0), call_node),
],
cfg_node_map: std::collections::HashMap::new(),
exception_edges: Vec::new(),
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
synthetic_externals: std::collections::HashSet::new(),
slot_scoped_assigns: std::collections::HashSet::new(),
};
let mut bodies: HashMap<FuncKey, CalleeSsaBody> = HashMap::new();
bodies.insert(
FuncKey::new_function(Lang::JavaScript, "test.js", "identity", Some(1)),
build_passthrough_callee(),
);
let ctx = BackwardsCtx {
ssa: &ssa,
cfg: &cfg,
lang: Lang::JavaScript,
global_summaries: None,
intra_file_bodies: Some(&bodies),
depth_budget: DEFAULT_BACKWARDS_DEPTH,
};
let flows = analyse_sink_backwards(&ctx, SsaValue(1), call_node, Cap::all());
assert!(
flows.iter().any(|f| f.is_confirmation()),
"argument-passthrough source must still be confirmed after callee expansion"
);
}
/// `resolve_callee_body` must only match same-language siblings and pick a
/// deterministic candidate, never an arbitrary HashMap-order entry from
/// another language.
#[test]
fn resolve_callee_body_filters_language_and_is_deterministic() {
let mut bodies: HashMap<FuncKey, CalleeSsaBody> = HashMap::new();
// Two same-leaf siblings: one in the analysed language, one not.
bodies.insert(
FuncKey::new_function(Lang::Python, "other.py", "process", Some(1)),
build_passthrough_callee(),
);
bodies.insert(
FuncKey::new_function(Lang::JavaScript, "a.js", "process", Some(1)),
build_passthrough_callee(),
);
bodies.insert(
FuncKey::new_function(Lang::JavaScript, "b.js", "process", Some(1)),
build_passthrough_callee(),
);
let dummy_cfg: Graph<NodeInfo, EdgeKind> = Graph::new();
let dummy_ssa = SsaBody {
blocks: vec![],
entry: BlockId(0),
value_defs: vec![],
cfg_node_map: std::collections::HashMap::new(),
exception_edges: Vec::new(),
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
synthetic_externals: std::collections::HashSet::new(),
slot_scoped_assigns: std::collections::HashSet::new(),
};
let ctx = BackwardsCtx {
ssa: &dummy_ssa,
cfg: &dummy_cfg,
lang: Lang::JavaScript,
global_summaries: None,
intra_file_bodies: Some(&bodies),
depth_budget: DEFAULT_BACKWARDS_DEPTH,
};
// Deterministic across repeated resolutions, and never the Python body.
let first = resolve_callee_body(&ctx, "process", 0)
.map(|(_, k)| k)
.expect("a JS sibling must resolve");
assert_eq!(
first.lang,
Lang::JavaScript,
"must not match cross-language"
);
// Namespace tie-break is the smallest: "a.js" < "b.js".
assert_eq!(first.namespace, "a.js");
for _ in 0..32 {
let again = resolve_callee_body(&ctx, "process", 0)
.map(|(_, k)| k)
.expect("stable resolution");
assert_eq!(again, first, "resolution must be deterministic across runs");
}
}
/// Reserved-channel invariant (documented in the module header): the
/// backwards transfer never writes the predicate-accumulation bits, so a
/// real walk can never set `infeasible`. Guards against the dead
/// infeasibility channel silently coming alive (or the docs drifting from
/// the implementation) without a corresponding transfer change.
#[test]
fn infeasibility_channel_is_inert() {
// backward_transfer leaves the demand's predicate bits untouched.
let (ssa, _cfg) = build_trivial_source_body();
let demand = DemandState::new(Cap::all());
let (_step, next) = backward_transfer(&ssa, SsaValue(1), &demand);
for (_v, d) in &next {
assert_eq!(d.validated_true, 0, "validated_true must never be written");
assert_eq!(
d.validated_false, 0,
"validated_false must never be written"
);
}
// A full driver walk over a source→sink body produces a confirmation,
// never an infeasible flow.
let ctx = BackwardsCtx::new(&ssa, &_cfg, Lang::JavaScript);
let flows = analyse_sink_backwards(&ctx, SsaValue(1), NodeIndex::new(1), Cap::all());
assert!(
!flows.iter().any(|f| f.infeasible),
"no production flow may set infeasible (channel is reserved)"
);
}
}
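
The language filter plus stable tie-break in `best_candidate` can also be expressed with `Iterator::min_by`. A minimal sketch, assuming a reduced key type (`Key` and this `best_candidate` are hypothetical; the real `FuncKey` also carries `container` and `disambig`, and the real filter also bounds the block count):

```rust
use std::collections::HashMap;

// Hypothetical, reduced key for illustration only.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Key {
    lang: &'static str,
    namespace: String,
    name: String,
    arity: Option<usize>,
}

/// Pick the same-language, same-leaf entry that is smallest by
/// (namespace, arity). The result does not depend on HashMap iteration
/// order because the ordering is total over the surviving keys.
fn best_candidate<'a, V>(
    map: &'a HashMap<Key, V>,
    leaf: &str,
    lang: &str,
) -> Option<(&'a Key, &'a V)> {
    map.iter()
        // Exclude cross-language and different-name siblings.
        .filter(|(k, _)| k.name == leaf && k.lang == lang)
        // Stable tie-break on structural fields.
        .min_by(|(a, _), (b, _)| {
            (&a.namespace, &a.arity).cmp(&(&b.namespace, &b.arity))
        })
}
```

Once `lang` and `name` are fixed, `(namespace, arity)` is unique among the keys in this reduced model, so `min_by` never has to choose between equal elements.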

@ -822,6 +822,51 @@ fn containment_order(bodies: &[BodyCfg]) -> Vec<usize> {
order
}
/// Build the lexical-scope seed map for a non-toplevel body.
///
/// A body's `global_seed` ancestor lookup
/// ([`ssa_transfer`] `SsaOp::Param` handling) reads its captured names in
/// ancestor order: the direct `parent_body_id` first, then `BodyId(0)` to
/// pick up the JS/TS pass-2 re-keyed top-level globals (see
/// [`ssa_transfer::filter_seed_to_toplevel`]). But `body_exit_states`
/// keys every exit entry under the owning body's id, so a parent's exit
/// map never contains `BodyId(0)` entries. For a body nested two or more
/// levels deep (its parent is itself a non-toplevel body) the `BodyId(0)`
/// leg of the ancestor lookup would therefore find nothing — a global
/// written by a sibling top-level function and read inside a nested
/// closure would be silently missed.
///
/// This helper restores that leg: when `parent_id` is a *non*-toplevel
/// body, it returns a merged map: the parent's exit joined with the `BodyId(0)`
/// top-level seed (the converged globals, already keyed under
/// `BodyId(0)`). When the parent *is* the top level (`BodyId(0)`), the
/// parent's exit already carries the `BodyId(0)` entries, so the borrowed
/// parent map is returned unchanged with no clone.
fn seed_for_nested_body<'a>(
parent_id: BodyId,
body_exit_states: &'a HashMap<
BodyId,
HashMap<ssa_transfer::BindingKey, crate::taint::domain::VarTaint>,
>,
) -> Option<std::borrow::Cow<'a, HashMap<ssa_transfer::BindingKey, crate::taint::domain::VarTaint>>>
{
let parent_exit = body_exit_states.get(&parent_id);
if parent_id == BodyId(0) {
// Depth-1 body: the parent IS the top level, so its exit already
// carries the BodyId(0) entries the ancestor lookup wants.
return parent_exit.map(std::borrow::Cow::Borrowed);
}
// Depth ≥ 2 body: merge the direct parent's exit with the converged
// top-level seed so the ancestor lookup's BodyId(0) leg is live.
let toplevel_seed = body_exit_states.get(&BodyId(0));
match (parent_exit, toplevel_seed) {
(Some(p), Some(t)) => Some(std::borrow::Cow::Owned(ssa_transfer::join_seed_maps(p, t))),
(Some(p), None) => Some(std::borrow::Cow::Borrowed(p)),
(None, Some(t)) => Some(std::borrow::Cow::Borrowed(t)),
(None, None) => None,
}
}
/// Build a `var_name → TypeKind` map from a body's optimised SSA + type-fact
/// result. Used by [`analyse_multi_body`] to forward closure-captured types
/// from a parent body into its children, so that bound-variable receiver
@ -1455,11 +1500,16 @@ fn analyse_multi_body(
// ── Pass 1: lexical containment propagation ──────────────────────
for &idx in &order {
let body = &file_cfg.bodies[idx];
// Determine seed from parent body's exit state.
let parent_seed = body
// Determine seed from parent body's exit state. For bodies nested
// two or more levels deep, merge in the top-level (`BodyId(0)`)
// seed so the `global_seed` ancestor lookup's `BodyId(0)` leg can
// reach globals written by sibling top-level functions (see
// [`seed_for_nested_body`]).
let parent_seed_owned = body
.meta
.parent_body_id
.and_then(|pid| body_exit_states.get(&pid));
.and_then(|pid| seed_for_nested_body(pid, &body_exit_states));
let parent_seed = parent_seed_owned.as_deref();
let parent_var_types = body
.meta
.parent_body_id
@ -1648,15 +1698,40 @@ fn analyse_multi_body(
// changed, so re-analysis would produce byte-identical
// output. The cached findings from the previous
// round (or pass-1) remain correct.
if let Some(reads) = body_reads.get(&body.meta.id) {
if reads.is_disjoint(&changed_names) {
continue;
//
// Restricted to depth-1 bodies (direct children of the top
// level). `changed_names` is derived purely from the
// *top-level* seed delta, and `body_reads` is the body's
// own `info.taint.uses` names. For a depth-1 body the
// entire inbound channel is the top-level seed, so the
// disjointness test is sound. A body nested two or more
// levels deep also consumes its parent's exit, which can
// carry a parent-local derived from a changed global (e.g.
// `reader(){ const local = 'x'+g; child(){ exec(local) } }`):
// its read-set `{local, exec}` is disjoint from the
// top-level change set `{g}`, so skipping it would miss a
// real flow. These bodies are cheap relative to the
// soundness cost, so we always re-run them.
let is_depth1 = body.meta.parent_body_id == Some(BodyId(0));
if is_depth1 {
if let Some(reads) = body_reads.get(&body.meta.id) {
if reads.is_disjoint(&changed_names) {
continue;
}
}
}
let parent_seed = body
// For nested (depth ≥ 2) bodies, merge the top-level
// (`BodyId(0)`) seed into the direct parent's exit so the
// `global_seed` ancestor lookup's `BodyId(0)` leg can reach
// the converged globals (see [`seed_for_nested_body`]).
// `body_exit_states[BodyId(0)]` holds the freshest
// `current_seed` (updated above, and per-body under
// Gauss-Seidel).
let parent_seed_owned = body
.meta
.parent_body_id
.and_then(|pid| body_exit_states.get(&pid));
.and_then(|pid| seed_for_nested_body(pid, &body_exit_states));
let parent_seed = parent_seed_owned.as_deref();
let parent_var_types = body
.meta
.parent_body_id
@ -2957,5 +3032,82 @@ pub(crate) fn build_eligible_bodies(
eligible_bodies
}
#[cfg(test)]
mod seed_threading_tests {
use super::*;
use crate::labels::Cap;
use crate::taint::domain::VarTaint;
use ssa_transfer::BindingKey;
use std::collections::HashMap;
fn tainted() -> VarTaint {
VarTaint {
caps: Cap::all(),
origins: smallvec::SmallVec::new(),
uses_summary: true,
}
}
/// Depth-1 body (parent == top level): the parent's exit already
/// carries the `BodyId(0)` entries, so the helper hands it back
/// borrowed with no merge.
#[test]
fn depth1_returns_parent_exit_unchanged() {
let mut exits: HashMap<BodyId, HashMap<BindingKey, VarTaint>> = HashMap::new();
let mut top = HashMap::new();
top.insert(BindingKey::new("g", BodyId(0)), tainted());
exits.insert(BodyId(0), top);
let seed = seed_for_nested_body(BodyId(0), &exits).expect("seed present");
// Borrowed, single BodyId(0) entry.
assert!(matches!(seed, std::borrow::Cow::Borrowed(_)));
assert!(seed.contains_key(&BindingKey::new("g", BodyId(0))));
assert_eq!(seed.len(), 1);
}
/// Depth-2 grandchild (parent == BodyId(2), itself non-toplevel):
/// the helper merges the direct parent's exit with the converged
/// `BodyId(0)` top-level seed so the ancestor lookup's `BodyId(0)`
/// leg can reach the sibling-written global. This is the finding
/// #19 regression guard: without the merge, the returned map would
/// have no `BodyId(0)` entry and the grandchild would miss `g`.
#[test]
fn depth2_merges_toplevel_seed() {
let mut exits: HashMap<BodyId, HashMap<BindingKey, VarTaint>> = HashMap::new();
// Top-level converged seed: global `g` is tainted.
let mut top = HashMap::new();
top.insert(BindingKey::new("g", BodyId(0)), tainted());
exits.insert(BodyId(0), top);
// Direct parent (reader, BodyId(2)) exports a parent-local.
let mut parent = HashMap::new();
parent.insert(BindingKey::new("local", BodyId(2)), tainted());
exits.insert(BodyId(2), parent);
let seed = seed_for_nested_body(BodyId(2), &exits).expect("seed present");
// Owned merge of both maps.
assert!(matches!(seed, std::borrow::Cow::Owned(_)));
// Parent-local survives (parent leg of ancestor lookup).
assert!(seed.contains_key(&BindingKey::new("local", BodyId(2))));
// Converged global survives under BodyId(0) (the previously-dead
// ancestor leg for grandchildren).
assert!(seed.contains_key(&BindingKey::new("g", BodyId(0))));
}
/// Depth-2 body whose parent exported nothing still receives the
/// `BodyId(0)` top-level seed — borrowed directly, no clone.
#[test]
fn depth2_empty_parent_falls_back_to_toplevel() {
let mut exits: HashMap<BodyId, HashMap<BindingKey, VarTaint>> = HashMap::new();
let mut top = HashMap::new();
top.insert(BindingKey::new("g", BodyId(0)), tainted());
exits.insert(BodyId(0), top);
// No entry for BodyId(2) at all (parent produced no exit state).
let seed = seed_for_nested_body(BodyId(2), &exits).expect("seed present");
assert!(matches!(seed, std::borrow::Cow::Borrowed(_)));
assert!(seed.contains_key(&BindingKey::new("g", BodyId(0))));
}
}
#[cfg(test)]
mod tests;
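
Note on the seed-threading tests above: the three cases (depth-1 borrow, depth-2 merge, empty-parent fallback) can be reproduced with a much smaller model. The following is a minimal sketch under stated assumptions, not the engine's `seed_for_nested_body`: `BodyId`, `Key`, `Exit`, and `seed_for_nested` are hypothetical stand-ins, taint is reduced to a `bool`, and the rule that parent entries win on a key collision is an assumption made for the illustration.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Hypothetical stand-ins for the engine's `BodyId`, `BindingKey` and `VarTaint`.
type BodyId = u32;
type Key = (String, BodyId);
type Exit = HashMap<Key, bool>;

const TOP: BodyId = 0;

/// Seed for a body nested under `parent`: borrow when a single map suffices,
/// clone and merge only when both the parent and the top level contribute.
fn seed_for_nested<'a>(
    parent: BodyId,
    exits: &'a HashMap<BodyId, Exit>,
) -> Option<Cow<'a, Exit>> {
    let top = exits.get(&TOP);
    if parent == TOP {
        // Depth 1: the parent's exit already is the top-level state.
        return top.map(Cow::Borrowed);
    }
    match (exits.get(&parent), top) {
        // Both maps carry entries: build an owned union.
        (Some(p), Some(t)) if !p.is_empty() && !t.is_empty() => {
            let mut merged = p.clone();
            for (k, v) in t {
                // Assumption: parent entries win on a key collision.
                merged.entry(k.clone()).or_insert(*v);
            }
            Some(Cow::Owned(merged))
        }
        (Some(p), None) => Some(Cow::Borrowed(p)),
        // At least one side is empty: hand back the other one, borrowed.
        (Some(p), Some(t)) => Some(Cow::Borrowed(if p.is_empty() { t } else { p })),
        (None, Some(t)) => Some(Cow::Borrowed(t)),
        (None, None) => None,
    }
}
```

Returning `Cow` keeps the common depth-1 case allocation-free and pays for a clone only when a grandchild genuinely needs both maps.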

View file

@ -619,6 +619,95 @@ fn parse_leading_uint(s: &str) -> Option<u64> {
any.then_some(n)
}
/// Detect a substring-REJECTION idiom dressed up as a membership method:
/// `x.includes("..")`, `x.contains("<script>")`, `x.indexOf("..")`, etc., where
/// the needle is a **string literal**.
///
/// Genuine allowlist membership has the form `ALLOWED.includes(value)`: the
/// argument is the variable under test and the TRUE branch proves membership.
/// When the argument is a string literal, the receiver is the value under test
/// and the call asks "does this user string contain a dangerous substring?";
/// the TRUE branch is then the dangerous/reject path, not a validated path.
///
/// Returning `true` keeps such conditions out of [`PredicateKind::AllowlistCheck`]
/// (which marks every condition var validated on the TRUE branch with the wrong
/// polarity). The shell-metachar form is handled earlier by
/// [`is_shell_metachar_rejection`]; this covers the broader literal-needle case
/// (`..`, `<script>`, etc.) that the shell-metachar carve-out deliberately
/// excludes. Conservative: only fires when the first argument parses as a
/// string literal, so `ALLOWED.includes(value)` (identifier arg) is untouched.
fn is_literal_needle_membership(text: &str) -> bool {
let lower = text.to_ascii_lowercase();
for method in [
".includes(",
".include?(",
".contains(",
".indexof(",
".has(",
] {
if let Some(idx) = lower.find(method) {
let args_start = idx + method.len();
// Index into the original (case-preserving) text so quoted needle
// characters stay accurate; the byte offset is valid there because
// `to_ascii_lowercase` preserves byte length and the offset lands just
// after the ASCII `(` of the method token.
if extract_first_string_arg(&text[args_start..]).is_some() {
return true;
}
}
}
false
}
/// Detect the *negated* `indexOf`/`search`/`find` membership idiom whose TRUE
/// branch is the NOT-in-list (reject) path:
///
/// * `ALLOWED.indexOf(x) === -1` / `== -1` (JS/TS — not found)
/// * `ALLOWED.indexOf(x) < 0` (not found)
/// * `s.find(x) == -1` (Python `str.find` / C++ `std::string::find` use
/// `npos`/`-1` for absent; `< 0` covers the common int form)
///
/// Genuine allowlist membership classifies as [`PredicateKind::AllowlistCheck`]
/// whose generic mechanic marks the receiver/arg validated on the TRUE branch.
/// That polarity is only correct for the *positive* form
/// (`indexOf(x) !== -1` / `>= 0` — found ⇒ in list ⇒ validated). For the
/// `=== -1` / `< 0` form the TRUE branch means NOT-in-list, so marking the var
/// validated there is inverted: it suppresses a genuine finding on the
/// reject-then-sink shape and produces an FP on the correctly-guarded
/// `=== -1; return` shape. The polarity-inversion machinery in `apply_branch_predicates`
/// (mod.rs) only flips for Python `not in` / TypeCheck `!=`, not for indexOf
/// result comparisons, so we conservatively drop these to
/// [`PredicateKind::Unknown`] — neither branch is over-validated and the sink
/// finding survives. The positive `!== -1` / `>= 0` form is intentionally left
/// to fall through to `AllowlistCheck`, where the existing polarity is correct.
fn is_negated_indexof_membership(text: &str) -> bool {
let lower = text.to_ascii_lowercase();
// Require an index-of / find-style search method whose result is being
// compared. `.indexof(` covers JS/TS `indexOf` and Java `indexOf`;
// `.find(` / `.search(` cover Python/C++/JS string searches.
let has_index_search =
lower.contains(".indexof(") || lower.contains(".search(") || lower.contains(".find(");
if !has_index_search {
return false;
}
// Strip whitespace so spacing variants (`=== -1`, `===-1`, `< 0`) collapse
// to a single canonical form.
let compact: String = lower.chars().filter(|c| !c.is_whitespace()).collect();
// Positive (found ⇒ in list) forms keep correct AllowlistCheck polarity —
// do NOT claim them here.
if compact.contains("!==-1")
|| compact.contains("!=-1")
|| compact.contains(">=0")
|| compact.contains(">-1")
{
return false;
}
// Negated (not-found ⇒ reject) forms: inverted polarity, drop to Unknown.
compact.contains("===-1")
|| compact.contains("==-1")
|| compact.contains("<0")
|| compact.contains("<=-1")
}
/// Classify a raw condition text into a [`PredicateKind`].
///
/// # Rules
@ -710,6 +799,35 @@ pub fn classify_condition(text: &str) -> PredicateKind {
return PredicateKind::HostAllowlistValidated;
}
// ── Substring-REJECTION with a literal needle (not an allowlist) ─────
//
// `x.includes("..")` / `x.contains("<script>")` / `x.indexOf("..")` test
// the *receiver* against a fixed literal — a rejection idiom whose TRUE
// branch is the dangerous path. Classifying these as `AllowlistCheck`
// (below) would mark the receiver validated on the TRUE branch with the
// wrong polarity, silencing a genuine finding. Drop to `Unknown` so
// neither branch is over-validated and the sink finding survives.
// (The shell-metachar form was already caught earlier; this covers the
// broader literal-needle case.)
if is_literal_needle_membership(text) {
return PredicateKind::Unknown;
}
// ── Negated indexOf/find membership (inverted polarity) ──────────────
//
// `ALLOWED.indexOf(x) === -1` / `< 0` means NOT-in-list, so the TRUE
// branch is the reject path. Classifying as `AllowlistCheck` (below)
// would mark `x` validated on the TRUE branch with the wrong polarity —
// suppressing a genuine finding on `if (!FOUND) sink(x)` and producing an
// FP on the correctly-guarded `if (!FOUND) return; sink(x)` shape. The
// polarity-inversion machinery does not cover indexOf result comparisons,
// so drop to `Unknown` and let neither branch over-validate. The positive
// `!== -1` / `>= 0` form is excluded by `is_negated_indexof_membership`
// and still classifies as `AllowlistCheck` with correct polarity.
if is_negated_indexof_membership(text) {
return PredicateKind::Unknown;
}
// ── Allowlist / membership checks ────────────────────────────────────
if lower.contains(".includes(")
|| lower.contains(".include?(")
@ -909,12 +1027,20 @@ pub fn classify_condition_with_target(text: &str) -> (PredicateKind, Option<Stri
match kind {
PredicateKind::ValidationCall | PredicateKind::SanitizerCall => {
if let Some(target) = extract_validation_target(text) {
(kind, Some(target))
} else if count_call_args(text).map(|n| n > 1).unwrap_or(false) {
(PredicateKind::Unknown, None)
} else {
(kind, None)
// A dotted target (`a.field`, `req.body.x`) is a member expression.
// `condition_vars` only ever contains bare identifier tokens
// (collect_idents pushes single identifiers), so a dotted target
// can never match and the consumer falls back to validating EVERY
// condition var: exactly the over-validation the multi-arg Unknown
// guard was written to prevent. Treat dotted extraction the same
// as a failed extraction so multi-arg validators degrade to
// Unknown instead of silently validating unrelated arguments.
match extract_validation_target(text) {
Some(target) if !target.contains('.') => (kind, Some(target)),
_ if count_call_args(text).map(|n| n > 1).unwrap_or(false) => {
(PredicateKind::Unknown, None)
}
_ => (kind, None),
}
}
PredicateKind::AllowlistCheck => {
@ -1704,6 +1830,47 @@ mod tests {
assert_eq!(target, None);
}
#[test]
fn target_multi_arg_dotted_first_arg_is_unknown() {
// `validate(a.field, limit)` extracts the dotted target `a.field`,
// which can never match a bare-identifier condition var, so the
// consumer would fall back to validating EVERY condition var
// (including `limit`, which the validator never inspected). A dotted
// multi-arg target must degrade to Unknown, exactly like a failed
// extraction, so no unrelated argument is wrongly validated.
let (kind, target) = classify_condition_with_target("validate(a.field, limit)");
assert_eq!(kind, PredicateKind::Unknown);
assert_eq!(target, None);
}
#[test]
fn target_multi_arg_dotted_receiver_is_unknown() {
// `req.session.verify(sig, opts)` resolves to the dotted receiver
// `req.session`; same non-matching-target hazard on a multi-arg call.
let (kind, target) = classify_condition_with_target("req.session.verify(sig, opts)");
assert_eq!(kind, PredicateKind::Unknown);
assert_eq!(target, None);
}
#[test]
fn target_single_arg_dotted_preserves_kind_no_dotted_target() {
// Single-arg dotted-receiver validator: over-validation of all
// condition vars is pre-existing/intentional for single-arg calls, but
// the returned target must NOT be a dotted member expression (which
// can never match condition_vars). Preserve the kind with None.
let (kind, target) = classify_condition_with_target("req.session.verify(sig)");
assert_eq!(kind, PredicateKind::ValidationCall);
assert_eq!(target, None);
}
#[test]
fn target_bare_identifier_still_extracted() {
// Regression guard: a bare-identifier multi-arg target still narrows.
let (kind, target) = classify_condition_with_target("validate(x, limit)");
assert_eq!(kind, PredicateKind::ValidationCall);
assert_eq!(target.as_deref(), Some("x"));
}
#[test]
fn count_call_args_basic() {
assert_eq!(super::count_call_args("f(a, b, c)"), Some(3));
@ -1779,6 +1946,109 @@ mod tests {
);
}
// ── Literal-needle substring rejection is NOT an allowlist ─────────
//
// `x.includes("..")` / `x.contains("<script>")` test the receiver
// against a literal needle (a rejection idiom whose TRUE branch is the
// dangerous path), so they must NOT classify as AllowlistCheck (which
// would mark `x` validated on the TRUE branch with inverted polarity and
// silence the finding). They drop to Unknown so neither branch is
// over-validated.
#[test]
fn classify_literal_needle_dotdot_not_allowlist() {
assert_eq!(
classify_condition("p.includes(\"..\")"),
PredicateKind::Unknown
);
assert_eq!(
classify_condition("p.contains(\"..\")"),
PredicateKind::Unknown
);
}
#[test]
fn classify_literal_needle_html_not_allowlist() {
assert_eq!(
classify_condition("input.includes(\"<script>\")"),
PredicateKind::Unknown
);
}
#[test]
fn classify_literal_needle_indexof_not_allowlist() {
assert_eq!(
classify_condition("path.indexOf(\"..\")"),
PredicateKind::Unknown
);
}
// ── Negated indexOf membership has inverted polarity ──────────────
//
// `ALLOWED.indexOf(x) === -1` / `< 0` means NOT-in-list, so the TRUE
// branch is the reject path. Must NOT classify as AllowlistCheck (which
// would mark `x` validated on the TRUE branch with inverted polarity,
// silencing a real reject-then-sink finding and producing an FP on the
// correctly-guarded `=== -1; return; sink` shape). Drop to Unknown.
#[test]
fn classify_negated_indexof_eq_minus_one_not_allowlist() {
assert_eq!(
classify_condition("ALLOWED.indexOf(cmd) === -1"),
PredicateKind::Unknown
);
assert_eq!(
classify_condition("ALLOWED.indexOf(cmd) == -1"),
PredicateKind::Unknown
);
}
#[test]
fn classify_negated_indexof_lt_zero_not_allowlist() {
assert_eq!(
classify_condition("allowed.indexOf(x) < 0"),
PredicateKind::Unknown
);
}
#[test]
fn classify_positive_indexof_membership_stays_allowlist() {
// `!== -1` / `>= 0` mean found ⇒ in list ⇒ TRUE branch is validated.
// The existing AllowlistCheck polarity is correct here, so these must
// remain AllowlistCheck (NOT dropped to Unknown).
assert_eq!(
classify_condition("ALLOWED.indexOf(cmd) !== -1"),
PredicateKind::AllowlistCheck
);
assert_eq!(
classify_condition("ALLOWED.indexOf(cmd) >= 0"),
PredicateKind::AllowlistCheck
);
}
#[test]
fn classify_genuine_allowlist_identifier_arg_unchanged() {
// Identifier argument (the value under test) is a real membership
// check and must remain AllowlistCheck.
assert_eq!(
classify_condition("ALLOWED.includes(cmd)"),
PredicateKind::AllowlistCheck
);
assert_eq!(
classify_condition("whitelist.contains(value)"),
PredicateKind::AllowlistCheck
);
}
#[test]
fn classify_literal_needle_shell_metachar_still_inverted() {
// A shell-metachar literal needle is caught by the earlier
// shell-metachar branch (inverted-polarity SHELL_ESCAPE clear) and
// must NOT be swallowed by the literal-needle carve-out.
assert_eq!(
classify_condition("cmd.includes(\";\")"),
PredicateKind::ShellMetaValidated
);
}
#[test]
fn extract_allowlist_target_negated_paren_wrapper() {
// Tree-sitter records the if-condition as `(!ALLOWED.includes(cmd))`,
@ -2067,20 +2337,29 @@ mod tests {
}
#[test]
fn classify_non_metachar_contains_stays_allowlist() {
fn classify_non_metachar_contains_is_unknown_not_allowlist() {
// `x.contains("foo")` must NOT be credited as a shell-metachar
// rejection. It falls back to the existing AllowlistCheck behavior.
// rejection, AND must NOT be classified as `AllowlistCheck`: the
// argument is a string literal, so this is a substring presence/
// rejection test on the receiver, not membership of the receiver in a
// collection. Classifying it as AllowlistCheck would mark the
// receiver validated on the TRUE branch with inverted polarity and
// silence a genuine finding (e.g. `if (path.contains("..")) sink(path)`).
// The classifier degrades to Unknown so neither branch is
// over-validated. (guards.rs already excluded these from structural
// `Cap::all()` dominator guards via the missing allowlist target, so
// cfg-unguarded-sink suppression is unaffected by this change.)
assert_eq!(
classify_condition("input.contains(\"foo\")"),
PredicateKind::AllowlistCheck
PredicateKind::Unknown
);
assert_eq!(
classify_condition("path.contains(\"..\")"),
PredicateKind::AllowlistCheck
PredicateKind::Unknown
);
assert_eq!(
classify_condition("name.contains(\"admin\")"),
PredicateKind::AllowlistCheck
PredicateKind::Unknown
);
}
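
Note on the negated-`indexOf` classification in this file: the check depends on the order in which the comparison forms are tested, because the compacted text `!==-1` contains `==-1` as a substring. The sketch below isolates that ordering. It is an illustrative rewrite under a hypothetical name (`is_negated_index_search`), not the function from the diff, and it inherits the same text-matching limits.

```rust
/// Hypothetical standalone version of the ordering trick: positive
/// comparisons are ruled out first, because "!==-1" contains "==-1".
fn is_negated_index_search(cond: &str) -> bool {
    // Lowercase and drop whitespace so `=== -1` and `===-1` look the same.
    let compact: String = cond
        .to_ascii_lowercase()
        .chars()
        .filter(|c| !c.is_whitespace())
        .collect();

    // Only comparisons on an index-returning search are of interest.
    let searches = [".indexof(", ".search(", ".find("];
    if !searches.iter().any(|m| compact.contains(*m)) {
        return false;
    }

    // Found => in list: these keep the ordinary allowlist polarity.
    let positive = ["!==-1", "!=-1", ">=0", ">-1"];
    if positive.iter().any(|p| compact.contains(*p)) {
        return false;
    }

    // Not found => reject path: these have inverted polarity.
    let negated = ["===-1", "==-1", "<0", "<=-1"];
    negated.iter().any(|n| compact.contains(*n))
}
```

Testing the negated list first would misclassify `ALLOWED.indexOf(cmd) !== -1` as a not-found check, which is the inversion this commit removes.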

View file

@ -156,6 +156,17 @@ pub(super) fn reconstruct_flow_path(
current = pick_tainted_operand(&vals, origin, ssa);
continue;
}
// NOTE: deliberately NOT continuing through
// `SsaOp::FieldProj` in this dedup arm. Walking the
// receiver chain of a member-access source
// (`req.body.path`) advances the backward walk past the
// meaningful taint-read node (the `.body.path` access) all
// the way to the bare parameter (`req`), regressing the
// reported source line from the member access (line 9) to
// the function signature (line 8). The recall corpus
// (`tests/recall_gaps.rs` fs_promises_*) and
// `real_world_tests` pin the member-access source line, so
// the FieldProj node terminates the walk here.
_ => break,
}
}

View file

@ -1510,35 +1510,7 @@ fn apply_branch_predicates(
// (XSS / SQLi / FILE_IO) downstream still fire on residual taint.
if kind == PredicateKind::RelativeUrlValidated && polarity {
for var in condition_vars {
let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new();
for (val, _) in state.values.iter() {
if let Some(name) = ssa
.value_defs
.get(val.0 as usize)
.and_then(|vd| vd.var_name.as_deref())
{
if name == var {
to_clear.push(*val);
}
}
}
for val in to_clear {
if let Some(taint) = state.get(val).cloned() {
let new_caps = taint.caps & !Cap::OPEN_REDIRECT;
if new_caps.is_empty() {
state.remove(val);
} else {
state.set(
val,
VarTaint {
caps: new_caps,
origins: taint.origins,
uses_summary: taint.uses_summary,
},
);
}
}
}
clear_cap_alias_aware(state, var, Cap::OPEN_REDIRECT, ssa, base_aliases);
}
}
@ -1551,35 +1523,7 @@ fn apply_branch_predicates(
// inline leading-slash check).
if kind == PredicateKind::HostAllowlistValidated && polarity {
for var in condition_vars {
let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new();
for (val, _) in state.values.iter() {
if let Some(name) = ssa
.value_defs
.get(val.0 as usize)
.and_then(|vd| vd.var_name.as_deref())
{
if name == var {
to_clear.push(*val);
}
}
}
for val in to_clear {
if let Some(taint) = state.get(val).cloned() {
let new_caps = taint.caps & !Cap::OPEN_REDIRECT;
if new_caps.is_empty() {
state.remove(val);
} else {
state.set(
val,
VarTaint {
caps: new_caps,
origins: taint.origins,
uses_summary: taint.uses_summary,
},
);
}
}
}
clear_cap_alias_aware(state, var, Cap::OPEN_REDIRECT, ssa, base_aliases);
}
}
@ -1594,43 +1538,7 @@ fn apply_branch_predicates(
// taint caps.
if kind == PredicateKind::ShellMetaValidated && !polarity {
for var in condition_vars {
let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new();
let mut names: SmallVec<[&str; 4]> = smallvec::smallvec![var.as_str()];
if let Some(aliases) = base_aliases.and_then(|aliases| aliases.aliases_of(var)) {
for alias in aliases {
if alias != var {
names.push(alias.as_str());
}
}
}
for &name_to_clear in names.iter() {
for (idx, def) in ssa.value_defs.iter().enumerate() {
if def.var_name.as_deref() == Some(name_to_clear) {
let val = SsaValue(idx as u32);
to_clear.push(val);
collect_copy_alias_operands(val, ssa, &mut to_clear);
}
}
}
to_clear.sort_by_key(|v| v.0);
to_clear.dedup_by_key(|v| v.0);
for val in to_clear {
if let Some(taint) = state.get(val).cloned() {
let new_caps = taint.caps & !Cap::SHELL_ESCAPE;
if new_caps.is_empty() {
state.remove(val);
} else {
state.set(
val,
VarTaint {
caps: new_caps,
origins: taint.origins,
uses_summary: taint.uses_summary,
},
);
}
}
}
clear_cap_alias_aware(state, var, Cap::SHELL_ESCAPE, ssa, base_aliases);
}
}
@ -1658,6 +1566,71 @@ fn apply_branch_predicates(
}
}
/// Clear `cap` from the taint of `var` and every SSA value that aliases it
/// on a validated branch.
///
/// "Aliases" covers three sources, matching the (formerly inline)
/// `ShellMetaValidated` handler:
///
/// 1. SSA values whose `var_name` equals `var`.
/// 2. Base aliases of `var` from copy propagation
/// (`base_aliases.aliases_of(var)`), so a value renamed away by
/// `copy_propagate` (`const t = url;`) is still reached when the
/// condition or sink references the other name.
/// 3. Transitive copy-chain operands (`collect_copy_alias_operands`) for
/// Assign/Phi copies the optimizer did not propagate away.
///
/// The two URL arms (`RelativeUrlValidated` / `HostAllowlistValidated`)
/// previously matched only source 1 against `state.values`, missing the
/// `const t = url; if (t.startsWith("/")) { res.redirect(url) }` alias shape
/// the shell arm handled, so they fired false positives on the validated
/// branch. Sharing this helper unifies all three.
fn clear_cap_alias_aware(
state: &mut SsaTaintState,
var: &str,
cap: Cap,
ssa: &SsaBody,
base_aliases: Option<&crate::ssa::alias::BaseAliasResult>,
) {
let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new();
let mut names: SmallVec<[&str; 4]> = smallvec::smallvec![var];
if let Some(aliases) = base_aliases.and_then(|aliases| aliases.aliases_of(var)) {
for alias in aliases {
if alias.as_str() != var {
names.push(alias.as_str());
}
}
}
for &name_to_clear in names.iter() {
for (idx, def) in ssa.value_defs.iter().enumerate() {
if def.var_name.as_deref() == Some(name_to_clear) {
let val = SsaValue(idx as u32);
to_clear.push(val);
collect_copy_alias_operands(val, ssa, &mut to_clear);
}
}
}
to_clear.sort_by_key(|v| v.0);
to_clear.dedup_by_key(|v| v.0);
for val in to_clear {
if let Some(taint) = state.get(val).cloned() {
let new_caps = taint.caps & !cap;
if new_caps.is_empty() {
state.remove(val);
} else {
state.set(
val,
VarTaint {
caps: new_caps,
origins: taint.origins,
uses_summary: taint.uses_summary,
},
);
}
}
}
}
fn collect_copy_alias_operands(root: SsaValue, ssa: &SsaBody, out: &mut SmallVec<[SsaValue; 4]>) {
let mut seen = HashSet::new();
let mut stack = vec![root];
@ -1685,6 +1658,36 @@ fn collect_copy_alias_operands(root: SsaValue, ssa: &SsaBody, out: &mut SmallVec
}
}
/// Decide whether the "null / none / no-error" outcome is the condition's
/// TRUE branch from the lowered condition text.
///
/// This must distinguish equality (`err == null` → null branch is TRUE)
/// from strict/loose inequality (`err !== null` / `err != null` → null
/// branch is the FALSE branch). A naive `contains("== null")` is wrong
/// because `"err !== null"` contains `"== null"` as a substring (the `==`
/// of `!==` followed by ` null`), which would mark validation on the wrong
/// branch. We therefore reject any inequality form first.
fn null_check_true_branch_is_success(lower: &str) -> bool {
// Inequality forms put the null/none outcome on the FALSE branch.
// Reject them before testing equality so the `==` inside `!==` cannot
// be mistaken for an equality comparison.
if lower.contains("!= nil")
|| lower.contains("!= none")
|| lower.contains("is not none")
|| lower.contains("is_err")
|| lower.contains("!== null")
|| lower.contains("!= null")
{
return false;
}
lower.contains("== nil")
|| lower.contains("== none")
|| lower.contains("is none")
|| lower.contains("is_ok")
|| lower.contains("=== null")
|| lower.contains("== null")
}
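
Note on the helper above: the substring pitfall it guards against is easy to reproduce in isolation. The following is a simplified sketch with a hypothetical name (`null_outcome_is_true_branch`), mirroring the pattern lists shown above; it illustrates the reject-inequality-first order and is not a replacement for the engine's helper.

```rust
/// Hypothetical simplified form: returns true when the null / no-error
/// outcome sits on the TRUE branch of the condition text.
fn null_outcome_is_true_branch(cond: &str) -> bool {
    let lower = cond.to_ascii_lowercase();

    // Inequality forms put the null outcome on the FALSE branch. They are
    // tested first because "!== null" contains "== null" as a substring.
    let inequality = [
        "!== null", "!= null", "!= nil", "!= none", "is not none", "is_err",
    ];
    if inequality.iter().any(|p| lower.contains(*p)) {
        return false;
    }

    let equality = [
        "=== null", "== null", "== nil", "== none", "is none", "is_ok",
    ];
    equality.iter().any(|p| lower.contains(*p))
}
```

A plain `contains("== null")` test reports `err !== null` as an equality check, which is why the inequality list has to run first.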
/// Mark the input arguments of a value-producing validator as validated
/// on the success branch of a downstream `err`-check.
///
@ -1728,12 +1731,7 @@ fn apply_validation_err_check_narrowing(
// Defaults to FALSE for `err != nil`-style; flips to TRUE for
// `err == nil`-style and `is_ok()`.
let lower = cond_text.to_ascii_lowercase();
let success_branch_is_true = lower.contains("== nil")
|| lower.contains("== none")
|| lower.contains("is none")
|| lower.contains("is_ok")
|| lower.contains("=== null")
|| lower.contains("== null");
let success_branch_is_true = null_check_true_branch_is_success(&lower);
// Resolve `err`'s reaching SSA value (last def in this or earlier block).
// We restrict to single-var conditions to avoid mis-attributing
@ -1887,12 +1885,7 @@ fn apply_input_validator_branch_narrowing(
// `apply_validation_err_check_narrowing` uses for the `err == nil`
// family.
let lower = cond_text.to_ascii_lowercase();
let cond_text_says_null_branch_is_true = lower.contains("== nil")
|| lower.contains("== none")
|| lower.contains("is none")
|| lower.contains("is_ok")
|| lower.contains("=== null")
|| lower.contains("== null");
let cond_text_says_null_branch_is_true = null_check_true_branch_is_success(&lower);
let success_branch_is_true = match polarity {
InputValidatorPolarity::ErrorReturning => cond_text_says_null_branch_is_true,
@ -2388,6 +2381,68 @@ fn inline_analyse_callee(
/// mechanism would otherwise lose the flow).
pub(crate) type PromiseCallbackSeeds<'a> = &'a [(usize, VarTaint)];
/// Resolve callback arguments: when a call argument refers to a known
/// function name (resolvable to a `FuncKey` in `local_summaries` and present
/// in `callee_bodies`), record the mapping `callee param name → target
/// FuncKey` so the callee's analysis can resolve calls through the
/// parameter.
///
/// This is consulted by `resolve_callee` step -1. Because the resolved
/// bindings shape the callee's return taint but are NOT reflected in the
/// `(FuncKey, ArgTaintSig)` inline-cache key (function references are
/// typically untainted, so different callbacks produce identical sigs),
/// callers MUST skip the inline cache (both lookup and insert) whenever the
/// returned map is non-empty; otherwise a call site passing `safeCb` could
/// reuse / poison the cached shape of a call site passing `sourceCb`.
fn compute_callback_bindings(
callee_ssa: &SsaBody,
args: &[SmallVec<[SsaValue; 2]>],
transfer: &SsaTaintTransfer,
caller_ssa: &SsaBody,
) -> HashMap<String, FuncKey> {
let mut callback_bindings: HashMap<String, FuncKey> = HashMap::new();
for block in &callee_ssa.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
if let SsaOp::Param { index } = &inst.op {
if let Some(param_name) = inst.var_name.as_ref() {
if *index < args.len() {
for v in &args[*index] {
if let Some(arg_var_name) = caller_ssa
.value_defs
.get(v.0 as usize)
.and_then(|vd| vd.var_name.as_deref())
{
let norm = callee_leaf_name(arg_var_name);
let hint_raw = callee_container_hint(arg_var_name);
let hint = if hint_raw.is_empty() {
None
} else {
Some(hint_raw)
};
if let Some(target_key) = resolve_local_func_key(
transfer.local_summaries,
transfer.lang,
transfer.namespace,
norm,
hint,
) {
if transfer
.callee_bodies
.is_some_and(|cb| cb.contains_key(&target_key))
{
callback_bindings.insert(param_name.clone(), target_key);
}
}
}
}
}
}
}
}
}
callback_bindings
}
fn inline_analyse_callee_with_seeds(
callee: &str,
args: &[SmallVec<[SsaValue; 2]>],
@ -2463,7 +2518,8 @@ fn inline_analyse_callee_with_seeds(
receiver_var,
arity: arity_hint,
};
match gs.resolve_callee(&query) {
let res = gs.resolve_callee(&query);
match res {
CalleeResolution::Resolved(key) => {
let xfile_bodies = transfer.cross_file_bodies?;
let body = xfile_bodies.get(&key)?;
@ -2510,11 +2566,22 @@ fn inline_analyse_callee_with_seeds(
// promise-callback seeds into the signature.
let sig = build_arg_taint_sig_with_seeds(args, receiver, state, promise_callback_seeds);
// Resolve callback bindings BEFORE consulting the cache. The cache key
// `(FuncKey, ArgTaintSig)` does not capture WHICH function was passed as
// a callback argument (function refs are typically untainted, so
// `wrap(safeCb)` and `wrap(sourceCb)` produce identical sigs). Since
// the bound function's summary shapes the callee's return taint via
// `resolve_callee` step -1, a cached shape is only sound to reuse / store
// when there are no callback bindings. When bindings exist we bypass the
// cache entirely (both lookup below and insert later) to avoid poisoning.
let callback_bindings = compute_callback_bindings(&callee_body.ssa, args, transfer, caller_ssa);
let cacheable = callback_bindings.is_empty();
// Check cache (keyed by FuncKey + arg signature). The cached value
// is a structural shape; re-attribute origins to the current call
// site before returning so two callers with matching caps but
// different origins see their own source chains.
{
if cacheable {
let cache = cache_ref.borrow();
if let Some(cached) = cache.get(&(callee_key.clone(), sig.clone())) {
record_engine_note(crate::engine_notes::EngineNote::InlineCacheReused);
@ -2610,53 +2677,10 @@ fn inline_analyse_callee_with_seeds(
})
});
// Detect callback arguments: when a call argument refers to a known function
// name (resolvable to a FuncKey in the local summaries index), record the
// mapping so the callee's analysis can resolve calls through the parameter.
//
// The binding value is a full `FuncKey` rather than a leaf string so the
// child transfer can look up `callee_bodies` / `ssa_summaries` / local
// summaries by canonical identity.
let mut callback_bindings: HashMap<String, FuncKey> = HashMap::new();
for block in &callee_body.ssa.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
if let SsaOp::Param { index } = &inst.op {
if let Some(param_name) = inst.var_name.as_ref() {
if *index < args.len() {
for v in &args[*index] {
if let Some(arg_var_name) = caller_ssa
.value_defs
.get(v.0 as usize)
.and_then(|vd| vd.var_name.as_deref())
{
let norm = callee_leaf_name(arg_var_name);
let hint_raw = callee_container_hint(arg_var_name);
let hint = if hint_raw.is_empty() {
None
} else {
Some(hint_raw)
};
if let Some(target_key) = resolve_local_func_key(
transfer.local_summaries,
transfer.lang,
transfer.namespace,
norm,
hint,
) {
if transfer
.callee_bodies
.is_some_and(|cb| cb.contains_key(&target_key))
{
callback_bindings.insert(param_name.clone(), target_key);
}
}
}
}
}
}
}
}
}
// Callback argument bindings were resolved before the cache lookup
// (see `compute_callback_bindings`) so a non-empty map could bypass the
// inline cache. Reuse the same map here to drive the child transfer's
// `resolve_callee` step -1.
let cb_ref = if callback_bindings.is_empty() {
None
@ -2774,7 +2798,13 @@ fn inline_analyse_callee_with_seeds(
// Cache the structural shape under the canonical FuncKey, then
// re-attribute to this call site's actual arg/receiver origins.
{
//
// Only cache shapes that do not depend on callback identity. When
// `callback_bindings` is non-empty the resolved return taint is shaped by
// step -1 callback resolution, which is invisible to the
// `(FuncKey, ArgTaintSig)` key, so caching would poison sibling call
// sites that pass a different callback.
if cacheable {
let mut cache = cache_ref.borrow_mut();
cache.insert((callee_key, sig), shape.clone());
}
@ -2916,11 +2946,26 @@ fn extract_inline_return_taint(
let mut derived_params: u64 = 0;
let mut derived_receiver: bool = false;
// Explicit-return param passthrough: return paths whose return value IS
// a formal parameter (`return x;`). This channel is JOINED with the
// derived channel below so a mixed-return helper
// (`if (c) return src(); return x;`) propagates BOTH the derived source
// taint and the parameter passthrough provenance.
let mut param_caps = Cap::empty();
let mut param_internal: SmallVec<[TaintOrigin; 2]> = SmallVec::new();
let mut param_params: u64 = 0;
let mut param_receiver: bool = false;
// Return(None) fallback param passthrough: caps swept from ALL live
// param-derived values on implicit-return paths. Kept separate because
// it is noisy (it re-adds caps the callee may have sanitized), so it
// only acts as a last-resort fallback, never joined into the derived
// channel.
let mut param_fallback_caps = Cap::empty();
let mut param_fallback_internal: SmallVec<[TaintOrigin; 2]> = SmallVec::new();
let mut param_fallback_params: u64 = 0;
let mut param_fallback_receiver: bool = false;
// Join of the return value's [`PathFact`] across every return block.
// Seeded with `None` (no observation) and widened conservatively to
// [`PathFact::top`] if any return block gives Top or the value is
@ -3120,13 +3165,13 @@ fn extract_inline_return_taint(
// Fall back to collecting all live values.
for (val, taint) in &exit.values {
if param_values.contains(val) {
param_caps |= taint.caps;
param_fallback_caps |= taint.caps;
for orig in &taint.origins {
classify_and_push(
orig,
&mut param_internal,
&mut param_params,
&mut param_receiver,
&mut param_fallback_internal,
&mut param_fallback_params,
&mut param_fallback_receiver,
);
}
} else {
@ -3145,17 +3190,35 @@ fn extract_inline_return_taint(
}
}
// Prefer derived caps; fall back to param-return caps for passthrough functions.
let (final_caps, final_internal, final_params, final_receiver) = if !derived_caps.is_empty() {
(
derived_caps,
derived_internal,
derived_params,
derived_receiver,
)
} else {
(param_caps, param_internal, param_params, param_receiver)
};
// Join the derived channel (taint produced inside the callee) with the
// explicit-return param-passthrough channel (return paths whose return
// value IS a formal parameter). A mixed-return helper
// (`if (c) return src(); return x;`) has a non-empty derived channel on
// one path AND a param passthrough on another; the correct return-value
// abstraction across paths is the union of both, not a choose-one.
// The noisy Return(None) fallback (`param_fallback_*`) is used only when
// BOTH structured channels are empty, so it never re-adds caps the
// callee sanitized on an explicit-return path.
let (final_caps, final_internal, final_params, final_receiver) =
if !derived_caps.is_empty() || !param_caps.is_empty() {
let mut internal = derived_internal;
for orig in &param_internal {
push_internal(&mut internal, orig);
}
(
derived_caps | param_caps,
internal,
derived_params | param_params,
derived_receiver || param_receiver,
)
} else {
(
param_fallback_caps,
param_fallback_internal,
param_fallback_params,
param_fallback_receiver,
)
};
let return_path_fact =
return_path_fact_acc.unwrap_or_else(crate::abstract_interp::PathFact::top);
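
Reviewer sketch of the join rule in this hunk, reduced to plain bitmasks. `Caps`, `HTML`, `SQL` and `join_return_caps` are hypothetical stand-ins, not engine names; the real code also merges origins, param bitsets and the receiver flag, which are omitted here.

```rust
/// Hypothetical stand-in for the engine's `Cap` bitflags.
type Caps = u32;
const HTML: Caps = 0b01;
const SQL: Caps = 0b10;

/// Union the two structured channels (callee-derived taint and explicit
/// param passthrough). The noisy implicit-return fallback is consulted
/// only when BOTH structured channels are empty.
fn join_return_caps(derived: Caps, param: Caps, fallback: Caps) -> Caps {
    if derived != 0 || param != 0 {
        derived | param
    } else {
        fallback
    }
}
```

Under these assumptions a mixed-return helper such as `if (c) return src(); return x;` reports the union of both channels, and the fallback never adds caps once either structured channel has data.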
@ -4874,11 +4937,19 @@ pub(super) fn transfer_inst(
&resolved.propagating_params
};
if !resolved.param_return_paths.is_empty() && !effective_params.is_empty() {
if !effective_params.is_empty() {
// Per-parameter application: each propagating param
// contributes taint narrowed by its own per-path
// sanitizer. Origins are still aggregated across
// params, they name source anchors, not transforms.
// contributes taint narrowed by ITS OWN sanitizer
// (`effective_param_sanitizer`, which prefers the
// param's per-return-path strip bits and otherwise
// falls back to its own `param_to_return_strip` —
// never the cross-param aggregate). This avoids the
// cross-parameter sanitizer bleed of unioning all
// args then stripping the aggregate: for
// `f(a,b){ return a + escape(b) }`, param 1's
// HTML_ESCAPE strip must not clear param 0's taint.
// Origins are still aggregated across params; they
// name source anchors, not transforms.
let mut any_origin_added = false;
for &param_idx in effective_params {
let arg_caps_origins =
@ -4898,6 +4969,11 @@ pub(super) fn transfer_inst(
// Sentinel reference to silence the unused-variable warning on cold paths.
let _ = any_origin_added;
} else {
// No positional argument info (`arg_uses` empty), so
// per-param attribution is impossible. Union all
// propagating-arg taint; the aggregate sanitizer is
// applied below. This is the only path that may
// over-strip, and only when positions are unknown.
let (prop_caps, prop_origins) =
collect_args_taint(args, receiver, state, effective_params);
return_bits |= prop_caps;
@ -7847,7 +7923,15 @@ fn collect_block_events(
.and_then(|vd| vd.var_name.as_deref());
if let Some(name) = var_name {
if let Some(sym) = transfer.interner.get(name) {
return state.validated_may.contains(sym);
// Require validation on EVERY reaching path
// (intersection-on-join) before suppressing the
// sink. `validated_may` is the union over
// predecessors (validated on SOME path), so a value
// validated on one branch but bypassable on another
// would be wrongly silenced (`if (valid(x)) {...}
// else {} sink(x)`). `validated_must` is the sound
// "all paths validated" gate.
return state.validated_must.contains(sym);
}
}
false
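
Reviewer sketch of why the gate moves from `validated_may` to `validated_must`. It models the two sets at a control-flow join with `HashSet`; `join_validated` is a hypothetical helper and symbols are plain string slices rather than interned ids.

```rust
use std::collections::HashSet;

/// Join of two predecessor states: `may` is the union (validated on SOME
/// path), `must` is the intersection (validated on EVERY path).
fn join_validated<'a>(
    a: &HashSet<&'a str>,
    b: &HashSet<&'a str>,
) -> (HashSet<&'a str>, HashSet<&'a str>) {
    let may = a.union(b).copied().collect();
    let must = a.intersection(b).copied().collect();
    (may, must)
}
```

For `if (valid(x)) {...} else {} sink(x)`, the then-branch contributes `{x}` and the else-branch contributes the empty set, so `x` lands in `may` but not in `must`. Only the `must` set justifies suppressing the sink.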
@ -7994,8 +8078,15 @@ fn is_noreturn_call(lang: Lang, callee: &str) -> bool {
if !matches!(lang, Lang::C | Lang::Cpp) {
return false;
}
let method = crate::labels::bare_method_name(callee);
matches!(method, "exit" | "_Exit" | "quick_exit" | "abort")
// Only free-function calls terminate the process. A receiver-qualified
// method whose trailing segment happens to be `exit`/`abort`/etc.
// (e.g. `transaction.abort()`, `app.exit()`) is an ordinary call and
// must NOT wipe the taint state. Reject any callee carrying a receiver
// (`.` member access or `->` pointer access) and match the full text.
if callee.contains('.') || callee.contains("->") {
return false;
}
matches!(callee, "exit" | "_Exit" | "quick_exit" | "abort")
}
// ── Primary sink-site attribution ───────────────────────────────────────
@ -10978,8 +11069,24 @@ fn is_string_safe_for_ssrf(sf: &crate::abstract_interp::StringFact) -> bool {
// Absolute-path prefix (e.g. "/projects/..."), internal redirect, not open redirect.
// The leading "/" locks the path to the same origin; the attacker cannot control the scheme
// or host, so this is not an SSRF vector.
//
// Exception: a bare "/" lock is NOT safe. When the only constant is a
// single "/" and the attacker-controlled suffix begins with another "/",
// the full value is "//evil.com/..." — a protocol-relative URL that
// browsers and common HTTP clients fetch from the attacker's host.
// Likewise a prefix already starting with "//" is protocol-relative.
// Require a fixed non-slash second character so the leading-slash lock
// cannot be widened into "//host".
if prefix.starts_with('/') {
return true;
let mut chars = prefix.chars();
chars.next(); // consume leading '/'
match chars.next() {
// "/x..." with x != '/': same-origin path lock, safe.
Some(c) if c != '/' => return true,
// "//..." protocol-relative, or bare "/" that a tainted "/host"
// suffix can turn protocol-relative: not safe.
_ => return false,
}
}
if let Some(after_scheme) = prefix.find("://") {
let host_and_rest = &prefix[after_scheme + 3..];
@ -11223,6 +11330,19 @@ struct ResolvedSummary {
param_to_gate_filters: Vec<(usize, Cap)>,
propagates_taint: bool,
propagating_params: Vec<usize>,
/// Per-parameter sanitizer strip-bits lifted from
/// `SsaFuncSummary::param_to_return`. Each `(param_idx, bits)` entry
/// means "taint flowing through THIS parameter to the return value has
/// `bits` stripped" (a `StripBits` transform on that param). Params with
/// an `Identity` transform contribute no entry.
///
/// Kept per-parameter so the call site can strip a param's sanitizer
/// only from that param's own taint contribution, instead of unioning
/// every param's StripBits into one aggregate (`sanitizer_caps`) and
/// applying it to the union of all propagating args — which bleeds one
/// param's sanitizer onto an unsanitized sibling arg
/// (`f(a,b){ return a + escape(b) }` would otherwise strip `a`'s caps).
param_to_return_strip: Vec<(usize, Cap)>,
/// Parameter indices whose container identity flows to return value.
param_container_to_return: Vec<usize>,
/// (src_param, container_param) pairs: src taint stored into container.
@ -11418,6 +11538,7 @@ fn resolve_callee_full(
param_to_sink_sites: vec![],
propagates_taint: !ls.propagating_params.is_empty(),
propagating_params: ls.propagating_params.clone(),
param_to_return_strip: vec![],
param_container_to_return: vec![],
param_to_container_store: vec![],
return_type: None,
@ -11490,6 +11611,7 @@ fn resolve_callee_full(
param_to_sink_sites: vec![],
propagates_taint: false,
propagating_params: vec![],
param_to_return_strip: vec![],
param_container_to_return: vec![],
param_to_container_store: vec![],
return_type: None,
@ -11694,6 +11816,7 @@ fn resolve_callee_full(
param_to_sink_sites: vec![],
propagates_taint: !ls.propagating_params.is_empty(),
propagating_params: ls.propagating_params.clone(),
param_to_return_strip: vec![],
param_container_to_return: vec![],
param_to_container_store: vec![],
return_type: None,
@ -11748,6 +11871,7 @@ fn resolve_callee_full(
param_to_sink_sites: fs.param_to_sink.clone(),
propagates_taint: fs.propagates_any(),
propagating_params: fs.propagating_params.clone(),
param_to_return_strip: vec![],
param_container_to_return: vec![],
param_to_container_store: vec![],
return_type: None,
@ -11817,6 +11941,7 @@ fn resolve_callee_full(
param_to_sink_sites: fs.param_to_sink.clone(),
propagates_taint: fs.propagates_any(),
propagating_params: fs.propagating_params.clone(),
param_to_return_strip: vec![],
param_container_to_return: vec![],
param_to_container_store: vec![],
return_type: None,
@ -11840,6 +11965,34 @@ fn resolve_callee_full(
None
}
/// Strip bits recorded for a single parameter (`param_to_return_strip`),
/// or `Cap::empty()` when this parameter has no `StripBits` transform.
///
/// This is the per-parameter sanitizer. When per-parameter decomposition
/// EXISTS (`param_to_return_strip` non-empty) it deliberately does NOT
/// consult the cross-param aggregate (`resolved.sanitizer_caps`), so one
/// parameter's sanitizer cannot bleed onto a sibling argument's taint
/// (`f(a,b){ return a + escape(b) }`).
///
/// When `param_to_return_strip` is ENTIRELY empty, the summary carries no
/// per-param decomposition at all — this is the coarse cross-file tier
/// (`ResolvedSummary` built from a `FuncSummary`/`LocalFuncSummary` where
/// only the aggregate `sanitizer_caps` is known, e.g. cross-file resolution
/// in [`resolve_callee_full`]). There is no sibling param to bleed from, so
/// fall back to the aggregate — matching the pre-#37 behaviour and keeping
/// cross-file sanitizers (`clean(input)` in another file) effective.
fn param_strip_bits(resolved: &ResolvedSummary, param_idx: usize) -> Cap {
if resolved.param_to_return_strip.is_empty() {
return resolved.sanitizer_caps;
}
resolved
.param_to_return_strip
.iter()
.find(|(i, _)| *i == param_idx)
.map(|(_, caps)| *caps)
.unwrap_or_else(Cap::empty)
}
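
Reviewer sketch of the cross-parameter bleed that `param_strip_bits` avoids, using plain bitmasks. `Caps`, `HTML_ESCAPE`, `strip_for_param` and `call_return_caps` are hypothetical, simplified stand-ins; predicate narrowing and per-return-path data are left out.

```rust
/// Hypothetical stand-in for the engine's `Cap` bitflags.
type Caps = u32;
const HTML_ESCAPE: Caps = 0b01;

/// Per-parameter lookup: with no per-param data at all, fall back to the
/// aggregate; otherwise use only this parameter's own entry (or nothing).
fn strip_for_param(per_param: &[(usize, Caps)], aggregate: Caps, idx: usize) -> Caps {
    if per_param.is_empty() {
        return aggregate;
    }
    per_param
        .iter()
        .find(|(i, _)| *i == idx)
        .map(|(_, c)| *c)
        .unwrap_or(0)
}

/// Return-value caps for a call: each argument's caps minus the bits
/// stripped for that argument's own parameter position.
fn call_return_caps(args: &[Caps], per_param: &[(usize, Caps)], aggregate: Caps) -> Caps {
    args.iter().enumerate().fold(0, |acc, (i, caps)| {
        acc | (*caps & !strip_for_param(per_param, aggregate, i))
    })
}
```

For `f(a,b){ return a + escape(b) }` with both arguments tainted, the per-param table `[(1, HTML_ESCAPE)]` clears only `b`, so `a`'s bit survives. With an empty table (the coarse cross-file tier in this sketch) the aggregate applies to every argument.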
/// Compute the effective sanitizer bits that apply at the call site for a
/// specific parameter, narrowed by the caller's predicate state.
///
@ -11850,10 +12003,11 @@ fn resolve_callee_full(
/// not know which return path the callee took, so only bits stripped on
/// EVERY compatible path can be considered cleared.
///
/// Falls back to `resolved.sanitizer_caps` (the aggregate) when:
/// Falls back to the parameter's own `param_to_return_strip` bits (see
/// [`param_strip_bits`], NOT the cross-param aggregate) when:
/// * the summary has no per-path data for this parameter;
/// * every path is predicate-compatible (the narrowing adds no information);
/// * no path is predicate-compatible (conservative: keep aggregate).
/// * no path is predicate-compatible (conservative: keep the per-param bits).
fn effective_param_sanitizer(
resolved: &ResolvedSummary,
param_idx: usize,
@ -11867,7 +12021,14 @@ fn effective_param_sanitizer(
.find(|(i, _)| *i == param_idx)
{
Some((_, p)) => p,
None => return resolved.sanitizer_caps,
None => {
// No per-return-path decomposition for this param: fall back to
// THIS parameter's own strip bits, NOT the cross-param aggregate
// (`resolved.sanitizer_caps`). Using the aggregate here bleeds a
// sibling param's sanitizer onto this param's taint
// (`f(a,b){ return a + escape(b) }`).
return param_strip_bits(resolved, param_idx);
}
};
// Caller-side predicate envelope: union of known_true / known_false bits
@ -11898,9 +12059,10 @@ fn effective_param_sanitizer(
if compatible.is_empty() {
// No path applies, the caller's predicate state contradicts every
// recorded return. Fall back to the aggregate rather than
// synthesise a sanitiser from zero data.
return resolved.sanitizer_caps;
// recorded return. Fall back to THIS parameter's own strip bits
// rather than the cross-param aggregate (which would bleed a sibling
// param's sanitizer onto this param's taint).
return param_strip_bits(resolved, param_idx);
}
// Intersection of strip-bits across compatible paths. Identity
@ -11927,7 +12089,7 @@ fn effective_param_sanitizer(
}
}
if !saw_any {
resolved.sanitizer_caps
param_strip_bits(resolved, param_idx)
} else {
common
}
@ -11952,11 +12114,19 @@ fn convert_ssa_to_resolved_for_caller(
.map(|(idx, _)| *idx)
.collect();
// Compute effective sanitizer caps: union of StripBits across all params
// Compute effective sanitizer caps: union of StripBits across all params.
// Retained for the aggregate fallback / non-positional resolution paths,
// but the per-parameter `param_to_return_strip` below is what the call
// site should prefer so one param's sanitizer does not bleed onto a
// sibling arg's taint.
let mut sanitizer_caps = Cap::empty();
for (_, transform) in &ssa_sum.param_to_return {
let mut param_to_return_strip: Vec<(usize, Cap)> = Vec::new();
for (idx, transform) in &ssa_sum.param_to_return {
if let TaintTransform::StripBits(bits) = transform {
sanitizer_caps |= *bits;
if !bits.is_empty() {
param_to_return_strip.push((*idx, *bits));
}
}
}
@ -12005,6 +12175,7 @@ fn convert_ssa_to_resolved_for_caller(
param_to_sink_sites,
propagates_taint: !propagating_params.is_empty(),
propagating_params,
param_to_return_strip,
param_container_to_return: ssa_sum.param_container_to_return.clone(),
param_to_container_store: ssa_sum.param_to_container_store.clone(),
return_type: ssa_sum.return_type.clone(),
@ -12128,6 +12299,23 @@ fn merge_resolved_summaries_fanout(
}
}
// param_to_return_strip: intersect per-parameter strip bits, mirroring
// the `sanitizer_caps` AND rule. Only bits stripped by EVERY
// implementer for a given parameter can be considered cleared, since the
// virtual dispatch could land on an implementer that does not sanitize
// that parameter. A param present in only one side is dropped (the
// other implementer strips nothing for it).
let mut merged_strip: Vec<(usize, Cap)> = Vec::new();
for (idx, caps) in &acc.param_to_return_strip {
if let Some((_, other)) = r.param_to_return_strip.iter().find(|(i, _)| i == idx) {
let inter = *caps & *other;
if !inter.is_empty() {
merged_strip.push((*idx, inter));
}
}
}
acc.param_to_return_strip = merged_strip;
// SSA-precision fields: drop on any disagreement.
if acc.return_type != r.return_type {
acc.return_type = None;
@ -12150,3 +12338,76 @@ fn merge_resolved_summaries_fanout(
acc
}
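
Reviewer sketch of the per-parameter intersection rule added to the fan-out merge, on plain bitmasks. `Caps` and `merge_strip` are hypothetical names; the real merge runs inside `merge_resolved_summaries_fanout` on `ResolvedSummary` fields.

```rust
/// Hypothetical stand-in for the engine's `Cap` bitflags.
type Caps = u32;

/// Keep, per parameter, only the bits stripped by BOTH implementers.
/// A parameter missing on either side, or with an empty intersection,
/// is dropped.
fn merge_strip(acc: &[(usize, Caps)], other: &[(usize, Caps)]) -> Vec<(usize, Caps)> {
    let mut merged = Vec::new();
    for (idx, caps) in acc {
        if let Some((_, o)) = other.iter().find(|(i, _)| i == idx) {
            let inter = *caps & *o;
            if inter != 0 {
                merged.push((*idx, inter));
            }
        }
    }
    merged
}
```

The intersection is the conservative choice under virtual dispatch: a bit counts as cleared only if every candidate implementer clears it for that parameter.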
#[cfg(test)]
mod engine_audit_fixes_tests {
use super::*;
// Finding #9: `!== null` must NOT be read as an equality null-check.
#[test]
fn null_check_strict_inequality_is_not_success_branch() {
// Equality forms: null/none/no-error outcome is the TRUE branch.
assert!(null_check_true_branch_is_success("err == null"));
assert!(null_check_true_branch_is_success("err === null"));
assert!(null_check_true_branch_is_success("err == nil"));
assert!(null_check_true_branch_is_success("err == none"));
assert!(null_check_true_branch_is_success("err is none"));
assert!(null_check_true_branch_is_success("is_ok(err)"));
// Inequality forms: null outcome is the FALSE branch. The strict
// `!==` form previously matched `== null` as a substring.
assert!(!null_check_true_branch_is_success("err !== null"));
assert!(!null_check_true_branch_is_success("err != null"));
assert!(!null_check_true_branch_is_success("err != nil"));
assert!(!null_check_true_branch_is_success("err != none"));
assert!(!null_check_true_branch_is_success("is_err(err)"));
}
// Finding #6: only free-function exit/abort wipes taint state; a
// receiver-qualified method with the same trailing segment must not.
#[test]
fn noreturn_only_matches_free_functions() {
// Free functions in C/C++ terminate the process.
assert!(is_noreturn_call(Lang::C, "exit"));
assert!(is_noreturn_call(Lang::Cpp, "abort"));
assert!(is_noreturn_call(Lang::C, "_Exit"));
assert!(is_noreturn_call(Lang::Cpp, "quick_exit"));
// Method calls (dot or arrow receiver) are ordinary calls.
assert!(!is_noreturn_call(Lang::Cpp, "transaction.abort"));
assert!(!is_noreturn_call(Lang::Cpp, "app.exit"));
assert!(!is_noreturn_call(Lang::C, "handlers->abort"));
assert!(!is_noreturn_call(Lang::Cpp, "state_machine.abort"));
// Other languages never treat these as noreturn.
assert!(!is_noreturn_call(Lang::JavaScript, "exit"));
}
// Finding #7: protocol-relative `//host` bypass of the leading-slash
// SSRF origin lock.
#[test]
fn ssrf_safe_rejects_protocol_relative_prefix() {
use crate::abstract_interp::StringFact;
// A real same-origin path lock is safe.
assert!(is_string_safe_for_ssrf(&StringFact::from_prefix("/api/")));
assert!(is_string_safe_for_ssrf(&StringFact::from_prefix(
"/projects"
)));
// A bare "/" lock is NOT safe: a tainted "/evil.com" suffix yields
// a protocol-relative "//evil.com" URL.
assert!(!is_string_safe_for_ssrf(&StringFact::from_prefix("/")));
// An already protocol-relative prefix is not safe either.
assert!(!is_string_safe_for_ssrf(&StringFact::from_prefix("//")));
assert!(!is_string_safe_for_ssrf(&StringFact::from_prefix(
"//evil.com"
)));
// Scheme-locked absolute URL with a path separator stays safe.
assert!(is_string_safe_for_ssrf(&StringFact::from_prefix(
"https://api.internal/"
)));
}
}

@ -992,14 +992,34 @@ pub(super) fn summarise_return_predicates(state: &SsaTaintState) -> (u64, u8, u8
let hash = h.finish();
// Intersect known_true / known_false across all tracked variables:
// the bits that hold for EVERY predicate-tracked var at this return.
let known_true = sorted
.iter()
.map(|(_, kt, _)| *kt)
.fold(u8::MAX, |a, b| a & b);
let known_false = sorted
.iter()
.map(|(_, _, kf)| *kf)
.fold(u8::MAX, |a, b| a & b);
//
// When `sorted` is empty (a return path guarded only by
// ValidationCall/AllowlistCheck/TypeCheck, which populate `validated_must`
// but never the `predicates` list because `predicate_kind_bit` returns
// `None` for those kinds), the fold's `u8::MAX` identity would yield
// `0xFF`/`0xFF`, claiming every predicate kind is simultaneously known-true
// AND known-false on this path, a contradiction. The consumer
// (`effective_param_sanitizer`) then prunes this validated path from the
// compatible set whenever the caller tracks any predicate bit, over-broadly
// clearing taint. No tracked predicate means no bit is known either way,
// so the correct identity on the empty list is `0` (matching the Top-state
// convention returned above).
let known_true = if sorted.is_empty() {
0
} else {
sorted
.iter()
.map(|(_, kt, _)| *kt)
.fold(u8::MAX, |a, b| a & b)
};
let known_false = if sorted.is_empty() {
0
} else {
sorted
.iter()
.map(|(_, _, kf)| *kf)
.fold(u8::MAX, |a, b| a & b)
};
// Use `1` for the "no predicates but validated_must non-empty" case to
// avoid colliding with the unguarded sentinel (0).
let hash = if hash == 0 { 1 } else { hash };
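
Reviewer sketch of the empty-list pitfall this hunk fixes. Folding a bitwise AND with the identity `u8::MAX` over an empty iterator returns `u8::MAX`, which here would read as "every bit known". `naive_intersect_bits` and `intersect_bits` are hypothetical helpers that isolate the fold from the surrounding summary code.

```rust
/// Pre-fix shape: the AND identity leaks out when the slice is empty.
fn naive_intersect_bits(bits: &[u8]) -> u8 {
    bits.iter().fold(u8::MAX, |a, b| a & *b)
}

/// Post-fix shape: no tracked predicate means no bit is known.
fn intersect_bits(bits: &[u8]) -> u8 {
    if bits.is_empty() {
        0
    } else {
        bits.iter().fold(u8::MAX, |a, b| a & *b)
    }
}
```

On a non-empty slice both helpers agree, so the special case changes only the validated-only return path.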
@ -1406,3 +1426,72 @@ pub(crate) fn extract_container_flow_summary(
container_store.sort();
(ctr, container_store)
}
#[cfg(test)]
mod tests {
use super::*;
use crate::state::symbol::SymbolId;
use crate::taint::domain::PredicateSummary;
/// Top state (no predicates, no validated_must) → all-zero sentinel.
#[test]
fn summarise_return_predicates_top_is_zero() {
let state = SsaTaintState::initial();
assert_eq!(summarise_return_predicates(&state), (0, 0, 0));
}
/// A return path guarded only by a ValidationCall/AllowlistCheck/TypeCheck
/// populates `validated_must` but leaves `predicates` empty. The
/// known-true / known-false fields MUST be 0 (no tracked predicate ⇒ no bit
/// known either way), not the `u8::MAX` fold identity, which would claim
/// every predicate kind is simultaneously known-true AND known-false and
/// cause the consumer to over-prune the validated path.
#[test]
fn summarise_return_predicates_validated_only_path_has_zero_bits() {
let mut state = SsaTaintState::initial();
state.validated_must.insert(SymbolId(3));
let (hash, kt, kf) = summarise_return_predicates(&state);
// Non-zero hash (validated_must is non-empty → distinct from unguarded).
assert_ne!(
hash, 0,
"validated path must hash distinctly from unguarded"
);
assert_eq!(
kt, 0,
"no tracked predicate ⇒ known_true must be 0, not 0xFF"
);
assert_eq!(
kf, 0,
"no tracked predicate ⇒ known_false must be 0, not 0xFF"
);
// The result must not be a self-contradiction.
assert_eq!(kt & kf, 0, "known_true & known_false contradiction");
}
/// When predicates ARE tracked, the intersection still reflects the
/// actual per-var bits (regression guard that the empty-list special case
/// did not disturb the populated path).
#[test]
fn summarise_return_predicates_tracked_bits_intersect() {
let mut state = SsaTaintState::initial();
state.predicates.push((
SymbolId(1),
PredicateSummary {
known_true: 0b0000_0011,
known_false: 0b0000_0100,
},
));
state.predicates.push((
SymbolId(2),
PredicateSummary {
known_true: 0b0000_0001,
known_false: 0b0000_1100,
},
));
let (_hash, kt, kf) = summarise_return_predicates(&state);
// Intersection of known_true: 0b011 & 0b001 = 0b001.
assert_eq!(kt, 0b0000_0001);
// Intersection of known_false: 0b100 & 0b1100 = 0b100.
assert_eq!(kf, 0b0000_0100);
}
}

@ -1650,6 +1650,7 @@ mod fanout_merge_tests {
sink_caps: Cap::empty(),
param_to_sink: vec![],
param_to_sink_sites: vec![],
param_to_return_strip: vec![],
propagates_taint: false,
propagating_params: vec![],
param_container_to_return: vec![],