mirror of https://github.com/elicpeter/nyx.git (synced 2026-06-27 20:29:39 +02:00)
Prerelease cleanup (#46)
* feat: Add const_bound_vars tracking to prevent false positives in ownership checks
* feat: Introduce field interner and typed bounded vars for enhanced type tracking
* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking
* feat: Centralize method name extraction with bare_method_name helper
* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch
* feat: Enhance C++ taint tracking with additional container operations and inline method resolution
* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking
* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis
* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations
* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details
* test: Add comprehensive tests for lattice algebra laws and SSA edge cases
* feat: Add destructured session user handling and safe user ID access patterns
* feat: Implement row-population reverse-walk for enhanced authorization checks
* feat: Enhance authorization checks with local alias chain for self-actor types
* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction
* feat: Implement chained method call inner-gate rebinding for SSRF prevention
* feat: Add observability and error modules, enhance debug functionality, and implement theme context
* feat: Remove Auth Analysis page and update navigation to redirect to Explorer
* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor
* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity
* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build
The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while
the unit test at the bottom of the file was gated only `#[cfg(test)]`.
Since `cfg(test)` is set in release builds with `--tests` but
`cfg(debug_assertions)` is not, `cargo build --release --tests` failed
with E0425. Removing the gate fixes the build; the body is `debug_assert!`
only, so the helper is free in release. Also drop the gate at the call
site to avoid a `dead_code` warning when the lib is built without
`--tests`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(closure-capture): flip JS/TS fixtures to required-finding
The JS and TS closure-capture fixtures pinned the old broken behaviour
via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now
correctly traces taint through the closure boundary (env source captured
by an arrow function, sunk via `child_process.exec` inside the body), so
the formerly-forbidden finding is a true positive.
Match the Python sibling's shape — `required_findings` with
`id_prefix` + `min_count` plus a small `noise_budget` — and rewrite the
companion READMEs and the phase8_fragility_tests doc-comments from
"known gap" to "regression guard".
Verified:
- cargo test --release --test phase8_fragility_tests → 8/8 pass
- cargo test --release --lib bfs_assertion → pass
- corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0) — unchanged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis
* feat: Introduce health module and enhance health score computation with calibration tests
* feat: Add expectations configuration and cleanup .gitignore for log files
* feat: Implement theme selection and enhance settings panel for triage sync
* feat: Suppress false positives for strcpy calls with literal sources in AST
* feat: Update analyse_function_ssa to return body CFG for accurate analysis
* feat: Add bug report and feature request templates for improved issue tracking
* feat: removed dev scripts
* feat: update README.md for clarity and consistency in fixture descriptions
* feat: removed dev docs
* feat: clean up error handling and UI elements for improved user experience
* feat: adjust button sizes in HeaderBar for better UI consistency
* feat: enhance taint analysis with additional context for sanitizer and taint findings
* cargo fmt
* prettier
* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts
* feat: add script to frame PNG screenshots with brand gradient
* feat: add fuzzing support with new targets and CI workflows
* refactor: streamline match expressions and improve formatting in CLI and output handling
* feat: enhance configuration display with detailed output options
* feat: stage demo configuration for improved CLI screenshot output
* feat: expose merge_configs function for user-configurable settings
* refactor: simplify code structure and improve readability in config handling
* refactor: improve descriptions for vulnerability patterns in various languages
* feat: update MIT License section with additional usage details and copyright information
* feat: update screenshots
* refactor: update build process and paths for frontend assets
* feat: add cross-file taint fuzzing target and supporting dictionary
* refactor: clean up formatting and comments in fuzz configuration and example files
* refactor: remove outdated comments and clean up CI configuration files
* chore: update changelog dates and improve formatting in documentation
* refactor: update Cargo.toml and CI configuration for improved packaging and build process
* refactor: enhance quote-stripping logic to prevent panics and add regression tests
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 79c29b394d
commit 82f18184b1
348 changed files with 48731 additions and 2925 deletions
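The `fix(ssa)` entry in the commit message turns on the difference between `cfg(test)` and `cfg(debug_assertions)`. The following is a minimal sketch of the ungated shape it describes, not the code from src/ssa/lower.rs: `debug_assert_sorted` and `sum_sorted` are hypothetical names standing in for the real helper and its call site.

```rust
// Hypothetical stand-in for a helper like `debug_assert_bfs_ordering`.
// It carries no `#[cfg(debug_assertions)]` gate, so a `#[cfg(test)]` unit
// test can still name it under `cargo build --release --tests`.
fn debug_assert_sorted(xs: &[u32]) {
    // `debug_assert!` only evaluates its condition when debug assertions
    // are enabled, so the ungated helper does no checking in release.
    debug_assert!(xs.windows(2).all(|w| w[0] <= w[1]), "input must be sorted");
}

// Hypothetical call site, also ungated so the helper always has a caller.
fn sum_sorted(xs: &[u32]) -> u32 {
    debug_assert_sorted(xs);
    xs.iter().sum()
}
```

Had the helper kept its `#[cfg(debug_assertions)]` attribute, any test that referenced it would fail to resolve the name in a release test build, which is the E0425 failure the message reports.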
@@ -215,6 +215,8 @@ mod tests {
             value_defs: defs,
             cfg_node_map: HashMap::new(),
             exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
         }
     }

@@ -59,9 +59,12 @@ impl ConstLattice {
             return ConstLattice::Int(i);
         }

-        // String: strip surrounding quotes
-        if (trimmed.starts_with('"') && trimmed.ends_with('"'))
-            || (trimmed.starts_with('\'') && trimmed.ends_with('\''))
+        // String: strip surrounding quotes. Require len >= 2 so a lone `'`
+        // or `"` (where starts_with and ends_with both match the same byte)
+        // does not produce an empty `[1..0]` slice and panic.
+        if trimmed.len() >= 2
+            && ((trimmed.starts_with('"') && trimmed.ends_with('"'))
+                || (trimmed.starts_with('\'') && trimmed.ends_with('\'')))
         {
             let inner = &trimmed[1..trimmed.len() - 1];
             return ConstLattice::Str(inner.to_string());
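The length guard in the hunk above can be tried in isolation. This is a sketch under assumptions: `strip_matching_quotes` is a hypothetical free function, not part of `ConstLattice`, and it returns `None` where the real parser would fall through to its other cases.

```rust
// Hypothetical helper mirroring the guarded condition from the hunk above.
// Returns the text between a matching pair of surrounding quotes, or `None`
// when the input is not quoted.
fn strip_matching_quotes(s: &str) -> Option<&str> {
    // A lone quote character satisfies both `starts_with` and `ends_with`,
    // so the length check is what keeps the range `1..len - 1` valid.
    let quoted = s.len() >= 2
        && ((s.starts_with('"') && s.ends_with('"'))
            || (s.starts_with('\'') && s.ends_with('\'')));
    if quoted {
        Some(&s[1..s.len() - 1])
    } else {
        None
    }
}
```

Without the `len() >= 2` check, a single `'` would reach the slice with start 1 and end 0, and slicing with a start greater than the end panics.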
@@ -279,6 +282,12 @@ fn eval_inst(inst: &SsaInst, values: &HashMap<SsaValue, ConstLattice>) -> ConstL
         | SsaOp::Param { .. }
         | SsaOp::SelfParam
         | SsaOp::CatchParam => ConstLattice::Varying,
+        // FieldProj: projecting a field is dynamic with respect to the
+        // const-propagation lattice — there is no general way to fold
+        // `obj.field` to a known scalar at this phase.  Returning Varying
+        // matches Call: callers needing field-level constness will go
+        // through the points-to / heap analysis.
+        SsaOp::FieldProj { .. } => ConstLattice::Varying,
         SsaOp::Phi(_) => ConstLattice::Varying, // phis in body shouldn't happen
         SsaOp::Nop => ConstLattice::Varying,
         // Undef contributes no knowledge: `Top` is the lattice identity
@@ -303,6 +312,7 @@ fn inst_uses(inst: &SsaInst) -> Vec<SsaValue> {
             }
             vals
         }
+        SsaOp::FieldProj { receiver, .. } => vec![*receiver],
         SsaOp::Source
         | SsaOp::Const(_)
         | SsaOp::Param { .. }
@@ -626,6 +636,8 @@ mod tests {
             value_defs,
             cfg_node_map,
             exception_edges: Vec::new(),
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
         }
     }

@@ -751,4 +763,129 @@ mod tests {
             Some(&ConstLattice::Bool(true))
         );
     }
+
+    /// Meet must be commutative: `a ⊓ b == b ⊓ a` for every pair of
+    /// lattice values.  Iterates a representative cross product; failure
+    /// would indicate the implementation special-cased one operand.
+    #[test]
+    fn meet_lattice_is_commutative() {
+        let vals = [
+            ConstLattice::Top,
+            ConstLattice::Varying,
+            ConstLattice::Null,
+            ConstLattice::Int(0),
+            ConstLattice::Int(42),
+            ConstLattice::Bool(true),
+            ConstLattice::Bool(false),
+            ConstLattice::Str("a".into()),
+            ConstLattice::Str("b".into()),
+        ];
+        for a in &vals {
+            for b in &vals {
+                assert_eq!(
+                    a.meet(b),
+                    b.meet(a),
+                    "meet should be commutative for ({a:?}, {b:?})"
+                );
+            }
+        }
+    }
+
+    /// Meet must be associative: `(a ⊓ b) ⊓ c == a ⊓ (b ⊓ c)`.
+    #[test]
+    fn meet_lattice_is_associative() {
+        let vals = [
+            ConstLattice::Top,
+            ConstLattice::Varying,
+            ConstLattice::Null,
+            ConstLattice::Int(0),
+            ConstLattice::Int(42),
+            ConstLattice::Bool(true),
+            ConstLattice::Str("x".into()),
+        ];
+        for a in &vals {
+            for b in &vals {
+                for c in &vals {
+                    let lhs = a.meet(b).meet(c);
+                    let rhs = a.meet(&b.meet(c));
+                    assert_eq!(lhs, rhs, "associativity broken on ({a:?},{b:?},{c:?})");
+                }
+            }
+        }
+    }
+
+    /// Meet must be idempotent: `a ⊓ a == a` for every lattice value.
+    #[test]
+    fn meet_lattice_is_idempotent() {
+        let vals = [
+            ConstLattice::Top,
+            ConstLattice::Varying,
+            ConstLattice::Null,
+            ConstLattice::Int(7),
+            ConstLattice::Bool(false),
+            ConstLattice::Str("y".into()),
+        ];
+        for a in &vals {
+            assert_eq!(a.meet(a), a.clone(), "idempotence broken on {a:?}");
+        }
+    }
+
+    /// Top is the meet identity: `Top ⊓ x == x` for every value.
+    /// Varying is meet-absorbing: `Varying ⊓ x == Varying`.
+    /// Two distinct concrete values meet to Varying.
+    #[test]
+    fn meet_lattice_extremes() {
+        let xs = [
+            ConstLattice::Null,
+            ConstLattice::Int(1),
+            ConstLattice::Bool(true),
+            ConstLattice::Str("a".into()),
+        ];
+        for x in &xs {
+            assert_eq!(ConstLattice::Top.meet(x), x.clone());
+            assert_eq!(x.meet(&ConstLattice::Top), x.clone());
+            assert_eq!(ConstLattice::Varying.meet(x), ConstLattice::Varying);
+            assert_eq!(x.meet(&ConstLattice::Varying), ConstLattice::Varying);
+        }
+        assert_eq!(
+            ConstLattice::Int(1).meet(&ConstLattice::Int(2)),
+            ConstLattice::Varying
+        );
+        assert_eq!(
+            ConstLattice::Bool(true).meet(&ConstLattice::Bool(false)),
+            ConstLattice::Varying
+        );
+        assert_eq!(
+            ConstLattice::Str("a".into()).meet(&ConstLattice::Str("b".into())),
+            ConstLattice::Varying
+        );
+    }
+
+    /// Const parsing must round-trip integer signs.  i64::MIN/MAX must
+    /// parse without overflow; arbitrary text falls back to a bare-string
+    /// const (current contract — tested here so a future change is
+    /// caught explicitly).
+    #[test]
+    fn const_parse_extremes_and_fallback() {
+        assert_eq!(
+            ConstLattice::parse(&i64::MAX.to_string()),
+            ConstLattice::Int(i64::MAX)
+        );
+        assert_eq!(
+            ConstLattice::parse(&i64::MIN.to_string()),
+            ConstLattice::Int(i64::MIN)
+        );
+        // Larger than i64 falls back to bare-string.
+        let huge = "99999999999999999999";
+        assert_eq!(
+            ConstLattice::parse(huge),
+            ConstLattice::Str(huge.to_string())
+        );
+        // Empty string parses as empty Str (not panic).
+        assert_eq!(ConstLattice::parse(""), ConstLattice::Str("".into()));
+        // Lone quote characters must not panic in the quote-stripping path
+        // (regression for fuzz crash-2f943c14: `'` triggered &s[1..0]).
+        assert_eq!(ConstLattice::parse("'"), ConstLattice::Str("'".into()));
+        assert_eq!(ConstLattice::parse("\""), ConstLattice::Str("\"".into()));
+    }
 }
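The lattice laws exercised by the tests above (identity, absorption, idempotence, distinct constants meeting to `Varying`) can be seen on a reduced model. This is an illustrative sketch only: `Flat` is a hypothetical enum with fewer variants than `ConstLattice`, and its `meet` is one plausible implementation that satisfies those laws, not the project's.

```rust
// Hypothetical reduced constant lattice: `Top` means "no information yet",
// `Varying` means "not a single known constant".
#[derive(Clone, Debug, PartialEq)]
enum Flat {
    Top,
    Int(i64),
    Str(String),
    Varying,
}

impl Flat {
    // Meet of two lattice values.
    fn meet(&self, other: &Flat) -> Flat {
        match (self, other) {
            // `Top` is the identity: the other operand wins.
            (Flat::Top, x) | (x, Flat::Top) => x.clone(),
            // `Varying` absorbs everything that is not `Top`.
            (Flat::Varying, _) | (_, Flat::Varying) => Flat::Varying,
            // Equal constants stay constant (idempotence).
            (a, b) if a == b => a.clone(),
            // Two distinct constants carry no single known value.
            _ => Flat::Varying,
        }
    }
}
```

Because every arm treats its two operands symmetrically, commutativity follows directly from the shape of the match.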
@@ -213,6 +213,8 @@ mod tests {
                 .into_iter()
                 .collect(),
             exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
         };

         let (eliminated, copy_map) = copy_propagate(&mut body, &cfg);
@@ -225,4 +227,494 @@ mod tests {
         assert!(matches!(body.blocks[0].body[1].op, SsaOp::Nop));
         assert!(matches!(body.blocks[0].body[2].op, SsaOp::Nop));
     }
+
+    /// `resolve_root` has a 1000-iteration safety cap to avoid livelock if
+    /// a malformed copy map ever contains a cycle (SSA itself is acyclic,
+    /// but defensively we want this guarantee on the helper).  Confirm the
+    /// cap actually fires by feeding a hand-crafted cycle a → b → a.
+    #[test]
+    fn resolve_root_terminates_on_cyclic_copy_map() {
+        let mut map: std::collections::HashMap<SsaValue, SsaValue> =
+            std::collections::HashMap::new();
+        map.insert(SsaValue(0), SsaValue(1));
+        map.insert(SsaValue(1), SsaValue(0));
+        // Must terminate; the exact returned value isn't a correctness
+        // guarantee under malformed input, but no infinite loop is.
+        let _root = resolve_root(SsaValue(0), &map);
+    }
+
+    /// A four-deep copy chain v3 = v2 = v1 = v0 must collapse to v0
+    /// in a single `copy_propagate` pass — the resolved replacement
+    /// map drives downstream alias recovery, so the *transitive*
+    /// closure must be exposed, not just the immediate parent.
+    #[test]
+    fn deep_copy_chain_collapses_to_root() {
+        let mut cfg: Cfg = Graph::new();
+        let nodes: Vec<_> = (0..4)
+            .map(|_| cfg.add_node(make_cfg_node(StmtKind::Seq)))
+            .collect();
+
+        let mut block_body = vec![SsaInst {
+            value: SsaValue(0),
+            op: SsaOp::Const(Some("\"x\"".into())),
+            cfg_node: nodes[0],
+            var_name: Some("a".into()),
+            span: (0, 1),
+        }];
+        for (i, node) in nodes.iter().enumerate().take(4).skip(1) {
+            block_body.push(SsaInst {
+                value: SsaValue(i as u32),
+                op: SsaOp::Assign(SmallVec::from_elem(SsaValue((i - 1) as u32), 1)),
+                cfg_node: *node,
+                var_name: Some(format!("v{i}")),
+                span: (i, i + 1),
+            });
+        }
+
+        let mut body = SsaBody {
+            blocks: vec![SsaBlock {
+                id: BlockId(0),
+                phis: vec![],
+                body: block_body,
+                terminator: Terminator::Return(None),
+                preds: SmallVec::new(),
+                succs: SmallVec::new(),
+            }],
+            entry: BlockId(0),
+            value_defs: (0..4)
+                .map(|i| ValueDef {
+                    var_name: Some(format!("v{i}")),
+                    cfg_node: nodes[i],
+                    block: BlockId(0),
+                })
+                .collect(),
+            cfg_node_map: nodes
+                .iter()
+                .enumerate()
+                .map(|(i, n)| (*n, SsaValue(i as u32)))
+                .collect(),
+            exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
+        };
+
+        let (eliminated, copy_map) = copy_propagate(&mut body, &cfg);
+        assert_eq!(eliminated, 3, "v1, v2, v3 must all be eliminated");
+        for i in 1..4 {
+            assert_eq!(
+                copy_map.get(&SsaValue(i)),
+                Some(&SsaValue(0)),
+                "v{i} must resolve transitively to v0"
+            );
+        }
+    }
+
+    // ─────────────────────────────────────────────────────────────────
+    // Skip-conditions: copy-prop must NOT erase semantic info attached
+    // to a copy's CFG node.  These guard the three early-exits in
+    // `copy_propagate`: labels, numeric-length, and string_prefix.
+    // ─────────────────────────────────────────────────────────────────
+
+    /// Build a single-block SSA body containing
+    ///   v0 = Const, v1 = Assign(v0)
+    /// with `node1_decorator` applied to v1's CFG node so individual
+    /// skip-conditions can be exercised.
+    fn build_two_inst_body(decorate: impl FnOnce(&mut NodeInfo)) -> (Cfg, SsaBody) {
+        let mut cfg: Cfg = Graph::new();
+        let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        let mut n1_info = make_cfg_node(StmtKind::Seq);
+        decorate(&mut n1_info);
+        let n1 = cfg.add_node(n1_info);
+        let body = SsaBody {
+            blocks: vec![SsaBlock {
+                id: BlockId(0),
+                phis: vec![],
+                body: vec![
+                    SsaInst {
+                        value: SsaValue(0),
+                        op: SsaOp::Const(Some("42".into())),
+                        cfg_node: n0,
+                        var_name: Some("x".into()),
+                        span: (0, 2),
+                    },
+                    SsaInst {
+                        value: SsaValue(1),
+                        op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
+                        cfg_node: n1,
+                        var_name: Some("y".into()),
+                        span: (3, 5),
+                    },
+                ],
+                terminator: Terminator::Return(None),
+                preds: SmallVec::new(),
+                succs: SmallVec::new(),
+            }],
+            entry: BlockId(0),
+            value_defs: vec![
+                ValueDef {
+                    var_name: Some("x".into()),
+                    cfg_node: n0,
+                    block: BlockId(0),
+                },
+                ValueDef {
+                    var_name: Some("y".into()),
+                    cfg_node: n1,
+                    block: BlockId(0),
+                },
+            ],
+            cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
+            exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
+        };
+        (cfg, body)
+    }
+
+    /// Skip path 1: an Assign whose CFG node carries a label
+    /// (sanitizer/source/sink) must NOT be propagated through.  Erasing
+    /// that label would silently drop a sanitization step from the
+    /// taint path.
+    #[test]
+    fn copy_with_label_on_cfg_node_is_not_propagated() {
+        use crate::labels::{Cap, DataLabel};
+        use smallvec::smallvec;
+        let (cfg, mut body) = build_two_inst_body(|info| {
+            info.taint.labels = smallvec![DataLabel::Sanitizer(Cap::SHELL_ESCAPE)];
+        });
+        let (eliminated, _map) = copy_propagate(&mut body, &cfg);
+        assert_eq!(eliminated, 0, "copy through a labeled node must be skipped");
+        assert!(
+            matches!(body.blocks[0].body[1].op, SsaOp::Assign(_)),
+            "labeled copy must remain an Assign, not be Nop'd"
+        );
+    }
+
+    /// Skip path 2: numeric-length reads (`arr.length`, `map.size`)
+    /// have a different type from their source — propagating through
+    /// would erase the Int type fact.
+    #[test]
+    fn copy_through_numeric_length_access_is_not_propagated() {
+        let (cfg, mut body) = build_two_inst_body(|info| {
+            info.is_numeric_length_access = true;
+        });
+        let (eliminated, _map) = copy_propagate(&mut body, &cfg);
+        assert_eq!(
+            eliminated, 0,
+            "copy through numeric-length access must be skipped"
+        );
+    }
+
+    /// Skip path 3: an Assign carrying a `string_prefix` (template
+    /// literal or `"lit" + var` RHS) seeds a StringFact on its SSA
+    /// value.  Propagating past it erases the prefix-bearing value and
+    /// breaks SSRF prefix-lock suppression downstream.
+    #[test]
+    fn copy_through_string_prefix_node_is_not_propagated() {
+        let (cfg, mut body) = build_two_inst_body(|info| {
+            info.string_prefix = Some("https://api.example.com/".into());
+        });
+        let (eliminated, _map) = copy_propagate(&mut body, &cfg);
+        assert_eq!(
+            eliminated, 0,
+            "copy through string_prefix-bearing node must be skipped"
+        );
+    }
+
+    /// Multi-operand Assigns (e.g. `v2 = v0 + v1`) are NOT copies and
+    /// must be left alone.  Only single-operand Assigns are copies.
+    #[test]
+    fn multi_operand_assign_is_not_a_copy() {
+        let mut cfg: Cfg = Graph::new();
+        let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        let mut body = SsaBody {
+            blocks: vec![SsaBlock {
+                id: BlockId(0),
+                phis: vec![],
+                body: vec![
+                    SsaInst {
+                        value: SsaValue(0),
+                        op: SsaOp::Const(Some("1".into())),
+                        cfg_node: n0,
+                        var_name: Some("x".into()),
+                        span: (0, 1),
+                    },
+                    SsaInst {
+                        value: SsaValue(1),
+                        op: SsaOp::Const(Some("2".into())),
+                        cfg_node: n1,
+                        var_name: Some("y".into()),
+                        span: (2, 3),
+                    },
+                    SsaInst {
+                        value: SsaValue(2),
+                        op: SsaOp::Assign({
+                            let mut v: SmallVec<[SsaValue; 4]> = SmallVec::new();
+                            v.push(SsaValue(0));
+                            v.push(SsaValue(1));
+                            v
+                        }),
+                        cfg_node: n2,
+                        var_name: Some("z".into()),
+                        span: (4, 5),
+                    },
+                ],
+                terminator: Terminator::Return(None),
+                preds: SmallVec::new(),
+                succs: SmallVec::new(),
+            }],
+            entry: BlockId(0),
+            value_defs: vec![
+                ValueDef {
+                    var_name: Some("x".into()),
+                    cfg_node: n0,
+                    block: BlockId(0),
+                },
+                ValueDef {
+                    var_name: Some("y".into()),
+                    cfg_node: n1,
+                    block: BlockId(0),
+                },
+                ValueDef {
+                    var_name: Some("z".into()),
+                    cfg_node: n2,
+                    block: BlockId(0),
+                },
+            ],
+            cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
+                .into_iter()
+                .collect(),
+            exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
+        };
+        let (eliminated, _map) = copy_propagate(&mut body, &cfg);
+        assert_eq!(eliminated, 0, "two-operand Assign is not a copy");
+        assert!(
+            matches!(body.blocks[0].body[2].op, SsaOp::Assign(_)),
+            "multi-operand Assign must be preserved"
+        );
+    }
+
+    /// A Call's argument and receiver slots that reference a
+    /// copy-eliminated value must be rewritten to the root.
+    #[test]
+    fn call_args_and_receiver_rewritten_through_copy() {
+        let mut cfg: Cfg = Graph::new();
+        let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        let n2 = cfg.add_node(make_cfg_node(StmtKind::Call));
+        let mut arg_vec: SmallVec<[SsaValue; 2]> = SmallVec::new();
+        arg_vec.push(SsaValue(1));
+        let mut body = SsaBody {
+            blocks: vec![SsaBlock {
+                id: BlockId(0),
+                phis: vec![],
+                body: vec![
+                    SsaInst {
+                        value: SsaValue(0),
+                        op: SsaOp::Const(Some("\"x\"".into())),
+                        cfg_node: n0,
+                        var_name: Some("a".into()),
+                        span: (0, 1),
+                    },
+                    SsaInst {
+                        value: SsaValue(1),
+                        op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
+                        cfg_node: n1,
+                        var_name: Some("b".into()),
+                        span: (2, 3),
+                    },
+                    SsaInst {
+                        value: SsaValue(2),
+                        op: SsaOp::Call {
+                            callee: "f".into(),
+                            callee_text: None,
+                            args: vec![arg_vec],
+                            receiver: Some(SsaValue(1)),
+                        },
+                        cfg_node: n2,
+                        var_name: None,
+                        span: (4, 7),
+                    },
+                ],
+                terminator: Terminator::Return(None),
+                preds: SmallVec::new(),
+                succs: SmallVec::new(),
+            }],
+            entry: BlockId(0),
+            value_defs: vec![
+                ValueDef {
+                    var_name: Some("a".into()),
+                    cfg_node: n0,
+                    block: BlockId(0),
+                },
+                ValueDef {
+                    var_name: Some("b".into()),
+                    cfg_node: n1,
+                    block: BlockId(0),
+                },
+                ValueDef {
+                    var_name: None,
+                    cfg_node: n2,
+                    block: BlockId(0),
+                },
+            ],
+            cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
+                .into_iter()
+                .collect(),
+            exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
+        };
+        let (eliminated, _) = copy_propagate(&mut body, &cfg);
+        assert_eq!(eliminated, 1, "v1 should be eliminated");
+        let call_inst = &body.blocks[0].body[2];
+        match &call_inst.op {
+            SsaOp::Call { args, receiver, .. } => {
+                assert_eq!(receiver, &Some(SsaValue(0)), "receiver rewritten to root");
+                assert_eq!(args[0][0], SsaValue(0), "call arg rewritten to root");
+            }
+            other => panic!("expected Call op, got {:?}", other),
+        }
+    }
+
+    /// Phi operand referencing a copy-eliminated value must be
+    /// rewritten to the root.
+    #[test]
+    fn phi_operand_rewritten_through_copy() {
+        let mut cfg: Cfg = Graph::new();
+        let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        // Block 0: v0=const, v1=assign(v0)
+        // Block 1: v2 = phi(B0: v1)
+        let mut phi_ops: smallvec::SmallVec<[(BlockId, SsaValue); 2]> = smallvec::SmallVec::new();
+        phi_ops.push((BlockId(0), SsaValue(1)));
+        let mut body = SsaBody {
+            blocks: vec![
+                SsaBlock {
+                    id: BlockId(0),
+                    phis: vec![],
+                    body: vec![
+                        SsaInst {
+                            value: SsaValue(0),
+                            op: SsaOp::Const(Some("\"v0\"".into())),
+                            cfg_node: n0,
+                            var_name: Some("a".into()),
+                            span: (0, 1),
+                        },
+                        SsaInst {
+                            value: SsaValue(1),
+                            op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
+                            cfg_node: n1,
+                            var_name: Some("b".into()),
+                            span: (2, 3),
+                        },
+                    ],
+                    terminator: Terminator::Goto(BlockId(1)),
+                    preds: SmallVec::new(),
+                    succs: {
+                        let mut s = SmallVec::new();
+                        s.push(BlockId(1));
+                        s
+                    },
+                },
+                SsaBlock {
+                    id: BlockId(1),
+                    phis: vec![SsaInst {
+                        value: SsaValue(2),
+                        op: SsaOp::Phi(phi_ops),
+                        cfg_node: n2,
+                        var_name: Some("b".into()),
+                        span: (4, 5),
+                    }],
+                    body: vec![],
+                    terminator: Terminator::Return(Some(SsaValue(2))),
+                    preds: {
+                        let mut p = SmallVec::new();
+                        p.push(BlockId(0));
+                        p
+                    },
+                    succs: SmallVec::new(),
+                },
+            ],
+            entry: BlockId(0),
+            value_defs: vec![
+                ValueDef {
+                    var_name: Some("a".into()),
+                    cfg_node: n0,
+                    block: BlockId(0),
+                },
+                ValueDef {
+                    var_name: Some("b".into()),
+                    cfg_node: n1,
+                    block: BlockId(0),
+                },
+                ValueDef {
+                    var_name: Some("b".into()),
+                    cfg_node: n2,
+                    block: BlockId(1),
+                },
+            ],
+            cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
+                .into_iter()
+                .collect(),
+            exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
+        };
+        let (eliminated, _map) = copy_propagate(&mut body, &cfg);
+        assert_eq!(eliminated, 1);
+        // The phi in block 1 should now reference v0, not v1.
+        let phi = &body.blocks[1].phis[0];
+        match &phi.op {
+            SsaOp::Phi(ops) => {
+                assert_eq!(
+                    ops[0].1,
+                    SsaValue(0),
+                    "phi operand should be rewritten to root v0"
+                );
+            }
+            other => panic!("expected Phi op, got {:?}", other),
+        }
+    }
+
+    /// `copy_propagate` on a body with no Assign instructions returns
+    /// `(0, empty_map)` and leaves the body untouched.
+    #[test]
+    fn no_op_when_no_copies_exist() {
+        let mut cfg: Cfg = Graph::new();
+        let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        let mut body = SsaBody {
+            blocks: vec![SsaBlock {
+                id: BlockId(0),
+                phis: vec![],
+                body: vec![SsaInst {
+                    value: SsaValue(0),
+                    op: SsaOp::Const(Some("42".into())),
+                    cfg_node: n0,
+                    var_name: Some("x".into()),
+                    span: (0, 2),
+                }],
+                terminator: Terminator::Return(None),
+                preds: SmallVec::new(),
+                succs: SmallVec::new(),
+            }],
+            entry: BlockId(0),
+            value_defs: vec![ValueDef {
+                var_name: Some("x".into()),
+                cfg_node: n0,
+                block: BlockId(0),
+            }],
+            cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
+            exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
+        };
+        let (eliminated, map) = copy_propagate(&mut body, &cfg);
+        assert_eq!(eliminated, 0);
+        assert!(map.is_empty());
+    }
 }
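The copy-propagation tests above rely on two properties of root resolution: a chain of copies resolves to its ultimate source, and a malformed cyclic map still terminates. A minimal sketch of a bounded chain walk with both properties follows. It is an assumption-laden illustration: `resolve_root_capped` is a hypothetical function over plain `u32` ids with an explicit `max_steps` parameter, whereas the real `resolve_root` takes `SsaValue` keys and, per the test's doc comment, uses a fixed 1000-iteration cap.

```rust
use std::collections::HashMap;

// Hypothetical bounded walk over a copy map (`copy -> source`).
// Follows links until a value with no entry is reached, or until
// `max_steps` links have been followed.
fn resolve_root_capped(start: u32, copies: &HashMap<u32, u32>, max_steps: usize) -> u32 {
    let mut cur = start;
    for _ in 0..max_steps {
        match copies.get(&cur) {
            // `cur` is itself a copy: step to its source.
            Some(&next) => cur = next,
            // `cur` has no source entry, so it is the root.
            None => break,
        }
    }
    // On a cyclic map this is simply wherever the walk stopped.
    cur
}
```

The cap trades a meaningful answer on malformed input for guaranteed termination, which matches the test's stated contract that only the absence of an infinite loop is guaranteed.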
364
src/ssa/dce.rs
@@ -143,6 +143,7 @@ fn inst_used_values(inst: &SsaInst) -> Vec<SsaValue> {
             }
             vals
         }
+        SsaOp::FieldProj { receiver, .. } => vec![*receiver],
         SsaOp::Source
         | SsaOp::Const(_)
         | SsaOp::Param { .. }
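The one-line change above makes a field projection count as a use of its receiver. The sketch below shows why that matters for liveness on a reduced model. Everything in it is hypothetical: `Op`, `uses`, and `live_values` are illustrative names, instruction `i` is assumed to define value `i`, and the model omits the instructions the real pass always keeps (the DCE tests later in this diff mention Source, Call, and labeled instructions).

```rust
use std::collections::HashSet;

// Hypothetical two-variant instruction set; instruction `i` defines value `i`.
#[derive(Clone, Debug, PartialEq)]
enum Op {
    Const,
    FieldProj { receiver: usize },
}

// Values read by an instruction. A field projection reads its receiver.
fn uses(op: &Op) -> Vec<usize> {
    match op {
        Op::FieldProj { receiver } => vec![*receiver],
        Op::Const => vec![],
    }
}

// Marks every value reachable from the returned value through `uses`.
fn live_values(insts: &[Op], returned: Option<usize>) -> HashSet<usize> {
    let mut live = HashSet::new();
    let mut work: Vec<usize> = returned.into_iter().collect();
    while let Some(v) = work.pop() {
        // Only expand a value the first time it is marked live.
        if live.insert(v) {
            work.extend(uses(&insts[v]));
        }
    }
    live
}
```

If `uses` returned nothing for `FieldProj`, the receiver would look dead even when the projected field is returned, and a sweep over non-live values would wrongly delete it.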
@@ -214,6 +215,8 @@ mod tests {
             ],
             cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
             exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
         };

         let removed = eliminate_dead_defs(&mut body, &cfg);
@@ -260,6 +263,8 @@ mod tests {
             }],
             cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
             exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
         };

         let removed = eliminate_dead_defs(&mut body, &cfg);
@@ -307,6 +312,8 @@ mod tests {
             }],
             cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
             exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
         };

         let removed = eliminate_dead_defs(&mut body, &cfg);
@@ -350,6 +357,8 @@ mod tests {
             }],
             cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
             exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
         };

         let removed = eliminate_dead_defs(&mut body, &cfg);
@@ -385,6 +394,8 @@ mod tests {
             }],
             cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
             exception_edges: vec![],
+            field_interner: crate::ssa::ir::FieldInterner::default(),
+            field_writes: std::collections::HashMap::new(),
         };

         let removed = eliminate_dead_defs(&mut body, &cfg);
@@ -392,6 +403,142 @@ mod tests {
         assert!(body.blocks[0].body.is_empty());
     }

+    #[test]
+    fn dce_keeps_field_proj_when_used() {
+        // v0 = source(); v1 = field_proj(v0, "field"); ret v1
+        // The terminator references v1, so the FieldProj's receiver chain
+        // (v0) must stay reachable.
+        let mut cfg: Cfg = Graph::new();
+        let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+        let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
+
+        let mut interner = crate::ssa::ir::FieldInterner::new();
+        let fid = interner.intern("field");
+
+        let mut body = SsaBody {
+            blocks: vec![SsaBlock {
+                id: BlockId(0),
+                phis: vec![],
+                body: vec![
+                    SsaInst {
+                        value: SsaValue(0),
+                        op: SsaOp::Source,
+                        cfg_node: n0,
+                        var_name: Some("obj".into()),
+                        span: (0, 5),
+                    },
+                    SsaInst {
+                        value: SsaValue(1),
+                        op: SsaOp::FieldProj {
+                            receiver: SsaValue(0),
+                            field: fid,
+                            projected_type: None,
+                        },
+                        cfg_node: n1,
+                        var_name: Some("obj.field".into()),
+                        span: (10, 20),
+                    },
+                ],
+                terminator: Terminator::Return(Some(SsaValue(1))),
+                preds: SmallVec::new(),
+                succs: SmallVec::new(),
+            }],
+            entry: BlockId(0),
+            value_defs: vec![
+                ValueDef {
+                    var_name: Some("obj".into()),
+                    cfg_node: n0,
+                    block: BlockId(0),
+                },
+                ValueDef {
+                    var_name: Some("obj.field".into()),
+                    cfg_node: n1,
+                    block: BlockId(0),
+                },
+            ],
+            cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
+            exception_edges: vec![],
+            field_interner: interner,
+            field_writes: std::collections::HashMap::new(),
+        };
+
+        let removed = eliminate_dead_defs(&mut body, &cfg);
+        assert_eq!(
+            removed, 0,
+            "FieldProj reachable from terminator must survive"
+        );
+        assert_eq!(body.blocks[0].body.len(), 2);
+    }
|
||||
|
||||
#[test]
|
||||
fn dce_removes_dead_field_proj() {
|
||||
// v0 = const("x"); v1 = field_proj(v0, "field"); ret (no v1 use)
|
||||
// Both should be removed since neither has a use and neither is
|
||||
// a Source/Call/labeled instruction.
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let mut interner = crate::ssa::ir::FieldInterner::new();
|
||||
let fid = interner.intern("field");
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("x".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("obj".into()),
|
||||
span: (0, 1),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::FieldProj {
|
||||
receiver: SsaValue(0),
|
||||
field: fid,
|
||||
projected_type: None,
|
||||
},
|
||||
cfg_node: n1,
|
||||
var_name: Some("obj.field".into()),
|
||||
span: (2, 12),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("obj".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("obj.field".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: interner,
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
// First pass removes the FieldProj (no uses), second removes the Const
|
||||
// (no uses after FieldProj is gone).
|
||||
assert_eq!(
|
||||
removed, 2,
|
||||
"dead FieldProj and its dead receiver const must be removed"
|
||||
);
|
||||
assert!(body.blocks[0].body.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn used_def_preserved() {
|
||||
// v0 = const("42"), v1 = assign(v0) — v0 is used, both survive
|
||||
|
|
@ -438,6 +585,8 @@ mod tests {
|
|||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
|
|
@ -446,4 +595,219 @@ mod tests {
|
|||
assert_eq!(removed, 2);
|
||||
assert_eq!(body.blocks[0].body.len(), 0);
|
||||
}
|
||||
|
||||
/// DCE must NEVER remove a Call instruction even when its result has
|
||||
/// zero uses — calls have side effects (I/O, throws, mutations) that
|
||||
/// cannot be modeled as SSA-value uses. This is the conservative
|
||||
/// invariant `is_dead()` enforces; regressing it would silently drop
|
||||
/// real-world code from analysis (sinks, sanitizers expressed as
|
||||
/// expression-statements, etc.).
|
||||
#[test]
|
||||
fn dead_call_with_unused_result_preserved() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Call));
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Call {
|
||||
callee: "side_effect".into(),
|
||||
callee_text: None,
|
||||
args: Vec::new(),
|
||||
receiver: None,
|
||||
},
|
||||
cfg_node: n0,
|
||||
var_name: None,
|
||||
span: (0, 12),
|
||||
}],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![ValueDef {
|
||||
var_name: None,
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
assert_eq!(
|
||||
removed, 0,
|
||||
"Call with unused result must be preserved (side effects)"
|
||||
);
|
||||
assert_eq!(body.blocks[0].body.len(), 1);
|
||||
assert!(matches!(body.blocks[0].body[0].op, SsaOp::Call { .. }));
|
||||
}
|
||||
|
||||
/// A dead phi must be eliminated. We construct an entry block whose
|
||||
/// successor has a phi merging two unused constants and a Return(None).
|
||||
/// All defs are dead; DCE should strip every body and phi instruction.
|
||||
#[test]
|
||||
fn dead_phi_in_otherwise_dead_block_removed() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let entry_block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("1".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("a".into()),
|
||||
span: (0, 1),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Const(Some("2".into())),
|
||||
cfg_node: n1,
|
||||
var_name: Some("b".into()),
|
||||
span: (1, 2),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Goto(BlockId(1)),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::from_elem(BlockId(1), 1),
|
||||
};
|
||||
let join_block = SsaBlock {
|
||||
id: BlockId(1),
|
||||
phis: vec![SsaInst {
|
||||
value: SsaValue(2),
|
||||
op: SsaOp::Phi(smallvec::smallvec![
|
||||
(BlockId(0), SsaValue(0)),
|
||||
(BlockId(0), SsaValue(1)),
|
||||
]),
|
||||
cfg_node: n2,
|
||||
var_name: Some("phi".into()),
|
||||
span: (2, 3),
|
||||
}],
|
||||
body: vec![],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::from_elem(BlockId(0), 1),
|
||||
succs: SmallVec::new(),
|
||||
};
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![entry_block, join_block],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("a".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("b".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("phi".into()),
|
||||
cfg_node: n2,
|
||||
block: BlockId(1),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
|
||||
.into_iter()
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
// Pass 1: the phi (no uses) goes; that drops the use-counts on v0/v1.
|
||||
// Pass 2: v0 and v1 (now unused) go.
|
||||
assert_eq!(removed, 3, "dead phi + two operands should be removed");
|
||||
assert!(
|
||||
body.blocks[1].phis.is_empty(),
|
||||
"dead phi must be eliminated"
|
||||
);
|
||||
assert!(body.blocks[0].body.is_empty());
|
||||
}
|
||||
|
||||
/// DCE iteration: removing v2 makes v1 dead, and removing v1 in turn makes
/// v0 dead on a later pass. Mirrors `used_def_preserved` but is explicit
/// about the chain.
|
||||
#[test]
|
||||
fn dce_iterates_until_fixpoint() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("1".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("a".into()),
|
||||
span: (0, 1),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
|
||||
cfg_node: n1,
|
||||
var_name: Some("b".into()),
|
||||
span: (1, 2),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(2),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(1), 1)),
|
||||
cfg_node: n2,
|
||||
var_name: Some("c".into()),
|
||||
span: (2, 3),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("a".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("b".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("c".into()),
|
||||
cfg_node: n2,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
|
||||
.into_iter()
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
assert_eq!(
|
||||
removed, 3,
|
||||
"DCE must reach fixpoint and remove all 3 dead defs in the chain"
|
||||
);
|
||||
assert!(body.blocks[0].body.is_empty());
|
||||
}
|
||||
}
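The fixpoint behaviour these tests pin down can be illustrated outside the crate. The following is a minimal sketch on a toy IR, not the crate's `eliminate_dead_defs`: `Inst`, `pinned`, and `eliminate_dead` are hypothetical names, and the rule that side-effecting ops (calls, sources) are never removed is modelled as a plain boolean.

```rust
use std::collections::HashMap;

/// Toy instruction (hypothetical): defines `value`, reads `uses`.
/// `pinned` stands in for "has side effects, never remove".
#[derive(Debug, Clone)]
struct Inst {
    value: u32,
    uses: Vec<u32>,
    pinned: bool,
}

/// Remove unused, unpinned definitions until a pass removes nothing.
/// `live_out` lists values read by terminators (e.g. a `return v1`).
/// Returns the total number of removed instructions.
fn eliminate_dead(insts: &mut Vec<Inst>, live_out: &[u32]) -> usize {
    let mut removed = 0;
    loop {
        // Recount uses from scratch on every pass.
        let mut use_count: HashMap<u32, usize> = HashMap::new();
        for v in live_out {
            *use_count.entry(*v).or_insert(0) += 1;
        }
        for inst in insts.iter() {
            for u in &inst.uses {
                *use_count.entry(*u).or_insert(0) += 1;
            }
        }
        let before = insts.len();
        insts.retain(|i| i.pinned || use_count.get(&i.value).copied().unwrap_or(0) > 0);
        let delta = before - insts.len();
        if delta == 0 {
            return removed; // fixpoint reached
        }
        removed += delta;
    }
}

fn main() {
    // v0 = const; v1 = assign(v0); v2 = assign(v1); return (no value)
    let mut chain = vec![
        Inst { value: 0, uses: vec![], pinned: false },
        Inst { value: 1, uses: vec![0], pinned: false },
        Inst { value: 2, uses: vec![1], pinned: false },
    ];
    let removed = eliminate_dead(&mut chain, &[]);
    println!("removed {removed} instructions, {} left", chain.len());
}
```

Recounting uses on every pass is the simplest way to reach the fixpoint; a worklist that decrements counts as definitions disappear would avoid the repeated scans, at the cost of more bookkeeping.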
|
||||
|
|
|
|||
|
|
@ -48,6 +48,7 @@ impl fmt::Display for SsaBody {
|
|||
callee,
|
||||
args,
|
||||
receiver,
|
||||
..
|
||||
} => {
|
||||
if let Some(rv) = receiver {
|
||||
write!(f, "v{}.{callee}(", rv.0)?;
|
||||
|
|
@ -64,6 +65,20 @@ impl fmt::Display for SsaBody {
|
|||
.collect();
|
||||
write!(f, "{})", arg_strs.join(", "))?;
|
||||
}
|
||||
SsaOp::FieldProj {
|
||||
receiver,
|
||||
field,
|
||||
projected_type,
|
||||
} => {
|
||||
// Resolve the field name through the body's interner
|
||||
// so display output matches the original source field.
|
||||
let name = self.field_interner.resolve(*field);
|
||||
if let Some(ty) = projected_type {
|
||||
write!(f, "field_proj(v{}, {name:?}) :: {ty:?}", receiver.0)?;
|
||||
} else {
|
||||
write!(f, "field_proj(v{}, {name:?})", receiver.0)?;
|
||||
}
|
||||
}
|
||||
SsaOp::Source => write!(f, "source()")?,
|
||||
SsaOp::Const(val) => {
|
||||
if let Some(v) = val {
|
||||
|
|
|
|||
|
|
@ -23,7 +23,7 @@
|
|||
#![allow(clippy::collapsible_if, clippy::unnecessary_map_or)]
|
||||
|
||||
use crate::cfg::Cfg;
|
||||
-use crate::labels::Cap;
+use crate::labels::{Cap, bare_method_name};
|
||||
use crate::ssa::ir::*;
|
||||
use crate::ssa::pointsto::{ContainerOp, classify_container_op};
|
||||
use crate::symbol::Lang;
|
||||
|
|
@ -588,7 +588,7 @@ fn is_container_literal(text: &str) -> bool {
|
|||
/// Check if a callee creates a new container (constructor/factory).
|
||||
pub fn is_container_constructor(callee: &str, lang: Lang) -> bool {
|
||||
// Extract last segment after '.' or '::' (whichever comes last)
|
||||
-let after_dot = callee.rsplit('.').next().unwrap_or(callee);
+let after_dot = bare_method_name(callee);
|
||||
let suffix = after_dot.rsplit("::").next().unwrap_or(after_dot);
|
||||
let suffix_lower = suffix.to_ascii_lowercase();
|
||||
|
||||
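The suffix extraction in this hunk can be tried in isolation. This is a sketch only: the body of `crate::labels::bare_method_name` is not shown in this diff, so `bare_name_sketch` below is a hypothetical stand-in that reproduces the removed `rsplit('.')` line, and `constructor_suffix` is an illustrative name.

```rust
/// Hypothetical stand-in for the shared helper: text after the last `.`.
/// (Assumption: mirrors the removed `rsplit('.')` line, not the real helper.)
fn bare_name_sketch(callee: &str) -> &str {
    callee.rsplit('.').next().unwrap_or(callee)
}

/// Last segment after `.` and then after `::`, lowercased for matching.
fn constructor_suffix(callee: &str) -> String {
    let after_dot = bare_name_sketch(callee);
    let suffix = after_dot.rsplit("::").next().unwrap_or(after_dot);
    suffix.to_ascii_lowercase()
}

fn main() {
    for callee in ["std::collections::HashMap::new", "obj.items.Push", "Vec"] {
        println!("{callee} -> {}", constructor_suffix(callee));
    }
}
```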
|
|
|
|||
|
|
@ -548,6 +548,7 @@ fn op_kind(op: &SsaOp) -> &'static str {
|
|||
SsaOp::CatchParam => "CatchParam",
|
||||
SsaOp::Nop => "Nop",
|
||||
SsaOp::Undef => "Undef",
|
||||
SsaOp::FieldProj { .. } => "FieldProj",
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -785,6 +786,8 @@ mod tests {
|
|||
value_defs: vec![],
|
||||
cfg_node_map: Default::default(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
let errs = check_structural_invariants(&body);
|
||||
assert!(
|
||||
|
|
@ -830,6 +833,8 @@ mod tests {
|
|||
}],
|
||||
cfg_node_map: Default::default(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
let errs = check_structural_invariants(&body);
|
||||
assert!(
|
||||
|
|
@ -878,6 +883,8 @@ mod tests {
|
|||
}],
|
||||
cfg_node_map: Default::default(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
let errs = check_structural_invariants(&body);
|
||||
assert!(
|
||||
|
|
@ -904,6 +911,8 @@ mod tests {
|
|||
value_defs: vec![],
|
||||
cfg_node_map: Default::default(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
let errs = check_structural_invariants(&body);
|
||||
assert!(
|
||||
|
|
|
|||
390
src/ssa/ir.rs
|
|
@ -1,8 +1,10 @@
|
|||
use crate::constraint::domain::ConstValue;
|
||||
use crate::constraint::lower::ConditionExpr;
|
||||
use crate::ssa::type_facts::TypeKind;
|
||||
use petgraph::graph::NodeIndex;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use smallvec::SmallVec;
|
||||
use std::collections::HashMap;
|
||||
|
||||
/// Unique identifier for an SSA value (one per definition point).
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
|
||||
|
|
@ -12,6 +14,141 @@ pub struct SsaValue(pub u32);
|
|||
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
|
||||
pub struct BlockId(pub u32);
|
||||
|
||||
/// Interned field-name identifier, scoped to a single [`SsaBody`].
|
||||
///
|
||||
/// Different bodies may assign different `FieldId`s to the same field name,
|
||||
/// so callers MUST resolve through the owning body's [`FieldInterner`]
|
||||
/// (`SsaBody::field_name`) before using the name in cross-body contexts
|
||||
/// (e.g. summary serialization).
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
|
||||
pub struct FieldId(pub u32);
|
||||
|
||||
impl FieldId {
|
||||
/// Pointer-Phase 4 sentinel for the abstract "any element of a
|
||||
/// container" field. Steensgaard-grade precision: every numeric
|
||||
/// or dynamic index access (`arr[i]`, `arr.shift()`, `map[k]`)
|
||||
/// projects through the same `Field(pt(container), ELEM)` cell so
|
||||
/// per-element taint propagation is independent of the SSA value
|
||||
/// referencing the container.
|
||||
///
|
||||
/// `u32::MAX` is reserved by convention; the per-body
|
||||
/// [`FieldInterner`] never assigns it because interning is
|
||||
/// monotone-ascending from `0` and bodies don't approach 4 billion
|
||||
/// fields. Consumers should compare with `==` rather than reach
|
||||
/// into the wrapped `u32`.
|
||||
pub const ELEM: FieldId = FieldId(u32::MAX);
|
||||
|
||||
/// "Tainted at every field" wildcard sentinel — distinct from
|
||||
/// [`Self::ELEM`] (which is container-element semantics: every
|
||||
/// numeric/dynamic index access projects through it).
|
||||
/// `ANY_FIELD` represents the case where a writeback-shaped sink
|
||||
/// (`json.NewDecoder(r.Body).Decode(&dest)`,
|
||||
/// `proto.Unmarshal(buf, &msg)`) taints the destination wholesale
|
||||
/// without a per-field decomposition the caller can enumerate.
|
||||
/// Read by [`SsaOp::FieldProj`] as a fallback when no specific
|
||||
/// `(loc, *field)` cell exists, so subsequent struct-field reads
|
||||
/// pick up the writeback's taint without over-tainting unrelated
|
||||
/// containers' element cells. `u32::MAX - 1` is reserved
|
||||
/// alongside `ELEM` and is similarly never assigned by the per-
|
||||
/// body interner.
|
||||
pub const ANY_FIELD: FieldId = FieldId(u32::MAX - 1);
|
||||
}
|
||||
|
||||
/// Per-body interner for field names referenced by [`SsaOp::FieldProj`].
|
||||
///
|
||||
/// Names are deduped within a single SSA body: every distinct field-name
|
||||
/// string is assigned a stable `FieldId(u32)` for the lifetime of the body.
|
||||
/// The interner is serialized alongside the body so deserialization restores
|
||||
/// IDs intact; cross-body summary code is responsible for resolving names
|
||||
/// before passing them across body boundaries.
|
||||
#[derive(Clone, Debug, Default, Serialize, Deserialize)]
|
||||
pub struct FieldInterner {
|
||||
/// Names indexed by `FieldId.0`.
|
||||
names: Vec<String>,
|
||||
/// Reverse lookup: name → existing FieldId.
|
||||
#[serde(skip)]
|
||||
lookup: HashMap<String, u32>,
|
||||
}
|
||||
|
||||
impl FieldInterner {
|
||||
/// Create an empty interner.
|
||||
pub fn new() -> Self {
|
||||
Self::default()
|
||||
}
|
||||
|
||||
/// Intern a field name, returning its [`FieldId`]. Reuses the existing
|
||||
/// id if the name has already been interned.
|
||||
pub fn intern(&mut self, name: &str) -> FieldId {
|
||||
if let Some(&id) = self.lookup.get(name) {
|
||||
return FieldId(id);
|
||||
}
|
||||
let id = self.names.len() as u32;
|
||||
self.names.push(name.to_string());
|
||||
self.lookup.insert(name.to_string(), id);
|
||||
FieldId(id)
|
||||
}
|
||||
|
||||
/// Read-only lookup: returns the [`FieldId`] for `name` if it has
|
||||
/// already been interned, or `None` otherwise.
|
||||
///
|
||||
/// Used by cross-call resolvers (Pointer-Phase 5 / W3) to avoid
|
||||
/// growing the caller's interner with field names introduced
|
||||
/// solely by the callee summary — such IDs would never be referenced
|
||||
/// by any other instruction in the caller's body, so the cells
|
||||
/// would be write-only and consume space without contributing
|
||||
/// to taint flow.
|
||||
pub fn lookup(&self, name: &str) -> Option<FieldId> {
|
||||
// Fast path: consult the `lookup` table, which is populated at
// lowering time or by `ensure_lookup()` after deserialisation.
// Fallback: walk `names` linearly, so this method stays callable
// on a deserialised interner whose table has not been rebuilt yet
// (the `&self` receiver cannot rebuild it here).
|
||||
if let Some(&id) = self.lookup.get(name) {
|
||||
return Some(FieldId(id));
|
||||
}
|
||||
for (idx, n) in self.names.iter().enumerate() {
|
||||
if n == name {
|
||||
return Some(FieldId(idx as u32));
|
||||
}
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
/// Resolve a [`FieldId`] back to its interned name.
|
||||
pub fn resolve(&self, id: FieldId) -> &str {
|
||||
&self.names[id.0 as usize]
|
||||
}
|
||||
|
||||
/// Number of unique interned names.
|
||||
pub fn len(&self) -> usize {
|
||||
self.names.len()
|
||||
}
|
||||
|
||||
/// Whether the interner contains no names.
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.names.is_empty()
|
||||
}
|
||||
|
||||
/// Rebuild the reverse lookup after deserialization. Called lazily by
|
||||
/// [`Self::ensure_lookup`] so deserialized interners can still be used
|
||||
/// for further interning.
|
||||
fn rebuild_lookup(&mut self) {
|
||||
self.lookup.clear();
|
||||
for (i, n) in self.names.iter().enumerate() {
|
||||
self.lookup.entry(n.clone()).or_insert(i as u32);
|
||||
}
|
||||
}
|
||||
|
||||
/// Ensure the reverse lookup is populated (rebuilds after a serde
|
||||
/// roundtrip when the lookup table was skipped).
|
||||
pub fn ensure_lookup(&mut self) {
|
||||
if self.lookup.len() != self.names.len() {
|
||||
self.rebuild_lookup();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// SSA instruction operation.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub enum SsaOp {
|
||||
|
|
@ -20,13 +157,48 @@ pub enum SsaOp {
|
|||
/// Assignment: result depends on the listed SSA values.
|
||||
Assign(SmallVec<[SsaValue; 4]>),
|
||||
/// Function/method call.
|
||||
///
|
||||
/// `callee` is the canonical name SSA-time consumers should match on.
|
||||
/// When SSA lowering decomposes a chained-receiver method call into a
|
||||
/// `FieldProj` chain (e.g. `c.mu.Lock()` → `v_mu = FieldProj(v_c, "mu")`,
|
||||
/// `Call("Lock", [v_mu])`), `callee` carries the bare method name
|
||||
/// (`"Lock"`) and `callee_text` carries the original full path
|
||||
/// (`Some("c.mu.Lock")`). When no decomposition happens, `callee_text`
|
||||
/// is `None` and `callee` already holds the original textual form.
|
||||
Call {
|
||||
callee: String,
|
||||
/// Original textual full path when SSA decomposed a chained receiver.
|
||||
/// `None` when the callee was not rewritten — `callee` already holds
|
||||
/// the source-level textual form.
|
||||
///
|
||||
/// **Debug / display only.** Analysis code must walk the SSA receiver
|
||||
/// chain (through `FieldProj` ops) for precise field structure, or
|
||||
/// use [`crate::labels::bare_method_name`] when only the terminal
|
||||
/// method name is needed from a textual callee.
|
||||
#[doc(hidden)]
|
||||
#[serde(default)]
|
||||
callee_text: Option<String>,
|
||||
/// Per-argument SSA value uses.
|
||||
args: Vec<SmallVec<[SsaValue; 2]>>,
|
||||
/// Receiver SSA value (for method calls).
|
||||
receiver: Option<SsaValue>,
|
||||
},
|
||||
/// Field projection: read field `field` of object `receiver`.
|
||||
///
|
||||
/// Models member-access expressions (`obj.field`) as a first-class SSA
|
||||
/// op. Lowering walks the receiver tree so chained accesses like
|
||||
/// `c.writer.header` produce a chain of `FieldProj` ops with explicit
|
||||
/// per-step receivers — eliminating the textual-prefix parsing that
|
||||
/// previously misclassified deep receivers (the gin/context.go FP).
|
||||
///
|
||||
/// `field` is interned in the owning [`SsaBody`]'s [`FieldInterner`].
|
||||
/// `projected_type` carries the inferred type of the projected field
|
||||
/// when known (populated by type-fact analysis), `None` otherwise.
|
||||
FieldProj {
|
||||
receiver: SsaValue,
|
||||
field: FieldId,
|
||||
projected_type: Option<TypeKind>,
|
||||
},
|
||||
/// Taint source introduction.
|
||||
Source,
|
||||
/// Constant / literal value (no taint).
|
||||
|
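The `callee` / `callee_text` split described in the `Call` docs can be sketched on a toy IR. Everything here is hypothetical: `Op`, `lower_call`, and the use of vector indices as value ids are illustrative, and setting `callee_text` for every dotted path is an assumption, since the real lowering (in `src/ssa/lower.rs`) is not shown in this diff.

```rust
/// Toy ops (hypothetical); a value id is the op's index in the output vector.
#[derive(Debug, PartialEq)]
enum Op {
    Var(String),
    FieldProj { receiver: usize, field: String },
    Call { callee: String, callee_text: Option<String>, receiver: Option<usize> },
}

/// Decompose a dotted call path such as `c.mu.Lock` into a base variable,
/// one `FieldProj` per intermediate segment, and a `Call` on the bare name.
fn lower_call(path: &str) -> Vec<Op> {
    let parts: Vec<&str> = path.split('.').collect();
    let mut ops = Vec::new();
    if parts.len() == 1 {
        // No receiver chain: `callee` keeps the source text, `callee_text` is None.
        ops.push(Op::Call { callee: path.to_string(), callee_text: None, receiver: None });
        return ops;
    }
    ops.push(Op::Var(parts[0].to_string()));
    for field in &parts[1..parts.len() - 1] {
        let receiver = ops.len() - 1; // previous op is this step's receiver
        ops.push(Op::FieldProj { receiver, field: (*field).to_string() });
    }
    let receiver = ops.len() - 1;
    ops.push(Op::Call {
        callee: parts[parts.len() - 1].to_string(),
        callee_text: Some(path.to_string()), // original full path, display only
        receiver: Some(receiver),
    });
    ops
}

fn main() {
    for (i, op) in lower_call("c.mu.Lock").iter().enumerate() {
        println!("v{i} = {op:?}");
    }
}
```

Consumers of such an IR would match on `callee` and walk `receiver` links for field structure, leaving `callee_text` for diagnostics, which is the division of labour the doc comment asks for.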
|
@ -168,6 +340,31 @@ pub struct SsaBody {
|
|||
/// Recorded during lowering when exception edges are stripped from the CFG.
|
||||
/// Used by taint analysis to seed catch blocks with try-body taint state.
|
||||
pub exception_edges: Vec<(BlockId, BlockId)>,
|
||||
/// Per-body interner for [`SsaOp::FieldProj`] field names.
|
||||
///
|
||||
/// Empty until the lowering phase emits FieldProj ops (Phase 2 of the
|
||||
/// field-projections rollout). Cross-body callers (cross-file
|
||||
/// summaries, debug serialization) MUST resolve interned ids through
|
||||
/// this interner before transporting field references to other bodies.
|
||||
#[serde(default)]
|
||||
pub field_interner: FieldInterner,
|
||||
/// Pointer-Phase 3 / W1: side-table mapping a synthetic base-update
|
||||
/// [`SsaOp::Assign`]'s defined value back to the `(receiver, field)`
|
||||
/// pair it represents. Populated by SSA lowering at the
|
||||
/// `obj.f = rhs` synthesis point so the taint engine can recognise
|
||||
/// the synthetic assign as a structural field WRITE — the assigned
|
||||
/// value is the new "obj" value, the use is the rhs, and the side-
|
||||
/// table records `(prior_obj_value, FieldId("f"))`.
|
||||
///
|
||||
/// Empty by default; only synthetic assigns whose enclosing source
|
||||
/// statement was a dotted-path assignment (`a.b.c = …`) appear here.
|
||||
/// Lookup is `O(1)` on average (`HashMap`), and the per-body
/// table is small (one entry per synthetic chain link).
|
||||
///
|
||||
/// Serialized via `#[serde(default)]` so pre-W1 SSA blobs decode
|
||||
/// cleanly with an empty map (no migration needed).
|
||||
#[serde(default)]
|
||||
pub field_writes: HashMap<SsaValue, (SsaValue, FieldId)>,
|
||||
}
|
||||
|
||||
impl SsaBody {
|
||||
|
|
@ -190,6 +387,53 @@ impl SsaBody {
|
|||
pub fn def_of(&self, v: SsaValue) -> &ValueDef {
|
||||
&self.value_defs[v.0 as usize]
|
||||
}
|
||||
|
||||
/// Resolve a [`FieldId`] back to the interned field name within this body.
|
||||
pub fn field_name(&self, id: FieldId) -> &str {
|
||||
self.field_interner.resolve(id)
|
||||
}
|
||||
|
||||
/// Intern a field name in this body's [`FieldInterner`], returning its
|
||||
/// stable [`FieldId`].
|
||||
pub fn intern_field(&mut self, name: &str) -> FieldId {
|
||||
self.field_interner.intern(name)
|
||||
}
|
||||
}
|
||||
|
||||
impl SsaInst {
|
||||
/// Collect the SSA values used (read) by this instruction.
///
/// Returns receiver/operand values for `Call`, `Phi`, `Assign`, and
/// `FieldProj`; an empty vector for leaf ops (`Const`, `Param`, `Source`, etc.).
/// Despite the name, this returns an owned `SmallVec` rather than a lazy
/// iterator; callers that need a `Vec` should `.into_iter().collect()`.
|
||||
pub fn uses_iter(&self) -> SmallVec<[SsaValue; 4]> {
|
||||
match &self.op {
|
||||
SsaOp::Phi(operands) => operands.iter().map(|(_, v)| *v).collect(),
|
||||
SsaOp::Assign(uses) => uses.iter().copied().collect(),
|
||||
SsaOp::Call { args, receiver, .. } => {
|
||||
let mut out: SmallVec<[SsaValue; 4]> = SmallVec::new();
|
||||
if let Some(rv) = receiver {
|
||||
out.push(*rv);
|
||||
}
|
||||
for arg in args {
|
||||
out.extend(arg.iter().copied());
|
||||
}
|
||||
out
|
||||
}
|
||||
SsaOp::FieldProj { receiver, .. } => {
|
||||
let mut out: SmallVec<[SsaValue; 4]> = SmallVec::new();
|
||||
out.push(*receiver);
|
||||
out
|
||||
}
|
||||
SsaOp::Source
|
||||
| SsaOp::Const(_)
|
||||
| SsaOp::Param { .. }
|
||||
| SsaOp::SelfParam
|
||||
| SsaOp::CatchParam
|
||||
| SsaOp::Nop
|
||||
| SsaOp::Undef => SmallVec::new(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Errors that can occur during SSA lowering.
|
||||
|
|
@ -211,3 +455,149 @@ impl std::fmt::Display for SsaError {
|
|||
}
|
||||
|
||||
impl std::error::Error for SsaError {}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn field_interner_dedupes_names() {
|
||||
let mut interner = FieldInterner::new();
|
||||
let a = interner.intern("mu");
|
||||
let b = interner.intern("mu");
|
||||
let c = interner.intern("writer");
|
||||
assert_eq!(a, b, "interning same name twice yields same id");
|
||||
assert_ne!(a, c, "different names get different ids");
|
||||
assert_eq!(interner.resolve(a), "mu");
|
||||
assert_eq!(interner.resolve(c), "writer");
|
||||
assert_eq!(interner.len(), 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn field_interner_serde_roundtrip_rebuilds_lookup() {
|
||||
let mut interner = FieldInterner::new();
|
||||
let a = interner.intern("mu");
|
||||
let b = interner.intern("writer");
|
||||
let json = serde_json::to_string(&interner).expect("serialize");
|
||||
let mut restored: FieldInterner = serde_json::from_str(&json).expect("deserialize");
|
||||
assert_eq!(restored.resolve(a), "mu");
|
||||
assert_eq!(restored.resolve(b), "writer");
|
||||
// After ensure_lookup, intern("mu") returns the original id (not a new one).
|
||||
restored.ensure_lookup();
|
||||
assert_eq!(restored.intern("mu"), a);
|
||||
assert_eq!(restored.intern("header"), FieldId(2));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn field_proj_use_iter_includes_receiver() {
|
||||
let inst = SsaInst {
|
||||
value: SsaValue(3),
|
||||
op: SsaOp::FieldProj {
|
||||
receiver: SsaValue(1),
|
||||
field: FieldId(0),
|
||||
projected_type: None,
|
||||
},
|
||||
cfg_node: NodeIndex::new(0),
|
||||
var_name: Some("c.mu".into()),
|
||||
span: (0, 0),
|
||||
};
|
||||
let uses: Vec<SsaValue> = inst.uses_iter().into_iter().collect();
|
||||
assert_eq!(uses, vec![SsaValue(1)]);
|
||||
}
|
||||
|
||||
/// Pointer-Phase 4 / A6 audit: the [`FieldId::ELEM`] sentinel is
|
||||
/// reserved for "any element of a container". The interner assigns
|
||||
/// IDs monotonically from `0`, so the sentinel `u32::MAX` can only
|
||||
/// collide if the body declares ~4 billion fields — a corner case
|
||||
/// no realistic codebase reaches. Pin the contract with a stress
|
||||
/// loop so future implementation drift can't silently shift IDs to
|
||||
/// the sentinel value.
|
||||
#[test]
|
||||
fn field_interner_never_assigns_elem_sentinel() {
|
||||
let mut interner = FieldInterner::new();
|
||||
for i in 0..1024 {
|
||||
let id = interner.intern(&format!("f{i}"));
|
||||
assert_ne!(
|
||||
id,
|
||||
FieldId::ELEM,
|
||||
"intern('f{i}') yielded the ELEM sentinel — invariant broken",
|
||||
);
|
||||
}
|
||||
// Interning the literal string `<elem>` (the marker W3 uses to
// round-trip container-element flow through summaries) must yield
// an ordinary id, never the sentinel. The wire format keeps
// `<elem>` as a *string marker* that never goes through `intern`;
// callers compare explicitly against `FieldId::ELEM` instead.
|
||||
assert_ne!(interner.intern("<elem>"), FieldId::ELEM);
|
||||
}
|
||||
|
||||
/// A6: the `<elem>` marker round-trips through extraction →
|
||||
/// SQLite → caller-side translation without colliding with a
|
||||
/// caller-interned `<elem>` field. When a caller's body has its
|
||||
/// own `<elem>` field, that gets a regular FieldId, distinct from
|
||||
/// the sentinel.
|
||||
#[test]
|
||||
fn elem_marker_distinct_from_interner_assigned_id() {
|
||||
let mut interner = FieldInterner::new();
|
||||
let lit_elem = interner.intern("<elem>");
|
||||
// Sentinel still compares equal to itself only.
|
||||
assert_eq!(FieldId::ELEM, FieldId(u32::MAX));
|
||||
assert_ne!(lit_elem, FieldId::ELEM);
|
||||
// Resolve the literal-string id back to its interned name.
|
||||
assert_eq!(interner.resolve(lit_elem), "<elem>");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn field_proj_serde_roundtrip_with_field_name() {
|
||||
// Build a tiny body with one FieldProj op and check that the
|
||||
// body's interner survives round-trip and the id resolves back
|
||||
// to the original name.
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![ValueDef {
|
||||
var_name: Some("c".into()),
|
||||
cfg_node: NodeIndex::new(0),
|
||||
block: BlockId(0),
|
||||
}],
|
||||
cfg_node_map: HashMap::new(),
|
||||
exception_edges: vec![],
|
||||
field_interner: FieldInterner::new(),
|
||||
field_writes: HashMap::new(),
|
||||
};
|
||||
let fid = body.intern_field("mu");
|
||||
body.blocks[0].body.push(SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::FieldProj {
|
||||
receiver: SsaValue(0),
|
||||
field: fid,
|
||||
projected_type: None,
|
||||
},
|
||||
cfg_node: NodeIndex::new(0),
|
||||
var_name: Some("c.mu".into()),
|
||||
span: (0, 0),
|
||||
});
|
||||
|
||||
let json = serde_json::to_string(&body).expect("serialize body");
|
||||
let restored: SsaBody = serde_json::from_str(&json).expect("deserialize body");
|
||||
|
||||
let inst = &restored.blocks[0].body[0];
|
||||
match &inst.op {
|
||||
SsaOp::FieldProj {
|
||||
receiver, field, ..
|
||||
} => {
|
||||
assert_eq!(*receiver, SsaValue(0));
|
||||
assert_eq!(restored.field_name(*field), "mu");
|
||||
}
|
||||
other => panic!("expected FieldProj, got {other:?}"),
|
||||
}
|
||||
}
|
||||
}
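The tests above depend on one property: the interner never hands out the reserved sentinel id, so a caller-interned `"<elem>"` string gets an ordinary id. A minimal sketch of that property follows. `ToyFieldId` and `ToyInterner` are hypothetical stand-ins, not the crate's `FieldId` / `FieldInterner`; only the `u32::MAX` sentinel value comes from the assertions above, and resolving the sentinel to `"<elem>"` is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `FieldId`. Only the `u32::MAX` sentinel
/// value is taken from the test assertions above.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ToyFieldId(pub u32);

impl ToyFieldId {
    /// Reserved marker; `ToyInterner::intern` never returns it.
    pub const ELEM: ToyFieldId = ToyFieldId(u32::MAX);
}

/// Hypothetical interner: ids are dense indices into `names`.
#[derive(Default)]
pub struct ToyInterner {
    names: Vec<String>,
    ids: HashMap<String, ToyFieldId>,
}

impl ToyInterner {
    /// Return the existing id for `name`, or assign the next dense index.
    pub fn intern(&mut self, name: &str) -> ToyFieldId {
        if let Some(id) = self.ids.get(name) {
            return *id;
        }
        let id = ToyFieldId(self.names.len() as u32);
        // Dense indices only reach u32::MAX after ~4 billion fields;
        // fail loudly rather than collide with the sentinel.
        assert_ne!(id, ToyFieldId::ELEM, "interner exhausted");
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    /// Map an id back to its name. Rendering the sentinel as "<elem>"
    /// is an assumption of this sketch.
    pub fn resolve(&self, id: ToyFieldId) -> &str {
        if id == ToyFieldId::ELEM {
            return "<elem>";
        }
        &self.names[id.0 as usize]
    }
}
```

Because the sentinel sits outside the dense index range, callers can compare against `ToyFieldId::ELEM` directly without consulting the interner.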
|
||||
|
|
|
|||
1374
src/ssa/lower.rs
File diff suppressed because it is too large
|
|
@@ -21,6 +21,7 @@ pub use lower::lower_to_ssa_scoped_nop;
|
|||
pub use lower::lower_to_ssa_with_params;
|
||||
|
||||
use crate::cfg::Cfg;
|
||||
use crate::ssa::type_facts::TypeKind;
|
||||
use crate::symbol::Lang;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use std::collections::HashMap;
|
||||
|
|
@@ -51,6 +52,19 @@ pub struct OptimizeResult {
|
|||
///
|
||||
/// Pipeline: const propagation → branch pruning → copy propagation → DCE → type facts.
|
||||
pub fn optimize_ssa(body: &mut SsaBody, cfg: &Cfg, lang: Option<Lang>) -> OptimizeResult {
|
||||
optimize_ssa_with_param_types(body, cfg, lang, &[])
|
||||
}
|
||||
|
||||
/// Same as [`optimize_ssa`] but seeds [`SsaOp::Param`] values with
|
||||
/// per-position [`TypeKind`] facts derived from the function's
|
||||
/// `BodyMeta.param_types`. Strictly additive: an empty slice or
|
||||
/// `None` entries leave the type-fact analysis behaviour unchanged.
|
||||
pub fn optimize_ssa_with_param_types(
|
||||
body: &mut SsaBody,
|
||||
cfg: &Cfg,
|
||||
lang: Option<Lang>,
|
||||
param_types: &[Option<TypeKind>],
|
||||
) -> OptimizeResult {
|
||||
// 1. Constant propagation (SCCP)
|
||||
let cp = const_prop::const_propagate(body);
|
||||
let branches_pruned = const_prop::apply_const_prop(body, &cp);
|
||||
|
|
@@ -65,7 +79,8 @@ pub fn optimize_ssa(body: &mut SsaBody, cfg: &Cfg, lang: Option<Lang>) -> Optimi
|
|||
let dead_defs_removed = dce::eliminate_dead_defs(body, cfg);
|
||||
|
||||
// 5. Type fact analysis (uses const prop results + language for constructor inference)
|
||||
let type_facts = type_facts::analyze_types(body, cfg, &cp.values, lang);
|
||||
let type_facts =
|
||||
type_facts::analyze_types_with_param_types(body, cfg, &cp.values, lang, param_types);
|
||||
|
||||
// 6. Points-to analysis (uses allocation site detection + SSA def-use)
|
||||
let points_to = heap::analyze_points_to(body, cfg, lang);
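The "strictly additive" contract of `optimize_ssa_with_param_types` comes down to a lookup that falls back to the unknown fact for an empty slice, a `None` entry, or an out-of-range index. A minimal sketch of that lookup, assuming a hypothetical trimmed-down `ParamKind` enum in place of the crate's `TypeKind`:

```rust
/// Hypothetical, trimmed-down stand-in for the crate's `TypeKind`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ParamKind {
    Unknown,
    Int,
    String,
}

/// Type fact for the parameter at position `index`.
///
/// `None` entries and out-of-range indices both fall back to `Unknown`,
/// so passing `&[]` reproduces the unseeded behaviour exactly.
pub fn seed_param(param_types: &[Option<ParamKind>], index: usize) -> ParamKind {
    param_types
        .get(index)              // Option<&Option<ParamKind>>
        .and_then(|t| t.clone()) // flatten to Option<ParamKind>
        .unwrap_or(ParamKind::Unknown)
}
```

This is why the older entry point can simply delegate with `&[]`: every parameter then takes the fallback branch.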
|
||||
|
|
|
|||
|
|
@@ -415,6 +415,8 @@ mod tests {
|
|||
value_defs,
|
||||
cfg_node_map: HashMap::new(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@@ -606,6 +608,7 @@ mod tests {
|
|||
0,
|
||||
SsaOp::Call {
|
||||
callee: "list".to_string(),
|
||||
callee_text: None,
|
||||
args: vec![],
|
||||
receiver: None,
|
||||
},
|
||||
|
|
|
|||
|
|
@@ -4,6 +4,7 @@
|
|||
//! across all supported languages so that taint flows correctly through
|
||||
//! collection operations.
|
||||
|
||||
use crate::labels::bare_method_name;
|
||||
use crate::symbol::Lang;
|
||||
use smallvec::SmallVec;
|
||||
|
||||
|
|
@@ -29,6 +30,14 @@ pub enum ContainerOp {
|
|||
/// `index_arg`: same semantics as `Store::index_arg` — when present and
|
||||
/// provably constant, loads from `HeapSlot::Index(n)`.
|
||||
Load { index_arg: Option<usize> },
|
||||
/// Taint flows from the receiver container into the argument at
|
||||
/// `dest_arg` — i.e. the "writeback" pattern where a method writes its
|
||||
/// decoded/loaded value into a caller-supplied destination rather than
|
||||
/// returning it. Used for the Go `*.Decode(&dest)` family
|
||||
/// (`json.Decoder.Decode`, `xml.Decoder.Decode`, `gob.Decoder.Decode`),
|
||||
/// where `r.Body → json.NewDecoder(r.Body).Decode(&dest)` should taint
|
||||
/// `dest` even though `Decode` returns only an `error`.
|
||||
Writeback { dest_arg: usize },
|
||||
}
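The `Writeback` variant describes taint moving from the receiver into a caller-supplied destination argument instead of into the return value. A minimal sketch of that propagation step, assuming a toy model where taint is just a set of variable names; `apply_writeback` is a hypothetical helper, not a function from the crate.

```rust
use std::collections::HashSet;

/// Hypothetical toy model: taint is tracked as a set of variable names.
///
/// If `receiver` is tainted, the argument at position `dest_arg`
/// becomes tainted too. Returns `true` when the taint set changed.
pub fn apply_writeback(
    tainted: &mut HashSet<String>,
    receiver: &str,
    args: &[&str],
    dest_arg: usize,
) -> bool {
    // A clean receiver has nothing to write back.
    if !tainted.contains(receiver) {
        return false;
    }
    match args.get(dest_arg) {
        // `insert` reports whether the destination was newly tainted.
        Some(dest) => tainted.insert((*dest).to_string()),
        // Call has no argument at that position: nothing to do.
        None => false,
    }
}
```

For `decoder.Decode(&dest)` with a tainted `decoder`, calling `apply_writeback(&mut tainted, "decoder", &["dest"], 0)` marks `dest`, even though the call's return value (an `error`) stays clean.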
|
||||
|
||||
/// Convenience: store with a single value argument, no index tracking.
|
||||
|
|
@@ -92,7 +101,7 @@ fn load_indexed(idx_pos: usize) -> Option<ContainerOp> {
|
|||
/// Returns `None` if the callee is not a recognised container operation.
|
||||
pub fn classify_container_op(callee: &str, lang: Lang) -> Option<ContainerOp> {
|
||||
// Extract method name: last segment after '.' (or full name if no dot).
|
||||
let method = callee.rsplit('.').next().unwrap_or(callee);
|
||||
let method = bare_method_name(callee);
|
||||
|
||||
match lang {
|
||||
Lang::JavaScript | Lang::TypeScript => classify_js(method),
|
||||
|
|
@@ -121,6 +130,10 @@ fn classify_js(method: &str) -> Option<ContainerOp> {
|
|||
// map.get(key) — key at 0
|
||||
"get" => load_indexed(0),
|
||||
"values" | "keys" | "entries" => load(),
|
||||
// Pointer-Phase 6 / W5: synthetic callees emitted by CFG
|
||||
// lowering for subscript reads/writes (`arr[i]`, `arr[i] = v`).
|
||||
"__index_get__" => load_indexed(0),
|
||||
"__index_set__" => store_indexed(1, 0),
|
||||
_ => None,
|
||||
}
|
||||
}
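The synthetic `__index_get__` / `__index_set__` callees let subscript reads and writes reuse the container classification: index at argument 0, stored value at argument 1. The sketch below illustrates how a constant index could select a per-index slot while a non-constant index falls back to a whole-container view. Everything here (`ToyOp`, `Arg`, `ToyContainer`, `apply`) is hypothetical, and the slot semantics are an assumption of this sketch rather than the crate's `HeapSlot` behaviour.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum ToyOp {
    Load { index_arg: Option<usize> },
    Store { value_arg: usize, index_arg: Option<usize> },
}

/// Same argument positions as the classifier arms above.
pub fn classify_synthetic(callee: &str) -> Option<ToyOp> {
    match callee {
        "__index_get__" => Some(ToyOp::Load { index_arg: Some(0) }),
        "__index_set__" => Some(ToyOp::Store { value_arg: 1, index_arg: Some(0) }),
        _ => None,
    }
}

/// A call argument: a provable constant, or a runtime value with a taint bit.
#[derive(Clone, Copy, Debug)]
pub enum Arg {
    Const(i64),
    Dynamic { tainted: bool },
}

#[derive(Default)]
pub struct ToyContainer {
    slots: HashMap<i64, bool>, // taint per constant index
    unknown_index: bool,       // taint stored under a non-constant index
}

impl ToyContainer {
    fn store(&mut self, index: Option<i64>, tainted: bool) {
        match index {
            Some(n) => *self.slots.entry(n).or_insert(false) |= tainted,
            None => self.unknown_index |= tainted,
        }
    }

    fn load(&self, index: Option<i64>) -> bool {
        match index {
            Some(n) => self.unknown_index || self.slots.get(&n).copied().unwrap_or(false),
            None => self.unknown_index || self.slots.values().any(|t| *t),
        }
    }
}

/// Pick out the index argument only when it is a provable constant.
fn const_index(args: &[Arg], index_arg: Option<usize>) -> Option<i64> {
    match index_arg.and_then(|i| args.get(i)) {
        Some(Arg::Const(n)) => Some(*n),
        _ => None,
    }
}

/// Returns the taint that was loaded or stored, or `None` for other callees.
pub fn apply(callee: &str, args: &[Arg], c: &mut ToyContainer) -> Option<bool> {
    match classify_synthetic(callee)? {
        ToyOp::Load { index_arg } => Some(c.load(const_index(args, index_arg))),
        ToyOp::Store { value_arg, index_arg } => {
            let tainted = match args.get(value_arg) {
                Some(Arg::Dynamic { tainted }) => *tainted,
                _ => false, // constants carry no taint in this sketch
            };
            c.store(const_index(args, index_arg), tainted);
            Some(tainted)
        }
    }
}
```

With this model, writing a tainted value to `arr[0]` and a clean one to `arr[1]` keeps a later read of `arr[1]` clean, while a read through an unknown index conservatively reports taint.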
|
||||
|
|
@@ -140,6 +153,10 @@ fn classify_python(method: &str) -> Option<ContainerOp> {
|
|||
"get" => load_indexed(0), // dict.get(key) / list index — key/index at 0
|
||||
"items" | "values" | "keys" => load(),
|
||||
"join" => load(),
|
||||
// Pointer-Phase 6 / W5: synthetic callees emitted by CFG
|
||||
// lowering for subscript reads/writes (`arr[i]`, `arr[i] = v`).
|
||||
"__index_get__" => load_indexed(0),
|
||||
"__index_set__" => store_indexed(1, 0),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
|
@@ -173,6 +190,24 @@ fn classify_go(method: &str, callee: &str) -> Option<ContainerOp> {
|
|||
match method {
|
||||
"Add" | "Set" | "Store" | "Put" => store(0),
|
||||
"Get" | "Load" | "Pop" => load(),
|
||||
// Stream-decoder writeback. In Go, the canonical decode pattern
|
||||
// takes a destination as the sole positional argument and returns
|
||||
// only an `error`:
|
||||
// decoder := json.NewDecoder(r.Body)
|
||||
// decoder.Decode(&dest)
|
||||
// The decoder's receiver chain carries the source taint
|
||||
// (`r.Body` → `json.NewDecoder(r.Body)` → `decoder`); without a
|
||||
// writeback rule, the destination stays clean and downstream sinks
|
||||
// miss the flow. `Unmarshal` is the matching sibling pattern on
|
||||
// top-level decoders (e.g. `proto.Unmarshal(buf, &msg)`); the
|
||||
// method-call form has the bytes carried via the receiver, not arg 0,
|
||||
// so it lines up with the writeback contract just like `Decode`.
|
||||
"Decode" | "Unmarshal" => Some(ContainerOp::Writeback { dest_arg: 0 }),
|
||||
// Pointer-Phase 6 / W5: synthetic callees emitted by CFG
|
||||
// lowering for Go index_expression reads/writes (`arr[i]`,
|
||||
// `m[k] = v`).
|
||||
"__index_get__" => load_indexed(0),
|
||||
"__index_set__" => store_indexed(1, 0),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
|
@@ -195,9 +230,22 @@ fn classify_php(method: &str) -> Option<ContainerOp> {
|
|||
|
||||
fn classify_cpp(method: &str) -> Option<ContainerOp> {
|
||||
match method {
|
||||
"push_back" | "emplace_back" | "insert" | "emplace" | "push" => store(0),
|
||||
"front" | "back" | "pop_back" | "pop_front" | "top" => load(),
|
||||
// vector.at(index) — index at 0
|
||||
// Mutating container operations.
|
||||
// `assign` overwrites the container's contents with the argument
|
||||
// sequence — modeled as Store so the receiver inherits the argument
|
||||
// taint, matching the runtime "the values now live inside this
|
||||
// container" semantics shared with `push_back`/`emplace_back`.
|
||||
"push_back" | "emplace_back" | "insert" | "emplace" | "push" | "assign" => store(0),
|
||||
// Map/unordered_map insertion: `m.insert_or_assign(k, v)` — value at 1.
|
||||
"insert_or_assign" => store_indexed(1, 0),
|
||||
// Read-only container observers. `find`/`count` return iterators or
|
||||
// counts that carry the container's value taint when queried with a
|
||||
// tainted needle; `data` returns a pointer to the underlying buffer
|
||||
// (its real identity-passthrough behaviour for `c_str`/`data` is
|
||||
// refined in the labels phase, but Load propagation gives us the
|
||||
// baseline cap-flow without further plumbing).
|
||||
"front" | "back" | "pop_back" | "pop_front" | "top" | "find" | "count" | "data" => load(),
|
||||
// Indexed reads: `vector::at(i)`, `unordered_map::at(k)`.
|
||||
"at" => load_indexed(0),
|
||||
_ => None,
|
||||
}
|
||||
|
|
@@ -255,6 +303,40 @@ mod tests {
|
|||
assert!(matches!(op, Some(ContainerOp::Store { .. })));
|
||||
}
|
||||
|
||||
// CVE Hunt Session 2 (Owncast CVE-2023-3188 / CVE-2024-31450 family):
|
||||
// Go `*.Decode(&dest)` is the canonical streaming-decoder writeback —
|
||||
// `json.NewDecoder(r.Body).Decode(&dest)`, `xml.NewDecoder(r).Decode(&out)`,
|
||||
// `gob.NewDecoder(buf).Decode(&v)`. The decoder receiver carries the
|
||||
// source taint and the destination is arg 0; the writeback rule is the
|
||||
// only way taint reaches `dest` because `Decode` itself returns only
|
||||
// `error`. The same-shape `Unmarshal` pattern (`proto.Unmarshal`,
|
||||
// `tar.Header.Unmarshal`) on a typed receiver follows the same contract.
|
||||
#[test]
|
||||
fn go_decode_is_writeback_dest_arg_zero() {
|
||||
match classify_container_op("decoder.Decode", Lang::Go) {
|
||||
Some(ContainerOp::Writeback { dest_arg }) => assert_eq!(dest_arg, 0),
|
||||
other => panic!("expected Writeback {{ dest_arg: 0 }}, got {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn go_unmarshal_is_writeback_dest_arg_zero() {
|
||||
match classify_container_op("hdr.Unmarshal", Lang::Go) {
|
||||
Some(ContainerOp::Writeback { dest_arg }) => assert_eq!(dest_arg, 0),
|
||||
other => panic!("expected Writeback {{ dest_arg: 0 }}, got {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn js_decode_is_not_writeback() {
|
||||
// The Writeback rule is a Go-specific pattern; JS/TS `decode`
|
||||
// helpers (`Buffer.from(s, 'base64').toString()` etc.) return their
|
||||
// result and don't have a writeback contract. Make sure we didn't
|
||||
// accidentally widen the rule into other languages.
|
||||
assert!(classify_container_op("decoder.Decode", Lang::JavaScript).is_none());
|
||||
assert!(classify_container_op("decoder.Decode", Lang::Python).is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unknown_method_is_none() {
|
||||
assert!(classify_container_op("obj.frobnicate", Lang::JavaScript).is_none());
|
||||
|
|
@@ -311,4 +393,102 @@ mod tests {
|
|||
panic!("expected Load");
|
||||
}
|
||||
}
|
||||
|
||||
// ── C++ Phase 1 additions ──────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn cpp_push_back_is_store() {
|
||||
let op = classify_container_op("v.push_back", Lang::Cpp);
|
||||
match op {
|
||||
Some(ContainerOp::Store {
|
||||
value_args,
|
||||
index_arg,
|
||||
}) => {
|
||||
assert_eq!(value_args.as_slice(), &[0]);
|
||||
assert_eq!(index_arg, None);
|
||||
}
|
||||
_ => panic!("expected Store"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cpp_assign_is_store() {
|
||||
// vector::assign(args) overwrites the container's contents — the
|
||||
// receiver inherits argument taint just like push_back.
|
||||
let op = classify_container_op("v.assign", Lang::Cpp);
|
||||
assert!(matches!(op, Some(ContainerOp::Store { .. })));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cpp_insert_or_assign_indexes_value() {
|
||||
// map::insert_or_assign(key, value) — value is at arg 1, key at arg 0.
|
||||
match classify_container_op("m.insert_or_assign", Lang::Cpp) {
|
||||
Some(ContainerOp::Store {
|
||||
value_args,
|
||||
index_arg,
|
||||
}) => {
|
||||
assert_eq!(value_args.as_slice(), &[1]);
|
||||
assert_eq!(index_arg, Some(0));
|
||||
}
|
||||
other => panic!("expected indexed Store, got {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cpp_find_count_data_are_load() {
|
||||
for callee in ["m.find", "m.count", "v.data"] {
|
||||
assert!(
|
||||
matches!(
|
||||
classify_container_op(callee, Lang::Cpp),
|
||||
Some(ContainerOp::Load { .. })
|
||||
),
|
||||
"{callee} should be a Load",
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cpp_at_is_indexed_load() {
|
||||
match classify_container_op("v.at", Lang::Cpp) {
|
||||
Some(ContainerOp::Load { index_arg }) => assert_eq!(index_arg, Some(0)),
|
||||
other => panic!("expected indexed Load, got {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
/// W5: synthetic `__index_get__` is recognised as an indexed load
|
||||
/// in JS/TS, Python, and Go — driving the index_arg=0 path so a
|
||||
/// constant-key subscript read flows through `HeapSlot::Index(n)`.
|
||||
#[test]
|
||||
fn synth_index_get_classified_as_indexed_load_js_py_go() {
|
||||
for lang in [Lang::JavaScript, Lang::TypeScript, Lang::Python, Lang::Go] {
|
||||
match classify_container_op("__index_get__", lang) {
|
||||
Some(ContainerOp::Load { index_arg }) => {
|
||||
assert_eq!(index_arg, Some(0), "{lang:?} should mark idx arg=0");
|
||||
}
|
||||
other => panic!("{lang:?}: expected indexed Load, got {other:?}"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// W5: synthetic `__index_set__` is recognised as an indexed store
|
||||
/// in JS/TS, Python, and Go — value at arg 1, index at arg 0.
|
||||
#[test]
|
||||
fn synth_index_set_classified_as_indexed_store_js_py_go() {
|
||||
for lang in [Lang::JavaScript, Lang::TypeScript, Lang::Python, Lang::Go] {
|
||||
match classify_container_op("__index_set__", lang) {
|
||||
Some(ContainerOp::Store {
|
||||
value_args,
|
||||
index_arg,
|
||||
}) => {
|
||||
assert_eq!(
|
||||
value_args.as_slice(),
|
||||
&[1],
|
||||
"{lang:?} value arg should be 1"
|
||||
);
|
||||
assert_eq!(index_arg, Some(0), "{lang:?} index arg should be 0");
|
||||
}
|
||||
other => panic!("{lang:?}: expected indexed Store, got {other:?}"),
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@@ -256,6 +256,7 @@ pub fn analyze(
|
|||
callee,
|
||||
args,
|
||||
receiver,
|
||||
..
|
||||
} => {
|
||||
if candidates.contains_key(&inst.value) && is_rust_map_constructor(callee) {
|
||||
continue;
|
||||
|
|
@@ -437,6 +438,8 @@ mod tests {
|
|||
value_defs: vec![],
|
||||
cfg_node_map: std::collections::HashMap::new(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
let cfg: Cfg = Graph::new();
|
||||
let const_values = HashMap::new();
|
||||
|
|
|
|||
|
|
@@ -1,6 +1,6 @@
|
|||
#![allow(clippy::if_same_then_else)]
|
||||
|
||||
use std::collections::HashMap;
|
||||
use std::collections::{BTreeMap, HashMap};
|
||||
|
||||
use super::const_prop::ConstLattice;
|
||||
use super::ir::*;
|
||||
|
|
@@ -32,6 +32,40 @@ pub enum TypeKind {
|
|||
/// `label_prefix` — it never participates in label-based callee
|
||||
/// resolution.
|
||||
LocalCollection,
|
||||
/// Phase 6: a framework-injected DTO body whose field types are
|
||||
/// known. Populated only when a parameter is recognised as a typed
|
||||
/// extractor by a Phase 1-2 matcher AND the DTO class / struct /
|
||||
/// Pydantic model is resolvable in the current scan scope.
|
||||
/// Strictly additive — when no DTO definition is found, callers
|
||||
/// fall through to today's pre-Phase-6 behaviour.
|
||||
Dto(DtoFields),
|
||||
}
|
||||
|
||||
/// Phase 6: structural carrier for a recognised DTO type. Maps
|
||||
/// declared field names to their inferred [`TypeKind`]. Nested DTOs
|
||||
/// use [`TypeKind::Dto`] recursively.
|
||||
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub struct DtoFields {
|
||||
pub class_name: String,
|
||||
/// Sorted-by-key map for stable iteration / serialisation.
|
||||
pub fields: BTreeMap<String, TypeKind>,
|
||||
}
|
||||
|
||||
impl DtoFields {
|
||||
pub fn new(class_name: impl Into<String>) -> Self {
|
||||
Self {
|
||||
class_name: class_name.into(),
|
||||
fields: BTreeMap::new(),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn insert(&mut self, field: impl Into<String>, kind: TypeKind) {
|
||||
self.fields.insert(field.into(), kind);
|
||||
}
|
||||
|
||||
pub fn get(&self, field: &str) -> Option<&TypeKind> {
|
||||
self.fields.get(field)
|
||||
}
|
||||
}
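Since nested DTOs reuse the `Dto` variant recursively, resolving a chain such as `user.address.zip` is a walk through successive field maps. A minimal sketch of that walk, assuming hypothetical `ToyKind` / `ToyDto` types that mirror the shape of `TypeKind::Dto` and `DtoFields` without the serde derives; `resolve_path` is an illustrative helper, not part of the crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of `TypeKind`, reduced to a few variants.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ToyKind {
    Unknown,
    Int,
    String,
    Dto(ToyDto),
}

/// Hypothetical mirror of `DtoFields`: class name plus a sorted field map.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ToyDto {
    pub class_name: String,
    pub fields: BTreeMap<String, ToyKind>,
}

/// Walk a field path through nested DTOs.
///
/// Returns `None` as soon as a step projects from a non-DTO type or
/// names a field the DTO does not declare, which leaves the caller free
/// to fall back to its default behaviour.
pub fn resolve_path<'a>(root: &'a ToyKind, path: &[&str]) -> Option<&'a ToyKind> {
    let mut cur = root;
    for field in path {
        match cur {
            ToyKind::Dto(d) => cur = d.fields.get(*field)?,
            _ => return None,
        }
    }
    Some(cur)
}
```

The `BTreeMap` gives the recursive type its indirection and keeps iteration order stable, matching the "sorted-by-key" note on the `fields` member.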
|
||||
|
||||
impl TypeKind {
|
||||
|
|
@@ -47,6 +81,38 @@ impl TypeKind {
|
|||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Container name used by the typed call-graph devirtualisation
|
||||
/// (`docs/typed-call-graph-prompt.md`, Phase 2).
|
||||
///
|
||||
/// Returns the class / impl / module string under which an SSA
|
||||
/// receiver value of this type would be looked up in
|
||||
/// [`crate::callgraph::ClassMethodIndex`]. Mirrors
|
||||
/// [`Self::label_prefix`] for the security-relevant abstract
|
||||
/// types (HttpClient → `"HttpClient"`, DatabaseConnection →
|
||||
/// `"DatabaseConnection"`, etc.) and additionally returns the DTO
|
||||
/// class name for [`TypeKind::Dto`] receivers.
|
||||
///
|
||||
/// Scalar / unknown types return `None` — they have no defining
|
||||
/// container and would not narrow a method-call edge meaningfully.
|
||||
pub fn container_name(&self) -> Option<String> {
|
||||
if let Some(prefix) = self.label_prefix() {
|
||||
return Some(prefix.to_string());
|
||||
}
|
||||
if let Self::Dto(d) = self {
|
||||
return Some(d.class_name.clone());
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
/// Phase 6: convenience accessor for the inner `DtoFields` if this
|
||||
/// type is a recognised DTO.
|
||||
pub fn as_dto(&self) -> Option<&DtoFields> {
|
||||
match self {
|
||||
Self::Dto(d) => Some(d),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// A type fact about an SSA value.
|
||||
|
|
@@ -79,6 +145,13 @@ impl TypeFact {
|
|||
};
|
||||
TypeFact { kind, nullable }
|
||||
}
|
||||
|
||||
/// Phase 6: factory used by the field-access propagation rule.
|
||||
pub(crate) fn from_dto_field(receiver: &TypeKind, field: &str) -> Option<Self> {
|
||||
let dto = receiver.as_dto()?;
|
||||
let kind = dto.get(field)?.clone();
|
||||
Some(Self::from_kind(kind))
|
||||
}
|
||||
}
|
||||
|
||||
/// Result of type fact analysis.
|
||||
|
|
@@ -107,32 +180,41 @@ impl TypeFactResult {
|
|||
}
|
||||
}
|
||||
|
||||
/// Check whether the given sink-operand SSA values are all int-typed for the
|
||||
/// sink's capability set. Returns `false` when `sink_caps` carries no
|
||||
/// type-suppressible bits, when `values` is empty, or when any value is not
|
||||
/// known to be `TypeKind::Int`. Shared by the SSA taint engine and the
|
||||
/// structural `cfg-unguarded-sink` analysis so both agree on when a sink's
|
||||
/// arguments are provably non-injectable.
|
||||
/// Check whether the given sink-operand SSA values are all type-safe for
|
||||
/// the sink's capability set. Returns `false` when `sink_caps` carries
|
||||
/// no type-suppressible bits, when `values` is empty, or when any value
|
||||
/// is not known to be a payload-incompatible scalar type. Shared by
|
||||
/// the SSA taint engine and the structural `cfg-unguarded-sink`
|
||||
/// analysis so both agree on when a sink's arguments are provably
|
||||
/// non-injectable.
|
||||
///
|
||||
/// Suppression policy:
|
||||
/// * [`TypeKind::Int`] (and float, treated as numeric): suppresses
|
||||
/// `SQL_QUERY`, `FILE_IO`, `SHELL_ESCAPE`, `HTML_ESCAPE`, `SSRF` —
|
||||
/// numeric values cannot carry the metacharacters required to drive
|
||||
/// any of these injection classes.
|
||||
/// * [`TypeKind::Bool`]: suppresses every type-suppressible bit —
|
||||
/// `true`/`false` cannot carry a payload of any kind.
|
||||
pub fn is_type_safe_for_sink(
|
||||
values: &[SsaValue],
|
||||
sink_caps: crate::labels::Cap,
|
||||
type_facts: &TypeFactResult,
|
||||
) -> bool {
|
||||
use crate::labels::Cap;
|
||||
// Int-typed values cannot carry injection payloads for these caps:
|
||||
// SQL_QUERY — digits can't form meta SQL tokens
|
||||
// FILE_IO — digits can't form path traversal sequences
|
||||
// SHELL_ESCAPE — digits can't form shell metacharacters
|
||||
// HTML_ESCAPE — digits can't form HTML metachars (<, >, ", ', &, /, :)
|
||||
// in either text or attribute context
|
||||
let type_suppressible = Cap::SQL_QUERY | Cap::FILE_IO | Cap::SHELL_ESCAPE | Cap::HTML_ESCAPE;
|
||||
let type_suppressible =
|
||||
Cap::SQL_QUERY | Cap::FILE_IO | Cap::SHELL_ESCAPE | Cap::HTML_ESCAPE | Cap::SSRF;
|
||||
if !sink_caps.intersects(type_suppressible) {
|
||||
return false;
|
||||
}
|
||||
if values.is_empty() {
|
||||
return false;
|
||||
}
|
||||
values.iter().all(|v| type_facts.is_int(*v))
|
||||
values.iter().all(|v| {
|
||||
let Some(kind) = type_facts.get_type(*v) else {
|
||||
return false;
|
||||
};
|
||||
matches!(kind, TypeKind::Int | TypeKind::Bool)
|
||||
})
|
||||
}
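The suppression check has three gates: the sink must carry at least one type-suppressible capability, the operand list must be non-empty, and every operand must have a known `Int` or `Bool` type. A minimal sketch of those gates, assuming hypothetical `u32` bit constants in place of the crate's `Cap` flags (the real bit assignments are not shown in this diff) and a hypothetical `SinkKind` enum:

```rust
// Hypothetical bit assignments; the crate's real `Cap` values may differ.
pub const SQL_QUERY: u32 = 1 << 0;
pub const FILE_IO: u32 = 1 << 1;
pub const SHELL_ESCAPE: u32 = 1 << 2;
pub const HTML_ESCAPE: u32 = 1 << 3;
pub const SSRF: u32 = 1 << 4;
pub const CODE_EXEC: u32 = 1 << 5;

/// Hypothetical, reduced stand-in for `TypeKind`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SinkKind {
    Int,
    Bool,
    String,
}

/// `kinds[i]` is the known type of operand `i`, or `None` when the
/// analysis has no fact for it.
pub fn toy_is_type_safe(kinds: &[Option<SinkKind>], sink_caps: u32) -> bool {
    let suppressible = SQL_QUERY | FILE_IO | SHELL_ESCAPE | HTML_ESCAPE | SSRF;
    // Gate 1: the sink must have at least one suppressible capability.
    if (sink_caps & suppressible) == 0 {
        return false;
    }
    // Gate 2: no operands means nothing was proven safe.
    if kinds.is_empty() {
        return false;
    }
    // Gate 3: every operand must be a known Int or Bool.
    kinds
        .iter()
        .all(|k| matches!(k, Some(SinkKind::Int | SinkKind::Bool)))
}
```

The empty-slice gate matters because `Iterator::all` returns `true` on an empty iterator; without it, a sink with no recorded operands would be suppressed vacuously.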
|
||||
|
||||
/// Infer a type from a constructor, factory, or allocator call.
|
||||
|
|
@@ -393,6 +475,21 @@ pub fn analyze_types(
|
|||
cfg: &Cfg,
|
||||
consts: &HashMap<SsaValue, ConstLattice>,
|
||||
lang: Option<Lang>,
|
||||
) -> TypeFactResult {
|
||||
analyze_types_with_param_types(body, cfg, consts, lang, &[])
|
||||
}
|
||||
|
||||
/// Same as [`analyze_types`] but seeds [`SsaOp::Param`] values with
|
||||
/// per-position [`TypeKind`] facts from `param_types` (parallel-vec to
|
||||
/// the function's BodyMeta.params). An entry of `None` (or an out-of-
|
||||
/// range index) leaves the value at the default Param fact (Unknown),
|
||||
/// preserving the pre-Phase-3 behaviour.
|
||||
pub fn analyze_types_with_param_types(
|
||||
body: &SsaBody,
|
||||
cfg: &Cfg,
|
||||
consts: &HashMap<SsaValue, ConstLattice>,
|
||||
lang: Option<Lang>,
|
||||
param_types: &[Option<TypeKind>],
|
||||
) -> TypeFactResult {
|
||||
let mut facts: HashMap<SsaValue, TypeFact> = HashMap::new();
|
||||
|
||||
|
|
@@ -424,7 +521,16 @@ pub fn analyze_types(
|
|||
}
|
||||
}
|
||||
SsaOp::Source => TypeFact::from_kind(TypeKind::String),
|
||||
SsaOp::Param { .. } => TypeFact::unknown(),
|
||||
SsaOp::Param { index } => {
|
||||
// Seed from the function's BodyMeta.param_types when
|
||||
// a TypeKind was recovered at CFG construction time.
|
||||
// Out-of-range / None entries fall back to Unknown,
|
||||
// matching the pre-Phase-3 behaviour.
|
||||
match param_types.get(*index).and_then(|t| t.clone()) {
|
||||
Some(tk) => TypeFact::from_kind(tk),
|
||||
None => TypeFact::unknown(),
|
||||
}
|
||||
}
|
||||
SsaOp::SelfParam => TypeFact::from_kind(TypeKind::Object),
|
||||
SsaOp::CatchParam => TypeFact::from_kind(TypeKind::Object),
|
||||
SsaOp::Call { callee, .. } => {
|
||||
|
|
@@ -473,6 +579,14 @@ pub fn analyze_types(
|
|||
// Defer: will be filled in second pass
|
||||
TypeFact::unknown()
|
||||
}
|
||||
// FieldProj: when the projection carries an inferred type
|
||||
// (set during lowering or by future field-type analysis),
|
||||
// honour it; otherwise the field type is unknown until a
|
||||
// points-to / heap query resolves it.
|
||||
SsaOp::FieldProj { projected_type, .. } => match projected_type {
|
||||
Some(tk) => TypeFact::from_kind(tk.clone()),
|
||||
None => TypeFact::unknown(),
|
||||
},
|
||||
// Undef contributes no type information — phi joins
|
||||
// pick up the type from the other (defined) operand.
|
||||
SsaOp::Undef => TypeFact::unknown(),
|
||||
|
|
@@ -530,6 +644,38 @@ pub fn analyze_types(
|
|||
}
|
||||
}
|
||||
|
||||
// Phase 6.3: FieldProj receiver-driven type narrowing. When
|
||||
// SSA lowering decomposed `a.b.c()` into a FieldProj chain,
|
||||
// intermediate FieldProj insts default to `projected_type =
|
||||
// None`. If the receiver value carries a Dto fact and the
|
||||
// projected field name is in its `fields` map, route the
|
||||
// FieldProj's type fact to the field's declared TypeKind.
|
||||
for inst in &block.body {
|
||||
let SsaOp::FieldProj {
|
||||
receiver,
|
||||
field,
|
||||
projected_type,
|
||||
} = &inst.op
|
||||
else {
|
||||
continue;
|
||||
};
|
||||
// If the lowering already pinned a type, keep it.
|
||||
if projected_type.is_some() {
|
||||
continue;
|
||||
}
|
||||
let Some(recv_fact) = facts.get(receiver).cloned() else {
|
||||
continue;
|
||||
};
|
||||
let field_name = body.field_name(*field).to_string();
|
||||
let Some(new_fact) = TypeFact::from_dto_field(&recv_fact.kind, &field_name) else {
|
||||
continue;
|
||||
};
|
||||
if facts.get(&inst.value) != Some(&new_fact) {
|
||||
facts.insert(inst.value, new_fact);
|
||||
changed = true;
|
||||
}
|
||||
}
|
||||
|
||||
// Phi nodes
|
||||
for inst in &block.phis {
|
||||
if let SsaOp::Phi(operands) = &inst.op {
|
||||
|
|
@@ -566,13 +712,29 @@ pub fn analyze_types(
|
|||
}
|
||||
if let SsaOp::Assign(uses) = &inst.op {
|
||||
if uses.len() == 1 {
|
||||
let src_fact = facts
|
||||
.get(&uses[0])
|
||||
.cloned()
|
||||
.unwrap_or_else(TypeFact::unknown);
|
||||
// Phase 6.3: when the RHS is a single member-access
|
||||
// expression and the receiver value carries a
|
||||
// `TypeKind::Dto(fields)` fact, route the assignment's
|
||||
// type to the field's declared `TypeKind`. Strictly
|
||||
// additive — falls through to copy-prop when the
|
||||
// receiver isn't a DTO or the field isn't recorded.
|
||||
let dto_field_fact = cfg
|
||||
.node_weight(inst.cfg_node)
|
||||
.and_then(|ni| ni.member_field.as_deref())
|
||||
.and_then(|field| {
|
||||
let recv_kind = facts.get(&uses[0])?.kind.clone();
|
||||
TypeFact::from_dto_field(&recv_kind, field)
|
||||
});
|
||||
let new_fact = match dto_field_fact {
|
||||
Some(f) => f,
|
||||
None => facts
|
||||
.get(&uses[0])
|
||||
.cloned()
|
||||
.unwrap_or_else(TypeFact::unknown),
|
||||
};
|
||||
let old = facts.get(&inst.value);
|
||||
if old != Some(&src_fact) {
|
||||
facts.insert(inst.value, src_fact);
|
||||
if old != Some(&new_fact) {
|
||||
facts.insert(inst.value, new_fact);
|
||||
changed = true;
|
||||
}
|
||||
} else if uses.len() == 2 {
|
||||
|
|
@@ -840,6 +1002,8 @@ mod tests {
|
|||
.into_iter()
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let consts = HashMap::from([
|
||||
|
|
@@ -911,6 +1075,7 @@ mod tests {
|
|||
value: SsaValue(0),
|
||||
op: SsaOp::Call {
|
||||
callee: "URL".into(),
|
||||
callee_text: None,
|
||||
args: vec![],
|
||||
receiver: None,
|
||||
},
|
||||
|
|
@@ -922,6 +1087,7 @@ mod tests {
|
|||
value: SsaValue(1),
|
||||
op: SsaOp::Call {
|
||||
callee: "HttpClient.newHttpClient".into(),
|
||||
callee_text: None,
|
||||
args: vec![],
|
||||
receiver: None,
|
||||
},
|
||||
|
|
@@ -949,6 +1115,8 @@ mod tests {
|
|||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
field_interner: crate::ssa::ir::FieldInterner::default(),
|
||||
field_writes: std::collections::HashMap::new(),
|
||||
};
|
||||
|
||||
let consts = HashMap::new();
|
||||
|
|
@@ -979,6 +1147,291 @@ mod tests {
|
|||
assert_eq!(result.get_type(SsaValue(99)), None);
|
||||
}
|
||||
|
||||
/// Phase 4: Int-typed values must suppress every type-suppressible
|
||||
/// cap — including the freshly-added `SSRF` bit. Numeric IDs
|
||||
/// cannot rewrite a URL host, cannot form path traversal sequences,
|
||||
/// cannot carry SQL/HTML/shell metacharacters.
|
||||
#[test]
|
||||
fn int_suppresses_every_type_suppressible_cap() {
|
||||
use crate::labels::Cap;
|
||||
let mut facts = HashMap::new();
|
||||
facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
|
||||
let result = TypeFactResult { facts };
|
||||
|
||||
for cap in [
|
||||
Cap::SQL_QUERY,
|
||||
Cap::FILE_IO,
|
||||
Cap::SHELL_ESCAPE,
|
||||
Cap::HTML_ESCAPE,
|
||||
Cap::SSRF,
|
||||
] {
|
||||
assert!(
|
||||
is_type_safe_for_sink(&[SsaValue(0)], cap, &result),
|
||||
"Int must suppress {cap:?}",
|
||||
);
|
||||
}
|
||||
// Caps outside the type-suppressible set never qualify.
|
||||
assert!(!is_type_safe_for_sink(
|
||||
&[SsaValue(0)],
|
||||
Cap::CODE_EXEC,
|
||||
&result
|
||||
));
|
||||
assert!(!is_type_safe_for_sink(
|
||||
&[SsaValue(0)],
|
||||
Cap::DESERIALIZE,
|
||||
&result
|
||||
));
|
||||
}
|
||||
|
||||
/// Phase 4: Bool-typed values are even safer than ints — `true` /
|
||||
/// `false` cannot carry any payload and must suppress every
|
||||
/// type-suppressible cap.
|
||||
#[test]
|
||||
fn bool_suppresses_every_type_suppressible_cap() {
|
||||
use crate::labels::Cap;
|
||||
let mut facts = HashMap::new();
|
||||
facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Bool));
|
||||
let result = TypeFactResult { facts };
|
||||
|
||||
for cap in [
|
||||
Cap::SQL_QUERY,
|
||||
Cap::FILE_IO,
|
||||
Cap::SHELL_ESCAPE,
|
||||
Cap::HTML_ESCAPE,
|
||||
Cap::SSRF,
|
||||
] {
|
||||
assert!(
|
||||
is_type_safe_for_sink(&[SsaValue(0)], cap, &result),
|
||||
"Bool must suppress {cap:?}",
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
/// String-typed values must NOT trigger suppression — they are the
|
||||
/// canonical injection carrier. Regression guard so a future
|
||||
/// change to `is_type_safe_for_sink` does not silently silence
|
||||
/// real String-payload findings.
|
||||
#[test]
|
||||
fn string_does_not_trigger_sink_suppression() {
|
||||
use crate::labels::Cap;
|
||||
let mut facts = HashMap::new();
|
||||
facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::String));
|
||||
let result = TypeFactResult { facts };
|
||||
assert!(!is_type_safe_for_sink(
|
||||
&[SsaValue(0)],
|
||||
Cap::SQL_QUERY,
|
||||
&result
|
||||
));
|
||||
assert!(!is_type_safe_for_sink(&[SsaValue(0)], Cap::SSRF, &result));
|
||||
assert!(!is_type_safe_for_sink(
|
||||
&[SsaValue(0)],
|
||||
Cap::SHELL_ESCAPE,
|
||||
&result
|
||||
));
|
||||
}
|
||||
|
||||
    /// Audit A3: The full `(TypeKind, Cap)` suppression matrix. Encoded
    /// as a single table-driven test so any future change to
    /// `is_type_safe_for_sink` requires an intentional matrix edit + a
    /// test update. Truth values:
    ///
    /// | TypeKind  | SQL | FILE | SHELL | HTML | SSRF | CODE_EXEC | DESERIALIZE |
    /// |-----------|-----|------|-------|------|------|-----------|-------------|
    /// | Int       | Y   | Y    | Y     | Y    | Y    | N         | N           |
    /// | Bool      | Y   | Y    | Y     | Y    | Y    | N         | N           |
    /// | String    | N   | N    | N     | N    | N    | N         | N           |
    /// | Url       | N   | N    | N     | N    | N    | N         | N           |
    /// | Object    | N   | N    | N     | N    | N    | N         | N           |
    /// | Unknown   | N   | N    | N     | N    | N    | N         | N           |
    #[test]
    fn type_kind_cap_suppression_matrix() {
        use crate::labels::Cap;
        let caps = [
            ("SQL_QUERY", Cap::SQL_QUERY),
            ("FILE_IO", Cap::FILE_IO),
            ("SHELL_ESCAPE", Cap::SHELL_ESCAPE),
            ("HTML_ESCAPE", Cap::HTML_ESCAPE),
            ("SSRF", Cap::SSRF),
            ("CODE_EXEC", Cap::CODE_EXEC),
            ("DESERIALIZE", Cap::DESERIALIZE),
        ];
        // (kind_name, kind, [suppress for each cap in `caps` order])
        let rows: &[(&str, TypeKind, [bool; 7])] = &[
            (
                "Int",
                TypeKind::Int,
                [true, true, true, true, true, false, false],
            ),
            (
                "Bool",
                TypeKind::Bool,
                [true, true, true, true, true, false, false],
            ),
            (
                "String",
                TypeKind::String,
                [false, false, false, false, false, false, false],
            ),
            (
                "Url",
                TypeKind::Url,
                [false, false, false, false, false, false, false],
            ),
            (
                "Object",
                TypeKind::Object,
                [false, false, false, false, false, false, false],
            ),
            (
                "Unknown",
                TypeKind::Unknown,
                [false, false, false, false, false, false, false],
            ),
        ];
        for (kind_name, kind, expected) in rows {
            let mut facts = HashMap::new();
            facts.insert(SsaValue(0), TypeFact::from_kind(kind.clone()));
            let result = TypeFactResult { facts };
            for (i, (cap_name, cap)) in caps.iter().enumerate() {
                let got = is_type_safe_for_sink(&[SsaValue(0)], *cap, &result);
                assert_eq!(
                    got, expected[i],
                    "matrix mismatch for ({kind_name}, {cap_name}): expected {}, got {got}",
                    expected[i]
                );
            }
        }
    }

    /// Audit A3 (companion): empty `values` slice never suppresses,
    /// regardless of cap or per-value type facts.
    #[test]
    fn empty_values_never_suppress() {
        use crate::labels::Cap;
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
        let result = TypeFactResult { facts };
        for cap in [
            Cap::SQL_QUERY,
            Cap::FILE_IO,
            Cap::SHELL_ESCAPE,
            Cap::HTML_ESCAPE,
            Cap::SSRF,
            Cap::CODE_EXEC,
            Cap::DESERIALIZE,
        ] {
            assert!(
                !is_type_safe_for_sink(&[], cap, &result),
                "empty values must never suppress {cap:?}",
            );
        }
    }

    /// Audit A3 (companion): a Cap with NO type-suppressible bits never
    /// suppresses, even when the value's type kind is otherwise
    /// suppression-eligible.
    #[test]
    fn caps_without_type_suppressible_bits_never_fire() {
        use crate::labels::Cap;
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
        let result = TypeFactResult { facts };
        for cap in [
            Cap::CODE_EXEC,
            Cap::DESERIALIZE,
            Cap::CRYPTO,
            Cap::URL_ENCODE,
        ] {
            assert!(
                !is_type_safe_for_sink(&[SsaValue(0)], cap, &result),
                "Int must NOT suppress non-type-suppressible {cap:?}",
            );
        }
    }

    /// Audit A3 (companion): mixed-type operand list (only one Int
    /// among operands of unknown type) must NOT suppress. The
    /// suppression rule requires every operand to be payload-incompatible.
    #[test]
    fn mixed_type_operands_do_not_suppress() {
        use crate::labels::Cap;
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
        facts.insert(SsaValue(1), TypeFact::from_kind(TypeKind::String));
        let result = TypeFactResult { facts };
        assert!(!is_type_safe_for_sink(
            &[SsaValue(0), SsaValue(1)],
            Cap::SQL_QUERY,
            &result
        ));
    }

    /// Phase 3: Param values seeded from `param_types` must surface
    /// the right TypeKind for downstream sink suppression. An out-of-
    /// range index falls back to Unknown (the pre-Phase-3 default).
    #[test]
    fn param_types_seed_param_value_facts() {
        use crate::cfg::Cfg;
        let n0 = NodeIndex::new(0);
        let n1 = NodeIndex::new(1);
        let body = SsaBody {
            blocks: vec![SsaBlock {
                id: BlockId(0),
                phis: vec![],
                body: vec![
                    SsaInst {
                        value: SsaValue(0),
                        op: SsaOp::Param { index: 0 },
                        cfg_node: n0,
                        var_name: Some("user_id".into()),
                        span: (0, 7),
                    },
                    SsaInst {
                        value: SsaValue(1),
                        op: SsaOp::Param { index: 99 },
                        cfg_node: n1,
                        var_name: Some("oob".into()),
                        span: (8, 11),
                    },
                ],
                terminator: Terminator::Return(None),
                preds: SmallVec::new(),
                succs: SmallVec::new(),
            }],
            entry: BlockId(0),
            value_defs: vec![
                ValueDef {
                    var_name: Some("user_id".into()),
                    cfg_node: n0,
                    block: BlockId(0),
                },
                ValueDef {
                    var_name: Some("oob".into()),
                    cfg_node: n1,
                    block: BlockId(0),
                },
            ],
            cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
            exception_edges: vec![],
            field_interner: crate::ssa::ir::FieldInterner::default(),
            field_writes: std::collections::HashMap::new(),
        };

        let consts = HashMap::new();
        let cfg: Cfg = petgraph::Graph::new();
        let param_types = vec![Some(TypeKind::Int)];

        let result =
            analyze_types_with_param_types(&body, &cfg, &consts, Some(Lang::Java), &param_types);
        assert_eq!(result.get_type(SsaValue(0)), Some(&TypeKind::Int));
        // Index 99 is out of range → falls back to Unknown.
        assert_eq!(result.get_type(SsaValue(1)), Some(&TypeKind::Unknown));

        // Empty slice = pre-Phase-3 behaviour.
        let result2 = analyze_types(&body, &cfg, &consts, Some(Lang::Java));
        assert_eq!(result2.get_type(SsaValue(0)), Some(&TypeKind::Unknown));
    }

    // ── TypeHierarchy::is_subtype_of ─────────────────────────────────────

    #[test]
@@ -1484,4 +1937,90 @@ mod tests {
            Some(TypeKind::HttpClient)
        );
    }

    // ── Phase 6 DTO field-level taint ─────────────────────────────────────

    /// Phase 6: `TypeFact::from_dto_field` returns `Some(field_kind)`
    /// for a DTO receiver whose `fields` map contains the requested
    /// field, and `None` otherwise.
    #[test]
    fn dto_field_lookup_returns_field_type_kind() {
        let mut dto = DtoFields::new("CreateUser");
        dto.insert("age", TypeKind::Int);
        dto.insert("email", TypeKind::String);
        let recv = TypeKind::Dto(dto);
        let age = TypeFact::from_dto_field(&recv, "age").expect("age field present");
        assert_eq!(age.kind, TypeKind::Int);
        let email = TypeFact::from_dto_field(&recv, "email").expect("email field present");
        assert_eq!(email.kind, TypeKind::String);
        assert!(TypeFact::from_dto_field(&recv, "missing").is_none());
    }

    /// Phase 6: a non-DTO receiver kind never produces a field fact;
    /// `from_dto_field` falls through to the legacy copy-prop path.
    #[test]
    fn dto_field_lookup_on_non_dto_returns_none() {
        for k in [
            TypeKind::Int,
            TypeKind::String,
            TypeKind::Object,
            TypeKind::Unknown,
            TypeKind::HttpClient,
        ] {
            assert!(
                TypeFact::from_dto_field(&k, "any_field").is_none(),
                "non-DTO {k:?} must not produce a field fact",
            );
        }
    }

    /// Phase 6: nested DTO: the parent DTO's field type is
    /// `TypeKind::Dto`, and `from_dto_field` returns that nested DTO
    /// fact directly. Phase 6.3 callers can recurse into the inner
    /// fields by following the returned receiver's `as_dto()` chain.
    #[test]
    fn dto_field_lookup_supports_nested_dto() {
        let mut inner = DtoFields::new("Address");
        inner.insert("zip", TypeKind::String);
        let mut outer = DtoFields::new("CreateUser");
        outer.insert("address", TypeKind::Dto(inner.clone()));
        outer.insert("age", TypeKind::Int);
        let recv = TypeKind::Dto(outer);
        let addr = TypeFact::from_dto_field(&recv, "address").expect("address present");
        assert_eq!(addr.kind, TypeKind::Dto(inner));
    }

    /// Phase 6: an empty DTO (class declared but with no inferred
    /// fields) never resolves field reads. Documents the safe-fallback
    /// invariant so the legacy path runs when class fields couldn't be
    /// classified.
    #[test]
    fn empty_dto_never_resolves_fields() {
        let recv = TypeKind::Dto(DtoFields::new("EmptyDto"));
        assert!(TypeFact::from_dto_field(&recv, "anything").is_none());
    }

    /// Phase 6: an `Int`-typed field in a DTO survives the
    /// type-suppression matrix exactly the same way a freestanding
    /// `Int` does; sanity-check the bridge between Phase 6 and Phase 4.
    #[test]
    fn dto_int_field_suppresses_sql_query_via_matrix() {
        use crate::labels::Cap;
        let mut dto = DtoFields::new("CreateUser");
        dto.insert("age", TypeKind::Int);
        let field = TypeFact::from_dto_field(&TypeKind::Dto(dto), "age").unwrap();
        let mut facts = HashMap::new();
        facts.insert(SsaValue(0), field);
        let result = TypeFactResult { facts };
        assert!(is_type_safe_for_sink(
            &[SsaValue(0)],
            Cap::SQL_QUERY,
            &result
        ));
        assert!(!is_type_safe_for_sink(
            &[SsaValue(0)],
            Cap::CODE_EXEC,
            &result
        ));
    }
}
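
The suppression rule pinned down by the Audit A3 tests above can be read as three conditions: the operand list is non-empty, the cap is type-suppressible, and every operand has a payload-incompatible kind (Int or Bool in the matrix). The sketch below is a standalone model of that rule, not the crate's implementation: `Kind`, `Sink`, `is_type_suppressible`, and `suppresses` are hypothetical stand-ins for `TypeKind`, `Cap`, and `is_type_safe_for_sink`, and plain `u32` keys stand in for `SsaValue`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `TypeKind` (only a few kinds modelled).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Int,
    Bool,
    String,
    Unknown,
}

/// Hypothetical stand-in for the crate's `Cap` values used in the matrix.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Sink {
    SqlQuery,
    FileIo,
    ShellEscape,
    HtmlEscape,
    Ssrf,
    CodeExec,
    Deserialize,
}

/// Assumption taken from the matrix: every modelled sink except
/// CODE_EXEC and DESERIALIZE can be suppressed by operand type.
fn is_type_suppressible(sink: Sink) -> bool {
    !matches!(sink, Sink::CodeExec | Sink::Deserialize)
}

/// Returns true only when the operand list is non-empty, the sink is
/// type-suppressible, and every operand has a recorded Int or Bool kind.
/// An operand with no recorded fact counts as not suppressible.
fn suppresses(values: &[u32], sink: Sink, facts: &HashMap<u32, Kind>) -> bool {
    if values.is_empty() || !is_type_suppressible(sink) {
        return false;
    }
    values
        .iter()
        .all(|v| matches!(facts.get(v), Some(Kind::Int | Kind::Bool)))
}
```

With facts `{0: Int, 1: String}`, this model suppresses `[0]` for `SqlQuery` but not for `CodeExec`, and never suppresses `[]` or `[0, 1]`, mirroring the four A3 tests.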