Prerelease cleanup (#46)

* feat: Add const_bound_vars tracking to prevent false positives in ownership checks

* feat: Introduce field interner and typed bounded vars for enhanced type tracking

* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking

* feat: Centralize method name extraction with bare_method_name helper

* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch

* feat: Enhance C++ taint tracking with additional container operations and inline method resolution

* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking

* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis

* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations

* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details

* test: Add comprehensive tests for lattice algebra laws and SSA edge cases

* feat: Add destructured session user handling and safe user ID access patterns

* feat: Implement row-population reverse-walk for enhanced authorization checks

* feat: Enhance authorization checks with local alias chain for self-actor types

* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction

* feat: Implement chained method call inner-gate rebinding for SSRF prevention

* feat: Add observability and error modules, enhance debug functionality, and implement theme context

* feat: Remove Auth Analysis page and update navigation to redirect to Explorer

* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor

* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity

* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build

The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while
the unit test at the bottom of the file was gated only `#[cfg(test)]`.
Since `cfg(test)` is set in release builds with `--tests` but
`cfg(debug_assertions)` is not, `cargo build --release --tests` failed
with E0425. Removing the gate fixes the build; the body is `debug_assert!`
only, so the helper is free in release. Also drop the gate at the call
site to avoid a `dead_code` warning when the lib is built without
`--tests`.
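
A minimal sketch of the gating mismatch and the fix. The helper's real signature and body are not shown in this message, so the slice parameter, the ordering check, and the `lower` caller below are assumptions made for illustration only:

```rust
// Hypothetical stand-in for the helper in src/ssa/lower.rs.
//
// Before the fix the helper carried `#[cfg(debug_assertions)]`, so it did
// not exist in `--release` builds, while a `#[cfg(test)]` unit test still
// referenced it. That unresolved name is what E0425 reports.
//
// After the fix the helper is compiled unconditionally. Its body is a single
// `debug_assert!`, whose condition is not evaluated when debug assertions
// are disabled, so release builds pay nothing for the call.
fn debug_assert_bfs_ordering(block_ids: &[u32]) {
    debug_assert!(
        block_ids.windows(2).all(|w| w[0] < w[1]),
        "block ids must be strictly increasing"
    );
}

// Hypothetical call site, also ungated so the helper always has a caller
// and no `dead_code` warning is emitted for it.
fn lower(block_ids: &[u32]) -> usize {
    debug_assert_bfs_ordering(block_ids);
    block_ids.len()
}
```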

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(closure-capture): flip JS/TS fixtures to required-finding

The JS and TS closure-capture fixtures pinned the old broken behaviour
via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now
correctly traces taint through the closure boundary (env source captured
by an arrow function, sunk via `child_process.exec` inside the body), so
the formerly-forbidden finding is a true positive.

Match the Python sibling's shape — `required_findings` with
`id_prefix` + `min_count` plus a small `noise_budget` — and rewrite the
companion READMEs and the phase8_fragility_tests doc-comments from
"known gap" to "regression guard".

Verified:
- cargo test --release --test phase8_fragility_tests → 8/8 pass
- cargo test --release --lib bfs_assertion → pass
- corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0) — unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis

* feat: Introduce health module and enhance health score computation with calibration tests

* feat: Add expectations configuration and cleanup .gitignore for log files

* feat: Implement theme selection and enhance settings panel for triage sync

* feat: Suppress false positives for strcpy calls with literal sources in AST

* feat: Update analyse_function_ssa to return body CFG for accurate analysis

* feat: Add bug report and feature request templates for improved issue tracking

* feat: removed dev scripts

* feat: update README.md for clarity and consistency in fixture descriptions

* feat: removed dev docs

* feat: clean up error handling and UI elements for improved user experience

* feat: adjust button sizes in HeaderBar for better UI consistency

* feat: enhance taint analysis with additional context for sanitizer and taint findings

* cargo fmt

* prettier

* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts

* feat: add script to frame PNG screenshots with brand gradient

* feat: add fuzzing support with new targets and CI workflows

* refactor: streamline match expressions and improve formatting in CLI and output handling

* feat: enhance configuration display with detailed output options

* feat: stage demo configuration for improved CLI screenshot output

* feat: expose merge_configs function for user-configurable settings

* refactor: simplify code structure and improve readability in config handling

* refactor: improve descriptions for vulnerability patterns in various languages

* feat: update MIT License section with additional usage details and copyright information

* feat: update screenshots

* refactor: update build process and paths for frontend assets

* feat: add cross-file taint fuzzing target and supporting dictionary

* refactor: clean up formatting and comments in fuzz configuration and example files

* refactor: remove outdated comments and clean up CI configuration files

* chore: update changelog dates and improve formatting in documentation

* refactor: update Cargo.toml and CI configuration for improved packaging and build process

* refactor: enhance quote-stripping logic to prevent panics and add regression tests

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eli Peter 2026-04-29 00:58:38 -04:00 committed by GitHub
parent 79c29b394d
commit 82f18184b1
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
348 changed files with 48731 additions and 2925 deletions

@ -215,6 +215,8 @@ mod tests {
value_defs: defs,
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
}
}

@ -59,9 +59,12 @@ impl ConstLattice {
return ConstLattice::Int(i);
}
// String: strip surrounding quotes
if (trimmed.starts_with('"') && trimmed.ends_with('"'))
|| (trimmed.starts_with('\'') && trimmed.ends_with('\''))
// String: strip surrounding quotes. Require len >= 2 so a lone `'`
// or `"` (where starts_with and ends_with both match the same byte)
// does not produce the inverted range `[1..0]`, which panics on slicing.
if trimmed.len() >= 2
&& ((trimmed.starts_with('"') && trimmed.ends_with('"'))
|| (trimmed.starts_with('\'') && trimmed.ends_with('\'')))
{
let inner = &trimmed[1..trimmed.len() - 1];
return ConstLattice::Str(inner.to_string());
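
The length guard in the hunk above can be tried in isolation. This is a sketch, not the project's `ConstLattice::parse`: `strip_quotes` is a hypothetical free function that returns `None` where the real parser would fall through to its other cases.

```rust
/// Hypothetical standalone version of the quote-stripping step.
/// Returns the text between matching surrounding quotes, or `None`
/// when the input is not a quoted string.
fn strip_quotes(s: &str) -> Option<&str> {
    let t = s.trim();
    let quoted = (t.starts_with('"') && t.ends_with('"'))
        || (t.starts_with('\'') && t.ends_with('\''));
    // Without the length check, a lone quote satisfies `quoted` (the same
    // byte is both first and last) and `&t[1..0]` panics.
    if t.len() >= 2 && quoted {
        // Both quote characters are one byte wide, so these indices
        // always fall on character boundaries.
        Some(&t[1..t.len() - 1])
    } else {
        None
    }
}
```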
@ -279,6 +282,12 @@ fn eval_inst(inst: &SsaInst, values: &HashMap<SsaValue, ConstLattice>) -> ConstL
| SsaOp::Param { .. }
| SsaOp::SelfParam
| SsaOp::CatchParam => ConstLattice::Varying,
// FieldProj: projecting a field is dynamic with respect to the
// const-propagation lattice — there is no general way to fold
// `obj.field` to a known scalar at this phase. Returning Varying
// matches Call: callers needing field-level constness will go
// through the points-to / heap analysis.
SsaOp::FieldProj { .. } => ConstLattice::Varying,
SsaOp::Phi(_) => ConstLattice::Varying, // phis in body shouldn't happen
SsaOp::Nop => ConstLattice::Varying,
// Undef contributes no knowledge: `Top` is the lattice identity
@ -303,6 +312,7 @@ fn inst_uses(inst: &SsaInst) -> Vec<SsaValue> {
}
vals
}
SsaOp::FieldProj { receiver, .. } => vec![*receiver],
SsaOp::Source
| SsaOp::Const(_)
| SsaOp::Param { .. }
@ -626,6 +636,8 @@ mod tests {
value_defs,
cfg_node_map,
exception_edges: Vec::new(),
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
}
}
@ -751,4 +763,129 @@ mod tests {
Some(&ConstLattice::Bool(true))
);
}
/// Meet must be commutative: `a ⊓ b == b ⊓ a` for every pair of
/// lattice values. Iterates a representative cross product; failure
/// would indicate the implementation special-cased one operand.
#[test]
fn meet_lattice_is_commutative() {
let vals = [
ConstLattice::Top,
ConstLattice::Varying,
ConstLattice::Null,
ConstLattice::Int(0),
ConstLattice::Int(42),
ConstLattice::Bool(true),
ConstLattice::Bool(false),
ConstLattice::Str("a".into()),
ConstLattice::Str("b".into()),
];
for a in &vals {
for b in &vals {
assert_eq!(
a.meet(b),
b.meet(a),
"meet should be commutative for ({a:?}, {b:?})"
);
}
}
}
/// Meet must be associative: `(a ⊓ b) ⊓ c == a ⊓ (b ⊓ c)`.
#[test]
fn meet_lattice_is_associative() {
let vals = [
ConstLattice::Top,
ConstLattice::Varying,
ConstLattice::Null,
ConstLattice::Int(0),
ConstLattice::Int(42),
ConstLattice::Bool(true),
ConstLattice::Str("x".into()),
];
for a in &vals {
for b in &vals {
for c in &vals {
let lhs = a.meet(b).meet(c);
let rhs = a.meet(&b.meet(c));
assert_eq!(lhs, rhs, "associativity broken on ({a:?},{b:?},{c:?})");
}
}
}
}
/// Meet must be idempotent: `a ⊓ a == a` for every lattice value.
#[test]
fn meet_lattice_is_idempotent() {
let vals = [
ConstLattice::Top,
ConstLattice::Varying,
ConstLattice::Null,
ConstLattice::Int(7),
ConstLattice::Bool(false),
ConstLattice::Str("y".into()),
];
for a in &vals {
assert_eq!(a.meet(a), a.clone(), "idempotence broken on {a:?}");
}
}
/// Top is the meet identity: `Top ⊓ x == x` for every value.
/// Varying is meet-absorbing: `Varying ⊓ x == Varying`.
/// Two distinct concrete values meet to Varying.
#[test]
fn meet_lattice_extremes() {
let xs = [
ConstLattice::Null,
ConstLattice::Int(1),
ConstLattice::Bool(true),
ConstLattice::Str("a".into()),
];
for x in &xs {
assert_eq!(ConstLattice::Top.meet(x), x.clone());
assert_eq!(x.meet(&ConstLattice::Top), x.clone());
assert_eq!(ConstLattice::Varying.meet(x), ConstLattice::Varying);
assert_eq!(x.meet(&ConstLattice::Varying), ConstLattice::Varying);
}
assert_eq!(
ConstLattice::Int(1).meet(&ConstLattice::Int(2)),
ConstLattice::Varying
);
assert_eq!(
ConstLattice::Bool(true).meet(&ConstLattice::Bool(false)),
ConstLattice::Varying
);
assert_eq!(
ConstLattice::Str("a".into()).meet(&ConstLattice::Str("b".into())),
ConstLattice::Varying
);
}
/// Const parsing must round-trip integer signs. i64::MIN/MAX must
/// parse without overflow; arbitrary text falls back to a bare-string
/// const (current contract — tested here so a future change is
/// caught explicitly).
#[test]
fn const_parse_extremes_and_fallback() {
assert_eq!(
ConstLattice::parse(&i64::MAX.to_string()),
ConstLattice::Int(i64::MAX)
);
assert_eq!(
ConstLattice::parse(&i64::MIN.to_string()),
ConstLattice::Int(i64::MIN)
);
// Larger than i64 falls back to bare-string.
let huge = "99999999999999999999";
assert_eq!(
ConstLattice::parse(huge),
ConstLattice::Str(huge.to_string())
);
// Empty string parses as empty Str (not panic).
assert_eq!(ConstLattice::parse(""), ConstLattice::Str("".into()));
// Lone quote characters must not panic in the quote-stripping path
// (regression for fuzz crash-2f943c14: `'` triggered &s[1..0]).
assert_eq!(ConstLattice::parse("'"), ConstLattice::Str("'".into()));
assert_eq!(ConstLattice::parse("\""), ConstLattice::Str("\"".into()));
}
}
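
The laws exercised by the tests above (commutativity, associativity, idempotence, `Top` as identity, `Varying` as absorbing element) follow from a very small `meet`. The sketch below is an assumption about the shape of such a function, written against a hypothetical `SketchLattice` enum rather than the project's `ConstLattice`:

```rust
// Hypothetical flat constant lattice: Top above every concrete value,
// Varying below every concrete value.
#[derive(Clone, Debug, PartialEq)]
enum SketchLattice {
    Top,
    Null,
    Int(i64),
    Bool(bool),
    Str(String),
    Varying,
}

impl SketchLattice {
    fn meet(&self, other: &SketchLattice) -> SketchLattice {
        match (self, other) {
            // Top carries no knowledge, so the other operand wins.
            (SketchLattice::Top, x) | (x, SketchLattice::Top) => x.clone(),
            // Equal values (including Varying with Varying) are unchanged.
            (a, b) if a == b => a.clone(),
            // Any disagreement, or anything met with Varying, is Varying.
            _ => SketchLattice::Varying,
        }
    }
}
```

Because every arm treats its two operands symmetrically, commutativity holds by construction; the other laws can be checked by the same exhaustive loops the tests use.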

@ -213,6 +213,8 @@ mod tests {
.into_iter()
.collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let (eliminated, copy_map) = copy_propagate(&mut body, &cfg);
@ -225,4 +227,494 @@ mod tests {
assert!(matches!(body.blocks[0].body[1].op, SsaOp::Nop));
assert!(matches!(body.blocks[0].body[2].op, SsaOp::Nop));
}
/// `resolve_root` has a 1000-iteration safety cap to avoid an infinite loop if
/// a malformed copy map ever contains a cycle (SSA itself is acyclic,
/// but defensively we want this guarantee on the helper). Confirm the
/// cap actually fires by feeding a hand-crafted cycle a → b → a.
#[test]
fn resolve_root_terminates_on_cyclic_copy_map() {
let mut map: std::collections::HashMap<SsaValue, SsaValue> =
std::collections::HashMap::new();
map.insert(SsaValue(0), SsaValue(1));
map.insert(SsaValue(1), SsaValue(0));
// Must terminate; the exact returned value isn't a correctness
// guarantee under malformed input, but no infinite loop is.
let _root = resolve_root(SsaValue(0), &map);
}
/// A four-deep copy chain v3 = v2 = v1 = v0 must collapse to v0
/// in a single `copy_propagate` pass — the resolved replacement
/// map drives downstream alias recovery, so the *transitive*
/// closure must be exposed, not just the immediate parent.
#[test]
fn deep_copy_chain_collapses_to_root() {
let mut cfg: Cfg = Graph::new();
let nodes: Vec<_> = (0..4)
.map(|_| cfg.add_node(make_cfg_node(StmtKind::Seq)))
.collect();
let mut block_body = vec![SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("\"x\"".into())),
cfg_node: nodes[0],
var_name: Some("a".into()),
span: (0, 1),
}];
for (i, node) in nodes.iter().enumerate().take(4).skip(1) {
block_body.push(SsaInst {
value: SsaValue(i as u32),
op: SsaOp::Assign(SmallVec::from_elem(SsaValue((i - 1) as u32), 1)),
cfg_node: *node,
var_name: Some(format!("v{i}")),
span: (i, i + 1),
});
}
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: block_body,
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: (0..4)
.map(|i| ValueDef {
var_name: Some(format!("v{i}")),
cfg_node: nodes[i],
block: BlockId(0),
})
.collect(),
cfg_node_map: nodes
.iter()
.enumerate()
.map(|(i, n)| (*n, SsaValue(i as u32)))
.collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let (eliminated, copy_map) = copy_propagate(&mut body, &cfg);
assert_eq!(eliminated, 3, "v1, v2, v3 must all be eliminated");
for i in 1..4 {
assert_eq!(
copy_map.get(&SsaValue(i)),
Some(&SsaValue(0)),
"v{i} must resolve transitively to v0"
);
}
}
// ─────────────────────────────────────────────────────────────────
// Skip-conditions: copy-prop must NOT erase semantic info attached
// to a copy's CFG node. These guard the three early-exits in
// `copy_propagate`: labels, numeric-length, and string_prefix.
// ─────────────────────────────────────────────────────────────────
/// Build a single-block SSA body containing
/// v0 = Const, v1 = Assign(v0)
/// with the `decorate` closure applied to v1's CFG node so individual
/// skip-conditions can be exercised.
fn build_two_inst_body(decorate: impl FnOnce(&mut NodeInfo)) -> (Cfg, SsaBody) {
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let mut n1_info = make_cfg_node(StmtKind::Seq);
decorate(&mut n1_info);
let n1 = cfg.add_node(n1_info);
let body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("42".into())),
cfg_node: n0,
var_name: Some("x".into()),
span: (0, 2),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
cfg_node: n1,
var_name: Some("y".into()),
span: (3, 5),
},
],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("x".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("y".into()),
cfg_node: n1,
block: BlockId(0),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
(cfg, body)
}
/// Skip path 1: an Assign whose CFG node carries a label
/// (sanitizer/source/sink) must NOT be propagated through. Erasing
/// that label would silently drop a sanitization step from the
/// taint path.
#[test]
fn copy_with_label_on_cfg_node_is_not_propagated() {
use crate::labels::{Cap, DataLabel};
use smallvec::smallvec;
let (cfg, mut body) = build_two_inst_body(|info| {
info.taint.labels = smallvec![DataLabel::Sanitizer(Cap::SHELL_ESCAPE)];
});
let (eliminated, _map) = copy_propagate(&mut body, &cfg);
assert_eq!(eliminated, 0, "copy through a labeled node must be skipped");
assert!(
matches!(body.blocks[0].body[1].op, SsaOp::Assign(_)),
"labeled copy must remain an Assign, not be Nop'd"
);
}
/// Skip path 2: numeric-length reads (`arr.length`, `map.size`)
/// have a different type from their source — propagating through
/// would erase the Int type fact.
#[test]
fn copy_through_numeric_length_access_is_not_propagated() {
let (cfg, mut body) = build_two_inst_body(|info| {
info.is_numeric_length_access = true;
});
let (eliminated, _map) = copy_propagate(&mut body, &cfg);
assert_eq!(
eliminated, 0,
"copy through numeric-length access must be skipped"
);
}
/// Skip path 3: an Assign carrying a `string_prefix` (template
/// literal or `"lit" + var` RHS) seeds a StringFact on its SSA
/// value. Propagating past it erases the prefix-bearing value and
/// breaks SSRF prefix-lock suppression downstream.
#[test]
fn copy_through_string_prefix_node_is_not_propagated() {
let (cfg, mut body) = build_two_inst_body(|info| {
info.string_prefix = Some("https://api.example.com/".into());
});
let (eliminated, _map) = copy_propagate(&mut body, &cfg);
assert_eq!(
eliminated, 0,
"copy through string_prefix-bearing node must be skipped"
);
}
/// Multi-operand Assigns (e.g. `v2 = v0 + v1`) are NOT copies and
/// must be left alone. Only single-operand Assigns are copies.
#[test]
fn multi_operand_assign_is_not_a_copy() {
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("1".into())),
cfg_node: n0,
var_name: Some("x".into()),
span: (0, 1),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Const(Some("2".into())),
cfg_node: n1,
var_name: Some("y".into()),
span: (2, 3),
},
SsaInst {
value: SsaValue(2),
op: SsaOp::Assign({
let mut v: SmallVec<[SsaValue; 4]> = SmallVec::new();
v.push(SsaValue(0));
v.push(SsaValue(1));
v
}),
cfg_node: n2,
var_name: Some("z".into()),
span: (4, 5),
},
],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("x".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("y".into()),
cfg_node: n1,
block: BlockId(0),
},
ValueDef {
var_name: Some("z".into()),
cfg_node: n2,
block: BlockId(0),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
.into_iter()
.collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let (eliminated, _map) = copy_propagate(&mut body, &cfg);
assert_eq!(eliminated, 0, "two-operand Assign is not a copy");
assert!(
matches!(body.blocks[0].body[2].op, SsaOp::Assign(_)),
"multi-operand Assign must be preserved"
);
}
/// A Call's argument and receiver slots that reference a
/// copy-eliminated value must be rewritten to the root.
#[test]
fn call_args_and_receiver_rewritten_through_copy() {
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n2 = cfg.add_node(make_cfg_node(StmtKind::Call));
let mut arg_vec: SmallVec<[SsaValue; 2]> = SmallVec::new();
arg_vec.push(SsaValue(1));
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("\"x\"".into())),
cfg_node: n0,
var_name: Some("a".into()),
span: (0, 1),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
cfg_node: n1,
var_name: Some("b".into()),
span: (2, 3),
},
SsaInst {
value: SsaValue(2),
op: SsaOp::Call {
callee: "f".into(),
callee_text: None,
args: vec![arg_vec],
receiver: Some(SsaValue(1)),
},
cfg_node: n2,
var_name: None,
span: (4, 7),
},
],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("a".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("b".into()),
cfg_node: n1,
block: BlockId(0),
},
ValueDef {
var_name: None,
cfg_node: n2,
block: BlockId(0),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
.into_iter()
.collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let (eliminated, _) = copy_propagate(&mut body, &cfg);
assert_eq!(eliminated, 1, "v1 should be eliminated");
let call_inst = &body.blocks[0].body[2];
match &call_inst.op {
SsaOp::Call { args, receiver, .. } => {
assert_eq!(receiver, &Some(SsaValue(0)), "receiver rewritten to root");
assert_eq!(args[0][0], SsaValue(0), "call arg rewritten to root");
}
other => panic!("expected Call op, got {:?}", other),
}
}
/// Phi operand referencing a copy-eliminated value must be
/// rewritten to the root.
#[test]
fn phi_operand_rewritten_through_copy() {
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
// Block 0: v0=const, v1=assign(v0)
// Block 1: v2 = phi(B0: v1)
let mut phi_ops: smallvec::SmallVec<[(BlockId, SsaValue); 2]> = smallvec::SmallVec::new();
phi_ops.push((BlockId(0), SsaValue(1)));
let mut body = SsaBody {
blocks: vec![
SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("\"v0\"".into())),
cfg_node: n0,
var_name: Some("a".into()),
span: (0, 1),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
cfg_node: n1,
var_name: Some("b".into()),
span: (2, 3),
},
],
terminator: Terminator::Goto(BlockId(1)),
preds: SmallVec::new(),
succs: {
let mut s = SmallVec::new();
s.push(BlockId(1));
s
},
},
SsaBlock {
id: BlockId(1),
phis: vec![SsaInst {
value: SsaValue(2),
op: SsaOp::Phi(phi_ops),
cfg_node: n2,
var_name: Some("b".into()),
span: (4, 5),
}],
body: vec![],
terminator: Terminator::Return(Some(SsaValue(2))),
preds: {
let mut p = SmallVec::new();
p.push(BlockId(0));
p
},
succs: SmallVec::new(),
},
],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("a".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("b".into()),
cfg_node: n1,
block: BlockId(0),
},
ValueDef {
var_name: Some("b".into()),
cfg_node: n2,
block: BlockId(1),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
.into_iter()
.collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let (eliminated, _map) = copy_propagate(&mut body, &cfg);
assert_eq!(eliminated, 1);
// The phi in block 1 should now reference v0, not v1.
let phi = &body.blocks[1].phis[0];
match &phi.op {
SsaOp::Phi(ops) => {
assert_eq!(
ops[0].1,
SsaValue(0),
"phi operand should be rewritten to root v0"
);
}
other => panic!("expected Phi op, got {:?}", other),
}
}
/// `copy_propagate` on a body with no Assign instructions returns
/// `(0, empty_map)` and leaves the body untouched.
#[test]
fn no_op_when_no_copies_exist() {
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("42".into())),
cfg_node: n0,
var_name: Some("x".into()),
span: (0, 2),
}],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![ValueDef {
var_name: Some("x".into()),
cfg_node: n0,
block: BlockId(0),
}],
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let (eliminated, map) = copy_propagate(&mut body, &cfg);
assert_eq!(eliminated, 0);
assert!(map.is_empty());
}
}
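
The two `resolve_root` properties tested above (transitive resolution and guaranteed termination on a cyclic map) can be sketched as a bounded pointer chase. The real function is not part of this diff; the version below is an assumption that uses plain `u32` ids in place of `SsaValue` and takes the 1000-iteration cap from the test's doc comment.

```rust
use std::collections::HashMap;

// Cap taken from the doc comment on the cyclic-map test.
const MAX_HOPS: usize = 1000;

/// Hypothetical bounded root resolution: follow copy edges until a value
/// with no outgoing edge is found, or the hop budget runs out.
fn resolve_root(start: u32, copies: &HashMap<u32, u32>) -> u32 {
    let mut cur = start;
    for _ in 0..MAX_HOPS {
        match copies.get(&cur) {
            Some(&next) => cur = next,
            None => break, // reached a value that is not a copy
        }
    }
    // On a cyclic map this is whichever value the budget ended on; it is
    // not meaningful, but the function has terminated.
    cur
}
```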

@ -143,6 +143,7 @@ fn inst_used_values(inst: &SsaInst) -> Vec<SsaValue> {
}
vals
}
SsaOp::FieldProj { receiver, .. } => vec![*receiver],
SsaOp::Source
| SsaOp::Const(_)
| SsaOp::Param { .. }
@ -214,6 +215,8 @@ mod tests {
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
@ -260,6 +263,8 @@ mod tests {
}],
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
@ -307,6 +312,8 @@ mod tests {
}],
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
@ -350,6 +357,8 @@ mod tests {
}],
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
@ -385,6 +394,8 @@ mod tests {
}],
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
@ -392,6 +403,142 @@ mod tests {
assert!(body.blocks[0].body.is_empty());
}
#[test]
fn dce_keeps_field_proj_when_used() {
// v0 = source(); v1 = field_proj(v0, "field"); ret v1
// The terminator references v1, so the FieldProj's receiver chain
// (v0) must stay reachable.
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let mut interner = crate::ssa::ir::FieldInterner::new();
let fid = interner.intern("field");
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Source,
cfg_node: n0,
var_name: Some("obj".into()),
span: (0, 5),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::FieldProj {
receiver: SsaValue(0),
field: fid,
projected_type: None,
},
cfg_node: n1,
var_name: Some("obj.field".into()),
span: (10, 20),
},
],
terminator: Terminator::Return(Some(SsaValue(1))),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("obj".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("obj.field".into()),
cfg_node: n1,
block: BlockId(0),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
field_interner: interner,
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
assert_eq!(
removed, 0,
"FieldProj reachable from terminator must survive"
);
assert_eq!(body.blocks[0].body.len(), 2);
}
#[test]
fn dce_removes_dead_field_proj() {
// v0 = const("x"); v1 = field_proj(v0, "field"); ret (no v1 use)
// Both should be removed since neither has a use and neither is
// a Source/Call/labeled instruction.
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let mut interner = crate::ssa::ir::FieldInterner::new();
let fid = interner.intern("field");
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("x".into())),
cfg_node: n0,
var_name: Some("obj".into()),
span: (0, 1),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::FieldProj {
receiver: SsaValue(0),
field: fid,
projected_type: None,
},
cfg_node: n1,
var_name: Some("obj.field".into()),
span: (2, 12),
},
],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("obj".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("obj.field".into()),
cfg_node: n1,
block: BlockId(0),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
field_interner: interner,
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
// First pass removes the FieldProj (no uses), second removes the Const
// (no uses after FieldProj is gone).
assert_eq!(
removed, 2,
"dead FieldProj and its dead receiver const must be removed"
);
assert!(body.blocks[0].body.is_empty());
}
#[test]
fn used_def_preserved() {
// v0 = const("42"), v1 = assign(v0) — v0 is used, both survive
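
The pass-by-pass behaviour that the `FieldProj` tests above rely on (a dead projection goes first, then its receiver becomes dead) can be sketched as a fixpoint loop. This is an illustration under assumptions, not the project's `eliminate_dead_defs`: `Op`, `Inst`, and `eliminate_dead` are hypothetical, labels and phis are omitted, and a `Call` is kept unconditionally to model side effects.

```rust
use std::collections::HashSet;

// Hypothetical, heavily reduced instruction set.
enum Op {
    Const,
    FieldProj { receiver: u32 },
    Call { args: Vec<u32> },
}

struct Inst {
    value: u32,
    op: Op,
}

// Values read by an instruction; FieldProj reads only its receiver.
fn uses(op: &Op) -> Vec<u32> {
    match op {
        Op::Const => vec![],
        Op::FieldProj { receiver } => vec![*receiver],
        Op::Call { args } => args.clone(),
    }
}

/// Remove unused definitions until nothing changes; returns the number
/// of instructions removed. `returned` models the terminator's operand.
fn eliminate_dead(body: &mut Vec<Inst>, returned: Option<u32>) -> usize {
    let mut removed = 0;
    loop {
        // Recompute liveness from the instructions that still exist.
        let mut live: HashSet<u32> = returned.into_iter().collect();
        for inst in body.iter() {
            live.extend(uses(&inst.op));
        }
        let before = body.len();
        // Calls survive even with zero uses because of side effects.
        body.retain(|i| live.contains(&i.value) || matches!(i.op, Op::Call { .. }));
        if body.len() == before {
            return removed;
        }
        removed += before - body.len();
    }
}
```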
@ -438,6 +585,8 @@ mod tests {
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
@ -446,4 +595,219 @@ mod tests {
assert_eq!(removed, 2);
assert_eq!(body.blocks[0].body.len(), 0);
}
/// DCE must NEVER remove a Call instruction even when its result has
/// zero uses — calls have side effects (I/O, throws, mutations) that
/// cannot be modeled as SSA-value uses. This is the conservative
/// invariant `is_dead()` enforces; regressing it would silently drop
/// real-world code from analysis (sinks, sanitizers expressed as
/// expression-statements, etc.).
#[test]
fn dead_call_with_unused_result_preserved() {
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Call));
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![SsaInst {
value: SsaValue(0),
op: SsaOp::Call {
callee: "side_effect".into(),
callee_text: None,
args: Vec::new(),
receiver: None,
},
cfg_node: n0,
var_name: None,
span: (0, 12),
}],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![ValueDef {
var_name: None,
cfg_node: n0,
block: BlockId(0),
}],
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
assert_eq!(
removed, 0,
"Call with unused result must be preserved (side effects)"
);
assert_eq!(body.blocks[0].body.len(), 1);
assert!(matches!(body.blocks[0].body[0].op, SsaOp::Call { .. }));
}
/// A dead phi must be eliminated. We construct an entry block whose
/// successor has a phi merging two unused constants and a Return(None).
/// All defs are dead; DCE should strip every body and phi instruction.
#[test]
fn dead_phi_in_otherwise_dead_block_removed() {
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let entry_block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("1".into())),
cfg_node: n0,
var_name: Some("a".into()),
span: (0, 1),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Const(Some("2".into())),
cfg_node: n1,
var_name: Some("b".into()),
span: (1, 2),
},
],
terminator: Terminator::Goto(BlockId(1)),
preds: SmallVec::new(),
succs: SmallVec::from_elem(BlockId(1), 1),
};
let join_block = SsaBlock {
id: BlockId(1),
phis: vec![SsaInst {
value: SsaValue(2),
op: SsaOp::Phi(smallvec::smallvec![
(BlockId(0), SsaValue(0)),
(BlockId(0), SsaValue(1)),
]),
cfg_node: n2,
var_name: Some("phi".into()),
span: (2, 3),
}],
body: vec![],
terminator: Terminator::Return(None),
preds: SmallVec::from_elem(BlockId(0), 1),
succs: SmallVec::new(),
};
let mut body = SsaBody {
blocks: vec![entry_block, join_block],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("a".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("b".into()),
cfg_node: n1,
block: BlockId(0),
},
ValueDef {
var_name: Some("phi".into()),
cfg_node: n2,
block: BlockId(1),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
.into_iter()
.collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
// Pass 1: the phi (no uses) goes; that drops the use-counts on v0/v1.
// Pass 2: v0 and v1 (now unused) go.
assert_eq!(removed, 3, "dead phi + two operands should be removed");
assert!(
body.blocks[1].phis.is_empty(),
"dead phi must be eliminated"
);
assert!(body.blocks[0].body.is_empty());
}
/// DCE iteration: removing v2 makes v1 dead, and removing v1 in turn makes
/// v0 dead on a later pass. Mirrors `used_def_preserved` but is explicit
/// about the chain.
#[test]
fn dce_iterates_until_fixpoint() {
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("1".into())),
cfg_node: n0,
var_name: Some("a".into()),
span: (0, 1),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
cfg_node: n1,
var_name: Some("b".into()),
span: (1, 2),
},
SsaInst {
value: SsaValue(2),
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(1), 1)),
cfg_node: n2,
var_name: Some("c".into()),
span: (2, 3),
},
],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("a".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("b".into()),
cfg_node: n1,
block: BlockId(0),
},
ValueDef {
var_name: Some("c".into()),
cfg_node: n2,
block: BlockId(0),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
.into_iter()
.collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let removed = eliminate_dead_defs(&mut body, &cfg);
assert_eq!(
removed, 3,
"DCE must reach fixpoint and remove all 3 dead defs in the chain"
);
assert!(body.blocks[0].body.is_empty());
}
}
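
Review note: the fixpoint behaviour these DCE tests pin down can be sketched in isolation. The block below is a minimal model, not the crate's `eliminate_dead_defs`: `Inst`, `eliminate_dead`, and the `roots` parameter (values kept alive by a terminator) are hypothetical names introduced only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified instruction (not the crate's `SsaInst`).
#[derive(Clone, Debug)]
pub struct Inst {
    /// The value this instruction defines.
    pub value: u32,
    /// The values this instruction reads.
    pub uses: Vec<u32>,
    /// Calls and similar ops must survive even when their result is unused.
    pub has_side_effects: bool,
}

/// Remove unused, side-effect-free definitions until nothing changes.
/// `roots` lists values that are live for external reasons (for example a
/// returned value). Returns the total number of removed instructions.
pub fn eliminate_dead(insts: &mut Vec<Inst>, roots: &[u32]) -> usize {
    let mut removed = 0;
    loop {
        // Recompute the used set on every pass: removing one instruction
        // can drop the last use of another.
        let mut used: HashSet<u32> = roots.iter().copied().collect();
        for inst in insts.iter() {
            used.extend(inst.uses.iter().copied());
        }
        let before = insts.len();
        insts.retain(|inst| inst.has_side_effects || used.contains(&inst.value));
        let delta = before - insts.len();
        if delta == 0 {
            return removed;
        }
        removed += delta;
    }
}
```

On a three-link chain `v0 <- v1 <- v2` with no roots, each pass peels off the current tail, so the sketch needs three removing passes plus one final pass that detects the fixpoint.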

@ -48,6 +48,7 @@ impl fmt::Display for SsaBody {
callee,
args,
receiver,
..
} => {
if let Some(rv) = receiver {
write!(f, "v{}.{callee}(", rv.0)?;
@ -64,6 +65,20 @@ impl fmt::Display for SsaBody {
.collect();
write!(f, "{})", arg_strs.join(", "))?;
}
SsaOp::FieldProj {
receiver,
field,
projected_type,
} => {
// Resolve the field name through the body's interner
// so display output matches the original source field.
let name = self.field_interner.resolve(*field);
if let Some(ty) = projected_type {
write!(f, "field_proj(v{}, {name:?}) :: {ty:?}", receiver.0)?;
} else {
write!(f, "field_proj(v{}, {name:?})", receiver.0)?;
}
}
SsaOp::Source => write!(f, "source()")?,
SsaOp::Const(val) => {
if let Some(v) = val {

@ -23,7 +23,7 @@
#![allow(clippy::collapsible_if, clippy::unnecessary_map_or)]
use crate::cfg::Cfg;
-use crate::labels::Cap;
+use crate::labels::{Cap, bare_method_name};
use crate::ssa::ir::*;
use crate::ssa::pointsto::{ContainerOp, classify_container_op};
use crate::symbol::Lang;
@ -588,7 +588,7 @@ fn is_container_literal(text: &str) -> bool {
/// Check if a callee creates a new container (constructor/factory).
pub fn is_container_constructor(callee: &str, lang: Lang) -> bool {
// Extract last segment after '.' or '::' (whichever comes last)
-let after_dot = callee.rsplit('.').next().unwrap_or(callee);
+let after_dot = bare_method_name(callee);
let suffix = after_dot.rsplit("::").next().unwrap_or(after_dot);
let suffix_lower = suffix.to_ascii_lowercase();
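
Review note: the suffix extraction in this hunk can be sketched standalone. The block below mirrors the line that was replaced (`rsplit('.')` followed by `rsplit("::")`); the body of `bare_method_name` is not shown in this diff, so treating it as equivalent to the first step is an assumption, and `last_segment` is a hypothetical name.

```rust
/// Return the final path segment of a callee string, splitting first on
/// '.' and then on "::". This mirrors the pre-change code in the hunk;
/// it is not the crate's `bare_method_name`.
pub fn last_segment(callee: &str) -> &str {
    // `rsplit` always yields at least one item, so `unwrap_or` only
    // documents the intended fallback.
    let after_dot = callee.rsplit('.').next().unwrap_or(callee);
    after_dot.rsplit("::").next().unwrap_or(after_dot)
}
```

With this shape, `pkg.Type::make` and `Vec::new` both reduce to their terminal name before the lowercase comparison runs.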

@ -548,6 +548,7 @@ fn op_kind(op: &SsaOp) -> &'static str {
SsaOp::CatchParam => "CatchParam",
SsaOp::Nop => "Nop",
SsaOp::Undef => "Undef",
SsaOp::FieldProj { .. } => "FieldProj",
}
}
@ -785,6 +786,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: Default::default(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let errs = check_structural_invariants(&body);
assert!(
@ -830,6 +833,8 @@ mod tests {
}],
cfg_node_map: Default::default(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let errs = check_structural_invariants(&body);
assert!(
@ -878,6 +883,8 @@ mod tests {
}],
cfg_node_map: Default::default(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let errs = check_structural_invariants(&body);
assert!(
@ -904,6 +911,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: Default::default(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let errs = check_structural_invariants(&body);
assert!(

@ -1,8 +1,10 @@
use crate::constraint::domain::ConstValue;
use crate::constraint::lower::ConditionExpr;
use crate::ssa::type_facts::TypeKind;
use petgraph::graph::NodeIndex;
use serde::{Deserialize, Serialize};
use smallvec::SmallVec;
use std::collections::HashMap;
/// Unique identifier for an SSA value (one per definition point).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
@ -12,6 +14,141 @@ pub struct SsaValue(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
pub struct BlockId(pub u32);
/// Interned field-name identifier, scoped to a single [`SsaBody`].
///
/// Different bodies may assign different `FieldId`s to the same field name,
/// so callers MUST resolve through the owning body's [`FieldInterner`]
/// (`SsaBody::field_name`) before using the name in cross-body contexts
/// (e.g. summary serialization).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
pub struct FieldId(pub u32);
impl FieldId {
/// Pointer-Phase 4 sentinel for the abstract "any element of a
/// container" field. Steensgaard-grade precision: every numeric
/// or dynamic index access (`arr[i]`, `arr.shift()`, `map[k]`)
/// projects through the same `Field(pt(container), ELEM)` cell so
/// per-element taint propagation is independent of the SSA value
/// referencing the container.
///
/// `u32::MAX` is reserved by convention; the per-body
/// [`FieldInterner`] never assigns it because interning is
/// monotone-ascending from `0` and bodies don't approach 4 billion
/// fields. Consumers should compare with `==` rather than reach
/// into the wrapped `u32`.
pub const ELEM: FieldId = FieldId(u32::MAX);
/// "Tainted at every field" wildcard sentinel — distinct from
/// [`Self::ELEM`] (which is container-element semantics: every
/// numeric/dynamic index access projects through it).
/// `ANY_FIELD` represents the case where a writeback-shaped sink
/// (`json.NewDecoder(r.Body).Decode(&dest)`,
/// `proto.Unmarshal(buf, &msg)`) taints the destination wholesale
/// without a per-field decomposition the caller can enumerate.
/// Read by [`SsaOp::FieldProj`] as a fallback when no specific
/// `(loc, *field)` cell exists, so subsequent struct-field reads
/// pick up the writeback's taint without over-tainting unrelated
/// containers' element cells. `u32::MAX - 1` is reserved
/// alongside `ELEM` and is similarly never assigned by the per-
/// body interner.
pub const ANY_FIELD: FieldId = FieldId(u32::MAX - 1);
}
/// Per-body interner for field names referenced by [`SsaOp::FieldProj`].
///
/// Names are deduped within a single SSA body: every distinct field-name
/// string is assigned a stable `FieldId(u32)` for the lifetime of the body.
/// The interner is serialized alongside the body so deserialization restores
/// IDs intact; cross-body summary code is responsible for resolving names
/// before passing them across body boundaries.
#[derive(Clone, Debug, Default, Serialize, Deserialize)]
pub struct FieldInterner {
/// Names indexed by `FieldId.0`.
names: Vec<String>,
/// Reverse lookup: name → existing FieldId.
#[serde(skip)]
lookup: HashMap<String, u32>,
}
impl FieldInterner {
/// Create an empty interner.
pub fn new() -> Self {
Self::default()
}
/// Intern a field name, returning its [`FieldId`]. Reuses the existing
/// id if the name has already been interned.
pub fn intern(&mut self, name: &str) -> FieldId {
if let Some(&id) = self.lookup.get(name) {
return FieldId(id);
}
let id = self.names.len() as u32;
self.names.push(name.to_string());
self.lookup.insert(name.to_string(), id);
FieldId(id)
}
/// Read-only lookup: returns the [`FieldId`] for `name` if it has
/// already been interned, or `None` otherwise.
///
/// Used by cross-call resolvers (Pointer-Phase 5 / W3) to avoid
/// growing the caller's interner with field names introduced
/// solely by the callee summary — such IDs would never be referenced
/// by any other instruction in the caller's body, so the cells
/// would be write-only and consume space without contributing
/// to taint flow.
pub fn lookup(&self, name: &str) -> Option<FieldId> {
// Try the reverse `lookup` table first: callers usually own a body
// whose interner was populated at lowering time or rebuilt via
// `ensure_lookup` after deserialisation, so this is the hot path.
// Fall back to a linear walk over `names` so the method stays
// callable in the (small) deserialised-but-not-rebuilt case, where
// the skipped `lookup` table is still empty.
if let Some(&id) = self.lookup.get(name) {
return Some(FieldId(id));
}
for (idx, n) in self.names.iter().enumerate() {
if n == name {
return Some(FieldId(idx as u32));
}
}
None
}
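/// Resolve a [`FieldId`] back to its interned name.
///
/// # Panics
///
/// Panics if `id` was not assigned by this interner. That includes the
/// [`FieldId::ELEM`] and [`FieldId::ANY_FIELD`] sentinels, which are
/// never interned and so index past the end of `names`.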
pub fn resolve(&self, id: FieldId) -> &str {
&self.names[id.0 as usize]
}
/// Number of unique interned names.
pub fn len(&self) -> usize {
self.names.len()
}
/// Whether the interner contains no names.
pub fn is_empty(&self) -> bool {
self.names.is_empty()
}
/// Rebuild the reverse lookup after deserialization. Called lazily by
/// [`Self::ensure_lookup`] so deserialized interners can still be used
/// for further interning.
fn rebuild_lookup(&mut self) {
self.lookup.clear();
for (i, n) in self.names.iter().enumerate() {
self.lookup.entry(n.clone()).or_insert(i as u32);
}
}
/// Ensure the reverse lookup is populated (rebuilds after a serde
/// roundtrip when the lookup table was skipped).
pub fn ensure_lookup(&mut self) {
if self.lookup.len() != self.names.len() {
self.rebuild_lookup();
}
}
}
/// SSA instruction operation.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub enum SsaOp {
@ -20,13 +157,48 @@ pub enum SsaOp {
/// Assignment: result depends on the listed SSA values.
Assign(SmallVec<[SsaValue; 4]>),
/// Function/method call.
///
/// `callee` is the canonical name SSA-time consumers should match on.
/// When SSA lowering decomposes a chained-receiver method call into a
/// `FieldProj` chain (e.g. `c.mu.Lock()` → `v_mu = FieldProj(v_c, "mu")`,
/// `Call("Lock", [v_mu])`), `callee` carries the bare method name
/// (`"Lock"`) and `callee_text` carries the original full path
/// (`Some("c.mu.Lock")`). When no decomposition happens, `callee_text`
/// is `None` and `callee` already holds the original textual form.
Call {
callee: String,
/// Original textual full path when SSA decomposed a chained receiver.
/// `None` when the callee was not rewritten — `callee` already holds
/// the source-level textual form.
///
/// **Debug / display only.** Analysis code must walk the SSA receiver
/// chain (through `FieldProj` ops) for precise field structure, or
/// use [`crate::labels::bare_method_name`] when only the terminal
/// method name is needed from a textual callee.
#[doc(hidden)]
#[serde(default)]
callee_text: Option<String>,
/// Per-argument SSA value uses.
args: Vec<SmallVec<[SsaValue; 2]>>,
/// Receiver SSA value (for method calls).
receiver: Option<SsaValue>,
},
/// Field projection: read field `field` of object `receiver`.
///
/// Models member-access expressions (`obj.field`) as a first-class SSA
/// op. Lowering walks the receiver tree so chained accesses like
/// `c.writer.header` produce a chain of `FieldProj` ops with explicit
/// per-step receivers — eliminating the textual-prefix parsing that
/// previously misclassified deep receivers (the gin/context.go FP).
///
/// `field` is interned in the owning [`SsaBody`]'s [`FieldInterner`].
/// `projected_type` carries the inferred type of the projected field
/// when known (populated by type-fact analysis), `None` otherwise.
FieldProj {
receiver: SsaValue,
field: FieldId,
projected_type: Option<TypeKind>,
},
/// Taint source introduction.
Source,
/// Constant / literal value (no taint).
@ -168,6 +340,31 @@ pub struct SsaBody {
/// Recorded during lowering when exception edges are stripped from the CFG.
/// Used by taint analysis to seed catch blocks with try-body taint state.
pub exception_edges: Vec<(BlockId, BlockId)>,
/// Per-body interner for [`SsaOp::FieldProj`] field names.
///
/// Empty until the lowering phase emits FieldProj ops (Phase 2 of the
/// field-projections rollout). Cross-body callers (cross-file
/// summaries, debug serialization) MUST resolve interned ids through
/// this interner before transporting field references to other bodies.
#[serde(default)]
pub field_interner: FieldInterner,
/// Pointer-Phase 3 / W1: side-table mapping a synthetic base-update
/// [`SsaOp::Assign`]'s defined value back to the `(receiver, field)`
/// pair it represents. Populated by SSA lowering at the
/// `obj.f = rhs` synthesis point so the taint engine can recognise
/// the synthetic assign as a structural field WRITE — the assigned
/// value is the new "obj" value, the use is the rhs, and the side-
/// table records `(prior_obj_value, FieldId("f"))`.
///
/// Empty by default; only synthetic assigns whose enclosing source
/// statement was a dotted-path assignment (`a.b.c = …`) appear here.
/// Lookup is `O(1)` expected (`HashMap`), and the per-body
/// table is small (one entry per synthetic chain link).
///
/// Serialized via `#[serde(default)]` so pre-W1 SSA blobs decode
/// cleanly with an empty map (no migration needed).
#[serde(default)]
pub field_writes: HashMap<SsaValue, (SsaValue, FieldId)>,
}
impl SsaBody {
@ -190,6 +387,53 @@ impl SsaBody {
pub fn def_of(&self, v: SsaValue) -> &ValueDef {
&self.value_defs[v.0 as usize]
}
/// Resolve a [`FieldId`] back to the interned field name within this body.
pub fn field_name(&self, id: FieldId) -> &str {
self.field_interner.resolve(id)
}
/// Intern a field name in this body's [`FieldInterner`], returning its
/// stable [`FieldId`].
pub fn intern_field(&mut self, name: &str) -> FieldId {
self.field_interner.intern(name)
}
}
impl SsaInst {
/// Collect the SSA values used (read) by this instruction.
///
/// Returns receiver/operand values for `Call`, `Phi`, `Assign`, and
/// `FieldProj`; an empty list for leaf ops (`Const`, `Param`, `Source`, etc.).
/// Despite the name, the result is an owned `SmallVec`, not a lazy
/// iterator; callers that need a `Vec` can call `.into_vec()`.
pub fn uses_iter(&self) -> SmallVec<[SsaValue; 4]> {
match &self.op {
SsaOp::Phi(operands) => operands.iter().map(|(_, v)| *v).collect(),
SsaOp::Assign(uses) => uses.iter().copied().collect(),
SsaOp::Call { args, receiver, .. } => {
let mut out: SmallVec<[SsaValue; 4]> = SmallVec::new();
if let Some(rv) = receiver {
out.push(*rv);
}
for arg in args {
out.extend(arg.iter().copied());
}
out
}
SsaOp::FieldProj { receiver, .. } => {
let mut out: SmallVec<[SsaValue; 4]> = SmallVec::new();
out.push(*receiver);
out
}
SsaOp::Source
| SsaOp::Const(_)
| SsaOp::Param { .. }
| SsaOp::SelfParam
| SsaOp::CatchParam
| SsaOp::Nop
| SsaOp::Undef => SmallVec::new(),
}
}
}
/// Errors that can occur during SSA lowering.
@ -211,3 +455,149 @@ impl std::fmt::Display for SsaError {
}
impl std::error::Error for SsaError {}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn field_interner_dedupes_names() {
let mut interner = FieldInterner::new();
let a = interner.intern("mu");
let b = interner.intern("mu");
let c = interner.intern("writer");
assert_eq!(a, b, "interning same name twice yields same id");
assert_ne!(a, c, "different names get different ids");
assert_eq!(interner.resolve(a), "mu");
assert_eq!(interner.resolve(c), "writer");
assert_eq!(interner.len(), 2);
}
#[test]
fn field_interner_serde_roundtrip_rebuilds_lookup() {
let mut interner = FieldInterner::new();
let a = interner.intern("mu");
let b = interner.intern("writer");
let json = serde_json::to_string(&interner).expect("serialize");
let mut restored: FieldInterner = serde_json::from_str(&json).expect("deserialize");
assert_eq!(restored.resolve(a), "mu");
assert_eq!(restored.resolve(b), "writer");
// After ensure_lookup, intern("mu") returns the original id (not a new one).
restored.ensure_lookup();
assert_eq!(restored.intern("mu"), a);
assert_eq!(restored.intern("header"), FieldId(2));
}
#[test]
fn field_proj_use_iter_includes_receiver() {
let inst = SsaInst {
value: SsaValue(3),
op: SsaOp::FieldProj {
receiver: SsaValue(1),
field: FieldId(0),
projected_type: None,
},
cfg_node: NodeIndex::new(0),
var_name: Some("c.mu".into()),
span: (0, 0),
};
let uses: Vec<SsaValue> = inst.uses_iter().into_iter().collect();
assert_eq!(uses, vec![SsaValue(1)]);
}
/// Pointer-Phase 4 / A6 audit: the [`FieldId::ELEM`] sentinel is
/// reserved for "any element of a container". The interner assigns
/// IDs monotonically from `0`, so the sentinel `u32::MAX` can only
/// collide if the body declares ~4 billion fields — a corner case
/// no realistic codebase reaches. Pin the contract with a stress
/// loop so future implementation drift can't silently shift IDs to
/// the sentinel value.
#[test]
fn field_interner_never_assigns_elem_sentinel() {
let mut interner = FieldInterner::new();
for i in 0..1024 {
let id = interner.intern(&format!("f{i}"));
assert_ne!(
id,
FieldId::ELEM,
"intern('f{i}') yielded the ELEM sentinel — invariant broken",
);
}
// Interning the literal string `<elem>` (the marker W3 uses to
// round-trip container-element flow through summaries) must yield
// an ordinary id, never the sentinel. The wire format keeps
// `<elem>` as a *string marker* that never goes through `intern`;
// callers compare explicitly against `FieldId::ELEM` instead.
assert_ne!(interner.intern("<elem>"), FieldId::ELEM);
}
/// A6: the `<elem>` marker round-trips through extraction →
/// SQLite → caller-side translation without colliding with a
/// caller-interned `<elem>` field. When a caller's body has its
/// own `<elem>` field, that gets a regular FieldId, distinct from
/// the sentinel.
#[test]
fn elem_marker_distinct_from_interner_assigned_id() {
let mut interner = FieldInterner::new();
let lit_elem = interner.intern("<elem>");
// The sentinel is pinned to `u32::MAX`; the interned id is distinct from it.
assert_eq!(FieldId::ELEM, FieldId(u32::MAX));
assert_ne!(lit_elem, FieldId::ELEM);
// Resolve the literal-string id back to its interned name.
assert_eq!(interner.resolve(lit_elem), "<elem>");
}
#[test]
fn field_proj_serde_roundtrip_with_field_name() {
// Build a tiny body with one FieldProj op and check that the
// body's interner survives round-trip and the id resolves back
// to the original name.
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![ValueDef {
var_name: Some("c".into()),
cfg_node: NodeIndex::new(0),
block: BlockId(0),
}],
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: FieldInterner::new(),
field_writes: HashMap::new(),
};
let fid = body.intern_field("mu");
body.blocks[0].body.push(SsaInst {
value: SsaValue(1),
op: SsaOp::FieldProj {
receiver: SsaValue(0),
field: fid,
projected_type: None,
},
cfg_node: NodeIndex::new(0),
var_name: Some("c.mu".into()),
span: (0, 0),
});
let json = serde_json::to_string(&body).expect("serialize body");
let restored: SsaBody = serde_json::from_str(&json).expect("deserialize body");
let inst = &restored.blocks[0].body[0];
match &inst.op {
SsaOp::FieldProj {
receiver, field, ..
} => {
assert_eq!(*receiver, SsaValue(0));
assert_eq!(restored.field_name(*field), "mu");
}
other => panic!("expected FieldProj, got {other:?}"),
}
}
}
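
Review note: `FieldInterner::resolve` indexes `names` directly, so a sentinel id would index out of bounds. The sketch below shows one possible way to render ids without panicking. It is an illustration under assumptions: `Interner`, `try_resolve`, `display_name`, and the `<any>` / `<unknown>` labels are hypothetical; only `<elem>` appears in the diff as a wire-format marker.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FieldId(pub u32);

impl FieldId {
    /// Same sentinel values as in the diff.
    pub const ELEM: FieldId = FieldId(u32::MAX);
    pub const ANY_FIELD: FieldId = FieldId(u32::MAX - 1);
}

/// Simplified stand-in for the per-body interner (no serde).
#[derive(Default)]
pub struct Interner {
    names: Vec<String>,
    lookup: HashMap<String, u32>,
}

impl Interner {
    /// Intern `name`, reusing the existing id when it is already known.
    pub fn intern(&mut self, name: &str) -> FieldId {
        if let Some(&id) = self.lookup.get(name) {
            return FieldId(id);
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.lookup.insert(name.to_string(), id);
        FieldId(id)
    }

    /// Hypothetical non-panicking variant of `resolve`.
    pub fn try_resolve(&self, id: FieldId) -> Option<&str> {
        self.names.get(id.0 as usize).map(String::as_str)
    }
}

/// Render an id for display, handling the sentinels before touching
/// the name table. The `<any>` and `<unknown>` labels are made up here.
pub fn display_name(interner: &Interner, id: FieldId) -> String {
    match id {
        FieldId::ELEM => "<elem>".to_string(),
        FieldId::ANY_FIELD => "<any>".to_string(),
        other => interner.try_resolve(other).unwrap_or("<unknown>").to_string(),
    }
}
```

Whether the `Display` impl can ever see a `FieldProj` carrying a sentinel depends on lowering code that this diff does not show, so this is a defensive option rather than a required fix.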

File diff suppressed because it is too large

@ -21,6 +21,7 @@ pub use lower::lower_to_ssa_scoped_nop;
pub use lower::lower_to_ssa_with_params;
use crate::cfg::Cfg;
use crate::ssa::type_facts::TypeKind;
use crate::symbol::Lang;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
@ -51,6 +52,19 @@ pub struct OptimizeResult {
///
/// Pipeline: const propagation → branch pruning → copy propagation → DCE → type facts → points-to.
pub fn optimize_ssa(body: &mut SsaBody, cfg: &Cfg, lang: Option<Lang>) -> OptimizeResult {
optimize_ssa_with_param_types(body, cfg, lang, &[])
}
/// Same as [`optimize_ssa`] but seeds [`SsaOp::Param`] values with
/// per-position [`TypeKind`] facts derived from the function's
/// `BodyMeta.param_types`. Strictly additive: an empty slice or
/// `None` entries leave the type-fact analysis behaviour unchanged.
pub fn optimize_ssa_with_param_types(
body: &mut SsaBody,
cfg: &Cfg,
lang: Option<Lang>,
param_types: &[Option<TypeKind>],
) -> OptimizeResult {
// 1. Constant propagation (SCCP)
let cp = const_prop::const_propagate(body);
let branches_pruned = const_prop::apply_const_prop(body, &cp);
@ -65,7 +79,8 @@ pub fn optimize_ssa(body: &mut SsaBody, cfg: &Cfg, lang: Option<Lang>) -> Optimi
let dead_defs_removed = dce::eliminate_dead_defs(body, cfg);
// 5. Type fact analysis (uses const prop results + language for constructor inference)
-let type_facts = type_facts::analyze_types(body, cfg, &cp.values, lang);
+let type_facts =
+type_facts::analyze_types_with_param_types(body, cfg, &cp.values, lang, param_types);
// 6. Points-to analysis (uses allocation site detection + SSA def-use)
let points_to = heap::analyze_points_to(body, cfg, lang);
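
Review note: the "strictly additive" contract of `optimize_ssa_with_param_types` can be sketched with a toy seeding function. Everything below is hypothetical (`Kind`, `seed_param_types`, `seed_no_types`); it only illustrates why an empty slice or `None` entries leave the result unchanged.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `TypeKind`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Kind {
    Int,
    Str,
}

/// Seed per-parameter type facts. Positions past the end of
/// `param_types`, or holding `None`, contribute no fact.
pub fn seed_param_types(
    param_count: usize,
    param_types: &[Option<Kind>],
) -> HashMap<usize, Kind> {
    let mut facts = HashMap::new();
    for idx in 0..param_count {
        if let Some(Some(kind)) = param_types.get(idx) {
            facts.insert(idx, kind.clone());
        }
    }
    facts
}

/// The original entry point delegates with an empty slice, so its
/// behaviour is unchanged by the new parameter.
pub fn seed_no_types(param_count: usize) -> HashMap<usize, Kind> {
    seed_param_types(param_count, &[])
}
```

Using `slice::get` keeps a short `param_types` slice safe: a function with more parameters than recorded types simply gets fewer seeded facts.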

@ -415,6 +415,8 @@ mod tests {
value_defs,
cfg_node_map: HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
}
}
@ -606,6 +608,7 @@ mod tests {
0,
SsaOp::Call {
callee: "list".to_string(),
callee_text: None,
args: vec![],
receiver: None,
},

@ -4,6 +4,7 @@
//! across all supported languages so that taint flows correctly through
//! collection operations.
use crate::labels::bare_method_name;
use crate::symbol::Lang;
use smallvec::SmallVec;
@ -29,6 +30,14 @@ pub enum ContainerOp {
/// `index_arg`: same semantics as `Store::index_arg` — when present and
/// provably constant, loads from `HeapSlot::Index(n)`.
Load { index_arg: Option<usize> },
/// Taint flows from the receiver container into the argument at
/// `dest_arg` — i.e. the "writeback" pattern where a method writes its
/// decoded/loaded value into a caller-supplied destination rather than
/// returning it. Used for the Go `*.Decode(&dest)` family
/// (`json.Decoder.Decode`, `xml.Decoder.Decode`, `gob.Decoder.Decode`),
/// where `r.Body → json.NewDecoder(r.Body).Decode(&dest)` should taint
/// `dest` even though `Decode` returns only an `error`.
Writeback { dest_arg: usize },
}
/// Convenience: store with a single value argument, no index tracking.
@ -92,7 +101,7 @@ fn load_indexed(idx_pos: usize) -> Option<ContainerOp> {
/// Returns `None` if the callee is not a recognised container operation.
pub fn classify_container_op(callee: &str, lang: Lang) -> Option<ContainerOp> {
// Extract method name: last segment after '.' (or full name if no dot).
-let method = callee.rsplit('.').next().unwrap_or(callee);
+let method = bare_method_name(callee);
match lang {
Lang::JavaScript | Lang::TypeScript => classify_js(method),
@ -121,6 +130,10 @@ fn classify_js(method: &str) -> Option<ContainerOp> {
// map.get(key) — key at 0
"get" => load_indexed(0),
"values" | "keys" | "entries" => load(),
// Pointer-Phase 6 / W5: synthetic callees emitted by CFG
// lowering for subscript reads/writes (`arr[i]`, `arr[i] = v`).
"__index_get__" => load_indexed(0),
"__index_set__" => store_indexed(1, 0),
_ => None,
}
}
@ -140,6 +153,10 @@ fn classify_python(method: &str) -> Option<ContainerOp> {
"get" => load_indexed(0), // dict.get(key) / list index — key/index at 0
"items" | "values" | "keys" => load(),
"join" => load(),
// Pointer-Phase 6 / W5: synthetic callees emitted by CFG
// lowering for subscript reads/writes (`arr[i]`, `arr[i] = v`).
"__index_get__" => load_indexed(0),
"__index_set__" => store_indexed(1, 0),
_ => None,
}
}
@ -173,6 +190,24 @@ fn classify_go(method: &str, callee: &str) -> Option<ContainerOp> {
match method {
"Add" | "Set" | "Store" | "Put" => store(0),
"Get" | "Load" | "Pop" => load(),
// Stream-decoder writeback. In Go, the canonical decode pattern
// takes a destination as the sole positional argument and returns
// only an `error`:
// decoder := json.NewDecoder(r.Body)
// decoder.Decode(&dest)
// The decoder's receiver chain carries the source taint
// (`r.Body` → `json.NewDecoder(r.Body)` → `decoder`); without a
// writeback rule, the destination stays clean and downstream sinks
// miss the flow. `Unmarshal` is the sibling pattern in its
// method-call form, where the bytes are carried via the receiver and
// the destination is arg 0, so it lines up with the writeback
// contract just like `Decode`. Note that package-level forms such as
// `proto.Unmarshal(buf, &msg)` put the destination at arg 1, not arg 0.
"Decode" | "Unmarshal" => Some(ContainerOp::Writeback { dest_arg: 0 }),
// Pointer-Phase 6 / W5: synthetic callees emitted by CFG
// lowering for Go index_expression reads/writes (`arr[i]`,
// `m[k] = v`).
"__index_get__" => load_indexed(0),
"__index_set__" => store_indexed(1, 0),
_ => None,
}
}
@ -195,9 +230,22 @@ fn classify_php(method: &str) -> Option<ContainerOp> {
fn classify_cpp(method: &str) -> Option<ContainerOp> {
match method {
-"push_back" | "emplace_back" | "insert" | "emplace" | "push" => store(0),
-"front" | "back" | "pop_back" | "pop_front" | "top" => load(),
-// vector.at(index) — index at 0
+// Mutating container operations.
+// `assign` overwrites the container's contents with the argument
+// sequence — modeled as Store so the receiver inherits the argument
+// taint, matching the runtime "the values now live inside this
+// container" semantics shared with `push_back`/`emplace_back`.
+"push_back" | "emplace_back" | "insert" | "emplace" | "push" | "assign" => store(0),
+// Map/unordered_map insertion: `m.insert_or_assign(k, v)` — value at 1.
+"insert_or_assign" => store_indexed(1, 0),
+// Read-only container observers. `find`/`count` return iterators or
+// counts that carry the container's value taint when queried with a
+// tainted needle; `data` returns a pointer to the underlying buffer
+// (its real identity-passthrough behaviour for `c_str`/`data` is
+// refined in the labels phase, but Load propagation gives us the
+// baseline cap-flow without further plumbing).
+"front" | "back" | "pop_back" | "pop_front" | "top" | "find" | "count" | "data" => load(),
+// Indexed reads: `vector::at(i)`, `unordered_map::at(k)`.
"at" => load_indexed(0),
_ => None,
}
@ -255,6 +303,40 @@ mod tests {
assert!(matches!(op, Some(ContainerOp::Store { .. })));
}
// CVE Hunt Session 2 (Owncast CVE-2023-3188 / CVE-2024-31450 family):
// Go `*.Decode(&dest)` is the canonical streaming-decoder writeback —
// `json.NewDecoder(r.Body).Decode(&dest)`, `xml.NewDecoder(r).Decode(&out)`,
// `gob.NewDecoder(buf).Decode(&v)`. The decoder receiver carries the
// source taint and the destination is arg 0; the writeback rule is the
// only way taint reaches `dest` because `Decode` itself returns only
// `error`. The same-shape `Unmarshal` pattern (`proto.Unmarshal`,
// `tar.Header.Unmarshal`) on a typed receiver follows the same contract.
#[test]
fn go_decode_is_writeback_dest_arg_zero() {
match classify_container_op("decoder.Decode", Lang::Go) {
Some(ContainerOp::Writeback { dest_arg }) => assert_eq!(dest_arg, 0),
other => panic!("expected Writeback {{ dest_arg: 0 }}, got {other:?}"),
}
}
#[test]
fn go_unmarshal_is_writeback_dest_arg_zero() {
match classify_container_op("hdr.Unmarshal", Lang::Go) {
Some(ContainerOp::Writeback { dest_arg }) => assert_eq!(dest_arg, 0),
other => panic!("expected Writeback {{ dest_arg: 0 }}, got {other:?}"),
}
}
#[test]
fn js_decode_is_not_writeback() {
// The Writeback rule is a Go-specific pattern; JS/TS `decode`
// helpers (`Buffer.from(s, 'base64').toString()` etc.) return their
// result and don't have a writeback contract. Make sure we didn't
// accidentally widen the rule into other languages (JavaScript and
// Python are both checked here).
assert!(classify_container_op("decoder.Decode", Lang::JavaScript).is_none());
assert!(classify_container_op("decoder.Decode", Lang::Python).is_none());
}
#[test]
fn unknown_method_is_none() {
assert!(classify_container_op("obj.frobnicate", Lang::JavaScript).is_none());
@ -311,4 +393,102 @@ mod tests {
panic!("expected Load");
}
}
// ── C++ Phase 1 additions ──────────────────────────────────────
#[test]
fn cpp_push_back_is_store() {
let op = classify_container_op("v.push_back", Lang::Cpp);
match op {
Some(ContainerOp::Store {
value_args,
index_arg,
}) => {
assert_eq!(value_args.as_slice(), &[0]);
assert_eq!(index_arg, None);
}
_ => panic!("expected Store"),
}
}
#[test]
fn cpp_assign_is_store() {
// vector::assign(args) overwrites the container's contents — the
// receiver inherits argument taint just like push_back.
let op = classify_container_op("v.assign", Lang::Cpp);
assert!(matches!(op, Some(ContainerOp::Store { .. })));
}
#[test]
fn cpp_insert_or_assign_indexes_value() {
// map::insert_or_assign(key, value) — value is at arg 1, key at arg 0.
match classify_container_op("m.insert_or_assign", Lang::Cpp) {
Some(ContainerOp::Store {
value_args,
index_arg,
}) => {
assert_eq!(value_args.as_slice(), &[1]);
assert_eq!(index_arg, Some(0));
}
other => panic!("expected indexed Store, got {other:?}"),
}
}
#[test]
fn cpp_find_count_data_are_load() {
for callee in ["m.find", "m.count", "v.data"] {
assert!(
matches!(
classify_container_op(callee, Lang::Cpp),
Some(ContainerOp::Load { .. })
),
"{callee} should be a Load",
);
}
}
#[test]
fn cpp_at_is_indexed_load() {
match classify_container_op("v.at", Lang::Cpp) {
Some(ContainerOp::Load { index_arg }) => assert_eq!(index_arg, Some(0)),
other => panic!("expected indexed Load, got {other:?}"),
}
}
/// W5: synthetic `__index_get__` is recognised as an indexed load
/// in JS/TS, Python, and Go — driving the index_arg=0 path so a
/// constant-key subscript read flows through `HeapSlot::Index(n)`.
#[test]
fn synth_index_get_classified_as_indexed_load_js_py_go() {
for lang in [Lang::JavaScript, Lang::TypeScript, Lang::Python, Lang::Go] {
match classify_container_op("__index_get__", lang) {
Some(ContainerOp::Load { index_arg }) => {
assert_eq!(index_arg, Some(0), "{lang:?} should mark idx arg=0");
}
other => panic!("{lang:?}: expected indexed Load, got {other:?}"),
}
}
}
/// W5: synthetic `__index_set__` is recognised as an indexed store
/// in JS/TS, Python, and Go — value at arg 1, index at arg 0.
#[test]
fn synth_index_set_classified_as_indexed_store_js_py_go() {
for lang in [Lang::JavaScript, Lang::TypeScript, Lang::Python, Lang::Go] {
match classify_container_op("__index_set__", lang) {
Some(ContainerOp::Store {
value_args,
index_arg,
}) => {
assert_eq!(
value_args.as_slice(),
&[1],
"{lang:?} value arg should be 1"
);
assert_eq!(index_arg, Some(0), "{lang:?} index arg should be 0");
}
other => panic!("{lang:?}: expected indexed Store, got {other:?}"),
}
}
}
}
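
The container-op tests above pin down how a callee string maps to a `Store` or `Load`. Here is a minimal, self-contained sketch of that mapping for the C++ cases they cover. The `ContainerOp` enum is a simplified stand-in, and `bare_method` and `classify_cpp_sketch` are hypothetical names, not the crate's `classify_container_op`. The `value_args` for `assign` and the `index_arg` for `find`/`count`/`data` are assumptions, since the tests do not assert them.

```rust
/// Simplified stand-in for the crate's `ContainerOp`; the field shapes follow
/// the assertions in the tests, but `Vec` is an assumption.
#[derive(Debug, PartialEq, Eq)]
enum ContainerOp {
    Store { value_args: Vec<usize>, index_arg: Option<usize> },
    Load { index_arg: Option<usize> },
}

/// Hypothetical helper: drop the receiver from a callee such as `m.find`.
fn bare_method(callee: &str) -> &str {
    callee.rsplit('.').next().unwrap_or(callee)
}

/// Hypothetical classifier covering only the C++ methods exercised above.
fn classify_cpp_sketch(callee: &str) -> Option<ContainerOp> {
    match bare_method(callee) {
        // map::insert_or_assign(key, value): key at arg 0, value at arg 1.
        "insert_or_assign" => Some(ContainerOp::Store {
            value_args: vec![1],
            index_arg: Some(0),
        }),
        // Assumption: the test only checks that `assign` is a Store.
        "assign" => Some(ContainerOp::Store {
            value_args: vec![0],
            index_arg: None,
        }),
        // vector::at(i): indexed read with the index at arg 0.
        "at" => Some(ContainerOp::Load { index_arg: Some(0) }),
        // Assumption: these are treated as unindexed reads.
        "find" | "count" | "data" => Some(ContainerOp::Load { index_arg: None }),
        _ => None,
    }
}

fn main() {
    assert_eq!(
        classify_cpp_sketch("m.insert_or_assign"),
        Some(ContainerOp::Store { value_args: vec![1], index_arg: Some(0) })
    );
    assert_eq!(
        classify_cpp_sketch("v.at"),
        Some(ContainerOp::Load { index_arg: Some(0) })
    );
    assert!(classify_cpp_sketch("v.unrelated").is_none());
}
```

Matching on the bare method name keeps the table independent of the receiver's spelling, so `m.find` and `cache.find` classify the same way.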


@ -256,6 +256,7 @@ pub fn analyze(
callee,
args,
receiver,
..
} => {
if candidates.contains_key(&inst.value) && is_rust_map_constructor(callee) {
continue;
@ -437,6 +438,8 @@ mod tests {
value_defs: vec![],
cfg_node_map: std::collections::HashMap::new(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let cfg: Cfg = Graph::new();
let const_values = HashMap::new();


@ -1,6 +1,6 @@
#![allow(clippy::if_same_then_else)]
use std::collections::HashMap;
use std::collections::{BTreeMap, HashMap};
use super::const_prop::ConstLattice;
use super::ir::*;
@ -32,6 +32,40 @@ pub enum TypeKind {
/// `label_prefix` — it never participates in label-based callee
/// resolution.
LocalCollection,
/// Phase 6: a framework-injected DTO body whose field types are
/// known. Populated only when a parameter is recognised as a typed
/// extractor by a Phase 1-2 matcher AND the DTO class / struct /
/// Pydantic model is resolvable in the current scan scope.
/// Strictly additive — when no DTO definition is found, callers
/// fall through to today's pre-Phase-6 behaviour.
Dto(DtoFields),
}
/// Phase 6: structural carrier for a recognised DTO type. Maps
/// declared field names to their inferred [`TypeKind`]. Nested DTOs
/// use [`TypeKind::Dto`] recursively.
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct DtoFields {
pub class_name: String,
/// Sorted-by-key map for stable iteration / serialisation.
pub fields: BTreeMap<String, TypeKind>,
}
impl DtoFields {
pub fn new(class_name: impl Into<String>) -> Self {
Self {
class_name: class_name.into(),
fields: BTreeMap::new(),
}
}
pub fn insert(&mut self, field: impl Into<String>, kind: TypeKind) {
self.fields.insert(field.into(), kind);
}
pub fn get(&self, field: &str) -> Option<&TypeKind> {
self.fields.get(field)
}
}
impl TypeKind {
@ -47,6 +81,38 @@ impl TypeKind {
_ => None,
}
}
/// Container name used by the typed call-graph devirtualisation
/// (`docs/typed-call-graph-prompt.md`, Phase 2).
///
/// Returns the class / impl / module string under which an SSA
/// receiver value of this type would be looked up in
/// [`crate::callgraph::ClassMethodIndex`]. Mirrors
/// [`Self::label_prefix`] for the security-relevant abstract
/// types (HttpClient → `"HttpClient"`, DatabaseConnection →
/// `"DatabaseConnection"`, etc.) and additionally returns the DTO
/// class name for [`TypeKind::Dto`] receivers.
///
/// Scalar / unknown types return `None` — they have no defining
/// container and would not narrow a method-call edge meaningfully.
pub fn container_name(&self) -> Option<String> {
if let Some(prefix) = self.label_prefix() {
return Some(prefix.to_string());
}
if let Self::Dto(d) = self {
return Some(d.class_name.clone());
}
None
}
/// Phase 6: convenience accessor for the inner `DtoFields` if this
/// type is a recognised DTO.
pub fn as_dto(&self) -> Option<&DtoFields> {
match self {
Self::Dto(d) => Some(d),
_ => None,
}
}
}
/// A type fact about an SSA value.
@ -79,6 +145,13 @@ impl TypeFact {
};
TypeFact { kind, nullable }
}
/// Phase 6: factory used by the field-access propagation rule.
pub(crate) fn from_dto_field(receiver: &TypeKind, field: &str) -> Option<Self> {
let dto = receiver.as_dto()?;
let kind = dto.get(field)?.clone();
Some(Self::from_kind(kind))
}
}
/// Result of type fact analysis.
@ -107,32 +180,41 @@ impl TypeFactResult {
}
}
/// Check whether the given sink-operand SSA values are all int-typed for the
/// sink's capability set. Returns `false` when `sink_caps` carries no
/// type-suppressible bits, when `values` is empty, or when any value is not
/// known to be `TypeKind::Int`. Shared by the SSA taint engine and the
/// structural `cfg-unguarded-sink` analysis so both agree on when a sink's
/// arguments are provably non-injectable.
/// Check whether the given sink-operand SSA values are all type-safe for
/// the sink's capability set. Returns `false` when `sink_caps` carries
/// no type-suppressible bits, when `values` is empty, or when any value
/// is not known to be a payload-incompatible scalar type. Shared by
/// the SSA taint engine and the structural `cfg-unguarded-sink`
/// analysis so both agree on when a sink's arguments are provably
/// non-injectable.
///
/// Suppression policy:
/// * [`TypeKind::Int`] (and float, treated as numeric): suppresses
/// `SQL_QUERY`, `FILE_IO`, `SHELL_ESCAPE`, `HTML_ESCAPE`, `SSRF` —
/// numeric values cannot carry the metacharacters required to drive
/// any of these injection classes.
/// * [`TypeKind::Bool`]: suppresses every type-suppressible bit —
/// `true`/`false` cannot carry a payload of any kind.
pub fn is_type_safe_for_sink(
values: &[SsaValue],
sink_caps: crate::labels::Cap,
type_facts: &TypeFactResult,
) -> bool {
use crate::labels::Cap;
// Int-typed values cannot carry injection payloads for these caps:
// SQL_QUERY — digits can't form meta SQL tokens
// FILE_IO — digits can't form path traversal sequences
// SHELL_ESCAPE — digits can't form shell metacharacters
// HTML_ESCAPE — digits can't form HTML metachars (<, >, ", ', &, /, :)
// in either text or attribute context
let type_suppressible = Cap::SQL_QUERY | Cap::FILE_IO | Cap::SHELL_ESCAPE | Cap::HTML_ESCAPE;
let type_suppressible =
Cap::SQL_QUERY | Cap::FILE_IO | Cap::SHELL_ESCAPE | Cap::HTML_ESCAPE | Cap::SSRF;
if !sink_caps.intersects(type_suppressible) {
return false;
}
if values.is_empty() {
return false;
}
values.iter().all(|v| type_facts.is_int(*v))
values.iter().all(|v| {
let Some(kind) = type_facts.get_type(*v) else {
return false;
};
matches!(kind, TypeKind::Int | TypeKind::Bool)
})
}
/// Infer a type from a constructor, factory, or allocator call.
@ -393,6 +475,21 @@ pub fn analyze_types(
cfg: &Cfg,
consts: &HashMap<SsaValue, ConstLattice>,
lang: Option<Lang>,
) -> TypeFactResult {
analyze_types_with_param_types(body, cfg, consts, lang, &[])
}
/// Same as [`analyze_types`] but seeds [`SsaOp::Param`] values with
/// per-position [`TypeKind`] facts from `param_types` (parallel-vec to
/// the function's BodyMeta.params). An entry of `None` (or an out-of-
/// range index) leaves the value at the default Param fact (Unknown),
/// preserving the pre-Phase-3 behaviour.
pub fn analyze_types_with_param_types(
body: &SsaBody,
cfg: &Cfg,
consts: &HashMap<SsaValue, ConstLattice>,
lang: Option<Lang>,
param_types: &[Option<TypeKind>],
) -> TypeFactResult {
let mut facts: HashMap<SsaValue, TypeFact> = HashMap::new();
@ -424,7 +521,16 @@ pub fn analyze_types(
}
}
SsaOp::Source => TypeFact::from_kind(TypeKind::String),
SsaOp::Param { .. } => TypeFact::unknown(),
SsaOp::Param { index } => {
// Seed from the function's BodyMeta.param_types when
// a TypeKind was recovered at CFG construction time.
// Out-of-range / None entries fall back to Unknown,
// matching the pre-Phase-3 behaviour.
match param_types.get(*index).and_then(|t| t.clone()) {
Some(tk) => TypeFact::from_kind(tk),
None => TypeFact::unknown(),
}
}
SsaOp::SelfParam => TypeFact::from_kind(TypeKind::Object),
SsaOp::CatchParam => TypeFact::from_kind(TypeKind::Object),
SsaOp::Call { callee, .. } => {
@ -473,6 +579,14 @@ pub fn analyze_types(
// Defer: will be filled in second pass
TypeFact::unknown()
}
// FieldProj: when the projection carries an inferred type
// (set during lowering or by future field-type analysis),
// honour it; otherwise the field type is unknown until a
// points-to / heap query resolves it.
SsaOp::FieldProj { projected_type, .. } => match projected_type {
Some(tk) => TypeFact::from_kind(tk.clone()),
None => TypeFact::unknown(),
},
// Undef contributes no type information — phi joins
// pick up the type from the other (defined) operand.
SsaOp::Undef => TypeFact::unknown(),
@ -530,6 +644,38 @@ pub fn analyze_types(
}
}
// Phase 6.3: FieldProj receiver-driven type narrowing. When
// SSA lowering decomposed `a.b.c()` into a FieldProj chain,
// intermediate FieldProj insts default to `projected_type =
// None`. If the receiver value carries a Dto fact and the
// projected field name is in its `fields` map, route the
// FieldProj's type fact to the field's declared TypeKind.
for inst in &block.body {
let SsaOp::FieldProj {
receiver,
field,
projected_type,
} = &inst.op
else {
continue;
};
// If the lowering already pinned a type, keep it.
if projected_type.is_some() {
continue;
}
let Some(recv_fact) = facts.get(receiver).cloned() else {
continue;
};
let field_name = body.field_name(*field).to_string();
let Some(new_fact) = TypeFact::from_dto_field(&recv_fact.kind, &field_name) else {
continue;
};
if facts.get(&inst.value) != Some(&new_fact) {
facts.insert(inst.value, new_fact);
changed = true;
}
}
// Phi nodes
for inst in &block.phis {
if let SsaOp::Phi(operands) = &inst.op {
@ -566,13 +712,29 @@ pub fn analyze_types(
}
if let SsaOp::Assign(uses) = &inst.op {
if uses.len() == 1 {
let src_fact = facts
.get(&uses[0])
.cloned()
.unwrap_or_else(TypeFact::unknown);
// Phase 6.3: when the RHS is a single member-access
// expression and the receiver value carries a
// `TypeKind::Dto(fields)` fact, route the assignment's
// type to the field's declared `TypeKind`. Strictly
// additive — falls through to copy-prop when the
// receiver isn't a DTO or the field isn't recorded.
let dto_field_fact = cfg
.node_weight(inst.cfg_node)
.and_then(|ni| ni.member_field.as_deref())
.and_then(|field| {
let recv_kind = facts.get(&uses[0])?.kind.clone();
TypeFact::from_dto_field(&recv_kind, field)
});
let new_fact = match dto_field_fact {
Some(f) => f,
None => facts
.get(&uses[0])
.cloned()
.unwrap_or_else(TypeFact::unknown),
};
let old = facts.get(&inst.value);
if old != Some(&src_fact) {
facts.insert(inst.value, src_fact);
if old != Some(&new_fact) {
facts.insert(inst.value, new_fact);
changed = true;
}
} else if uses.len() == 2 {
@ -840,6 +1002,8 @@ mod tests {
.into_iter()
.collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let consts = HashMap::from([
@ -911,6 +1075,7 @@ mod tests {
value: SsaValue(0),
op: SsaOp::Call {
callee: "URL".into(),
callee_text: None,
args: vec![],
receiver: None,
},
@ -922,6 +1087,7 @@ mod tests {
value: SsaValue(1),
op: SsaOp::Call {
callee: "HttpClient.newHttpClient".into(),
callee_text: None,
args: vec![],
receiver: None,
},
@ -949,6 +1115,8 @@ mod tests {
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let consts = HashMap::new();
@ -979,6 +1147,291 @@ mod tests {
assert_eq!(result.get_type(SsaValue(99)), None);
}
/// Phase 4: Int-typed values must suppress every type-suppressible
/// cap — including the freshly-added `SSRF` bit. Numeric IDs
/// cannot rewrite a URL host, cannot form path traversal sequences,
/// cannot carry SQL/HTML/shell metacharacters.
#[test]
fn int_suppresses_every_type_suppressible_cap() {
use crate::labels::Cap;
let mut facts = HashMap::new();
facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
let result = TypeFactResult { facts };
for cap in [
Cap::SQL_QUERY,
Cap::FILE_IO,
Cap::SHELL_ESCAPE,
Cap::HTML_ESCAPE,
Cap::SSRF,
] {
assert!(
is_type_safe_for_sink(&[SsaValue(0)], cap, &result),
"Int must suppress {cap:?}",
);
}
// Caps outside the type-suppressible set never qualify.
assert!(!is_type_safe_for_sink(
&[SsaValue(0)],
Cap::CODE_EXEC,
&result
));
assert!(!is_type_safe_for_sink(
&[SsaValue(0)],
Cap::DESERIALIZE,
&result
));
}
/// Phase 4: Bool-typed values are even safer than ints — `true` /
/// `false` cannot carry any payload and must suppress every
/// type-suppressible cap.
#[test]
fn bool_suppresses_every_type_suppressible_cap() {
use crate::labels::Cap;
let mut facts = HashMap::new();
facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Bool));
let result = TypeFactResult { facts };
for cap in [
Cap::SQL_QUERY,
Cap::FILE_IO,
Cap::SHELL_ESCAPE,
Cap::HTML_ESCAPE,
Cap::SSRF,
] {
assert!(
is_type_safe_for_sink(&[SsaValue(0)], cap, &result),
"Bool must suppress {cap:?}",
);
}
}
/// String-typed values must NOT trigger suppression — they are the
/// canonical injection carrier. Regression guard so a future
/// change to `is_type_safe_for_sink` does not silently silence
/// real String-payload findings.
#[test]
fn string_does_not_trigger_sink_suppression() {
use crate::labels::Cap;
let mut facts = HashMap::new();
facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::String));
let result = TypeFactResult { facts };
assert!(!is_type_safe_for_sink(
&[SsaValue(0)],
Cap::SQL_QUERY,
&result
));
assert!(!is_type_safe_for_sink(&[SsaValue(0)], Cap::SSRF, &result));
assert!(!is_type_safe_for_sink(
&[SsaValue(0)],
Cap::SHELL_ESCAPE,
&result
));
}
/// Audit A3: The full `(TypeKind, Cap)` suppression matrix. Encoded
/// as a single table-driven test so any future change to
/// `is_type_safe_for_sink` requires an intentional matrix edit + a
/// test update. Truth values:
///
/// | TypeKind | SQL | FILE | SHELL | HTML | SSRF | CODE_EXEC | DESERIALIZE |
/// |-----------|-----|------|-------|------|------|-----------|-------------|
/// | Int | Y | Y | Y | Y | Y | N | N |
/// | Bool | Y | Y | Y | Y | Y | N | N |
/// | String | N | N | N | N | N | N | N |
/// | Url | N | N | N | N | N | N | N |
/// | Object | N | N | N | N | N | N | N |
/// | Unknown | N | N | N | N | N | N | N |
#[test]
fn type_kind_cap_suppression_matrix() {
use crate::labels::Cap;
let caps = [
("SQL_QUERY", Cap::SQL_QUERY),
("FILE_IO", Cap::FILE_IO),
("SHELL_ESCAPE", Cap::SHELL_ESCAPE),
("HTML_ESCAPE", Cap::HTML_ESCAPE),
("SSRF", Cap::SSRF),
("CODE_EXEC", Cap::CODE_EXEC),
("DESERIALIZE", Cap::DESERIALIZE),
];
// (kind_name, kind, [suppress for each cap in `caps` order])
let rows: &[(&str, TypeKind, [bool; 7])] = &[
(
"Int",
TypeKind::Int,
[true, true, true, true, true, false, false],
),
(
"Bool",
TypeKind::Bool,
[true, true, true, true, true, false, false],
),
(
"String",
TypeKind::String,
[false, false, false, false, false, false, false],
),
(
"Url",
TypeKind::Url,
[false, false, false, false, false, false, false],
),
(
"Object",
TypeKind::Object,
[false, false, false, false, false, false, false],
),
(
"Unknown",
TypeKind::Unknown,
[false, false, false, false, false, false, false],
),
];
for (kind_name, kind, expected) in rows {
let mut facts = HashMap::new();
facts.insert(SsaValue(0), TypeFact::from_kind(kind.clone()));
let result = TypeFactResult { facts };
for (i, (cap_name, cap)) in caps.iter().enumerate() {
let got = is_type_safe_for_sink(&[SsaValue(0)], *cap, &result);
assert_eq!(
got, expected[i],
"matrix mismatch for ({kind_name}, {cap_name}): expected {}, got {got}",
expected[i]
);
}
}
}
/// Audit A3 (companion): empty `values` slice never suppresses,
/// regardless of cap or per-value type facts.
#[test]
fn empty_values_never_suppress() {
use crate::labels::Cap;
let mut facts = HashMap::new();
facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
let result = TypeFactResult { facts };
for cap in [
Cap::SQL_QUERY,
Cap::FILE_IO,
Cap::SHELL_ESCAPE,
Cap::HTML_ESCAPE,
Cap::SSRF,
Cap::CODE_EXEC,
Cap::DESERIALIZE,
] {
assert!(
!is_type_safe_for_sink(&[], cap, &result),
"empty values must never suppress {cap:?}",
);
}
}
/// Audit A3 (companion): a Cap with NO type-suppressible bits never
/// suppresses, even when the value's type kind is otherwise
/// suppression-eligible.
#[test]
fn caps_without_type_suppressible_bits_never_fire() {
use crate::labels::Cap;
let mut facts = HashMap::new();
facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
let result = TypeFactResult { facts };
for cap in [
Cap::CODE_EXEC,
Cap::DESERIALIZE,
Cap::CRYPTO,
Cap::URL_ENCODE,
] {
assert!(
!is_type_safe_for_sink(&[SsaValue(0)], cap, &result),
"Int must NOT suppress non-type-suppressible {cap:?}",
);
}
}
/// Audit A3 (companion): mixed-type operand list (one Int alongside
/// one String operand) must NOT suppress. The suppression rule
/// requires every operand to be payload-incompatible.
#[test]
fn mixed_type_operands_do_not_suppress() {
use crate::labels::Cap;
let mut facts = HashMap::new();
facts.insert(SsaValue(0), TypeFact::from_kind(TypeKind::Int));
facts.insert(SsaValue(1), TypeFact::from_kind(TypeKind::String));
let result = TypeFactResult { facts };
assert!(!is_type_safe_for_sink(
&[SsaValue(0), SsaValue(1)],
Cap::SQL_QUERY,
&result
));
}
/// Phase 3: Param values seeded from `param_types` must surface
/// the right TypeKind for downstream sink suppression. An out-of-
/// range index falls back to Unknown (the pre-Phase-3 default).
#[test]
fn param_types_seed_param_value_facts() {
use crate::cfg::Cfg;
let n0 = NodeIndex::new(0);
let n1 = NodeIndex::new(1);
let body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Param { index: 0 },
cfg_node: n0,
var_name: Some("user_id".into()),
span: (0, 7),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Param { index: 99 },
cfg_node: n1,
var_name: Some("oob".into()),
span: (8, 11),
},
],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("user_id".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("oob".into()),
cfg_node: n1,
block: BlockId(0),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
field_interner: crate::ssa::ir::FieldInterner::default(),
field_writes: std::collections::HashMap::new(),
};
let consts = HashMap::new();
let cfg: Cfg = petgraph::Graph::new();
let param_types = vec![Some(TypeKind::Int)];
let result =
analyze_types_with_param_types(&body, &cfg, &consts, Some(Lang::Java), &param_types);
assert_eq!(result.get_type(SsaValue(0)), Some(&TypeKind::Int));
// Index 99 is out of range → falls back to Unknown.
assert_eq!(result.get_type(SsaValue(1)), Some(&TypeKind::Unknown));
// Empty slice = pre-Phase-3 behaviour.
let result2 = analyze_types(&body, &cfg, &consts, Some(Lang::Java));
assert_eq!(result2.get_type(SsaValue(0)), Some(&TypeKind::Unknown));
}
// ── TypeHierarchy::is_subtype_of ─────────────────────────────────────
#[test]
@ -1484,4 +1937,90 @@ mod tests {
Some(TypeKind::HttpClient)
);
}
// ── Phase 6 DTO field-level taint ─────────────────────────────────────
/// Phase 6: `TypeFact::from_dto_field` returns `Some(field_kind)`
/// for a DTO receiver whose `fields` map contains the requested
/// field, and `None` otherwise.
#[test]
fn dto_field_lookup_returns_field_type_kind() {
let mut dto = DtoFields::new("CreateUser");
dto.insert("age", TypeKind::Int);
dto.insert("email", TypeKind::String);
let recv = TypeKind::Dto(dto);
let age = TypeFact::from_dto_field(&recv, "age").expect("age field present");
assert_eq!(age.kind, TypeKind::Int);
let email = TypeFact::from_dto_field(&recv, "email").expect("email field present");
assert_eq!(email.kind, TypeKind::String);
assert!(TypeFact::from_dto_field(&recv, "missing").is_none());
}
/// Phase 6: a non-DTO receiver kind never produces a field fact.
/// `from_dto_field` returns `None`, so callers fall through to the
/// legacy copy-prop path.
#[test]
fn dto_field_lookup_on_non_dto_returns_none() {
for k in [
TypeKind::Int,
TypeKind::String,
TypeKind::Object,
TypeKind::Unknown,
TypeKind::HttpClient,
] {
assert!(
TypeFact::from_dto_field(&k, "any_field").is_none(),
"non-DTO {k:?} must not produce a field fact",
);
}
}
/// Phase 6: nested DTO — the parent DTO's field type is
/// `TypeKind::Dto`, and `from_dto_field` returns that nested DTO
/// fact directly. Phase 6.3 callers can recurse into the inner
/// fields by following the returned receiver's `as_dto()` chain.
#[test]
fn dto_field_lookup_supports_nested_dto() {
let mut inner = DtoFields::new("Address");
inner.insert("zip", TypeKind::String);
let mut outer = DtoFields::new("CreateUser");
outer.insert("address", TypeKind::Dto(inner.clone()));
outer.insert("age", TypeKind::Int);
let recv = TypeKind::Dto(outer);
let addr = TypeFact::from_dto_field(&recv, "address").expect("address present");
assert_eq!(addr.kind, TypeKind::Dto(inner));
}
/// Phase 6: an empty DTO (class declared but with no inferred
/// fields) never resolves field reads. Documents the safe-fallback
/// invariant so the legacy path runs when class fields couldn't be
/// classified.
#[test]
fn empty_dto_never_resolves_fields() {
let recv = TypeKind::Dto(DtoFields::new("EmptyDto"));
assert!(TypeFact::from_dto_field(&recv, "anything").is_none());
}
/// Phase 6: an `Int`-typed field in a DTO is treated by the
/// type-suppression matrix exactly like a freestanding `Int`.
/// Sanity-checks the bridge between Phase 6 and Phase 4.
#[test]
fn dto_int_field_suppresses_sql_query_via_matrix() {
use crate::labels::Cap;
let mut dto = DtoFields::new("CreateUser");
dto.insert("age", TypeKind::Int);
let field = TypeFact::from_dto_field(&TypeKind::Dto(dto), "age").unwrap();
let mut facts = HashMap::new();
facts.insert(SsaValue(0), field);
let result = TypeFactResult { facts };
assert!(is_type_safe_for_sink(
&[SsaValue(0)],
Cap::SQL_QUERY,
&result
));
assert!(!is_type_safe_for_sink(
&[SsaValue(0)],
Cap::CODE_EXEC,
&result
));
}
}
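
The DTO field lookup and the type-based sink suppression tested above can be read together as one small pipeline: resolve a field's declared kind, record it as a fact, then ask whether every sink operand is `Int` or `Bool`. What follows is a self-contained sketch of that flow under simplifying assumptions. The types are reduced stand-ins for the crate's, SSA values are plain `u32` keys, and the capability bits are hypothetical `u32` flags rather than `crate::labels::Cap`.

```rust
use std::collections::{BTreeMap, HashMap};

// Simplified stand-ins; names mirror the diff but the definitions are
// assumptions made for illustration only.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq)]
enum TypeKind {
    Int,
    Bool,
    String,
    Unknown,
    Dto(DtoFields),
}

#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq)]
struct DtoFields {
    class_name: String,
    fields: BTreeMap<String, TypeKind>,
}

// Hypothetical capability bits modelled as plain u32 flags.
const SQL_QUERY: u32 = 1 << 0;
const FILE_IO: u32 = 1 << 1;
const SHELL_ESCAPE: u32 = 1 << 2;
const HTML_ESCAPE: u32 = 1 << 3;
const SSRF: u32 = 1 << 4;
const CODE_EXEC: u32 = 1 << 5;
const TYPE_SUPPRESSIBLE: u32 = SQL_QUERY | FILE_IO | SHELL_ESCAPE | HTML_ESCAPE | SSRF;

/// Declared kind of `field` on a DTO receiver; `None` for non-DTO receivers
/// or fields that were never recorded.
fn dto_field_kind(receiver: &TypeKind, field: &str) -> Option<TypeKind> {
    match receiver {
        TypeKind::Dto(d) => d.fields.get(field).cloned(),
        _ => None,
    }
}

/// True only when the sink has a type-suppressible bit, `values` is
/// non-empty, and every value is known to be `Int` or `Bool`.
fn is_type_safe_for_sink(values: &[u32], sink_caps: u32, facts: &HashMap<u32, TypeKind>) -> bool {
    if sink_caps & TYPE_SUPPRESSIBLE == 0 || values.is_empty() {
        return false;
    }
    values
        .iter()
        .all(|v| matches!(facts.get(v), Some(TypeKind::Int | TypeKind::Bool)))
}

fn main() {
    let mut fields = BTreeMap::new();
    fields.insert("age".to_string(), TypeKind::Int);
    fields.insert("email".to_string(), TypeKind::String);
    let dto = TypeKind::Dto(DtoFields { class_name: "CreateUser".into(), fields });

    let mut facts = HashMap::new();
    facts.insert(0, dto_field_kind(&dto, "age").unwrap());
    facts.insert(1, dto_field_kind(&dto, "email").unwrap());

    assert!(is_type_safe_for_sink(&[0], SQL_QUERY, &facts)); // Int field
    assert!(!is_type_safe_for_sink(&[0, 1], SQL_QUERY, &facts)); // String operand
    assert!(!is_type_safe_for_sink(&[0], CODE_EXEC, &facts)); // cap not suppressible
    assert!(!is_type_safe_for_sink(&[], SSRF, &facts)); // empty operand list
}
```

Returning `false` for a missing fact keeps the check conservative: an operand with no recorded type is never treated as safe.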