Added Cap::DATA_EXFIL and taint fp and fn fixes on real repos (#59)

* feat: Enhance data exfiltration detection with source sensitivity gating for cookies and headers

* feat: Implement cross-file data exfiltration detection with parameter-specific gate filters

* feat: Add calibration tests and refine DATA_EXFIL severity scoring logic

* feat: Introduce per-detector configuration for data exfiltration suppression

* feat: Enhance DATA_EXFIL findings with destination field tracking in diagnostics and SARIF output

* feat: Add tainted body and URL handling for data exfiltration detection

* feat: Add integration tests and fixtures for DATA_EXFIL and SSRF detection in Go

* feat: Add Java integration tests and fixtures for DATA_EXFIL detection across multiple HTTP clients

* feat: Add synthetic externals handling for closure-captured variables in SSA

* feat: Implement closure-based suppression for resource leak findings

* feat: Add regression guards for shell-injection and taint propagation in for-of destructure patterns

* feat: Implement constructor cap narrowing for data exfiltration detection in HTTP request builders

* feat: Add gated sinks for data exfiltration detection in C and C++ using curl_easy_setopt

* feat: Implement DATA_EXFIL cap parity for backwards analysis and add integration tests

* feat: Add data exfiltration sinks for various languages and enhance documentation

* refactor: Simplify formatting and improve readability in various files

* refactor: Improve readability by simplifying conditional statements and adding clippy linting

* docs: Update CHANGELOG and comments for data exfiltration features and configuration

* docs: Clarify configuration instructions for data exfiltration trusted destinations

* docs: Enhance comments for evidence routing logic in data exfiltration
This commit is contained in:
Eli Peter 2026-05-01 10:59:52 -04:00 committed by GitHub
parent a438886217
commit 58f1794a4e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
189 changed files with 8421 additions and 383 deletions

View file

@ -4,7 +4,7 @@ use crate::ssa::type_facts::TypeKind;
use petgraph::graph::NodeIndex;
use serde::{Deserialize, Serialize};
use smallvec::SmallVec;
use std::collections::HashMap;
use std::collections::{HashMap, HashSet};
/// Unique identifier for an SSA value (one per definition point).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
@ -353,6 +353,26 @@ pub struct SsaBody {
/// cleanly with an empty map (no migration needed).
#[serde(default)]
pub field_writes: HashMap<SsaValue, (SsaValue, FieldId)>,
/// SSA values that lowering injected for **free / closure-captured**
/// variables (variables referenced by the body but not declared as
/// formal parameters and not assigned within the body).
///
/// Lowering models every external use as an [`SsaOp::Param`] in block
/// 0 so the rename pass can reference it. Real formal parameters and
/// closure captures end up using the same op variant; this side-table
/// distinguishes the two so downstream analyses (in particular the
/// JS/TS handler-name auto-seed in
/// [`crate::taint::ssa_transfer`]) can avoid treating closure
/// captures as if they were the function's own parameters. Without
/// this distinction, a lambda body that references an out-of-scope
/// `userId` / `cmd` / `payload` would have the synthetic Param
/// auto-seeded as `UserInput`, producing a phantom source on the
/// enclosing function's declaration line.
///
/// `#[serde(default)]` for backward compatibility with summary blobs
/// produced before this field existed.
#[serde(default)]
pub synthetic_externals: HashSet<SsaValue>,
}
impl SsaBody {
@ -560,6 +580,7 @@ mod tests {
exception_edges: vec![],
field_interner: FieldInterner::new(),
field_writes: HashMap::new(),
synthetic_externals: HashSet::new(),
};
let fid = body.intern_field("mu");
body.blocks[0].body.push(SsaInst {