Added Cap::DATA_EXFIL and taint fp and fn fixes on real repos (#59)

* feat: Enhance data exfiltration detection with source sensitivity gating for cookies and headers * feat: Implement cross-file data exfiltration detection with parameter-specific gate filters * feat: Add calibration tests and refine DATA_EXFIL severity scoring logic * feat: Introduce per-detector configuration for data exfiltration suppression * feat: Enhance DATA_EXFIL findings with destination field tracking in diagnostics and SARIF output * feat: Add tainted body and URL handling for data exfiltration detection * feat: Add integration tests and fixtures for DATA_EXFIL and SSRF detection in Go * feat: Add Java integration tests and fixtures for DATA_EXFIL detection across multiple HTTP clients * feat: Add synthetic externals handling for closure-captured variables in SSA * feat: Implement closure-based suppression for resource leak findings * feat: Add regression guards for shell-injection and taint propagation in for-of destructure patterns * feat: Implement constructor cap narrowing for data exfiltration detection in HTTP request builders * feat: Add gated sinks for data exfiltration detection in C and C++ using curl_easy_setopt * feat: Implement DATA_EXFIL cap parity for backwards analysis and add integration tests * feat: Add data exfiltration sinks for various languages and enhance documentation * refactor: Simplify formatting and improve readability in various files * refactor: Improve readability by simplifying conditional statements and adding clippy linting * docs: Update CHANGELOG and comments for data exfiltration features and configuration * docs: Clarify configuration instructions for data exfiltration trusted destinations * docs: Enhance comments for evidence routing logic in data exfiltration
2026-06-21 20:18:06 +02:00 · 2026-05-01 10:59:52 -04:00 · 2026-05-01 10:59:52 -04:00 · 58f1794a4e
commit 58f1794a4e
parent a438886217
189 changed files with 8421 additions and 383 deletions
--- a/docs/detectors.md
+++ b/docs/detectors.md
@ -49,11 +49,13 @@ score = severity_base + analysis_kind + evidence_strength + state_bonus - valida
 | Component | Values |
 |---|---|
 | Severity base | High=60, Medium=30, Low=10 |
-| Analysis kind | taint=+10, state=+8, cfg with evidence=+5, cfg without evidence=+3, ast=+0 |
+| Analysis kind | taint=+10, taint-data-exfiltration=+7, state=+8, cfg with evidence=+5, cfg without evidence=+3, ast=+0 |
 | Evidence strength | +1 per evidence item up to 4; +2 to +6 for source kind |
 | State bonus | use-after-close / unauthed=+6, double-close=+3, must-leak=+2, may-leak=+1 |
 | Validation penalty | -5 if path-validated |

+DATA_EXFIL is calibrated below other taint classes by design. Severity is High only when the source carries credential / session material (cookies, env vars); other Sensitive sources (request headers, file system, database, caught exception) downgrade to Medium. Confidence is capped at Medium and only fires Medium when the abstract / symbolic domain corroborates a concrete string body reaching the outbound payload; otherwise it falls to Low. A guarded flow (`path_validated`) drops a confidence tier. The intent is to seat data-exfiltration findings below SSRF / SQLi / command-injection but above informational AST patterns.
+
 Source-kind contributions (taint only):

 | Source | Bonus |
@ -71,7 +73,9 @@ Approximate score ranges:
 | High taint with user input | 76 to 81 |
 | High state (use-after-close) | ~74 |
 | High CFG structural | 63 to 68 |
+| High DATA_EXFIL (cookie / env source, body confirmed) | ~76 |
 | Medium taint with env source | 45 to 50 |
+| Medium DATA_EXFIL (header / fs / db / caught-exception source) | 40 to 45 |
 | Medium state (resource leak) | ~40 |
 | Low AST-only pattern | ~10 |