Added Cap::DATA_EXFIL and taint fp and fn fixes on real repos (#59)

* feat: Enhance data exfiltration detection with source sensitivity gating for cookies and headers * feat: Implement cross-file data exfiltration detection with parameter-specific gate filters * feat: Add calibration tests and refine DATA_EXFIL severity scoring logic * feat: Introduce per-detector configuration for data exfiltration suppression * feat: Enhance DATA_EXFIL findings with destination field tracking in diagnostics and SARIF output * feat: Add tainted body and URL handling for data exfiltration detection * feat: Add integration tests and fixtures for DATA_EXFIL and SSRF detection in Go * feat: Add Java integration tests and fixtures for DATA_EXFIL detection across multiple HTTP clients * feat: Add synthetic externals handling for closure-captured variables in SSA * feat: Implement closure-based suppression for resource leak findings * feat: Add regression guards for shell-injection and taint propagation in for-of destructure patterns * feat: Implement constructor cap narrowing for data exfiltration detection in HTTP request builders * feat: Add gated sinks for data exfiltration detection in C and C++ using curl_easy_setopt * feat: Implement DATA_EXFIL cap parity for backwards analysis and add integration tests * feat: Add data exfiltration sinks for various languages and enhance documentation * refactor: Simplify formatting and improve readability in various files * refactor: Improve readability by simplifying conditional statements and adding clippy linting * docs: Update CHANGELOG and comments for data exfiltration features and configuration * docs: Clarify configuration instructions for data exfiltration trusted destinations * docs: Enhance comments for evidence routing logic in data exfiltration
2026-06-15 20:05:13 +02:00 · 2026-05-01 10:59:52 -04:00 · 2026-05-01 10:59:52 -04:00 · 58f1794a4e
commit 58f1794a4e
parent a438886217
189 changed files with 8421 additions and 383 deletions
--- a/docs/configuration.md
+++ b/docs/configuration.md
@ -213,6 +213,26 @@ CLI flag map (each pair is `--enable / --no-enable`):

 **Explain effective engine**: pass `--explain-engine` to print the resolved engine configuration (profile + config + CLI overrides) and exit without scanning.

+### `[detectors.data_exfil]`
+
+Per-project tuning for the `taint-data-exfiltration` rule. All fields are optional.
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `enabled` | bool | `true` | Set `false` to strip `Cap::DATA_EXFIL` from sink caps before emission. No `taint-data-exfiltration` finding reaches the report. Other taint classes are not affected. |
+| `trusted_destinations` | [string] | `[]` | URL prefixes that drop `Cap::DATA_EXFIL` on the call site. Matched against the abstract-string domain prefix of the destination arg, so a literal URL or a template literal with a static prefix both work. Use full origins or origin-pinned paths and include the trailing `/`, otherwise `https://api.` matches `https://api.evil.example.com/` too. |
+
+```toml
+[detectors.data_exfil]
+enabled = true
+trusted_destinations = [
+  "https://api.internal/",
+  "https://telemetry.example.com/",
+]
+```
+
+For the sanitizer convention, source sensitivity gate, and per-language sink coverage, see [Detectors / Taint / DATA_EXFIL](detectors/taint.md#data_exfil-suppression-layers).
+
 ### `[analysis.languages.<slug>]`

 Per-language custom rules. `<slug>` is one of: `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby`.
@ -232,7 +252,8 @@ kind = "sanitizer"        # "source" | "sanitizer" | "sink"
 cap = "html_escape"       # "env_var" | "html_escape" | "shell_escape" |
                          # "url_encode" | "json_parse" | "file_io" |
                          # "fmt_string" | "sql_query" | "deserialize" |
-                          # "ssrf" | "code_exec" | "crypto" | "all"
+                          # "ssrf" | "data_exfil" | "code_exec" | "crypto" |
+                          # "unauthorized_id" | "all"
 ```

 ---