mirror of
https://github.com/elicpeter/nyx.git
synced 2026-06-06 19:35:13 +02:00
Added Cap::DATA_EXFIL and taint fp and fn fixes on real repos (#59)
* feat: Enhance data exfiltration detection with source sensitivity gating for cookies and headers
* feat: Implement cross-file data exfiltration detection with parameter-specific gate filters
* feat: Add calibration tests and refine DATA_EXFIL severity scoring logic
* feat: Introduce per-detector configuration for data exfiltration suppression
* feat: Enhance DATA_EXFIL findings with destination field tracking in diagnostics and SARIF output
* feat: Add tainted body and URL handling for data exfiltration detection
* feat: Add integration tests and fixtures for DATA_EXFIL and SSRF detection in Go
* feat: Add Java integration tests and fixtures for DATA_EXFIL detection across multiple HTTP clients
* feat: Add synthetic externals handling for closure-captured variables in SSA
* feat: Implement closure-based suppression for resource leak findings
* feat: Add regression guards for shell-injection and taint propagation in for-of destructure patterns
* feat: Implement constructor cap narrowing for data exfiltration detection in HTTP request builders
* feat: Add gated sinks for data exfiltration detection in C and C++ using curl_easy_setopt
* feat: Implement DATA_EXFIL cap parity for backwards analysis and add integration tests
* feat: Add data exfiltration sinks for various languages and enhance documentation
* refactor: Simplify formatting and improve readability in various files
* refactor: Improve readability by simplifying conditional statements and adding clippy linting
* docs: Update CHANGELOG and comments for data exfiltration features and configuration
* docs: Clarify configuration instructions for data exfiltration trusted destinations
* docs: Enhance comments for evidence routing logic in data exfiltration
parent
a438886217
commit
58f1794a4e
189 changed files with 8421 additions and 383 deletions
|
|
@ -245,6 +245,19 @@ cross-function body expansion. See `DEFAULT_BACKWARDS_DEPTH`,
|
|||
`BACKWARDS_VALUE_BUDGET`, and `MAX_BACKWARDS_CALLEE_BLOCKS` in
|
||||
`src/taint/backwards.rs` for the exact bounds.
|
||||
|
||||
**Cap parity.** The walk treats `DemandState.caps` as opaque bitflags, so
every cap defined in `src/labels/mod.rs` round-trips identically through
the demand transfer. This includes `Cap::DATA_EXFIL` (bit 13): a
`taint-data-exfiltration` forward finding receives `backwards-confirmed`
exactly like a `taint-unsanitised-flow` SQL/CMD/SSRF finding when its
demand walk reaches a Sensitive source. The cap-routing logic in
`src/ast.rs` then surfaces the rule id correctly regardless of which
direction confirmed the flow. See
`tests/backwards_analysis_tests.rs::demand_driven_suite` (the
`data_exfil` sub-case) and
`taint::backwards::tests::driver_walks_data_exfil_source_to_sink` for
the regression guards.
|
||||
|
||||
**Source**: [`src/taint/backwards.rs`](https://github.com/elicpeter/nyx/blob/master/src/taint/backwards.rs).
|
||||
|
||||
---
|
||||
|
|
|
|||
|
|
@ -213,6 +213,26 @@ CLI flag map (each pair is `--enable / --no-enable`):
|
|||
|
||||
**Explain effective engine**: pass `--explain-engine` to print the resolved engine configuration (profile + config + CLI overrides) and exit without scanning.
|
||||
|
||||
### `[detectors.data_exfil]`
|
||||
|
||||
Per-project tuning for the `taint-data-exfiltration` rule. All fields are optional.
|
||||
|
||||
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | bool | `true` | Set `false` to strip `Cap::DATA_EXFIL` from sink caps before emission. No `taint-data-exfiltration` finding reaches the report. Other taint classes are not affected. |
| `trusted_destinations` | [string] | `[]` | URL prefixes that drop `Cap::DATA_EXFIL` on the call site. Matched against the abstract-string domain prefix of the destination arg, so a literal URL or a template literal with a static prefix both work. Use full origins or origin-pinned paths and include the trailing `/`, otherwise `https://api.` matches `https://api.evil.example.com/` too. |
|
||||
|
||||
```toml
[detectors.data_exfil]
enabled = true
trusted_destinations = [
    "https://api.internal/",
    "https://telemetry.example.com/",
]
```
|
||||
|
||||
For the sanitizer convention, source sensitivity gate, and per-language sink coverage, see [Detectors / Taint / DATA_EXFIL](detectors/taint.md#data_exfil-suppression-layers).
|
||||
|
||||
### `[analysis.languages.<slug>]`
|
||||
|
||||
Per-language custom rules. `<slug>` is one of: `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby`.
|
||||
|
|
@ -232,7 +252,8 @@ kind = "sanitizer" # "source" | "sanitizer" | "sink"
|
|||
cap = "html_escape" # "env_var" | "html_escape" | "shell_escape" |
|
||||
# "url_encode" | "json_parse" | "file_io" |
|
||||
# "fmt_string" | "sql_query" | "deserialize" |
|
||||
# "ssrf" | "code_exec" | "crypto" | "all"
|
||||
# "ssrf" | "data_exfil" | "code_exec" | "crypto" |
|
||||
# "unauthorized_id" | "all"
|
||||
```
|
||||
|
||||
---
|
||||
|
|
|
|||
|
|
@ -49,11 +49,13 @@ score = severity_base + analysis_kind + evidence_strength + state_bonus - valida
|
|||
| Component | Values |
|
||||
|---|---|
|
||||
| Severity base | High=60, Medium=30, Low=10 |
|
||||
| Analysis kind | taint=+10, state=+8, cfg with evidence=+5, cfg without evidence=+3, ast=+0 |
|
||||
| Analysis kind | taint=+10, taint-data-exfiltration=+7, state=+8, cfg with evidence=+5, cfg without evidence=+3, ast=+0 |
|
||||
| Evidence strength | +1 per evidence item up to 4; +2 to +6 for source kind |
|
||||
| State bonus | use-after-close / unauthed=+6, double-close=+3, must-leak=+2, may-leak=+1 |
|
||||
| Validation penalty | -5 if path-validated |
|
||||
|
||||
DATA_EXFIL is calibrated below other taint classes by design. Severity is High only when the source carries credential or session material (cookies, env vars); other Sensitive sources (request headers, file system, database, caught exception) downgrade to Medium. Confidence is capped at Medium, and it reaches Medium only when the abstract / symbolic domain corroborates a concrete string body reaching the outbound payload; otherwise it falls to Low. A guarded flow (`path_validated`) drops a confidence tier. The intent is to seat data-exfiltration findings below SSRF / SQLi / command-injection but above informational AST patterns.
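
As a rough illustration of how the components in the table combine, here is a minimal Python sketch of the formula. It is not Nyx's implementation: the function name, argument names, and the `cfg-with-evidence` / `cfg-without-evidence` keys are hypothetical, and the per-source bonus is passed in by the caller because it depends on the source-kind table.

```python
# Hypothetical sketch of the documented scoring formula:
# score = severity_base + analysis_kind + evidence_strength
#         + state_bonus - validation_penalty
SEVERITY_BASE = {"High": 60, "Medium": 30, "Low": 10}

ANALYSIS_KIND = {
    "taint": 10,
    "taint-data-exfiltration": 7,
    "state": 8,
    "cfg-with-evidence": 5,      # key name is illustrative
    "cfg-without-evidence": 3,   # key name is illustrative
    "ast": 0,
}


def attack_surface_score(severity, kind, evidence_items=0, source_bonus=0,
                         state_bonus=0, path_validated=False):
    # +1 per evidence item, capped at 4, plus the source-kind bonus (+2 to +6).
    evidence_strength = min(evidence_items, 4) + source_bonus
    # -5 when the flow is path-validated.
    validation_penalty = 5 if path_validated else 0
    return (SEVERITY_BASE[severity] + ANALYSIS_KIND[kind]
            + evidence_strength + state_bonus - validation_penalty)
```

With four evidence items and an assumed source bonus of 5, a High data-exfiltration finding works out to 60 + 7 + 4 + 5 = 76, which is in line with the approximate figure quoted for that shape.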
|
||||
|
||||
Source-kind contributions (taint only):
|
||||
|
||||
| Source | Bonus |
|
||||
|
|
@ -71,7 +73,9 @@ Approximate score ranges:
|
|||
| High taint with user input | 76 to 81 |
|
||||
| High state (use-after-close) | ~74 |
|
||||
| High CFG structural | 63 to 68 |
|
||||
| High DATA_EXFIL (cookie / env source, body confirmed) | ~76 |
|
||||
| Medium taint with env source | 45 to 50 |
|
||||
| Medium DATA_EXFIL (header / fs / db / caught-exception source) | 40 to 45 |
|
||||
| Medium state (resource leak) | ~40 |
|
||||
| Low AST-only pattern | ~10 |
|
||||
|
||||
|
|
|
|||
|
|
@ -135,10 +135,130 @@ Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer onl
|
|||
| `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation |
|
||||
| `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` |
|
||||
| `ssrf` | | URL-prefix locks | `requests.get`, `fetch` URL arg, outbound HTTP destination |
|
||||
| `data_exfil` | | | `fetch` body / headers / json, `XMLHttpRequest.send` body |
|
||||
| `data_exfil` | cookies, headers, env, db rows, file reads (Sensitive-tier sources only) | | `fetch` body / headers / json, `XMLHttpRequest.send` body |
|
||||
| `code_exec` | | | `eval`, `exec`, `Function` |
|
||||
| `crypto` | | | weak-algorithm constructors |
|
||||
| `unauthorized_id` | request-bound scoped IDs (Rust auth analysis) | ownership check | row-level write |
|
||||
| `all` | Sources typically use `all` so they match any sink | | |
|
||||
|
||||
Sources typically use `cap = "all"` so they match every sink. Sinks declare the specific cap they need. Sanitizers only clear the cap they name.
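
The interaction between sources, sanitizers, and sinks can be pictured as bitmask arithmetic. The sketch below is a simplified Python model under stated assumptions, not the engine's code: only the position of `DATA_EXFIL` (bit 13) is documented, the other bit positions are placeholders, and the helper names are hypothetical.

```python
# Simplified capability model. Only DATA_EXFIL = bit 13 is documented;
# the other positions are placeholders chosen for this sketch.
HTML_ESCAPE = 1 << 1
SSRF = 1 << 12
DATA_EXFIL = 1 << 13
ALL = HTML_ESCAPE | SSRF | DATA_EXFIL


def sanitize(taint_caps, sanitizer_cap):
    # A sanitizer clears only the cap it names and leaves the rest intact.
    return taint_caps & ~sanitizer_cap


def sink_fires(taint_caps, sink_cap):
    # A sink fires only if the cap it declares is still set on the value.
    return (taint_caps & sink_cap) != 0
```

A source tagged `ALL` that passes through a `data_exfil` sanitizer no longer fires a `data_exfil` sink, but it still fires an `ssrf` sink because that bit was never cleared.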
|
||||
|
||||
## Source sensitivity
|
||||
|
||||
Some detector classes need to know not just *that* a value is attacker-influenced but *what kind* of value it is. Each source carries a `SourceKind` (`UserInput`, `Cookie`, `Header`, `EnvironmentConfig`, `FileSystem`, `Database`, `CaughtException`, `Unknown`) and a derived sensitivity tier:
|
||||
|
||||
| Tier | Source kinds | Meaning |
|---|---|---|
| `Plain` | `UserInput` (request bodies, query strings, form fields, argv, stdin) | Attacker-controlled but already in the attacker's hands. Echoing it back to them is not a disclosure. |
| `Sensitive` | `Cookie`, `Header`, `EnvironmentConfig`, `FileSystem`, `Database`, `CaughtException`, `Unknown` | Operator-bound state that should not leak across boundaries. |
| `Secret` | (reserved for explicit credential sources) | Highest tier; treated identically to `Sensitive` today. |
|
||||
|
||||
`Cap::DATA_EXFIL` only fires when the contributing source is at least `Sensitive`. Plain user input flowing into an outbound `fetch` body is suppressed at finding-emission time. This is the canonical false-positive class for API gateways and telemetry forwarders that proxy `req.body`. SSRF and other classes are unaffected; the gate is scoped to `DATA_EXFIL`.
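
The gate can be sketched in a few lines of Python. This is an illustrative model of the tier table, not the actual Rust types: the enum and function names are hypothetical, and it considers a single contributing source for simplicity.

```python
from enum import Enum, IntEnum


class Tier(IntEnum):
    PLAIN = 0
    SENSITIVE = 1
    SECRET = 2  # reserved; treated like SENSITIVE today


class SourceKind(Enum):
    USER_INPUT = "UserInput"
    COOKIE = "Cookie"
    HEADER = "Header"
    ENVIRONMENT_CONFIG = "EnvironmentConfig"
    FILE_SYSTEM = "FileSystem"
    DATABASE = "Database"
    CAUGHT_EXCEPTION = "CaughtException"
    UNKNOWN = "Unknown"


def tier_of(kind):
    # Per the tier table: only UserInput is Plain; everything else,
    # including Unknown, is Sensitive.
    return Tier.PLAIN if kind is SourceKind.USER_INPUT else Tier.SENSITIVE


def data_exfil_fires(kind):
    # The gate: the contributing source must be at least Sensitive.
    return tier_of(kind) >= Tier.SENSITIVE
```

Under this model a cookie reaching a `fetch` body is reported, while `req.body` proxied to the same call is not.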
|
||||
|
||||
If a project legitimately classifies a request body as sensitive (e.g. an internal forwarder where `req.body` carries a pre-authenticated user token), override via custom rules in `nyx.conf`:
|
||||
|
||||
```toml
# Treat the forwarder's outbound payload as already-sanitized so the
# DATA_EXFIL gate stops firing on it.
[[analysis.languages.javascript.rules]]
matchers = ["sanitizeOutbound"]
kind = "sanitizer"
cap = "data_exfil"
```
|
||||
|
||||
Or re-classify the source itself with a custom Source rule whose name matches one of the Sensitive substrings (`cookie`, `header`).
|
||||
|
||||
## DATA_EXFIL suppression layers
|
||||
|
||||
Three knobs ship out of the box so projects can match the cap to their architecture without per-call suppressions.
|
||||
|
||||
### 1. Forwarding-wrapper sanitizer convention
|
||||
|
||||
A named function that exists to *forward* a payload across a known boundary is the developer's explicit decision to send the data. The default sanitizer rules treat the following identifiers as `Sanitizer(data_exfil)` in JavaScript and TypeScript:
|
||||
|
||||
```
serializeForUpstream
forwardPayload
tracker.send
analytics.track
metrics.report
logEvent
```
|
||||
|
||||
If your codebase follows this convention, the cap stops firing on these calls automatically. Extend the convention with your own forwarding wrappers via the standard custom-rule path:
|
||||
|
||||
```toml
[[analysis.languages.javascript.rules]]
matchers = ["dispatchTelemetry", "sendToBus"]
kind = "sanitizer"
cap = "data_exfil"
```
|
||||
|
||||
The rule of thumb: a function that *only* exists to ship a payload to a known boundary belongs in this list. A function that *might* leak (a generic HTTP wrapper, a logging helper that writes to an arbitrary destination) does not.
|
||||
|
||||
### 2. Destination allowlist
|
||||
|
||||
Configure a set of trusted outbound prefixes once and the cap is dropped on every site whose destination argument has a static prefix that begins with one of them:
|
||||
|
||||
```toml
[detectors.data_exfil]
trusted_destinations = [
    "https://api.internal/",
    "https://telemetry.example.com/",
]
```
|
||||
|
||||
Use full origins or origin-pinned paths so a partial-host match across unrelated origins cannot occur. `https://api.` would also match `https://api.evil.example.com/`, so each entry must include the path separator (`/`) at the end of the host.
|
||||
|
||||
The match consults the abstract string domain: a literal URL is a static prefix; a template literal `` `https://api.internal/${id}` `` exposes the prefix `https://api.internal/`; a fully dynamic URL has no prefix, so the cap fires as usual.
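
To see why the trailing `/` matters, consider a minimal Python sketch of prefix matching. It assumes the allowlist check reduces to a plain string-prefix comparison against the statically known part of the destination; the function name is hypothetical and `None` stands in for a fully dynamic URL.

```python
def is_trusted(static_prefix, trusted_destinations):
    # A fully dynamic URL has no static prefix, so it can never be trusted.
    if static_prefix is None:
        return False
    # Trusted when the known prefix begins with any allowlist entry.
    return any(static_prefix.startswith(entry) for entry in trusted_destinations)
```

With the entry `https://api.internal/`, the prefix `https://api.internal/` is trusted and `https://api.evil.example.com/` is not. With the loose entry `https://api.`, both are trusted, which is the partial-host match the guidance above warns about.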
|
||||
|
||||
### 3. Detector-class disable
|
||||
|
||||
Some projects forward user-bound payloads as a matter of architecture. Turn the entire detector class off when the noise is permanent:
|
||||
|
||||
```toml
[detectors.data_exfil]
enabled = false
```
|
||||
|
||||
`enabled = false` strips `Cap::DATA_EXFIL` from sink caps before event emission, so no `taint-data-exfiltration` finding reaches the report. The decision is per-project; other projects loaded by the same `nyx serve` instance keep their own settings.
|
||||
|
||||
## DATA_EXFIL sinks per language
|
||||
|
||||
Nyx ships the following sinks for `Cap::DATA_EXFIL`. The rule fires on a tainted body, headers, or json payload arg; a tainted URL arg routes through the SSRF gate and emits `taint-unsanitised-flow` instead.
|
||||
|
||||
| Language | Sinks | Example |
|---|---|---|
| JavaScript, TypeScript | `fetch(url, {body, headers, json})` body-bind, `XMLHttpRequest.prototype.send`, type-qualified `HttpClient.send` | `fetch('/upload', {method: 'POST', body: req.cookies.session})` |
| Python | `requests.post / put / patch` body and json kwargs, `httpx.AsyncClient().post` json kwarg, `aiohttp.ClientSession().post` body, dict round-trip into json | `requests.post('https://api.internal/ingest', json={'k': os.environ.get('SECRET')})` |
| Java | `HttpClient.send` with `BodyPublishers.ofString`, OkHttp `newCall(req).execute` body chain, Apache `HttpClient.execute(HttpPost)`, `RestTemplate.postForEntity / exchange`, `WebClient.post().bodyValue / body` | `client.send(HttpRequest.newBuilder().uri(...).POST(BodyPublishers.ofString(token)).build(), ...)` |
| Go | `http.Post(url, ct, body)` body arg, `http.PostForm` form arg, `(*http.Client).Do(req)` after `http.NewRequest`, `(*http.Request).Body` assignment | `http.Post("https://analytics.internal/track", "text/plain", strings.NewReader(c.Value))` |
| Rust | `reqwest::Client.post().body / json / form / multipart().send()`, `ureq::post().send_string / send_form / send_json`, `surf::post().body_string / body_json`, `hyper::Request::builder().body()` | `reqwest::Client::new().post(url).form(&secret).send()` |
| Ruby | `Net::HTTP.post(uri, body)` body arg, `Net::HTTP::Post.new(uri).body=`, `RestClient.post / put`, `HTTParty.post(url, body: ...)` body | `Net::HTTP.post(URI('https://analytics.internal/track'), "session=#{request.cookies[:auth]}")` |
| C, C++ | `curl_easy_setopt(handle, CURLOPT_POSTFIELDS, body)` and `CURLOPT_COPYPOSTFIELDS` gated sinks (macro-arg activation), `CURLOPT_POSTFIELDSIZE` body-bind | `curl_easy_setopt(curl, CURLOPT_POSTFIELDS, getenv("AUTH_TOKEN"));` |
| PHP | `curl_setopt($ch, CURLOPT_POSTFIELDS, $body)`, `Guzzle\Client.post($url, ['body' => $tainted])`, `Symfony\HttpClient->request('POST', $url, ['body' => $tainted])` | `curl_setopt($ch, CURLOPT_POSTFIELDS, $_COOKIE['session']);` |
|
||||
|
||||
Add project-specific sinks with `nyx config add-rule --kind sink --cap data_exfil --matcher <name>` or the equivalent TOML rule.
|
||||
|
||||
## DATA_EXFIL calibration ranges
|
||||
|
||||
`taint-data-exfiltration` is calibrated below the other taint classes on purpose.
|
||||
|
||||
| Source kind | Severity | Confidence ceiling |
|---|---|---|
| Cookie, environment variable | High | Medium |
| Header | Medium | Medium |
| File system, database | Medium | Medium |
| Caught exception | Medium | Low |
|
||||
|
||||
Path-validated flows (`path_validated: true`) drop one severity tier. Confidence drops to Low when the abstract or symbolic domain cannot corroborate a concrete string reaching the outbound payload (for example, when the body comes from a callee with no summary).
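
The calibration table reads naturally as a pair of lookups. The Python sketch below is a hedged illustration only: the dictionary keys and the `calibrate` function are hypothetical names, and the path-validation adjustment is deliberately left out of the model.

```python
# Severity and confidence ceiling per source kind, taken from the table above.
# The key spellings are illustrative, not Nyx identifiers.
SEVERITY_BY_SOURCE = {
    "cookie": "High",
    "env": "High",
    "header": "Medium",
    "file_system": "Medium",
    "database": "Medium",
    "caught_exception": "Medium",
}

CONFIDENCE_CEILING = {
    "cookie": "Medium",
    "env": "Medium",
    "header": "Medium",
    "file_system": "Medium",
    "database": "Medium",
    "caught_exception": "Low",
}


def calibrate(source_kind, corroborated):
    # Confidence reaches its ceiling only when the abstract or symbolic
    # domain corroborates a concrete string in the payload; otherwise Low.
    severity = SEVERITY_BY_SOURCE[source_kind]
    confidence = CONFIDENCE_CEILING[source_kind] if corroborated else "Low"
    return severity, confidence
```

For example, a corroborated cookie flow comes out as High severity with Medium confidence, while a header flow whose body comes from a callee with no summary stays Medium severity but drops to Low confidence.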
|
||||
|
||||
Attack-surface score ranges:
|
||||
|
||||
| Finding shape | Score |
|---|---|
| High DATA_EXFIL, cookie or env source, body confirmed | around 76 |
| Medium DATA_EXFIL, header, fs, db, or caught-exception source | 40 to 45 |
| Low DATA_EXFIL, no abstract corroboration, path-validated | 18 to 25 |
|
||||
|
||||
For reference: High SSRF, SQLi, cmdi land at 76 to 81; Medium taint with env source lands at 45 to 50; AST-only patterns sit around 10. Data-exfil sits below the direct-compromise classes but above informational AST patterns.
|
||||
|
|
|
|||