diff --git a/.gitignore b/.gitignore index 2fe23ecc..61590e17 100644 --- a/.gitignore +++ b/.gitignore @@ -14,3 +14,5 @@ .pitboss .node_modules-target node_modules +__pycache__/ +*.pyc diff --git a/CHANGELOG.md b/CHANGELOG.md index c081c64c..cb537cf5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,15 @@ All notable changes to Nyx are documented here. The format is based on [Keep a C A round of cross-file FastAPI auth, two new sink/validator classes, a ~957-FP Go DAO helper precision pass, four CVE corpus pairs, a local web UI visual refresh, and a performance pass on the auth extractor pipeline plus SCCP and the global summaries hash map. +This branch also adds seven new vulnerability classes (LDAP injection, XPath injection, header / CRLF injection, open redirect, server-side template injection, XXE, prototype pollution), a `nyx rules` CLI subcommand, two SSA configuration sidecars (XML parser hardening, XPath variable resolver), two new path-state predicates for inline open-redirect sanitisers, and a flow-sensitive `Object.create(null)` recogniser for prototype-pollution suppression. + +### Detector classes + +- New `Cap` bits and canonical rule ids: `Cap::LDAP_INJECTION` / `taint-ldap-injection`, `Cap::XPATH_INJECTION` / `taint-xpath-injection`, `Cap::HEADER_INJECTION` / `taint-header-injection`, `Cap::OPEN_REDIRECT` / `taint-open-redirect`, `Cap::SSTI` / `taint-template-injection`, `Cap::XXE` / `taint-xxe`, `Cap::PROTOTYPE_POLLUTION` / `taint-prototype-pollution`. Per-language sink, sanitizer, and (where applicable) gated-sink rules ship across JS/TS, Python, Java, PHP, Go, Ruby, Rust, and C/C++, with coverage varying by class (prototype pollution, for example, is JS/TS-only unless the Python opt-in is enabled). Severity, OWASP 2021 mapping, and human-readable description live in a single `CAP_RULE_REGISTRY` table in `src/labels/mod.rs`; `cap_rule_meta()` and `rule_id_for_caps()` are the public lookups. +- `Cap` widened from `u16` to `u32` to fit the new bits. `Evidence.sink_caps` is now `u32`; `RuleInfo.cap_bits` is also `u32`.
The serde decoder accepts any unsigned integer width so caches written before the bump still load. SQLite schema bumped 3 to 4 to force a rescan, since older `source_caps` / `sanitizer_caps` / `sink_caps` blobs were emitted before any of the new bits could appear. +- `owasp_bucket_for` consults `CAP_RULE_REGISTRY` first so adding a new cap class does not require a second-table edit. The match requires either an exact rule id or the registered id followed by a recognised separator (` `, `(`, `.`), so a future `taint-ssrf-allowlist-violation` can no longer silently inherit `taint-ssrf`'s OWASP bucket. The legacy family-token table now also routes `xpath` and `header` to A03 (Injection) and `xxe` to A05 (Security Misconfiguration). +- `issue_category_label` (dashboard badge) routes the seven new rule-id prefixes to dedicated labels: LDAP Injection, XPath Injection, Header Injection, Open Redirect, Template Injection, XXE, Prototype Pollution. + ### Changed - Refreshed the local web UI visual system around the mint-cyan Nyx brand: warmer light surfaces, deep green accents, updated severity/confidence colors, tighter typography, smaller radii, denser cards, table, badge, button, header, and sidebar styling, and matched graph/code-viewer colors. @@ -16,6 +25,32 @@ A round of cross-file FastAPI auth, two new sink/validator classes, a ~957-FP Go ### Added +- `nyx rules list` CLI subcommand. Surfaces the same registry the dashboard's `/api/rules` page reads from: built-in cap-class entries (one per `Cap` with a canonical rule id), per-language label rules (sink / source / sanitizer), gated sinks, and any custom rules from config. Filters: `--lang `, `--kind `, `--class-only` for registry entries only, `--no-class` for per-language rules only. `--json` for machine output. Cap-class entries carry `language = "all"` so a language filter still surfaces them unless `--no-class` is set. +- `RuleInfo.is_class` / `RuleInfo.emission_active` flags. Cap-class entries carry `is_class = true` so dashboards can group them separately from per-language label rules.
`emission_active = false` marks legacy classes (SQL_QUERY, SSRF, FILE_IO, FMT_STRING, DESERIALIZE, CODE_EXEC, CRYPTO) whose findings still surface under the catch-all `taint-unsanitised-flow` rule id; the seven new classes plus `unauthorized_id` and `data_exfil` are `emission_active = true`. The active set is pinned in `cap_rule_registry_emission_active_set_is_pinned` so a future migration of a legacy cap to its specific rule id can't drift silently. +- XML-parser configuration tracking. New `src/ssa/xml_config.rs` runs alongside type-fact analysis and carries per-receiver `secure_processing` / `disallow_doctype` / `external_entities` flags forward through copy assignments and phi joins (meet for safe flags, sticky union for the unsafe `external_entities` polarity). `xxe_safe()` queries the result at the type-qualified `XmlParser.parse` sink and strips `Cap::XXE` when the parser was provably hardened (JAXP `setFeature(FEATURE_SECURE_PROCESSING, true)`, lxml `XMLParser(resolve_entities=False, no_network=True)`, fast-xml-parser `processEntities: false`). Persisted to `OptimizeResult.xml_parser_config`. +- XPath-receiver configuration tracking. New `src/ssa/xpath_config.rs` mirrors the XML sidecar for Java's `XPath` instances: `setXPathVariableResolver(...)` flips the receiver's `has_resolver` flag, copy assignments union, phi joins meet. `xpath_safe()` strips `Cap::XPATH_INJECTION` at `xpath.evaluate(expr, ...)` / `xpath.compile(expr)` sinks when the receiver was provably bound to a resolver (parameterised XPath shape). Persisted to `OptimizeResult.xpath_config`. 
+- Five new `TypeKind` variants: `LdapClient` (JNDI `InitialDirContext` / `InitialLdapContext`, Spring `LdapTemplate`, ldapjs `createClient`, python-ldap `initialize`, ldap3 `Connection`), `XPathClient` (JAXP `newXPath`, lxml `etree.XPath`, npm `xpath`), `XmlParser` (JAXP factory products: `newDocumentBuilder`, `newSAXParser`, `getXMLReader`), `Template` (Apache FreeMarker `new Template(...)` / `Configuration.getTemplate`), and `NullPrototypeObject` for JS/TS values produced by `Object.create(null)`. Each is wired into `constructor_type` for return-type inference and into `TypeKind::label_prefix()` for type-qualified callee resolution. `XPathClient` is kept distinct from `DatabaseConnection` so a generic `pdo->query` SQL_QUERY sink does not collide with `xpath.query`. +- `GateActivation::LiteralOnly`. Strict literal-value activation: the gate fires only when the activation argument is a literal that matches `dangerous_values` / `dangerous_prefixes`. Unknown or dynamic activation argument suppresses (no conservative `ALL_ARGS_PAYLOAD` push). Used for ambiguously named matchers where the dangerous shape is identifiable only by an explicit literal flag, e.g. bare `extend` where `jQuery.extend(true, target, src)` is the deep-merge prototype-pollution form but Backbone's `Model.extend({proto})` shares the suffix. +- Two new `PredicateKind` variants in `src/taint/path_state.rs` for inline open-redirect sanitisers. `RelativeUrlValidated` covers `x.startsWith("/")`, `x.starts_with("/")`, `x.startswith("/")`, PHP `strpos($x, "/") === 0`, and direct `x[0] === "/"`. `HostAllowlistValidated` covers `new URL(x).host === ALLOWED`, `urlparse(x).netloc == ALLOWED`, multi-statement `parsed.host_str() == "..."` for Rust, and `parsed.Host == "..."` / `parsed.Hostname() == "..."` for Go. Both are cap-aware: they clear `Cap::OPEN_REDIRECT` only on the validated branch, leaving any non-redirect taint downstream to fire on its own caps. 
The Go form gates on case-sensitive capital `H` so a lowercase `u.host == X` field comparison falls through to the generic `Comparison` predicate. +- `Object.create(null)` recogniser. New `is_object_create_null_call` in `cfg/literals.rs` matches `Object.create(null)` (and parenthesised, awaited, or TS type-cast wrappers) and tags `CallMeta.produces_null_proto = true` for JS/TS calls. Type-fact analysis lifts the flag to `TypeKind::NullPrototypeObject` on the returned SSA value so the synthetic `__index_set__` sink is suppressed flow-sensitively. Phi joins drop the tag back to `Unknown` so a partial null-proto receiver still fires on the unsafe path. +- CFG-layer prototype-pollution suppression on the synthetic `__index_set__` sink (JS/TS only, recognised by the existing `try_lower_subscript_write` lowering). Three flow-insensitive shapes elide the `Sink(PROTOTYPE_POLLUTION)` label before SSA sees the node: constant-key fold (literal key not in `__proto__` / `constructor` / `prototype`); reject pattern (an enclosing-block sibling `if (idx === "__proto__" || ...) return / throw / break;`); allowlist pattern (an ancestor `if (idx === "name" || idx === "id") { obj[idx] = v }`). Walks stop at the enclosing function so closure-captured guards in an outer scope can't silently authorise inner assignments. +- Spring MVC `return "redirect:" + tainted` open-redirect recogniser (Java only). New `try_lower_spring_redirect_return` in `cfg/mod.rs` matches the leftmost `+`-chain whose root is a `redirect:` string literal and emits a synthetic `__spring_redirect__` Call sink with `Sink(Cap::OPEN_REDIRECT)` between the predecessors and the Return node. Concatenated identifiers from anywhere in the right-hand chain feed the synthetic node's `arg_uses[0]`, so the existing taint pipeline carries any tainted suffix through OPEN_REDIRECT. +- Subscript-set form classification for header sinks. 
`response.headers["X-Foo"] = bar` / `headers["X-Foo"] = bar` (Ruby `element_reference`, JS/TS `subscript_expression`, Python `subscript`) had no `property` field on the LHS, so the existing classification path skipped it. `push_node` now walks into the subscript's `object` and classifies its member-expression text (`response.headers`, `res.headers`, `self.response.headers`), so `Cap::HEADER_INJECTION` fires on the bare bracket form alongside `setHeader` / `res.set` / `headers_mut.insert`. +- Literal extraction (PHP and JS/TS) extended in `cfg/literals.rs`. `extract_const_string_arg` now folds: PHP `encapsed_string` (double-quoted) when every child is a pure-literal segment; boolean literals (`true` / `false`) so jQuery's `extend(true, target, src)` deep-merge marker activates the `LiteralOnly` gate; leading-string `binary_expression` concat (PHP `"Location: " . $url`, JS/TS `"Location: " + url`) so `dangerous_prefixes` matching activates on partially dynamic concatenations. +- PHP receiver-text strip for chain construction. `helpers::root_receiver_text` now drops the leading `$` from `variable_name` nodes so `$smarty->fetch(...)` / `$twig->createTemplate(...)` reconstruct as `Smarty.fetch` / `Environment.createTemplate` for suffix-matcher gates instead of carrying a `$smarty.fetch` form that fails the boundary rule. +- Gate-callee resolution hardening for member-source rewrites. When `first_member_label` rewrites a call's `text` to a Source like `req.body` (because the wrapper carries a member-source argument), the gate matcher now reads the call's `function` / `method` / `name` field instead, so `setValue(target, req.body, ...)` matches the `setValue` proto-pollution gate instead of the rewritten `req.body` text. Whitespace is stripped from the function field so multi-line chains still match flat gate matchers. +- Ruby option-constant lookup in gate activation.
Bare `scope_resolution` / `constant` nodes (`Nokogiri::XML::ParseOptions::NOENT`) now fall back to the macro-arg extractor used by C/C++/PHP, so Nokogiri XXE gates activate on idiomatic option-flag arguments rather than firing conservatively on every positional arg. +- Per-language label rules expanded to cover the seven new caps: + - JavaScript / TypeScript: ldapjs `LdapClient.search`, `escapeXpath` / `xpathEscape`, `document.evaluate` / npm `xpath.select`, `setHeader` / `res.set` / `res.append` / `res.headers[]=`, `stripCRLF` / `escapeHeader`, lodash / dot-prop / object-path deep-merge prototype-pollution gates, Handlebars / EJS / Mustache template sinks, fast-xml-parser / xml2js with `processEntities`-aware activation, `redirect` / `Location` open-redirect sinks. + - Python: python-ldap `LDAPObject.search_s`, ldap3 `Connection.search`, lxml `etree.XPath` / `lxml.etree.parse` with parser-config awareness, Flask `response.headers[]=` / `make_response`, Jinja2 `Template(...)` and Mako `Template(...)` SSTI sinks, `flask.redirect` / `aiohttp HTTPFound` open-redirect. + - Java / Kotlin: `DirContext.search`, `XPath.evaluate` / `XPath.compile`, JAXP `DocumentBuilder.parse` / `SAXParser.parse` / `XMLReader.parse`, FreeMarker `Template.process`, Spring `redirect:` view-name synthetic sink, `HttpServletResponse.setHeader` / `addHeader`. + - PHP: `ldap_search` / `ldap_list` / `ldap_read`, `DOMXPath::query` / `DOMXPath::evaluate`, `header()` with leading-prefix activation, Smarty `fetch` / Twig `createTemplate` / Blade compile + `eval` template forms, `loadXML` / `simplexml_load_string` with `LIBXML_NOENT` activation. + - Go: `go-ldap conn.Search`, `etree.Path` / `xmlpath.Compile`, `http.Header.Set` / `Response.Header().Set`, `html/template` and `text/template` `Parse(...)`, `encoding/xml.Unmarshal` / `Decoder.Decode`, `http.Redirect` with relative-URL / host-allowlist gating. 
+ - Ruby: `Net::LDAP#search`, `Nokogiri::XML::Document#xpath`, `response.headers[]=`, `ERB.new` SSTI, `Nokogiri::XML.parse` with `NOENT` / `DTDLOAD` activation, `redirect_to` with relative-URL gate. + - C / C++: libldap `ldap_search_ext_s`, libxml2 `xmlXPathEval`, `curl_easy_setopt` with header-list activation, libxml2 `xmlReadFile` / `xmlReadMemory` with `XML_PARSE_NOENT` activation. + - Rust: actix-web `HeaderMap.insert` / `HeaderValue::from_str` header-injection gates. `Redirect::to` retagged from `Cap::SSRF` to `Cap::OPEN_REDIRECT` so the open-redirect rule fires distinctly from the SSRF rule. +- `NYX_PYTHON_PROTO_POLLUTION` env var flag. Python `dict.update` / `__dict__.update` proto-pollution gates are opt-in: bare `update` overlaps too broadly with `Counter.update` and ordinary state-mutation patterns to ship as a default sink. When the var is set to `1` / `true` / `yes` / `on` the merged slice is leaked into a `'static` reference so the registry's lifetime invariant holds. +- New per-cap integration suites: `tests/{xpath_injection,xxe,ssti,prototype_pollution,header_injection,open_redirect,ldap_injection}_tests.rs`, plus `python_proto_pollution_tests.rs` for the env-gated Python form. Per-cap fixture trees under `tests/fixtures///` cover safe, unsafe, and irrelevant-baseline shapes for every supported language. - FastAPI cross-file `include_router` dependency tracking. New `auth_analysis/router_facts.rs` captures per-file router declarations (` = X(deps=[…])`) and `.include_router(.)` edges in pass 1, persists them into `GlobalSummaries::router_facts_by_module`, and resolves them into the active file's `AuthorizationModel::cross_file_router_deps` at pass 2 entry. Transitive lifts (`grandparent → parent → child`) handled by iterative index walk. Module identity is the file basename without `.py` (approximate, but sufficient for airflow-style `task_instances.router` naming). 
Closes the airflow execution-API shape where a child router lives in `routes/task_instances.py` and its auth is declared on the parent in `routes/__init__.py`. - FastAPI router-level `dependencies=[...]` propagation. Module-level `router = APIRouter(dependencies=[Security(...)])` declarations are pre-walked once per file, then merged onto every `@.(...)` route attached in the same file. Closes airflow's execution-API routes that re-use a single `ti_id_router` declared once at module scope. - FastAPI `Security(callable, scopes=[...])` recognised distinctly from `Depends(callable)`. Scoped Security promotes the synthetic `AuthCheck` to `AuthCheckKind::Other` (route-level scope-checked authorization), not just Login. New scope-tracking boolean threaded through `expand_decorator_calls` and `extract_fastapi_dependencies`. @@ -55,6 +90,15 @@ A round of cross-file FastAPI auth, two new sink/validator classes, a ~957-FP Go ### Fixed (false positives) +- `Object.create(null)` receivers no longer fire prototype-pollution at the synthetic `__index_set__` sink. Suppression is flow-sensitive via `TypeKind::NullPrototypeObject` so a phi join that only sometimes resolves to a null-proto receiver still fires on the unsafe path. +- `cfg-unguarded-sink` no longer over-fires on JS/TS bracket property writes (`obj[idx] = v`) guarded by an explicit `__proto__` / `constructor` / `prototype` reject `if` (early `return` / `throw` / `break`) or by an allowlist `if` whose true arm contains the assignment. Resolved at the CFG layer before the SSA sink scan. +- Spring MVC `return "redirect:" + url` was flagged as generic `taint-unsanitised-flow` even when the redirect destination was the load-bearing taint. Now routed through the synthetic `__spring_redirect__` sink so the finding emerges as `taint-open-redirect`. +- `$smarty->fetch(...)` / `$twig->createTemplate(...)` no longer drop their SSTI gate match on idiomatic PHP receiver shapes.
Receiver text strip in `helpers::root_receiver_text` rebuilds the chain text with `.` separators. +- `setValue(target, req.body, ...)` and similar wrappers no longer gate-match on the rewritten Source `req.body` text. Gate matcher now reads the call's `function` / `method` / `name` field when a Source label override has clobbered the call text. +- JAXP / lxml / fast-xml-parser parsers hardened with `setFeature(FEATURE_SECURE_PROCESSING, true)` / `XMLParser(resolve_entities=False)` / `processEntities: false` no longer fire `taint-xxe`. Suppression runs through the new `xml_parser_config` sidecar. +- `XPath` instances bound to `setXPathVariableResolver(...)` no longer fire `taint-xpath-injection` on subsequent `xpath.evaluate(expr, ...)` sinks. Suppression runs through the new `xpath_config` sidecar. +- Inline `if (!url.startsWith("/")) reject` and `if (new URL(url).host !== ALLOWED) reject` open-redirect sanitisers now clear the `Cap::OPEN_REDIRECT` bit on the validated branch instead of falling through to the generic `Comparison` predicate. Cap-aware: other taint downstream still fires on its own caps. +- Rust `Redirect::to` no longer reports an SSRF flow for what is structurally an open redirect. Retagged to `Cap::OPEN_REDIRECT` so the finding emerges as `taint-open-redirect`. - ~957 gitea backend DAO `go.auth.missing_ownership_check` findings (id-scalar precision pass, see Added). - 169 of 216 openmrs `cfg-unguarded-sink` findings (JpaCriteriaQuery type, see Added). Equivalent reductions on xwiki / keycloak Hibernate DAO clusters. - joomla and drupal `php.deser.unserialize` flagged inside `Serializable::unserialize($input)` magic-method bodies (passthrough recognition, see Added). @@ -74,6 +118,8 @@ A round of cross-file FastAPI auth, two new sink/validator classes, a ~957-FP Go - New `cfg/cfg_tests.rs` covers ternary-branch CFG lowering shapes. - New `summary/tests.rs` covers cross-file `include_router` summary persistence and resolution.
- Refactor passes across `auth_analysis`, `ssa/const_prop`, `ssa/type_facts`, `summary`, and the per-framework auth extractors (cleaner conditional checks, simpler function signatures, deduplicated assertions). No behaviour change. +- `parse_cap` and `CapName::FromStr` accept the new cap names and their aliases (`ldap_injection` / `ldapi`, `xpath_injection` / `xpathi`, `header_injection` / `crlf` / `response_splitting`, `open_redirect` / `redirect`, `ssti` / `template_injection`, `xxe`, `prototype_pollution` / `proto_pollution`, plus the existing `data_exfiltration` alias for `data_exfil`). The `nyx config add-rule --cap` flag and `[analysis.languages.*.rules]` entries take any of these. +- Frontend `RuleListItem` carries the new `is_class` flag so the dashboard's Rules page can group cap-class entries separately. `RuleDetailView` adds the same field. ## [0.6.1] - 2026-05-03 diff --git a/README.md b/README.md index edb423e8..f0c021bc 100644 --- a/README.md +++ b/README.md @@ -186,7 +186,7 @@ Two passes over the filesystem, with an optional SQLite index to skip unchanged 3. **Pass 2**: re-analyze each file with cross-file context under bounded context sensitivity (k=1 inlining for intra-file callees, SCC fixpoint capped at 64 iterations, and summary fallback for callees above the inline body-size cap). A forward dataflow worklist propagates taint through the SSA lattice with guaranteed convergence. Call-graph SCCs iterate to fixed-point (within the cap) so mutually recursive functions get accurate summaries. 4. **Rank, dedupe, emit**: findings are scored by severity × evidence strength × source-kind exploitability, then emitted to console, JSON, or SARIF. -Detector families: taint (cross-file source→sink), CFG structural (auth gaps, unguarded sinks, resource leaks), state model (use-after-close, double-close, must-leak, unauthed-access), AST patterns (tree-sitter structural match). Full detector docs: [Detectors](https://elicpeter.github.io/nyx/detectors.html).
+Detector families: taint (cross-file source→sink; LDAP injection, XPath injection, HTTP header / response splitting, open redirect, server-side template injection, XXE, prototype pollution, data exfiltration, and the auth fold-in each report under a dedicated rule class, while SQLi, XSS, command/code exec, deserialization, SSRF, path traversal, format string, and crypto flows share the catch-all `taint-unsanitised-flow`), CFG structural (auth gaps, unguarded sinks, resource leaks), state model (use-after-close, double-close, must-leak, unauthed-access), AST patterns (tree-sitter structural match). Full detector docs: [Detectors](https://elicpeter.github.io/nyx/detectors.html). --- @@ -211,7 +211,7 @@ kind = "sanitizer" cap = "html_escape" ``` -Or add rules interactively: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`. Caps: `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `data_exfil`, `code_exec`, `crypto`, `unauthorized_id`, `all`. Full schema: [Configuration](https://elicpeter.github.io/nyx/configuration.html). +Or add rules interactively: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`. Caps: `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `data_exfil`, `code_exec`, `crypto`, `unauthorized_id`, `ldap_injection`, `xpath_injection`, `header_injection`, `open_redirect`, `ssti`, `xxe`, `prototype_pollution`, `all`. Full schema: [Configuration](https://elicpeter.github.io/nyx/configuration.html). Run `nyx rules list` to browse the registry from the terminal. --- diff --git a/docs/cli.md b/docs/cli.md index 3635a33f..20177909 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -275,7 +275,7 @@ Add a custom taint rule. Written to `nyx.local`.
| `--lang` | `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby` | | `--matcher` | Function or property name to match | | `--kind` | `source`, `sanitizer`, `sink` | -| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `all` | +| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `data_exfil`, `ldap_injection`, `xpath_injection`, `header_injection`, `open_redirect`, `ssti`, `xxe`, `prototype_pollution`, `all` | ### `nyx config add-terminator` @@ -287,6 +287,41 @@ Add a terminator function (e.g. `process.exit`). Written to `nyx.local`. --- +## `nyx rules` + +Browse the built-in rule registry from the terminal. Same dataset the dashboard's Rules page reads from: cap-class entries (one per `Cap` with a canonical rule id), per-language label rules (sink / source / sanitizer), gated sinks, and any custom rules from your config. + +### `nyx rules list` + +``` +nyx rules list [--lang ] [--kind ] [--class-only|--no-class] [--json] +``` + +| Flag | Values | +|------|--------| +| `--lang` | Language slug (`javascript`, `typescript`, `python`, `java`, `php`, `go`, `ruby`, `rust`, `c`, `cpp`). Cap-class entries (`language = "all"`) still surface alongside any language filter unless `--no-class` is set. | +| `--kind` | `class` (cap-class entry), `source`, `sink`, `sanitizer` | +| `--class-only` | Show only the cap-class registry entries, suppressing per-language label rules and gated sinks. | +| `--no-class` | Suppress cap-class registry entries, show only per-language label rules and gated sinks. Conflicts with `--class-only`. | +| `--json` | Emit JSON instead of the human-readable table. Schema matches the `/api/rules` response. 
| + +Examples: + +```bash +# Browse every cap-class registry entry (legacy and new) +nyx rules list --class-only + +# All Java sinks +nyx rules list --lang java --kind sink + +# JSON output for scripted filtering +nyx rules list --json | jq '.[] | select(.cap == "ldap_injection")' +``` + +The `enabled` column reflects the `analysis.disabled_rules` overlay from your config, so a rule disabled in `nyx.local` shows up as disabled here too. Custom rules added via `nyx config add-rule` appear at the end with `is_custom: true`. + +--- + ## Exit codes See [output.md](output.md#exit-codes). Summary: `0` on success (including findings without `--fail-on`), `1` when `--fail-on` trips, non-zero on scan errors. diff --git a/docs/configuration.md b/docs/configuration.md index 5704c66f..eaf610b9 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -253,9 +253,14 @@ cap = "html_escape" # "env_var" | "html_escape" | "shell_escape" | # "url_encode" | "json_parse" | "file_io" | # "fmt_string" | "sql_query" | "deserialize" | # "ssrf" | "data_exfil" | "code_exec" | "crypto" | - # "unauthorized_id" | "all" + # "unauthorized_id" | "ldap_injection" | + # "xpath_injection" | "header_injection" | + # "open_redirect" | "ssti" | "xxe" | + # "prototype_pollution" | "all" ``` +Aliases accepted by `parse_cap` and `[..rules].cap`: `data_exfiltration` for `data_exfil`, `ldapi` for `ldap_injection`, `xpathi` for `xpath_injection`, `crlf` and `response_splitting` for `header_injection`, `redirect` for `open_redirect`, `template_injection` for `ssti`, `proto_pollution` for `prototype_pollution`.
+ --- ## Example Configurations diff --git a/docs/detectors.md b/docs/detectors.md index 28eab269..8bce55b4 100644 --- a/docs/detectors.md +++ b/docs/detectors.md @@ -13,11 +13,20 @@ The taint family is split into cap-specific rule classes when a sink callee carr | Rule id | Cap | Surface | |---|---|---| -| `taint-unsanitised-flow` | every cap except `data_exfil` and `unauthorized_id` | Default taint flow class | +| `taint-unsanitised-flow` | `sql_query`, `ssrf`, `code_exec`, `file_io`, `fmt_string`, `deserialize`, `crypto` | Catch-all class for the legacy caps that have not migrated to a dedicated rule id yet. | +| `taint-ldap-injection` | `ldap_injection` | Attacker-controlled data concatenated into an LDAP filter or DN without RFC 4515 escaping. Receivers typed as `LdapClient` (JNDI `DirContext`, Spring `LdapTemplate`, ldapjs `Client`, python-ldap `LDAPObject`, ldap3 `Connection`) and chained `.search` / `.searchByEntity` / `.search_s` form the sink set. | +| `taint-xpath-injection` | `xpath_injection` | Attacker-controlled string passed as the XPath expression to `xpath.evaluate` / `xpath.compile` / `document.evaluate` / `DOMXPath::query` / `etree.XPath`. Suppressed when the receiver was bound to an `XPathVariableResolver` (parameterised XPath shape). | +| `taint-header-injection` | `header_injection` | Attacker-controlled bytes landing in an HTTP response header without `\r\n` stripping (response splitting, cache poisoning). Covers `setHeader` / `res.set` / `res.append` / `headers["X-Foo"] = bar` / `Header().Set` / `add_header` / `setcookie` / `http.Header.Set`. | +| `taint-open-redirect` | `open_redirect` | Attacker-controlled URL driving a redirect / `Location` header without an allowlist or relative-URL check. Includes the Spring MVC `return "redirect:" + url` view-name shape via the `__spring_redirect__` synthetic sink. 
Suppressed by `RelativeUrlValidated` (`startsWith("/")` family) and `HostAllowlistValidated` (`new URL(x).host === ALLOWED`, `urlparse(x).netloc == ...`) inline predicates. | +| `taint-template-injection` | `ssti` | Attacker controls the *template source string* fed to a server-side renderer (Jinja2 / Mako / FreeMarker / Twig / Handlebars / EJS / Mustache / ERB / `text/template` / `html/template` / Smarty / Blade `Template(...)` / `compile(...)`), distinct from rendering a trusted template with tainted variables. | +| `taint-xxe` | `xxe` | Attacker-controlled XML reaching a parser that resolves external entities. Covers JAXP `DocumentBuilder.parse` / `SAXParser.parse` / `XMLReader.parse`, lxml `etree.parse`, Nokogiri, fast-xml-parser, xml2js, libxml2 `xmlReadFile` / `xmlReadMemory`. Suppressed when the receiver carries a hardening fact in `xml_parser_config` (`secure_processing`, `disallow_doctype`, `processEntities: false`, `LIBXML_NOENT` not set). | +| `taint-prototype-pollution` | `prototype_pollution` | Attacker-controlled key reaching an object property assignment that can mutate `Object.prototype`. JS/TS by default. Covers `obj[tainted] = v` (synthetic `__index_set__` sink), library-mediated deep-merge / set helpers (`_.merge`, `_.set`, `dotProp.set`, `objectPath.set`, `setValue`), and jQuery's `extend(true, target, src)` deep-merge form via the `LiteralOnly` activation gate. Suppressed by constant-key fold (literal keys other than `__proto__` / `constructor` / `prototype`), reject / allowlist guards on the key, and `Object.create(null)` receivers (flow-sensitive `NullPrototypeObject` type). Python equivalent (`dict.update`) is opt-in via `NYX_PYTHON_PROTO_POLLUTION=1`. | | `taint-data-exfiltration` | `data_exfil` | Sensitive data flowing into the payload of an outbound network request (body / headers / json on `fetch`, body on `XMLHttpRequest.send`). Distinct from SSRF: the destination is fixed but attacker-influenced bytes leave the process.
| | `rs.auth.missing_ownership_check.taint` | `unauthorized_id` | Rust auth subsystem fold-in; see [auth.md](auth.md). | -A single call site can fire several of these at once when it carries multiple gates — `fetch(taintedUrl, {body: tainted})` produces both an SSRF finding (URL flow) and a `taint-data-exfiltration` finding (body flow), each with its own cap mask rather than a conflated union. +A single call site can fire several of these at once when it carries multiple gates. `fetch(taintedUrl, {body: tainted})` produces both an SSRF finding (URL flow) and a `taint-data-exfiltration` finding (body flow), each with its own cap mask rather than a conflated union. + +Each cap-class entry is registered in `CAP_RULE_REGISTRY` (`src/labels/mod.rs`) with its title, severity, OWASP 2021 code, and description. Browse the registry from the CLI with `nyx rules list --class-only`, or `nyx rules list --kind class --json` for machine output. For Rust auth-specific rules (`rs.auth.*`), see [auth.md](auth.md). diff --git a/docs/detectors/taint.md b/docs/detectors/taint.md index d8490eb2..2f8eebe1 100644 --- a/docs/detectors/taint.md +++ b/docs/detectors/taint.md @@ -135,10 +135,17 @@ Sources, sanitizers, and sinks are linked by named capabilities. 
A sanitizer onl | `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation | | `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` | | `ssrf` | | URL-prefix locks | `requests.get`, `fetch` URL arg, outbound HTTP destination | -| `data_exfil` | cookies, headers, env, db rows, file reads (Sensitive-tier sources only) | | `fetch` body / headers / json, `XMLHttpRequest.send` body | | `code_exec` | | | `eval`, `exec`, `Function` | | `crypto` | | | weak-algorithm constructors | | `unauthorized_id` | request-bound scoped IDs (Rust auth analysis) | ownership check | row-level write | +| `ldap_injection` | | `ldap-escape` filter / dn helpers, project-local `escapeLdapFilter` | `DirContext.search`, `LdapClient.search`, `ldap_search`, `Net::LDAP#search`, `ldap_search_ext_s` | +| `xpath_injection` | | bound `XPathVariableResolver`, `escapeXpath` / `xpathEscape` helpers | `XPath.evaluate`, `DOMXPath::query`, `document.evaluate`, `xpath.select`, `etree.XPath` | +| `header_injection` | | `stripCRLF` / `escapeHeader` / `sanitizeHeader` | `setHeader`, `res.set`, `res.append`, `headers["X-Foo"] = bar`, `Header().Set`, `header()`, `setcookie` | +| `open_redirect` | | leading-slash check (`startsWith("/")`), URL-parse + host allowlist (`new URL(x).host === ALLOWED`) | `Redirect::to`, Spring `redirect:` view name, `flask.redirect`, `http.Redirect`, `redirect_to` | +| `ssti` | | | template constructors fed by tainted source: `Jinja2 Template(...)`, `freemarker.Template`, `Twig::createTemplate`, Handlebars `compile`, `ERB.new`, Mako `Template(...)` | +| `xxe` | | hardened parser config (`secure_processing`, `disallow-doctype-decl`, `processEntities: false`, `LIBXML_NOENT` not set) | `DocumentBuilder.parse`, `SAXParser.parse`, `xml2js`, `fast-xml-parser`, `lxml.etree.parse`, `xmlReadFile` | +| `prototype_pollution` | | constant-key fold, reject / allowlist guards on the key, `Object.create(null)` receivers | `obj[tainted] = v` synthetic 
`__index_set__`, `_.merge`, `_.set`, `dotProp.set`, `objectPath.set`, jQuery `extend(true, ...)` | +| `data_exfil` | cookies, headers, env, db rows, file reads (Sensitive-tier sources only) | | `fetch` body / headers / json, `XMLHttpRequest.send` body | | `all` | Sources typically use `all` so they match any sink | | | Sources typically use `cap = "all"` so they match every sink. Sinks declare the specific cap they need. Sanitizers only clear the cap they name. diff --git a/docs/rules.md b/docs/rules.md index 22ca8f2b..c35c0dde 100644 --- a/docs/rules.md +++ b/docs/rules.md @@ -24,13 +24,22 @@ Language prefixes: `rs`, `c`, `cpp`, `go`, `java`, `js`, `ts`, `py`, `php`, `rb` ### Taint -One rule covers every source-to-sink flow. The parenthetical identifies the source location. +The taint family is split into cap-specific rule classes. The `taint-unsanitised-flow` id is the catch-all for the legacy caps that have not yet migrated to a dedicated rule id (`sql_query`, `ssrf`, `code_exec`, `file_io`, `fmt_string`, `deserialize`, `crypto`). The seven new vulnerability classes, plus auth (`unauthorized_id`) and data exfiltration (`data_exfil`), are reported under their own rule ids. Every taint rule id except `rs.auth.missing_ownership_check.taint` carries a `(source L:C)` parenthetical that identifies the source location.
-| Rule ID | Severity | -|---|---| -| `taint-unsanitised-flow (source L:C)` | Varies by source kind and sink capability | +| Rule ID | Cap | Severity | +|---|---|---| +| `taint-unsanitised-flow (source L:C)` | `sql_query` / `ssrf` / `code_exec` / `file_io` / `fmt_string` / `deserialize` / `crypto` | Varies | +| `taint-ldap-injection` | `ldap_injection` | High | +| `taint-xpath-injection` | `xpath_injection` | High | +| `taint-header-injection` | `header_injection` | High | +| `taint-open-redirect` | `open_redirect` | Medium | +| `taint-template-injection` | `ssti` | High | +| `taint-xxe` | `xxe` | High | +| `taint-prototype-pollution` | `prototype_pollution` | High | +| `taint-data-exfiltration` | `data_exfil` | High / Medium | +| `rs.auth.missing_ownership_check.taint` | `unauthorized_id` | High | -The matcher sets (sources, sanitizers, sinks, gated sinks) live per-language in `src/labels/.rs`. [Language maturity](language-maturity.md) gives per-language counts and what's covered. +Each cap-class entry is registered in `CAP_RULE_REGISTRY` (`src/labels/mod.rs`). Browse the registry from the CLI with `nyx rules list --class-only`, or via the dashboard's Rules page. The matcher sets (sources, sanitizers, sinks, gated sinks) live per-language in `src/labels/.rs`. [Language maturity](language-maturity.md) gives per-language counts and what's covered. 
### CFG structural @@ -257,6 +266,8 @@ The tables below are generated from `src/patterns/.rs` by [`tools/docgen`] `nyx config add-rule --cap ` and `[analysis.languages.*.rules]` in config accept: -`env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `all` +`env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `data_exfil`, `ldap_injection`, `xpath_injection`, `header_injection`, `open_redirect`, `ssti`, `xxe`, `prototype_pollution`, `all` -Source for both the enum and the `to_cap` mapping: [`src/labels/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/labels/mod.rs) (`Cap`) and [`src/utils/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/utils/config.rs) (`CapName`). +Aliases: `data_exfiltration` for `data_exfil`, `ldapi` for `ldap_injection`, `xpathi` for `xpath_injection`, `crlf` and `response_splitting` for `header_injection`, `redirect` for `open_redirect`, `template_injection` for `ssti`, `proto_pollution` for `prototype_pollution`. + +Source for both the enum and the `to_cap` mapping: [`src/labels/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/labels/mod.rs) (`Cap` and `CAP_RULE_REGISTRY`) and [`src/utils/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/utils/config.rs) (`CapName`). 
diff --git a/frontend/src/api/types.ts b/frontend/src/api/types.ts index 3bf83a9d..94732376 100644 --- a/frontend/src/api/types.ts +++ b/frontend/src/api/types.ts @@ -355,6 +355,7 @@ export interface RuleListItem { enabled: boolean; is_custom: boolean; is_gated: boolean; + is_class: boolean; case_sensitive: boolean; finding_count: number; suppression_rate: number; diff --git a/src/ast.rs b/src/ast.rs index d6574e3f..26b4130f 100644 --- a/src/ast.rs +++ b/src/ast.rs @@ -377,7 +377,7 @@ fn build_taint_diag( // Resolved sink capability bits, used by deduplication to distinguish // sinks with different cap types on the same source line (e.g. // `sink_sql(x); sink_shell(x);`). - let sink_caps_bits: u16 = cfg_graph[finding.sink] + let sink_caps_bits: u32 = cfg_graph[finding.sink] .taint .labels .iter() @@ -385,7 +385,7 @@ fn build_taint_diag( crate::labels::DataLabel::Sink(c) => Some(c.bits()), _ => None, }) - .fold(0u16, |acc, b| acc | b); + .fold(0u32, |acc, b| acc | b); // Cap-specific rule-id routing. // @@ -508,6 +508,14 @@ fn build_taint_diag( || (finding.source_kind.sensitivity() >= crate::labels::Sensitivity::Sensitive && (flow_has_body_bind || source_is_credential_bearing))); + // Cap-specific rule routing. Auth-as-taint and data-exfil keep their + // pre-existing branches so the routing rules they encode (auth-finding + // namespace alignment; body-bind / source-sensitivity gate) stay + // exactly as before. New cap classes (LDAP / XPath / Header / Open + // redirect / SSTI / XXE / Prototype pollution) route through + // `cap_rule_meta()` so the canonical rule ids in the registry are the + // single source of truth. Legacy generic taint findings continue to + // emit `taint-unsanitised-flow`. 
let diag_id = if effective_caps.contains(crate::labels::Cap::UNAUTHORIZED_ID) { "rs.auth.missing_ownership_check.taint".to_string() } else if is_data_exfil_rule { @@ -516,6 +524,25 @@ fn build_taint_diag( source_point.row + 1, source_point.column + 1 ) + } else if let Some(meta) = [ + crate::labels::Cap::LDAP_INJECTION, + crate::labels::Cap::XPATH_INJECTION, + crate::labels::Cap::HEADER_INJECTION, + crate::labels::Cap::OPEN_REDIRECT, + crate::labels::Cap::SSTI, + crate::labels::Cap::XXE, + crate::labels::Cap::PROTOTYPE_POLLUTION, + ] + .iter() + .find(|c| effective_caps.contains(**c)) + .and_then(|c| crate::labels::cap_rule_meta(*c)) + { + format!( + "{} (source {}:{})", + meta.rule_id, + source_point.row + 1, + source_point.column + 1 + ) } else { format!( "taint-unsanitised-flow (source {}:{})", @@ -576,6 +603,23 @@ fn build_taint_diag( } _ => crate::patterns::Severity::Medium, } + } else if let Some(meta) = [ + crate::labels::Cap::LDAP_INJECTION, + crate::labels::Cap::XPATH_INJECTION, + crate::labels::Cap::HEADER_INJECTION, + crate::labels::Cap::OPEN_REDIRECT, + crate::labels::Cap::SSTI, + crate::labels::Cap::XXE, + crate::labels::Cap::PROTOTYPE_POLLUTION, + ] + .iter() + .find(|c| effective_caps.contains(**c)) + .and_then(|c| crate::labels::cap_rule_meta(*c)) + { + // New cap classes draw severity from the rule registry so a single + // edit to `CAP_RULE_REGISTRY` cascades through SARIF, the dashboard, + // and the integration suite without per-language source-kind nudges. + meta.severity } else { severity_for_source_kind(finding.source_kind) }; diff --git a/src/auth_analysis/mod.rs b/src/auth_analysis/mod.rs index ea56aeb0..15fce859 100644 --- a/src/auth_analysis/mod.rs +++ b/src/auth_analysis/mod.rs @@ -206,8 +206,8 @@ pub fn run_auth_analysis_with_model( // (when provided) for cross-file helpers that live in other files. 
apply_helper_lifting(&mut model, lang, file_path, scan_root, global_summaries); - // Phase 1 caller-scope IPA: propagate route-handler-level auth - // checks DOWN to callee helper units within the same file. See + // Caller-scope IPA: propagate route-handler-level auth checks DOWN + // to callee helper units within the same file. See // [`apply_caller_scope_propagation`] for the propagation rule. apply_caller_scope_propagation(&mut model); @@ -547,8 +547,8 @@ fn apply_helper_lifting( } } -/// Phase 1 caller-scope IPA: propagate route-handler-level auth checks -/// DOWN to callee helper units within the same file. +/// Caller-scope IPA: propagate route-handler-level auth checks DOWN to +/// callee helper units within the same file. /// /// `apply_helper_lifting` walks UPWARD: a helper that internally /// proves ownership / membership / etc. has its summary lifted onto diff --git a/src/cfg/cfg_tests.rs b/src/cfg/cfg_tests.rs index 92819c84..d3e0e753 100644 --- a/src/cfg/cfg_tests.rs +++ b/src/cfg/cfg_tests.rs @@ -1190,6 +1190,7 @@ fn clone_preserves_all_sub_structs() { destination_uses: None, gate_filters: Vec::new(), is_constructor: false, + produces_null_proto: false, }, taint: TaintMeta { labels: { @@ -1841,9 +1842,12 @@ def outer(cmd): assert_eq!(kwargs[1].0, "check"); } -/// Languages without keyword-argument grammar should leave `kwargs` empty. +/// JS object-literal positional args lift their `pair` children into +/// `kwargs` so consumers like xml_config's `processEntities` / +/// `resolve_entities` opt-in detector can read them without re-walking +/// the tree-sitter AST. 
#[test] -fn call_node_kwargs_empty_for_javascript() { +fn call_node_kwargs_lifts_javascript_object_literal_pairs() { let src = br" function outer(cmd) { child_process.exec(cmd, { shell: true }); @@ -1861,9 +1865,12 @@ fn call_node_kwargs_empty_for_javascript() { .is_some_and(|c| c.ends_with("exec")) }) .expect("child_process.exec call node should exist"); + let kwargs = &call_node.call.kwargs; assert!( - call_node.call.kwargs.is_empty(), - "JS object-literal arg is not a keyword_argument — kwargs should stay empty" + kwargs + .iter() + .any(|(k, vs)| k == "shell" && vs.iter().any(|v| v == "true")), + "JS object-literal `{{ shell: true }}` should surface as kwarg, got {kwargs:?}" ); } diff --git a/src/cfg/dto.rs b/src/cfg/dto.rs index 016c28a6..8cc27aed 100644 --- a/src/cfg/dto.rs +++ b/src/cfg/dto.rs @@ -7,7 +7,7 @@ //! Strictly additive: classes whose fields cannot be classified produce //! a `DtoFields` with an empty `fields` map, the caller must decide //! whether to use that as a "Dto with no inferred fields" or fall back -//! to the pre-Phase-6 Object/Unknown classification. +//! to the generic Object/Unknown classification. use std::collections::{HashMap, HashSet}; diff --git a/src/cfg/helpers.rs b/src/cfg/helpers.rs index 8a1194e5..c6407794 100644 --- a/src/cfg/helpers.rs +++ b/src/cfg/helpers.rs @@ -35,6 +35,16 @@ pub(crate) fn root_receiver_text(n: Node, lang: &str, code: &[u8]) -> Option text_of(n, code), } } + // PHP `variable_name` text carries a leading `$` (`$smarty`, `$twig`). + // Strip it so chain text built downstream (`{recv}.{method}`) presents + // a `.`-only delimiter sequence — required by the suffix-matcher + // boundary rule, which only accepts `.`/`:` as chain separators. + // Without this strip, gate matchers like `Smarty.fetch` / + // `Environment.createTemplate` never fire on idiomatic + // `$smarty->fetch(...)` / `$twig->createTemplate(...)` shapes. 
+ _ if lang == "php" && n.kind() == "variable_name" => { + text_of(n, code).map(|s| s.trim_start_matches('$').to_string()) + } _ => text_of(n, code), } } diff --git a/src/cfg/literals.rs b/src/cfg/literals.rs index 79034f1d..a7062910 100644 --- a/src/cfg/literals.rs +++ b/src/cfg/literals.rs @@ -195,6 +195,56 @@ pub(super) fn extract_destination_kwarg_pairs( /// Extract the string-literal content at argument position `index` (0-based). /// Returns `None` if the argument is not a string literal or the index is out of range. +/// True when `call_node` is `Object.create(null)`; the sole argument may +/// be wrapped in parentheses, `await`, or a TS type assertion. Strict +/// literal-`null` match, no aliasing through variables. Caller restricts to JS/TS. +pub(super) fn is_object_create_null_call(call_node: Node, code: &[u8]) -> bool { + if !matches!(call_node.kind(), "call_expression") { + return false; + } + let callee = call_node + .child_by_field_name("function") + .and_then(|f| text_of(f, code)) + .unwrap_or_default(); + if callee != "Object.create" { + return false; + } + let Some(args) = call_node.child_by_field_name("arguments") else { + return false; + }; + let mut cursor = args.walk(); + let named: Vec<Node> = args.named_children(&mut cursor).collect(); + if named.len() != 1 { + return false; + } + let mut arg = named[0]; + // Unwrap parens / await / TS type-assertions.
+ for _ in 0..4 { + match arg.kind() { + "parenthesized_expression" => { + if let Some(inner) = arg.named_child(0) { + arg = inner; + continue; + } + } + "await_expression" => { + if let Some(inner) = arg.child_by_field_name("argument") { + arg = inner; + continue; + } + } + "as_expression" | "type_assertion" => { + if let Some(inner) = arg.named_child(0) { + arg = inner; + continue; + } + } + _ => break, + } + } + arg.kind() == "null" || text_of(arg, code).as_deref() == Some("null") +} + pub(super) fn extract_const_string_arg( call_node: Node, index: usize, @@ -222,6 +272,37 @@ pub(super) fn extract_const_string_arg( None } } + // Boolean literals — JS/TS `true`/`false` are their own node kinds; some + // grammars wrap them as identifiers carrying the keyword text. Returned + // verbatim so `dangerous_values` matching can detect deep-flag forms + // like `extend(true, target, src)`. + "true" | "false" => Some(arg.kind().to_string()), + // PHP double-quoted strings parse as `encapsed_string` whose body is + // a sequence of `string_content` / `escape_sequence` / interpolation + // nodes. Treat the string as constant only when every child is a + // pure-literal segment (no `variable_name` / `subscript_expression` + // interpolations); the returned value is the concatenation of the + // literal segments verbatim. + "encapsed_string" => { + let mut c = arg.walk(); + let mut buf = String::new(); + for ch in arg.named_children(&mut c) { + match ch.kind() { + "string_content" => { + if let Some(s) = text_of(ch, code) { + buf.push_str(&s); + } + } + "escape_sequence" => { + if let Some(s) = text_of(ch, code) { + buf.push_str(&s); + } + } + _ => return None, + } + } + Some(buf) + } "template_string" => { // Only treat as constant if no interpolation (no template_substitution children) let mut c = arg.walk(); @@ -238,6 +319,44 @@ pub(super) fn extract_const_string_arg( None } } + // Concat-style binary expression with a leading string literal, e.g. + // PHP `"Location: " . 
$url`, JS/TS `"Location: " + url`. Returns the + // left-most literal so prefix-driven gates (`dangerous_prefixes`) can + // activate on partially-dynamic concatenations; falls through to + // `None` when the leading segment is not a string literal so + // exact-`dangerous_values` matching keeps its strict semantics. + "binary_expression" => { + let left = arg.child_by_field_name("left")?; + match left.kind() { + "string" + | "string_literal" + | "interpreted_string_literal" + | "raw_string_literal" => { + let raw = text_of(left, code)?; + if raw.len() >= 2 { + Some(raw[1..raw.len() - 1].to_string()) + } else { + None + } + } + "encapsed_string" => { + let mut c = left.walk(); + let mut buf = String::new(); + for ch in left.named_children(&mut c) { + match ch.kind() { + "string_content" | "escape_sequence" => { + if let Some(s) = text_of(ch, code) { + buf.push_str(&s); + } + } + _ => return None, + } + } + Some(buf) + } + _ => None, + } + } _ => None, } } @@ -271,6 +390,27 @@ pub(super) fn extract_const_macro_arg( "identifier" | "name" | "qualified_name" | "scoped_identifier" => { text_of(arg, code).map(|s| s.to_string()) } + // Ruby bare constant (`NOENT`) — leaf form. + "constant" => text_of(arg, code).map(|s| s.to_string()), + // Ruby scope-qualified constant (`Nokogiri::XML::ParseOptions::NOENT`). + // Return only the rightmost `name` segment so the gate's + // `dangerous_values` list can stay identifier-bare instead of + // enumerating every possible namespacing. Falls back to the full + // text if the `name` field is missing for any reason. + "scope_resolution" => arg + .child_by_field_name("name") + .and_then(|n| text_of(n, code)) + .map(|s| s.to_string()) + .or_else(|| text_of(arg, code).map(|s| s.to_string())), + // Integer literals at the activation arg position. PHP / C / C++ + // commonly use plain `0` to opt into the safe-default option set + // (e.g. `simplexml_load_string($xml, "SimpleXMLElement", 0)`). 
The + // gate's `dangerous_values` list is identifier-only, so returning + // the literal text lets the comparison fail against `LIBXML_NOENT` + // and suppresses the conservative-fire branch. + "integer" | "integer_literal" | "number_literal" | "decimal_integer_literal" => { + text_of(arg, code).map(|s| s.to_string()) + } _ => None, } } @@ -728,35 +868,72 @@ pub(super) fn find_chained_inner_call<'a>( return Some((function, inner_text)); } // The function/method field for a chained call is a member_expression - // (JS/TS) or attribute (Python) etc.; its `object` field is the - // receiver expression. Only proceed when that receiver is itself a - // call. - let object = function.child_by_field_name("object")?; + // (JS/TS), attribute (Python), or field_expression (Rust); its + // receiver is the `object` field (JS/TS/Python) or `value` field + // (Rust). Only proceed when that receiver is itself a call. + let object = function + .child_by_field_name("object") + .or_else(|| function.child_by_field_name("value"))?; if !matches!(lookup(lang, object.kind()), Kind::CallFn | Kind::CallMethod) { return None; } - // Recurse: the inner call may itself be chained - // (`axios.get(u).then(h).catch(h)`, innermost is `axios.get`). - if let Some(inner) = find_chained_inner_call(object, lang, code) { - return Some(inner); - } - // `object` is the innermost call_expression in the chain. Extract - // its callee identifier the same way `first_call_ident_with_span` - // does for a CallFn (member_expression text → "http.get"). - let inner_func = object + // Decide whether `object` is itself a chained method call (its + // function/method field is a member-style expression). When yes, + // recurse one more level so deeper chains resolve to their innermost + // method (e.g. `axios.get(u).then(h).catch(h)` → `axios.get`). 
+ // When no — the receiver is a plain function/constructor call like + // Rust's `HttpResponse::Found()` — descending one more level would + // strand us on the non-method leaf whose text would not match any + // gate matcher. Stop here and return the current `outer` level, + // which IS the innermost method call. + let object_function = object .child_by_field_name("function") - .or_else(|| object.child_by_field_name("method")) - .or_else(|| object.child_by_field_name("name"))?; - // Multi-line dotted member expressions (`http\n .get`) include - // formatting whitespace in the source-text slice. The labels map - // keys are literal `"http.get"` etc., strip whitespace so the - // chained-call inner-gate rebinding fires for both single-line and - // multi-line chain styles. Also strips `\r` for CRLF sources. - // Motivated by upstream Parse Server CVE-2025-64430 which uses the - // multi-line `http\n .get(uri, ...)\n .on(...)` form. - let raw = text_of(inner_func, code)?; + .or_else(|| object.child_by_field_name("method")); + let object_is_chained_method = object_function + .map(|f| { + matches!( + f.kind(), + "member_expression" + | "attribute" + | "field_expression" + | "scoped_identifier" + | "scope_resolution" + ) && f + .child_by_field_name("object") + .or_else(|| f.child_by_field_name("value")) + .is_some() + }) + .unwrap_or(false); + if object_is_chained_method { + // Recurse: the inner call may itself be chained. + if let Some(inner) = find_chained_inner_call(object, lang, code) { + return Some(inner); + } + // `object` is the innermost call_expression in the chain. Extract + // its callee identifier the same way `first_call_ident_with_span` + // does for a CallFn (member_expression text → "http.get"). 
+ let inner_func = object + .child_by_field_name("function") + .or_else(|| object.child_by_field_name("method")) + .or_else(|| object.child_by_field_name("name"))?; + // Multi-line dotted member expressions (`http\n .get`) include + // formatting whitespace in the source-text slice. The labels map + // keys are literal `"http.get"` etc., strip whitespace so the + // chained-call inner-gate rebinding fires for both single-line and + // multi-line chain styles. Also strips `\r` for CRLF sources. + // Motivated by upstream Parse Server CVE-2025-64430 which uses the + // multi-line `http\n .get(uri, ...)\n .on(...)` form. + let raw = text_of(inner_func, code)?; + let inner_text: String = raw.chars().filter(|c| !c.is_whitespace()).collect(); + return Some((object, inner_text)); + } + // Receiver is a non-chained call (Rust constructor `Foo::new()` / + // `HttpResponse::Found()`, JS bare `f()`). Outer level IS the + // innermost method call — return its own function text so gate + // matching sees the method name. 
+ let raw = text_of(function, code)?; let inner_text: String = raw.chars().filter(|c| !c.is_whitespace()).collect(); - Some((object, inner_text)) + Some((outer, inner_text)) } /// Recursively walk the receiver chain of `outer` (a CallFn / CallMethod @@ -1389,6 +1566,47 @@ pub(super) fn extract_kwargs(call_node: Node, code: &[u8]) -> Vec<(String, Vec Vec<(String, Vec Vec text_of(target, code).map(|s| s.to_string()), _ => None, }; result.push(literal); diff --git a/src/cfg/mod.rs b/src/cfg/mod.rs index 34b79fa2..b0f0e0d0 100644 --- a/src/cfg/mod.rs +++ b/src/cfg/mod.rs @@ -70,8 +70,8 @@ use literals::{ extract_destination_field_pairs, extract_destination_kwarg_pairs, extract_kwargs, extract_literal_rhs, extract_object_arg_property, extract_shell_array_payload_idents, find_call_node, find_call_node_deep, find_chained_inner_call, has_keyword_arg, - has_object_arg_property, has_only_literal_args, is_parameterized_query_call, - java_chain_arg0_kind_for_method, js_chain_arg0_kind_for_method, + has_object_arg_property, has_only_literal_args, is_object_create_null_call, + is_parameterized_query_call, java_chain_arg0_kind_for_method, js_chain_arg0_kind_for_method, js_chain_outer_method_for_inner, ruby_chain_arg0_for_method, walk_chain_inner_call_args, }; use params::{ @@ -359,6 +359,14 @@ pub struct CallMeta { /// must not survive into the constructed object. #[serde(default)] pub is_constructor: bool, + /// True when this call is `Object.create(null)` (or alias). The returned + /// value has no prototype chain. Consumed by TypeFacts to tag the + /// SsaValue with [`crate::ssa::type_facts::TypeKind::NullPrototypeObject`] + /// so PROTOTYPE_POLLUTION suppression can fire flow-sensitively at the + /// synthetic `__index_set__` sink. Set during CFG node construction so + /// SSA does not need to re-walk the AST. 
+ #[serde(default)] + pub produces_null_proto: bool, } /// One gate's contribution at a call site whose callee matches multiple @@ -601,8 +609,7 @@ pub struct BodyMeta { /// decorators / annotations / static type text at CFG construction /// time. Same length as `params`; positions with no recoverable /// type info are `None`. Strictly additive, when every entry is - /// `None`, downstream behaviour is identical to the pre-Phase-1 - /// engine. + /// `None`, downstream behaviour is identical to the type-unaware path. pub param_types: Vec>, /// Per-parameter destructured-binding sibling names. Same length /// as `params`; entry `i` lists field names bound by the same @@ -1811,6 +1818,31 @@ pub(super) fn push_node<'a>( labels.push(l); } } + // Subscript-set form: `response.headers["X-Foo"] = bar` + // (Ruby `element_reference`, JS/TS `subscript_expression`, + // Python `subscript`). The LHS has no `property` field, so + // walk into the subscript's `object` and try classifying its + // member-expression text (e.g. `response.headers`). This + // lets header-injection sinks fire on the bare bracket form + // alongside the `set_header` / `headers_mut.insert` method + // shapes already covered above. + if labels.is_empty() + && matches!( + lhs.kind(), + "subscript_expression" | "subscript" | "element_reference" + ) + { + let obj = lhs + .child_by_field_name("object") + .or_else(|| lhs.child_by_field_name("value")) + .or_else(|| lhs.child(0)); + if let Some(obj_node) = obj + && let Some(obj_text) = member_expr_text(obj_node, code) + && let Some(l) = classify(lang, &obj_text, extra) + { + labels.push(l); + } + } } } @@ -1933,18 +1965,45 @@ pub(super) fn push_node<'a>( { let gate_call = call_ast.or_else(|| find_call_node_deep(ast, lang, 4)); if let Some(cn) = gate_call { - let gate_callee_text = if call_ast.is_some() { + // Derive the gate's callee text from the call's + // `function`/`method`/`name` field, falling back to `text`. 
+ // + // The default is `text`, which by this point reflects the + // qualified callee for method calls (`Velocity.evaluate`, + // `$smarty->fetch`) reconstructed in the `Kind::CallMethod` + // arm. When `first_member_label` rewrites `text` to a member + // Source like `req.body` (because the wrapper carries one as + // an argument), the rewrite is correct for source attribution + // but defeats gate matching against a bare callee + // (`setValue(target, req.body, …)` would gate-match + // `req.body` instead of `setValue`). + // + // Detect that case structurally: a Source label is present AND + // the call's function-field text differs from `text`. The + // function field carries the actual callee identifier; when it + // disagrees with `text`, `text` was clobbered by a member-source + // override and the function field is the right gate target. + // Whitespace is stripped to mirror `find_chained_inner_call` + // so multi-line chains (`http\n .get(...)`) still match flat + // gate matchers like `http.get`. + let function_field_text: Option<String> = cn + .child_by_field_name("function") + .or_else(|| cn.child_by_field_name("method")) + .or_else(|| cn.child_by_field_name("name")) + .and_then(|f| text_of(f, code)) + .map(|t| t.chars().filter(|c| !c.is_whitespace()).collect::<String>()); + let has_source_label = labels + .iter() + .any(|l| matches!(l, crate::labels::DataLabel::Source(_))); + let gate_callee_text = if let Some(ff) = function_field_text.as_deref() + && has_source_label + && ff != text.as_str() + { + ff.to_string() + } else if call_ast.is_some() { text.clone() } else { - // Inner call reached via wrapper, use the call-expression's - // function name directly. Falls back to `text` so non-call- - // expression kinds (method calls, Ruby `call` nodes, macros) - // still have a usable callee string.
- cn.child_by_field_name("function") - .or_else(|| cn.child_by_field_name("method")) - .or_else(|| cn.child_by_field_name("name")) - .and_then(|f| text_of(f, code)) - .unwrap_or_else(|| text.clone()) + function_field_text.unwrap_or_else(|| text.clone()) }; let matches = classify_gated_sink( lang, @@ -1953,12 +2012,15 @@ pub(super) fn push_node<'a>( extract_const_string_arg(cn, idx, code).or_else(|| { // C/C++ preprocessor macros and PHP `define`d constants // surface as identifier nodes, not string literals. - // Falling back to the macro-arg extractor for those - // languages lets gates like `curl_easy_setopt` / - // `curl_setopt` activate on a `CURLOPT_POSTFIELDS` - // ident match instead of firing conservatively on - // every positional arg. - if matches!(lang, "c" | "cpp" | "c++" | "php") { + // Ruby option constants (e.g. + // `Nokogiri::XML::ParseOptions::NOENT`) surface as + // `scope_resolution` / `constant` nodes. Falling back + // to the macro-arg extractor for those languages lets + // gates like `curl_easy_setopt` / `curl_setopt` / + // `Nokogiri::XML` activate on a bare-leaf identifier + // match instead of firing conservatively on every + // positional arg. + if matches!(lang, "c" | "cpp" | "c++" | "php" | "ruby" | "rb") { extract_const_macro_arg(cn, idx, code) } else { None @@ -2656,6 +2718,13 @@ pub(super) fn push_node<'a>( || call_ast .is_some_and(|cn| matches!(cn.kind(), "new_expression" | "object_creation_expression")); + // Detect `Object.create(null)` so TypeFacts can tag the returned + // SsaValue with `NullPrototypeObject` for flow-sensitive + // prototype-pollution suppression. Restricted to JS/TS where + // `Object.create` is the idiomatic null-prototype constructor. 
+ let produces_null_proto = matches!(lang, "javascript" | "typescript") + && call_ast.is_some_and(|cn| is_object_create_null_call(cn, code)); + let idx = g.add_node(NodeInfo { kind, call: CallMeta { @@ -2672,6 +2741,7 @@ pub(super) fn push_node<'a>( destination_uses, gate_filters, is_constructor, + produces_null_proto, }, taint: TaintMeta { labels, @@ -2860,6 +2930,31 @@ fn try_lower_subscript_write( *call_ordinal += 1; let mut uses_all: Vec<String> = vec![arr_text.clone(), idx_text.clone()]; uses_all.extend(rhs_uses.iter().cloned()); + + // Prototype pollution sink classification on the synthetic + // `__index_set__` node for JS/TS. Tainted *key* in `obj[key] = val` + // is the pollution channel (a `__proto__` / `constructor` literal flowing + // through `key` mutates `Object.prototype` globally), so the gate's + // payload arg list is `[0]` (the key only; the value at index 1 is + // benign on its own). Sanitizer recognition is structural (no taint + // engine plumbing) and runs before label attachment, so suppressed + // shapes never enter the SSA sink scan: + // * constant string key whose literal value is not in the dangerous + // set (`__proto__` / `constructor` / `prototype`), + // * the assignment is dominated by an `if` whose condition rejects + // dangerous keys with an early `return` / `throw` / `break`, or + // that allowlists the key against safe constants on its true arm. + // `Object.create(null)` receivers are suppressed flow-sensitively in + // the SSA taint engine (`TypeKind::NullPrototypeObject`), not here.
+ let mut pp_labels: smallvec::SmallVec<[DataLabel; 2]> = smallvec::SmallVec::new(); + let mut pp_payload_args: Option> = None; + if matches!(lang, "javascript" | "typescript" | "js" | "ts") + && !pp_should_suppress_index_set(assign_ast, subscript_node, &arr_text, &idx_text, code) + { + pp_labels.push(DataLabel::Sink(Cap::PROTOTYPE_POLLUTION)); + pp_payload_args = Some(vec![0]); + } + let n = g.add_node(NodeInfo { kind: StmtKind::Call, call: CallMeta { @@ -2867,9 +2962,11 @@ fn try_lower_subscript_write( receiver: Some(arr_text.clone()), arg_uses: vec![vec![idx_text.clone()], rhs_uses.clone()], call_ordinal: ord, + sink_payload_args: pp_payload_args, ..Default::default() }, taint: TaintMeta { + labels: pp_labels, uses: uses_all, ..Default::default() }, @@ -2883,6 +2980,477 @@ fn try_lower_subscript_write( Some(n) } +/// Spring MVC controller-return open-redirect recogniser. Detects the +/// shape `return "redirect:" + tainted` (Java string concatenation) and +/// emits a synthetic `__spring_redirect__` Call sink with +/// `Sink(OPEN_REDIRECT)` so the existing taint pipeline propagates the +/// concatenated suffix through the OPEN_REDIRECT cap. The synthetic +/// node sequences between `preds` and the eventual Return node. +/// +/// Returns `Some(synthetic_idx)` when matched, otherwise `None`. +/// Java only — Spring's `redirect:` view-name convention has no +/// counterpart in the other supported languages, and matching the +/// literal across non-Spring code would over-fire. +fn try_lower_spring_redirect_return( + ast: Node, + preds: &[NodeIndex], + g: &mut Cfg, + lang: &str, + code: &[u8], + enclosing_func: Option<&str>, + call_ordinal: &mut u32, +) -> Option<NodeIndex> { + if lang != "java" { + return None; + } + // `return EXPR ;` — find the returned expression. tree-sitter-java + // wraps the value in a `return_statement` whose first named child + // is the expression. + let expr = ast.named_child(0)?; + // Strip parentheses.
+ let mut cur = expr; + while cur.kind() == "parenthesized_expression" { + cur = cur.named_child(0)?; + } + if cur.kind() != "binary_expression" { + return None; + } + let op = cur.child_by_field_name("operator")?; + let op_text = text_of(op, code)?; + if op_text != "+" { + return None; + } + // Walk leftmost descent through left-associated `+` chains so that + // `"redirect:" + a + b` still matches (the AST nests as + // `(("redirect:" + a) + b)`). + let mut leftmost = cur; + loop { + let left = leftmost.child_by_field_name("left")?; + let mut left_inner = left; + while left_inner.kind() == "parenthesized_expression" { + left_inner = left_inner.named_child(0)?; + } + if left_inner.kind() == "binary_expression" { + let op_l = left_inner.child_by_field_name("operator")?; + if text_of(op_l, code).as_deref() == Some("+") { + leftmost = left_inner; + continue; + } + } + // `left_inner` is the leftmost atom — must be a string literal + // whose constant value starts with `redirect:`. + if !matches!(left_inner.kind(), "string_literal" | "string") { + return None; + } + let lit = text_of(left_inner, code)?; + if lit.len() < 2 { + return None; + } + let inner = &lit[1..lit.len() - 1]; + if !inner.starts_with("redirect:") { + return None; + } + break; + } + + // Collect identifiers referenced anywhere in the original concat + // expression — the tainted URL piece is one of them. Receiver-style + // method calls (`view.toString()`) are intentionally captured via + // the bare identifier; precision improvements are deferred to the + // SSA / abstract-string layer. 
+ let mut concat_uses: Vec = Vec::new(); + collect_idents(cur, code, &mut concat_uses); + if concat_uses.is_empty() { + return None; + } + + let span = (ast.start_byte(), ast.end_byte()); + let ord = *call_ordinal; + *call_ordinal += 1; + + let mut labels: smallvec::SmallVec<[DataLabel; 2]> = smallvec::SmallVec::new(); + labels.push(DataLabel::Sink(Cap::OPEN_REDIRECT)); + + let n = g.add_node(NodeInfo { + kind: StmtKind::Call, + call: CallMeta { + callee: Some("__spring_redirect__".to_string()), + arg_uses: vec![concat_uses.clone()], + call_ordinal: ord, + sink_payload_args: Some(vec![0]), + ..Default::default() + }, + taint: TaintMeta { + labels, + uses: concat_uses, + ..Default::default() + }, + ast: AstMeta { + span, + enclosing_func: enclosing_func.map(|s| s.to_string()), + }, + ..Default::default() + }); + connect_all(g, preds, n, EdgeKind::Seq); + Some(n) +} + +/// Prototype-pollution suppression decisions for the synthetic +/// `__index_set__` node emitted by `try_lower_subscript_write`. +/// +/// Returns `true` when the assignment is provably safe and the +/// `Cap::PROTOTYPE_POLLUTION` sink label should be elided. The three +/// CFG-layer recognised shapes are flow-insensitive AST patterns: +/// +/// 1. Constant string key whose value is not one of the dangerous +/// keys (`__proto__`, `constructor`, `prototype`). A literal-keyed +/// write cannot pollute even if the value is tainted. +/// 2. Reject pattern `if (idx === "__proto__" || idx === "constructor" +/// || idx === "prototype") ` enclosing the +/// assignment. The dangerous-key path terminates before reaching +/// the synthesised store. +/// 3. Allowlist pattern `if (idx === "name" || idx === "id") { obj[idx] +/// = v }`. The assignment only executes when `idx` is one of a +/// small set of known-safe constants. 
+/// +/// The null-prototype receiver suppression (`Object.create(null)`) is +/// handled flow-sensitively in the SSA taint engine via +/// `TypeKind::NullPrototypeObject`, since AST scans cannot honour +/// branch-local re-bindings or phi joins. +/// +/// Conservative: any unrecognised shape returns `false` so the sink +/// label is attached and the SSA layer decides on taint reachability. +fn pp_should_suppress_index_set( + assign_ast: Node, + subscript_node: Node, + _arr_text: &str, + idx_text: &str, + code: &[u8], +) -> bool { + // 1. Constant-key fold. + if let Some(idx_node) = subscript_node + .child_by_field_name("index") + .or_else(|| subscript_node.child_by_field_name("subscript")) + .or_else(|| { + let mut cur = subscript_node.walk(); + subscript_node.named_children(&mut cur).nth(1) + }) + { + if let Some(literal) = pp_string_literal_value(idx_node, code) { + return !pp_is_dangerous_proto_key(&literal); + } + } + + // 2 + 3. Dominator-style guard ancestors (reject + allowlist). + if pp_is_guarded_by_proto_check(assign_ast, idx_text, code) { + return true; + } + + false +} + +/// Dangerous prototype-pollution key strings. Matches the literal +/// values that JS engines treat as references into the prototype chain. +fn pp_is_dangerous_proto_key(s: &str) -> bool { + matches!(s, "__proto__" | "constructor" | "prototype") +} + +/// Extract the value of a JS/TS string literal node, stripping the +/// outer quote bytes (single, double, or backtick). Returns `None` +/// for non-literal nodes, template literals containing interpolation, +/// or anything that doesn't resemble a single-segment string. 
+fn pp_string_literal_value(n: Node, code: &[u8]) -> Option { + let kind = n.kind(); + if !matches!(kind, "string" | "string_literal" | "template_string") { + return None; + } + let raw = std::str::from_utf8(&code[n.start_byte()..n.end_byte()]).ok()?; + if raw.len() < 2 { + return None; + } + let bytes = raw.as_bytes(); + let first = bytes[0]; + let last = bytes[bytes.len() - 1]; + if !matches!(first, b'"' | b'\'' | b'`') || first != last { + return None; + } + let inner = &raw[1..raw.len() - 1]; + // Reject template literals carrying `${...}` interpolation — we + // can't fold those to a single concrete value. + if first == b'`' && inner.contains("${") { + return None; + } + Some(inner.to_string()) +} + +/// Walk up from the assignment node looking for two structural guard +/// shapes: +/// +/// * **Reject pattern** — a *previous sibling* `if_statement` in any +/// enclosing block whose condition is `idx === DANGEROUS [|| …]` and +/// whose consequence terminates control flow (`return` / `throw` / +/// `break` / `continue`). The dangerous-key path never reaches the +/// subsequent assignment. +/// * **Allowlist pattern** — an *ancestor* `if_statement` whose +/// condition is `idx === SAFE [|| …]` and through whose consequence +/// the descendant flows. Only the safe-key arm reaches the +/// assignment. +/// +/// Both shapes must compare against the same key variable as the +/// synthetic `__index_set__` node. Stops at the enclosing function so +/// guards in an outer scope around a closure passed elsewhere don't +/// accidentally suppress inner assignments. 
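The quote-stripping rule in `pp_string_literal_value`, combined with the constant-key fold in `pp_should_suppress_index_set`, can be sketched over raw source text. This is an illustrative reimplementation, not the project's code. It assumes the literal's text has already been extracted from the node, and the function names are hypothetical.

```rust
/// Strip matching single, double, or backtick quotes. Returns `None`
/// for unquoted text, mismatched quotes, or template literals with
/// `${...}` interpolation (no single constant value).
fn string_literal_value(raw: &str) -> Option<String> {
    let bytes = raw.as_bytes();
    if bytes.len() < 2 {
        return None;
    }
    let (first, last) = (bytes[0], bytes[bytes.len() - 1]);
    if !matches!(first, b'"' | b'\'' | b'`') || first != last {
        return None;
    }
    // Both ends are ASCII quote bytes, so this slice is on char boundaries.
    let inner = &raw[1..raw.len() - 1];
    if first == b'`' && inner.contains("${") {
        return None;
    }
    Some(inner.to_string())
}

/// Keys that JS engines resolve into the prototype chain.
fn is_dangerous_proto_key(s: &str) -> bool {
    matches!(s, "__proto__" | "constructor" | "prototype")
}
```

Under the constant-key fold, `obj["name"] = v` would have its sink label elided because `"name"` is not dangerous. `obj["__proto__"] = v` and any non-literal key keep the label.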
+fn pp_is_guarded_by_proto_check(from: Node, idx_text: &str, code: &[u8]) -> bool { + let mut cur = from; + while let Some(parent) = cur.parent() { + match parent.kind() { + "function_declaration" + | "function" + | "function_expression" + | "arrow_function" + | "method_definition" + | "generator_function_declaration" + | "program" + | "source_file" => return false, + "if_statement" => { + if let Some(cond) = parent.child_by_field_name("condition") { + let consequence = parent.child_by_field_name("consequence"); + if let Some(verdict) = + pp_classify_proto_guard(cond, consequence, cur, idx_text, code) + { + return verdict; + } + } + } + _ => {} + } + + // Reject pattern: scan previous siblings in the parent block + // for `if (idx === DANGEROUS [|| …]) { return; }` shapes that + // dominate the assignment via early-return. + let mut sibling_cursor = parent.walk(); + for sibling in parent.named_children(&mut sibling_cursor) { + if sibling.start_byte() >= cur.start_byte() { + break; + } + if sibling.kind() != "if_statement" { + continue; + } + if pp_is_reject_pattern(sibling, idx_text, code) { + return true; + } + } + + cur = parent; + } + false +} + +/// True when `if_node` is `if (idx === DANGEROUS [|| idx === DANGEROUS] +/// …) { return; / throw …; / break; }` shaped — every disjunct +/// compares the named key variable to a dangerous prototype key, and +/// the consequence terminates control flow. 
+fn pp_is_reject_pattern(if_node: Node, idx_text: &str, code: &[u8]) -> bool { + let Some(cond) = if_node.child_by_field_name("condition") else { + return false; + }; + let consequence = if_node.child_by_field_name("consequence"); + let clauses = pp_split_or_clauses(cond); + if clauses.is_empty() { + return false; + } + for clause in &clauses { + let Some((var, lit)) = pp_extract_eq_compare(*clause, code) else { + return false; + }; + if var != idx_text || !pp_is_dangerous_proto_key(&lit) { + return false; + } + } + consequence.map(pp_block_terminates).unwrap_or(false) +} + +/// Decide whether an enclosing `if` clause around an `__index_set__` +/// statement constitutes a prototype-pollution guard. +/// +/// `cond` is the if's condition expression, `consequence` is the +/// optional consequence block, and `descendant` is the direct child of +/// the if-statement on the path down to the assignment (used to +/// distinguish "assignment lives inside the consequence" from +/// "assignment lives in the `else` arm"; assignments that follow the +/// if are handled by the caller's sibling scan). `idx_text` is the +/// textual key variable used by the synthetic `__index_set__`. +/// +/// Returns `Some(true)` to suppress, or `None` when the if-statement +/// is not a recognised guard so the walker continues outward. The +/// caller would also stop and keep the sink label on `Some(false)`, +/// but no shape recognised here currently returns it.
+fn pp_classify_proto_guard( + cond: Node, + consequence: Option, + descendant: Node, + idx_text: &str, + code: &[u8], +) -> Option { + let cond_clauses = pp_split_or_clauses(cond); + if cond_clauses.is_empty() { + return None; + } + + let mut all_against_idx = true; + let mut all_dangerous = true; + let mut all_safe = true; + for clause in &cond_clauses { + let (var, lit) = pp_extract_eq_compare(*clause, code)?; + if var != idx_text { + all_against_idx = false; + break; + } + let dangerous = pp_is_dangerous_proto_key(&lit); + if dangerous { + all_safe = false; + } else { + all_dangerous = false; + } + } + if !all_against_idx { + return None; + } + + let consequence_contains_descendant = consequence + .map(|c| pp_subtree_contains(c, descendant)) + .unwrap_or(false); + + // Allowlist pattern: every clause is `idx === SAFE` and the + // assignment lives inside the consequence (true arm). + if all_safe && consequence_contains_descendant { + return Some(true); + } + + // Reject pattern: every clause is `idx === DANGEROUS` and the + // consequence terminates control flow before reaching the + // assignment. Only suppress when the assignment is *outside* the + // consequence (i.e., in the `else` arm; assignments that follow + // the if are handled by the caller's sibling scan). + if all_dangerous + && !consequence_contains_descendant + && consequence.map(pp_block_terminates).unwrap_or(false) + { + return Some(true); + } + + None +} + +/// True when `descendant` is identical to or transitively a child of +/// `root`. Containment is checked by byte-range inclusion, so a node +/// whose range equals `root`'s counts as contained. +fn pp_subtree_contains(root: Node, descendant: Node) -> bool { + let dr = (descendant.start_byte(), descendant.end_byte()); + let rr = (root.start_byte(), root.end_byte()); + dr.0 >= rr.0 && dr.1 <= rr.1 +} + +/// True when `block` (typically an `if` consequence) terminates +/// control flow on every path: the last meaningful statement is a +/// return / throw / break / continue.
Conservative — falls back to +/// `false` for empty blocks or anything non-trivial. +fn pp_block_terminates(block: Node) -> bool { + // Bare statement consequence (no braces): the if's consequence is + // the terminator itself. + if pp_is_terminator(block) { + return true; + } + if !matches!(block.kind(), "statement_block" | "block") { + return false; + } + let mut cursor = block.walk(); + let last_stmt = block.named_children(&mut cursor).last(); + match last_stmt { + Some(s) => pp_is_terminator(s), + None => false, + } +} + +/// True when `n` is a control-flow-ending statement: return / throw / +/// break / continue. +fn pp_is_terminator(n: Node) -> bool { + matches!( + n.kind(), + "return_statement" | "throw_statement" | "break_statement" | "continue_statement" + ) +} + +/// Split an expression by top-level `||` operators. Returns the +/// individual disjunct sub-expressions. Single (non-OR) expressions +/// yield a one-element vector. Walks `binary_expression` nodes whose +/// `operator` field is `||` and recurses into both sides. 
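Once each `||` clause has been parsed into a `(variable, literal)` pair, the decision made by `pp_classify_proto_guard` reduces to a small truth table. The hypothetical `classify_guard` below assumes that parsing has already succeeded. It also takes two booleans that the real code derives from the tree: whether the assignment sits inside the consequence, and whether the consequence ends in `return` / `throw` / `break` / `continue`.

```rust
fn is_dangerous(s: &str) -> bool {
    matches!(s, "__proto__" | "constructor" | "prototype")
}

/// Simplified model of the guard classifier. `Some(true)` means
/// "suppress the prototype-pollution sink"; `None` means "not a
/// recognised guard, keep walking outward".
fn classify_guard(
    clauses: &[(&str, &str)],
    idx: &str,
    assignment_in_consequence: bool,
    consequence_terminates: bool,
) -> Option<bool> {
    // Every clause must compare the same key variable.
    if clauses.is_empty() || clauses.iter().any(|(var, _)| *var != idx) {
        return None;
    }
    let all_dangerous = clauses.iter().all(|(_, lit)| is_dangerous(lit));
    let all_safe = clauses.iter().all(|(_, lit)| !is_dangerous(lit));
    // Allowlist: `if (k === "name" || k === "id") { obj[k] = v }`.
    if all_safe && assignment_in_consequence {
        return Some(true);
    }
    // Reject: `if (k === "__proto__") return; else { obj[k] = v }`.
    if all_dangerous && !assignment_in_consequence && consequence_terminates {
        return Some(true);
    }
    None
}
```

Mixed clauses (one safe key, one dangerous key) fall through to `None`, which keeps the sink label attached and leaves the decision to the SSA layer.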
+fn pp_split_or_clauses<'a>(expr: Node<'a>) -> Vec> { + let mut out = Vec::new(); + pp_collect_or_clauses(expr, &mut out); + out +} + +fn pp_collect_or_clauses<'a>(expr: Node<'a>, out: &mut Vec>) { + let stripped = pp_unwrap_paren(expr); + if matches!(stripped.kind(), "binary_expression") { + let op = stripped + .child_by_field_name("operator") + .map(|o| o.kind()) + .unwrap_or(""); + if op == "||" { + if let Some(l) = stripped.child_by_field_name("left") { + pp_collect_or_clauses(l, out); + } + if let Some(r) = stripped.child_by_field_name("right") { + pp_collect_or_clauses(r, out); + } + return; + } + } + out.push(stripped); +} + +fn pp_unwrap_paren(n: Node) -> Node { + let mut cur = n; + while matches!(cur.kind(), "parenthesized_expression") { + match cur.named_child(0) { + Some(inner) => cur = inner, + None => break, + } + } + cur +} + +/// Extract `(var_text, literal_value)` from an equality comparison +/// `var === "literal"` / `var == "literal"` (and reversed forms). +/// Returns `None` for any other shape. 
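The `||` splitting performed by `pp_split_or_clauses` is a recursive flatten that looks through parentheses. Here it is sketched over a hypothetical `Cond` enum that stands in for tree-sitter `binary_expression` and `parenthesized_expression` nodes:

```rust
/// Hypothetical condition tree: `Eq(var, literal)` stands for `var === "literal"`.
#[derive(Debug, PartialEq)]
enum Cond {
    Eq(&'static str, &'static str),
    Or(Box<Cond>, Box<Cond>),
    Paren(Box<Cond>),
}

/// Flatten top-level `||` into its disjuncts, looking through parentheses.
fn split_or(expr: &Cond) -> Vec<&Cond> {
    let mut out = Vec::new();
    collect_or(expr, &mut out);
    out
}

fn collect_or<'a>(expr: &'a Cond, out: &mut Vec<&'a Cond>) {
    let mut e = expr;
    while let Cond::Paren(inner) = e {
        e = &**inner;
    }
    match e {
        Cond::Or(left, right) => {
            collect_or(left, out);
            collect_or(right, out);
        }
        // Any non-`||` node (after unwrapping) is a single clause.
        other => out.push(other),
    }
}
```

Because both sides are recursed, `(a || b) || c` and `a || (b || c)` produce the same clause list in source order.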
+fn pp_extract_eq_compare(expr: Node, code: &[u8]) -> Option<(String, String)> { + let stripped = pp_unwrap_paren(expr); + if !matches!(stripped.kind(), "binary_expression") { + return None; + } + let op = stripped + .child_by_field_name("operator") + .map(|o| o.kind()) + .unwrap_or(""); + if !matches!(op, "===" | "==") { + return None; + } + let left = stripped.child_by_field_name("left")?; + let right = stripped.child_by_field_name("right")?; + let left = pp_unwrap_paren(left); + let right = pp_unwrap_paren(right); + if let (Some(lv), Some(rs)) = (text_of(left, code), pp_string_literal_value(right, code)) { + if matches!(left.kind(), "identifier" | "shorthand_property_identifier") { + return Some((lv, rs)); + } + } + if let (Some(rv), Some(ls)) = (text_of(right, code), pp_string_literal_value(left, code)) { + if matches!(right.kind(), "identifier" | "shorthand_property_identifier") { + return Some((rv, ls)); + } + } + None +} + /// Step 1 (`pre_emit_arg_source_nodes`): scan the AST, create Source nodes, /// wire them to `preds`, and return (effective_preds, synth_bindings, /// uses_only_synth_names). @@ -3682,6 +4250,21 @@ pub(super) fn build_sub<'a>( Vec::new() } else { + // Spring MVC `return "redirect:" + url` open-redirect + // synthetic-sink emission. When matched the synthetic + // call sequences between `preds` and the Return node. 
+ let mut effective_preds: Vec = preds.to_vec(); + if let Some(synth) = try_lower_spring_redirect_return( + ast, + &effective_preds, + g, + lang, + code, + enclosing_func, + call_ordinal, + ) { + effective_preds = vec![synth]; + } let ret = push_node( g, StmtKind::Return, @@ -3692,7 +4275,7 @@ pub(super) fn build_sub<'a>( 0, analysis_rules, ); - connect_all(g, preds, ret, EdgeKind::Seq); + connect_all(g, &effective_preds, ret, EdgeKind::Seq); Vec::new() // terminates this path } } diff --git a/src/cfg/params.rs b/src/cfg/params.rs index 957a52f3..b07785c3 100644 --- a/src/cfg/params.rs +++ b/src/cfg/params.rs @@ -13,7 +13,7 @@ use tree_sitter::Node; /// of `build_cfg`. Returns the [`TypeKind::Dto`] carrying the /// per-field type map when the class is declared in the same file; /// returns `None` otherwise so callers can fall through to the -/// pre-Phase-6 behaviour (Object / Unknown). +/// generic Object / Unknown classification. fn lookup_dto_class(class_name: &str) -> Option { DTO_CLASSES.with(|cell| cell.borrow().get(class_name).cloned().map(TypeKind::Dto)) } @@ -27,7 +27,7 @@ fn lookup_dto_class(class_name: &str) -> Option { /// for the JS/TS object-pattern formal `({ a, b, c })`, the entry is /// `("a", None, ["b", "c"])`. Strictly additive: when the param is /// not a destructured pattern (or the language has no destructure -/// concept), behaviour is identical to the pre-Phase-5 names-only path. +/// concept), behaviour is identical to the names-only path. /// /// Closes the residual gap behind CVE-2026-25544 (PayloadCMS Drizzle /// SQL injection): a per-parameter taint probe that seeds only the diff --git a/src/cli.rs b/src/cli.rs index 64f98b10..460fa65f 100644 --- a/src/cli.rs +++ b/src/cli.rs @@ -49,6 +49,7 @@ impl Commands { match self { Commands::Scan { explain_engine, .. } => *explain_engine, Commands::List { .. } => true, + Commands::Rules { .. } => true, Commands::Config { action } => { matches!(action, ConfigAction::Show { .. 
} | ConfigAction::Path) } @@ -459,6 +460,12 @@ pub enum Commands { action: ConfigAction, }, + /// Browse the built-in rule registry (cap classes + per-language label rules) + Rules { + #[command(subcommand)] + action: RulesAction, + }, + /// Start the local web UI for browsing scan results Serve { /// Path to scan root (defaults to current directory) @@ -525,6 +532,36 @@ pub enum ConfigAction { }, } +#[derive(Subcommand)] +pub enum RulesAction { + /// List built-in rules + List { + /// Filter by language slug (e.g. javascript, java, python). Cap-class + /// entries (`language = "all"`) are always shown unless `--no-class` + /// is set. + #[arg(long)] + lang: Option, + + /// Filter by rule kind (`class`, `source`, `sink`, `sanitizer`). + #[arg(long)] + kind: Option, + + /// Show only the cap-class registry entries (one per vulnerability + /// class), suppressing per-language label rules. + #[arg(long, conflicts_with = "no_class")] + class_only: bool, + + /// Suppress cap-class registry entries (show only per-language label + /// rules and gated sinks). + #[arg(long)] + no_class: bool, + + /// Emit JSON instead of the human-readable table. + #[arg(long)] + json: bool, + }, +} + #[derive(Subcommand)] pub enum IndexAction { /// Build or update index for current project diff --git a/src/commands/mod.rs b/src/commands/mod.rs index 307deee0..5bdce09e 100644 --- a/src/commands/mod.rs +++ b/src/commands/mod.rs @@ -10,6 +10,7 @@ pub mod clean; pub mod config; pub mod index; pub mod list; +pub mod rules; pub mod scan; #[cfg(feature = "serve")] pub mod serve; @@ -352,6 +353,9 @@ pub fn handle_command( } } } + Commands::Rules { action } => { + self::rules::handle(action, config)?; + } Commands::Serve { path, port, diff --git a/src/commands/rules.rs b/src/commands/rules.rs new file mode 100644 index 00000000..f9d79a0d --- /dev/null +++ b/src/commands/rules.rs @@ -0,0 +1,248 @@ +//! `nyx rules` subcommand. +//! +//! 
Surfaces the rule registry from the terminal so users can enumerate +//! the same content that the dashboard's `/api/rules` endpoint and the +//! browser's Rules page show. The output composes built-in cap-class +//! entries (one per `Cap` with a canonical rule id), per-language label +//! rules (sink/source/sanitizer), gated sinks, and any custom rules +//! defined in the user's config. + +use crate::cli::RulesAction; +use crate::errors::NyxResult; +use crate::labels::{self, RuleInfo}; +use crate::utils::config::{Config, RuleKind}; +use console::style; + +pub fn handle(action: RulesAction, config: &Config) -> NyxResult<()> { + match action { + RulesAction::List { + lang, + kind, + class_only, + no_class, + json: as_json, + } => list( + config, + lang.as_deref(), + kind.as_deref(), + class_only, + no_class, + as_json, + ), + } +} + +fn list( + config: &Config, + lang_filter: Option<&str>, + kind_filter: Option<&str>, + class_only: bool, + no_class: bool, + as_json: bool, +) -> NyxResult<()> { + let mut rules = labels::enumerate_builtin_rules(); + + // Apply disabled-rules overlay so the CLI matches the dashboard view. + for rule in &mut rules { + if config.analysis.disabled_rules.contains(&rule.id) { + rule.enabled = false; + } + } + + // Append custom rules from config. Mirrors the projection in + // `src/server/routes/rules.rs::build_rule_list`. 
+ for (cfg_lang, lang_cfg) in &config.analysis.languages { + let canonical = labels::canonical_lang(cfg_lang); + for cr in &lang_cfg.rules { + let kind_str = match cr.kind { + RuleKind::Source => "source", + RuleKind::Sanitizer => "sanitizer", + RuleKind::Sink => "sink", + }; + let id = labels::custom_rule_id(canonical, kind_str, &cr.matchers); + let first = cr.matchers.first().map(|s| s.as_str()).unwrap_or("?"); + let title = format!("{} (custom {})", first, kind_str); + let cap = cr.cap.to_cap(); + let enabled = !config.analysis.disabled_rules.contains(&id); + rules.push(RuleInfo { + id, + title, + language: canonical.to_string(), + kind: kind_str.to_string(), + cap: labels::cap_to_name(cap).to_string(), + cap_bits: cap.bits(), + matchers: cr.matchers.clone(), + case_sensitive: cr.case_sensitive, + is_custom: true, + is_gated: false, + is_class: false, + emission_active: true, + enabled, + }); + } + } + + // Filter. + let lang_filter_canonical = lang_filter.map(labels::canonical_lang); + rules.retain(|r| { + if class_only && !r.is_class { + return false; + } + if no_class && r.is_class { + return false; + } + if let Some(want) = lang_filter_canonical { + // Cap-class entries (`language == "all"`) are language-agnostic; + // surface them alongside any language filter unless explicitly + // suppressed via `--no-class`. + if r.language != want && r.language != "all" { + return false; + } + } + if let Some(want) = kind_filter + && !r.kind.eq_ignore_ascii_case(want) + { + return false; + } + true + }); + + if as_json { + let body = serde_json::to_string_pretty(&rules) + .map_err(|e| crate::errors::NyxError::Msg(format!("rules JSON serialise: {e}")))?; + println!("{body}"); + return Ok(()); + } + + if rules.is_empty() { + println!("{}", style("(no rules match the supplied filters)").dim()); + return Ok(()); + } + + // Header. 
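The `retain` predicate in `list` can be read as a pure function of each rule and the CLI flags. The sketch below mirrors it with a hypothetical, trimmed-down `Rule` struct. It assumes the language filter is already canonicalised, whereas the real code calls `labels::canonical_lang` first. It illustrates that cap-class entries with `language = "all"` survive any `--lang` filter unless `--no-class` is set.

```rust
/// Hypothetical subset of `RuleInfo` fields used by the filter.
#[derive(Debug)]
struct Rule {
    language: &'static str,
    kind: &'static str,
    is_class: bool,
}

fn keep(r: &Rule, lang: Option<&str>, kind: Option<&str>, class_only: bool, no_class: bool) -> bool {
    if class_only && !r.is_class {
        return false;
    }
    if no_class && r.is_class {
        return false;
    }
    // Language-agnostic class entries pass any language filter.
    if let Some(want) = lang {
        if r.language != want && r.language != "all" {
            return false;
        }
    }
    // `--kind` is matched case-insensitively.
    if let Some(want) = kind {
        if !r.kind.eq_ignore_ascii_case(want) {
            return false;
        }
    }
    true
}
```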
+ println!( + "{}", + style("Rules (built-in registry, per-language labels, and custom rules from config)") + .bold() + ); + println!(); + + // Cap-class section first, distinct from per-language entries. + let class_rules: Vec<&RuleInfo> = rules.iter().filter(|r| r.is_class).collect(); + if !class_rules.is_empty() { + println!(" {}", style("Vulnerability classes").cyan().bold()); + for r in &class_rules { + print_class_row(r); + } + println!(); + } + + let builtin_label_rules: Vec<&RuleInfo> = rules + .iter() + .filter(|r| !r.is_class && !r.is_custom) + .collect(); + if !builtin_label_rules.is_empty() { + println!(" {}", style("Built-in label rules").cyan().bold()); + for r in &builtin_label_rules { + print_label_row(r); + } + println!(); + } + + let custom_rules: Vec<&RuleInfo> = rules.iter().filter(|r| r.is_custom).collect(); + if !custom_rules.is_empty() { + println!(" {}", style("Custom rules (from config)").cyan().bold()); + for r in &custom_rules { + print_label_row(r); + } + println!(); + } + + println!( + "{}", + style(format!( + "{} class · {} built-in label · {} custom · {} total", + class_rules.len(), + builtin_label_rules.len(), + custom_rules.len(), + rules.len() + )) + .dim() + ); + + Ok(()) +} + +fn print_class_row(r: &RuleInfo) { + let status = if r.enabled { + style("on ").green().to_string() + } else { + style("off").red().dim().to_string() + }; + // Forward-declared classes (registered but not yet wired through + // `ast.rs::diag_for_finding`) carry a tag so users don't expect + // findings under the class id; live findings still surface under + // the legacy `taint-unsanitised-flow` rule id. 
+ let tag = if r.emission_active { + String::new() + } else { + format!(" {}", style("(forward-declared)").yellow()) + }; + println!( + " {} {:<32} {} {}{}", + status, + style(&r.id).white().bold(), + style(format!("[{}]", r.cap)).dim(), + style(&r.title).dim(), + tag, + ); +} + +fn print_label_row(r: &RuleInfo) { + let status = if r.enabled { + style("on ").green().to_string() + } else { + style("off").red().dim().to_string() + }; + let tag = if r.is_custom { + style(" custom").yellow().to_string() + } else if r.is_gated { + style(" gated").magenta().to_string() + } else { + String::new() + }; + let matchers = if r.matchers.is_empty() { + String::new() + } else { + let joined = r.matchers.join(", "); + format!(" — {joined}") + }; + println!( + " {} {:<10} {:<10} {:<14}{}{}", + status, + style(&r.language).cyan(), + style(&r.kind).white(), + style(&r.cap).dim(), + tag, + style(matchers).dim(), + ); +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::utils::config::Config; + + #[test] + fn list_runs_without_panic_default_config() { + let cfg = Config::default(); + // Plain list, no filters. + list(&cfg, None, None, false, false, false).unwrap(); + // Class-only. + list(&cfg, None, None, true, false, false).unwrap(); + // JSON output. + list(&cfg, None, None, false, false, true).unwrap(); + // Lang + kind filters. + list(&cfg, Some("javascript"), Some("sink"), false, true, false).unwrap(); + } +} diff --git a/src/commands/scan.rs b/src/commands/scan.rs index 0ccffcb1..fdce5a6f 100644 --- a/src/commands/scan.rs +++ b/src/commands/scan.rs @@ -544,14 +544,14 @@ pub(crate) fn deduplicate_taint_flows(diags: &mut Vec) { id.starts_with(TAINT_BASE) } - fn sink_cap_bits(d: &Diag) -> u16 { + fn sink_cap_bits(d: &Diag) -> u32 { d.evidence.as_ref().map(|e| e.sink_caps).unwrap_or(0) } // Group candidates by (path, line, severity, sink_cap_bits). Only // `taint-unsanitised-flow` rule IDs participate; findings with other // bases (e.g. 
`js.code_exec.eval`) are left untouched per guardrails. - let mut groups: HashMap<(String, usize, Severity, u16), Vec> = HashMap::new(); + let mut groups: HashMap<(String, usize, Severity, u32), Vec> = HashMap::new(); for (i, d) in diags.iter().enumerate() { if is_taint_flow(&d.id) { groups @@ -690,8 +690,8 @@ pub const SCC_UNCONVERGED_CROSS_FILE_NOTE_PREFIX: &str = "scc_unconverged:cross- /// file set. Semantics match [`diff_cap_snapshots`], a key that /// appears or disappears counts as changed. fn changed_cap_keys_of( - before: &HashMap)>, - after: &HashMap)>, + before: &HashMap)>, + after: &HashMap)>, ) -> HashSet { let mut changed = HashSet::new(); for (k, v_after) in after { @@ -971,10 +971,10 @@ fn run_topo_batches( // with a 64-iter budget; the classifier only needs the tail. let mut delta_trajectory: smallvec::SmallVec<[u32; 4]> = smallvec::SmallVec::new(); - // Phase-B worklist: files to re-analyse in this iteration. + // SCC fixpoint worklist: files to re-analyse in this iteration. // Initialised to the full batch so iteration 0 behaves like - // the pre-Phase-B implementation; subsequent iterations - // prune to files containing a caller of a changed summary. + // the unconditional re-analysis; subsequent iterations prune + // to files containing a caller of a changed summary. // // Storing `PathBuf` clones (matching how the rest of the // SCC loop identifies files) so membership tests are cheap diff --git a/src/constraint/domain.rs b/src/constraint/domain.rs index f17758df..06bc82c3 100644 --- a/src/constraint/domain.rs +++ b/src/constraint/domain.rs @@ -113,22 +113,22 @@ impl ConstValue { // ── TypeSet ───────────────────────────────────────────────────────────── -/// Bitset over [`TypeKind`] variants (12 bits used of u16). +/// Bitset over [`TypeKind`] variants (19 bits used of u32). 
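Widening `TypeSet` to `u32` follows from the new indices in `type_kind_index`. `Template` maps to bit 18, which a `u16` cannot hold, and `TOP` becomes the low 19 bits. Below is a minimal sketch that uses plain `u32` indices instead of `TypeKind` values. Its `as_singleton` helper is hypothetical and returns the index rather than a `TypeKind`.

```rust
/// Number of `TypeKind` slots after this change (indices 0..=18).
const TYPE_BITS: u32 = 19;
/// All type bits set: no type constraint.
const TOP: u32 = (1u32 << TYPE_BITS) - 1;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TypeSet(u32);

impl TypeSet {
    fn singleton(idx: u32) -> Self {
        TypeSet(1u32 << idx)
    }
    fn contains(self, idx: u32) -> bool {
        self.0 & (1u32 << idx) != 0
    }
    /// Meet (intersection): refine type knowledge.
    fn meet(self, other: Self) -> Self {
        TypeSet(self.0 & other.0)
    }
    /// Index of the single type in the set, if exactly one bit is set.
    fn as_singleton(self) -> Option<u32> {
        if self.0.is_power_of_two() {
            Some(self.0.trailing_zeros())
        } else {
            None
        }
    }
}
```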
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)] -pub struct TypeSet(u16); +pub struct TypeSet(u32); impl TypeSet { - /// All 12 type bits set, no type constraint (Top). - pub const TOP: Self = Self(0x0FFF); + /// All 19 type bits set, no type constraint (Top). + pub const TOP: Self = Self(0x0007_FFFF); /// No type bits, unsatisfiable (Bottom). pub const BOTTOM: Self = Self(0); pub fn singleton(kind: &TypeKind) -> Self { - Self(1u16 << type_kind_index(kind)) + Self(1u32 << type_kind_index(kind)) } pub fn contains(&self, kind: &TypeKind) -> bool { - self.0 & (1u16 << type_kind_index(kind)) != 0 + self.0 & (1u32 << type_kind_index(kind)) != 0 } /// Meet (intersection): refine type knowledge. @@ -156,7 +156,7 @@ impl TypeSet { /// Check if this set contains exactly one type matching the given kind. pub fn is_singleton_of(&self, kind: &TypeKind) -> bool { - self.0 != 0 && self.0 == (1u16 << type_kind_index(kind)) + self.0 != 0 && self.0 == (1u32 << type_kind_index(kind)) } /// Return the TypeKind if this is a singleton set (exactly one type). @@ -186,12 +186,21 @@ fn type_kind_index(kind: &TypeKind) -> u32 { TypeKind::LocalCollection => 12, TypeKind::RequestBuilder => 13, TypeKind::JpaCriteriaQuery => 14, + TypeKind::LdapClient => 15, + TypeKind::XPathClient => 16, + TypeKind::XmlParser => 17, + TypeKind::Template => 18, // the analysis DTO types carry per-field structural info that the // bitset domain can't represent. Collapse to Unknown so callers // still see "any type possible" rather than crashing on an // unhandled variant. Same-file/cross-file Dto-aware paths read // the structured TypeKind directly, not via this index. TypeKind::Dto(_) => 6, + // NullPrototypeObject is a JS-only sub-kind of Object used for + // flow-sensitive prototype-pollution suppression. The bitset + // domain has no dedicated slot, so it shares the Object index and + // singleton recovery still maps to a meaningful TypeKind.
+ TypeKind::NullPrototypeObject => 3, } } @@ -212,6 +221,10 @@ fn type_kind_from_index(idx: u32) -> Option { 12 => Some(TypeKind::LocalCollection), 13 => Some(TypeKind::RequestBuilder), 14 => Some(TypeKind::JpaCriteriaQuery), + 15 => Some(TypeKind::LdapClient), + 16 => Some(TypeKind::XPathClient), + 17 => Some(TypeKind::XmlParser), + 18 => Some(TypeKind::Template), _ => None, } } @@ -801,7 +814,7 @@ pub struct PathEnv { /// Per-key meet count for widening decisions. meet_counts: SmallVec<[(SsaValue, u8); 8]>, /// Refinement counter (bounded per block). - refine_count: u16, + refine_count: u32, } impl PathEnv { @@ -837,7 +850,7 @@ impl PathEnv { if self.unsat { return; } - if self.refine_count >= MAX_REFINE_PER_BLOCK as u16 { + if self.refine_count >= MAX_REFINE_PER_BLOCK as u32 { return; // bounded } let canonical = self.uf.find_immutable(v); @@ -860,7 +873,7 @@ impl PathEnv { // but `refine_single` is also invoked directly from `assume_eq`, // `assume_neq`, and a few internal sites. Large generated inputs // (thousands of short statements on one line) can drive millions - // of calls and overflow a plain u16 `refine_count`. Saturate to + // of calls, enough to overflow a u16 counter. Saturate to // stay within bounds, the refinement pipeline is already // idempotent past the cap, so saturation is semantically a no-op. self.refine_count = self.refine_count.saturating_add(1); diff --git a/src/constraint/solver.rs b/src/constraint/solver.rs index 5cbcf6e8..1d9d3b67 100644 --- a/src/constraint/solver.rs +++ b/src/constraint/solver.rs @@ -250,6 +250,31 @@ pub fn class_name_to_type_kind(name: &str) -> Option { // Java I/O supertypes (enables hierarchy fallback for subtypes) | "InputStream" | "OutputStream" | "Reader" | "Writer" | "PrintWriter" | "BufferedInputStream" | "BufferedOutputStream" => Some(TypeKind::FileHandle), + // JNDI / Spring LDAP directory-service types.
Field- and method-typed + // declarations (`DirContext ctx = ...`, `LdapTemplate ldapTemplate;`) + // attach this fact to the receiver SSA value so type-qualified + // resolution rewrites `ctx.search(...)` → `LdapClient.search`. + "DirContext" | "LdapContext" | "InitialDirContext" | "InitialLdapContext" + | "LdapTemplate" => Some(TypeKind::LdapClient), + // JAXP and JDOM (`SAXBuilder`) XML parser instances. Field/local declarations like + // `DocumentBuilder builder = factory.newDocumentBuilder();` route + // through this map so the receiver SSA value carries + // `TypeKind::XmlParser` and the type-qualified + // `XmlParser.parse` rule fires on `builder.parse(...)`. + "DocumentBuilder" | "SAXParser" | "XMLReader" | "SAXBuilder" => { + Some(TypeKind::XmlParser) + } + // JAXP XPath instances. `XPath xpath = factory.newXPath();` + // routes through this map so the receiver carries + // `TypeKind::XPathClient`, enabling the type-qualified + // `XPathClient.evaluate` resolution and the resolver-binding + // suppression sidecar. + "XPath" | "XPathExpression" => Some(TypeKind::XPathClient), + // Apache FreeMarker `Template` declared receiver type. Routes + // `Template tpl = ...; tpl.process(model, out)` through + // type-qualified resolution to `Template.process`, the SSTI + // sink defined in `labels/java.rs`. + "Template" => Some(TypeKind::Template), // Python qualified type names. // Only covers raw lowered names from isinstance(). The lowering in lower.rs // extracts the literal type text: isinstance(x, requests.Session) produces diff --git a/src/database.rs b/src/database.rs index fd7885c2..6d48120b 100644 --- a/src/database.rs +++ b/src/database.rs @@ -225,7 +225,17 @@ pub mod index { /// * `"3"`, `ssa_function_bodies.body` changed from JSON TEXT to /// bincode BLOB. Old JSON payloads cannot be deserialised by the /// new engine, so they are silently rebuilt on open.
- pub const SCHEMA_VERSION: &str = "3"; + /// * `"4"`, `Cap` widened from u16 to u32 to accommodate cap bits + /// ≥ 14 (LDAP_INJECTION, XPATH_INJECTION, HEADER_INJECTION, + /// OPEN_REDIRECT, SSTI, XXE, PROTOTYPE_POLLUTION). The `Cap` + /// deserialiser accepts both u16- and u32-width JSON values, so + /// pre-bump caches load without crashing, but the cached + /// `source_caps` / `sanitizer_caps` / `sink_caps` blobs were + /// produced before any of these caps could appear and would + /// underreport rules that emit them. Bumping forces a rescan so + /// newly-emitted gates and sinks land in the cache with the wider + /// footprint. + pub const SCHEMA_VERSION: &str = "4"; // TODO: ADD CLEANS FOR EACH TABLE BASED ON PROJECT WHICH RUNS ON CLEAN // TODO: ADD DROP AND GIVE A CLI PARAMETER FOR DROP @@ -2899,6 +2909,8 @@ fn make_test_callee_body( type_facts: crate::ssa::type_facts::TypeFactResult { facts: std::collections::HashMap::new(), }, + xml_parser_config: crate::ssa::xml_config::XmlParserConfigResult::default(), + xpath_config: crate::ssa::xpath_config::XPathConfigResult::default(), alias_result: crate::ssa::alias::BaseAliasResult::empty(), points_to: crate::ssa::heap::PointsToResult::empty(), module_aliases: std::collections::HashMap::new(), @@ -3765,7 +3777,7 @@ fn metadata_table_survives_clear() { /// receiver sentinel (`u32::MAX`), the container-element marker /// (``), and the `overflow` flag across serialise → store → /// load → deserialise. This is the strict-additive contract for -/// pre-Phase-5 blobs (default-empty deserialises cleanly) and the +/// older blobs without field_points_to (default-empty deserialises cleanly) and the /// completeness check for the W3 cross-call resolver. 
#[test] fn ssa_summaries_round_trip_preserves_field_points_to() { @@ -3840,15 +3852,15 @@ fn ssa_summaries_round_trip_preserves_field_points_to() { assert!(!sum.field_points_to.overflow); } -/// Pre-Phase-5 blob compatibility: a summary serialised without +/// Older blob compatibility: a summary serialised without /// `field_points_to` deserialises with the empty default, no /// migration needed because the field is `#[serde(default)]`. #[test] -fn ssa_summaries_pre_phase5_blob_decodes_with_empty_field_points_to() { +fn ssa_summaries_legacy_blob_decodes_with_empty_field_points_to() { use crate::summary::ssa_summary::SsaFuncSummary; // Hand-craft JSON without the `field_points_to` key. - let pre_phase5_json = r#"{ + let legacy_json = r#"{ "param_to_return": [], "param_to_sink": [], "source_caps": 0, @@ -3865,7 +3877,7 @@ fn ssa_summaries_pre_phase5_blob_decodes_with_empty_field_points_to() { "return_path_facts": [], "typed_call_receivers": [] }"#; - let sum: SsaFuncSummary = serde_json::from_str(pre_phase5_json).unwrap(); + let sum: SsaFuncSummary = serde_json::from_str(legacy_json).unwrap(); assert!( sum.field_points_to.is_empty(), "missing field_points_to must default to empty", diff --git a/src/evidence.rs b/src/evidence.rs index d46785c3..4c2df575 100644 --- a/src/evidence.rs +++ b/src/evidence.rs @@ -217,15 +217,15 @@ pub struct Evidence { #[serde(default, skip_serializing_if = "Option::is_none")] pub symbolic: Option, - /// Resolved sink capability bits (u16 from `Cap::bits()`). + /// Resolved sink capability bits (u32 from `Cap::bits()`). /// /// Used by deduplication to distinguish findings that share a /// `(path, line, severity)` key but target different sinks (e.g. /// `sink_sql(x); sink_shell(x);` on the same line). 0 when the sink /// caps could not be resolved at the CFG node (e.g. pure summary /// resolution where the caller's sink node carries no label). 
- #[serde(default, skip_serializing_if = "is_zero_u16")] - pub sink_caps: u16, + #[serde(default, skip_serializing_if = "is_zero_cap_bits")] + pub sink_caps: u32, /// Engine provenance notes attached to this finding (e.g. "worklist /// iteration budget was hit before convergence"), propagated from @@ -243,7 +243,7 @@ pub struct Evidence { pub data_exfil_field: Option, } -fn is_zero_u16(v: &u16) -> bool { +fn is_zero_cap_bits(v: &u32) -> bool { *v == 0 } diff --git a/src/labels/c.rs b/src/labels/c.rs index 31222bf2..13c95db7 100644 --- a/src/labels/c.rs +++ b/src/labels/c.rs @@ -67,6 +67,30 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::SSRF), case_sensitive: false, }, + // ─── LDAP injection sinks ─── + // + // OpenLDAP / libldap surface: `ldap_search_s(ld, base, scope, filter, ...)` + // and the extended (also synchronous) variant `ldap_search_ext_s(ld, base, scope, filter, + // attrs, attrsonly, serverctrls, clientctrls, timeout, sizelimit, *res)`. + // The filter argument (position 3) is the LDAP-injection vector. No + // standard libldap escape helper exists in the C surface; sanitisation is + // typically caller-implemented (`sanitize_*` covers the developer-named + // case via the existing prefix rule above). + LabelRule { + matchers: &["ldap_search_s", "ldap_search_ext_s"], + label: DataLabel::Sink(Cap::LDAP_INJECTION), + case_sensitive: false, + }, + // ─── XPath injection sinks ─── + // + // libxml2 evaluation entry points: `xmlXPathEvalExpression(expr, ctx)`, + // `xmlXPathEval(expr, ctx)`, `xmlXPathCompile(expr)`. The expression + // string is arg 0 and is the canonical XPath-injection vector. + LabelRule { + matchers: &["xmlXPathEvalExpression", "xmlXPathEval", "xmlXPathCompile"], + label: DataLabel::Sink(Cap::XPATH_INJECTION), + case_sensitive: false, + }, ]; /// Gated sinks for C.
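The `u16` to `u32` widening of `Cap` and `Evidence.sink_caps` comes down to bit arithmetic. The sketch below assumes, hypothetically, that the seven new classes occupy bit positions 14 through 20 (the real assignments live in the `bitflags!` block in `src/labels/mod.rs`, which is not part of this excerpt) and checks which of them would still fit in the old 16-bit field. `fits_in_u16` and `decode_cap_bits` are illustrative helpers, not engine functions.

```rust
use std::convert::TryFrom;

// Hypothetical bit positions for the seven new cap classes; the real
// values are defined in `src/labels/mod.rs`.
const LDAP_INJECTION: u32 = 1 << 14;
const XPATH_INJECTION: u32 = 1 << 15;
const HEADER_INJECTION: u32 = 1 << 16;
const OPEN_REDIRECT: u32 = 1 << 17;
const SSTI: u32 = 1 << 18;
const XXE: u32 = 1 << 19;
const PROTOTYPE_POLLUTION: u32 = 1 << 20;

const NEW_CLASSES: [u32; 7] = [
    LDAP_INJECTION,
    XPATH_INJECTION,
    HEADER_INJECTION,
    OPEN_REDIRECT,
    SSTI,
    XXE,
    PROTOTYPE_POLLUTION,
];

/// True when `bits` could still be stored in the pre-bump `u16` field.
fn fits_in_u16(bits: u32) -> bool {
    u16::try_from(bits).is_ok()
}

/// Decodes a cached cap value of any unsigned width into `u32`.
/// Values wider than 32 bits are rejected rather than silently truncated.
fn decode_cap_bits(raw: u64) -> Option<u32> {
    u32::try_from(raw).ok()
}
```

Under these assumed positions, bits 16 through 20 cannot be represented in a `u16`, so five of the seven new classes would not fit in the old field.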
diff --git a/src/labels/cpp.rs b/src/labels/cpp.rs index 43ee9119..f2285a84 100644 --- a/src/labels/cpp.rs +++ b/src/labels/cpp.rs @@ -89,6 +89,24 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::SSRF), case_sensitive: false, }, + // ─── LDAP injection sinks ─── + // + // OpenLDAP / libldap C interface (also used from C++ wrappers): the filter + // argument carries attacker-controlled data unless explicitly escaped. + LabelRule { + matchers: &["ldap_search_s", "ldap_search_ext_s"], + label: DataLabel::Sink(Cap::LDAP_INJECTION), + case_sensitive: false, + }, + // ─── XPath injection sinks ─── + // + // libxml2 (the dominant C++ XML parser surface): `xmlXPathEvalExpression`, + // `xmlXPathEval`, `xmlXPathCompile` accept the expression string as arg 0. + LabelRule { + matchers: &["xmlXPathEvalExpression", "xmlXPathEval", "xmlXPathCompile"], + label: DataLabel::Sink(Cap::XPATH_INJECTION), + case_sensitive: false, + }, ]; /// Gated sinks for C++. diff --git a/src/labels/go.rs b/src/labels/go.rs index ba2ce7ea..043eed40 100644 --- a/src/labels/go.rs +++ b/src/labels/go.rs @@ -148,6 +148,97 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::CRYPTO), case_sensitive: false, }, + // ─── LDAP injection sinks ─── + // + // go-ldap (`github.com/go-ldap/ldap/v3`): `conn, _ := ldap.DialURL(url); + // req := ldap.NewSearchRequest(base, scope, deref, sizeLimit, timeLimit, + // typesOnly, filter, attrs, controls)`. The filter argument (position 6) + // is the LDAP-injection vector; passing the request to `conn.Search(req)` + // executes the filter. Type-qualified resolution rewrites `conn.Search` + // → `LdapClient.Search` when the receiver was returned by + // `ldap.DialURL` / `ldap.Dial` / `ldap.DialTLS` (see + // [`crate::ssa::type_facts::constructor_type`]). 
We also tag + // `ldap.NewSearchRequest` directly so taint reaching the filter argument + // surfaces at the construction call (matches the typical FP-free shape + // where the request is built once and passed straight to `Search`). + LabelRule { + matchers: &[ + "LdapClient.Search", + "LdapClient.SearchWithPaging", + "ldap.NewSearchRequest", + ], + label: DataLabel::Sink(Cap::LDAP_INJECTION), + case_sensitive: true, + }, + // ─── LDAP-filter sanitizer ─── + // + // go-ldap exposes `ldap.EscapeFilter(s string) string` (RFC 4515 metachar + // escaping). Treat any call as clearing the LDAP_INJECTION cap. + LabelRule { + matchers: &["ldap.EscapeFilter"], + label: DataLabel::Sanitizer(Cap::LDAP_INJECTION), + case_sensitive: true, + }, + // ─── Header / CRLF injection sinks ─── + // + // `net/http` `ResponseWriter.Header()` returns a `Header` map; calls to + // `Set(name, val)` / `Add(name, val)` write a single header value. + // After paren-group stripping the chain text becomes + // `w.Header.Set` / `w.Header.Add`, so suffix matchers on `Header.Set` / + // `Header.Add` cover both the bound-receiver form (`w.Header().Set(...)`) + // and the documentation-style class-qualified form (`Header.Set`). + // Tainted strings without `\r\n` stripping enable response splitting. + LabelRule { + matchers: &["Header.Set", "Header.Add"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: true, + }, + // ─── Header / CRLF sanitizers ─── + // + // Project-local `stripCRLF` / `escapeHeader` helpers that strip `\r` and + // `\n` from a value before it is written to a response header. + LabelRule { + matchers: &["stripCRLF", "stripCrlf", "escapeHeader", "sanitizeHeader"], + label: DataLabel::Sanitizer(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Open redirect sinks ─── + // + // `net/http` `http.Redirect(w, r, url, code)` writes a `Location` header + // and a 3xx status from the supplied URL. 
Without an allowlist check, + // a tainted `url` is the canonical Go open-redirect vector. + LabelRule { + matchers: &["http.Redirect"], + label: DataLabel::Sink(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + LabelRule { + matchers: &[ + "validateRedirectUrl", + "isSafeRedirect", + "stripScheme", + "ensureRelativeUrl", + "assertRelativePath", + "isRelativeUrl", + ], + label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + // ─── SSTI sinks ─── + // + // `text/template` and `html/template` parse a template source string via + // `template.New(name).Parse(src)`. After paren-group stripping the chain + // text becomes `template.New.Parse`, so the suffix matcher catches both + // packages (`text/template`, `html/template`) regardless of import alias. + // `template.ParseFiles` / `ParseGlob` take file paths (path-traversal, + // not SSTI) and are intentionally excluded. `html/template`'s auto- + // escaping applies during `Execute`, not `Parse`, so a tainted source + // string still yields SSTI. + LabelRule { + matchers: &["template.New.Parse"], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: false, + }, ]; /// Argument-role-aware Go sinks. 
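Several Go rules above rely on paren-group stripping followed by suffix matching: `w.Header().Set(...)` becomes `w.Header.Set`, and `template.New(name).Parse(src)` becomes `template.New.Parse`. A rough sketch of that normalisation, assuming stripping deletes every balanced `(...)` group and that a suffix only matches at a `.` boundary; the function names are hypothetical and the engine's actual matcher may treat boundaries differently.

```rust
/// Removes every balanced parenthesised group (including its arguments),
/// so `w.Header().Set("X", v)` becomes `w.Header.Set`.
fn strip_paren_groups(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut depth = 0usize;
    for ch in text.chars() {
        match ch {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            _ if depth == 0 => out.push(ch),
            _ => {} // inside an argument list: drop
        }
    }
    out
}

/// Suffix match at a `.` boundary: `w.Header.Set` matches `Header.Set`,
/// but `w.MyHeader.Set` does not.
fn suffix_matches(chain: &str, matcher: &str, case_sensitive: bool) -> bool {
    let (chain, matcher) = if case_sensitive {
        (chain.to_string(), matcher.to_string())
    } else {
        (chain.to_lowercase(), matcher.to_lowercase())
    };
    chain == matcher
        || chain
            .strip_suffix(matcher.as_str())
            .map_or(false, |prefix| prefix.ends_with('.'))
}
```

The `.` boundary check is what keeps a short matcher such as `Header.Set` from firing on an unrelated `MyHeader.Set` chain.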
Two classes coexist on the outbound HTTP diff --git a/src/labels/java.rs b/src/labels/java.rs index f4a6a760..72176e96 100644 --- a/src/labels/java.rs +++ b/src/labels/java.rs @@ -1,4 +1,6 @@ -use crate::labels::{Cap, DataLabel, Kind, LabelRule, ParamConfig, RuntimeLabelRule}; +use crate::labels::{ + Cap, DataLabel, GateActivation, Kind, LabelRule, ParamConfig, RuntimeLabelRule, SinkGate, +}; use crate::utils::project::{DetectedFramework, FrameworkContext}; use phf::{Map, phf_map}; @@ -265,6 +267,223 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::CODE_EXEC), case_sensitive: false, }, + // ─── LDAP injection sinks ─── + // + // JNDI / Spring LDAP search APIs accept an attacker-influenceable filter + // expression as the second positional argument (`DirContext.search(name, + // filter, controls)` / `LdapTemplate.search(base, filter, mapper)`). Without + // RFC 4515 escaping, the filter can be rewritten to bypass authentication or + // exfiltrate directory entries. Type-qualified resolution rewrites + // `ctx.search(...)` → `LdapClient.search` when the receiver carries a + // `TypeKind::LdapClient` fact (set by `class_name_to_type_kind` for the + // declared types `DirContext`, `InitialDirContext`, `LdapContext`, + // `InitialLdapContext`, `LdapTemplate`, or by `constructor_type` for + // `new InitialDirContext(...)` / `new InitialLdapContext(...)`). Direct + // flat matchers cover the documentation-style class-qualified call forms + // that bypass receiver typing.
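A minimal sketch of the receiver typing described above: a declared type name maps to a `TypeKind`, and a call on a receiver of that type is re-spelled with the kind's name so that a rule such as `LdapClient.search` can match `ctx.search(...)`. `declared_type_kind`, `type_kind_label`, and `qualify_callee` are hypothetical names rather than the engine's functions, and the sketch ignores constructor-based typing.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the engine's receiver type kinds.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TypeKind {
    LdapClient,
    XPathClient,
}

/// Maps a declared Java type name to a receiver kind, mirroring the
/// declared-type mapping this diff extends.
fn declared_type_kind(class_name: &str) -> Option<TypeKind> {
    match class_name {
        "DirContext" | "LdapContext" | "InitialDirContext" | "InitialLdapContext"
        | "LdapTemplate" => Some(TypeKind::LdapClient),
        "XPath" | "XPathExpression" => Some(TypeKind::XPathClient),
        _ => None,
    }
}

fn type_kind_label(kind: TypeKind) -> &'static str {
    match kind {
        TypeKind::LdapClient => "LdapClient",
        TypeKind::XPathClient => "XPathClient",
    }
}

/// Rewrites `recv.method` to `<TypeKind>.method` when `recv` has a declared
/// type that maps to a known kind; otherwise returns the callee text
/// unchanged so the flat matchers still see it.
fn qualify_callee(callee: &str, declared: &HashMap<&str, &str>) -> String {
    if let Some((recv, method)) = callee.split_once('.') {
        if let Some(kind) = declared.get(recv).and_then(|c| declared_type_kind(c)) {
            return format!("{}.{}", type_kind_label(kind), method);
        }
    }
    callee.to_string()
}
```

Untyped receivers fall through unchanged, which is why the rule above also keeps flat matchers such as `DirContext.search`.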
+ LabelRule { + matchers: &[ + "LdapClient.search", + "LdapClient.searchByEntity", + "LdapClient.searchForObject", + "LdapClient.searchForContext", + "DirContext.search", + "LdapTemplate.search", + "LdapTemplate.searchByEntity", + "LdapTemplate.searchForObject", + "LdapTemplate.searchForContext", + "ctx.search", + ], + label: DataLabel::Sink(Cap::LDAP_INJECTION), + case_sensitive: true, + }, + // ─── LDAP-filter sanitizers ─── + // + // Spring LDAP's `LdapEncoder.filterEncode(s)` applies RFC 4515 escaping to + // metacharacters (`\`, `*`, `(`, `)`, NUL). `nameEncode` performs the + // companion DN-component escaping. Both fully clear the LDAP_INJECTION + // cap; downstream sinks see a sanitised value. + LabelRule { + matchers: &["LdapEncoder.filterEncode", "LdapEncoder.nameEncode"], + label: DataLabel::Sanitizer(Cap::LDAP_INJECTION), + case_sensitive: true, + }, + // ─── XPath injection sinks ─── + // + // `javax.xml.xpath.XPath.evaluate(expr, source, ...)` and the matching + // `XPathExpression.evaluate(source)` accept an attacker-influenceable + // expression string. Without parameterisation via + // `XPathVariableResolver` the expression can be rewritten to bypass + // authentication or exfiltrate document subtrees. `XPath.compile(expr)` + // is the equivalent pre-compile entry point. Direct flat matchers cover + // the documentation-style class-qualified call forms. + LabelRule { + matchers: &[ + "XPath.evaluate", + "XPath.compile", + "XPathExpression.evaluate", + "xpath.evaluate", + "xpath.compile", + ], + label: DataLabel::Sink(Cap::XPATH_INJECTION), + case_sensitive: false, + }, + // ─── XPath escape sanitizers ─── + // + // OWASP ESAPI's `Encoder.encodeForXPath(s)` escapes the XPath + // metacharacters (`'`, `"`, `[`, `]`, `(`, `)`, `,`, `=`, `<`, `>`, + // `*`). Project-local `xpathEscape` / `escapeXpath` are the common + // developer-named equivalents.
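For reference, RFC 4515 requires that `*`, `(`, `)`, `\` and NUL in a filter assertion value be written as a backslash followed by two hex digits. A minimal sketch of that escaping, with the hypothetical name `escape_ldap_filter`; real helpers such as `LdapEncoder.filterEncode` may differ in detail (for example in hex case or in which extra bytes they escape).

```rust
/// Escapes an LDAP filter assertion value per RFC 4515: `*`, `(`, `)`,
/// `\` and NUL are replaced by `\` plus two hex digits; everything else
/// passes through unchanged.
fn escape_ldap_filter(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '*' => out.push_str("\\2a"),
            '(' => out.push_str("\\28"),
            ')' => out.push_str("\\29"),
            '\\' => out.push_str("\\5c"),
            '\0' => out.push_str("\\00"),
            _ => out.push(ch),
        }
    }
    out
}
```

Applied to the classic payload `*)(uid=*`, the result no longer contains any bare filter metacharacter, so it cannot close the enclosing `(uid=...)` clause.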
+ LabelRule { + matchers: &["Encoder.encodeForXPath", "xpathEscape", "escapeXpath"], + label: DataLabel::Sanitizer(Cap::XPATH_INJECTION), + case_sensitive: false, + }, + // Parameterised XPath via `XPath.setXPathVariableResolver(resolver)` + // suppression is implemented as a receiver-config sidecar in + // [`crate::ssa::xpath_config::XPathConfigResult`]: a + // `setXPathVariableResolver` call on a receiver carrying + // `TypeKind::XPathClient` flips the receiver's `has_resolver` flag, + // and the SSA sink-emission site strips `Cap::XPATH_INJECTION` from + // any later `xpath.evaluate(taintedExpr, ...)` whose receiver is + // provably bound. No flat sanitizer rule is needed (and a + // name-only rule would clear the wrong call site). + // ─── Header / CRLF injection sinks ─── + // + // `HttpServletResponse.setHeader(name, val)` / `addHeader(name, val)` + // accept a single header value; tainted strings without `\r\n` stripping + // let an attacker inject extra headers (response splitting). + // `addCookie(c)` carries a `Cookie` whose constructor takes a value + // string; track at the higher-level setHeader / addHeader entry points. + LabelRule { + matchers: &["setHeader", "addHeader", "addCookie"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Header / CRLF sanitizers ─── + LabelRule { + matchers: &["stripCRLF", "stripCrlf", "escapeHeader", "sanitizeHeader"], + label: DataLabel::Sanitizer(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Open redirect sinks ─── + // + // Servlet API: `HttpServletResponse.sendRedirect(url)`. Spring MVC + // controllers can also return a `"redirect:"` prefixed string but that + // sink shape is not modelled here. 
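The resolver-binding sidecar for XPath can be pictured as a per-receiver flag consulted at sink emission. This sketch walks straight-line events in program order; `ValueId`, `Event`, `emitted_caps`, and the bit chosen for `XPATH_INJECTION` are assumptions, and it does not model branches or aliasing.

```rust
use std::collections::HashSet;

/// Hypothetical SSA value id.
type ValueId = u32;

/// Hypothetical bit position for the XPath-injection cap.
const XPATH_INJECTION: u32 = 1 << 15;

enum Event {
    /// `recv.setXPathVariableResolver(r)`
    SetResolver { receiver: ValueId },
    /// `recv.evaluate(expr, ...)` where `expr` carries `tainted_caps`
    Evaluate { receiver: ValueId, tainted_caps: u32 },
}

/// Returns the caps emitted at each `evaluate` sink, in order. A receiver
/// counts as bound only after a resolver call on it has been seen, and only
/// then is the XPath-injection bit stripped.
fn emitted_caps(events: &[Event]) -> Vec<u32> {
    let mut bound: HashSet<ValueId> = HashSet::new();
    let mut out = Vec::new();
    for ev in events {
        match ev {
            Event::SetResolver { receiver } => {
                bound.insert(*receiver);
            }
            Event::Evaluate { receiver, tainted_caps } => {
                let caps = if bound.contains(receiver) {
                    tainted_caps & !XPATH_INJECTION
                } else {
                    *tainted_caps
                };
                out.push(caps);
            }
        }
    }
    out
}
```

Keying the flag on the receiver, not on the method name, is what lets a bound `xpath.evaluate(...)` be suppressed while an unbound one on a different receiver still fires.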
+ LabelRule { + matchers: &["sendRedirect"], + label: DataLabel::Sink(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + LabelRule { + matchers: &[ + "validateRedirectUrl", + "isSafeRedirect", + "stripScheme", + "ensureRelativeUrl", + "assertRelativePath", + "isRelativeUrl", + ], + label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + // ─── SSTI sinks ─── + // + // Apache FreeMarker `Template.process(model, writer)` renders an + // already-parsed template; the SSTI vector is when the template source + // is attacker-influenced (e.g. `new Template(name, new StringReader(src), cfg)`). + // The flat matcher fires only when the receiver chain text resolves to + // `Template.process`, typically through a `Template`-typed declared + // receiver: `class_name_to_type_kind` maps `Template` to + // `TypeKind::Template`, so idiomatic + // `Template tpl = new Template(...); tpl.process(...)` shapes resolve + // via type-qualified resolution. + // + // Apache Velocity `Velocity.evaluate(ctx, writer, tag, src)` is modelled + // as a gated sink in `GATED_SINKS` below so only the template-source + // arg (index 3) activates SSTI; tainted variables in the `ctx` arg + // (data) stay clean. + LabelRule { + matchers: &["Template.process"], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: true, + }, + // ─── XXE sinks ─── + // + // Java's stock XML parsers (JAXP) are XXE-vulnerable by default: the + // factories ship with external-entity / DTD resolution enabled and only + // become safe after `setFeature(FEATURE_SECURE_PROCESSING, true)` / + // disabling `external-general-entities` / `external-parameter-entities`. + // Tainted XML reaching any of these parser entry points is treated as + // an XXE flow; config-setter suppression is not part of this flat rule + // and is applied at sink emission by the receiver-fact sidecar in + // [`crate::ssa::xml_config::XmlParserConfigResult`] (see the + // config-setter sanitizers below).
+ // + // Class-qualified suffix matching covers both the documentation-style + // `javax.xml.parsers.DocumentBuilder.parse(...)` form and the bound- + // receiver `XmlParser.parse(...)` form (when the receiver's TypeKind + // resolves to `XmlParser`). Bare `parse` is intentionally avoided to + // prevent collisions with `Integer.parseInt`, `LocalDate.parse`, + // generic JSON parsers, etc. + LabelRule { + matchers: &[ + "DocumentBuilder.parse", + "SAXParser.parse", + "XMLReader.parse", + "SAXBuilder.build", + "XmlParser.parse", + "XmlParser.build", + ], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + }, + // ─── XXE config-setter sanitizers ─── + // + // Phase 07: a JAXP `setFeature(...)` / `setExpandEntityReferences(...)` + // call is itself a label-level Sanitizer for `Cap::XXE` so that the + // *call's return value* (rare but exists for fluent factory APIs) + // does not carry XXE through it. The real load-bearing suppression + // is the receiver-fact path in + // [`crate::ssa::xml_config::XmlParserConfigResult`], which the SSA + // sink emission consults at every parse-class sink site. This rule + // is conservative noise reduction for downstream sinks that consume + // the setter call's value. + LabelRule { + matchers: &[ + "setFeature", + "setExpandEntityReferences", + "setXIncludeAware", + "setValidating", + ], + label: DataLabel::Sanitizer(Cap::XXE), + case_sensitive: true, + }, +]; + +/// Java gated sinks. Argument-position-aware classification for callees +/// where the SSTI activation is restricted to the template-source arg +/// rather than every positional argument. +pub static GATED_SINKS: &[SinkGate] = &[ + // Apache Velocity static API: `Velocity.evaluate(ctx, writer, logTag, src)`. + // Arg 3 carries the inline template source; tainted text at that + // position is SSTI. Tainted data in the context (arg 0) is rendered + // through Velocity's escape policy, not parsed as template source, so + // those flows must not activate SSTI. 
Activation is unconditional; + // payload_args narrows the cap to the template-source position. + SinkGate { + callee_matcher: "Velocity.evaluate", + arg_index: 3, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: true, + payload_args: &[3], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, ]; pub static KINDS: Map<&'static str, Kind> = phf_map! { diff --git a/src/labels/javascript.rs b/src/labels/javascript.rs index a2dba01c..1ebe5b3d 100644 --- a/src/labels/javascript.rs +++ b/src/labels/javascript.rs @@ -310,6 +310,178 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::SQL_QUERY), case_sensitive: true, }, + // ─── LDAP injection sinks ─── + // + // `ldapjs`: both the bound-variable idiom + // `const client = ldap.createClient({...}); client.search(...)` and the + // chained idiom `ldap.createClient({...}).search(...)` are covered by + // type-qualified receiver resolution. The receiver of the inner call is + // typed `TypeKind::LdapClient` via `ssa::type_facts::constructor_type`, + // and (for the bound-variable form) closure-captured types are forwarded + // into the per-function type-fact result by + // [`crate::taint::inject_external_type_facts`], so the qualified callee + // text resolves to `LdapClient.search` in both shapes. + LabelRule { + matchers: &["LdapClient.search"], + label: DataLabel::Sink(Cap::LDAP_INJECTION), + case_sensitive: true, + }, + // ─── LDAP-filter sanitizers ─── + // + // The `ldap-escape` package exports `filter` and `dn` tagged-template + // helpers (`filter`\`(uid=${input})\``). After tree-sitter lifts the + // template-tag identifier, the callee text is the function name; suffix + // matching on `ldapEscape` / `ldapescape` covers `const ldapEscape = + // require('ldap-escape')` plus default-import shapes. 
+ LabelRule { + matchers: &[ + "ldapEscape", + "ldap-escape", + "ldapescape.filter", + "ldapescape.dn", + ], + label: DataLabel::Sanitizer(Cap::LDAP_INJECTION), + case_sensitive: false, + }, + // ─── XPath injection sinks ─── + // + // `document.evaluate(expr, contextNode, ...)` (DOM) and the npm `xpath` + // package's `xpath.select(expr, doc)` / `xpath.evaluate(expr, doc, ...)` + // accept the expression string as arg 0; concatenated user input there + // is the canonical XPath-injection vector. + LabelRule { + matchers: &[ + "document.evaluate", + "xpath.select", + "xpath.evaluate", + "xpath.select1", + ], + label: DataLabel::Sink(Cap::XPATH_INJECTION), + case_sensitive: false, + }, + // ─── XPath escape sanitizers ─── + // + // No standard library helper escapes XPath metacharacters; project-local + // `escapeXpath` / `xpathEscape` are the developer-named equivalents. + LabelRule { + matchers: &["escapeXpath", "xpathEscape", "escape_xpath"], + label: DataLabel::Sanitizer(Cap::XPATH_INJECTION), + case_sensitive: false, + }, + // ─── Header / CRLF injection sinks ─── + // + // Express/Fastify/Node `http` response APIs that write a single header + // value: `res.setHeader(name, val)` (case-insensitive verb), `res.set`, + // `res.header`, `res.append`. Tainted strings here without `\r\n` + // stripping let an attacker inject extra headers (response splitting). + LabelRule { + matchers: &["setHeader", "res.set", "res.header", "res.append"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // Subscript-set form: `res.headers["X-Foo"] = bar` / + // `response.headers["X-Foo"] = bar`. The LHS-subscript classification + // path in `cfg/mod.rs::push_node` walks into the subscript's `object` + // and classifies its member-expression text, so the bare bracket form + // fires alongside `setHeader` / `res.set` / `res.header` / `res.append`. 
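A `stripCRLF`-style helper only has to guarantee that no `\r` or `\n` survives, because those characters are what let a value end the current header line and start a new one. A minimal Rust sketch, with the hypothetical name `strip_crlf`:

```rust
/// Removes carriage returns and line feeds so the value stays on a single
/// header line and cannot start an injected header.
fn strip_crlf(value: &str) -> String {
    value.chars().filter(|c| *c != '\r' && *c != '\n').collect()
}
```

Stripping keeps the request working; rejecting any value that contains CR or LF is a stricter alternative that also avoids silently rewriting user data.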
+ LabelRule { + matchers: &["res.headers", "response.headers", "self.response.headers"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Header / CRLF sanitizers ─── + // + // Project-local `stripCRLF` / `escapeHeader` helpers that strip `\r` and + // `\n` from a value before it is written to a response header. + LabelRule { + matchers: &["stripCRLF", "stripCrlf", "escapeHeader", "sanitizeHeader"], + label: DataLabel::Sanitizer(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Prototype pollution sinks (library-mediated) ─── + // + // Recursive merge / deep-assign helpers from lodash / common bundles. + // Argument-role gating (target vs src) is enforced via Destination + // activation in `GATED_SINKS` below: only taint flowing into the + // source-object arguments (positions 1+) activates; tainted-target- + // only is benign because writes to a tainted target object don't + // pollute `Object.prototype`. Flat rules here are intentionally + // empty for the merge family; see GATED_SINKS for the per-call + // gating. `_.template` is excluded — it is handled separately as + // a gated CODE_EXEC sink (Strapi CVE-2023-22621 evaluate:false + // suppression). + // ─── Open redirect sinks ─── + // + // Express response redirect: `res.redirect(url)`. Browser-side + // navigation: `location.replace` / `location.assign` fire as direct + // calls; `window.location = url` / `window.location.href = url` / + // `location.href = url` fire as assignment-LHS sinks via the + // `member_expr_text` classification path in `cfg::push_node`. + // `router.navigate` covers the Angular Router (`Router.navigate`, + // `Router.navigateByUrl`) and the React-Router `useNavigate`-returned + // `navigate` function; suffix matching catches both the bound-receiver + // and direct-call shapes. 
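Open-redirect sanitizers such as `isRelativeUrl` and `ensureRelativeUrl` are recognised by name, so their bodies need to be strict. A sketch of one plausible relative-only check, assuming only same-origin absolute paths are allowed; `is_relative_url` is a hypothetical name. It rejects `//host` and `/\host` (browsers treat backslash like slash in special URLs, making the latter protocol-relative) and any ASCII control character, since browsers strip tabs and newlines from URLs before parsing.

```rust
/// Returns true when `url` is a same-origin path: it starts with a single
/// `/`, is not protocol-relative, contains no backslash, and contains no
/// ASCII control characters that a browser might strip.
fn is_relative_url(url: &str) -> bool {
    url.starts_with('/')
        && !url.starts_with("//")
        && !url.contains('\\')
        && !url.chars().any(|c| c.is_ascii_control())
}
```

Requiring a leading `/` also rules out any scheme (`https:`, `javascript:`), so no separate scheme check is needed.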
+ LabelRule { + matchers: &[ + "res.redirect", + "location.replace", + "location.assign", + "router.navigate", + "router.navigateByUrl", + "window.location", + "window.location.href", + "location.href", + ], + label: DataLabel::Sink(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + // ─── Open-redirect URL allowlist sanitizers ─── + // + // Project-local helpers that allowlist hosts or enforce relative-only + // URLs. `validateRedirectUrl` / `isSafeRedirect` are the canonical + // developer-named allowlist helpers; `stripScheme` clears any absolute + // scheme and degrades the URL to a relative path. `ensureRelativeUrl` + // / `assertRelativePath` cover the leading-slash / no-scheme idiom. + LabelRule { + matchers: &[ + "validateRedirectUrl", + "isSafeRedirect", + "stripScheme", + "ensureRelativeUrl", + "assertRelativePath", + "isRelativeUrl", + ], + label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + // ─── SSTI sinks ─── + // + // Template-engine entry points that accept the template *source string* + // as the first argument: tainted arg 0 lets the attacker drive + // arbitrary template execution. `_.template` is excluded — it has + // its own gated CODE_EXEC classifier (Strapi CVE-2023-22621) that + // respects the `evaluate:false` opt-out. `nunjucks.renderString` is + // also excluded — see GATED_SINKS below for arg-0-only payload + // gating (suppresses tainted-`ctx`-only flows). + LabelRule { + matchers: &["Handlebars.compile"], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: false, + }, + // ─── XXE sinks ─── + // + // libxmljs `parseXmlString` / `parseXml` resolve external entities only + // when called with `{ noent: true }` or `{ replaceEntities: true }`. + // The flat-rule modeling treats any call as a sink; the safe path + // requires explicit option suppression.
+ // libxmljs's own default ignores entities so the sink is conservative + // here; xml2js is gated below in GATED_SINKS to suppress the + // safe-default case, while fast-xml-parser is not modelled yet (see + // the note after the xml2js gate). + LabelRule { + matchers: &["libxmljs.parseXmlString", "libxmljs.parseXml"], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + }, ]; /// Callee patterns that must never be classified as source/sanitizer/sink. @@ -420,6 +592,33 @@ pub static GATED_SINKS: &[SinkGate] = &[ dangerous_kwargs: &[], activation: GateActivation::ValueMatch, }, + // ── XML XXE gates ───────────────────────────────────────────────────── + // + // `xml2js.parseString(xml, opts, cb)` is XXE-safe by default; opts + // `{ explicitChildren: true, charkey: '__cdata' }` are benign, but + // resolving entities at the underlying sax-js layer requires user + // intent. The gate fires only when the option object literal carries + // one of the `dangerous_kwargs` values below (an entity flag set to + // `true`, or `strict: false`), or when the options are dynamic. Only + // taint in the XML payload (arg 0) can activate the sink. + SinkGate { + callee_matcher: "xml2js.parseString", + arg_index: 1, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[ + ("processEntities", &["true"]), + ("explicitEntities", &["true"]), + ("strict", &["false"]), + ], + activation: GateActivation::ValueMatch, + }, + // Note: `fast-xml-parser` (`new XMLParser({...}).parse(xml)`) is XXE-safe + // by default; flagging it would require constructor-option tracking via + // TypeFacts (XmlParser type with config carry). Deferred to Layer 2. // ── Outbound HTTP clients (SSRF) ────────────────────────────────────── // // Policy: SSRF fires only when taint reaches the destination-bearing @@ -797,6 +996,282 @@ pub static GATED_SINKS: &[SinkGate] = &[ object_destination_fields: &[], }, }, + // `nunjucks.renderString(src, ctx)` — Nunjucks SSTI sink.
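The xml2js gate decides activation from its options argument (arg 1) rather than from the payload. A sketch of that decision, assuming an absent options object means the safe default and an opaque one is treated as dangerous; `OptsArg`, `gate_activates`, and `XML2JS_DANGEROUS` are hypothetical, and whether the XML payload at arg 0 is tainted is a separate check not shown here.

```rust
/// Hypothetical view of the options argument at a call site.
enum OptsArg<'a> {
    /// An object literal with its (key, literal value) pairs.
    Literal(&'a [(&'a str, &'a str)]),
    /// Anything the analysis cannot see through (a variable, a call, ...).
    Dynamic,
    /// No options argument at all.
    Missing,
}

/// Dangerous (key, values) pairs, copied from the xml2js gate above.
const XML2JS_DANGEROUS: &[(&str, &[&str])] = &[
    ("processEntities", &["true"]),
    ("explicitEntities", &["true"]),
    ("strict", &["false"]),
];

/// True when the options argument should activate the XXE gate.
fn gate_activates(opts: &OptsArg<'_>, dangerous: &[(&str, &[&str])]) -> bool {
    match opts {
        OptsArg::Missing => false, // library default is safe
        OptsArg::Dynamic => true,  // cannot prove safety: activate
        OptsArg::Literal(pairs) => pairs.iter().any(|(k, v)| {
            dangerous
                .iter()
                .any(|(dk, dvs)| dk == k && dvs.iter().any(|dv| dv == v))
        }),
    }
}
```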
Only the + // template *source* (arg 0) lets an attacker drive template execution; + // the `ctx` data object (arg 1) is rendered via the template's escape + // policy and is not itself a code-injection vector. Gate via + // Destination-style activation with `payload_args: &[0]` so taint + // flowing only into `ctx` is suppressed. + SinkGate { + callee_matcher: "nunjucks.renderString", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: false, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // ── Prototype pollution gates ──────────────────────────────────────── + // + // Library-mediated recursive merge / deep-assign helpers. Argument- + // role gating: `(target, src1, src2, ...)` — only taint reaching a + // *source* position (index 1+) can pollute `Object.prototype` via + // `__proto__` / `constructor` keys on attacker-controlled input. + // Tainted target alone is benign (it just mutates that object). + // `payload_args: &[1, 2, 3, 4, 5]` covers the canonical 1-target + + // up-to-5-source signatures used by lodash / Object.assign / jQuery + // extend; arity beyond 5 is rare in practice and would over-suppress + // only at the long tail. 
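Destination activation with `payload_args` reduces to a membership test over the tainted argument positions. A minimal sketch (the name `destination_gate_fires` is hypothetical), using the merge-family and Velocity payload lists from this diff:

```rust
/// Payload positions for the lodash merge family (target is arg 0).
const MERGE_PAYLOAD: &[usize] = &[1, 2, 3, 4, 5];
/// Payload position for `Velocity.evaluate(ctx, writer, tag, src)`.
const VELOCITY_PAYLOAD: &[usize] = &[3];

/// True when at least one tainted argument sits at a payload position.
/// Taint that reaches only non-payload positions (e.g. the merge target)
/// does not activate the sink.
fn destination_gate_fires(tainted_args: &[usize], payload_args: &[usize]) -> bool {
    tainted_args.iter().any(|i| payload_args.contains(i))
}
```

The same test also shows the long-tail trade-off noted above: a tainted source at index 6 is outside `MERGE_PAYLOAD` and is missed.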
+ SinkGate { + callee_matcher: "_.merge", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "_.mergeWith", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "_.defaultsDeep", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // `_.set(obj, path, value)` — both `path` (arg 1) and `value` (arg 2) + // can drive prototype pollution: a tainted path of `__proto__.foo` + // mutates `Object.prototype`, and a tainted value into `obj.__proto__` + // does the same. Object (arg 0) is the canonical target. 
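A common mitigation for the `_.set` path vector is to refuse any path segment that walks into the prototype chain. This sketch handles dotted paths only (lodash also accepts bracket and array paths), and blocking `constructor` and `prototype` alongside `__proto__` is a conservative choice, not a description of any particular library's fix. `path_is_pollution_safe` is a hypothetical name.

```rust
/// Segments that can reach `Object.prototype` when used in a set-by-path.
const FORBIDDEN_SEGMENTS: [&str; 3] = ["__proto__", "constructor", "prototype"];

/// True when no segment of a dotted path such as `a.b.c` is one of the
/// prototype-reaching keys.
fn path_is_pollution_safe(path: &str) -> bool {
    path.split('.').all(|seg| !FORBIDDEN_SEGMENTS.contains(&seg))
}
```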
+ SinkGate { + callee_matcher: "_.set", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "_.setWith", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // Generic project-local deep-merge helpers. Suffix-matched so any + // `*.deepMerge` / `*.defaultsDeep` qualified call also resolves. + SinkGate { + callee_matcher: "deepMerge", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "defaultsDeep", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // `Object.assign(target, ...sources)` is safe with constant-literal + // sources (`{a: 1, b: 2}`) but dangerous with attacker-controlled + // input (`req.body`). Gate target out of payload_args so tainted- + // target alone does not fire. 
+ SinkGate { + callee_matcher: "Object.assign", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // jQuery / Zepto `$.extend(target, ...sources)` and `jQuery.extend`. + // Arg 0 may be a deep-flag boolean (`true`) when the deep-merge form + // is in use, in which case sources start at arg 2. Cover both + // shapes by listing args 1 through 4 in `payload_args`. The flag at + // arg 0 is excluded; in the deep form arg 1 is the target, so a + // tainted target alone can still activate there (an accepted + // over-approximation). For the shallow `$.extend(target, src)` form, + // src at arg 1 still fires. + SinkGate { + callee_matcher: "$.extend", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[1, 2, 3, 4], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "jQuery.extend", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[1, 2, 3, 4], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // Bare `extend` (suffix-matched) for jQuery's deep form imported as a + // bound name: `const { extend } = require('jquery'); extend(true, t, s)`. + // Suffix `extend` would over-fire on Backbone's `Model.extend(proto)` / + // `View.extend({...})` class-extension idiom, so this gate uses + // `LiteralOnly` activation: it fires only when arg 0 is the literal + // boolean `true` (the deep-flag form, never used by Backbone subclassing).
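The difference between `ValueMatch` and the new `LiteralOnly` activation comes down to what happens when the activation argument is not a literal: `ValueMatch` fires conservatively, `LiteralOnly` suppresses. A sketch with hypothetical `Activation` / `ActivationArg` types that ignores `dangerous_prefixes`:

```rust
/// Hypothetical view of the activation argument at a call site.
enum ActivationArg<'a> {
    Literal(&'a str),
    Dynamic,
}

/// Hypothetical mirror of the two literal-driven activation modes.
enum Activation {
    /// Matching literal fires; a dynamic arg fires conservatively.
    ValueMatch,
    /// Only a matching literal fires; a dynamic arg suppresses.
    LiteralOnly,
}

fn activates(mode: &Activation, arg: &ActivationArg<'_>, dangerous_values: &[&str]) -> bool {
    match arg {
        ActivationArg::Literal(v) => dangerous_values.iter().any(|d| d == v),
        ActivationArg::Dynamic => matches!(mode, Activation::ValueMatch),
    }
}
```

For bare `extend`, `extend(true, t, s)` activates, while Backbone's `Model.extend({...})` (an object literal at arg 0) and `extend(flag, t, s)` with an unknown `flag` both stay silent.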
+ // Sources start at arg 2 because arg 0 is the flag and arg 1 is the + // target; tainting the target alone is benign. + SinkGate { + callee_matcher: "extend", + arg_index: 0, + dangerous_values: &["true"], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::LiteralOnly, + }, + // `set-value` standalone helper: `setValue(obj, key, val)` — historic + // CVE-2019-10747 (set-value <2.0.1) and CVE-2021-23440 (set-value <4.0.1) + // recursive set-by-path helper that did not block `__proto__` keys. + // Suffix-matched so qualified imports (`require('set-value')`) bound to + // `setValue` still resolve. + SinkGate { + callee_matcher: "setValue", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[1, 2], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // `dot-prop` standalone helper: `dotProp.set(obj, path, val)` — + // CVE-2020-8116. Path is a dotted-string with prototype-key support; + // a tainted `path` of `__proto__.x` mutates Object.prototype. + SinkGate { + callee_matcher: "dotProp.set", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[1, 2], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // `JSONPath` / `jsonpath-plus` `JSONPath({path: p, json: o, callback: fn})` + // historically supported a `resultType: 'value'` mode that, combined with + // `parent`/`parentProperty` writes inside the callback, can mutate the + // prototype chain. 
Recognise the `jp.set(obj, path, value)` family + // (jsonpath, jsonpath-plus) on the same shape as `_.set`. + SinkGate { + callee_matcher: "jp.set", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[1, 2], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "jsonpath.set", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, ]; pub static KINDS: Map<&'static str, Kind> = phf_map! { diff --git a/src/labels/mod.rs b/src/labels/mod.rs index a28b06b5..90066fd5 100644 --- a/src/labels/mod.rs +++ b/src/labels/mod.rs @@ -66,6 +66,17 @@ pub enum GateActivation { /// selects which attribute is being set) and `parseFromString` (activation /// arg selects the MIME type). ValueMatch, + /// Strict literal-value activation. The gate fires only when the + /// activation arg is a literal that matches `dangerous_values` / + /// `dangerous_prefixes`. Unknown/dynamic activation arg suppresses + /// (no conservative ALL_ARGS_PAYLOAD push). + /// + /// Used for ambiguously-named matchers where the dangerous shape is + /// only identifiable by an explicit literal flag — e.g. bare `extend` + /// where the deep-merge form is `extend(true, target, src)` but + /// Backbone's `Model.extend({proto})` shares the suffix. Conservative + /// fallback would over-fire on the class-extension form. + LiteralOnly, /// Destination-bearing flow activation. The gate fires when taint reaches /// a declared destination location at the call site, no literal /// inspection, no prefix heuristic. @@ -156,53 +167,83 @@ bitflags! 
{ /// In practice: a finding fires when a tainted value reaches a sink and /// `(value_caps & sink_caps) != 0`. #[derive(Debug, Clone, Copy, PartialEq, Eq)] - pub struct Cap: u16 { + pub struct Cap: u32 { /// Taint that originated from an environment variable read. /// Used as a source-origin marker for env-injection rules. - const ENV_VAR = 0b0000_0000_0000_0001; // bit 0 + const ENV_VAR = 1 << 0; /// Sanitizer: the value has passed through HTML entity escaping. /// Strips XSS risk from values that reach HTML output sinks. - const HTML_ESCAPE = 0b0000_0000_0000_0010; // bit 1 + const HTML_ESCAPE = 1 << 1; /// Sanitizer: the value has been shell-argument escaped. /// Strips command-injection risk before shell sinks. - const SHELL_ESCAPE = 0b0000_0000_0000_0100; // bit 2 + const SHELL_ESCAPE = 1 << 2; /// Sanitizer: the value has been percent-encoded for use in a URL. - const URL_ENCODE = 0b0000_0000_0000_1000; // bit 3 + const URL_ENCODE = 1 << 3; /// Sanitizer: the value was parsed through a structured JSON decoder /// (as opposed to `eval`-based or regex parsing). - const JSON_PARSE = 0b0000_0000_0001_0000; // bit 4 + const JSON_PARSE = 1 << 4; /// Sink: file system read or write operation (path traversal, arbitrary /// file read/write). - const FILE_IO = 0b0000_0000_0010_0000; // bit 5 + const FILE_IO = 1 << 5; /// Sink: format string injection (e.g. `printf`-family, `String.format`). - const FMT_STRING = 0b0000_0000_0100_0000; // bit 6 + const FMT_STRING = 1 << 6; /// Sink: SQL query construction. Fires for string-concatenated queries /// and parameterized-query builders where the query text itself is tainted. - const SQL_QUERY = 0b0000_0000_1000_0000; // bit 7 + const SQL_QUERY = 1 << 7; /// Sink: unsafe object deserialization (Java `ObjectInputStream`, /// Python `pickle`, Ruby `Marshal`, PHP `unserialize`, etc.). - const DESERIALIZE = 0b0000_0001_0000_0000; // bit 8 + const DESERIALIZE = 1 << 8; /// Sink: server-side request forgery. 
Fires when attacker-controlled /// data reaches the destination URL of an outbound HTTP request. - const SSRF = 0b0000_0010_0000_0000; // bit 9 + const SSRF = 1 << 9; /// Sink: code or command execution (shell injection, `eval`, `exec`, /// dynamic `require`/`import`, template injection). - const CODE_EXEC = 0b0000_0100_0000_0000; // bit 10 + const CODE_EXEC = 1 << 10; /// Sink: cryptographic operation with a tainted algorithm name or seed /// (weak-crypto / predictable-randomness patterns). - const CRYPTO = 0b0000_1000_0000_0000; // bit 11 + const CRYPTO = 1 << 11; /// Request-bound, caller-supplied identifier that has not yet been /// validated against an ownership/membership check. Used as the /// carrier cap for folding `auth_analysis` into the SSA/taint /// engine. - const UNAUTHORIZED_ID = 0b0001_0000_0000_0000; // bit 12 + const UNAUTHORIZED_ID = 1 << 12; /// Cross-boundary data-exfiltration: tainted sensitive data flowing /// into outbound request bodies, headers, or other payload-bearing /// fields of network egress APIs. Distinct from `SSRF` (attacker /// control over the destination URL), `DATA_EXFIL` fires when the /// destination is fixed but attacker-influenced data leaves the /// process via the request payload. - const DATA_EXFIL = 0b0010_0000_0000_0000; // bit 13 + const DATA_EXFIL = 1 << 13; + /// Sink: LDAP search/query construction. Fires when attacker-controlled + /// data reaches a directory-service filter or DN argument without + /// LDAP escaping (RFC 4515 for filters, RFC 4514 for DNs). + const LDAP_INJECTION = 1 << 14; + /// Sink: XPath expression construction. Fires when attacker-controlled + /// data is concatenated into an XPath query rather than passed via + /// XPath variable bindings. + const XPATH_INJECTION = 1 << 15; + /// Sink: HTTP response header value (or any CRLF-sensitive output). + /// Fires when attacker-controlled data lands in a `setHeader` / + /// header-add call without `\r\n` stripping (response splitting).
+ const HEADER_INJECTION = 1 << 16; + /// Sink: redirect / `Location` header destination. Fires when an + /// attacker-controlled URL reaches a redirect call without an + /// allowlist or relative-URL check. + const OPEN_REDIRECT = 1 << 17; + /// Sink: server-side template injection. Fires when the **template + /// source string** itself is attacker-controlled (e.g. + /// `Template(user_input).render()`), distinct from rendering a + /// trusted template with tainted variables. + const SSTI = 1 << 18; + /// Sink: XML external entity resolution. Fires when attacker-controlled + /// XML reaches a parser configured to resolve external entities (or + /// missing the secure-processing feature). + const XXE = 1 << 19; + /// Sink: prototype pollution. Fires when an attacker-controlled key + /// reaches an object property assignment that can mutate + /// `Object.prototype` (`__proto__`, `constructor.prototype`, deep-merge + /// helpers). + const PROTOTYPE_POLLUTION = 1 << 20; } } @@ -214,14 +255,18 @@ impl Default for Cap { impl serde::Serialize for Cap { fn serialize(&self, s: S) -> Result { - s.serialize_u16(self.bits()) + s.serialize_u32(self.bits()) } } impl<'de> serde::Deserialize<'de> for Cap { fn deserialize>(d: D) -> Result { - let bits = u16::deserialize(d)?; - Ok(Cap::from_bits_truncate(bits)) + // Accept any unsigned integer width (existing JSON written with the + // u16 representation must continue to deserialise into the widened + // u32 cap field). serde-json hands these through `deserialize_u64`; + // the truncating cast preserves all currently-defined cap bits. 
+ let bits = u64::deserialize(d)?; + Ok(Cap::from_bits_truncate(bits as u32)) } } @@ -370,16 +415,46 @@ static GATED_REGISTRY: Lazy> = Lazy:: m.insert("js", javascript::GATED_SINKS); m.insert("typescript", typescript::GATED_SINKS); m.insert("ts", typescript::GATED_SINKS); - m.insert("python", python::GATED_SINKS); - m.insert("py", python::GATED_SINKS); + + // Python prototype-pollution gates are opt-in: `dict.update(target, + // src)` overlaps too broadly with non-pollution use of `update` + // (Counter, namespaced state mutation) to ship as a default sink. + // The `NYX_PYTHON_PROTO_POLLUTION` env var enables them; when set + // the merged slice is leaked into a `'static` reference so the + // registry's lifetime invariant holds. + let python_gates: &'static [SinkGate] = if env_python_proto_pollution() { + let mut combined: Vec = python::GATED_SINKS.to_vec(); + combined.extend_from_slice(python::PROTO_POLLUTION_GATES); + Box::leak(combined.into_boxed_slice()) + } else { + python::GATED_SINKS + }; + m.insert("python", python_gates); + m.insert("py", python_gates); + m.insert("go", go::GATED_SINKS); m.insert("php", php::GATED_SINKS); m.insert("c", c::GATED_SINKS); m.insert("cpp", cpp::GATED_SINKS); m.insert("c++", cpp::GATED_SINKS); + m.insert("ruby", ruby::GATED_SINKS); + m.insert("rb", ruby::GATED_SINKS); + m.insert("java", java::GATED_SINKS); + m.insert("rust", rust::GATED_SINKS); + m.insert("rs", rust::GATED_SINKS); m }); +/// Feature flag for the Python prototype-pollution gates. Disabled by +/// default; set `NYX_PYTHON_PROTO_POLLUTION=1` (or `true`) to enable +/// `dict.update` / `__dict__.update` proto-pollution detection. +fn env_python_proto_pollution() -> bool { + matches!( + std::env::var("NYX_PYTHON_PROTO_POLLUTION").ok().as_deref(), + Some("1") | Some("true") | Some("TRUE") | Some("yes") | Some("on") + ) +} + /// Per-language exclusion patterns: callee text that must never be classified. 
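The widened decoder's compatibility claim can be checked without serde. This is a minimal sketch that assumes the 21 cap bits declared in this diff (bit 0 `ENV_VAR` through bit 20 `PROTOTYPE_POLLUTION`). `KNOWN_CAP_BITS` and `decode_cap_bits` are hypothetical names. The bitmask stands in for what `Cap::from_bits_truncate` does after the `as u32` cast.

```rust
/// Hypothetical mask covering bits 0..=20, i.e. every cap defined after
/// the u16 -> u32 widening (ENV_VAR = 1 << 0 ... PROTOTYPE_POLLUTION = 1 << 20).
const KNOWN_CAP_BITS: u32 = (1 << 21) - 1;

/// Mirrors the decoder's shape: the stored integer is read as u64, so both
/// u16-era and u32-era values parse. It is then narrowed with `as u32`,
/// which keeps only the low 32 bits. Finally, every bit without a defined
/// cap is masked off, the way `from_bits_truncate` drops unknown bits.
fn decode_cap_bits(stored: u64) -> u32 {
    (stored as u32) & KNOWN_CAP_BITS
}
```

A value written by an old u16 cache, such as `SQL_QUERY | CODE_EXEC` (`(1 << 7) | (1 << 10)`), decodes to the same bits. A bit with no defined cap is dropped silently rather than rejected. That leniency is why a stale cache still loads.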
static EXCLUDES: Lazy> = Lazy::new(|| { let mut m = HashMap::new(); @@ -725,6 +800,13 @@ pub fn parse_cap(s: &str) -> Option { "crypto" => Some(Cap::CRYPTO), "unauthorized_id" => Some(Cap::UNAUTHORIZED_ID), "data_exfil" | "data_exfiltration" => Some(Cap::DATA_EXFIL), + "ldap_injection" | "ldapi" => Some(Cap::LDAP_INJECTION), + "xpath_injection" | "xpathi" => Some(Cap::XPATH_INJECTION), + "header_injection" | "crlf" | "response_splitting" => Some(Cap::HEADER_INJECTION), + "open_redirect" | "redirect" => Some(Cap::OPEN_REDIRECT), + "ssti" | "template_injection" => Some(Cap::SSTI), + "xxe" => Some(Cap::XXE), + "prototype_pollution" | "proto_pollution" => Some(Cap::PROTOTYPE_POLLUTION), "all" => Some(Cap::all()), _ => None, } @@ -1274,7 +1356,15 @@ pub fn classify_gated_sink( // where `userAttr` is user-controlled) is itself a vulnerability // path. Return ALL_ARGS_PAYLOAD so downstream sink scanning // considers every positional argument. + // + // `LiteralOnly` opts out of this conservative branch: the gate + // requires positive literal evidence to fire, so unknown + // activation suppresses entirely (avoids false positives on + // ambiguously-named suffix matchers like bare `extend`). 
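The difference between the conservative fallback and `LiteralOnly` comes down to what happens when the activation argument is not a literal. Below is a minimal sketch under simplifying assumptions. `Activation`, `ActivationArg`, `Gate`, and `gate_fires` are hypothetical, pared-down stand-ins for the real `SinkGate` / `GateActivation` types. Prefix matching and `Destination` activation are not modelled. Where the sketch returns `true` for an unknown arg, the real code pushes a match with `ALL_ARGS_PAYLOAD`.

```rust
/// Hypothetical stand-in for `GateActivation`; only two variants modelled.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Activation {
    /// Fires on a matching literal; an unknown arg falls back to firing.
    ValueMatch,
    /// Fires only on a matching literal; an unknown arg suppresses.
    LiteralOnly,
}

/// What the classifier could recover about the activation argument.
enum ActivationArg<'a> {
    Literal(&'a str),
    /// Identifier, call result, object literal, or other dynamic shape.
    Unknown,
}

/// Hypothetical stand-in for `SinkGate`, reduced to the fields used here.
struct Gate {
    dangerous_values: &'static [&'static str],
    activation: Activation,
}

/// Returns `true` when the gate should emit a sink match.
fn gate_fires(gate: &Gate, arg: &ActivationArg) -> bool {
    match arg {
        // A literal must match one of the declared dangerous values.
        ActivationArg::Literal(v) => gate.dangerous_values.iter().any(|d| d == v),
        // Unknown activation: conservative gates still fire, LiteralOnly does not.
        ActivationArg::Unknown => gate.activation != Activation::LiteralOnly,
    }
}
```

With this model, a bare-`extend` gate (`dangerous_values: &["true"]`, `LiteralOnly`) fires for `extend(true, t, s)`. It stays silent for Backbone's `View.extend({...})`, whose object-literal arg 0 classifies as unknown. A `ValueMatch` gate such as `curl_setopt` still fires when its option argument is a dynamic expression.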
None => { + if matches!(gate.activation, GateActivation::LiteralOnly) { + continue; + } out.push(GateMatch { label: gate.label, payload_args: ALL_ARGS_PAYLOAD, @@ -1396,10 +1486,283 @@ pub fn cap_to_name(cap: Cap) -> &'static str { Cap::CODE_EXEC => "code_exec", Cap::CRYPTO => "crypto", Cap::UNAUTHORIZED_ID => "unauthorized_id", + Cap::DATA_EXFIL => "data_exfil", + Cap::LDAP_INJECTION => "ldap_injection", + Cap::XPATH_INJECTION => "xpath_injection", + Cap::HEADER_INJECTION => "header_injection", + Cap::OPEN_REDIRECT => "open_redirect", + Cap::SSTI => "ssti", + Cap::XXE => "xxe", + Cap::PROTOTYPE_POLLUTION => "prototype_pollution", _ => "unknown", } } +// ── Cap rule registry ──────────────────────────────────────────────────── +// +// Static, single-source-of-truth metadata table keyed by [`Cap`]. Every +// vulnerability class with its own canonical rule id appears here; the +// per-language `RULES` arrays only carry the language-specific match shapes. +// Sink-cap fields on a finding (or `Cap::DATA_EXFIL` carried alongside) feed +// `cap_rule_meta()` to pick the rule id surfaced to SARIF, the dashboard, +// and `enumerate_builtin_rules()` for `nyx rules list`. + +/// Static metadata for one cap-defined vulnerability class. +#[derive(Debug, Clone, Copy)] +pub struct CapRuleMeta { + pub cap: Cap, + /// Canonical rule id surfaced by finding emission (no source-suffix). + pub rule_id: &'static str, + /// Display title for `nyx rules list` and dashboard. + pub title: &'static str, + pub severity: crate::patterns::Severity, + /// OWASP 2021 code (e.g. `"A03"`). + pub owasp_code: &'static str, + /// OWASP 2021 long label (e.g. `"Injection"`). + pub owasp_label: &'static str, + pub description: &'static str, + /// `false` only for caps gated behind a config flag (e.g. + /// `Cap::UNAUTHORIZED_ID`, which still defers to the standalone + /// `auth_analysis` subsystem unless `enable_auth_as_taint` is on). 
+ pub default_enabled: bool, + /// Whether the diag-id emission path in `ast.rs` actually surfaces + /// findings under [`Self::rule_id`]. When `false`, sink findings + /// for this cap currently surface under the legacy + /// `taint-unsanitised-flow` id (the per-language family-token + /// dispatch in [`crate::server::owasp::owasp_bucket_for`] still + /// buckets them correctly). Dashboards and `nyx rules list` consume + /// this flag to decide whether to surface the synthetic class entry + /// alongside live findings or hide it as forward-declared. + /// + /// Migrating a cap from `false` → `true` requires adding it to the + /// cap-specific routing list in `ast.rs::diag_for_finding`; tests + /// that pin the legacy `taint-unsanitised-flow` rule id for that + /// cap must be updated to the cap-specific id. + pub emission_active: bool, +} + +/// Registry of cap-class metadata. Keyed in cap-bit order so additions +/// stay clustered with their bitflag declarations. +pub static CAP_RULE_REGISTRY: &[CapRuleMeta] = &[ + CapRuleMeta { + cap: Cap::FILE_IO, + rule_id: "taint-path-traversal", + title: "Path Traversal / Arbitrary File Access", + severity: crate::patterns::Severity::High, + owasp_code: "A01", + owasp_label: "Broken Access Control", + description: "Attacker-controlled data flows into a filesystem path without canonicalisation \ + or root-confinement, allowing reads or writes outside the intended directory.", + default_enabled: true, + emission_active: false, + }, + CapRuleMeta { + cap: Cap::FMT_STRING, + rule_id: "taint-format-string", + title: "Format String Injection", + severity: crate::patterns::Severity::High, + owasp_code: "A03", + owasp_label: "Injection", + description: "Attacker-controlled data is used as a format string argument (printf-family, \ + String.format) and can leak memory or crash the process.", + default_enabled: true, + emission_active: false, + }, + CapRuleMeta { + cap: Cap::SQL_QUERY, + rule_id: "taint-sql-injection", + title: "SQL 
Injection", + severity: crate::patterns::Severity::High, + owasp_code: "A03", + owasp_label: "Injection", + description: "Attacker-controlled data is concatenated into a SQL query string instead of \ + being bound through a parameterised statement.", + default_enabled: true, + emission_active: false, + }, + CapRuleMeta { + cap: Cap::DESERIALIZE, + rule_id: "taint-deserialization", + title: "Unsafe Deserialization", + severity: crate::patterns::Severity::High, + owasp_code: "A08", + owasp_label: "Software and Data Integrity Failures", + description: "Attacker-controlled bytes are fed to an unsafe object deserialiser \ + (pickle, ObjectInputStream, Marshal, unserialize) enabling arbitrary code \ + execution via crafted payloads.", + default_enabled: true, + emission_active: false, + }, + CapRuleMeta { + cap: Cap::SSRF, + rule_id: "taint-ssrf", + title: "Server-Side Request Forgery", + severity: crate::patterns::Severity::High, + owasp_code: "A10", + owasp_label: "Server-Side Request Forgery", + description: "Attacker-controlled URL reaches the destination of an outbound HTTP request \ + without an allowlist or scheme/host restriction.", + default_enabled: true, + emission_active: false, + }, + CapRuleMeta { + cap: Cap::CODE_EXEC, + rule_id: "taint-code-execution", + title: "Code / Command Execution", + severity: crate::patterns::Severity::High, + owasp_code: "A03", + owasp_label: "Injection", + description: "Attacker-controlled data reaches an `eval`/`exec`/shell sink, dynamic \ + require/import, or other arbitrary-code construct.", + default_enabled: true, + emission_active: false, + }, + CapRuleMeta { + cap: Cap::CRYPTO, + rule_id: "taint-crypto-misuse", + title: "Tainted Cryptographic Parameter", + severity: crate::patterns::Severity::Medium, + owasp_code: "A02", + owasp_label: "Cryptographic Failures", + description: "Attacker-controlled data drives the algorithm name, key, or seed of a \ + cryptographic primitive (weak-crypto / predictable-randomness).", + 
default_enabled: true, + emission_active: false, + }, + CapRuleMeta { + cap: Cap::UNAUTHORIZED_ID, + rule_id: "rs.auth.missing_ownership_check.taint", + title: "Missing Ownership Check (taint variant)", + severity: crate::patterns::Severity::High, + owasp_code: "A01", + owasp_label: "Broken Access Control", + description: "Request-bound identifier reaches a privileged sink without an intervening \ + ownership/membership check. Companion to the standalone `auth_analysis` \ + rule; gated by `scanner.enable_auth_as_taint`.", + default_enabled: false, + emission_active: true, + }, + CapRuleMeta { + cap: Cap::DATA_EXFIL, + rule_id: "taint-data-exfiltration", + title: "Sensitive Data Exfiltration", + severity: crate::patterns::Severity::High, + owasp_code: "A04", + owasp_label: "Insecure Design", + description: "Sensitive data (cookies, headers, env, db rows, files) flows into the body, \ + headers, or other payload field of an outbound network request to a fixed \ + destination.", + default_enabled: true, + emission_active: true, + }, + // ── Cap-specific rule ids ──────────────────────────────────────────── + CapRuleMeta { + cap: Cap::LDAP_INJECTION, + rule_id: "taint-ldap-injection", + title: "LDAP Injection", + severity: crate::patterns::Severity::High, + owasp_code: "A03", + owasp_label: "Injection", + description: "Attacker-controlled data is concatenated into an LDAP filter or DN without \ + RFC 4515 escaping, letting the attacker rewrite the directory query.", + default_enabled: true, + emission_active: true, + }, + CapRuleMeta { + cap: Cap::XPATH_INJECTION, + rule_id: "taint-xpath-injection", + title: "XPath Injection", + severity: crate::patterns::Severity::High, + owasp_code: "A03", + owasp_label: "Injection", + description: "Attacker-controlled data is concatenated into an XPath expression instead of \ + passed through XPath variable bindings, letting the attacker rewrite the \ + query.", + default_enabled: true, + emission_active: true, + }, + CapRuleMeta { 
+ cap: Cap::HEADER_INJECTION, + rule_id: "taint-header-injection", + title: "HTTP Header / Response Splitting", + severity: crate::patterns::Severity::High, + owasp_code: "A03", + owasp_label: "Injection", + description: "Attacker-controlled data lands in an HTTP response header without `\\r\\n` \ + stripping, enabling response splitting and cache-poisoning attacks.", + default_enabled: true, + emission_active: true, + }, + CapRuleMeta { + cap: Cap::OPEN_REDIRECT, + rule_id: "taint-open-redirect", + title: "Open Redirect", + severity: crate::patterns::Severity::Medium, + owasp_code: "A01", + owasp_label: "Broken Access Control", + description: "Attacker-controlled URL drives a redirect / `Location` header without an \ + allowlist or relative-URL check, enabling phishing pivots.", + default_enabled: true, + emission_active: true, + }, + CapRuleMeta { + cap: Cap::SSTI, + rule_id: "taint-template-injection", + title: "Server-Side Template Injection", + severity: crate::patterns::Severity::High, + owasp_code: "A03", + owasp_label: "Injection", + description: "Attacker controls the template *source string* (not just template variables) \ + passed to a server-side renderer (Jinja2, Twig, Handlebars, ERB), enabling \ + arbitrary expression evaluation.", + default_enabled: true, + emission_active: true, + }, + CapRuleMeta { + cap: Cap::XXE, + rule_id: "taint-xxe", + title: "XML External Entity Resolution", + severity: crate::patterns::Severity::High, + owasp_code: "A05", + owasp_label: "Security Misconfiguration", + description: "Attacker-controlled XML reaches a parser configured to resolve external \ + entities (or missing the secure-processing feature), enabling SSRF, file \ + read, and DoS.", + default_enabled: true, + emission_active: true, + }, + CapRuleMeta { + cap: Cap::PROTOTYPE_POLLUTION, + rule_id: "taint-prototype-pollution", + title: "Prototype Pollution", + severity: crate::patterns::Severity::High, + owasp_code: "A05", + owasp_label: "Security 
Misconfiguration", + description: "Attacker-controlled key reaches an object property assignment that can mutate \ + `Object.prototype` (deep-merge / `__proto__` / dynamic subscript).", + default_enabled: true, + emission_active: true, + }, +]; + +/// Resolve a cap to its canonical rule metadata. Returns `None` for caps +/// without a rule-emission role (origin / sanitizer markers like +/// [`Cap::ENV_VAR`], [`Cap::HTML_ESCAPE`]). +pub fn cap_rule_meta(cap: Cap) -> Option<&'static CapRuleMeta> { + CAP_RULE_REGISTRY.iter().find(|m| m.cap == cap) +} + +/// Resolve any subset of `effective_caps` to a single rule id. When +/// multiple bits are set, picks the first registry entry that intersects +/// (registry order is bit-position). Returns `None` when no bit in the +/// set has a registered rule id. +pub fn rule_id_for_caps(effective_caps: Cap) -> Option<&'static str> { + CAP_RULE_REGISTRY + .iter() + .find(|m| effective_caps.contains(m.cap)) + .map(|m| m.rule_id) +} + /// Generate a stable rule ID from language, kind, and matchers. pub fn rule_id(lang: &str, kind: &str, matchers: &[&str]) -> String { let mut sorted: Vec<&str> = matchers.to_vec(); @@ -1418,11 +1781,25 @@ pub struct RuleInfo { pub language: String, pub kind: String, pub cap: String, - pub cap_bits: u16, + pub cap_bits: u32, pub matchers: Vec, pub case_sensitive: bool, pub is_custom: bool, pub is_gated: bool, + /// Cap-class registry entry (one per `Cap` with a canonical rule id), + /// distinct from per-language sink/source/sanitizer match rules. The + /// dashboard groups these separately so the rules surface does not mix + /// "the LDAP injection class exists" with "Java's `DirContext.search` + /// is a sink for that class". + pub is_class: bool, + /// For class entries (`is_class == true`), whether the diag-id + /// emission path in `ast.rs` actually surfaces findings under + /// [`Self::id`]. 
When `false`, the class is registered but live + /// findings still emerge under the legacy `taint-unsanitised-flow` + /// rule id; dashboards can use this flag to suppress the synthetic + /// entry until the cap is migrated to its specific rule id. + /// Always `true` for non-class label rules. + pub emission_active: bool, pub enabled: bool, } @@ -1430,6 +1807,27 @@ pub struct RuleInfo { pub fn enumerate_builtin_rules() -> Vec { let mut out = Vec::new(); + // Cap-class entries (one per registered vulnerability class). Kind + // `class` so dashboards can distinguish them from per-language + // sink/source/sanitizer entries. + for meta in CAP_RULE_REGISTRY { + out.push(RuleInfo { + id: meta.rule_id.to_string(), + title: meta.title.to_string(), + language: "all".to_string(), + kind: "class".to_string(), + cap: cap_to_name(meta.cap).to_string(), + cap_bits: meta.cap.bits(), + matchers: Vec::new(), + case_sensitive: false, + is_custom: false, + is_gated: false, + is_class: true, + emission_active: meta.emission_active, + enabled: meta.default_enabled, + }); + } + for &lang in CANONICAL_LANGS { if let Some(rules) = REGISTRY.get(lang) { for rule in *rules { @@ -1453,6 +1851,8 @@ pub fn enumerate_builtin_rules() -> Vec { case_sensitive: rule.case_sensitive, is_custom: false, is_gated: false, + is_class: false, + emission_active: true, enabled: true, }); } @@ -1479,6 +1879,8 @@ pub fn enumerate_builtin_rules() -> Vec { case_sensitive: gate.case_sensitive, is_custom: false, is_gated: true, + is_class: false, + emission_active: true, enabled: true, }); } @@ -1498,6 +1900,65 @@ pub fn custom_rule_id(lang: &str, kind: &str, matchers: &[String]) -> String { mod tests { use super::*; + /// Pin the current set of caps whose `rule_id` is reachable via the + /// diag-id routing in `ast.rs::diag_for_finding`. When migrating a + /// legacy cap (e.g. 
SQL_QUERY → `taint-sql-injection`), update both + /// `ast.rs` (add the cap to the cap-specific routing list) and the + /// `emission_active: true` flag in `CAP_RULE_REGISTRY`, then update + /// this assertion. The split exists because legacy taint findings + /// historically all surfaced under the generic `taint-unsanitised-flow` + /// rule id; the seven cap-specific routes (LDAP / XPath / header / + /// open redirect / SSTI / XXE / prototype pollution) plus + /// `unauthorized_id` and `data_exfil` are the only ones wired through. + #[test] + fn cap_rule_registry_emission_active_set_is_pinned() { + let active: Vec = CAP_RULE_REGISTRY + .iter() + .filter(|m| m.emission_active) + .map(|m| m.cap) + .collect(); + let expected = [ + Cap::UNAUTHORIZED_ID, + Cap::DATA_EXFIL, + Cap::LDAP_INJECTION, + Cap::XPATH_INJECTION, + Cap::HEADER_INJECTION, + Cap::OPEN_REDIRECT, + Cap::SSTI, + Cap::XXE, + Cap::PROTOTYPE_POLLUTION, + ]; + for c in expected { + assert!( + active.contains(&c), + "cap {:?} expected to be emission_active in CAP_RULE_REGISTRY", + c + ); + } + let inactive: Vec = CAP_RULE_REGISTRY + .iter() + .filter(|m| !m.emission_active) + .map(|m| m.cap) + .collect(); + let expected_inactive = [ + Cap::FILE_IO, + Cap::FMT_STRING, + Cap::SQL_QUERY, + Cap::DESERIALIZE, + Cap::SSRF, + Cap::CODE_EXEC, + Cap::CRYPTO, + ]; + for c in expected_inactive { + assert!( + inactive.contains(&c), + "cap {:?} expected to be emission_inactive in CAP_RULE_REGISTRY (legacy \ + finding still emits as taint-unsanitised-flow)", + c + ); + } + } + #[test] fn receiver_validator_python_relative_to() { // Bare method name fires. @@ -1781,6 +2242,33 @@ mod tests { // from `File.open` / `IO.open` / `URI.open`, each of which has its // own non-piping semantics. Without the sigil, the suffix-with- // boundary matcher would over-fire on every `X.open` call. 
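`rule_id_for_caps` resolves a multi-cap finding by scanning `CAP_RULE_REGISTRY` in order and taking the first entry whose cap is fully contained in the finding's caps. Because the table is kept in bit order, the lowest registered bit wins. Below is a minimal sketch of that lookup over a hypothetical four-entry subset of the registry, using raw `u32` bits in place of `Cap`.

```rust
/// Hypothetical miniature of `CAP_RULE_REGISTRY`: (cap bit, rule id) pairs
/// listed in ascending bit order, as the real table is.
const REGISTRY: &[(u32, &str)] = &[
    (1 << 7, "taint-sql-injection"),
    (1 << 9, "taint-ssrf"),
    (1 << 14, "taint-ldap-injection"),
    (1 << 17, "taint-open-redirect"),
];

/// Returns the rule id of the first registry entry whose bit is set in
/// `effective` (the same test as `Cap::contains` for single-bit caps),
/// or `None` if no set bit has a registered rule id.
fn rule_id_for_caps(effective: u32) -> Option<&'static str> {
    REGISTRY
        .iter()
        .find(|(cap, _)| effective & cap == *cap)
        .map(|(_, id)| *id)
}
```

For example, a finding carrying both `LDAP_INJECTION` (bit 14) and `OPEN_REDIRECT` (bit 17) resolves to `taint-ldap-injection`. A sanitizer-only bit such as `HTML_ESCAPE` (bit 1) resolves to `None`.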
+ #[test] + fn classify_javascript_set_value_is_proto_pollution_gate() { + let no_kw = |_: &str| None; + let no_kw_present = |_: &str| false; + let result = classify_gated_sink("javascript", "setValue", |_| None, no_kw, no_kw_present); + assert!( + result + .iter() + .any(|m| m.label == DataLabel::Sink(Cap::PROTOTYPE_POLLUTION)), + "expected PROTOTYPE_POLLUTION gate match for bare `setValue`, got {result:?}" + ); + } + + #[test] + fn classify_javascript_dot_prop_set_is_proto_pollution_gate() { + let no_kw = |_: &str| None; + let no_kw_present = |_: &str| false; + let result = + classify_gated_sink("javascript", "dotProp.set", |_| None, no_kw, no_kw_present); + assert!( + result + .iter() + .any(|m| m.label == DataLabel::Sink(Cap::PROTOTYPE_POLLUTION)), + "expected PROTOTYPE_POLLUTION gate match for `dotProp.set`, got {result:?}" + ); + } + #[test] fn classify_ruby_bare_open_is_shell_escape_sink() { let result = classify("ruby", "open", None); @@ -2419,7 +2907,7 @@ mod tests { ); assert_eq!( classify("rust", "Redirect::to(next)", Some(&extras)), - Some(DataLabel::Sink(Cap::SSRF)), + Some(DataLabel::Sink(Cap::OPEN_REDIRECT)), ); let empty = rust::framework_rules(&FrameworkContext::default()); @@ -2470,7 +2958,7 @@ mod tests { ); assert_eq!( classify("rust", "Redirect::to(next)", Some(&extras)), - Some(DataLabel::Sink(Cap::SSRF)), + Some(DataLabel::Sink(Cap::OPEN_REDIRECT)), ); } } diff --git a/src/labels/php.rs b/src/labels/php.rs index cac0af90..a1f46814 100644 --- a/src/labels/php.rs +++ b/src/labels/php.rs @@ -178,6 +178,143 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::DATA_EXFIL), case_sensitive: true, }, + // ─── LDAP injection sinks ─── + // + // PHP's procedural LDAP API: `ldap_search($ds, $base, $filter)`, + // `ldap_list($ds, $base, $filter)`, `ldap_read($ds, $base, $filter)`. + // The filter argument is the LDAP-injection vector when concatenated + // with attacker-controlled input. 
+ LabelRule { + matchers: &["ldap_search", "ldap_list", "ldap_read"], + label: DataLabel::Sink(Cap::LDAP_INJECTION), + case_sensitive: false, + }, + // ─── LDAP-filter sanitizer ─── + // + // `ldap_escape($value, $ignore, LDAP_ESCAPE_FILTER)` applies RFC 4515 + // escaping; treat any `ldap_escape` call as clearing the LDAP_INJECTION + // cap (the no-flag default also escapes filter metacharacters + // conservatively). + LabelRule { + matchers: &["ldap_escape"], + label: DataLabel::Sanitizer(Cap::LDAP_INJECTION), + case_sensitive: false, + }, + // ─── XPath injection sinks ─── + // + // `DOMXPath::query($expr, $ctx)` and `DOMXPath::evaluate($expr, $ctx)` + // accept the expression string as arg 0; concatenated user input there + // is the canonical PHP XPath-injection vector. `SimpleXMLElement::xpath` + // takes the same shape. Direct flat matchers cover the + // class-qualified call forms. + // Type-qualified rewrites: `$xp = new DOMXPath($doc)` tags `$xp` as + // `TypeKind::XPathClient`, so `$xp->query(...)` / `$xp->evaluate(...)` + // resolve to `XPathClient.query` / `XPathClient.evaluate`. Without + // the distinct TypeKind, bare `query` would match the SQL_QUERY sink. + LabelRule { + matchers: &[ + "XPathClient.query", + "XPathClient.evaluate", + "DOMXPath::query", + "DOMXPath::evaluate", + "SimpleXMLElement::xpath", + ], + label: DataLabel::Sink(Cap::XPATH_INJECTION), + case_sensitive: false, + }, + // Bare `xpath` method: SimpleXMLElement instances expose `->xpath($expr)` + // and Symfony / DOMCrawler wrappers do the same. Suffix matching on + // `xpath` covers `$xml->xpath(...)` and similar bound-receiver shapes + // where the receiver type is not statically known. Case-sensitive to + // avoid collisions with the `XPath` capitalisation used by qualified + // names. 
+ LabelRule { + matchers: &["xpath"], + label: DataLabel::Sink(Cap::XPATH_INJECTION), + case_sensitive: true, + }, + // ─── XPath escape sanitizers ─── + // + // No PHP standard library helper escapes XPath metacharacters; project- + // local `escape_xpath` / `xpath_escape` are the developer-named + // equivalents. + LabelRule { + matchers: &["escape_xpath", "xpath_escape"], + label: DataLabel::Sanitizer(Cap::XPATH_INJECTION), + case_sensitive: false, + }, + // ─── Header / CRLF injection sinks ─── + // + // PHP's `header($line)` writes a raw header line. Tainted strings + // without `\r\n` stripping let an attacker inject extra headers + // (response splitting); see GATED_SINKS for the corresponding + // OPEN_REDIRECT co-tag on `Location: ...` forms. + // + // The HEADER_INJECTION sink is intentionally implemented as a gate + // (not a flat rule) so the multi-gate SSA dispatch can co-emit it + // alongside the OPEN_REDIRECT gate on the same call site, producing + // separate findings for each cap with their canonical rule ids. + // ─── Header / CRLF sanitizers ─── + LabelRule { + matchers: &["strip_crlf", "escape_header", "sanitize_header"], + label: DataLabel::Sanitizer(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Open-redirect URL allowlist sanitizers ─── + // + // Mirrors the JS/TS rule. Developer-named functions that allowlist + // / scheme-strip a redirect URL clear OPEN_REDIRECT taint before it + // reaches `header("Location: …")`. PHP also commonly uses + // `snake_case` variants. 
+ LabelRule { + matchers: &[ + "validateRedirectUrl", + "isSafeRedirect", + "stripScheme", + "validate_redirect_url", + "is_safe_redirect", + "strip_scheme", + "ensure_relative_url", + "ensureRelativeUrl", + "assert_relative_path", + "assertRelativePath", + "is_relative_url", + "isRelativeUrl", + ], + label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + // ─── SSTI sinks ─── + // + // Twig `\Twig\Environment::createTemplate(string $template)` parses an + // arbitrary template source string at runtime; a tainted source yields + // SSTI when the resulting template is rendered. `Environment::render` + // / `Environment::load` take a *template name* (file lookup, not source) + // and are intentionally excluded. After PHP scope-resolution stripping + // these matchers cover the class-qualified `Twig\Environment::createTemplate(...)` + // shape; the idiomatic `$twig->createTemplate($src)` instance form is + // caught by the `createTemplate` gate in GATED_SINKS. + LabelRule { + matchers: &["Environment.createTemplate", "Twig.createTemplate"], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: true, + }, + // ─── XXE sanitizers ─── + // + // `libxml_disable_entity_loader(true)` (PHP <8) / `libxml_set_external_entity_loader($cb)` + // disable external-entity expansion process-wide. Treat their return + // value as XXE-cleared so config-style fixtures (`libxml_disable_entity_loader(true); + // simplexml_load_string($xml, ...)`) suppress the gate when the call is + // present in the same SSA scope. The flat-rule sanitizer is a coarse + // approximation; the real config-check pattern would track parser-instance + // hardening (deferred Layer 2). + LabelRule { + matchers: &[ + "libxml_disable_entity_loader", + "libxml_set_external_entity_loader", + ], + label: DataLabel::Sanitizer(Cap::XXE), + case_sensitive: false, + }, ]; /// Gated sinks for PHP. @@ -193,18 +330,157 @@ pub static RULES: &[LabelRule] = &[ /// /// Identifier-based activation is enabled via the macro-arg fallback in /// `cfg::mod::classify_gated_sink` for `lang == "php"`.
-pub static GATED_SINKS: &[SinkGate] = &[SinkGate { - callee_matcher: "curl_setopt", - arg_index: 1, - dangerous_values: &["CURLOPT_POSTFIELDS", "CURLOPT_COPYPOSTFIELDS"], - dangerous_prefixes: &[], - label: DataLabel::Sink(Cap::DATA_EXFIL), - case_sensitive: true, - payload_args: &[2], - keyword_name: None, - dangerous_kwargs: &[], - activation: GateActivation::ValueMatch, -}]; +pub static GATED_SINKS: &[SinkGate] = &[ + SinkGate { + callee_matcher: "curl_setopt", + arg_index: 1, + dangerous_values: &["CURLOPT_POSTFIELDS", "CURLOPT_COPYPOSTFIELDS"], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::DATA_EXFIL), + case_sensitive: true, + payload_args: &[2], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::ValueMatch, + }, + // PHP `header($line)` HEADER_INJECTION sink. Modelled as a gate so + // it can coexist with the OPEN_REDIRECT gate below: the multi-gate + // SSA dispatch needs each capability declared on its own gate filter + // to emit one finding per cap. Always activates (Destination), with + // payload arg 0 only (`header()` takes the header line only as arg 0; + // args 1 and 2 are `replace` and `response_code`, not line content). + SinkGate { + callee_matcher: "=header", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // PHP `simplexml_load_string($xml, $class, $options)`: + // XXE sink gated on the `LIBXML_NOENT` flag (or `LIBXML_DTDLOAD`, + // `LIBXML_DTDATTR`). libxml2 has been XXE-safe by default since 2.9.0; + // the gate fires only when the `$options` literal includes one of the + // dangerous flags. Identifier-based activation works via the macro-arg + // fallback in `cfg::mod::classify_gated_sink` for `lang == "php"`.
+ SinkGate { + callee_matcher: "simplexml_load_string", + arg_index: 2, + dangerous_values: &["LIBXML_NOENT", "LIBXML_DTDLOAD", "LIBXML_DTDATTR"], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::ValueMatch, + }, + SinkGate { + callee_matcher: "simplexml_load_file", + arg_index: 2, + dangerous_values: &["LIBXML_NOENT", "LIBXML_DTDLOAD", "LIBXML_DTDATTR"], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::ValueMatch, + }, + // DOMDocument::loadXML($xml, $options) — same gating as + // simplexml_load_string. The chain-normalised callee text for + // `$dom->loadXML(...)` is `dom.loadXML`; suffix matching on + // `loadXML` covers the bound-receiver form. + SinkGate { + callee_matcher: "loadXML", + arg_index: 1, + dangerous_values: &["LIBXML_NOENT", "LIBXML_DTDLOAD", "LIBXML_DTDATTR"], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::ValueMatch, + }, + // PHP `header($line)` co-tag for OPEN_REDIRECT. + // + // The always-active HEADER_INJECTION gate (`=header`) above already fires for + // any `header(...)` call regardless of the line content. This gate + // adds the OPEN_REDIRECT co-tag specifically when the first argument + // is a `Location: ...` header, so the dashboard / OWASP bucket + // correctly classifies redirect-class flows independently of CRLF. + // + // Activation: arg 0 prefix `Location:` (case-insensitive). When arg + // 0 is a constant string starting with `Location:` the gate fires and + // checks payload arg 0 for taint; constants like `Content-Type: ...` + // are suppressed by the safe-literal branch. When arg 0 is a binary + // expression (`"Location: " .
$url`) or otherwise dynamic, the + // value-extraction returns `None` and the gate fires conservatively + // — matching the existing convention in `setAttribute`/`parseFromString`. + SinkGate { + callee_matcher: "=header", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &["Location:"], + label: DataLabel::Sink(Cap::OPEN_REDIRECT), + case_sensitive: false, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::ValueMatch, + }, + // Smarty `$smarty->fetch($name)` — only the `string:` resource prefix + // accepts an inline template *source*; the bare form (`page.tpl`) is a + // file lookup (not SSTI). Gate activates only when arg 0's leading + // literal segment is the `string:` prefix; both a fully constant + // `"string:..."` literal and the concat (`"string:" . $src`) shape reach + // `extract_const_string_arg`'s leading-literal path and trigger activation. + // Payload is arg 0 itself: taint reaching the template source string is + // the SSTI flow. Suffix matching catches both `Smarty.fetch` and the + // bound-receiver `$smarty->fetch(...)` forms. + SinkGate { + callee_matcher: "Smarty.fetch", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &["string:"], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: false, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::ValueMatch, + }, + // Twig `\Twig\Environment::createTemplate(string $template)` — + // gated SSTI sink. Activation is unconditional (no value gate); + // payload arg 0 is the template source string. Bare suffix + // `createTemplate` matches the idiomatic instance shape + // `$twig->createTemplate($src)` (chain text `twig.createTemplate`) + // as well as the class-qualified `Environment::createTemplate(...)` form; + // `createTemplate` is Twig-specific terminology so over-fire risk + // is low. The matching flat rule remains for documentation-style + // class-qualified call shapes.
+ SinkGate { + callee_matcher: "createTemplate", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: false, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, +]; pub static KINDS: Map<&'static str, Kind> = phf_map! { // control-flow diff --git a/src/labels/python.rs b/src/labels/python.rs index 96a192e6..67eeee89 100644 --- a/src/labels/python.rs +++ b/src/labels/python.rs @@ -61,7 +61,7 @@ pub static RULES: &[LabelRule] = &[ // pattern that follows `from flask import session`. The `=session` // exact-match form fires only when the call is the bare top-level // `session(...)` so accidental field projections like - // `obj.client.session` (Phase 2 chained-receiver lowering) don't get + // `obj.client.session` (chained-receiver lowering) don't get // mis-labelled as sources. LabelRule { matchers: &[ @@ -284,6 +284,212 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::DESERIALIZE), case_sensitive: false, }, + // ─── LDAP injection sinks ─── + // + // python-ldap exposes module-level `ldap.search_s` / `ldap.search_ext_s` + // and method-style `conn.search_s(base, scope, filter)` after `conn = + // ldap.initialize(url)`. Suffix matching on the method names catches both + // the qualified form (`ldap.search_s`, matched as a literal) and the + // bound-receiver form (`conn.search_s` ends with `search_s`). ldap3 uses + // `Connection(server, ...)` whose `.search(...)` accepts a filter kwarg / + // positional; receiver typing tags the connection as `TypeKind::LdapClient` + // so type-qualified resolution rewrites `conn.search` → `LdapClient.search`. 
+ LabelRule { + matchers: &[ + "ldap.search_s", + "ldap.search_ext_s", + "search_s", + "search_ext_s", + "LdapClient.search", + "ldap3.Connection.search", + ], + label: DataLabel::Sink(Cap::LDAP_INJECTION), + case_sensitive: true, + }, + // ─── LDAP-filter sanitizers ─── + // + // python-ldap: `ldap.filter.escape_filter_chars(s)` and ldap3's + // `ldap3.utils.conv.escape_filter_chars(s)` both apply RFC 4515 escaping + // to filter metacharacters. Suffix matching on `escape_filter_chars` + // covers both the fully-qualified import and the bare-name destructured + // import (`from ldap.filter import escape_filter_chars`). + LabelRule { + matchers: &[ + "escape_filter_chars", + "ldap.filter.escape_filter_chars", + "ldap3.utils.conv.escape_filter_chars", + ], + label: DataLabel::Sanitizer(Cap::LDAP_INJECTION), + case_sensitive: false, + }, + // ─── XPath injection sinks ─── + // + // lxml: `tree.xpath(expr)` / `etree.XPath(expr)` accept an + // attacker-influenceable expression string. ElementTree's + // `find` / `findall` / `findtext` accept the same kind of XPath subset + // and admit injection when the path is built by string concatenation. + // Suffix matching on the bare method names catches both + // `lxml.etree._Element.xpath(...)` and `tree.xpath(...)` shapes. + LabelRule { + matchers: &[ + "xpath", + "lxml.etree.XPath", + "etree.XPath", + "ElementTree.find", + "ElementTree.findall", + "ElementTree.findtext", + ], + label: DataLabel::Sink(Cap::XPATH_INJECTION), + case_sensitive: true, + }, + // ─── XPath escape sanitizers ─── + // + // No standard library helper escapes XPath metacharacters; project-local + // `escape_xpath` / `xpath_escape` are the developer-named equivalents. 
+ LabelRule { + matchers: &["escape_xpath", "xpath_escape"], + label: DataLabel::Sanitizer(Cap::XPATH_INJECTION), + case_sensitive: false, + }, + // ─── Header / CRLF injection sinks ─── + // + // Flask / Werkzeug response APIs that write a single header value: + // `response.headers.add(name, val)`, `response.set_cookie(name, val)`, + // and the bare subscript-set form `response.headers[name] = val`. + // The subscript-set form is picked up via the LHS-subscript + // classification path in `cfg/mod.rs::push_node`: the LHS object's + // member-expression text matches `response.headers` / + // `self.response.headers` and tags the assignment as a HEADER_INJECTION + // sink. + LabelRule { + matchers: &["headers.add", "headers.set", "set_cookie"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + LabelRule { + matchers: &["response.headers", "self.response.headers", "resp.headers"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Header / CRLF sanitizers ─── + LabelRule { + matchers: &["strip_crlf", "escape_header", "sanitize_header"], + label: DataLabel::Sanitizer(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Open redirect sinks ─── + // + // Flask `redirect(url)`, Django `HttpResponseRedirect(url)`, FastAPI / + // Starlette `RedirectResponse(url=...)`. Tainted URL flowing to any of + // these without an allowlist check is an open-redirect vector. 
+ LabelRule { + matchers: &[ + "redirect", + "flask.redirect", + "django.shortcuts.redirect", + "HttpResponseRedirect", + "RedirectResponse", + ], + label: DataLabel::Sink(Cap::OPEN_REDIRECT), + case_sensitive: true, + }, + LabelRule { + matchers: &[ + "validate_redirect_url", + "is_safe_redirect", + "strip_scheme", + "ensure_relative_url", + "assert_relative_path", + "is_relative_url", + ], + label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + // ─── SSTI sinks ─── + // + // Template-engine constructors / `from_string` factories that accept the + // template *source string* as arg 0. `flask.render_template` takes a + // file PATH (not source) so does NOT match here — the safe API stays + // clean by name. + LabelRule { + matchers: &[ + "=Template", + "jinja2.Template", + "jinja2.Environment.from_string", + "Environment.from_string", + // `compile_expression` is jinja2-specific terminology (it returns a + // callable from an inline expression source). Bare suffix lets the + // rule fire on idiomatic instance shapes (`env.compile_expression(s)`) + // without a `jinja2.Environment` TypeKind. + "compile_expression", + "mako.template.Template", + "Template.render", + ], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: true, + }, + // Template-loader paths: a tainted `name` lets the attacker swap the + // resolved template behind the renderer. Mako's `TemplateLookup.get_template` + // and Jinja2's `Environment.get_template` / `select_template` / + // `loader.get_source` all take a template name (path-like) as arg 0. + // Modeling these as SSTI sinks captures the loader-path attack — the + // file resolver itself becomes the gadget when the name is attacker-controlled. + LabelRule { + matchers: &[ + "TemplateLookup.get_template", + "Environment.get_template", + "Environment.select_template", + "loader.get_source", + // Bare-suffix forms for the idiomatic instance shapes + // (`env.get_template(name)`, `lookup.get_template(name)`). 
+ "get_template", + "select_template", + ], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: true, + }, + // ─── XXE sinks ─── + // + // Python's stock `xml.sax.parseString` / `xml.sax.parse` resolved external + // general entities by default before Python 3.7.1, and `xml.dom.minidom` / + // `xml.dom.pulldom` sit on the same expat backend. They are modelled as + // XXE sinks to cover older interpreters and code that re-enables entity + // handling (e.g. `feature_external_ges`). + // Each entry is the dotted-module suffix; bare `parseString` / `parse` + // are intentionally avoided to prevent collisions with unrelated parsers + // (e.g. `ast.parse`). `lxml.etree.fromstring` is excluded: modern lxml + // disables external entities by default and would over-fire here. + LabelRule { + matchers: &[ + "xml.sax.parseString", + "xml.sax.parse", + "xml.dom.minidom.parseString", + "xml.dom.minidom.parse", + "xml.dom.pulldom.parseString", + "xml.dom.pulldom.parse", + ], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + }, + // `defusedxml.*` is the canonical hardened drop-in: every parser in + // the package strips external-entity / DTD resolution and raises on + // the patterns that would otherwise enable XXE. Treat any defusedxml + // call as an XXE sanitizer. + LabelRule { + matchers: &[ + "defusedxml.ElementTree.fromstring", + "defusedxml.ElementTree.parse", + "defusedxml.minidom.parseString", + "defusedxml.minidom.parse", + "defusedxml.sax.parseString", + "defusedxml.sax.parse", + "defusedxml.pulldom.parseString", + "defusedxml.pulldom.parse", + "defusedxml.lxml.fromstring", + "defusedxml.lxml.parse", + ], + label: DataLabel::Sanitizer(Cap::XXE), + case_sensitive: true, + }, ]; /// Method-call validators that strip caps from their *receiver* (and @@ -1041,6 +1247,55 @@ pub static GATED_SINKS: &[SinkGate] = &[ }, ]; +/// Prototype-pollution-style gates for Python.
Opt-in via the +/// `NYX_PYTHON_PROTO_POLLUTION` env var (see +/// `super::env_python_proto_pollution`); when enabled they are merged +/// into the language's `GATED_REGISTRY` slice at startup. +/// +/// Coverage is deliberately narrow: the `dict.update(target, src)` +/// class-method form (where the first arg is the target and the second +/// is the source) is the canonical attack shape for `__class__` / +/// `__dict__` pollution in Python frameworks that thread user input +/// through configuration objects. The bound-method form +/// (`config.update(req_data)`) is not matched: the `dict.update` callee +/// matcher only fires when the receiver text is literally `dict`, which +/// keeps the gate from over-firing on every `update` method in the +/// codebase. +pub static PROTO_POLLUTION_GATES: &[SinkGate] = &[ + // `dict.update(target, src)` — class-method form. Argument-role + // gating: only `src` (arg 1) taint activates; tainted target alone + // is benign. + SinkGate { + callee_matcher: "dict.update", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[1], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // `obj.__dict__.update(src)` — instance-attribute pollution shape. + SinkGate { + callee_matcher: "__dict__.update", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, +]; + pub static KINDS: Map<&'static str, Kind> = phf_map!
{ // control-flow "if_statement" => Kind::If, diff --git a/src/labels/ruby.rs b/src/labels/ruby.rs index 90656daa..1878b008 100644 --- a/src/labels/ruby.rs +++ b/src/labels/ruby.rs @@ -1,4 +1,6 @@ -use crate::labels::{Cap, DataLabel, Kind, LabelRule, ParamConfig, RuntimeLabelRule}; +use crate::labels::{ + Cap, DataLabel, GateActivation, Kind, LabelRule, ParamConfig, RuntimeLabelRule, SinkGate, +}; use crate::utils::project::{DetectedFramework, FrameworkContext}; use phf::{Map, phf_map}; @@ -226,10 +228,30 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::SQL_QUERY), case_sensitive: true, }, - // Open redirect: redirect_to with user-controlled destination. + // Open redirect: redirect_to (Rails) / redirect (Sinatra) with + // user-controlled destination. `redirect` is a top-level Sinatra + // helper; case-sensitive matching keeps it from over-firing on + // unrelated identifiers. `redirect_to` is the canonical Rails helper. LabelRule { matchers: &["redirect_to"], - label: DataLabel::Sink(Cap::SSRF), + label: DataLabel::Sink(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + LabelRule { + matchers: &["redirect"], + label: DataLabel::Sink(Cap::OPEN_REDIRECT), + case_sensitive: true, + }, + LabelRule { + matchers: &[ + "validate_redirect_url", + "is_safe_redirect", + "strip_scheme", + "ensure_relative_url", + "assert_relative_path", + "is_relative_url", + ], + label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT), case_sensitive: false, }, // Path traversal: file serving with user-controlled path. @@ -244,6 +266,173 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::HTML_ESCAPE), case_sensitive: false, }, + // ─── LDAP injection sinks ─── + // + // `Net::LDAP.new(host:, ...).search(base:, filter:, ...)` is the canonical + // net-ldap shape. Type-qualified resolution rewrites `ldap.search` → + // `LdapClient.search` when the receiver was constructed via `Net::LDAP.new` + // / `Net::LDAP.open` (see [`crate::ssa::type_facts::constructor_type`]).
+ // The chained literal form `Net::LDAP.new(...).search(...)` becomes + // `Net::LDAP.new.search` after `()` stripping. That text does not end in + // `Net::LDAP.search`, so the chained shape is expected to resolve through + // the type-qualified `LdapClient.search` rewrite instead; the explicit + // `Net::LDAP.search(filter)` form matches the `Net::LDAP.search` matcher + // directly. + LabelRule { + matchers: &["LdapClient.search", "Net::LDAP.search"], + label: DataLabel::Sink(Cap::LDAP_INJECTION), + case_sensitive: true, + }, + // ─── LDAP-filter sanitizer ─── + // + // `Net::LDAP::Filter.escape(value)` applies RFC 4515 escaping; treat any + // call as clearing the LDAP_INJECTION cap. + LabelRule { + matchers: &["Net::LDAP::Filter.escape"], + label: DataLabel::Sanitizer(Cap::LDAP_INJECTION), + case_sensitive: true, + }, + // ─── XPath injection sinks ─── + // + // `Nokogiri::XML::Node#xpath(expr)` and `at_xpath(expr)` accept the + // expression string as arg 0; concatenated user input there is the + // canonical Nokogiri XPath-injection vector. `search(expr)` accepts the + // same kind of expression but is not listed here: a bare `search` suffix + // would match far more than Nokogiri calls. Suffix matching on the bare + // method names catches the bound-receiver form (`doc.xpath(expr)`). + LabelRule { + matchers: &["xpath", "at_xpath"], + label: DataLabel::Sink(Cap::XPATH_INJECTION), + case_sensitive: true, + }, + // ─── XPath escape sanitizers ─── + // + // No Nokogiri / stdlib helper escapes XPath metacharacters; project-local + // `escape_xpath` / `xpath_escape` are the developer-named equivalents. + LabelRule { + matchers: &["escape_xpath", "xpath_escape"], + label: DataLabel::Sanitizer(Cap::XPATH_INJECTION), + case_sensitive: false, + }, + // ─── Header / CRLF injection sinks ─── + // + // Rack `Response#set_header(name, value)` / `add_header(name, value)` + // and `ActionDispatch::Response#headers[]=` write a single header value.
+ // The subscript-set form `response.headers["X-Foo"] = bar` is picked up + // via the LHS-subscript classification path in `cfg/mod.rs`: when the + // LHS object's member-expression text matches `response.headers` (or a + // synonym), the assignment is tagged as a HEADER_INJECTION sink. + // Tainted strings without `\r\n` stripping enable response splitting. + LabelRule { + matchers: &["set_header", "add_header"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + LabelRule { + matchers: &["response.headers", "res.headers", "self.response.headers"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + LabelRule { + matchers: &["strip_crlf", "escape_header", "sanitize_header"], + label: DataLabel::Sanitizer(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── SSTI sinks ─── + // + // `ERB.new(template_source)` and `Liquid::Template.parse(source)` accept + // the template *source string* as arg 0; tainted source there yields + // arbitrary template execution at the corresponding `result(binding)` / + // `render` step. `=ERB.new` exact-matcher syntax limits the rule to the + // direct call (the leading `=` is the same convention used elsewhere in + // this file for Kernel-style globals like `=open`). + LabelRule { + matchers: &["=ERB.new", "Liquid::Template.parse"], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: true, + }, + // ─── XXE sinks ─── + // + // `REXML::Document.new(xml)` instantiates the (legacy, default-vulnerable) + // pure-Ruby XML parser; an attacker-controlled `xml` is XXE. + // + // Nokogiri (`Nokogiri::XML(xml)` / `Nokogiri::XML::Document.parse(xml)`) + // is XXE-safe by default since 1.10, but resolving external entities + // requires explicitly opting in via `Nokogiri::XML::ParseOptions::NOENT` + // (or `DTDLOAD` / `DTDATTR`). Option-flagged detection lives in + // `GATED_SINKS` below. 
+ LabelRule { + matchers: &["REXML::Document.new"], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + }, +]; + +/// Ruby gated sinks. Argument-role-aware classification for callees that +/// are XXE-safe by default but become unsafe when the caller passes an +/// option flag that re-enables external-entity resolution. +/// +/// Activation uses the bare-leaf comparison: scope-qualified constants like +/// `Nokogiri::XML::ParseOptions::NOENT` are reduced to the rightmost +/// `name` segment by the `scope_resolution` branch in +/// `cfg::literals::extract_const_macro_arg`, so the +/// `dangerous_values` list stays identifier-bare. +/// +/// Default-arg semantics: Ruby `Nokogiri::XML(xml)` with no options arg +/// reaches the gate's `None` activation branch (the activation arg +/// position simply doesn't exist), which falls through to a conservative +/// fire even though the library default itself is safe. Callers wishing +/// to suppress the gate explicitly should pass a +/// safe options literal at the activation position (e.g. +/// `Nokogiri::XML::ParseOptions::DEFAULT_XML`); any non-dangerous +/// scope-qualified constant disables the gate. +pub static GATED_SINKS: &[SinkGate] = &[ + // `Nokogiri::XML(xml, url = nil, encoding = nil, options = ParseOptions::DEFAULT_XML)`: + // top-level module method. arg 3 carries the parse-option flag literal. + // + // tree-sitter-ruby parses `Nokogiri::XML(args)` as a `call` whose + // `receiver` field is the `Nokogiri` constant and `method` field is + // the `XML` constant (with `::` as the call operator). `push_node`'s + // `CallMethod` path joins these as `{receiver}.{method}` → matchable + // suffix `Nokogiri.XML`.
+ SinkGate { + callee_matcher: "Nokogiri.XML", + arg_index: 3, + dangerous_values: &["NOENT", "DTDLOAD", "DTDATTR"], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::ValueMatch, + }, + // `Nokogiri::XML::Document.parse(xml, url = nil, encoding = nil, options = ParseOptions::DEFAULT_XML)`: + // the receiver is the scope_resolution `Nokogiri::XML::Document` (text of + // the whole receiver is preserved verbatim) and the method is `parse`, so + // the constructed callee text is `Nokogiri::XML::Document.parse`. + SinkGate { + callee_matcher: "Nokogiri::XML::Document.parse", + arg_index: 3, + dangerous_values: &["NOENT", "DTDLOAD", "DTDATTR"], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::ValueMatch, + }, + // `Nokogiri::HTML(html, ..., options)` shares the same option flags as + // the XML helper. Same callee normalization as `Nokogiri.XML`. + SinkGate { + callee_matcher: "Nokogiri.HTML", + arg_index: 3, + dangerous_values: &["NOENT", "DTDLOAD", "DTDATTR"], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::ValueMatch, + }, ]; pub static KINDS: Map<&'static str, Kind> = phf_map!
{ diff --git a/src/labels/rust.rs b/src/labels/rust.rs index da6255d7..c384efc4 100644 --- a/src/labels/rust.rs +++ b/src/labels/rust.rs @@ -1,4 +1,6 @@ -use crate::labels::{Cap, DataLabel, Kind, LabelRule, ParamConfig, RuntimeLabelRule}; +use crate::labels::{ + Cap, DataLabel, GateActivation, Kind, LabelRule, ParamConfig, RuntimeLabelRule, SinkGate, +}; use crate::utils::project::{DetectedFramework, FrameworkContext}; use phf::{Map, phf_map}; @@ -245,6 +247,89 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::DESERIALIZE), case_sensitive: false, }, + // ─── Header / CRLF injection sinks ─── + // + // `http::HeaderMap::insert(name, val)` / `append(...)` write a single + // header value. The canonical idiom is `response.headers_mut().insert(...)` + // (axum, actix-web `HttpResponse.headers_mut`, hyper `Response::headers_mut`). + // After paren-group stripping the chain text becomes + // `response.headers_mut.insert`, so suffix matchers on + // `headers_mut.insert` / `headers_mut.append` cover the bound-receiver + // form regardless of the response builder's concrete type. Tainted + // strings without CRLF stripping enable response splitting. + LabelRule { + matchers: &["headers_mut.insert", "headers_mut.append"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + LabelRule { + matchers: &["strip_crlf", "escape_header", "sanitize_header"], + label: DataLabel::Sanitizer(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Open redirect sinks ─── + // + // axum / rocket `Redirect::to(url)` / `Redirect::permanent(url)` / + // `Redirect::temporary(url)` build a 3xx response with the URL in the + // `Location` header. Without an allowlist check, a tainted `url` is + // the canonical Rust open-redirect vector. Listed unconditionally (not + // gated on framework detection) so non-framework helpers / re-exports + // still surface; the framework-conditional rules below are + // intentionally not duplicating this label. 
Actix + // `HttpResponse::Found().header("Location", x)` gets its OPEN_REDIRECT + // tag from the `header` gate in `GATED_SINKS` below rather than from + // this flat rule. + LabelRule { + matchers: &["Redirect::to", "Redirect::permanent", "Redirect::temporary"], + label: DataLabel::Sink(Cap::OPEN_REDIRECT), + case_sensitive: true, + }, + LabelRule { + matchers: &[ + "validate_redirect_url", + "is_safe_redirect", + "strip_scheme", + "ensure_relative_url", + "assert_relative_path", + "is_relative_url", + ], + label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, +]; + +/// Rust gated sinks. Argument-position-aware classification for callees +/// where activation depends on a literal arg value rather than the bare +/// callee name. +pub static GATED_SINKS: &[SinkGate] = &[ + // actix-web `HttpResponse::Found().header("Location", url)` (and other + // builder variants like `Ok().header(...)`, `MovedPermanently().header(...)`). + // After chain normalisation the callee text is e.g. + // `HttpResponse.Found.header`; suffix matching on `header` covers every + // builder variant. + // + // Activation: arg 0 equality with `"Location"`. Because this gate sets + // `case_sensitive: true`, only the exact `"Location"` spelling matches; + // a lowercase `"location"` literal does not. When arg 0 is a constant + // string equal to `Location` the gate fires and + // checks payload arg 1 for taint; constants like `"Content-Type"` are + // suppressed by the safe-literal branch. When arg 0 is dynamic the + // gate fires conservatively (per the existing `setAttribute` / + // `parseFromString` convention). + // + // Mirrors PHP's `=header` Location gate; the Rust analog is split + // across two args (`name`, `value`) instead of PHP's single `Location: ...` + // line.
+ SinkGate { + callee_matcher: "header", + arg_index: 0, + dangerous_values: &["Location"], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::OPEN_REDIRECT), + case_sensitive: true, + payload_args: &[1], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::ValueMatch, + }, ]; pub static KINDS: Map<&'static str, Kind> = phf_map! { @@ -337,11 +422,8 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec { label: DataLabel::Sink(Cap::HTML_ESCAPE), case_sensitive: true, }); - rules.push(RuntimeLabelRule { - matchers: vec!["Redirect::to".into()], - label: DataLabel::Sink(Cap::SSRF), - case_sensitive: true, - }); + // `Redirect::to` is declared unconditionally as Sink(OPEN_REDIRECT) + // in `RULES` above; no framework-conditional duplicate needed. } if ctx.has(DetectedFramework::ActixWeb) { @@ -395,11 +477,8 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec { label: DataLabel::Sink(Cap::HTML_ESCAPE), case_sensitive: true, }); - rules.push(RuntimeLabelRule { - matchers: vec!["Redirect::to".into()], - label: DataLabel::Sink(Cap::SSRF), - case_sensitive: true, - }); + // `Redirect::to` is declared unconditionally as Sink(OPEN_REDIRECT) + // in `RULES` above; no framework-conditional duplicate needed. } rules diff --git a/src/labels/typescript.rs b/src/labels/typescript.rs index a5f5c413..b933bdca 100644 --- a/src/labels/typescript.rs +++ b/src/labels/typescript.rs @@ -255,6 +255,113 @@ pub static RULES: &[LabelRule] = &[ label: DataLabel::Sink(Cap::SQL_QUERY), case_sensitive: true, }, + // ─── LDAP injection sinks ─── + // + // Mirror of `labels/javascript.rs`; ldapjs / ts-ldapjs has the same + // `client.search(...)` shape. Type-qualified resolution covers both + // `const client = ldap.createClient({...}); client.search(...)` (bound + // variable, type forwarded from the parent body via + // [`crate::taint::inject_external_type_facts`]) and the chained + // `ldap.createClient({...}).search(...)` form. 
+ LabelRule { + matchers: &["LdapClient.search"], + label: DataLabel::Sink(Cap::LDAP_INJECTION), + case_sensitive: true, + }, + // ─── LDAP-filter sanitizers ─── + LabelRule { + matchers: &[ + "ldapEscape", + "ldap-escape", + "ldapescape.filter", + "ldapescape.dn", + ], + label: DataLabel::Sanitizer(Cap::LDAP_INJECTION), + case_sensitive: false, + }, + // ─── XPath injection sinks ─── (mirrors `labels/javascript.rs`) + LabelRule { + matchers: &[ + "document.evaluate", + "xpath.select", + "xpath.evaluate", + "xpath.select1", + ], + label: DataLabel::Sink(Cap::XPATH_INJECTION), + case_sensitive: false, + }, + // ─── XPath escape sanitizers ─── (mirrors `labels/javascript.rs`) + LabelRule { + matchers: &["escapeXpath", "xpathEscape", "escape_xpath"], + label: DataLabel::Sanitizer(Cap::XPATH_INJECTION), + case_sensitive: false, + }, + // ─── Header / CRLF injection sinks ─── (mirrors `labels/javascript.rs`) + LabelRule { + matchers: &["setHeader", "res.set", "res.header", "res.append"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // Subscript-set form (mirrors `labels/javascript.rs`). + LabelRule { + matchers: &["res.headers", "response.headers", "self.response.headers"], + label: DataLabel::Sink(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Header / CRLF sanitizers ─── (mirrors `labels/javascript.rs`) + LabelRule { + matchers: &["stripCRLF", "stripCrlf", "escapeHeader", "sanitizeHeader"], + label: DataLabel::Sanitizer(Cap::HEADER_INJECTION), + case_sensitive: false, + }, + // ─── Prototype pollution sinks ─── (mirrors `labels/javascript.rs`) + // + // Argument-role gating is enforced via Destination activation in + // `GATED_SINKS` below: only taint flowing into source-object + // arguments (positions 1+) activates; tainted-target alone is + // benign. Flat rules here are intentionally empty for the merge + // family. 
+ // ─── Open redirect sinks ─── (mirrors `labels/javascript.rs`) + LabelRule { + matchers: &[ + "res.redirect", + "location.replace", + "location.assign", + "router.navigate", + "router.navigateByUrl", + "window.location", + "window.location.href", + "location.href", + ], + label: DataLabel::Sink(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + LabelRule { + matchers: &[ + "validateRedirectUrl", + "isSafeRedirect", + "stripScheme", + "ensureRelativeUrl", + "assertRelativePath", + "isRelativeUrl", + ], + label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT), + case_sensitive: false, + }, + // ─── SSTI sinks ─── (mirrors `labels/javascript.rs`; `_.template` + // and `nunjucks.renderString` excluded — gated classifiers in + // GATED_SINKS) + LabelRule { + matchers: &["Handlebars.compile"], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: false, + }, + // ─── XXE sinks ─── (mirrors `labels/javascript.rs`) + LabelRule { + matchers: &["libxmljs.parseXmlString", "libxmljs.parseXml"], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + }, ]; /// Callee patterns that must never be classified as source/sanitizer/sink. 
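The gated entries in this diff lean on three activation shapes: `ValueMatch` (the actix `header("Location", url)` gate), `Destination` with `payload_args` (the merge family and `nunjucks.renderString`), and `LiteralOnly` (bare `extend`). The sketch below is a simplified, hypothetical model of how those shapes could decide whether a call fires, reconstructed only from the comments here. `Arg`, `Activation`, `Gate`, and `gate_fires` are illustrative names, not Nyx's real `SinkGate` / `GateActivation` API, and the model ignores `dangerous_prefixes`, keyword arguments, and the `case_sensitive` flag.

```rust
// Hypothetical, simplified model of the gate activation shapes described in
// the comments of this diff. None of these names are Nyx's real API.

/// What the analyser knows about one call argument (illustrative only).
#[derive(Clone, Debug, PartialEq)]
pub enum Arg {
    /// A constant token such as `"Location"` or `true`.
    Literal(&'static str),
    /// A non-constant expression; `tainted` records whether taint reaches it.
    Dynamic { tainted: bool },
}

impl Arg {
    fn is_tainted(&self) -> bool {
        matches!(self, Arg::Dynamic { tainted: true })
    }
}

/// Simplified stand-ins for the activation shapes used in this diff.
pub enum Activation {
    /// Armed when the gating arg is a constant matching one of `values`
    /// (compared case-insensitively, as the actix `Location` comment
    /// describes), or conservatively when the gating arg is dynamic.
    ValueMatch { arg_index: usize, values: &'static [&'static str] },
    /// Always armed; only taint in `payload_args` matters, so a tainted
    /// merge *target* alone never fires.
    Destination,
    /// Armed only when the gating arg is literally one of `values`
    /// (e.g. jQuery-style deep `extend(true, target, src)`).
    LiteralOnly { arg_index: usize, values: &'static [&'static str] },
}

pub struct Gate {
    pub activation: Activation,
    /// Argument positions checked for taint once the gate is armed.
    pub payload_args: &'static [usize],
}

/// True when any payload position carries taint.
fn payload_tainted(args: &[Arg], payload: &[usize]) -> bool {
    payload.iter().any(|&i| args.get(i).is_some_and(Arg::is_tainted))
}

/// Decide whether a call with these arguments should report a finding.
pub fn gate_fires(gate: &Gate, args: &[Arg]) -> bool {
    let armed = match &gate.activation {
        Activation::ValueMatch { arg_index, values } => match args.get(*arg_index) {
            Some(Arg::Literal(s)) => values.iter().any(|v| v.eq_ignore_ascii_case(s)),
            // Dynamic gating arg: fire conservatively.
            Some(Arg::Dynamic { .. }) => true,
            None => false,
        },
        Activation::Destination => true,
        Activation::LiteralOnly { arg_index, values } => match args.get(*arg_index) {
            Some(Arg::Literal(s)) => values.contains(s),
            _ => false,
        },
    };
    armed && payload_tainted(args, gate.payload_args)
}
```

In this model `Destination` is always armed and relies entirely on `payload_args`, which is why `_.merge(target, src)` with only a tainted `target` stays silent, while `LiteralOnly` keeps Backbone-style `Model.extend({...})` quiet because arg 0 is not the literal `true`. The real engine may combine these checks differently.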
@@ -309,6 +416,23 @@ pub static GATED_SINKS: &[SinkGate] = &[ dangerous_kwargs: &[], activation: GateActivation::ValueMatch, }, + // ── XML XXE gates, mirrors `labels/javascript.rs` ──────────────────── + SinkGate { + callee_matcher: "xml2js.parseString", + arg_index: 1, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::XXE), + case_sensitive: true, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[ + ("processEntities", &["true"]), + ("explicitEntities", &["true"]), + ("strict", &["false"]), + ], + activation: GateActivation::ValueMatch, + }, // ── Outbound HTTP clients (SSRF), see javascript.rs for rationale ──── SinkGate { callee_matcher: "fetch", @@ -603,6 +727,189 @@ pub static GATED_SINKS: &[SinkGate] = &[ object_destination_fields: &[], }, }, + // `nunjucks.renderString(src, ctx)` — Nunjucks SSTI sink. Only the + // template *source* (arg 0) lets an attacker drive template + // execution; the `ctx` data object (arg 1) is rendered via the + // template's escape policy and is not itself a code-injection + // vector. Gate via Destination-style activation with + // `payload_args: &[0]` so taint flowing only into `ctx` is + // suppressed. Mirrors `labels/javascript.rs`. + SinkGate { + callee_matcher: "nunjucks.renderString", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::SSTI), + case_sensitive: false, + payload_args: &[0], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // ── Prototype pollution gates ──────────────────────────────────────── + // + // Mirrors `labels/javascript.rs` GATED_SINKS proto-pollution block. + // Argument-role gating: `(target, src1, src2, ...)`, only source + // positions trigger. See the JS module for the rationale and the + // `payload_args` width choice. 
+ SinkGate { + callee_matcher: "_.merge", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "_.mergeWith", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "_.defaultsDeep", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "_.set", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "_.setWith", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "deepMerge", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2, 3, 4, 5], + 
keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "defaultsDeep", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: false, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "Object.assign", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[1, 2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "$.extend", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[1, 2, 3, 4], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + SinkGate { + callee_matcher: "jQuery.extend", + arg_index: 0, + dangerous_values: &[], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[1, 2, 3, 4], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::Destination { + object_destination_fields: &[], + }, + }, + // Bare `extend` (suffix-matched) — see labels/javascript.rs for full + // rationale. `LiteralOnly` activation requires arg 0 to be literal `true` + // so Backbone's `Model.extend({proto})` class-extension form does not + // fire (its arg 0 is an object literal, not a boolean). 
+ SinkGate { + callee_matcher: "extend", + arg_index: 0, + dangerous_values: &["true"], + dangerous_prefixes: &[], + label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION), + case_sensitive: true, + payload_args: &[2, 3, 4, 5], + keyword_name: None, + dangerous_kwargs: &[], + activation: GateActivation::LiteralOnly, + }, ]; pub static KINDS: Map<&'static str, Kind> = phf_map! { diff --git a/src/server/debug.rs b/src/server/debug.rs index 6bf0868d..d46fd5ee 100644 --- a/src/server/debug.rs +++ b/src/server/debug.rs @@ -1181,7 +1181,12 @@ fn type_kind_tag(k: &TypeKind) -> String { TypeKind::LocalCollection => "LocalCollection".into(), TypeKind::RequestBuilder => "RequestBuilder".into(), TypeKind::JpaCriteriaQuery => "JpaCriteriaQuery".into(), + TypeKind::LdapClient => "LdapClient".into(), + TypeKind::XPathClient => "XPathClient".into(), + TypeKind::XmlParser => "XmlParser".into(), + TypeKind::Template => "Template".into(), TypeKind::Dto(_) => "Dto".into(), + TypeKind::NullPrototypeObject => "NullPrototypeObject".into(), } } @@ -1538,6 +1543,8 @@ pub fn analyse_function_taint( receiver_seed: None, const_values: Some(&opt.const_values), type_facts: Some(&opt.type_facts), + xml_parser_config: Some(&opt.xml_parser_config), + xpath_config: Some(&opt.xpath_config), ssa_summaries: None, extra_labels: None, callee_bodies: None, diff --git a/src/server/models.rs b/src/server/models.rs index 0acc5f70..ee92a151 100644 --- a/src/server/models.rs +++ b/src/server/models.rs @@ -138,6 +138,7 @@ pub struct RuleListItem { pub enabled: bool, pub is_custom: bool, pub is_gated: bool, + pub is_class: bool, pub case_sensitive: bool, pub finding_count: usize, pub suppression_rate: f64, @@ -156,6 +157,7 @@ pub struct RuleDetailView { pub enabled: bool, pub is_custom: bool, pub is_gated: bool, + pub is_class: bool, pub finding_count: usize, pub suppression_rate: f64, pub example_findings: Vec, diff --git a/src/server/owasp.rs b/src/server/owasp.rs index c4dafb0a..217afb5b 100644 --- 
a/src/server/owasp.rs +++ b/src/server/owasp.rs @@ -25,6 +25,20 @@ fn extract_family(rule_id: &str) -> &str { rule_id } +/// True when `rule_id` either equals `prefix` or starts with `prefix` +/// followed by one of the recognised separator characters used by the +/// finding-id emitter. Prevents `taint-ssrf-allowlist-violation` +/// from silently inheriting `taint-ssrf`'s OWASP bucket. +fn matches_cap_rule_id(rule_id: &str, prefix: &str) -> bool { + if !rule_id.starts_with(prefix) { + return false; + } + matches!( + rule_id.as_bytes().get(prefix.len()), + None | Some(b' ') | Some(b'(') | Some(b'.') + ) +} + /// Return the OWASP 2021 (code, label) pair for a given rule id, or `None` if unmapped. pub fn owasp_bucket_for(rule_id: &str) -> Option<(&'static str, &'static str)> { let family = extract_family(rule_id); @@ -32,6 +46,27 @@ pub fn owasp_bucket_for(rule_id: &str) -> Option<(&'static str, &'static str)> { return None; } + // Cap-class rule ids carry their canonical OWASP code in + // `CAP_RULE_REGISTRY`; consult that first so adding a new cap class + // does not require updating two tables. The legacy family-token + // dispatch below covers per-language tree-sitter pattern rules + // (`js.xss.outer_html` style) that have no cap entry. + // + // Match shape: exact equality, or registry id followed by a separator + // that the emitter actually uses (` ` for ` (source 1:1)` suffixes, + // `(` for `(source 1:1)` style without a leading space, `.` for + // dotted variants). Fully dotted pattern ids such as + // `rs.auth.missing_ownership_check.taint` match no registry id and + // fall through to the family-token table. + // Plain `starts_with` would silently bucket a future + // `taint-ssrf-allowlist-violation` under the SSRF entry; the + // separator gate keeps unrelated suffixes from inheriting a parent + // bucket.
+ if let Some(meta) = crate::labels::CAP_RULE_REGISTRY + .iter() + .find(|m| matches_cap_rule_id(rule_id, m.rule_id)) + { + return Some((meta.owasp_code, meta.owasp_label)); + } + Some(match family { // A01, Broken Access Control "auth" | "csrf" | "mass_assign" | "path" | "redirect" => ("A01", "Broken Access Control"), @@ -39,10 +74,10 @@ pub fn owasp_bucket_for(rule_id: &str) -> Option<(&'static str, &'static str)> { "crypto" | "secrets" => ("A02", "Cryptographic Failures"), // A03, Injection (covers SQLi, XSS, command, code-eval, template, NoSQL, LDAP, reflection, // and engine-level taint findings without a more specific family tag). - "sqli" | "xss" | "cmdi" | "code_exec" | "template" | "nosql" | "ldap" | "reflection" - | "taint" => ("A03", "Injection"), - // A05, Security Misconfiguration (TLS verify off, cookie flags, prototype pollution) - "config" | "transport" | "prototype" => ("A05", "Security Misconfiguration"), + "sqli" | "xss" | "cmdi" | "code_exec" | "template" | "nosql" | "ldap" | "xpath" + | "header" | "reflection" | "taint" => ("A03", "Injection"), + // A05, Security Misconfiguration (TLS verify off, cookie flags, prototype pollution, XXE) + "config" | "transport" | "prototype" | "xxe" => ("A05", "Security Misconfiguration"), // A08, Software and Data Integrity Failures "deser" => ("A08", "Software and Data Integrity Failures"), // A09, Logging & Monitoring Failures @@ -112,6 +147,30 @@ fn issue_category_label(rule_id: &str) -> &'static str { if rule_id.starts_with("taint-data-exfiltration") { return "Data Exfiltration"; } + // Cap-class rule ids share the `taint` family token but each represent + // a distinct vulnerability class. Match them before falling through + // to family-based dispatch so the dashboard surfaces the right badge. 
+ if rule_id.starts_with("taint-ldap-injection") { + return "LDAP Injection"; + } + if rule_id.starts_with("taint-xpath-injection") { + return "XPath Injection"; + } + if rule_id.starts_with("taint-header-injection") { + return "Header Injection"; + } + if rule_id.starts_with("taint-open-redirect") { + return "Open Redirect"; + } + if rule_id.starts_with("taint-template-injection") { + return "Template Injection"; + } + if rule_id.starts_with("taint-xxe") { + return "XXE"; + } + if rule_id.starts_with("taint-prototype-pollution") { + return "Prototype Pollution"; + } match extract_family(rule_id) { "sqli" => "SQL Injection", "xss" => "Cross-Site Scripting", @@ -229,6 +288,40 @@ mod tests { assert_eq!(out[2].count, 2); } + #[test] + fn cap_rule_id_match_requires_separator() { + // Exact match → bucketed. + assert_eq!( + owasp_bucket_for("taint-ssrf"), + Some(("A10", "Server-Side Request Forgery")) + ); + // Suffix after recognised separators is bucketed. + assert_eq!( + owasp_bucket_for("taint-ssrf (source 1:1)"), + Some(("A10", "Server-Side Request Forgery")) + ); + assert_eq!( + owasp_bucket_for("taint-ssrf(source 1:1)"), + Some(("A10", "Server-Side Request Forgery")) + ); + // Dotted pattern ids match no registry entry and resolve through + // the family-token table. + assert_eq!( + owasp_bucket_for("rs.auth.missing_ownership_check.taint"), + Some(("A01", "Broken Access Control")) + ); + // Hyphenated suffix without separator must NOT silently inherit + // the parent bucket. A `taint-ssrf-...` sibling would still reach + // A10 through the family-token table, so use an LDAP sibling here. + assert_eq!( + owasp_bucket_for("taint-ldap-injection-allowlist"), + // Family token "taint" → A03 either way, so this assertion only + // pins the final bucket; it cannot tell on its own whether the + // registry match was rejected.
+ Some(("A03", "Injection")) + ); + // The bucket check above resolves to A03 with or without separator + // gating, so pin the registry rejection directly. + assert!(!matches_cap_rule_id( + "taint-ldap-injection-allowlist", + "taint-ldap-injection" + )); + } + #[test] fn issue_category_label_routes_data_exfil_to_dedicated_bucket() { // `taint-data-exfiltration` shares the `taint` family token with diff --git a/src/server/routes/rules.rs b/src/server/routes/rules.rs index 25205fe5..049da5b5 100644 --- a/src/server/routes/rules.rs +++ b/src/server/routes/rules.rs @@ -53,6 +53,8 @@ fn build_rule_list(state: &AppState) -> Vec<RuleInfo> { case_sensitive: cr.case_sensitive, is_custom: true, is_gated: false, + is_class: false, + emission_active: true, enabled, }); } @@ -89,6 +91,7 @@ async fn list_rules(State(state): State<AppState>) -> Json<Vec<RuleListItem>> { enabled: r.enabled, is_custom: r.is_custom, is_gated: r.is_gated, + is_class: r.is_class, case_sensitive: r.case_sensitive, finding_count: count, suppression_rate: rate, @@ -134,6 +137,7 @@ async fn get_rule( enabled: rule.enabled, is_custom: rule.is_custom, is_gated: rule.is_gated, + is_class: rule.is_class, finding_count: total, suppression_rate: rate, example_findings: examples, diff --git a/src/ssa/mod.rs b/src/ssa/mod.rs index b56c60cc..2e275090 100644 --- a/src/ssa/mod.rs +++ b/src/ssa/mod.rs @@ -31,6 +31,8 @@ pub mod param_points_to; pub mod pointsto; pub mod static_map; pub mod type_facts; +pub mod xml_config; +pub mod xpath_config; #[allow(unused_imports)] pub use ir::*; @@ -51,6 +53,20 @@ pub struct OptimizeResult { pub const_values: HashMap, /// Type fact analysis results. pub type_facts: type_facts::TypeFactResult, + /// XML-parser configuration facts: per-receiver SSA value + /// `secure_processing` / `disallow_doctype` / `external_entities` + /// flags carried forward from setter calls and constructor kwargs. + /// Consumed by the SSA taint engine to suppress XXE on parse-class + /// sinks whose receiver was provably hardened. + #[serde(default)] + pub xml_parser_config: xml_config::XmlParserConfigResult, + /// XPath-receiver configuration facts: per-receiver SSA value + /// `has_resolver` flag set by `setXPathVariableResolver` calls.
+ /// Consumed by the SSA taint engine to suppress XPATH_INJECTION on + /// `evaluate` / `compile` sinks whose receiver was provably bound + /// to a variable resolver (parameterised XPath shape). + #[serde(default)] + pub xpath_config: xpath_config::XPathConfigResult, /// Base-variable alias groups from copy propagation. pub alias_result: alias::BaseAliasResult, /// Points-to analysis: per-SSA-value abstract heap object sets. @@ -100,6 +116,17 @@ pub fn optimize_ssa_with_param_types( let type_facts = type_facts::analyze_types_with_param_types(body, cfg, &cp.values, lang, param_types); + // 5b. XML-parser config analysis. Tracks per-receiver hardening + // flags so XXE sinks can be suppressed when the parser was provably + // configured for secure processing. + let xml_parser_config = xml_config::analyze_xml_parser_config(body, cfg, &cp.values, lang); + + // 5c. XPath-receiver config analysis. Tracks per-receiver + // `has_resolver` flag so `XPath.evaluate(taintedExpr, ...)` sinks + // can be suppressed when the receiver was bound to an + // `XPathVariableResolver` (parameterised-XPath shape). + let xpath_config = xpath_config::analyze_xpath_config(body, cfg, lang); + // 6. Points-to analysis (uses allocation site detection + SSA def-use) let points_to = heap::analyze_points_to(body, cfg, lang); @@ -113,6 +140,8 @@ pub fn optimize_ssa_with_param_types( OptimizeResult { const_values: cp.values, type_facts, + xml_parser_config, + xpath_config, alias_result, points_to, module_aliases, diff --git a/src/ssa/type_facts.rs b/src/ssa/type_facts.rs index d02f6a96..32f24fa2 100644 --- a/src/ssa/type_facts.rs +++ b/src/ssa/type_facts.rs @@ -52,12 +52,55 @@ pub enum TypeKind { /// where openmrs / xwiki / keycloak Hibernate DAOs build queries /// via `cb.createQuery(Foo.class)` + `Root` / `Predicate` API. 
JpaCriteriaQuery, + /// An LDAP directory-service client / connection (`DirContext`, + /// `LdapTemplate`, `Net::LDAP`, `ldap3.Connection`, `ldap.createClient`, + /// `ldap.DialURL`, etc.). Distinct from `DatabaseConnection` so the + /// type-qualified `LdapClient.search` rule fires only on directory + /// search APIs rather than every DB receiver with a `search` method. + LdapClient, + /// An XPath query / evaluation client (`DOMXPath`, `XPath`, + /// `XPathExpression`, `lxml.etree.XPath`, etc.). Distinct from + /// `DatabaseConnection` so the type-qualified `XPathClient.query` / + /// `XPathClient.evaluate` rules fire only on XPath APIs rather than + /// every receiver with a generic `query` / `evaluate` method (avoids + /// collision with PHP `$pdo->query` SQL_QUERY sink). + XPathClient, + /// A pre-parsed template object whose `process` / `merge` / + /// `render` method renders bound data through an already-compiled + /// template body. The SSTI vector arises when the template *source* + /// fed to the constructor / factory was attacker-influenced; the + /// render-time call site is the sink. Currently populated by + /// `new freemarker.template.Template(...)` and `cfg.getTemplate(name)`; + /// the type-qualified resolver rewrites `tpl.process(...)` → + /// `Template.process` so the existing flat SSTI rule fires on idiomatic + /// `Template tpl = new Template(...); tpl.process(model, out)` + /// shapes. + Template, + /// An XML parser instance produced by a JAXP factory call + /// (`DocumentBuilderFactory.newDocumentBuilder()`, + /// `SAXParserFactory.newSAXParser()`, `XMLReaderFactory.createXMLReader()`). + /// `DOMXPath` and friends keep their own `XPathClient` tag. Used so + /// the type-qualified `XmlParser.parse` rule fires on instance-style + /// calls (`builder.parse(input)`) without needing a flat-rule + /// matcher per concrete subclass.
Also gates the XXE config-fact + /// suppression: only XmlParser-typed receivers consult the + /// [`crate::ssa::xml_config::XmlParserConfigResult`] sidecar. + XmlParser, /// A framework-injected DTO body whose field types are known. /// Populated when a parameter is recognised as a typed extractor and /// the DTO class / struct / Pydantic model is resolvable in scope. /// Strictly additive, without a DTO definition, callers fall back /// to name-only resolution. Dto(DtoFields), + /// An object created with `Object.create(null)` — has no prototype + /// chain, so subscript-write keys cannot pollute `Object.prototype`. + /// Populated for JS/TS values whose constructor call is + /// `Object.create(null)`. The PROTOTYPE_POLLUTION suppression at the + /// synthetic `__index_set__` sink consults this fact (via SSA receiver + /// value) so the suppression is flow-sensitive: if a phi join leaves + /// the receiver only sometimes null-prototyped, the fact widens to + /// `Unknown` and the sink fires on the unsafe path. + NullPrototypeObject, } /// structural carrier for a recognised DTO type. Maps @@ -99,6 +142,10 @@ impl TypeKind { Self::Url => Some("URL"), Self::RequestBuilder => Some("RequestBuilder"), Self::JpaCriteriaQuery => Some("JpaCriteriaQuery"), + Self::LdapClient => Some("LdapClient"), + Self::XPathClient => Some("XPathClient"), + Self::XmlParser => Some("XmlParser"), + Self::Template => Some("Template"), _ => None, } } @@ -288,9 +335,11 @@ pub fn is_safe_query_object_arg( /// authoritative, and consumers see Unknown instead of a wrong /// type tag. /// -/// `_args` and `_consts` are kept on the signature so we can later -/// add arg-shape narrowing when class-literal lowering captures -/// `Foo.class` as an arg-use. +/// `_args` and `_consts` allow arg-shape narrowing when an arg's +/// constant value distinguishes overloads. 
Reserved for future Java +/// `createQuery(Foo.class)` shape (the `Object.create(null)` case is +/// driven by the `produces_null_proto` CFG flag instead, since a +/// literal `null` arg leaves no SSA value to inspect). fn arg_aware_call_type( lang: Lang, callee: &str, @@ -392,6 +441,40 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> { "createCriteriaUpdate" | "createCriteriaDelete" | "createTupleQuery" | "subquery" => { Some(TypeKind::JpaCriteriaQuery) } + // LDAP directory-service clients. `new InitialDirContext(env)` / + // `new InitialLdapContext(env, ctls)` instantiate the JNDI LDAP + // provider; `new LdapTemplate(...)` / `LdapTemplate.` is the + // Spring LDAP wrapper. Both expose `search` / `searchByEntity` + // / `searchForObject` overloads where filter/DN strings are LDAP + // injection sinks. + "InitialDirContext" | "InitialLdapContext" | "LdapTemplate" => { + Some(TypeKind::LdapClient) + } + // JAXP factory-produced XML parser instances. Each is + // XXE-vulnerable by default until hardened with + // `setFeature(FEATURE_SECURE_PROCESSING, true)` (or + // disallow-doctype-decl, etc.). The + // [`crate::ssa::xml_config::XmlParserConfigResult`] sidecar + // suppresses the XXE bit at the type-qualified `XmlParser.parse` + // sink when the receiver carries a hardening fact. + "newDocumentBuilder" | "newSAXParser" | "getXMLReader" | "newXMLReader" + | "createXMLReader" => Some(TypeKind::XmlParser), + // `XPathFactory.newXPath()` returns a JAXP `XPath` instance. + // Mapping it to `XPathClient` lets the type-qualified resolver + // pick up `xpath.evaluate(...)` against the existing + // `XPathClient.evaluate` rule and lets the + // [`crate::ssa::xpath_config::XPathConfigResult`] sidecar + // suppress XPATH_INJECTION when the receiver was bound to an + // `XPathVariableResolver`. + "newXPath" => Some(TypeKind::XPathClient), + // Apache FreeMarker `new Template(name, reader, cfg)` / + // `cfg.getTemplate(name)`.
The `Template` instance's + // `.process(model, out)` is an SSTI sink when the + // constructor source / template body came from tainted + // input. Type-qualified resolution rewrites + // `tpl.process(...)` → `Template.process` against the + // existing flat rule in `labels/java.rs`. + "Template" | "getTemplate" => Some(TypeKind::Template), _ => None, }, Lang::JavaScript | Lang::TypeScript => match suffix { @@ -409,6 +492,12 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> { // `elementsMap.get(id)`, `origIdToDuplicateId.get(...)`, // `groupIdMapForOperation.set(...)` shapes). "Map" | "Set" | "WeakMap" | "WeakSet" | "Array" => Some(TypeKind::LocalCollection), + // ldapjs client factory: `ldap.createClient({ url: '…' })` returns + // a Client whose `search(base, opts, cb)` is an LDAP injection + // sink. Match the qualified callee text rather than the bare + // `createClient` suffix to avoid widening to unrelated factories + // with the same verb name. + "createClient" if callee.contains("ldap") => Some(TypeKind::LdapClient), _ => None, }, Lang::Python => { @@ -429,6 +518,15 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> { } else if suffix == "open" && !callee.contains('.') { // Bare `open()` is file I/O in Python Some(TypeKind::FileHandle) + } else if callee == "ldap.initialize" + || callee == "ldap3.Connection" + || (callee.ends_with(".initialize") && callee.contains("ldap")) + { + // python-ldap: `conn = ldap.initialize(url)` returns an + // LDAPObject whose `search_s` / `search_ext_s` methods are + // LDAP-injection sinks. ldap3: `ldap3.Connection(server, ...)` + // returns a Connection with a `search()` method.
+ Some(TypeKind::LdapClient) } else { None } @@ -442,6 +540,10 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> { Some(TypeKind::FileHandle) } else if callee.contains("url.") && suffix == "Parse" { Some(TypeKind::Url) + } else if callee.contains("ldap.") && matches!(suffix, "Dial" | "DialURL" | "DialTLS") { + // go-ldap (`github.com/go-ldap/ldap/v3`): `conn, _ := ldap.DialURL(url)` + // returns `*ldap.Conn` whose `Search(req)` is an LDAP-injection sink. + Some(TypeKind::LdapClient) } else { None } @@ -451,6 +553,10 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> { "curl_init" => Some(TypeKind::HttpClient), "fopen" => Some(TypeKind::FileHandle), "SplFileObject" => Some(TypeKind::FileHandle), + // DOMXPath: `$xp = new DOMXPath($doc)`. `$xp->query($expr)` / + // `$xp->evaluate($expr)` are XPath-injection sinks; without a + // distinct TypeKind they collide with the bare `query` SQL sink. + "DOMXPath" => Some(TypeKind::XPathClient), _ => None, }, Lang::C => match suffix { @@ -524,6 +630,11 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> { Some(TypeKind::DatabaseConnection) } else if after_colons.starts_with("File.") && matches!(suffix, "open" | "new") { Some(TypeKind::FileHandle) + } else if callee.contains("Net::LDAP") && matches!(suffix, "new" | "open") { + // net-ldap gem: `Net::LDAP.new(host: ...)` / `Net::LDAP.open` + // returns a connection whose `search(base:, filter:)` accepts + // an attacker-influenceable filter expression. + Some(TypeKind::LdapClient) } else { None } @@ -768,8 +879,7 @@ pub fn analyze_types( /// Same as [`analyze_types`] but seeds [`SsaOp::Param`] values with /// per-position [`TypeKind`] facts from `param_types` (parallel-vec to /// the function's BodyMeta.params). An entry of `None` (or an out-of- -/// range index) leaves the value at the default Param fact (Unknown), -/// preserving the pre-Phase-3 behaviour.
+/// range index) leaves the value at the default Param fact (Unknown). pub fn analyze_types_with_param_types( body: &SsaBody, cfg: &Cfg, @@ -810,8 +920,7 @@ pub fn analyze_types_with_param_types( SsaOp::Param { index } => { // Seed from the function's BodyMeta.param_types when // a TypeKind was recovered at CFG construction time. - // Out-of-range / None entries fall back to Unknown, - // matching the pre-Phase-3 behaviour. + // Out-of-range / None entries fall back to Unknown. match param_types.get(*index).and_then(|t| t.clone()) { Some(tk) => TypeFact::from_kind(tk), None => TypeFact::unknown(), @@ -820,7 +929,19 @@ pub fn analyze_types_with_param_types( SsaOp::SelfParam => TypeFact::from_kind(TypeKind::Object), SsaOp::CatchParam => TypeFact::from_kind(TypeKind::Object), SsaOp::Call { callee, args, .. } => { - if let Some(ty) = lang.and_then(|l| constructor_type(l, callee)) { + // CFG marks `Object.create(null)` (and future + // null-prototype constructors) at lowering time. + // Honour it ahead of generic constructor / arg-aware + // dispatch so the returned SsaValue carries + // `NullPrototypeObject` for prototype-pollution + // suppression. + let null_proto = cfg + .node_weight(inst.cfg_node) + .map(|ni| ni.call.produces_null_proto) + .unwrap_or(false); + if null_proto { + TypeFact::from_kind(TypeKind::NullPrototypeObject) + } else if let Some(ty) = lang.and_then(|l| constructor_type(l, callee)) { TypeFact::from_kind(ty) } else if let Some(ty) = lang.and_then(|l| arg_aware_call_type(l, callee, args, consts)) @@ -1667,7 +1788,7 @@ mod tests { /// Param values seeded from `param_types` must surface /// the right TypeKind for downstream sink suppression. An out-of- - /// range index falls back to Unknown (the pre-Phase-3 default). + /// range index falls back to Unknown. #[test] fn param_types_seed_param_value_facts() { use crate::cfg::Cfg; @@ -1728,7 +1849,7 @@ mod tests { // Index 99 is out of range → falls back to Unknown. 
assert_eq!(result.get_type(SsaValue(1)), Some(&TypeKind::Unknown)); - // Empty slice = pre-Phase-3 behaviour. + // Empty slice = type-unaware fallback (analyze_types path). let result2 = analyze_types(&body, &cfg, &consts, Some(Lang::Java)); assert_eq!(result2.get_type(SsaValue(0)), Some(&TypeKind::Unknown)); } @@ -2364,7 +2485,7 @@ mod tests { )); } - // ── JPA Criteria query suppression (Phase: real-repo openmrs FP) ─── + // ── JPA Criteria query suppression (real-repo openmrs FP) ───────── // // These tests pin the `TypeKind::JpaCriteriaQuery` variant + the // `is_safe_query_object_arg` predicate + the diff --git a/src/ssa/xml_config.rs b/src/ssa/xml_config.rs new file mode 100644 index 00000000..0ab55c43 --- /dev/null +++ b/src/ssa/xml_config.rs @@ -0,0 +1,614 @@ +//! Per-SSA-value XML-parser configuration tracking. +//! +//! Tracks "is this XML parser configured to disable external entities / DTD +//! resolution" facts on parser-receiver SSA values. When a parse-class sink +//! is reached and the receiver is provably configured for secure processing, +//! the XXE bit is stripped from the sink's cap mask. +//! +//! The pass is intentionally a small forward dataflow run alongside type-fact +//! analysis. It does NOT flow through the SSA taint engine's worklist. Phi +//! nodes propagate the meet of operand configs (a safe flag is "set" only +//! when all reaching operands set it, while the unsafe `external_entities` +//! flag is set when any operand sets it), and copy assignments propagate the +//! receiver's config. Recognised setter calls update the receiver's config in place; +//! identity-style transformer calls that produce a child parser (e.g. +//! `factory.newDocumentBuilder()`) inherit the receiver's config into the +//! result value. + +use std::collections::HashMap; + +use super::const_prop::ConstLattice; +use super::ir::*; +use crate::cfg::Cfg; +use crate::symbol::Lang; +use serde::{Deserialize, Serialize}; + +/// Receiver-instance config carried forward from setter calls. +/// +/// All flags default to `false` (parser may be unsafe).
A `true` flag +/// means: we have proven this parser was hardened along this control-flow +/// path. The XXE-suppression check is `secure_processing || +/// disallow_doctype` — either gate is sufficient to neutralise external +/// entity resolution in JAXP / lxml / xml2js. +/// +/// `external_entities` is the *unsafe* polarity: when set to `true`, the +/// parser was explicitly opted into external-entity resolution (e.g. +/// `XMLParser(resolve_entities=True)`). A parse call with this flag +/// retains XXE even if the language default would otherwise be safe. +#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Serialize, Deserialize)] +pub struct XmlParserConfig { + pub secure_processing: bool, + pub disallow_doctype: bool, + pub external_entities: bool, +} + +impl XmlParserConfig { + /// True when the parser is provably hardened against XXE. + pub fn is_secure(&self) -> bool { + (self.secure_processing || self.disallow_doctype) && !self.external_entities + } + + /// Phi-meet: a flag survives only when *both* operands set it. Used + /// when the parser variable was reassigned across branches. + fn meet(&self, other: &Self) -> Self { + XmlParserConfig { + secure_processing: self.secure_processing && other.secure_processing, + disallow_doctype: self.disallow_doctype && other.disallow_doctype, + // Unsafe polarity: ANY branch enabling external entities + // contaminates the join. Conservative w.r.t. XXE. + external_entities: self.external_entities || other.external_entities, + } + } + + /// Union: caller updates the same receiver across multiple setter + /// calls. All known-safe flags accumulate; unsafe is sticky. + fn union(&self, other: &Self) -> Self { + XmlParserConfig { + secure_processing: self.secure_processing || other.secure_processing, + disallow_doctype: self.disallow_doctype || other.disallow_doctype, + external_entities: self.external_entities || other.external_entities, + } + } +} + +/// Result of XML-parser config analysis. 
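The meet/union split above is easiest to see on a phi with a missing operand. The sketch below is a standalone model, not crate code: `Flags` and `fold_phi` are hypothetical names that mirror the three `XmlParserConfig` flags. It folds a phi's operands with `meet`, treating an operand that has no recorded fact as the all-false default, which is how the Pass 2 loop treats an absent map entry.

```rust
/// Standalone model of the three-flag parser-config lattice.
/// `Flags` and `fold_phi` are hypothetical names, not crate items.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Flags {
    pub secure_processing: bool,
    pub disallow_doctype: bool,
    pub external_entities: bool,
}

impl Flags {
    /// Hardened only if a safe gate is set and no unsafe opt-in is present.
    pub fn is_secure(&self) -> bool {
        (self.secure_processing || self.disallow_doctype) && !self.external_entities
    }

    /// Phi join: safe flags intersect, the unsafe flag is sticky.
    pub fn meet(&self, o: &Self) -> Self {
        Flags {
            secure_processing: self.secure_processing && o.secure_processing,
            disallow_doctype: self.disallow_doctype && o.disallow_doctype,
            external_entities: self.external_entities || o.external_entities,
        }
    }

    /// Successive setters on one receiver: every flag accumulates.
    pub fn union(&self, o: &Self) -> Self {
        Flags {
            secure_processing: self.secure_processing || o.secure_processing,
            disallow_doctype: self.disallow_doctype || o.disallow_doctype,
            external_entities: self.external_entities || o.external_entities,
        }
    }
}

/// Fold a phi's operands with `meet`. An operand with no recorded fact
/// (`None`) contributes the all-false default, which clears every safe flag.
pub fn fold_phi(operands: &[Option<Flags>]) -> Option<Flags> {
    operands
        .iter()
        .map(|o| o.unwrap_or_default())
        .reduce(|acc, f| acc.meet(&f))
}
```

A parser hardened on only one branch therefore comes out of the join unhardened, while two setters on the same receiver combine through `union`.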
+#[derive(Clone, Debug, Default, Serialize, Deserialize)] +pub struct XmlParserConfigResult { + pub configs: HashMap<SsaValue, XmlParserConfig>, +} + +impl XmlParserConfigResult { + /// True when the value carries a config fact proving secure processing. + pub fn is_secure(&self, v: SsaValue) -> bool { + self.configs.get(&v).is_some_and(|c| c.is_secure()) + } + + /// True when the value was explicitly opted into external-entity + /// resolution (e.g. lxml `resolve_entities=True`). + pub fn is_unsafe_explicit(&self, v: SsaValue) -> bool { + self.configs.get(&v).is_some_and(|c| c.external_entities) + } +} + +/// Suppress the `Cap::XXE` bit when the receiver of an XXE-class sink +/// was provably hardened. Returns `true` when XXE should be stripped +/// from the sink's cap mask. +/// +/// Conservative defaults: +/// * No receiver SSA value (free function) → returns `false` (cannot +/// prove safety, fall through to existing classification). +/// * Receiver carries no config fact → returns `false`. +/// * `external_entities` flag is set → returns `false` even if a safe +/// flag is also set, since the unsafe opt-in dominates. +pub fn xxe_safe(receiver: Option<SsaValue>, xml_config: &XmlParserConfigResult) -> bool { + let Some(rv) = receiver else { + return false; + }; + xml_config.is_secure(rv) +} + +/// Per-call analysis result: how this call mutates the parser-config +/// universe. +#[allow(dead_code)] // SeedResult reserved for future constructor-driven seeding +enum ConfigEffect { + /// No effect on parser configuration. + None, + /// Update the call's receiver in place by OR-ing the supplied config + /// into its current config. Used for setter calls + /// (`factory.setFeature(FEATURE_SECURE_PROCESSING, true)`). + UpdateReceiver(XmlParserConfig), + /// Inherit the receiver's config into the call's result value. 
+ /// Used for identity-style transformer calls + /// (`factory.newDocumentBuilder()` returns a builder that shares + /// the factory's hardening state).
+ InheritFromReceiver, + /// Initialise the call's result value with the supplied config. + /// Used for constructor calls whose options reveal the unsafe-explicit + /// opt-in (`new XMLParser({ processEntities: true })`, + /// `lxml.etree.XMLParser(resolve_entities=True)`). + SeedResult(XmlParserConfig), +} + +/// Classify a Call instruction's effect on the parser-config universe. +/// +/// `arg_const` looks up the const-lattice value for an SSA arg position +/// (returns `None` if the position is out of range or the SSA value is +/// not a known constant). Setter detection consults arg-0 (the feature +/// name) and arg-1 (the boolean flag). +/// +/// `arg_idents` is the matching CFG-level `info.call.arg_uses` vector +/// (per-position identifier text from the source AST). Used to recover +/// non-literal feature names like `XMLConstants.FEATURE_SECURE_PROCESSING` +/// or bare identifiers (`FEATURE_SECURE_PROCESSING`, `Boolean.TRUE`) +/// that const-propagation cannot fold to a literal. +/// +/// `arg_literals` is the matching CFG-level +/// `info.call.arg_string_literals` vector (per-position literal text; +/// strings, booleans, and null/nil/None tokens). Used to recover the +/// boolean polarity of `setFeature(NAME, true)` since SSA lowering does +/// not bind boolean arg literals to any SSA value (`arg_uses` skips them +/// because they are not identifiers). +fn classify_call( + lang: Lang, + callee: &str, + args: &[smallvec::SmallVec<[SsaValue; 2]>], + receiver: Option<SsaValue>, + consts: &HashMap<SsaValue, ConstLattice>, + arg_idents: &[Vec<String>], + arg_literals: &[Option<String>], +) -> ConfigEffect { + let suffix = callee.rsplit(['.', ':']).next().unwrap_or(callee); + + // Helper: look up the const lattice for arg N's first SSA value. + let arg_const = |n: usize| -> Option<&ConstLattice> { + args.get(n) + .and_then(|vals| vals.first()) + .and_then(|v| consts.get(v)) + }; + // Helper: text of the const lattice (for string/identifier comparison).
+ let arg_text = |n: usize| -> Option<String> { + match arg_const(n)? { + ConstLattice::Str(s) => Some(s.clone()), + ConstLattice::Bool(b) => Some(b.to_string()), + ConstLattice::Int(i) => Some(i.to_string()), + _ => None, + } + }; + // Helper: textual identifier(s) at arg N from the CFG node. Non-literal + // feature names (`XMLConstants.FEATURE_SECURE_PROCESSING`, bare + // `FEATURE_SECURE_PROCESSING`, etc.) surface here. + let arg_ident_text = |n: usize| -> Vec<&str> { + arg_idents + .get(n) + .map(|v| v.iter().map(|s| s.as_str()).collect()) + .unwrap_or_default() + }; + let arg_bool = |n: usize| -> Option<bool> { + if let Some(b) = arg_const(n).and_then(|c| match c { + ConstLattice::Bool(b) => Some(*b), + ConstLattice::Str(s) => match s.as_str() { + "True" | "true" => Some(true), + "False" | "false" => Some(false), + _ => None, + }, + _ => None, + }) { + return Some(b); + } + // Fallback: tree-sitter classifies `true` / `false` as bare + // identifiers in some grammars. Inspect the arg's use list. + for tok in arg_ident_text(n) { + match tok { + "true" | "True" | "Boolean.TRUE" => return Some(true), + "false" | "False" | "Boolean.FALSE" => return Some(false), + _ => {} + } + } + // Fallback: literal tokens lifted by `extract_arg_string_literals` + // (booleans / null / numeric tokens). Java `setFeature(NAME, true)` + // does not bind the `true` token to any SSA value, but the literal + // surfaces here so the polarity can still be read. + if let Some(Some(lit)) = arg_literals.get(n) { + match lit.as_str() { + "true" | "True" | "Boolean.TRUE" => return Some(true), + "false" | "False" | "Boolean.FALSE" => return Some(false), + _ => {} + } + } + None + }; + + match lang { + Lang::Java => match suffix { + // `factory.setFeature(NAME, BOOL)` — the canonical JAXP + //
Three feature names matter: + // * `FEATURE_SECURE_PROCESSING` (XMLConstants.FEATURE_SECURE_PROCESSING) + // * `http://apache.org/xml/features/disallow-doctype-decl` + // * `http://xml.org/sax/features/external-general-entities` + // * `http://xml.org/sax/features/external-parameter-entities` + // The first two harden by being SET TRUE; the entity ones + // harden by being SET FALSE. + "setFeature" => { + if receiver.is_none() { + return ConfigEffect::None; + } + let name_lit = arg_text(0).unwrap_or_default(); + let name_idents = arg_ident_text(0); + let value = arg_bool(1); + let any_ident = |needle: &str| name_idents.iter().any(|s| s.contains(needle)); + let mut cfg = XmlParserConfig::default(); + if name_lit == "FEATURE_SECURE_PROCESSING" + || name_lit.contains("XMLConstants.FEATURE_SECURE_PROCESSING") + || name_lit.contains("javax.xml.XMLConstants/feature/secure-processing") + || any_ident("FEATURE_SECURE_PROCESSING") + { + if value == Some(true) { + cfg.secure_processing = true; + } + } else if name_lit.contains("disallow-doctype-decl") + || any_ident("disallow-doctype-decl") + { + if value == Some(true) { + cfg.disallow_doctype = true; + } + } else if (name_lit.contains("external-general-entities") + || name_lit.contains("external-parameter-entities") + || name_lit.contains("load-external-dtd") + || any_ident("external-general-entities") + || any_ident("external-parameter-entities") + || any_ident("load-external-dtd")) + && value == Some(false) + { + cfg.disallow_doctype = true; + } + if cfg == XmlParserConfig::default() { + ConfigEffect::None + } else { + ConfigEffect::UpdateReceiver(cfg) + } + } + // `factory.setExpandEntityReferences(false)` — + // DocumentBuilderFactory legacy hardening switch. 
+ "setExpandEntityReferences" => { + if receiver.is_none() { + return ConfigEffect::None; + } + if arg_bool(0) == Some(false) { + ConfigEffect::UpdateReceiver(XmlParserConfig { + disallow_doctype: true, + ..Default::default() + }) + } else { + ConfigEffect::None + } + } + // `factory.newDocumentBuilder()` / `factory.newSAXParser()` / + // `parser.getXMLReader()` propagate the hardening state from + // the factory (receiver) onto the produced parser instance + // (return value). Without this propagation, a hardened + // factory's child builder would parse with no config. + "newDocumentBuilder" | "newSAXParser" | "getXMLReader" | "newXMLReader" => { + if receiver.is_some() { + ConfigEffect::InheritFromReceiver + } else { + ConfigEffect::None + } + } + _ => ConfigEffect::None, + }, + Lang::Python => { + // `lxml.etree.XMLParser(resolve_entities=False)` — the lxml + // parser default resolves entities; the keyword argument + // changes that. Const-propagation will not generally see the + // kwarg value here (kwargs land in `info.call.kwargs`, not + // positional args), so we treat the constructor as a + // best-effort initialiser keyed off the keyword's literal + // text via the static-map. When neither keyword surfaces, + // the parser keeps the default-empty config. + if callee.ends_with("etree.XMLParser") || suffix == "XMLParser" { + // Positional kwargs aren't reliable here; rely on the + // call's static-map kwargs (handled by the per-callsite + // pass below). Fall through to None at this layer. + ConfigEffect::None + } else { + ConfigEffect::None + } + } + _ => ConfigEffect::None, + } +} + +/// Run the XML-parser config analysis on an SSA body. 
+pub fn analyze_xml_parser_config( + body: &SsaBody, + cfg: &Cfg, + consts: &HashMap<SsaValue, ConstLattice>, + lang: Option<Lang>, +) -> XmlParserConfigResult { + let Some(lang) = lang else { + return XmlParserConfigResult::default(); + }; + + let mut configs: HashMap<SsaValue, XmlParserConfig> = HashMap::new(); + + // Helper: read the kwargs attached to the original CFG node for the + // call instruction at hand. Used for languages where parser + // hardening flags arrive as keyword arguments (Python lxml). + let lookup_kwargs = |node_idx: petgraph::graph::NodeIndex| -> Vec<(String, Vec<String>)> { + cfg.node_weight(node_idx) + .map(|ni| ni.call.kwargs.clone()) + .unwrap_or_default() + }; + // Helper: read the positional arg-use identifier vectors (e.g. + // `XMLConstants.FEATURE_SECURE_PROCESSING` surfaces as a dotted path + // here even when const-prop folds it to nothing). + let lookup_arg_idents = |node_idx: petgraph::graph::NodeIndex| -> Vec<Vec<String>> { + cfg.node_weight(node_idx) + .map(|ni| ni.call.arg_uses.clone()) + .unwrap_or_default() + }; + // Helper: read the per-position literal-token vector + // (`arg_string_literals` lifts strings, booleans, null tokens, and + // numeric tokens — see `extract_arg_string_literals`). + let lookup_arg_literals = |node_idx: petgraph::graph::NodeIndex| -> Vec<Option<String>> { + cfg.node_weight(node_idx) + .map(|ni| ni.call.arg_string_literals.clone()) + .unwrap_or_default() + }; + + // Pass 1 — direct effects from Call instructions in source order. + // Setter updates and constructor seeds are effectively monotone + // (we OR safe flags onto the receiver / value), so a single pass is + // sufficient when phi nodes only appear after the setter. Pass 2 + // below handles phi/copy propagation. + for block in &body.blocks { + for inst in block.body.iter() { + if let SsaOp::Call { + callee, + args, + receiver, + .. 
+ } = &inst.op + { + // Python lxml.etree.XMLParser(resolve_entities=...): the + // kwarg lives on the CFG node's `kwargs` list, not in + // the SSA Call args. Inspect it directly.
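The Python branch that follows reads hardening flags from keyword arguments rather than positional SSA args. A minimal sketch of that mapping, assuming the kwargs arrive as `(name, value-tokens)` pairs in the `Vec<(String, Vec<String>)>` shape that `lookup_kwargs` returns; `PyParserFacts`, `has_any`, and `facts_from_kwargs` are hypothetical names, and the flag mapping copies the branch's rules (`resolve_entities=True` is the unsafe opt-in, while `resolve_entities=False` or `no_network=True` count as hardening).

```rust
/// Hypothetical two-flag subset of the parser config relevant to lxml.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct PyParserFacts {
    pub disallow_doctype: bool,
    pub external_entities: bool,
}

/// True when any surfaced value token matches one of `lits`.
fn has_any(vals: &[String], lits: &[&str]) -> bool {
    vals.iter().any(|v| lits.contains(&v.as_str()))
}

/// Map `(keyword, value-tokens)` pairs from an `XMLParser(...)` call to facts.
pub fn facts_from_kwargs(kwargs: &[(String, Vec<String>)]) -> PyParserFacts {
    let mut facts = PyParserFacts::default();
    for (name, vals) in kwargs {
        match name.as_str() {
            // Explicit opt-in to entity resolution: the unsafe polarity.
            "resolve_entities" if has_any(vals, &["True", "true"]) => facts.external_entities = true,
            "resolve_entities" if has_any(vals, &["False", "false"]) => facts.disallow_doctype = true,
            "no_network" if has_any(vals, &["True", "true"]) => facts.disallow_doctype = true,
            // Unknown keywords or non-literal values leave the facts unchanged.
            _ => {}
        }
    }
    facts
}
```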
+ if matches!(lang, Lang::Python) + && (callee.ends_with("etree.XMLParser") + || callee.rsplit(['.', ':']).next() == Some("XMLParser")) + { + let kwargs = lookup_kwargs(inst.cfg_node); + for (name, values) in &kwargs { + if name == "resolve_entities" { + // Look up the literal text on the matching + // argument; tree-sitter-python keywords surface + // the value identifier in the `values` slot. + if values.iter().any(|v| v == "True" || v == "true") { + let entry = configs.entry(inst.value).or_default(); + entry.external_entities = true; + } else if values.iter().any(|v| v == "False" || v == "false") { + let entry = configs.entry(inst.value).or_default(); + entry.disallow_doctype = true; + } + } + if name == "no_network" && values.iter().any(|v| v == "True" || v == "true") + { + let entry = configs.entry(inst.value).or_default(); + entry.disallow_doctype = true; + } + } + continue; + } + + // JS/TS: `new XMLParser({ processEntities: true, ... })`. + // The fast-xml-parser constructor's option-object fields + // are not exposed via const-prop, but the CFG layer + // captures string-literal kwargs in the call's + // `arg_string_literals` for object-literal positions. + // For now, mark the result as unsafe-explicit only when + // the static-kwargs list carries `processEntities=true`. 
+ if matches!(lang, Lang::JavaScript | Lang::TypeScript) + && (callee.ends_with("XMLParser") || callee.ends_with(".XMLParser")) + { + let kwargs = lookup_kwargs(inst.cfg_node); + for (name, values) in &kwargs { + if name == "processEntities" && values.iter().any(|v| v == "true") { + let entry = configs.entry(inst.value).or_default(); + entry.external_entities = true; + } + } + continue; + } + + let arg_idents = lookup_arg_idents(inst.cfg_node); + let arg_literals = lookup_arg_literals(inst.cfg_node); + match classify_call( + lang, + callee, + args, + *receiver, + consts, + &arg_idents, + &arg_literals, + ) { + ConfigEffect::None => {} + ConfigEffect::UpdateReceiver(delta) => { + if let Some(rv) = *receiver { + let entry = configs.entry(rv).or_default(); + *entry = entry.union(&delta); + } + } + ConfigEffect::InheritFromReceiver => { + if let Some(rv) = *receiver + && let Some(parent) = configs.get(&rv).copied() + { + let entry = configs.entry(inst.value).or_default(); + *entry = entry.union(&parent); + } + } + ConfigEffect::SeedResult(seed) => { + let entry = configs.entry(inst.value).or_default(); + *entry = entry.union(&seed); + } + } + } + } + } + + // Pass 2 — fixed-point propagation through copy assignments and phi + // joins. Caps the iteration count: in practice 2-3 rounds suffice + // on intra-procedural shapes. 
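Pass 2 needs more than one sweep whenever a phi is visited before the copy that feeds it. The toy model below shows that ordering effect on a single boolean fact. `Op` and `propagate` are hypothetical and much smaller than the crate's `SsaOp`, but they keep the same rules: copies forward the source's fact, a phi gets the fact only when every operand has it, and the loop stops on the first round with no change or at a round cap.

```rust
use std::collections::HashMap;

/// Toy SSA operations (hypothetical; not the crate's `SsaOp`).
pub enum Op {
    /// `dst = src`
    Copy { dst: u32, src: u32 },
    /// `dst = phi(operands...)`
    Phi { dst: u32, operands: Vec<u32> },
}

/// Propagate a boolean "hardened" fact to a fixed point. Facts are only
/// ever added, so the loop terminates; `max_rounds` caps it anyway.
pub fn propagate(ops: &[Op], seeds: &HashMap<u32, bool>, max_rounds: usize) -> HashMap<u32, bool> {
    let mut facts = seeds.clone();
    // A value with no entry counts as "not hardened".
    let has = |facts: &HashMap<u32, bool>, v: &u32| facts.get(v).copied().unwrap_or(false);
    for _ in 0..max_rounds {
        let mut changed = false;
        for op in ops {
            let (dst, val) = match op {
                Op::Copy { dst, src } => (*dst, has(&facts, src)),
                Op::Phi { dst, operands } => {
                    (*dst, !operands.is_empty() && operands.iter().all(|o| has(&facts, o)))
                }
            };
            if val && facts.get(&dst) != Some(&true) {
                facts.insert(dst, true);
                changed = true;
            }
        }
        if !changed {
            break;
        }
    }
    facts
}
```

With the phi listed first, the first round only fills in the copy and the second round resolves the phi. A cap of one round would leave the phi unresolved, which is why the pass iterates.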
+ for _ in 0..6 { + let mut changed = false; + for block in &body.blocks { + for inst in &block.phis { + if let SsaOp::Phi(operands) = &inst.op { + let mut acc: Option<XmlParserConfig> = None; + for (_, val) in operands { + let cfg_val = configs.get(val).copied().unwrap_or_default(); + acc = Some(match acc { + None => cfg_val, + Some(prev) => prev.meet(&cfg_val), + }); + } + if let Some(joined) = acc + && joined != XmlParserConfig::default() + { + let prev = configs.get(&inst.value).copied(); + if prev != Some(joined) { + configs.insert(inst.value, joined); + changed = true; + } + } + } + } + for inst in &block.body { + if let SsaOp::Assign(uses) = &inst.op + && uses.len() == 1 + && let Some(src_cfg) = configs.get(&uses[0]).copied() + && src_cfg != XmlParserConfig::default() + { + let prev = configs.get(&inst.value).copied().unwrap_or_default(); + let new_cfg = prev.union(&src_cfg); + if Some(new_cfg) != configs.get(&inst.value).copied() { + configs.insert(inst.value, new_cfg); + changed = true; + } + } + // InheritFromReceiver may need a re-pass when the + // receiver's config was set after the call itself was + // visited (e.g. the call appears in a later block whose + // dominator chain only resolves on the second iteration). + if let SsaOp::Call { + callee, + receiver: Some(rv), + ..
+ } = &inst.op + { + let suffix = callee.rsplit(['.', ':']).next().unwrap_or(callee); + let inherit = matches!(lang, Lang::Java) + && matches!( + suffix, + "newDocumentBuilder" | "newSAXParser" | "getXMLReader" | "newXMLReader" + ); + if inherit && let Some(parent) = configs.get(rv).copied() { + let prev = configs.get(&inst.value).copied().unwrap_or_default(); + let new_cfg = prev.union(&parent); + if Some(new_cfg) != configs.get(&inst.value).copied() + && new_cfg != XmlParserConfig::default() + { + configs.insert(inst.value, new_cfg); + changed = true; + } + } + } + } + } + if !changed { + break; + } + } + + XmlParserConfigResult { configs } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn default_config_is_unsafe() { + let c = XmlParserConfig::default(); + assert!(!c.is_secure()); + } + + #[test] + fn secure_processing_alone_is_safe() { + let c = XmlParserConfig { + secure_processing: true, + ..Default::default() + }; + assert!(c.is_secure()); + } + + #[test] + fn external_entities_overrides_safe_flag() { + let c = XmlParserConfig { + secure_processing: true, + external_entities: true, + ..Default::default() + }; + assert!(!c.is_secure()); + } + + #[test] + fn meet_keeps_only_intersection_of_safe_flags() { + let a = XmlParserConfig { + secure_processing: true, + disallow_doctype: true, + ..Default::default() + }; + let b = XmlParserConfig { + secure_processing: true, + ..Default::default() + }; + let m = a.meet(&b); + assert!(m.secure_processing); + assert!(!m.disallow_doctype); + } + + #[test] + fn meet_propagates_unsafe_flag() { + let a = XmlParserConfig { + secure_processing: true, + ..Default::default() + }; + let b = XmlParserConfig { + external_entities: true, + ..Default::default() + }; + let m = a.meet(&b); + // Unsafe sticky → no longer secure even though one branch was. 
+ assert!(!m.is_secure()); + } + + #[test] + fn xxe_safe_returns_false_without_receiver() { + let result = XmlParserConfigResult::default(); + assert!(!xxe_safe(None, &result)); + } + + #[test] + fn xxe_safe_uses_receiver_config() { + let mut configs = HashMap::new(); + configs.insert( + SsaValue(7), + XmlParserConfig { + secure_processing: true, + ..Default::default() + }, + ); + let result = XmlParserConfigResult { configs }; + assert!(xxe_safe(Some(SsaValue(7)), &result)); + assert!(!xxe_safe(Some(SsaValue(8)), &result)); + } +} diff --git a/src/ssa/xpath_config.rs b/src/ssa/xpath_config.rs new file mode 100644 index 00000000..5602e984 --- /dev/null +++ b/src/ssa/xpath_config.rs @@ -0,0 +1,235 @@ +//! Per-SSA-value XPath-receiver configuration tracking. +//! +//! Mirrors [`crate::ssa::xml_config`] but for `XPath` instances rather +//! than JAXP parser instances. Tracks "is this XPath receiver bound to +//! an `XPathVariableResolver`" along the control-flow path: when a +//! resolver has been bound, subsequent `xpath.evaluate(expr, ...)` calls +//! are treated as parameterised and the `XPATH_INJECTION` bit is +//! stripped from the sink's cap mask. +//! +//! Same engine shape as [`crate::ssa::xml_config::XmlParserConfigResult`]: +//! a small forward dataflow run alongside type-fact analysis. Phi nodes +//! propagate the meet of operand configs (a flag is "set" only when all +//! reaching operands set it), copy assignments propagate the receiver's +//! config, and `setXPathVariableResolver` calls update the receiver's +//! config in place. + +use std::collections::HashMap; + +use super::ir::*; +use crate::cfg::Cfg; +use crate::symbol::Lang; +use serde::{Deserialize, Serialize}; + +/// Receiver-instance config carried forward from `setXPathVariableResolver` +/// calls. All flags default to `false` (resolver not bound). A `true` +/// flag means: we have proven this XPath receiver was configured for +/// parameterised evaluation along this control-flow path. 
+#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Serialize, Deserialize)] +pub struct XPathReceiverConfig { + /// True when `xpath.setXPathVariableResolver(...)` has been called + /// on this receiver. Set by Pass 1 on the receiver SSA value; + /// propagated through phi joins (meet) and copy assignments (union). + pub has_resolver: bool, +} + +impl XPathReceiverConfig { + /// True when the receiver is provably bound to a variable resolver. + pub fn is_parameterised(&self) -> bool { + self.has_resolver + } + + /// Phi-meet: a flag survives only when *both* operands set it. Used + /// when the XPath variable was reassigned across branches and only + /// some branches bound a resolver. + fn meet(&self, other: &Self) -> Self { + XPathReceiverConfig { + has_resolver: self.has_resolver && other.has_resolver, + } + } + + /// Union: a flag survives when *either* operand sets it. Used for + /// copy propagation, which preserves the source value's flags on + /// the destination. + fn union(&self, other: &Self) -> Self { + XPathReceiverConfig { + has_resolver: self.has_resolver || other.has_resolver, + } + } +} + +/// Result of XPath-receiver config analysis. +#[derive(Clone, Debug, Default, Serialize, Deserialize)] +pub struct XPathConfigResult { + pub configs: HashMap<SsaValue, XPathReceiverConfig>, +} + +impl XPathConfigResult { + /// True when the value carries a config fact proving resolver + /// binding. + pub fn is_parameterised(&self, v: SsaValue) -> bool { + self.configs.get(&v).is_some_and(|c| c.is_parameterised()) + } +} + +/// Suppress the `Cap::XPATH_INJECTION` bit when the receiver of an XPath +/// `evaluate` / `compile` sink was provably bound to a variable +/// resolver. Returns `true` when XPATH_INJECTION should be stripped +/// from the sink's cap mask. +/// +/// Conservative defaults: +/// * No receiver SSA value (free function) → returns `false` (cannot +/// prove safety, fall through to existing classification).
+/// * Receiver carries no config fact → returns `false`. +pub fn xpath_safe(receiver: Option<SsaValue>, xpath_config: &XPathConfigResult) -> bool { + let Some(rv) = receiver else { + return false; + }; + xpath_config.is_parameterised(rv) +} + +/// Run the XPath-receiver config analysis on an SSA body. +/// +/// Currently models Java's `setXPathVariableResolver` only — the only +/// language-level resolver-binding API for XPath in the existing +/// detection corpus. PHP's `DOMXPath::registerPhpFunctions()` is a +/// different mechanism (PHP function registration) and not modelled +/// here. +pub fn analyze_xpath_config(body: &SsaBody, cfg: &Cfg, lang: Option<Lang>) -> XPathConfigResult { + let Some(lang) = lang else { + return XPathConfigResult::default(); + }; + if !matches!(lang, Lang::Java) { + return XPathConfigResult::default(); + } + + let mut configs: HashMap<SsaValue, XPathReceiverConfig> = HashMap::new(); + + // Pass 1 — direct effects from Call instructions in source order. + // `setXPathVariableResolver` updates the call's receiver in place; + // every such call is treated as a resolver binding. Ruling out a null + // argument would require a const-prop fact, so the pass assumes the + // bound value is non-null (matching the XML parser-config setter + // semantics). This is the optimistic direction: a + // `setXPathVariableResolver(null)` call would still suppress the finding. + for block in &body.blocks { + for inst in block.body.iter() { + if let SsaOp::Call { + callee, receiver, .. + } = &inst.op + { + let suffix = callee.rsplit(['.', ':']).next().unwrap_or(callee); + if suffix == "setXPathVariableResolver" + && let Some(rv) = receiver + { + let entry = configs.entry(*rv).or_default(); + entry.has_resolver = true; + } + } + } + } + + if configs.is_empty() { + return XPathConfigResult::default(); + } + + // Pass 2 — fixed-point propagation through copy assignments and + // phi joins. Caps the iteration count: in practice 2-3 rounds + // suffice on intra-procedural shapes. 
+ let _ = cfg; // CFG retained for parity with `xml_config`; reserved for + // future kwarg-driven seeds (e.g.
 constructor options). + for _ in 0..6 { + let mut changed = false; + for block in &body.blocks { + for inst in &block.phis { + if let SsaOp::Phi(operands) = &inst.op { + let mut acc: Option<XPathReceiverConfig> = None; + for (_, val) in operands { + let cfg_val = configs.get(val).copied().unwrap_or_default(); + acc = Some(match acc { + None => cfg_val, + Some(prev) => prev.meet(&cfg_val), + }); + } + if let Some(joined) = acc + && joined != XPathReceiverConfig::default() + { + let prev = configs.get(&inst.value).copied(); + if prev != Some(joined) { + configs.insert(inst.value, joined); + changed = true; + } + } + } + } + for inst in &block.body { + if let SsaOp::Assign(uses) = &inst.op + && uses.len() == 1 + && let Some(src_cfg) = configs.get(&uses[0]).copied() + && src_cfg != XPathReceiverConfig::default() + { + let prev = configs.get(&inst.value).copied().unwrap_or_default(); + let new_cfg = prev.union(&src_cfg); + if Some(new_cfg) != configs.get(&inst.value).copied() { + configs.insert(inst.value, new_cfg); + changed = true; + } + } + } + } + if !changed { + break; + } + } + + XPathConfigResult { configs } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn default_config_is_unparameterised() { + let c = XPathReceiverConfig::default(); + assert!(!c.is_parameterised()); + } + + #[test] + fn has_resolver_marks_parameterised() { + let c = XPathReceiverConfig { has_resolver: true }; + assert!(c.is_parameterised()); + } + + #[test] + fn meet_keeps_intersection() { + let a = XPathReceiverConfig { has_resolver: true }; + let b = XPathReceiverConfig { + has_resolver: false, + }; + let m = a.meet(&b); + assert!(!m.has_resolver); + } + + #[test] + fn meet_both_set_keeps_set() { + let a = XPathReceiverConfig { has_resolver: true }; + let b = XPathReceiverConfig { has_resolver: true }; + let m = a.meet(&b); + assert!(m.has_resolver); + } + + #[test] + fn xpath_safe_returns_false_without_receiver() { + let result = XPathConfigResult::default(); + assert!(!xpath_safe(None,
&result)); + } + + #[test] + fn xpath_safe_uses_receiver_config() { + let mut configs = HashMap::new(); + configs.insert(SsaValue(7), XPathReceiverConfig { has_resolver: true }); + let result = XPathConfigResult { configs }; + assert!(xpath_safe(Some(SsaValue(7)), &result)); + assert!(!xpath_safe(Some(SsaValue(8)), &result)); + } +} diff --git a/src/summary/mod.rs b/src/summary/mod.rs index a62e5dc6..46283569 100644 --- a/src/summary/mod.rs +++ b/src/summary/mod.rs @@ -49,7 +49,7 @@ pub struct SinkSite { impl SinkSite { /// Dedup key: two sites with the same `(file_rel, line, col, cap)` /// describe the same consumption and collapse on merge. - pub(crate) fn dedup_key(&self) -> (&str, u32, u32, u16) { + pub(crate) fn dedup_key(&self) -> (&str, u32, u32, u32) { (self.file_rel.as_str(), self.line, self.col, self.cap.bits()) } @@ -277,18 +277,18 @@ pub struct FuncSummary { pub param_names: Vec, // ── Taint behaviour ────────────────────────────────────────────────── - // Stored as raw `u16` so serde doesn't need to know about `bitflags`. + // Stored as raw `u32` so serde doesn't need to know about `bitflags`. /// Caps this function **introduces**, i.e. the return value carries /// freshly‑tainted data even if no argument was tainted. - pub source_caps: u16, + pub source_caps: u32, /// Caps this function **cleans**, passing tainted data through this /// function strips the corresponding bits. - pub sanitizer_caps: u16, + pub sanitizer_caps: u32, /// Caps this function **consumes unsafely**, calling it with tainted /// arguments that still carry these bits is a finding. - pub sink_caps: u16, + pub sink_caps: u32, /// Which parameter indices (0‑based) flow through to the return value. #[serde(default)] @@ -1163,7 +1163,7 @@ impl GlobalSummaries { /// Returns `(source_caps, sanitizer_caps, sink_caps, propagating_params)` /// per key. Used by the SCC fixed-point loop to detect when an iteration /// has not changed any summary, i.e. convergence. 
- pub fn snapshot_caps(&self) -> HashMap)> { + pub fn snapshot_caps(&self) -> HashMap)> { self.by_key .iter() .map(|(k, s)| { diff --git a/src/summary/ssa_summary.rs b/src/summary/ssa_summary.rs index 007d87a6..a3b65714 100644 --- a/src/summary/ssa_summary.rs +++ b/src/summary/ssa_summary.rs @@ -283,7 +283,7 @@ pub struct SsaFuncSummary { /// /// Default-empty (most functions don't field-mutate their params) /// and elided from serialised output via `skip_serializing_if` so - /// pre-Phase-5 summaries deserialise cleanly without migration. + /// older summaries without this field deserialise cleanly without migration. /// Built by extraction in `summary_extract.rs` when the per-body /// [`crate::pointer::PointsToFacts`] are available /// (`NYX_POINTER_ANALYSIS=1`); empty otherwise. diff --git a/src/summary/tests.rs b/src/summary/tests.rs index e03037a5..1220ac6d 100644 --- a/src/summary/tests.rs +++ b/src/summary/tests.rs @@ -9,7 +9,7 @@ fn cap_sites(cap: Cap) -> SmallVec<[SinkSite; 1]> { smallvec![SinkSite::cap_only(cap)] } -fn make(name: &str, src: u16, san: u16, sink: u16) -> FuncSummary { +fn make(name: &str, src: u32, san: u32, sink: u32) -> FuncSummary { FuncSummary { name: name.into(), file_path: "test.rs".into(), @@ -263,7 +263,7 @@ fn lookup_same_lang_returns_all_matches() { } #[test] -fn u16_caps_round_trip_serde() { +fn cap_bits_round_trip_serde() { let summary = FuncSummary { name: "dangerous".into(), file_path: "test.rs".into(), @@ -292,9 +292,96 @@ fn u16_caps_round_trip_serde() { assert!(!json.contains("propagates_taint")); } +/// Every new cap class persists across the serde JSON round-trip used +/// for SQLite blob storage and the `/debug` endpoint. Catches a +/// width-mismatch (cap bits truncated to u16) as a hard fail rather than +/// silent zeroing of the upper bits. 
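The test that follows lists fourteen existing caps alongside the seven new ones, more than a `u16` can hold once every cap occupies a distinct bit. The sketch below shows the failure mode the test guards against. The bit positions are hypothetical; the point is that an `as` cast to `u16` silently drops everything from bit 16 upward, while a checked conversion reports the overflow.

```rust
/// Hypothetical bit positions: the highest bit a `u16` can hold, and a
/// cap placed just above it.
pub const TOP_U16_CAP: u32 = 1 << 15;
pub const NEW_CAP: u32 = 1 << 16;

/// What a pre-widening `u16` field would keep: `as` silently drops the
/// upper 16 bits.
pub fn truncate_to_u16(bits: u32) -> u16 {
    bits as u16
}

/// Checked alternative: refuse masks that do not fit instead of zeroing them.
pub fn checked_to_u16(bits: u32) -> Option<u16> {
    if bits <= u32::from(u16::MAX) {
        Some(bits as u16)
    } else {
        None
    }
}
```

A `sink_caps` mask made only of new caps would truncate to `0`, which reads as "no sink" rather than failing loudly.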
+#[test] +fn new_cap_classes_round_trip_serde() { + let new_caps = Cap::LDAP_INJECTION + | Cap::XPATH_INJECTION + | Cap::HEADER_INJECTION + | Cap::OPEN_REDIRECT + | Cap::SSTI + | Cap::XXE + | Cap::PROTOTYPE_POLLUTION; + + // Sanity: bit-width must accommodate every new cap. + assert_ne!( + new_caps.bits(), + 0, + "every new cap must carry a non-zero bit" + ); + assert_eq!( + new_caps.bits().count_ones(), + 7, + "exactly seven bits must be set across the new caps" + ); + + // Bit collisions with existing caps would mask a finding. + let existing = Cap::ENV_VAR + | Cap::HTML_ESCAPE + | Cap::SHELL_ESCAPE + | Cap::URL_ENCODE + | Cap::JSON_PARSE + | Cap::FILE_IO + | Cap::FMT_STRING + | Cap::SQL_QUERY + | Cap::DESERIALIZE + | Cap::SSRF + | Cap::CODE_EXEC + | Cap::CRYPTO + | Cap::UNAUTHORIZED_ID + | Cap::DATA_EXFIL; + assert!( + (existing & new_caps).is_empty(), + "new caps must not collide" + ); + + let summary = FuncSummary { + name: "all_new_classes".into(), + file_path: "fixture.rs".into(), + lang: "rust".into(), + param_count: 0, + param_names: vec![], + source_caps: 0, + sanitizer_caps: 0, + sink_caps: new_caps.bits(), + propagating_params: vec![], + propagates_taint: false, + tainted_sink_params: vec![], + callees: vec![], + ..Default::default() + }; + + // serde JSON round-trip (the on-disk SQLite format). + let json = serde_json::to_string(&summary).unwrap(); + let back: FuncSummary = serde_json::from_str(&json).unwrap(); + assert_eq!(back.sink_caps, new_caps.bits()); + assert!(back.sink_caps().contains(Cap::LDAP_INJECTION)); + assert!(back.sink_caps().contains(Cap::PROTOTYPE_POLLUTION)); + + // Cap registry must surface a rule id for each new cap. 
+ for cap in [ + Cap::LDAP_INJECTION, + Cap::XPATH_INJECTION, + Cap::HEADER_INJECTION, + Cap::OPEN_REDIRECT, + Cap::SSTI, + Cap::XXE, + Cap::PROTOTYPE_POLLUTION, + ] { + let meta = crate::labels::cap_rule_meta(cap) + .unwrap_or_else(|| panic!("missing CAP_RULE_REGISTRY entry for {cap:?}")); + assert!(meta.rule_id.starts_with("taint-")); + assert!(!meta.title.is_empty()); + assert!(!meta.description.is_empty()); + } +} + #[test] fn backward_compat_u8_json_deserializes() { - // Old u8-range values still deserialize correctly into u16 fields + // Old u8-range values still deserialize correctly into u32 fields let json = r#"{ "name": "old_func", "file_path": "legacy.py", @@ -948,6 +1035,8 @@ fn make_callee_body( type_facts: crate::ssa::type_facts::TypeFactResult { facts: std::collections::HashMap::new(), }, + xml_parser_config: crate::ssa::xml_config::XmlParserConfigResult::default(), + xpath_config: crate::ssa::xpath_config::XPathConfigResult::default(), alias_result: crate::ssa::alias::BaseAliasResult::empty(), points_to: crate::ssa::heap::PointsToResult::empty(), module_aliases: std::collections::HashMap::new(), @@ -1413,7 +1502,7 @@ fn fs_with( arity: usize, kind: FuncKind, disambig: Option, - sink_bits: u16, + sink_bits: u32, ) -> (FuncKey, FuncSummary) { let key = FuncKey { lang: Lang::Java, @@ -1611,7 +1700,7 @@ fn interop_lookup_returns_none_when_disambig_none_matches_many() { // and only disambig distinguishes them, the relaxed interop lookup must // return None rather than picking arbitrarily. 
let mut gs = GlobalSummaries::new(); - let mk = |disambig: u32, bits: u16| { + let mk = |disambig: u32, bits: u32| { let k = FuncKey { lang: Lang::Go, namespace: "lib.go".into(), @@ -2102,7 +2191,7 @@ fn method_summary( container: &str, name: &str, arity: usize, - sink_bits: u16, + sink_bits: u32, ) -> (FuncKey, FuncSummary) { fs_with( namespace, @@ -2119,7 +2208,7 @@ fn free_summary( namespace: &str, name: &str, arity: usize, - sink_bits: u16, + sink_bits: u32, ) -> (FuncKey, FuncSummary) { fs_with( namespace, @@ -2912,7 +3001,7 @@ fn legacy_summary( param_names: Vec, kind: FuncKind, container: &str, - sink: u16, + sink: u32, ) -> FuncSummary { FuncSummary { name: name.into(), @@ -3778,7 +3867,7 @@ fn cross_file_devirt_does_not_union_unrelated_findbyids() { use crate::labels::Cap; use crate::symbol::FuncKey; - fn method_summary(name: &str, container: &str, file: &str, sink_caps: u16) -> FuncSummary { + fn method_summary(name: &str, container: &str, file: &str, sink_caps: u32) -> FuncSummary { FuncSummary { name: name.into(), file_path: file.into(), @@ -3989,7 +4078,7 @@ mod hierarchy_widened_tests { container: &str, name: &str, arity: usize, - sink_bits: u16, + sink_bits: u32, hierarchy_edges: Vec<(String, String)>, ) -> (FuncKey, FuncSummary) { let (key, mut summary) = fs_with( diff --git a/src/taint/mod.rs b/src/taint/mod.rs index 07c1b23b..8d90be58 100644 --- a/src/taint/mod.rs +++ b/src/taint/mod.rs @@ -580,9 +580,19 @@ pub(crate) fn analyse_file_with_lowered( f.source.index(), !f.path_validated, f.path_hash, + f.effective_sink_caps.bits(), + ) + }); + all_findings.dedup_by_key(|f| { + ( + f.body_id, + f.sink, + f.source, + f.path_validated, + f.path_hash, + f.effective_sink_caps.bits(), ) }); - all_findings.dedup_by_key(|f| (f.body_id, f.sink, f.source, f.path_validated, f.path_hash)); // 5. Assign stable finding IDs now that `body_id` has been set and // the dedup has picked the final set of distinct flows. 
The ID @@ -679,9 +689,118 @@ fn containment_order(bodies: &[BodyCfg]) -> Vec { order } +/// Build a `var_name → TypeKind` map from a body's optimised SSA + type-fact +/// result. Used by [`analyse_multi_body`] to forward closure-captured types +/// from a parent body into its children, so that bound-variable receiver +/// idioms (`const c = ldap.createClient(...); function f() { c.search(...) }`) +/// pick up `TypeKind::LdapClient` on the inner reference via the +/// [`ssa_transfer::resolve_type_qualified_labels`] receiver scan. +/// +/// Conflict policy: if the same `var_name` reaches multiple SSA values with +/// distinct `TypeKind`s the entry is dropped — propagating an ambiguous type +/// into a child body would fabricate facts, while dropping it just falls back +/// to the existing structural resolution paths. +fn extract_named_type_facts( + ssa: &crate::ssa::SsaBody, + type_facts: &crate::ssa::type_facts::TypeFactResult, +) -> HashMap { + use crate::ssa::type_facts::TypeKind; + let mut acc: HashMap = HashMap::new(); + let mut conflicts: HashSet = HashSet::new(); + for block in &ssa.blocks { + for inst in block.phis.iter().chain(block.body.iter()) { + let Some(name) = inst.var_name.as_deref() else { + continue; + }; + if conflicts.contains(name) { + continue; + } + let Some(kind) = type_facts.get_type(inst.value) else { + continue; + }; + if matches!(kind, TypeKind::Unknown) { + continue; + } + match acc.get(name) { + Some(existing) if existing != kind => { + acc.remove(name); + conflicts.insert(name.to_string()); + } + Some(_) => {} + None => { + acc.insert(name.to_string(), kind.clone()); + } + } + } + } + acc +} + +/// Inject parent-known closure-capture types into a per-body +/// [`crate::ssa::type_facts::TypeFactResult`]. +/// +/// Scoped lowering ([`crate::ssa::lower_to_ssa_with_params`]) injects a +/// `SsaOp::Param` (or `SsaOp::SelfParam`) at the entry block for every +/// free / closure-captured variable read by the body. 
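A minimal sketch of the conflict policy used by `extract_named_type_facts`, assuming a simplified stand-in enum (`Kind`) and plain `(name, kind)` observations in place of the crate's `SsaBody` / `TypeFactResult` walk. `Kind` and `named_kinds` are hypothetical names. `Unknown` observations are skipped, and a name that reaches two different concrete kinds is dropped and never re-added.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the crate's `TypeKind`.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Kind {
    Unknown,
    LdapClient,
    Str,
}

/// Fold `(variable name, inferred kind)` observations into a name -> kind
/// map. `Unknown` is skipped; a name seen with two different concrete kinds
/// is removed and recorded as a conflict so later observations cannot
/// re-add it.
fn named_kinds(observations: &[(&str, Kind)]) -> HashMap<String, Kind> {
    let mut acc: HashMap<String, Kind> = HashMap::new();
    let mut conflicts: HashSet<String> = HashSet::new();
    for (name, kind) in observations {
        if *kind == Kind::Unknown || conflicts.contains(*name) {
            continue;
        }
        match acc.get(*name) {
            // Ambiguous: forwarding either kind would fabricate a fact.
            Some(existing) if existing != kind => {
                acc.remove(*name);
                conflicts.insert(name.to_string());
            }
            Some(_) => {}
            None => {
                acc.insert(name.to_string(), kind.clone());
            }
        }
    }
    acc
}
```

Dropping an ambiguous name leaves the child body on its existing structural resolution paths, which is the safer failure mode.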
The per-body type +/// analysis can only seed declared formal-parameter types (via +/// `BodyMeta.param_types`); free variables are left as `TypeKind::Unknown` +/// because their definition lives in an enclosing body whose SSA is not +/// in scope. +/// +/// This pass walks the entry block's synthetic prologue and, for each +/// external Param whose name resolves in `parent_var_types`, inserts the +/// matching [`crate::ssa::type_facts::TypeFact`] into `type_facts.facts`. +/// Existing non-`Unknown` facts (e.g. one produced by `BodyMeta.param_types` +/// seeding for a real formal that shares a name) are never overwritten; only +/// the `Unknown` placeholders seeded for unparameterised Params are replaced. +fn inject_external_type_facts( + ssa: &crate::ssa::SsaBody, + type_facts: &mut crate::ssa::type_facts::TypeFactResult, + parent_var_types: &HashMap, +) { + use crate::ssa::ir::SsaOp; + use crate::ssa::type_facts::TypeFact; + if parent_var_types.is_empty() || ssa.blocks.is_empty() { + return; + } + for inst in ssa.blocks[0].body.iter() { + if !matches!(inst.op, SsaOp::Param { .. } | SsaOp::SelfParam) { + continue; + } + if type_facts.facts.contains_key(&inst.value) { + // `analyze_types_with_param_types` may have already typed this + // value via a non-Unknown entry from BodyMeta.param_types; in + // that case the formal-parameter declaration wins. Note: the + // analysis seeds an Unknown placeholder for unparameterised + // Param ops, so we still need to override Unknown entries. + if !matches!( + type_facts.facts.get(&inst.value).map(|f| &f.kind), + Some(crate::ssa::type_facts::TypeKind::Unknown) + ) { + continue; + } + } + let Some(name) = inst.var_name.as_deref() else { + continue; + }; + let Some(kind) = parent_var_types.get(name) else { + continue; + }; + let nullable = matches!(kind, crate::ssa::type_facts::TypeKind::Null); + type_facts.facts.insert( + inst.value, + TypeFact { + kind: kind.clone(), + nullable, + }, + ); + } +} + /// Analyse a single body with an optional parent seed.
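The injection rule in `inject_external_type_facts` can be sketched the same way. This uses hypothetical stand-ins (`Kind`, `Fact`, integer value ids, `inject_parent_types`) rather than the crate's SSA types: a parent-known type fills a captured Param only when the child has no fact for it or holds an `Unknown` placeholder, so a declared formal type still wins.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for `TypeKind` and `TypeFact`.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Kind {
    Unknown,
    Null,
    LdapClient,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct Fact {
    kind: Kind,
    nullable: bool,
}

/// `params` lists `(SSA value id, captured variable name)` for the
/// entry-block Param ops of a child body.
fn inject_parent_types(
    params: &[(u32, &str)],
    facts: &mut HashMap<u32, Fact>,
    parent: &HashMap<String, Kind>,
) {
    for (value, name) in params {
        // A concrete fact (e.g. from a declared formal type) wins; only a
        // missing entry or an `Unknown` placeholder may be filled in.
        if let Some(existing) = facts.get(value) {
            if existing.kind != Kind::Unknown {
                continue;
            }
        }
        let Some(kind) = parent.get(*name) else {
            continue;
        };
        let fact = Fact {
            kind: kind.clone(),
            nullable: *kind == Kind::Null,
        };
        facts.insert(*value, fact);
    }
}
```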
/// /// Shared logic extracted from `analyse_multi_body` to avoid deep nesting. +#[allow(clippy::type_complexity)] fn analyse_body_with_seed( body: &BodyCfg, lang: Lang, @@ -698,9 +817,11 @@ fn analyse_body_with_seed( seed: Option<&HashMap>, import_bindings: Option<&crate::cfg::ImportBindings>, cross_file_bodies: Option<&std::collections::HashMap>, + parent_var_types: Option<&HashMap>, ) -> ( Vec, Option>, + Option>, ) { let cfg = &body.graph; let entry = body.entry; @@ -757,12 +878,21 @@ fn analyse_body_with_seed( match ssa_result { Ok(mut ssa_body) => { - let opt = crate::ssa::optimize_ssa_with_param_types( + let mut opt = crate::ssa::optimize_ssa_with_param_types( &mut ssa_body, cfg, Some(lang), &body.meta.param_types, ); + // Forward parent-body type facts onto closure-captured Param ops + // before any consumer reads `opt.type_facts`. This is the lever + // that makes bound-variable receiver idioms work in scoped bodies + // (`let c = ldap.createClient(...); function f() { c.search(...) }`) + // — without it the inner `c` SSA value stays Unknown because the + // per-body type-fact pass cannot see the enclosing definition. + if let Some(pvt) = parent_var_types { + inject_external_type_facts(&ssa_body, &mut opt.type_facts, pvt); + } if tracing::enabled!(tracing::Level::TRACE) { tracing::trace!( func = body.meta.name.as_deref().unwrap_or(""), @@ -811,6 +941,8 @@ fn analyse_body_with_seed( receiver_seed: None, const_values: Some(&opt.const_values), type_facts: Some(&opt.type_facts), + xml_parser_config: Some(&opt.xml_parser_config), + xpath_config: Some(&opt.xpath_config), ssa_summaries, extra_labels, base_aliases: Some(&opt.alias_result), @@ -909,7 +1041,16 @@ fn analyse_body_with_seed( &transfer, body_id, ); - (findings, Some(exit_state)) + // Snapshot named TypeKinds so child bodies can pick up + // closure-captured types (e.g. an outer `LdapClient` flowing + // into an inner function via free-variable read). 
+ let named_types = extract_named_type_facts(&ssa_body, &opt.type_facts); + let named_types = if named_types.is_empty() { + None + } else { + Some(named_types) + }; + (findings, Some(exit_state), named_types) } Err(e) => { // SSA lowering produced no analyzable body. We still surface @@ -929,7 +1070,7 @@ fn analyse_body_with_seed( // Drain the collector so the note does not bleed into the // next body (which will call reset on entry, but be explicit). let _ = ssa_transfer::take_body_engine_notes(); - (Vec::new(), None) + (Vec::new(), None, None) } } } @@ -967,6 +1108,14 @@ fn analyse_multi_body( HashMap, > = HashMap::new(); + // Per-body `var_name → TypeKind` snapshots, used to forward closure- + // captured types from parent bodies into their children's type-fact + // results. Only populated when a body produces a non-empty set of + // typed named values, i.e. it has at least one named SSA value with + // a concrete `TypeKind` after optimisation. + let mut body_var_types: HashMap> = + HashMap::new(); + // ── Pass 1: lexical containment propagation ────────────────────── for &idx in &order { let body = &file_cfg.bodies[idx]; @@ -975,8 +1124,12 @@ fn analyse_multi_body( .meta .parent_body_id .and_then(|pid| body_exit_states.get(&pid)); + let parent_var_types = body + .meta + .parent_body_id + .and_then(|pid| body_var_types.get(&pid)); - let (findings, exit_state) = analyse_body_with_seed( + let (findings, exit_state, var_types) = analyse_body_with_seed( body, lang, namespace, @@ -990,6 +1143,7 @@ fn analyse_multi_body( parent_seed, import_bindings, cross_file_bodies, + parent_var_types, ); tracing::debug!( body_id = body.meta.id.0, @@ -1003,6 +1157,9 @@ fn analyse_multi_body( if let Some(es) = exit_state { body_exit_states.insert(body.meta.id, es); } + if let Some(vt) = var_types { + body_var_types.insert(body.meta.id, vt); + } } // ── Pass 2: JS/TS iterative convergence ────────────────────────── @@ -1163,8 +1320,12 @@ fn analyse_multi_body( .meta .parent_body_id 
.and_then(|pid| body_exit_states.get(&pid)); + let parent_var_types = body + .meta + .parent_body_id + .and_then(|pid| body_var_types.get(&pid)); - let (findings, exit_state) = analyse_body_with_seed( + let (findings, exit_state, var_types) = analyse_body_with_seed( body, lang, namespace, @@ -1178,11 +1339,15 @@ fn analyse_multi_body( parent_seed, import_bindings, cross_file_bodies, + parent_var_types, ); // Phase-B: replace (not append) this body's findings // in the cache. Previous rounds' findings for this // body are superseded by the new round's output. findings_by_body.insert(body.meta.id, findings); + if let Some(vt) = var_types { + body_var_types.insert(body.meta.id, vt); + } if let Some(es) = exit_state { // Phase-C Gauss-Seidel: immediately publish this // body's filtered exit into `current_seed` and @@ -2073,6 +2238,8 @@ fn augment_summaries_with_child_sinks( receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: Some(summaries), extra_labels: None, base_aliases: None, @@ -2135,6 +2302,8 @@ fn augment_summaries_with_child_sinks( receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: Some(summaries), extra_labels: None, base_aliases: None, diff --git a/src/taint/path_state.rs b/src/taint/path_state.rs index fbf7087e..493f6c5e 100644 --- a/src/taint/path_state.rs +++ b/src/taint/path_state.rs @@ -30,6 +30,26 @@ pub enum PredicateKind { /// and the **false branch is the validated path**. Use inverted polarity /// when applying branch predicates. ShellMetaValidated, + /// Inline relative-URL validation: `x.startsWith("/")` / `x.starts_with("/")` + /// / `x.startswith("/")` / `strpos(x, "/") === 0`. The TRUE branch + /// constrains `x` to start with `/` (no scheme; note `//host` still passes), + /// the standard inline form of an open-redirect sanitiser when the + /// developer didn't extract a named helper.
Cap-aware: clears + /// [`crate::labels::Cap::OPEN_REDIRECT`] only on the validated branch + /// so non-redirect sinks downstream still fire on the residual taint. + /// Mirrors [`ShellMetaValidated`](Self::ShellMetaValidated) but with + /// non-inverted polarity (true branch is the validated path). + RelativeUrlValidated, + /// Inline URL-parse + host-allowlist validation: + /// `new URL(x).host === ALLOWED` (JS/TS), + /// `urlparse(x).netloc == ALLOWED` (Python), + /// `parsed.host_str() == Some(ALLOWED)` (Rust), `parsed.Host == ALLOWED` (Go). + /// The TRUE branch constrains the parsed URL's host to a developer-chosen + /// allowlist value, the canonical multi-statement open-redirect sanitiser + /// for absolute URLs. Cap-aware: clears + /// [`crate::labels::Cap::OPEN_REDIRECT`] only on the validated branch so + /// non-redirect sinks downstream still fire on residual taint. + HostAllowlistValidated, /// Bounded-length rejection: `x.len() > N` / `x.length < N` with N >= 2. /// /// Commonly paired with `ShellMetaValidated` in OR-chain rejection @@ -178,6 +198,324 @@ fn is_metachar_regex_class(text: &str) -> bool { false } +/// Check whether `text` is an inline relative-URL validation: a leading- +/// slash check on a string variable. Recognised shapes: +/// +/// * `.startsWith("/")` — JS/TS/Java/Kotlin +/// * `.starts_with("/")` — Rust +/// * `.startswith("/")` — Python +/// * `strpos($X, "/") === 0` / `mb_strpos(...)` — PHP +/// * `[0] === "/"` / `[0] == '/'` — JS/TS direct index +/// +/// Negation prefixes (`!`, `not`) are NOT stripped; the caller's +/// classification path handles those uniformly via the predicate +/// polarity inversion machinery. +fn is_leading_slash_check(text: &str) -> bool { + let lower = text.to_ascii_lowercase(); + // Method-call forms: `.startswith(` covers JS/TS/Java (`startsWith` + // lower-cases to `startswith`) and Python; `.starts_with(` covers Rust + // (unchanged by lower-casing). Keep the variants
Keep the variants + // explicit so we don't miss the underscore form. + for method in [".startswith(", ".starts_with("] { + if let Some(idx) = lower.find(method) { + let args_start = idx + method.len(); + if let Some(needle) = extract_first_string_arg(&lower[args_start..]) { + if needle == "/" { + return true; + } + } + } + } + // PHP `strpos($x, "/") === 0` / `mb_strpos($x, "/") === 0` — leading- + // slash detection via offset-zero substring match. Both equality + // forms (`===`, `==`) accepted; the `0` literal is the load-bearing + // bit. Conservative: requires the closing `=== 0` form; bare + // `strpos(...)` (truthy check) is not recognised. + for prefix in ["strpos(", "mb_strpos("] { + if let Some(start) = lower.find(prefix) { + let after = &lower[start + prefix.len()..]; + // Find the closing paren of the strpos call. + let mut depth = 1usize; + let bytes = after.as_bytes(); + let mut close = None; + let mut i = 0; + while i < bytes.len() { + match bytes[i] { + b'(' => depth += 1, + b')' => { + depth -= 1; + if depth == 0 { + close = Some(i); + break; + } + } + _ => {} + } + i += 1; + } + let Some(close) = close else { continue }; + let args = &after[..close]; + // Need at least one comma so we have two args. + let mut depth = 0i32; + let mut comma = None; + for (j, ch) in args.char_indices() { + match ch { + '(' | '[' | '{' => depth += 1, + ')' | ']' | '}' => depth -= 1, + ',' if depth == 0 => { + comma = Some(j); + break; + } + _ => {} + } + } + let Some(comma) = comma else { continue }; + let second = args[comma + 1..].trim(); + // Strip optional surrounding parens / quotes. + let needle = second.trim_matches(|c: char| c == '"' || c == '\''); + if needle != "/" { + continue; + } + // Tail after the strpos `)` should compare against 0 with + // `===` / `==`. Allow whitespace. 
+ let tail = after[close + 1..].trim_start(); + if let Some(rest) = tail.strip_prefix("===").or_else(|| tail.strip_prefix("==")) { + if rest.trim() == "0" { + return true; + } + } + } + } + // Direct subscript form: `[0] === '/'` / `[0] == "/"`. + // Conservative: the literal `[0]` immediately followed by an + // equality op and a single-char `/` literal. + for op in ["===", "=="] { + let probe = format!("[0] {}", op); + if let Some(idx) = lower.find(&probe) { + let after = lower[idx + probe.len()..].trim_start(); + if after.starts_with("'/'") || after.starts_with("\"/\"") { + return true; + } + } + // Without spaces around the operator: `[0]==='/'`. + let probe_tight = format!("[0]{}", op); + if let Some(idx) = lower.find(&probe_tight) { + let after = lower[idx + probe_tight.len()..].trim_start(); + if after.starts_with("'/'") || after.starts_with("\"/\"") { + return true; + } + } + } + false +} + +/// Check whether `text` is an inline URL-parse + host-allowlist validation. +/// +/// Recognises the canonical multi-statement open-redirect sanitiser shapes: +/// +/// * `new URL().host === ALLOWED` / `new URL().hostname === ALLOWED` +/// / `new URL().origin === ALLOWED` (JS/TS) — accepts `==` and `===`. +/// * `urlparse().netloc == ALLOWED` / `urlparse().hostname == ALLOWED` +/// (Python `urllib.parse.urlparse` and the `urlparse.urlparse` legacy alias) +/// — accepts `==`. +/// * `urllib.parse.urlparse().netloc == ALLOWED` (qualified Python form). +/// * `.host_str() == ALLOWED` (Rust `url::Url::host_str()`). +/// * `.Host == ALLOWED` / `.Hostname() == ALLOWED` +/// (Go `*url.URL` — case-sensitive capital `H`). +/// +/// The Rust/Go forms intentionally do not look for the parse call in the +/// condition text — those parse on a separate line (`let parsed = Url::parse(x)?`, +/// `parsed, err := url.Parse(x)`) and the validated branch then references +/// `parsed` directly as the redirect target. 
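For illustration, a reduced version of the leading-slash recogniser that covers only the method-call and `[0]` subscript forms (the PHP `strpos` branch is omitted). `is_leading_slash_guard` is a hypothetical name and this is a sketch, not the crate's implementation. Like the diff's `extract_first_string_arg(...) == "/"` comparison, it requires the whole first string argument to be `/`, so `x.startsWith("/admin")` is not treated as a relative-URL guard.

```rust
/// Hypothetical, reduced recogniser for inline leading-slash guards.
fn is_leading_slash_guard(cond: &str) -> bool {
    let lower = cond.to_ascii_lowercase();
    // `.startsWith(` lower-cases to `.startswith(`; Rust keeps `.starts_with(`.
    for method in [".startswith(", ".starts_with("] {
        if let Some(idx) = lower.find(method) {
            let arg = lower[idx + method.len()..].trim_start();
            // The whole first argument must be the literal "/" or '/'.
            if arg.starts_with("\"/\"") || arg.starts_with("'/'") {
                return true;
            }
        }
    }
    // Subscript form, with or without spaces: `x[0] === '/'`, `x[0]=="/"`.
    let compact: String = lower.chars().filter(|c| !c.is_whitespace()).collect();
    ["[0]===", "[0]=="].iter().any(|&op| {
        compact.find(op).map_or(false, |i| {
            let rest = &compact[i + op.len()..];
            rest.starts_with("'/'") || rest.starts_with("\"/\"")
        })
    })
}
```

A runtime `startsWith("/")` check also accepts protocol-relative values such as `//evil.example`, which browsers resolve against the current scheme, so this predicate records what the guard tests rather than proving the redirect stays on-host.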
Distinctive accessor names +/// (`.host_str()`, capital-`H` `.Host`/`.Hostname()`) gate the match so a bare +/// `u.host == X` (lowercase, ambiguous) still falls through to `Comparison`. +/// +/// The right-hand side may be a string literal or a bare identifier +/// (`ALLOWED_HOST` / `cfg.allowed_origin`) — what matters is that the +/// validation pins the parsed host to one fixed value, locking off the +/// scheme/authority that would otherwise let the redirect leave the trusted +/// origin. The membership form +/// `ALLOWED_HOSTS.includes(new URL().host)` / `urlparse().host in ALLOWED` +/// is intentionally NOT recognised here; those fall through to +/// `AllowlistCheck` whose generic validated-must mechanic already clears +/// every cap for the matched receiver / member token. +/// +/// Negation prefixes are not stripped; the caller's polarity-inversion +/// machinery handles `!`-wrapped forms uniformly. +fn is_host_allowlist_check(text: &str) -> bool { + let lower = text.to_ascii_lowercase(); + // Need an equality operator so we know the host is being pinned to a + // specific allowed value (not e.g. assigned, indexed, or used as a key). + if !(lower.contains("==") || lower.contains("!=")) { + return false; + } + let has_parse_call = lower.contains("new url(") + || lower.contains("urlparse(") + || lower.contains("url.parse(") + || lower.contains("urllib.parse.urlparse("); + if has_parse_call { + // Need a host-style accessor on the parse result. + return lower.contains(".host") + || lower.contains(".hostname") + || lower.contains(".netloc") + || lower.contains(".origin"); + } + // Multi-statement form: parse happened on a prior line. Match + // distinctive Rust/Go accessor names so we don't misclassify a + // generic `obj.host == X` field comparison. + // + // Rust: `parsed.host_str() == Some("x")` + // Go: `parsed.Host == "x"` / `parsed.Hostname() == "x"` + // + // `.host_str()` is Rust-specific (lowercase-stable identifier).
+ // `.Host`/`.Hostname()` use case-sensitive capital `H` to avoid + // matching lowercase `u.host` (which `host_allowlist_requires_parse_call` + // explicitly excludes). + if lower.contains(".host_str(") { + return true; + } + if has_capital_host_accessor(text) { + return true; + } + false +} + +/// Test whether `text` contains a Go-style capital-`H` URL host accessor: +/// `.Host` (followed by whitespace or `==`/`!=`) or `.Hostname(`. +fn has_capital_host_accessor(text: &str) -> bool { + if text.contains(".Hostname(") { + return true; + } + let mut rest = text; + while let Some(pos) = rest.find(".Host") { + let after = &rest[pos + ".Host".len()..]; + // Reject `.Hostname` (handled above) and any continuation that + // would make `.Host` part of a longer identifier (`.Hostess` etc.). + let next = after.chars().next(); + let is_terminator = match next { + None => true, + Some(c) => !c.is_ascii_alphanumeric() && c != '_', + }; + if is_terminator { + // Require an equality op somewhere after the accessor so it's + // a comparison, not e.g. an assignment target. + let trimmed = after.trim_start(); + if trimmed.starts_with("==") || trimmed.starts_with("!=") { + return true; + } + } + rest = after; + } + false +} + +/// Extract the parse-call argument from a host-allowlist condition. +/// +/// Inline form (single-statement parse + check, JS/TS/Python): +/// recognises `new URL()`, `urlparse()`, `URL.parse()`, +/// `urllib.parse.urlparse()`. Returns `Some("X")` when the argument is a +/// bare identifier (with optional `&` or PHP `$` sigil stripped). +/// +/// Multi-statement form (Rust/Go): recognises the receiver of `.host_str()`, +/// case-sensitive `.Host`/`.Hostname()` and returns the receiver identifier +/// (the parsed-URL var), which is what downstream code redirects on. +/// +/// Returns `None` for nested expressions / multi-arg calls so branch +/// narrowing doesn't widen to a non-existent var. 
Mirrors the conservative +/// target shape used by [`extract_validation_target`]. +fn extract_host_allowlist_target(text: &str) -> Option { + let lower = text.to_ascii_lowercase(); + for probe in [ + "new url(", + "urllib.parse.urlparse(", + "urlparse(", + "url.parse(", + ] { + if let Some(idx) = lower.find(probe) { + let args_start = idx + probe.len(); + if args_start <= text.len() { + if let Some(first_arg) = first_call_arg(&text[args_start..]) { + let first_arg = first_arg.strip_prefix('&').unwrap_or(first_arg).trim(); + let first_arg = first_arg.strip_prefix('$').unwrap_or(first_arg); + if !first_arg.is_empty() && is_identifier(first_arg) { + return Some(first_arg.to_string()); + } + } + } + } + } + // Multi-statement form: receiver of the host accessor is the + // parsed-URL var. Walk the original text (case-sensitive for Go). + extract_host_accessor_receiver(text) +} + +/// Walk `text` for `.host_str(` (Rust), `.Host` followed +/// by `==`/`!=` (Go), or `.Hostname(` (Go). Returns `Some(receiver)` +/// when the receiver is a bare identifier (optionally with a `&` deref-prefix +/// stripped, e.g. Rust `&parsed.host_str()`); `None` otherwise. +fn extract_host_accessor_receiver(text: &str) -> Option { + let probes: &[(&str, bool)] = &[ + (".host_str(", false), // Rust, case-stable + (".Hostname(", false), // Go + (".Host", true), // Go, requires `==`/`!=` after + ]; + for (probe, requires_eq) in probes { + if let Some(idx) = text.find(probe) { + if *requires_eq { + let after = &text[idx + probe.len()..]; + // Reject `.Hostname` (handled by its own probe) and any + // longer-identifier continuation. + if let Some(c) = after.chars().next() + && (c.is_ascii_alphanumeric() || c == '_') + { + continue; + } + let trimmed = after.trim_start(); + if !(trimmed.starts_with("==") || trimmed.starts_with("!=")) { + continue; + } + } + let before = &text[..idx]; + // Receiver = trailing identifier of `before`, optionally + // preceded by `&` (Rust deref). 
`parsed.foo.host_str()` + // would yield `foo`, which is not a parse var, so we + // reject a receiver preceded by `.` or `::` (a path, not a var). + let recv = trailing_identifier(before)?; + if before[..before.len() - recv.len()].ends_with(&['.', ':'][..]) { + return None; + } + return Some(recv); + } + } + None +} + +/// Walk back from the end of `s` and return the trailing identifier token. +/// +/// `&parsed` → `Some("parsed")`, `foo.bar` → `Some("bar")`, +/// `()` → `None`. Used by [`extract_host_accessor_receiver`] to pull the +/// parsed-URL var out of `parsed.host_str() == ...`. +fn trailing_identifier(s: &str) -> Option { + let bytes = s.as_bytes(); + let mut end = bytes.len(); + while end > 0 { + let c = bytes[end - 1]; + if c.is_ascii_alphanumeric() || c == b'_' { + end -= 1; + } else { + break; + } + } + if end == bytes.len() { + return None; + } + let ident = &s[end..]; + if ident.is_empty() || ident.as_bytes()[0].is_ascii_digit() { + return None; + } + Some(ident.to_string()) +} + /// Check whether `text` looks like a bounded-length rejection: /// `x.len() > N`, `x.len() < N`, `x.length >= N`, etc. where `N` is an /// integer literal >= 2. Excludes `> 0` / `>= 1` / `< 1`, those are @@ -330,6 +668,28 @@ pub fn classify_condition(text: &str) -> PredicateKind { return PredicateKind::ShellMetaValidated; } + // ── Inline relative-URL validation ────────────────────────────────── + // + // `x.startsWith("/")` (JS/TS/Java/Kotlin), `x.starts_with("/")` (Rust), + // `x.startswith("/")` (Python), `strpos($x, "/") === 0` (PHP). + // The TRUE branch constrains `x` to a leading-slash relative path — + // the canonical inline open-redirect sanitiser. Matched BEFORE + // AllowlistCheck (which would otherwise capture `.starts_with(`). + if is_leading_slash_check(text) { + return PredicateKind::RelativeUrlValidated; + } + + // ── Host-allowlist URL-parse validation ───────────────────────────── + // + // `new URL(x).host === ALLOWED` (JS/TS), `urlparse(x).netloc == ALLOWED` + // (Python), etc.
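A minimal sketch of receiver extraction for the Rust `.host_str()` form, assuming the intent stated in the comments (reject receivers reached through `.` or `::`). Because a trailing-identifier walk only ever returns `[A-Za-z0-9_]` characters, the sketch checks the text just before the receiver instead of the receiver itself. `trailing_ident` and `host_str_receiver` are hypothetical helpers, not the crate's functions.

```rust
/// Hypothetical helper: trailing `[A-Za-z0-9_]` run of `s`; `None` if the
/// run is empty or starts with a digit.
fn trailing_ident(s: &str) -> Option<&str> {
    let start = s
        .char_indices()
        .rev()
        .take_while(|&(_, c)| c.is_ascii_alphanumeric() || c == '_')
        .last()
        .map(|(i, _)| i)?;
    let ident = &s[start..];
    if ident.as_bytes()[0].is_ascii_digit() {
        None
    } else {
        Some(ident)
    }
}

/// Hypothetical helper: receiver of a Rust `.host_str(` accessor, e.g.
/// `parsed` in `parsed.host_str() == Some("a.example")`. Field paths such
/// as `cfg.url.host_str()` are rejected because `url` is not a parse var.
fn host_str_receiver(cond: &str) -> Option<&str> {
    let idx = cond.find(".host_str(")?;
    let before = &cond[..idx];
    let recv = trailing_ident(before)?;
    // Text before the receiver: `&` (deref) is fine, `.` or `:` is a path.
    let prefix = &before[..before.len() - recv.len()];
    if prefix.ends_with('.') || prefix.ends_with(':') {
        return None;
    }
    Some(recv)
}
```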
Matched BEFORE AllowlistCheck and the generic + // Comparison branch so the equality form classifies here. The membership + // form `ALLOWED.includes(new URL(x).host)` has no equality operator, so + // it is not matched here and still reaches AllowlistCheck. + if is_host_allowlist_check(text) { + return PredicateKind::HostAllowlistValidated; + } + // ── Allowlist / membership checks ──────────────────────────────────── if lower.contains(".includes(") || lower.contains(".include?(") @@ -552,6 +912,19 @@ pub fn classify_condition_with_target(text: &str) -> (PredicateKind, Option { + // Receiver of `.startsWith("/")` / `.startswith("/")` / + // `.starts_with("/")`, or first arg of `strpos($x, "/")`. + // Same machinery as ShellMetaValidated. + let target = extract_validation_target(text); + (kind, target) + } + PredicateKind::HostAllowlistValidated => { + // Parse-call argument (`new URL(x).host` → `x`) or Rust/Go + // accessor receiver (`parsed.host_str()` / `parsed.Host` → `parsed`). + let target = extract_host_allowlist_target(text); + (kind, target) + } PredicateKind::Comparison => { // `x === '/login'`, `x == 5`, `null != obj`, when exactly one // side is a literal, extract the identifier side as the target.
@@ -1731,6 +2104,150 @@ mod tests { assert!(is_bounded_length_check("x.len() > 2")); assert!(is_bounded_length_check("x.len() <= 256")); } + + // ── HostAllowlistValidated ──────────────────────────────────────────── + + #[test] + fn classify_host_allowlist_js_strict_eq() { + assert_eq!( + classify_condition("new URL(target).host === ALLOWED_HOST"), + PredicateKind::HostAllowlistValidated + ); + assert_eq!( + classify_condition("new URL(target).hostname === \"trusted.example.com\""), + PredicateKind::HostAllowlistValidated + ); + assert_eq!( + classify_condition("new URL(target).origin === ALLOWED_ORIGIN"), + PredicateKind::HostAllowlistValidated + ); + } + + #[test] + fn classify_host_allowlist_python_urlparse() { + assert_eq!( + classify_condition("urlparse(target).netloc == ALLOWED_HOST"), + PredicateKind::HostAllowlistValidated + ); + assert_eq!( + classify_condition("urllib.parse.urlparse(target).hostname == \"trusted.example.com\""), + PredicateKind::HostAllowlistValidated + ); + } + + #[test] + fn target_host_allowlist_extracts_parse_arg_js() { + let (kind, target) = + classify_condition_with_target("new URL(target).host === ALLOWED_HOST"); + assert_eq!(kind, PredicateKind::HostAllowlistValidated); + assert_eq!(target.as_deref(), Some("target")); + } + + #[test] + fn target_host_allowlist_extracts_parse_arg_python() { + let (kind, target) = + classify_condition_with_target("urlparse(target).netloc == ALLOWED_HOST"); + assert_eq!(kind, PredicateKind::HostAllowlistValidated); + assert_eq!(target.as_deref(), Some("target")); + } + + #[test] + fn host_allowlist_requires_parse_call() { + // Bare `.host == X` without a parse call is not host-allowlist. + let kind = classify_condition("u.host == ALLOWED_HOST"); + assert_ne!(kind, PredicateKind::HostAllowlistValidated); + } + + #[test] + fn host_allowlist_requires_equality_op() { + // `new URL(x)` without an equality op is not host-allowlist. 
+ let kind = classify_condition("new URL(target).host"); + assert_ne!(kind, PredicateKind::HostAllowlistValidated); + } + + // ── Multi-statement form: Rust `.host_str()` ────────────────────────── + + #[test] + fn classify_host_allowlist_rust_host_str() { + assert_eq!( + classify_condition("parsed.host_str() == Some(\"trusted.example.com\")"), + PredicateKind::HostAllowlistValidated + ); + } + + #[test] + fn target_host_allowlist_rust_host_str_extracts_receiver() { + let (kind, target) = + classify_condition_with_target("parsed.host_str() == Some(\"trusted.example.com\")"); + assert_eq!(kind, PredicateKind::HostAllowlistValidated); + assert_eq!(target.as_deref(), Some("parsed")); + } + + #[test] + fn target_host_allowlist_rust_host_str_strips_amp_deref() { + // `&parsed.host_str()` is not idiomatic but we still pull out the + // receiver via the trailing-identifier walk. + let (kind, target) = + classify_condition_with_target("&parsed.host_str() == Some(\"trusted.com\")"); + assert_eq!(kind, PredicateKind::HostAllowlistValidated); + assert_eq!(target.as_deref(), Some("parsed")); + } + + // ── Multi-statement form: Go `.Host` / `.Hostname()` ────────────────── + + #[test] + fn classify_host_allowlist_go_capital_host() { + assert_eq!( + classify_condition("parsed.Host == \"trusted.example.com\""), + PredicateKind::HostAllowlistValidated + ); + } + + #[test] + fn classify_host_allowlist_go_hostname_method() { + assert_eq!( + classify_condition("parsed.Hostname() == \"trusted.example.com\""), + PredicateKind::HostAllowlistValidated + ); + } + + #[test] + fn target_host_allowlist_go_extracts_receiver() { + let (kind, target) = + classify_condition_with_target("parsed.Host == \"trusted.example.com\""); + assert_eq!(kind, PredicateKind::HostAllowlistValidated); + assert_eq!(target.as_deref(), Some("parsed")); + } + + #[test] + fn target_host_allowlist_go_hostname_extracts_receiver() { + let (kind, target) = + classify_condition_with_target("parsed.Hostname() == 
\"trusted.example.com\""); + assert_eq!(kind, PredicateKind::HostAllowlistValidated); + assert_eq!(target.as_deref(), Some("parsed")); + } + + #[test] + fn host_allowlist_rejects_lowercase_host_field() { + // `.host` (lowercase) without a parse call must NOT match — that + // shape is too generic (could be any struct field named `host`). + let kind = classify_condition("u.host == ALLOWED_HOST"); + assert_ne!(kind, PredicateKind::HostAllowlistValidated); + } + + #[test] + fn host_allowlist_rejects_capital_host_without_eq() { + // `parsed.Host` used as a side-effect call argument, not a guard. + let kind = classify_condition("log(parsed.Host)"); + assert_ne!(kind, PredicateKind::HostAllowlistValidated); + } + + #[test] + fn host_allowlist_rejects_capital_host_substring_in_identifier() { + // `.Hostess` is NOT `.Host` — must not match. + let kind = classify_condition("party.Hostess == \"alice\""); + assert_ne!(kind, PredicateKind::HostAllowlistValidated); + } } #[cfg(test)] diff --git a/src/taint/ssa_transfer/events.rs b/src/taint/ssa_transfer/events.rs index eccf8755..ec424f5c 100644 --- a/src/taint/ssa_transfer/events.rs +++ b/src/taint/ssa_transfer/events.rs @@ -277,7 +277,14 @@ pub fn ssa_events_to_findings( ssa: &SsaBody, cfg: &Cfg, ) -> Vec { - type FindingDedupKey = (usize, usize, Option<(String, u32, u32)>); + // The dedup key includes `cap_bits` so the multi-gate dispatch can + // co-emit separate findings for distinct capabilities at the same + // (origin, sink) pair (e.g. PHP `header("Location: " . $url)` fires + // both HEADER_INJECTION and OPEN_REDIRECT, attributed by the gate + // filters' per-cap masks). Single-cap call sites are unaffected: + // every event in that case carries the same `sink_caps`, so the key + // collapses identically with or without the extra component. 
+ type FindingDedupKey = (usize, usize, Option<(String, u32, u32)>, u32); let mut findings = Vec::new(); let mut seen: HashSet<FindingDedupKey> = HashSet::new(); @@ -345,12 +352,14 @@ pub fn ssa_events_to_findings( .as_ref() .map(|l| (l.file_rel.clone(), l.line, l.col)); for (val, caps, origins) in &event.tainted_values { - let cap_specificity = (*caps & event.sink_caps).bits().count_ones() as u8; + let effective_caps = event.sink_caps & *caps; + let cap_specificity = effective_caps.bits().count_ones() as u8; for origin in origins { if seen.insert(( origin.node.index(), event.sink_node.index(), loc_key.clone(), + effective_caps.bits(), )) { let hop_count = block_distance(ssa, origin.node, event.sink_node); let flow_steps = reconstruct_flow_path(*val, origin, event.sink_node, ssa, cfg); diff --git a/src/taint/ssa_transfer/inline.rs b/src/taint/ssa_transfer/inline.rs index c3c6d4df..07dcbf02 100644 --- a/src/taint/ssa_transfer/inline.rs +++ b/src/taint/ssa_transfer/inline.rs @@ -21,7 +21,7 @@ pub(super) const MAX_INLINE_BLOCKS: usize = 500; /// Compact cache key: per-arg-position cap bits (sorted, non-empty /// only). Origin identity is not part of the key. #[derive(Clone, Debug, PartialEq, Eq, Hash)] -pub(crate) struct ArgTaintSig(pub(super) SmallVec<[(usize, u16); 4]>); +pub(crate) struct ArgTaintSig(pub(super) SmallVec<[(usize, u32); 4]>); /// Call-site-adapted result of inline-analyzing a callee. Built fresh /// per call site so origins point to the current caller's chain. @@ -79,7 +79,7 @@ pub(crate) struct ReturnShape { impl CachedInlineShape { /// Cap bits of the return value, or zero if this shape records "no /// return taint". Used by [`inline_cache_fingerprint`].
- fn return_caps_bits(&self) -> u16 { + fn return_caps_bits(&self) -> u32 { self.0.as_ref().map(|s| s.caps.bits()).unwrap_or(0) } } @@ -101,7 +101,7 @@ pub(crate) fn inline_cache_clear_epoch(cache: &mut InlineCache) { #[allow(dead_code)] pub(crate) fn inline_cache_fingerprint( cache: &InlineCache, -) -> HashMap<(FuncKey, ArgTaintSig), u16> { +) -> HashMap<(FuncKey, ArgTaintSig), u32> { cache .iter() .map(|(k, v)| (k.clone(), v.return_caps_bits())) diff --git a/src/taint/ssa_transfer/mod.rs b/src/taint/ssa_transfer/mod.rs index c3b62771..9aa952e8 100644 --- a/src/taint/ssa_transfer/mod.rs +++ b/src/taint/ssa_transfer/mod.rs @@ -105,6 +105,18 @@ pub struct SsaTaintTransfer<'a> { /// Type facts from type analysis. /// Used for type-aware sink filtering (e.g., suppress SQL injection for int-typed values). pub type_facts: Option<&'a crate::ssa::type_facts::TypeFactResult>, + /// XML-parser config facts. Used to suppress XXE bits at parse-class + /// sinks whose receiver was provably hardened + /// (`setFeature(FEATURE_SECURE_PROCESSING, true)`, etc.). Strictly + /// additive: `None` falls back to the existing flat / gated XXE + /// classification. + pub xml_parser_config: Option<&'a crate::ssa::xml_config::XmlParserConfigResult>, + /// XPath-receiver config facts. Used to suppress XPATH_INJECTION at + /// `evaluate` / `compile` sinks whose receiver was provably bound to + /// an `XPathVariableResolver` (parameterised-XPath shape). Strictly + /// additive: `None` falls back to the existing flat / gated XPATH + /// classification. + pub xpath_config: Option<&'a crate::ssa::xpath_config::XPathConfigResult>, /// Precise per-function SSA summaries for intra-file callee resolution. /// Checked before legacy FuncSummary resolution. /// @@ -1207,6 +1219,85 @@ fn apply_branch_predicates( } } + // RelativeUrlValidated: TRUE branch is the validated path + // (`x.startsWith("/")` succeeded → `x` cannot redirect off-host). 
+ // Cap-aware: clear `Cap::OPEN_REDIRECT` only; non-redirect sinks + // (XSS / SQLi / FILE_IO) downstream still fire on residual taint. + if kind == PredicateKind::RelativeUrlValidated && polarity { + for var in condition_vars { + let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new(); + for (val, _) in state.values.iter() { + if let Some(name) = ssa + .value_defs + .get(val.0 as usize) + .and_then(|vd| vd.var_name.as_deref()) + { + if name == var { + to_clear.push(*val); + } + } + } + for val in to_clear { + if let Some(taint) = state.get(val).cloned() { + let new_caps = taint.caps & !Cap::OPEN_REDIRECT; + if new_caps.is_empty() { + state.remove(val); + } else { + state.set( + val, + VarTaint { + caps: new_caps, + origins: taint.origins, + uses_summary: taint.uses_summary, + }, + ); + } + } + } + } + } + + // HostAllowlistValidated: TRUE branch is the validated path + // (`new URL(x).host === ALLOWED` succeeded → `x` cannot redirect off-host). + // Cap-aware: clear `Cap::OPEN_REDIRECT` only; non-redirect sinks downstream + // still fire on the residual taint caps. Mirrors the + // `RelativeUrlValidated` handler exactly; the only difference is the + // recogniser shape (multi-statement parse + host comparison instead of + // inline leading-slash check).
+ if kind == PredicateKind::HostAllowlistValidated && polarity { + for var in condition_vars { + let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new(); + for (val, _) in state.values.iter() { + if let Some(name) = ssa + .value_defs + .get(val.0 as usize) + .and_then(|vd| vd.var_name.as_deref()) + { + if name == var { + to_clear.push(*val); + } + } + } + for val in to_clear { + if let Some(taint) = state.get(val).cloned() { + let new_caps = taint.caps & !Cap::OPEN_REDIRECT; + if new_caps.is_empty() { + state.remove(val); + } else { + state.set( + val, + VarTaint { + caps: new_caps, + origins: taint.origins, + uses_summary: taint.uses_summary, + }, + ); + } + } + } + } + } + // ShellMetaValidated: inverted polarity, the FALSE branch (no metachar // found) is the validated path; the TRUE branch is the rejection path. // @@ -2203,6 +2294,8 @@ fn inline_analyse_callee( receiver_seed: receiver_seed.as_ref(), const_values: Some(&callee_body.opt.const_values), type_facts: Some(&callee_body.opt.type_facts), + xml_parser_config: Some(&callee_body.opt.xml_parser_config), + xpath_config: Some(&callee_body.opt.xpath_config), ssa_summaries: transfer.ssa_summaries, extra_labels: transfer.extra_labels, base_aliases: Some(&callee_body.opt.alias_result), @@ -5891,6 +5984,34 @@ fn collect_block_events( sink_caps &= !Cap::DATA_EXFIL; } + // Receiver-type-incompatibility stripping. When the receiver's type + // proves a structurally-attached cap cannot apply (e.g. an + // `LdapClient` receiver carrying an `HTML_ESCAPE` Sink label that was + // attached to the CFG node by a `*.send`/`*.json`-style suffix + // matcher), drop the offending bits *before* the type-qualified- + // resolution branch below, so that branch is reachable on the + // remaining empty `sink_caps` and can re-anchor a precise sink class + // (`LdapClient.search` → `Cap::LDAP_INJECTION`). 
Both the + // flow-sensitive type from `path_env` and the static type from + // `type_facts` are consulted; the static path is what enables + // closure-captured receivers (parent body → child body via + // [`crate::taint::inject_external_type_facts`]) to participate. + if let SsaOp::Call { + receiver: Some(rv), .. + } = &inst.op + { + if let Some(ref env) = state.path_env { + if let Some(kind) = env.get(*rv).types.as_singleton() { + sink_caps &= !receiver_incompatible_sink_caps(&kind, sink_caps); + } + } + if let Some(tf) = transfer.type_facts { + if let Some(kind) = tf.get_type(*rv) { + sink_caps &= !receiver_incompatible_sink_caps(kind, sink_caps); + } + } + } + // Type-qualified sink resolution: when normal sink resolution found nothing, // try using the receiver's inferred type to construct a qualified callee name. if sink_caps.is_empty() { @@ -5954,6 +6075,39 @@ fn collect_block_events( } } + // ADD XXE on opt-in. When the receiver was constructed + // with an explicit external-entity opt-in + // (`new XMLParser({ processEntities: true })`, + // `lxml.etree.XMLParser(resolve_entities=True)`), the subsequent + // `parser.parse(xml)` is an XXE flow even though the callee + // carries no flat XXE rule (fast-xml-parser and lxml are + // XXE-safe by default). Runs BEFORE the empty check below so a + // previously-empty sink_caps becomes non-empty and downstream + // emission proceeds. The complementary `xxe_safe` suppress path + // still runs after this; a call where the receiver was both + // opt-in AND later hardened by a setter results in net-zero + // (suppress strips what we added). + if let SsaOp::Call { + receiver: Some(rv), + callee: callee_str, + .. + } = &inst.op + { + if let Some(xc) = transfer.xml_parser_config { + if xc.is_unsafe_explicit(*rv) { + let suffix = callee_str + .rsplit(['.', ':']) + .next() + .unwrap_or(callee_str.as_str()); + // `feed` covers Python lxml incremental parsing + // (`parser.feed(body); parser.close()`). 
+ if matches!(suffix, "parse" | "parseString" | "parseFromString" | "feed") { + sink_caps |= Cap::XXE; + } + } + } + } + if sink_caps.is_empty() { // Callback pattern: check if callee has source_to_callback and the // actual callback argument has a matching param_to_sink. @@ -6055,17 +6209,89 @@ fn collect_block_events( continue; } - // Receiver type incompatibility check. - // If the receiver's flow-sensitive type proves it cannot be the kind - // of object the sink expects (e.g., Int receiver → not an HTTP response - // sink), strip those sink caps. - if let Some(ref env) = state.path_env { + if sink_caps.is_empty() { + continue; + } + + // XXE config-fact suppression. A parse-class sink whose receiver + // was provably hardened (`setFeature(FEATURE_SECURE_PROCESSING, + // true)`, `setExpandEntityReferences(false)`, etc.) is not an XXE + // flow. Drop the bit before downstream sink emission. Runs after + // type-qualified resolution / module alias resolution so the XXE + // bit added by `XmlParser.parse` resolution is visible here. + if sink_caps.intersects(Cap::XXE) { if let SsaOp::Call { receiver: Some(rv), .. } = &inst.op { - if let Some(kind) = env.get(*rv).types.as_singleton() { - sink_caps &= !receiver_incompatible_sink_caps(&kind, sink_caps); + if let Some(xc) = transfer.xml_parser_config { + if crate::ssa::xml_config::xxe_safe(Some(*rv), xc) { + sink_caps &= !Cap::XXE; + } + } + } + } + if sink_caps.is_empty() { + continue; + } + + // XPath resolver-binding suppression. An XPath `evaluate` / + // `compile` sink whose receiver was provably bound to an + // `XPathVariableResolver` is treated as parameterised and the + // XPATH_INJECTION bit is stripped. Mirrors the XXE config-fact + // shape above. Only fires when the receiver also carries + // `TypeKind::XPathClient` (gates the suppression behind + // type-fact disambiguation so a generic `obj.evaluate(...)` + // matched as XPATH_INJECTION via name-only labelling does not + // accidentally clear). 
+ if sink_caps.intersects(Cap::XPATH_INJECTION) { + if let SsaOp::Call { + receiver: Some(rv), .. + } = &inst.op + { + if let Some(xpc) = transfer.xpath_config { + let receiver_is_xpath = transfer + .type_facts + .and_then(|tf| tf.get_type(*rv)) + .map(|kind| matches!(kind, crate::ssa::type_facts::TypeKind::XPathClient)) + .unwrap_or(false); + if receiver_is_xpath && crate::ssa::xpath_config::xpath_safe(Some(*rv), xpc) { + sink_caps &= !Cap::XPATH_INJECTION; + } + } + } + } + if sink_caps.is_empty() { + continue; + } + + // Prototype-pollution suppression (flow-sensitive). + // `Object.create(null)` produces a `NullPrototypeObject`-typed + // value; subscript writes to such an object cannot pollute + // `Object.prototype` because there is no prototype chain. + // Receiver SsaValue is read off the synthetic `__index_set__` + // Call op; phi joins downgrade to `Unknown` via `TypeFact::meet` + // so an if/else where only one branch initialises with + // `Object.create(null)` keeps the PROTOTYPE_POLLUTION bit on + // the unsafe path. + if sink_caps.intersects(Cap::PROTOTYPE_POLLUTION) { + if let SsaOp::Call { + callee, + receiver: Some(rv), + .. 
+ } = &inst.op + { + if callee == "__index_set__" { + let receiver_is_null_proto = transfer + .type_facts + .and_then(|tf| tf.get_type(*rv)) + .map(|kind| { + matches!(kind, crate::ssa::type_facts::TypeKind::NullPrototypeObject) + }) + .unwrap_or(false); + if receiver_is_null_proto { + sink_caps &= !Cap::PROTOTYPE_POLLUTION; + } } } } @@ -6436,7 +6662,7 @@ fn pick_primary_sink_sites( return Vec::new(); }; let mut out: Vec = Vec::new(); - let mut seen: HashSet<(String, u32, u32, u16)> = HashSet::new(); + let mut seen: HashSet<(String, u32, u32, u32)> = HashSet::new(); for (param_idx, sites) in param_to_sink_sites { let Some(arg_vals) = args.get(*param_idx) else { continue; @@ -6475,7 +6701,7 @@ fn pick_primary_sink_sites_from_resolved( return Vec::new(); } let mut out: Vec = Vec::new(); - let mut seen: HashSet<(String, u32, u32, u16)> = HashSet::new(); + let mut seen: HashSet<(String, u32, u32, u32)> = HashSet::new(); for (_, sites) in param_to_sink_sites { for site in sites { if site.line == 0 { @@ -8127,13 +8353,36 @@ fn type_safe_for_taint_sink(kind: &crate::ssa::type_facts::TypeKind, cap: Cap) - fn receiver_incompatible_sink_caps(kind: &crate::ssa::type_facts::TypeKind, sink_caps: Cap) -> Cap { use crate::ssa::type_facts::TypeKind; let mut remove = Cap::empty(); - // HTML_ESCAPE requires HTTP response-like receiver - if sink_caps.intersects(Cap::HTML_ESCAPE) { + // HTML_ESCAPE / OPEN_REDIRECT / HEADER_INJECTION all require an HTTP + // response-like receiver: each is a write-side rule that fires when + // attacker data is rendered into / written onto the response stream + // (`*.send` / `*.redirect` / `*.setHeader` / etc.). 
Receivers proven + // to be a different class — directory-service connections (LDAP), + // database connections, file handles, in-memory collections, query- + // builder objects, URL values, HTTP clients (request-side), and so on + // — cannot host these sinks even when a same-named matcher + // (`*.send`, `*.set`, `*.append`) attaches the label by suffix. + let response_like_caps = Cap::HTML_ESCAPE | Cap::OPEN_REDIRECT | Cap::HEADER_INJECTION; + if sink_caps.intersects(response_like_caps) { match kind { TypeKind::HttpResponse => {} // compatible TypeKind::Unknown | TypeKind::Object => {} // could be response _ => { - remove |= Cap::HTML_ESCAPE; + remove |= sink_caps & response_like_caps; + } + } + } + // LDAP_INJECTION strictly requires a directory-service receiver. + // Non-LdapClient receivers carrying the cap by accident (e.g. a + // generic `*.search` suffix matcher firing on a Vec/HashMap) get the + // bit stripped. Unknown/Object stay untouched so type-fact gaps + // don't silently drop real sinks. + if sink_caps.intersects(Cap::LDAP_INJECTION) { + match kind { + TypeKind::LdapClient => {} // compatible + TypeKind::Unknown | TypeKind::Object => {} // could be ldap + _ => { + remove |= Cap::LDAP_INJECTION; } } } @@ -9364,7 +9613,7 @@ fn resolve_callee_full( } // 0.5) Cross-file SSA summaries (GlobalSummaries.ssa_by_key) with - // optional Phase-6 hierarchy fan-out. + // optional class-hierarchy fan-out. // // When the call has an authoritative receiver type AND // `GlobalSummaries::install_hierarchy` has been called AND the @@ -9468,7 +9717,7 @@ fn resolve_callee_full( } } - // 2) Global same-language (FuncSummary path) with Phase-6 hierarchy + // 2) Global same-language (FuncSummary path) with class-hierarchy // fan-out. Same semantics as step 0.5 but on coarse FuncSummary // entries, the SSA path missed because no implementer had an SSA // summary, so we widen the FuncSummary lookup symmetrically. 
diff --git a/src/taint/ssa_transfer/summary_extract.rs b/src/taint/ssa_transfer/summary_extract.rs index ce8dbf41..9d21a1af 100644 --- a/src/taint/ssa_transfer/summary_extract.rs +++ b/src/taint/ssa_transfer/summary_extract.rs @@ -246,6 +246,8 @@ pub fn extract_ssa_func_summary_full( receiver_seed: None, const_values: None, type_facts: local_type_facts_ref, + xml_parser_config: None, + xpath_config: None, ssa_summaries, extra_labels: None, base_aliases: None, @@ -792,6 +794,8 @@ pub fn extract_ssa_func_summary_full( receiver_seed: None, const_values: None, type_facts: local_type_facts_ref, + xml_parser_config: None, + xpath_config: None, ssa_summaries, extra_labels: None, base_aliases: None, diff --git a/src/taint/ssa_transfer/tests.rs b/src/taint/ssa_transfer/tests.rs index 46be20db..f789eaa7 100644 --- a/src/taint/ssa_transfer/tests.rs +++ b/src/taint/ssa_transfer/tests.rs @@ -93,6 +93,8 @@ mod cross_file_tests { type_facts: crate::ssa::type_facts::TypeFactResult { facts: std::collections::HashMap::new(), }, + xml_parser_config: crate::ssa::xml_config::XmlParserConfigResult::default(), + xpath_config: crate::ssa::xpath_config::XPathConfigResult::default(), alias_result: crate::ssa::alias::BaseAliasResult::empty(), points_to: crate::ssa::heap::PointsToResult::empty(), module_aliases: std::collections::HashMap::new(), @@ -251,7 +253,7 @@ mod inline_cache_epoch_tests { ArgTaintSig(SmallVec::new()) } - fn shape(caps_bits: u16) -> CachedInlineShape { + fn shape(caps_bits: u32) -> CachedInlineShape { CachedInlineShape(Some(ReturnShape { caps: Cap::from_bits_retain(caps_bits), internal_origins: SmallVec::new(), @@ -448,7 +450,7 @@ mod binding_key_tests { // ── seed_lookup ──────────────────────────────────────────────────── - fn taint(caps: u16) -> VarTaint { + fn taint(caps: u32) -> VarTaint { VarTaint { caps: Cap::from_bits_truncate(caps), origins: smallvec![], @@ -989,6 +991,8 @@ mod goto_succ_propagation_tests { receiver_seed: None, const_values: None, type_facts: 
None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -1079,6 +1083,8 @@ mod goto_succ_propagation_tests { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -1516,10 +1522,10 @@ mod receiver_candidates_field_proj_tests { #[test] fn field_proj_receiver_walks_to_typed_root_in_go() { - // Go is not Rust, so pre-Phase-4 the candidate walk would have - // returned ONLY the immediate receiver (v2 = FieldProj). With - // We walk through FieldProj.receiver to recover v0 (the - // typed root `c`). + // Go is not Rust, so before the FieldProj walk fix the candidate + // walk would have returned ONLY the immediate receiver + // (v2 = FieldProj). We now walk through FieldProj.receiver to + // recover v0 (the typed root `c`). let body = body_with_field_proj_chain(); let cands = super::super::receiver_candidates_for_type_lookup(SsaValue(2), Some(&body), Lang::Go); @@ -1709,7 +1715,7 @@ mod fanout_merge_tests { ]; let m = merge_resolved_summaries_fanout(a, b); - let mut sorted: Vec<(usize, u16)> = m + let mut sorted: Vec<(usize, u32)> = m .param_to_sink .iter() .map(|(i, c)| (*i, c.bits())) @@ -2032,6 +2038,8 @@ mod field_write_tests { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -2114,6 +2122,8 @@ mod field_write_tests { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -2180,6 +2190,8 @@ mod field_write_tests { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -2324,6 +2336,8 @@ mod field_write_tests { receiver_seed: None, 
const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -2420,6 +2434,8 @@ mod container_elem_tests { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -2697,6 +2713,8 @@ mod container_elem_tests { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -2833,6 +2851,8 @@ mod container_elem_tests { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -3387,6 +3407,8 @@ mod field_taint_origin_cap_tests { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -3673,6 +3695,8 @@ mod pointer_lattice_worklist_tests { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, diff --git a/src/taint/tests.rs b/src/taint/tests.rs index 847a5af4..da952f32 100644 --- a/src/taint/tests.rs +++ b/src/taint/tests.rs @@ -45,6 +45,8 @@ fn ssa_analyse_rust(src: &[u8]) -> Vec { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -1669,10 +1671,10 @@ fn cpp_builder_chain_const_host_silent() { /// inline member-function bodies inside a /// `class_specifier` must be extracted as separate functions and -/// intra-file calls must resolve to their bodies. 
Pre-Phase-4, the -/// `class_specifier` AST kind was unmapped in cpp KINDS, so the CFG -/// walker treated the entire class as a leaf `Seq` node and never -/// descended into inline methods. +/// intra-file calls must resolve to their bodies. Before the cpp KINDS +/// fix the `class_specifier` AST kind was unmapped, so the CFG walker +/// treated the entire class as a leaf `Seq` node and never descended +/// into inline methods. #[test] fn cpp_inline_class_method_resolves() { let src = b"#include \nclass Inner {\npublic:\n void run(const char* arg) { std::system(arg); }\n};\nint main() {\n char* input = std::getenv(\"X\");\n Inner inner;\n inner.run(input);\n return 0;\n}\n"; @@ -3768,6 +3770,8 @@ fn assert_ssa_integration(src: &[u8]) { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -3904,6 +3908,8 @@ fn integ_php_echo_simple_var() { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, @@ -3972,6 +3978,8 @@ fn integ_c_curl_handle_ssrf() { receiver_seed: None, const_values: None, type_facts: None, + xml_parser_config: None, + xpath_config: None, ssa_summaries: None, extra_labels: None, base_aliases: None, diff --git a/src/utils/config.rs b/src/utils/config.rs index 0f3ac35c..30d9eefa 100644 --- a/src/utils/config.rs +++ b/src/utils/config.rs @@ -74,6 +74,14 @@ pub enum CapName { Crypto, /// Request-bound identifier not yet ownership-checked. 
UnauthorizedId, + DataExfil, + LdapInjection, + XpathInjection, + HeaderInjection, + OpenRedirect, + Ssti, + Xxe, + PrototypePollution, All, } @@ -94,6 +102,14 @@ impl CapName { Self::CodeExec => Cap::CODE_EXEC, Self::Crypto => Cap::CRYPTO, Self::UnauthorizedId => Cap::UNAUTHORIZED_ID, + Self::DataExfil => Cap::DATA_EXFIL, + Self::LdapInjection => Cap::LDAP_INJECTION, + Self::XpathInjection => Cap::XPATH_INJECTION, + Self::HeaderInjection => Cap::HEADER_INJECTION, + Self::OpenRedirect => Cap::OPEN_REDIRECT, + Self::Ssti => Cap::SSTI, + Self::Xxe => Cap::XXE, + Self::PrototypePollution => Cap::PROTOTYPE_POLLUTION, Self::All => Cap::all(), } } @@ -115,6 +131,14 @@ impl fmt::Display for CapName { Self::CodeExec => write!(f, "code_exec"), Self::Crypto => write!(f, "crypto"), Self::UnauthorizedId => write!(f, "unauthorized_id"), + Self::DataExfil => write!(f, "data_exfil"), + Self::LdapInjection => write!(f, "ldap_injection"), + Self::XpathInjection => write!(f, "xpath_injection"), + Self::HeaderInjection => write!(f, "header_injection"), + Self::OpenRedirect => write!(f, "open_redirect"), + Self::Ssti => write!(f, "ssti"), + Self::Xxe => write!(f, "xxe"), + Self::PrototypePollution => write!(f, "prototype_pollution"), Self::All => write!(f, "all"), } } @@ -137,11 +161,21 @@ impl FromStr for CapName { "code_exec" => Ok(Self::CodeExec), "crypto" => Ok(Self::Crypto), "unauthorized_id" => Ok(Self::UnauthorizedId), + "data_exfil" | "data_exfiltration" => Ok(Self::DataExfil), + "ldap_injection" | "ldapi" => Ok(Self::LdapInjection), + "xpath_injection" | "xpathi" => Ok(Self::XpathInjection), + "header_injection" | "crlf" | "response_splitting" => Ok(Self::HeaderInjection), + "open_redirect" | "redirect" => Ok(Self::OpenRedirect), + "ssti" | "template_injection" => Ok(Self::Ssti), + "xxe" => Ok(Self::Xxe), + "prototype_pollution" | "proto_pollution" => Ok(Self::PrototypePollution), "all" => Ok(Self::All), _ => Err(format!( "invalid cap name: {s:?} (expected env_var, 
html_escape, shell_escape, \ url_encode, json_parse, file_io, fmt_string, sql_query, deserialize, \ - ssrf, code_exec, crypto, unauthorized_id, all)" + ssrf, code_exec, crypto, unauthorized_id, data_exfil, ldap_injection, \ + xpath_injection, header_injection, open_redirect, ssti, xxe, \ + prototype_pollution, all)" )), } } diff --git a/tests/fixtures/cross_file_js_redirect/expectations.json b/tests/fixtures/cross_file_js_redirect/expectations.json index 9e9b22ff..4fa5712e 100644 --- a/tests/fixtures/cross_file_js_redirect/expectations.json +++ b/tests/fixtures/cross_file_js_redirect/expectations.json @@ -1,6 +1,6 @@ { "required_findings": [ - { "id_prefix": "taint-unsanitised-flow", "min_count": 1 } + { "id_prefix": "taint-open-redirect", "min_count": 1 } ], "forbidden_findings": [], "noise_budget": { diff --git a/tests/fixtures/header_injection/go/safe_set_header.go b/tests/fixtures/header_injection/go/safe_set_header.go new file mode 100644 index 00000000..e44b22e4 --- /dev/null +++ b/tests/fixtures/header_injection/go/safe_set_header.go @@ -0,0 +1,18 @@ +// Safe: query value routed through the project-local `stripCRLF` helper +// before being written to the response header. +package main + +import ( + "net/http" + "strings" +) + +func stripCRLF(raw string) string { + return strings.ReplaceAll(strings.ReplaceAll(raw, "\r", ""), "\n", "") +} + +func handler(w http.ResponseWriter, r *http.Request) { + lang := r.URL.Query().Get("lang") + safe := stripCRLF(lang) + w.Header().Set("X-Lang", safe) +} diff --git a/tests/fixtures/header_injection/go/unsafe_set_header.go b/tests/fixtures/header_injection/go/unsafe_set_header.go new file mode 100644 index 00000000..13a13350 --- /dev/null +++ b/tests/fixtures/header_injection/go/unsafe_set_header.go @@ -0,0 +1,12 @@ +// Unsafe: net/http `ResponseWriter.Header().Set` receives a value built from +// `r.URL.Query().Get`. HEADER_INJECTION fires on the value argument. 
+package main + +import ( + "net/http" +) + +func handler(w http.ResponseWriter, r *http.Request) { + lang := r.URL.Query().Get("lang") + w.Header().Set("X-Lang", lang) +} diff --git a/tests/fixtures/header_injection/java/SafeSetHeader.java b/tests/fixtures/header_injection/java/SafeSetHeader.java new file mode 100644 index 00000000..59030d68 --- /dev/null +++ b/tests/fixtures/header_injection/java/SafeSetHeader.java @@ -0,0 +1,16 @@ +// Safe: request parameter routed through the project-local `stripCRLF` +// helper before being written to the response header. +import javax.servlet.http.HttpServletRequest; +import javax.servlet.http.HttpServletResponse; + +public class SafeSetHeader { + public static String stripCRLF(String raw) { + return raw.replace("\r", "").replace("\n", ""); + } + + public void handle(HttpServletRequest req, HttpServletResponse res) { + String lang = req.getParameter("lang"); + String safe = stripCRLF(lang); + res.setHeader("X-Lang", safe); + } +} diff --git a/tests/fixtures/header_injection/java/UnsafeSetHeader.java b/tests/fixtures/header_injection/java/UnsafeSetHeader.java new file mode 100644 index 00000000..d311ac4c --- /dev/null +++ b/tests/fixtures/header_injection/java/UnsafeSetHeader.java @@ -0,0 +1,11 @@ +// Unsafe: HttpServletResponse.setHeader receives a value built from a +// request parameter. HEADER_INJECTION fires on the value argument. 
+import javax.servlet.http.HttpServletRequest; +import javax.servlet.http.HttpServletResponse; + +public class UnsafeSetHeader { + public void handle(HttpServletRequest req, HttpServletResponse res) { + String lang = req.getParameter("lang"); + res.setHeader("X-Lang", lang); + } +} diff --git a/tests/fixtures/header_injection/javascript/safe_set_header.js b/tests/fixtures/header_injection/javascript/safe_set_header.js new file mode 100644 index 00000000..43ba8e12 --- /dev/null +++ b/tests/fixtures/header_injection/javascript/safe_set_header.js @@ -0,0 +1,14 @@ +// Safe: req.query.lang routed through the project-local `stripCRLF` helper +// before being written to the response header. +function stripCRLF(raw) { + return raw.replace(/[\r\n]/g, ''); +} + +function handler(req, res) { + const lang = req.query.lang; + const safe = stripCRLF(lang); + res.setHeader('X-Lang', safe); + res.end(); +} + +module.exports = handler; diff --git a/tests/fixtures/header_injection/javascript/safe_subscript_set.js b/tests/fixtures/header_injection/javascript/safe_subscript_set.js new file mode 100644 index 00000000..3e0f78f0 --- /dev/null +++ b/tests/fixtures/header_injection/javascript/safe_subscript_set.js @@ -0,0 +1,14 @@ +// Safe: req.query.lang routed through the project-local `stripCRLF` helper +// (a registered HEADER_INJECTION sanitizer) before the subscript-set, so +// taint-header-injection stays clean. 
+function stripCRLF(raw) { + return raw.replace(/[\r\n]/g, ''); +} + +function handler(req, res) { + const lang = req.query.lang; + res.headers["X-Forwarded-By"] = stripCRLF(lang); + res.end(); +} + +module.exports = handler; diff --git a/tests/fixtures/header_injection/javascript/unsafe_set_header.js b/tests/fixtures/header_injection/javascript/unsafe_set_header.js new file mode 100644 index 00000000..36ec5dbb --- /dev/null +++ b/tests/fixtures/header_injection/javascript/unsafe_set_header.js @@ -0,0 +1,9 @@ +// Unsafe: Express `res.setHeader` receives a value built from req.query. +// HEADER_INJECTION fires on the value argument. +function handler(req, res) { + const lang = req.query.lang; + res.setHeader('X-Lang', lang); + res.end(); +} + +module.exports = handler; diff --git a/tests/fixtures/header_injection/javascript/unsafe_subscript_set.js b/tests/fixtures/header_injection/javascript/unsafe_subscript_set.js new file mode 100644 index 00000000..95288756 --- /dev/null +++ b/tests/fixtures/header_injection/javascript/unsafe_subscript_set.js @@ -0,0 +1,11 @@ +// Unsafe: tainted req.query value flows into the bare-subscript header set +// `res.headers["X-Forwarded-By"] = lang`. The LHS-subscript classification +// path matches `res.headers` as a HEADER_INJECTION sink so this form fires +// alongside the explicit `setHeader` / `res.set` method-call shapes. 
+function handler(req, res) { + const lang = req.query.lang; + res.headers["X-Forwarded-By"] = lang; + res.end(); +} + +module.exports = handler; diff --git a/tests/fixtures/header_injection/php/safe_set_header.php b/tests/fixtures/header_injection/php/safe_set_header.php new file mode 100644 index 00000000..33e588c2 --- /dev/null +++ b/tests/fixtures/header_injection/php/safe_set_header.php @@ -0,0 +1,10 @@ + String { + raw.replace('\r', "").replace('\n', "") +} + +fn handler(response: &mut http::Response<()>) { + let lang = env::var("LANG").unwrap_or_default(); + let safe = strip_crlf(&lang); + let value = http::HeaderValue::from_str(&safe).unwrap(); + response.headers_mut().insert("X-Lang", value); +} diff --git a/tests/fixtures/header_injection/rust/unsafe_set_header.rs b/tests/fixtures/header_injection/rust/unsafe_set_header.rs new file mode 100644 index 00000000..f558a7f3 --- /dev/null +++ b/tests/fixtures/header_injection/rust/unsafe_set_header.rs @@ -0,0 +1,9 @@ +// Unsafe: tainted env value flows into `response.headers_mut().insert`. +// HEADER_INJECTION fires on the value argument. +use std::env; + +fn handler(response: &mut http::Response<()>) { + let lang = env::var("LANG").unwrap_or_default(); + let value = http::HeaderValue::from_str(&lang).unwrap(); + response.headers_mut().insert("X-Lang", value); +} diff --git a/tests/fixtures/header_injection/typescript/safe_set_header.ts b/tests/fixtures/header_injection/typescript/safe_set_header.ts new file mode 100644 index 00000000..73da08e7 --- /dev/null +++ b/tests/fixtures/header_injection/typescript/safe_set_header.ts @@ -0,0 +1,12 @@ +// Safe: req.query.lang routed through `stripCRLF` before being written to +// the response header. 
+function stripCRLF(raw: string): string { + return raw.replace(/[\r\n]/g, ''); +} + +export function handler(req: any, res: any): void { + const lang: string = req.query.lang; + const safe: string = stripCRLF(lang); + res.setHeader('X-Lang', safe); + res.end(); +} diff --git a/tests/fixtures/header_injection/typescript/safe_subscript_set.ts b/tests/fixtures/header_injection/typescript/safe_subscript_set.ts new file mode 100644 index 00000000..6f1af514 --- /dev/null +++ b/tests/fixtures/header_injection/typescript/safe_subscript_set.ts @@ -0,0 +1,12 @@ +// Safe: req.query.lang routed through the project-local `stripCRLF` helper +// (a registered HEADER_INJECTION sanitizer) before the subscript-set, so +// taint-header-injection stays clean. +function stripCRLF(raw: string): string { + return raw.replace(/[\r\n]/g, ''); +} + +export function handler(req: any, res: any): void { + const lang: string = req.query.lang; + res.headers["X-Forwarded-By"] = stripCRLF(lang); + res.end(); +} diff --git a/tests/fixtures/header_injection/typescript/unsafe_set_header.ts b/tests/fixtures/header_injection/typescript/unsafe_set_header.ts new file mode 100644 index 00000000..fe8ade5f --- /dev/null +++ b/tests/fixtures/header_injection/typescript/unsafe_set_header.ts @@ -0,0 +1,7 @@ +// Unsafe: Express `res.setHeader` receives a value built from req.query. +// HEADER_INJECTION fires on the value argument. +export function handler(req: any, res: any): void { + const lang: string = req.query.lang; + res.setHeader('X-Lang', lang); + res.end(); +} diff --git a/tests/fixtures/header_injection/typescript/unsafe_subscript_set.ts b/tests/fixtures/header_injection/typescript/unsafe_subscript_set.ts new file mode 100644 index 00000000..abf1cad1 --- /dev/null +++ b/tests/fixtures/header_injection/typescript/unsafe_subscript_set.ts @@ -0,0 +1,9 @@ +// Unsafe: tainted req.query value flows into the bare-subscript header set +// `res.headers["X-Forwarded-By"] = lang`. 
The LHS-subscript classification +// path matches `res.headers` as a HEADER_INJECTION sink so this form fires +// alongside the explicit `setHeader` / `res.set` method-call shapes. +export function handler(req: any, res: any): void { + const lang: string = req.query.lang; + res.headers["X-Forwarded-By"] = lang; + res.end(); +} diff --git a/tests/fixtures/internal_redirect_taint/expectations.json b/tests/fixtures/internal_redirect_taint/expectations.json index b12b92f1..a7556925 100644 --- a/tests/fixtures/internal_redirect_taint/expectations.json +++ b/tests/fixtures/internal_redirect_taint/expectations.json @@ -1,6 +1,6 @@ { "required_findings": [ - { "id_prefix": "taint-unsanitised-flow", "min_count": 1 } + { "id_prefix": "taint-open-redirect", "min_count": 1 } ], "forbidden_findings": [], "noise_budget": { diff --git a/tests/fixtures/ldap_injection/c/baseline_constant_ldap.c b/tests/fixtures/ldap_injection/c/baseline_constant_ldap.c new file mode 100644 index 00000000..9dbb403b --- /dev/null +++ b/tests/fixtures/ldap_injection/c/baseline_constant_ldap.c @@ -0,0 +1,12 @@ +/* Baseline: filter is a string literal, no LDAP_INJECTION finding. */ +#include + +int do_lookup(LDAP *ld) { + LDAPMessage *res = NULL; + return ldap_search_ext_s( + ld, + "ou=people,dc=example,dc=com", + LDAP_SCOPE_SUBTREE, + "(objectClass=person)", + NULL, 0, NULL, NULL, NULL, 0, &res); +} diff --git a/tests/fixtures/ldap_injection/c/safe_ldap_search.c b/tests/fixtures/ldap_injection/c/safe_ldap_search.c new file mode 100644 index 00000000..a79b9a10 --- /dev/null +++ b/tests/fixtures/ldap_injection/c/safe_ldap_search.c @@ -0,0 +1,19 @@ +/* Safe: project-local sanitize_ldap_filter (matches the developer-named + * `sanitize_*` Sanitizer rule) clears caps on the user value before it + * reaches ldap_search_ext_s. 
*/ +#include +#include + +extern char *sanitize_ldap_filter(const char *raw); + +int do_lookup(LDAP *ld) { + char *user_filter = getenv("USER_FILTER"); + char *safe = sanitize_ldap_filter(user_filter); + LDAPMessage *res = NULL; + return ldap_search_ext_s( + ld, + "ou=people,dc=example,dc=com", + LDAP_SCOPE_SUBTREE, + safe, + NULL, 0, NULL, NULL, NULL, 0, &res); +} diff --git a/tests/fixtures/ldap_injection/c/unsafe_ldap_search.c b/tests/fixtures/ldap_injection/c/unsafe_ldap_search.c new file mode 100644 index 00000000..3f97ad49 --- /dev/null +++ b/tests/fixtures/ldap_injection/c/unsafe_ldap_search.c @@ -0,0 +1,15 @@ +/* Unsafe: tainted env-string passed straight as the LDAP filter argument + * to ldap_search_ext_s. LDAP_INJECTION fires on the filter (arg 3). */ +#include +#include + +int do_lookup(LDAP *ld) { + char *user_filter = getenv("USER_FILTER"); + LDAPMessage *res = NULL; + return ldap_search_ext_s( + ld, + "ou=people,dc=example,dc=com", + LDAP_SCOPE_SUBTREE, + user_filter, + NULL, 0, NULL, NULL, NULL, 0, &res); +} diff --git a/tests/fixtures/ldap_injection/cpp/baseline_constant_ldap.cpp b/tests/fixtures/ldap_injection/cpp/baseline_constant_ldap.cpp new file mode 100644 index 00000000..da17dfff --- /dev/null +++ b/tests/fixtures/ldap_injection/cpp/baseline_constant_ldap.cpp @@ -0,0 +1,12 @@ +// Baseline: literal filter, no taint reaches the sink. +#include + +int do_lookup(LDAP* ld) { + LDAPMessage* res = nullptr; + return ldap_search_ext_s( + ld, + "ou=people,dc=example,dc=com", + LDAP_SCOPE_SUBTREE, + "(objectClass=person)", + nullptr, 0, nullptr, nullptr, nullptr, 0, &res); +} diff --git a/tests/fixtures/ldap_injection/cpp/safe_ldap_search.cpp b/tests/fixtures/ldap_injection/cpp/safe_ldap_search.cpp new file mode 100644 index 00000000..1e4e0eaa --- /dev/null +++ b/tests/fixtures/ldap_injection/cpp/safe_ldap_search.cpp @@ -0,0 +1,18 @@ +// Safe: developer-named sanitize_* helper clears caps on the user value +// before it reaches ldap_search_ext_s. 
+#include +#include + +extern const char* sanitize_ldap_filter(const char* raw); + +int do_lookup(LDAP* ld) { + const char* user_filter = std::getenv("USER_FILTER"); + const char* safe = sanitize_ldap_filter(user_filter); + LDAPMessage* res = nullptr; + return ldap_search_ext_s( + ld, + "ou=people,dc=example,dc=com", + LDAP_SCOPE_SUBTREE, + safe, + nullptr, 0, nullptr, nullptr, nullptr, 0, &res); +} diff --git a/tests/fixtures/ldap_injection/cpp/unsafe_ldap_search.cpp b/tests/fixtures/ldap_injection/cpp/unsafe_ldap_search.cpp new file mode 100644 index 00000000..46a8d4ff --- /dev/null +++ b/tests/fixtures/ldap_injection/cpp/unsafe_ldap_search.cpp @@ -0,0 +1,15 @@ +// Unsafe: tainted env value passed straight as the LDAP filter argument to +// ldap_search_ext_s. LDAP_INJECTION fires on the filter argument (position 3). +#include +#include + +int do_lookup(LDAP* ld) { + const char* user_filter = std::getenv("USER_FILTER"); + LDAPMessage* res = nullptr; + return ldap_search_ext_s( + ld, + "ou=people,dc=example,dc=com", + LDAP_SCOPE_SUBTREE, + user_filter, + nullptr, 0, nullptr, nullptr, nullptr, 0, &res); +} diff --git a/tests/fixtures/ldap_injection/go/baseline_constant_ldap.go b/tests/fixtures/ldap_injection/go/baseline_constant_ldap.go new file mode 100644 index 00000000..ac3761ea --- /dev/null +++ b/tests/fixtures/ldap_injection/go/baseline_constant_ldap.go @@ -0,0 +1,20 @@ +// Baseline: filter is a literal string, no taint reaches NewSearchRequest. 
+package ldap_baseline + +import ( + "github.com/go-ldap/ldap/v3" +) + +func Lookup() { + conn, _ := ldap.DialURL("ldap://example.com") + req := ldap.NewSearchRequest( + "ou=people,dc=example,dc=com", + ldap.ScopeWholeSubtree, + ldap.NeverDerefAliases, + 0, 0, false, + "(objectClass=person)", + []string{"cn"}, + nil, + ) + conn.Search(req) +} diff --git a/tests/fixtures/ldap_injection/go/safe_ldap_search.go b/tests/fixtures/ldap_injection/go/safe_ldap_search.go new file mode 100644 index 00000000..805f2364 --- /dev/null +++ b/tests/fixtures/ldap_injection/go/safe_ldap_search.go @@ -0,0 +1,27 @@ +// Safe: ldap.EscapeFilter applies RFC 4515 escaping before the user value +// is interpolated into the filter. Sanitizer(LDAP_INJECTION) clears the cap. +package ldap_safe + +import ( + "fmt" + "net/http" + + "github.com/go-ldap/ldap/v3" +) + +func Lookup(w http.ResponseWriter, r *http.Request) { + conn, _ := ldap.DialURL("ldap://example.com") + user := r.FormValue("user") + safe := ldap.EscapeFilter(user) + filter := fmt.Sprintf("(uid=%s)", safe) + req := ldap.NewSearchRequest( + "ou=people,dc=example,dc=com", + ldap.ScopeWholeSubtree, + ldap.NeverDerefAliases, + 0, 0, false, + filter, + []string{"cn"}, + nil, + ) + conn.Search(req) +} diff --git a/tests/fixtures/ldap_injection/go/unsafe_ldap_search.go b/tests/fixtures/ldap_injection/go/unsafe_ldap_search.go new file mode 100644 index 00000000..f02eca38 --- /dev/null +++ b/tests/fixtures/ldap_injection/go/unsafe_ldap_search.go @@ -0,0 +1,28 @@ +// Unsafe: form value concatenated into an LDAP filter passed to +// ldap.NewSearchRequest, then executed via conn.Search. The construction +// call is tagged Cap::LDAP_INJECTION on the filter argument so the finding +// fires here regardless of the eventual conn.Search execution site. 
+package ldap_unsafe + +import ( + "fmt" + "net/http" + + "github.com/go-ldap/ldap/v3" +) + +func Lookup(w http.ResponseWriter, r *http.Request) { + conn, _ := ldap.DialURL("ldap://example.com") + user := r.FormValue("user") + filter := fmt.Sprintf("(uid=%s)", user) + req := ldap.NewSearchRequest( + "ou=people,dc=example,dc=com", + ldap.ScopeWholeSubtree, + ldap.NeverDerefAliases, + 0, 0, false, + filter, + []string{"cn"}, + nil, + ) + conn.Search(req) +} diff --git a/tests/fixtures/ldap_injection/java/BaselineConstantLdap.java b/tests/fixtures/ldap_injection/java/BaselineConstantLdap.java new file mode 100644 index 00000000..774cad6f --- /dev/null +++ b/tests/fixtures/ldap_injection/java/BaselineConstantLdap.java @@ -0,0 +1,14 @@ +// Baseline: the filter is a compile-time constant; no taint reaches the sink +// and no LDAP_INJECTION finding fires. Guards the rule against firing on +// safe-by-construction call sites that simply happen to hit a search API. +import javax.naming.directory.DirContext; +import javax.naming.directory.SearchControls; + +public class BaselineConstantLdap { + private DirContext ctx; + + public Object lookup() throws Exception { + String filter = "(objectClass=person)"; + return ctx.search("ou=people,dc=example,dc=com", filter, new SearchControls()); + } +} diff --git a/tests/fixtures/ldap_injection/java/SafeLdapSearch.java b/tests/fixtures/ldap_injection/java/SafeLdapSearch.java new file mode 100644 index 00000000..51a649f7 --- /dev/null +++ b/tests/fixtures/ldap_injection/java/SafeLdapSearch.java @@ -0,0 +1,19 @@ +// Safe: the user-supplied substring is run through Spring LDAP's +// LdapEncoder.filterEncode (RFC 4515 escape) before being assembled into the +// filter. The Sanitizer(LDAP_INJECTION) clears the cap and the sink does not +// fire. 
+import javax.naming.directory.DirContext; +import javax.naming.directory.SearchControls; +import javax.servlet.http.HttpServletRequest; +import org.springframework.ldap.support.LdapEncoder; + +public class SafeLdapSearch { + private DirContext ctx; + + public Object lookup(HttpServletRequest req) throws Exception { + String user = req.getParameter("user"); + String safe = LdapEncoder.filterEncode(user); + String filter = "(uid=" + safe + ")"; + return ctx.search("ou=people,dc=example,dc=com", filter, new SearchControls()); + } +} diff --git a/tests/fixtures/ldap_injection/java/UnsafeLdapSearch.java b/tests/fixtures/ldap_injection/java/UnsafeLdapSearch.java new file mode 100644 index 00000000..758d04d4 --- /dev/null +++ b/tests/fixtures/ldap_injection/java/UnsafeLdapSearch.java @@ -0,0 +1,17 @@ +// Unsafe: attacker-controlled username concatenated into an LDAP filter passed +// to DirContext.search. The receiver `ctx` carries TypeKind::LdapClient via +// the declared `DirContext` type so type-qualified resolution rewrites the +// callee to `LdapClient.search` and the LDAP_INJECTION sink fires. +import javax.naming.directory.DirContext; +import javax.naming.directory.SearchControls; +import javax.servlet.http.HttpServletRequest; + +public class UnsafeLdapSearch { + private DirContext ctx; + + public Object lookup(HttpServletRequest req) throws Exception { + String user = req.getParameter("user"); + String filter = "(uid=" + user + ")"; + return ctx.search("ou=people,dc=example,dc=com", filter, new SearchControls()); + } +} diff --git a/tests/fixtures/ldap_injection/javascript/baseline_constant_ldap.js b/tests/fixtures/ldap_injection/javascript/baseline_constant_ldap.js new file mode 100644 index 00000000..85cb7ed6 --- /dev/null +++ b/tests/fixtures/ldap_injection/javascript/baseline_constant_ldap.js @@ -0,0 +1,11 @@ +// Baseline: filter is a literal constant; no taint reaches the search call. 
+const ldap = require('ldapjs'); + +const client = ldap.createClient({ url: 'ldap://example.com' }); + +function lookup(_req, res) { + const filter = '(objectClass=person)'; + client.search('ou=people,dc=example,dc=com', { filter: filter }, (err) => { res.json({ ok: !err }); }); +} + +module.exports = lookup; diff --git a/tests/fixtures/ldap_injection/javascript/safe_ldap_search.js b/tests/fixtures/ldap_injection/javascript/safe_ldap_search.js new file mode 100644 index 00000000..dce2f3f0 --- /dev/null +++ b/tests/fixtures/ldap_injection/javascript/safe_ldap_search.js @@ -0,0 +1,16 @@ +// Safe: ldap-escape's `filter` helper escapes the user-controlled substring +// before it lands in the filter expression. Mirrors the unsafe sibling's +// bound-variable shape so only the sanitiser introduction differs. +const ldap = require('ldapjs'); +const ldapEscape = require('ldap-escape'); + +const client = ldap.createClient({ url: 'ldap://example.com' }); + +function lookup(req, res) { + const user = req.query.user; + const safe = ldapEscape(user); + const filter = '(uid=' + safe + ')'; + client.search('ou=people,dc=example,dc=com', { filter: filter }, (err) => { res.json({ ok: !err }); }); +} + +module.exports = lookup; diff --git a/tests/fixtures/ldap_injection/javascript/unsafe_ldap_search.js b/tests/fixtures/ldap_injection/javascript/unsafe_ldap_search.js new file mode 100644 index 00000000..4a298e84 --- /dev/null +++ b/tests/fixtures/ldap_injection/javascript/unsafe_ldap_search.js @@ -0,0 +1,16 @@ +// Unsafe: ldapjs `client.search` receives a filter assembled from req.query. +// Bound-variable idiom: the closure-captured `client` carries +// `TypeKind::LdapClient` (forwarded from the top-level body to the function +// body by `taint::inject_external_type_facts`), so type-qualified receiver +// resolution rewrites `client.search` → `LdapClient.search`. 
+const ldap = require('ldapjs'); + +const client = ldap.createClient({ url: 'ldap://example.com' }); + +function lookup(req, res) { + const user = req.query.user; + const filter = '(uid=' + user + ')'; + client.search('ou=people,dc=example,dc=com', { filter: filter }, (err) => { res.json({ ok: !err }); }); +} + +module.exports = lookup; diff --git a/tests/fixtures/ldap_injection/php/baseline_constant_ldap.php b/tests/fixtures/ldap_injection/php/baseline_constant_ldap.php new file mode 100644 index 00000000..e4724201 --- /dev/null +++ b/tests/fixtures/ldap_injection/php/baseline_constant_ldap.php @@ -0,0 +1,4 @@ + { res.json({ ok: !err }); }); +} diff --git a/tests/fixtures/ldap_injection/typescript/safe_ldap_search.ts b/tests/fixtures/ldap_injection/typescript/safe_ldap_search.ts new file mode 100644 index 00000000..99151ff0 --- /dev/null +++ b/tests/fixtures/ldap_injection/typescript/safe_ldap_search.ts @@ -0,0 +1,14 @@ +// Safe: ldap-escape's `filter` helper escapes the user-controlled substring +// before it lands in the filter expression. Mirrors the unsafe sibling's +// bound-variable shape so only the sanitiser introduction differs. +import * as ldap from 'ldapjs'; +import ldapEscape from 'ldap-escape'; + +const client = ldap.createClient({ url: 'ldap://example.com' }); + +export function lookup(req: any, res: any) { + const user = req.query.user; + const safe = ldapEscape(user); + const filter = '(uid=' + safe + ')'; + client.search('ou=people,dc=example,dc=com', { filter: filter }, (err: any) => { res.json({ ok: !err }); }); +} diff --git a/tests/fixtures/ldap_injection/typescript/unsafe_ldap_search.ts b/tests/fixtures/ldap_injection/typescript/unsafe_ldap_search.ts new file mode 100644 index 00000000..59ebf399 --- /dev/null +++ b/tests/fixtures/ldap_injection/typescript/unsafe_ldap_search.ts @@ -0,0 +1,14 @@ +// Unsafe: ldapjs `client.search` receives a filter assembled from req.query. 
+// Bound-variable idiom: the closure-captured `client` carries +// `TypeKind::LdapClient` (forwarded from the top-level body to the function +// body by `taint::inject_external_type_facts`), so type-qualified receiver +// resolution rewrites `client.search` → `LdapClient.search`. +import * as ldap from 'ldapjs'; + +const client = ldap.createClient({ url: 'ldap://example.com' }); + +export function lookup(req: any, res: any) { + const user = req.query.user; + const filter = '(uid=' + user + ')'; + client.search('ou=people,dc=example,dc=com', { filter: filter }, (err: any) => { res.json({ ok: !err }); }); +} diff --git a/tests/fixtures/open_redirect/go/safe_host_allowlist_redirect.go b/tests/fixtures/open_redirect/go/safe_host_allowlist_redirect.go new file mode 100644 index 00000000..acdf3350 --- /dev/null +++ b/tests/fixtures/open_redirect/go/safe_host_allowlist_redirect.go @@ -0,0 +1,27 @@ +package handler + +import ( + "net/http" + "net/url" +) + +const allowedHost = "trusted.example.com" + +// Safe: tainted query value parsed via `url.Parse` then host pinned against +// `allowedHost`. Multi-statement form — `parsed, err := url.Parse(target)` +// happens on a separate line from the `parsed.Host == allowedHost` check. +// Recognised by PredicateKind::HostAllowlistValidated which clears +// Cap::OPEN_REDIRECT on the validated branch. 
+func SafeHostAllowlist(w http.ResponseWriter, r *http.Request) { + target := r.URL.Query().Get("next") + parsed, err := url.Parse(target) + if err != nil { + http.Redirect(w, r, "/", http.StatusFound) + return + } + if parsed.Host == allowedHost { + http.Redirect(w, r, parsed.String(), http.StatusFound) + return + } + http.Redirect(w, r, "/", http.StatusFound) +} diff --git a/tests/fixtures/open_redirect/go/safe_redirect.go b/tests/fixtures/open_redirect/go/safe_redirect.go new file mode 100644 index 00000000..03343953 --- /dev/null +++ b/tests/fixtures/open_redirect/go/safe_redirect.go @@ -0,0 +1,23 @@ +package handler + +import ( + "net/http" + "strings" +) + +// validateRedirectUrl is a project-local allowlist helper: it requires a +// leading `/`, but does not reject protocol-relative `//host` values. +// Registered as a Sanitizer(OPEN_REDIRECT) by `labels/go.rs`. +func validateRedirectUrl(raw string) string { + if strings.HasPrefix(raw, "/") { + return raw + } + return "/" +} + +// Safe: query arg routed through validateRedirectUrl allowlist. +func Safe(w http.ResponseWriter, r *http.Request) { + target := r.URL.Query().Get("next") + safe := validateRedirectUrl(target) + http.Redirect(w, r, safe, http.StatusFound) +} diff --git a/tests/fixtures/open_redirect/go/safe_relative_redirect.go b/tests/fixtures/open_redirect/go/safe_relative_redirect.go new file mode 100644 index 00000000..d075e531 --- /dev/null +++ b/tests/fixtures/open_redirect/go/safe_relative_redirect.go @@ -0,0 +1,26 @@ +package handler + +import ( + "net/http" + "strings" +) + +// ensureRelativeUrl enforces a leading `/` and rejects scheme-prefixed or +// protocol-relative values (`//evil.example`). Registered as a +// Sanitizer(OPEN_REDIRECT) by `labels/go.rs`. +func ensureRelativeUrl(raw string) string { + if !strings.HasPrefix(raw, "/") { + return "/" + } + if strings.HasPrefix(raw, "//") { + return "/" + } + return raw +} + +// Safe: query arg routed through ensureRelativeUrl (relative-only).
+func SafeRelative(w http.ResponseWriter, r *http.Request) { + target := r.URL.Query().Get("next") + safe := ensureRelativeUrl(target) + http.Redirect(w, r, safe, http.StatusFound) +} diff --git a/tests/fixtures/open_redirect/go/unsafe_redirect.go b/tests/fixtures/open_redirect/go/unsafe_redirect.go new file mode 100644 index 00000000..b0edba22 --- /dev/null +++ b/tests/fixtures/open_redirect/go/unsafe_redirect.go @@ -0,0 +1,9 @@ +package handler + +import "net/http" + +// Unsafe: query arg flows directly into http.Redirect (3rd arg URL). +func Unsafe(w http.ResponseWriter, r *http.Request) { + target := r.URL.Query().Get("next") + http.Redirect(w, r, target, http.StatusFound) +} diff --git a/tests/fixtures/open_redirect/java/SafeInlineRelative.java b/tests/fixtures/open_redirect/java/SafeInlineRelative.java new file mode 100644 index 00000000..be71074c --- /dev/null +++ b/tests/fixtures/open_redirect/java/SafeInlineRelative.java @@ -0,0 +1,16 @@ +// Phase 05 follow-up: inline relative-URL sanitiser. Developer didn't +// extract `ensureRelativeUrl` into a named helper; the leading-slash +// check is inline. RelativeUrlValidated predicate strips OPEN_REDIRECT +// on the true branch so the redirect call is not flagged. +import javax.servlet.http.HttpServletRequest; +import javax.servlet.http.HttpServletResponse; +import java.io.IOException; + +public class SafeInlineRelative { + public void handle(HttpServletRequest req, HttpServletResponse res) throws IOException { + String target = req.getParameter("next"); + if (target != null && target.startsWith("/")) { + res.sendRedirect(target); + } + } +} diff --git a/tests/fixtures/open_redirect/java/SafeRedirect.java b/tests/fixtures/open_redirect/java/SafeRedirect.java new file mode 100644 index 00000000..cb03f993 --- /dev/null +++ b/tests/fixtures/open_redirect/java/SafeRedirect.java @@ -0,0 +1,17 @@ +// Safe: request param routed through `validateRedirectUrl` allowlist +// before being passed to sendRedirect. 
+import javax.servlet.http.HttpServletRequest; +import javax.servlet.http.HttpServletResponse; +import java.io.IOException; + +public class SafeRedirect { + public static String validateRedirectUrl(String raw) { + return raw != null && raw.startsWith("/") ? raw : "/"; + } + + public void handle(HttpServletRequest req, HttpServletResponse res) throws IOException { + String target = req.getParameter("next"); + String safe = validateRedirectUrl(target); + res.sendRedirect(safe); + } +} diff --git a/tests/fixtures/open_redirect/java/SafeRelativeRedirect.java b/tests/fixtures/open_redirect/java/SafeRelativeRedirect.java new file mode 100644 index 00000000..7c9ca180 --- /dev/null +++ b/tests/fixtures/open_redirect/java/SafeRelativeRedirect.java @@ -0,0 +1,20 @@ +// Safe: request param routed through `ensureRelativeUrl` which enforces a +// leading `/` and rejects scheme-prefixed values (relative-only path). +import javax.servlet.http.HttpServletRequest; +import javax.servlet.http.HttpServletResponse; +import java.io.IOException; + +public class SafeRelativeRedirect { + public static String ensureRelativeUrl(String raw) { + if (raw == null || !raw.startsWith("/") || raw.startsWith("//")) { + return "/"; + } + return raw; + } + + public void handle(HttpServletRequest req, HttpServletResponse res) throws IOException { + String target = req.getParameter("next"); + String safe = ensureRelativeUrl(target); + res.sendRedirect(safe); + } +} diff --git a/tests/fixtures/open_redirect/java/UnsafeRedirect.java b/tests/fixtures/open_redirect/java/UnsafeRedirect.java new file mode 100644 index 00000000..973b15eb --- /dev/null +++ b/tests/fixtures/open_redirect/java/UnsafeRedirect.java @@ -0,0 +1,11 @@ +// Unsafe: HttpServletResponse.sendRedirect receives a request-supplied URL. 
+import javax.servlet.http.HttpServletRequest; +import javax.servlet.http.HttpServletResponse; +import java.io.IOException; + +public class UnsafeRedirect { + public void handle(HttpServletRequest req, HttpServletResponse res) throws IOException { + String target = req.getParameter("next"); + res.sendRedirect(target); + } +} diff --git a/tests/fixtures/open_redirect/java/UnsafeSpringRedirect.java b/tests/fixtures/open_redirect/java/UnsafeSpringRedirect.java new file mode 100644 index 00000000..174e3af8 --- /dev/null +++ b/tests/fixtures/open_redirect/java/UnsafeSpringRedirect.java @@ -0,0 +1,16 @@ +// Phase 05 follow-up: Spring MVC controller-return open-redirect. +// `return "redirect:" + url` is Spring's view-name convention; the +// returned String becomes a 302 to whatever follows the prefix, so a +// request-supplied URL flows straight into the redirect target. +import org.springframework.stereotype.Controller; +import org.springframework.web.bind.annotation.RequestMapping; +import org.springframework.web.bind.annotation.RequestParam; + +@Controller +public class UnsafeSpringRedirect { + @RequestMapping("/login/post") + public String afterLogin(@RequestParam("next") String next, javax.servlet.http.HttpServletRequest req) { + String target = req.getParameter("next"); + return "redirect:" + target; + } +} diff --git a/tests/fixtures/open_redirect/javascript/safe_host_allowlist_redirect.js b/tests/fixtures/open_redirect/javascript/safe_host_allowlist_redirect.js new file mode 100644 index 00000000..14e2d36d --- /dev/null +++ b/tests/fixtures/open_redirect/javascript/safe_host_allowlist_redirect.js @@ -0,0 +1,16 @@ +// Safe: req.query.next routed through `new URL(...).host === ALLOWED` +// host-allowlist gate before reaching res.redirect. Recognised by +// PredicateKind::HostAllowlistValidated which clears Cap::OPEN_REDIRECT +// on the validated branch. 
+const ALLOWED_HOST = "trusted.example.com"; + +function handler(req, res) { + const target = req.query.next; + if (new URL(target).host === ALLOWED_HOST) { + res.redirect(target); + return; + } + res.redirect("/"); +} + +module.exports = handler; diff --git a/tests/fixtures/open_redirect/javascript/safe_redirect.js b/tests/fixtures/open_redirect/javascript/safe_redirect.js new file mode 100644 index 00000000..f63e9892 --- /dev/null +++ b/tests/fixtures/open_redirect/javascript/safe_redirect.js @@ -0,0 +1,13 @@ +// Safe: req.query.next routed through `validateRedirectUrl` allowlist +// before being passed to res.redirect. +function validateRedirectUrl(raw) { + return raw.startsWith('/') ? raw : '/'; +} + +function handler(req, res) { + const target = req.query.next; + const safe = validateRedirectUrl(target); + res.redirect(safe); +} + +module.exports = handler; diff --git a/tests/fixtures/open_redirect/javascript/safe_relative_redirect.js b/tests/fixtures/open_redirect/javascript/safe_relative_redirect.js new file mode 100644 index 00000000..b057b16e --- /dev/null +++ b/tests/fixtures/open_redirect/javascript/safe_relative_redirect.js @@ -0,0 +1,16 @@ +// Safe: req.query.next routed through `ensureRelativeUrl` which enforces +// a leading `/` and rejects scheme-prefixed values (relative-only path). +function ensureRelativeUrl(raw) { + if (typeof raw !== 'string' || !raw.startsWith('/') || raw.startsWith('//')) { + return '/'; + } + return raw; +} + +function handler(req, res) { + const target = req.query.next; + const safe = ensureRelativeUrl(target); + res.redirect(safe); +} + +module.exports = handler; diff --git a/tests/fixtures/open_redirect/javascript/unsafe_redirect.js b/tests/fixtures/open_redirect/javascript/unsafe_redirect.js new file mode 100644 index 00000000..b87bf86a --- /dev/null +++ b/tests/fixtures/open_redirect/javascript/unsafe_redirect.js @@ -0,0 +1,7 @@ +// Unsafe: req.query.next flows directly into res.redirect. 
+function handler(req, res) { + const target = req.query.next; + res.redirect(target); +} + +module.exports = handler; diff --git a/tests/fixtures/open_redirect/php/safe_redirect.php b/tests/fixtures/open_redirect/php/safe_redirect.php new file mode 100644 index 00000000..20c2e472 --- /dev/null +++ b/tests/fixtures/open_redirect/php/safe_redirect.php @@ -0,0 +1,13 @@ + HttpResponse { + let next = env::var("NEXT").unwrap_or_default(); + HttpResponse::Ok().header("Content-Type", next).finish() +} diff --git a/tests/fixtures/open_redirect/rust/safe_host_allowlist_redirect.rs b/tests/fixtures/open_redirect/rust/safe_host_allowlist_redirect.rs new file mode 100644 index 00000000..66a19681 --- /dev/null +++ b/tests/fixtures/open_redirect/rust/safe_host_allowlist_redirect.rs @@ -0,0 +1,19 @@ +// Safe: tainted env value parsed via `url::Url::parse` then host pinned +// against `ALLOWED_HOST`. Multi-statement form — `parsed = Url::parse(x)` +// happens on a separate line from the `parsed.host_str() == Some(ALLOWED)` +// check. Recognised by PredicateKind::HostAllowlistValidated which clears +// Cap::OPEN_REDIRECT on the validated branch. +use axum::response::Redirect; +use std::env; +use url::Url; + +const ALLOWED_HOST: &str = "trusted.example.com"; + +fn bounce() -> Redirect { + let next = env::var("NEXT").unwrap_or_default(); + let parsed = Url::parse(&next).unwrap(); + if parsed.host_str() == Some(ALLOWED_HOST) { + return Redirect::to(parsed.as_str()); + } + Redirect::permanent("/") +} diff --git a/tests/fixtures/open_redirect/rust/safe_redirect.rs b/tests/fixtures/open_redirect/rust/safe_redirect.rs new file mode 100644 index 00000000..be76eb87 --- /dev/null +++ b/tests/fixtures/open_redirect/rust/safe_redirect.rs @@ -0,0 +1,18 @@ +// Safe: tainted value routed through `validate_redirect_url` allowlist +// before being passed to `Redirect::to`. 
+use axum::response::Redirect; +use std::env; + +fn validate_redirect_url(raw: &str) -> String { + if raw.starts_with('/') { + raw.to_string() + } else { + "/".to_string() + } +} + +fn bounce() -> Redirect { + let next = env::var("NEXT").unwrap_or_default(); + let safe = validate_redirect_url(&next); + Redirect::to(&safe) +} diff --git a/tests/fixtures/open_redirect/rust/safe_relative_redirect.rs b/tests/fixtures/open_redirect/rust/safe_relative_redirect.rs new file mode 100644 index 00000000..4e2f7982 --- /dev/null +++ b/tests/fixtures/open_redirect/rust/safe_relative_redirect.rs @@ -0,0 +1,18 @@ +// Safe: tainted value routed through `ensure_relative_url` which enforces +// a leading `/` and rejects scheme-prefixed or protocol-relative values +// (relative-only path). +use axum::response::Redirect; +use std::env; + +fn ensure_relative_url(raw: &str) -> String { + if !raw.starts_with('/') || raw.starts_with("//") { + return "/".to_string(); + } + raw.to_string() +} + +fn bounce() -> Redirect { + let next = env::var("NEXT").unwrap_or_default(); + let safe = ensure_relative_url(&next); + Redirect::permanent(&safe) +} diff --git a/tests/fixtures/open_redirect/rust/unsafe_actix_location.rs b/tests/fixtures/open_redirect/rust/unsafe_actix_location.rs new file mode 100644 index 00000000..0d31e540 --- /dev/null +++ b/tests/fixtures/open_redirect/rust/unsafe_actix_location.rs @@ -0,0 +1,11 @@ +// Unsafe: tainted env value flows directly into actix-web's +// `HttpResponse::Found().header("Location", url)` builder. Without an +// allowlist check, a tainted URL is the actix open-redirect vector. 
+use actix_web::HttpResponse; +use std::env; + +fn bounce() -> HttpResponse { + let next = env::var("NEXT").unwrap_or_default(); + let resp = HttpResponse::Found().header("Location", next); + resp.finish() +} diff --git a/tests/fixtures/open_redirect/rust/unsafe_actix_location_chained.rs b/tests/fixtures/open_redirect/rust/unsafe_actix_location_chained.rs new file mode 100644 index 00000000..bd0f449c --- /dev/null +++ b/tests/fixtures/open_redirect/rust/unsafe_actix_location_chained.rs @@ -0,0 +1,13 @@ +// Unsafe: tainted env value flows into actix-web's +// `HttpResponse::Found().header("Location", url)` builder, then chained +// `.finish()` returns the response in one expression. The chained +// `.finish()` is the outer call; without chained inner-gate rebinding +// the outer `.finish()` swallows classification and the inner `.header` +// open-redirect gate never fires. +use actix_web::HttpResponse; +use std::env; + +fn bounce() -> HttpResponse { + let next = env::var("NEXT").unwrap_or_default(); + HttpResponse::Found().header("Location", next).finish() +} diff --git a/tests/fixtures/open_redirect/rust/unsafe_redirect.rs b/tests/fixtures/open_redirect/rust/unsafe_redirect.rs new file mode 100644 index 00000000..b32f62b6 --- /dev/null +++ b/tests/fixtures/open_redirect/rust/unsafe_redirect.rs @@ -0,0 +1,9 @@ +// Unsafe: tainted env value flows directly into `Redirect::to`, the axum +// open-redirect entry point. 
+use axum::response::Redirect; +use std::env; + +fn bounce() -> Redirect { + let next = env::var("NEXT").unwrap_or_default(); + Redirect::to(&next) +} diff --git a/tests/fixtures/open_redirect/typescript/safe_host_allowlist_redirect.ts b/tests/fixtures/open_redirect/typescript/safe_host_allowlist_redirect.ts new file mode 100644 index 00000000..d9e8f7dd --- /dev/null +++ b/tests/fixtures/open_redirect/typescript/safe_host_allowlist_redirect.ts @@ -0,0 +1,14 @@ +// Safe: req.query.next routed through `new URL(...).hostname === ALLOWED` +// host-allowlist gate before reaching res.redirect. Recognised by +// PredicateKind::HostAllowlistValidated which clears Cap::OPEN_REDIRECT +// on the validated branch. +const ALLOWED_HOST: string = "trusted.example.com"; + +export function handler(req: any, res: any): void { + const target: string = req.query.next; + if (new URL(target).hostname === ALLOWED_HOST) { + res.redirect(target); + return; + } + res.redirect("/"); +} diff --git a/tests/fixtures/open_redirect/typescript/safe_redirect.ts b/tests/fixtures/open_redirect/typescript/safe_redirect.ts new file mode 100644 index 00000000..4c3743cd --- /dev/null +++ b/tests/fixtures/open_redirect/typescript/safe_redirect.ts @@ -0,0 +1,10 @@ +// Safe: req.query.next routed through `validateRedirectUrl` allowlist. +function validateRedirectUrl(raw: string): string { + return raw.startsWith('/') ? 
raw : '/'; +} + +export function handler(req: any, res: any): void { + const target: string = req.query.next; + const safe: string = validateRedirectUrl(target); + res.redirect(safe); +} diff --git a/tests/fixtures/open_redirect/typescript/safe_relative_redirect.ts b/tests/fixtures/open_redirect/typescript/safe_relative_redirect.ts new file mode 100644 index 00000000..4c0fe4e7 --- /dev/null +++ b/tests/fixtures/open_redirect/typescript/safe_relative_redirect.ts @@ -0,0 +1,14 @@ +// Safe: req.query.next routed through `ensureRelativeUrl` which enforces +// a leading `/` and rejects scheme-prefixed values (relative-only path). +function ensureRelativeUrl(raw: string): string { + if (!raw.startsWith('/') || raw.startsWith('//')) { + return '/'; + } + return raw; +} + +export function handler(req: any, res: any): void { + const target: string = req.query.next; + const safe: string = ensureRelativeUrl(target); + res.redirect(safe); +} diff --git a/tests/fixtures/open_redirect/typescript/unsafe_redirect.ts b/tests/fixtures/open_redirect/typescript/unsafe_redirect.ts new file mode 100644 index 00000000..f8e56548 --- /dev/null +++ b/tests/fixtures/open_redirect/typescript/unsafe_redirect.ts @@ -0,0 +1,5 @@ +// Unsafe: req.query.next flows directly into res.redirect. +export function handler(req: any, res: any): void { + const target: string = req.query.next; + res.redirect(target); +} diff --git a/tests/fixtures/prototype_pollution/full/safe_allowlist.js b/tests/fixtures/prototype_pollution/full/safe_allowlist.js new file mode 100644 index 00000000..87b348ed --- /dev/null +++ b/tests/fixtures/prototype_pollution/full/safe_allowlist.js @@ -0,0 +1,13 @@ +// Phase 09: allowlist guard restricts the key to a known-safe constant +// set on the true arm of the `if`, so the enclosed assignment cannot +// reach `__proto__` / `constructor` even though `userKey` is tainted. 
+function handler(req, res) { + const target = {}; + const userKey = req.query.k; + if (userKey === "name" || userKey === "id") { + target[userKey] = req.query.v; + } + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/full/safe_object_create_null.js b/tests/fixtures/prototype_pollution/full/safe_object_create_null.js new file mode 100644 index 00000000..5e1c65ea --- /dev/null +++ b/tests/fixtures/prototype_pollution/full/safe_object_create_null.js @@ -0,0 +1,11 @@ +// Phase 09: `Object.create(null)` produces a null-prototype receiver +// that has no `Object.prototype` to mutate, so writes through any key +// (including `__proto__`) cannot pollute the global prototype chain. +function handler(req, res) { + const target = Object.create(null); + const userKey = req.query.k; + target[userKey] = req.query.v; + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/full/safe_reject_list.js b/tests/fixtures/prototype_pollution/full/safe_reject_list.js new file mode 100644 index 00000000..8de03a59 --- /dev/null +++ b/tests/fixtures/prototype_pollution/full/safe_reject_list.js @@ -0,0 +1,15 @@ +// Phase 09: reject-list guard suppresses prototype pollution. The +// dangerous-key path terminates with `return`, so the assignment that +// follows only runs when `userKey` is provably not `__proto__` / +// `constructor` / `prototype`. 
+function handler(req, res) { + const target = {}; + const userKey = req.query.k; + if (userKey === "__proto__" || userKey === "constructor" || userKey === "prototype") { + return; + } + target[userKey] = req.query.v; + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/full/unsafe_dynamic_key.js b/tests/fixtures/prototype_pollution/full/unsafe_dynamic_key.js new file mode 100644 index 00000000..9e74a921 --- /dev/null +++ b/tests/fixtures/prototype_pollution/full/unsafe_dynamic_key.js @@ -0,0 +1,11 @@ +// Phase 09: tainted *key* in `obj[key] = val` is the prototype-pollution +// channel. When `req.query.k` resolves to `__proto__` / `constructor`, the +// assignment mutates `Object.prototype` globally. +function handler(req, res) { + const target = {}; + const userKey = req.query.k; + target[userKey] = req.query.v; + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/full/unsafe_partial_null_proto.js b/tests/fixtures/prototype_pollution/full/unsafe_partial_null_proto.js new file mode 100644 index 00000000..2465f369 --- /dev/null +++ b/tests/fixtures/prototype_pollution/full/unsafe_partial_null_proto.js @@ -0,0 +1,19 @@ +// Phase 09 flow-sensitive null-prototype guard. `target` is only +// initialised with `Object.create(null)` on one branch; the else branch +// leaves it as a plain object whose prototype chain is mutable. The +// prior AST-scan suppressor matched any same-function `Object.create(null)` +// assignment and silenced both branches; the SSA TypeFacts path joins +// to Unknown at the phi and keeps PROTOTYPE_POLLUTION on the unsafe path. 
+function handler(req, res) { + let target; + if (req.query.safe) { + target = Object.create(null); + } else { + target = {}; + } + const userKey = req.query.k; + target[userKey] = req.query.v; + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/javascript/safe_bare_extend_class.js b/tests/fixtures/prototype_pollution/javascript/safe_bare_extend_class.js new file mode 100644 index 00000000..bc6e06b5 --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/safe_bare_extend_class.js @@ -0,0 +1,14 @@ +// Safe: Backbone-style class extension shares the `extend` suffix but +// passes an object literal as arg 0, never the literal `true` deep flag. +// The bare `extend` SinkGate uses `LiteralOnly` activation so this call +// does not produce a PROTOTYPE_POLLUTION finding. +const Backbone = require('backbone'); + +const UserModel = Backbone.Model.extend({ + defaults: { name: '', id: 0 }, + initialize: function () { + this.set('createdAt', Date.now()); + }, +}); + +module.exports = UserModel; diff --git a/tests/fixtures/prototype_pollution/javascript/safe_bare_extend_dynamic.js b/tests/fixtures/prototype_pollution/javascript/safe_bare_extend_dynamic.js new file mode 100644 index 00000000..b101932a --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/safe_bare_extend_dynamic.js @@ -0,0 +1,14 @@ +// Safe: bare `extend` invoked with a dynamic flag value at arg 0. Without +// literal evidence that the deep-merge form is in use, the `LiteralOnly` +// gate suppresses (no conservative ALL_ARGS_PAYLOAD fire). This avoids +// over-firing on shallow `extend(target, src)` shapes (Underscore-style) +// where arg 0 is the target object, not a deep flag. 
+const { extend } = require('some-utility'); + +function handler(req, res) { + const target = {}; + extend(target, req.body); + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/javascript/safe_lodash_merge_const.js b/tests/fixtures/prototype_pollution/javascript/safe_lodash_merge_const.js new file mode 100644 index 00000000..10f42451 --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/safe_lodash_merge_const.js @@ -0,0 +1,11 @@ +// Safe: lodash `_.merge` invoked with a constant-source object. No taint +// reaches the merge so PROTOTYPE_POLLUTION does not fire. +const _ = require('lodash'); + +function build() { + const target = {}; + _.merge(target, { a: 1, b: 2 }); + return target; +} + +module.exports = build; diff --git a/tests/fixtures/prototype_pollution/javascript/safe_object_assign_const.js b/tests/fixtures/prototype_pollution/javascript/safe_object_assign_const.js new file mode 100644 index 00000000..8d15e87b --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/safe_object_assign_const.js @@ -0,0 +1,9 @@ +// Safe: Object.assign with a constant-source object literal. No taint +// reaches the merge so PROTOTYPE_POLLUTION does not fire. +function build() { + const target = {}; + Object.assign(target, { x: 1, y: 2 }); + return target; +} + +module.exports = build; diff --git a/tests/fixtures/prototype_pollution/javascript/safe_set_value_const.js b/tests/fixtures/prototype_pollution/javascript/safe_set_value_const.js new file mode 100644 index 00000000..e0b9660a --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/safe_set_value_const.js @@ -0,0 +1,11 @@ +// Safe: `set-value` invoked with constant key + literal value. No tainted +// flow into the path or value position, no PROTOTYPE_POLLUTION. 
+const setValue = require('set-value'); + +function handler(req, res) { + const target = {}; + setValue(target, "name", "alice"); + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/javascript/unsafe_bare_extend_deep.js b/tests/fixtures/prototype_pollution/javascript/unsafe_bare_extend_deep.js new file mode 100644 index 00000000..5e877b1a --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/unsafe_bare_extend_deep.js @@ -0,0 +1,14 @@ +// Unsafe: jQuery's deep-merge `extend` imported as a bound name. Bare +// `extend(true, target, src)` with attacker-controlled `req.body` as a +// source argument can rewrite `Object.prototype` via `__proto__` keys in +// the merged input. PROTOTYPE_POLLUTION fires via the `LiteralOnly` gate +// keyed on the literal `true` deep-flag at arg 0. +const { extend } = require('jquery'); + +function handler(req, res) { + const target = {}; + extend(true, target, req.body); + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/javascript/unsafe_dot_prop_set.js b/tests/fixtures/prototype_pollution/javascript/unsafe_dot_prop_set.js new file mode 100644 index 00000000..c736aed8 --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/unsafe_dot_prop_set.js @@ -0,0 +1,12 @@ +// Unsafe: `dot-prop` standalone helper (CVE-2020-8116) invoked with an +// attacker-controlled `path`. Tainted path `__proto__.polluted` walks +// the prototype chain because dot-prop did not block prototype keys. 
+const dotProp = require('dot-prop'); + +function handler(req, res) { + const target = {}; + dotProp.set(target, req.body.path, req.body.value); + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/javascript/unsafe_jsonpath_set.js b/tests/fixtures/prototype_pollution/javascript/unsafe_jsonpath_set.js new file mode 100644 index 00000000..b3ae32a8 --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/unsafe_jsonpath_set.js @@ -0,0 +1,12 @@ +// Unsafe: jsonpath `jp.set(obj, path, value)` invoked with an +// attacker-controlled `path`. Tainted path with `__proto__` segments +// pollutes the prototype chain. +const jp = require('jsonpath'); + +function handler(req, res) { + const target = {}; + jp.set(target, req.body.path, req.body.value); + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/javascript/unsafe_lodash_merge.js b/tests/fixtures/prototype_pollution/javascript/unsafe_lodash_merge.js new file mode 100644 index 00000000..bd6ad85e --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/unsafe_lodash_merge.js @@ -0,0 +1,12 @@ +// Unsafe: lodash `_.merge` invoked with attacker-controlled `req.body` as +// the source argument. Tainted `__proto__` / `constructor` keys can rewrite +// Object.prototype globally. PROTOTYPE_POLLUTION fires. +const _ = require('lodash'); + +function handler(req, res) { + const target = {}; + _.merge(target, req.body); + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/javascript/unsafe_object_assign.js b/tests/fixtures/prototype_pollution/javascript/unsafe_object_assign.js new file mode 100644 index 00000000..51ab1256 --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/unsafe_object_assign.js @@ -0,0 +1,8 @@ +// Unsafe: Object.assign with attacker-controlled `req.body` source. 
+function handler(req, res) { + const target = {}; + Object.assign(target, req.body); + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/javascript/unsafe_set_value.js b/tests/fixtures/prototype_pollution/javascript/unsafe_set_value.js new file mode 100644 index 00000000..dbb1b55b --- /dev/null +++ b/tests/fixtures/prototype_pollution/javascript/unsafe_set_value.js @@ -0,0 +1,14 @@ +// Unsafe: `set-value` standalone helper (CVE-2019-10747 / CVE-2021-23440) +// invoked with attacker-controlled key and value. A tainted key of +// `__proto__.polluted` mutates Object.prototype. Inline `req.body.*` +// member access at the gated arg position must seed taint correctly — +// regression guard for the bare-callee gate-text-derivation fix. +const setValue = require('set-value'); + +function handler(req, res) { + const target = {}; + setValue(target, req.body.key, req.body.value); + res.json(target); +} + +module.exports = handler; diff --git a/tests/fixtures/prototype_pollution/python/unsafe_dict_update.py b/tests/fixtures/prototype_pollution/python/unsafe_dict_update.py new file mode 100644 index 00000000..40fc0937 --- /dev/null +++ b/tests/fixtures/prototype_pollution/python/unsafe_dict_update.py @@ -0,0 +1,26 @@ +# Unsafe: tainted JSON body merged into target dict via the +# `dict.update(target, src)` class-method form, the canonical Python +# prototype-pollution attack shape (real-world CVE pattern: configuration +# / namespace dicts that thread user input through framework merges). +# Opt-in via NYX_PYTHON_PROTO_POLLUTION=1. +import json +from flask import request + + +def handler(): + body = json.loads(request.get_data()) + target = {} + dict.update(target, body) + return target + + +def handler_obj_dict(): + # Instance-attribute pollution via `__dict__.update(src)`. 
+ body = json.loads(request.get_data()) + + class Cfg: + pass + + obj = Cfg() + obj.__dict__.update(body) + return obj diff --git a/tests/fixtures/prototype_pollution/typescript/safe_bare_extend_class.ts b/tests/fixtures/prototype_pollution/typescript/safe_bare_extend_class.ts new file mode 100644 index 00000000..a8658da5 --- /dev/null +++ b/tests/fixtures/prototype_pollution/typescript/safe_bare_extend_class.ts @@ -0,0 +1,14 @@ +// Safe: Backbone-style class extension in TS shares the `extend` suffix +// but passes an object literal as arg 0, never the literal `true` deep +// flag. `LiteralOnly` activation suppresses the finding. +import * as Backbone from 'backbone'; + +export const UserModel = Backbone.Model.extend({ + defaults: { name: '', id: 0 }, + initialize: function () { + (this as unknown as { set: (k: string, v: unknown) => void }).set( + 'createdAt', + Date.now(), + ); + }, +}); diff --git a/tests/fixtures/prototype_pollution/typescript/safe_lodash_merge_const.ts b/tests/fixtures/prototype_pollution/typescript/safe_lodash_merge_const.ts new file mode 100644 index 00000000..8bd3af1a --- /dev/null +++ b/tests/fixtures/prototype_pollution/typescript/safe_lodash_merge_const.ts @@ -0,0 +1,8 @@ +// Safe: lodash `_.merge` invoked with a constant-source object. +import * as _ from 'lodash'; + +export function build(): any { + const target: any = {}; + _.merge(target, { a: 1, b: 2 }); + return target; +} diff --git a/tests/fixtures/prototype_pollution/typescript/safe_object_assign_const.ts b/tests/fixtures/prototype_pollution/typescript/safe_object_assign_const.ts new file mode 100644 index 00000000..f3537deb --- /dev/null +++ b/tests/fixtures/prototype_pollution/typescript/safe_object_assign_const.ts @@ -0,0 +1,7 @@ +// Safe: Object.assign with a constant-source object literal. No taint +// reaches the merge so PROTOTYPE_POLLUTION does not fire. 
+export function build(): Record { + const target: Record = {}; + Object.assign(target, { x: 1, y: 2 }); + return target; +} diff --git a/tests/fixtures/prototype_pollution/typescript/unsafe_bare_extend_deep.ts b/tests/fixtures/prototype_pollution/typescript/unsafe_bare_extend_deep.ts new file mode 100644 index 00000000..3ab70f55 --- /dev/null +++ b/tests/fixtures/prototype_pollution/typescript/unsafe_bare_extend_deep.ts @@ -0,0 +1,12 @@ +// Unsafe: jQuery's deep-merge `extend` imported as a bound name in TS. +// `extend(true, target, src)` with `req.body` as a tainted source rewrites +// `Object.prototype` via `__proto__` keys. PROTOTYPE_POLLUTION fires via +// the `LiteralOnly` gate keyed on the literal `true` deep-flag at arg 0. +import { extend } from 'jquery'; +import type { Request, Response } from 'express'; + +export function handler(req: Request, res: Response): void { + const target: Record = {}; + extend(true, target, req.body); + res.json(target); +} diff --git a/tests/fixtures/prototype_pollution/typescript/unsafe_lodash_merge.ts b/tests/fixtures/prototype_pollution/typescript/unsafe_lodash_merge.ts new file mode 100644 index 00000000..f3a41dee --- /dev/null +++ b/tests/fixtures/prototype_pollution/typescript/unsafe_lodash_merge.ts @@ -0,0 +1,8 @@ +// Unsafe: lodash `_.merge` invoked with attacker-controlled `req.body`. +import * as _ from 'lodash'; + +export function handler(req: any, res: any): void { + const target: any = {}; + _.merge(target, req.body); + res.json(target); +} diff --git a/tests/fixtures/prototype_pollution/typescript/unsafe_object_assign.ts b/tests/fixtures/prototype_pollution/typescript/unsafe_object_assign.ts new file mode 100644 index 00000000..55c63a92 --- /dev/null +++ b/tests/fixtures/prototype_pollution/typescript/unsafe_object_assign.ts @@ -0,0 +1,8 @@ +// Unsafe: Object.assign with attacker-controlled `req.body` source. 
+import type { Request, Response } from "express"; + +export function handler(req: Request, res: Response): void { + const target: Record = {}; + Object.assign(target, req.body); + res.json(target); +} diff --git a/tests/fixtures/real_world/javascript/taint/express_redirect.expect.json b/tests/fixtures/real_world/javascript/taint/express_redirect.expect.json index a99a3edf..c758f57f 100644 --- a/tests/fixtures/real_world/javascript/taint/express_redirect.expect.json +++ b/tests/fixtures/real_world/javascript/taint/express_redirect.expect.json @@ -4,7 +4,7 @@ "modes": ["full"], "expected": [ { - "rule_id": "taint-unsanitised-flow", + "rule_id": "taint-open-redirect", "severity": null, "must_match": true, "line_range": [3, 7], @@ -12,7 +12,7 @@ "notes": "req.query.url flows directly into res.redirect — open redirect" }, { - "rule_id": "taint-unsanitised-flow", + "rule_id": "taint-open-redirect", "severity": null, "must_match": true, "line_range": [9, 13], diff --git a/tests/fixtures/real_world/javascript/taint/open_redirect.expect.json b/tests/fixtures/real_world/javascript/taint/open_redirect.expect.json index 5681223c..d4de8649 100644 --- a/tests/fixtures/real_world/javascript/taint/open_redirect.expect.json +++ b/tests/fixtures/real_world/javascript/taint/open_redirect.expect.json @@ -4,7 +4,7 @@ "modes": ["full"], "expected": [ { - "rule_id": "taint-unsanitised-flow", + "rule_id": "taint-open-redirect", "severity": null, "must_match": true, "line_range": [1, 3], @@ -12,7 +12,7 @@ "notes": "location.hash flows directly into location.assign — open redirect" }, { - "rule_id": "taint-unsanitised-flow", + "rule_id": "taint-open-redirect", "severity": null, "must_match": true, "line_range": [5, 6], diff --git a/tests/fixtures/real_world/javascript/taint/open_redirect_unsafe.expect.json b/tests/fixtures/real_world/javascript/taint/open_redirect_unsafe.expect.json index cefb582a..400ac298 100644 --- 
a/tests/fixtures/real_world/javascript/taint/open_redirect_unsafe.expect.json +++ b/tests/fixtures/real_world/javascript/taint/open_redirect_unsafe.expect.json @@ -4,7 +4,7 @@ "modes": ["full"], "expected": [ { - "rule_id": "taint-unsanitised-flow", + "rule_id": "taint-open-redirect", "severity": null, "must_match": true, "line_range": [4, 7], diff --git a/tests/fixtures/real_world/ruby/taint/rails_redirect.expect.json b/tests/fixtures/real_world/ruby/taint/rails_redirect.expect.json index 3e0d30b3..3703e4df 100644 --- a/tests/fixtures/real_world/ruby/taint/rails_redirect.expect.json +++ b/tests/fixtures/real_world/ruby/taint/rails_redirect.expect.json @@ -11,7 +11,7 @@ ], "expected": [ { - "rule_id": "taint-unsanitised-flow", + "rule_id": "taint-open-redirect", "severity": null, "must_match": true, "line_range": [ diff --git a/tests/fixtures/real_world/typescript/taint/express_redirect.expect.json b/tests/fixtures/real_world/typescript/taint/express_redirect.expect.json index 15541451..df5c2e89 100644 --- a/tests/fixtures/real_world/typescript/taint/express_redirect.expect.json +++ b/tests/fixtures/real_world/typescript/taint/express_redirect.expect.json @@ -4,7 +4,7 @@ "modes": ["full"], "expected": [ { - "rule_id": "taint-unsanitised-flow", + "rule_id": "taint-open-redirect", "severity": null, "must_match": true, "line_range": [3, 7], @@ -12,7 +12,7 @@ "notes": "req.query.url flows directly into res.redirect — open redirect" }, { - "rule_id": "taint-unsanitised-flow", + "rule_id": "taint-open-redirect", "severity": null, "must_match": true, "line_range": [9, 13], diff --git a/tests/fixtures/rust_framework_rules/expectations.json b/tests/fixtures/rust_framework_rules/expectations.json index 2dfea892..a3deff72 100644 --- a/tests/fixtures/rust_framework_rules/expectations.json +++ b/tests/fixtures/rust_framework_rules/expectations.json @@ -1,6 +1,7 @@ { "required_findings": [ - { "id_prefix": "taint-unsanitised-flow", "min_count": 4 } + { "id_prefix": 
"taint-unsanitised-flow", "min_count": 3 }, + { "id_prefix": "taint-open-redirect", "min_count": 1 } ], "forbidden_findings": [], "noise_budget": { diff --git a/tests/fixtures/ssti/go/safe_template_constant.go b/tests/fixtures/ssti/go/safe_template_constant.go new file mode 100644 index 00000000..7ab692d8 --- /dev/null +++ b/tests/fixtures/ssti/go/safe_template_constant.go @@ -0,0 +1,20 @@ +// Safe: text/template parsed from a constant source string; user input +// flows only into the data argument of `Execute`, which substitutes it +// as a value instead of parsing it as template source. + +package ssti + +import ( + "net/http" + "text/template" +) + +func HandlerSafe(w http.ResponseWriter, r *http.Request) { + name := r.URL.Query().Get("name") + tpl, err := template.New("x").Parse("Hello, {{.Name}}") + if err != nil { + http.Error(w, err.Error(), 500) + return + } + tpl.Execute(w, struct{ Name string }{Name: name}) +} diff --git a/tests/fixtures/ssti/go/safe_template_parsefiles.go b/tests/fixtures/ssti/go/safe_template_parsefiles.go new file mode 100644 index 00000000..eb0494e2 --- /dev/null +++ b/tests/fixtures/ssti/go/safe_template_parsefiles.go @@ -0,0 +1,17 @@ +// Safe-template-var: html/template loaded from disk via `ParseFiles` +// (path-traversal class, not SSTI). User input reaches the data arg of +// Execute but the template body is constant.
+ +package ssti + +import ( + "net/http" + + "html/template" +) + +func HandlerParseFiles(w http.ResponseWriter, r *http.Request) { + name := r.URL.Query().Get("name") + tpl := template.Must(template.ParseFiles("greeting.tmpl")) + tpl.Execute(w, struct{ Name string }{Name: name}) +} diff --git a/tests/fixtures/ssti/go/unsafe_template_parse.go b/tests/fixtures/ssti/go/unsafe_template_parse.go new file mode 100644 index 00000000..eccefa82 --- /dev/null +++ b/tests/fixtures/ssti/go/unsafe_template_parse.go @@ -0,0 +1,21 @@ +// Unsafe: text/template `template.New("x").Parse(src)` where src is +// taken from a request query parameter. Tainted template source is SSTI. +// Switching to html/template would not help: its auto-escaping applies +// to data during Execute, not to the source passed to Parse. + +package ssti + +import ( + "net/http" + "text/template" +) + +func Handler(w http.ResponseWriter, r *http.Request) { + src := r.URL.Query().Get("template") + tpl, err := template.New("x").Parse(src) + if err != nil { + http.Error(w, err.Error(), 500) + return + } + tpl.Execute(w, nil) +} diff --git a/tests/fixtures/ssti/java/SafeFreemarkerConstant.java b/tests/fixtures/ssti/java/SafeFreemarkerConstant.java new file mode 100644 index 00000000..7554dc9f --- /dev/null +++ b/tests/fixtures/ssti/java/SafeFreemarkerConstant.java @@ -0,0 +1,18 @@ +// Safe: Velocity.evaluate receives a constant template source string. +// The user-controlled value is bound as a context *variable* (data), +// which Velocity substitutes as a value instead of parsing as template source.
+ +import org.apache.velocity.VelocityContext; +import org.apache.velocity.app.Velocity; +import java.io.StringWriter; +import javax.servlet.http.HttpServletRequest; + +public class SafeFreemarkerConstant { + public String render(HttpServletRequest req) throws Exception { + VelocityContext ctx = new VelocityContext(); + ctx.put("name", req.getParameter("name")); + StringWriter out = new StringWriter(); + Velocity.evaluate(ctx, out, "greeting", "Hello, $name"); + return out.toString(); + } +} diff --git a/tests/fixtures/ssti/java/UnsafeFreemarkerProcess.java b/tests/fixtures/ssti/java/UnsafeFreemarkerProcess.java new file mode 100644 index 00000000..cd37a88b --- /dev/null +++ b/tests/fixtures/ssti/java/UnsafeFreemarkerProcess.java @@ -0,0 +1,27 @@ +// Unsafe: Apache FreeMarker constructor takes a tainted template *source* +// string (the second arg to `new Template(name, reader, cfg)` is read +// once into the compiled body), then `tpl.process(model, out)` renders +// it. Without `TypeKind::Template`, idiomatic `Template tpl = new +// Template(...); tpl.process(...)` shapes do not type-qualify +// `tpl.process` to `Template.process`, so the existing flat SSTI rule +// never fires. 
+import freemarker.template.Configuration; +import freemarker.template.Template; +import java.io.StringReader; +import java.io.StringWriter; +import java.util.HashMap; +import java.util.Map; +import javax.servlet.http.HttpServletRequest; + +public class UnsafeFreemarkerProcess { + public String render(HttpServletRequest req) throws Exception { + String src = req.getParameter("template"); + Configuration cfg = new Configuration(Configuration.VERSION_2_3_31); + Template tpl = new Template("user", new StringReader(src), cfg); + Map model = new HashMap<>(); + model.put("user", req.getParameter("name")); + StringWriter out = new StringWriter(); + tpl.process(model, out); + return out.toString(); + } +} diff --git a/tests/fixtures/ssti/java/UnsafeFreemarkerTemplate.java b/tests/fixtures/ssti/java/UnsafeFreemarkerTemplate.java new file mode 100644 index 00000000..29aec75f --- /dev/null +++ b/tests/fixtures/ssti/java/UnsafeFreemarkerTemplate.java @@ -0,0 +1,20 @@ +// Unsafe: Apache Velocity `Velocity.evaluate(ctx, out, "tag", src)` parses +// `src` as an inline template and renders it in one call. When `src` is +// taken from a request parameter, this is direct SSTI. Static-method +// shape ensures the chain text is `Velocity.evaluate`, matching the +// class-qualified Java SSTI rule without needing receiver type inference. 
+ +import org.apache.velocity.VelocityContext; +import org.apache.velocity.app.Velocity; +import java.io.StringWriter; +import javax.servlet.http.HttpServletRequest; + +public class UnsafeFreemarkerTemplate { + public String render(HttpServletRequest req) throws Exception { + String src = req.getParameter("template"); + VelocityContext ctx = new VelocityContext(); + StringWriter out = new StringWriter(); + Velocity.evaluate(ctx, out, "user-template", src); + return out.toString(); + } +} diff --git a/tests/fixtures/ssti/javascript/safe_handlebars_constant.js b/tests/fixtures/ssti/javascript/safe_handlebars_constant.js new file mode 100644 index 00000000..9477657e --- /dev/null +++ b/tests/fixtures/ssti/javascript/safe_handlebars_constant.js @@ -0,0 +1,11 @@ +// Safe: Handlebars.compile receives a constant template source string. +// Variables provided at render time are not template source and do not +// activate SSTI. +const Handlebars = require('handlebars'); + +function handler(req, res) { + const compiled = Handlebars.compile('Hello, {{name}}'); + res.send(compiled({ name: req.query.name })); +} + +module.exports = handler; diff --git a/tests/fixtures/ssti/javascript/safe_nunjucks_render_string.js b/tests/fixtures/ssti/javascript/safe_nunjucks_render_string.js new file mode 100644 index 00000000..25ccbd88 --- /dev/null +++ b/tests/fixtures/ssti/javascript/safe_nunjucks_render_string.js @@ -0,0 +1,13 @@ +// Safe-template-var: nunjucks.renderString gets a *constant* template +// source; only the data context (arg 1) carries user input. Per the +// gated SSTI classifier (payload_args=[0]), this must NOT fire. 
+const nunjucks = require('nunjucks'); + +function handler(req, res) { + const html = nunjucks.renderString('Hello, {{ name }}', { + name: req.query.name, + }); + res.send(html); +} + +module.exports = handler; diff --git a/tests/fixtures/ssti/javascript/unsafe_handlebars_compile.js b/tests/fixtures/ssti/javascript/unsafe_handlebars_compile.js new file mode 100644 index 00000000..6d8dab40 --- /dev/null +++ b/tests/fixtures/ssti/javascript/unsafe_handlebars_compile.js @@ -0,0 +1,11 @@ +// Unsafe: Handlebars.compile receives a template *source* string built from +// req.body. SSTI fires on the source argument. +const Handlebars = require('handlebars'); + +function handler(req, res) { + const tmpl = req.body.template; + const compiled = Handlebars.compile(tmpl); + res.send(compiled({})); +} + +module.exports = handler; diff --git a/tests/fixtures/ssti/javascript/unsafe_nunjucks_render_string.js b/tests/fixtures/ssti/javascript/unsafe_nunjucks_render_string.js new file mode 100644 index 00000000..d5dd2a48 --- /dev/null +++ b/tests/fixtures/ssti/javascript/unsafe_nunjucks_render_string.js @@ -0,0 +1,11 @@ +// Unsafe: nunjucks.renderString receives a tainted template *source* +// string (arg 0) built from req.body; SSTI fires on the source argument. +const nunjucks = require('nunjucks'); + +function handler(req, res) { + const src = req.body.template; + const html = nunjucks.renderString(src, { user: 'anon' }); + res.send(html); +} + +module.exports = handler; diff --git a/tests/fixtures/ssti/php/safe_smarty_file_fetch.php b/tests/fixtures/ssti/php/safe_smarty_file_fetch.php new file mode 100644 index 00000000..5d3a8f9a --- /dev/null +++ b/tests/fixtures/ssti/php/safe_smarty_file_fetch.php @@ -0,0 +1,11 @@ +fetch('page.tpl')` uses the bare-file resource (no +// `string:` prefix), so the gated Smarty SSTI rule does not activate. +// Variables assigned via assign() carry user input but flow into a file- +// loaded template, not into a source string. 
+ +function handler() { + $smarty = new \Smarty(); + $smarty->assign('name', $_GET['name']); + return $smarty->fetch('page.tpl'); +} diff --git a/tests/fixtures/ssti/php/safe_twig_constant.php b/tests/fixtures/ssti/php/safe_twig_constant.php new file mode 100644 index 00000000..6ee21621 --- /dev/null +++ b/tests/fixtures/ssti/php/safe_twig_constant.php @@ -0,0 +1,10 @@ +createTemplate('Hello, {{ name }}'); + return $tpl->render(['name' => $_GET['name']]); +} diff --git a/tests/fixtures/ssti/php/safe_twig_template_var.php b/tests/fixtures/ssti/php/safe_twig_template_var.php new file mode 100644 index 00000000..f6604871 --- /dev/null +++ b/tests/fixtures/ssti/php/safe_twig_template_var.php @@ -0,0 +1,11 @@ +render('greeting.html.twig', ['name' => $_GET['name']]); +} diff --git a/tests/fixtures/ssti/php/unsafe_smarty_string_fetch.php b/tests/fixtures/ssti/php/unsafe_smarty_string_fetch.php new file mode 100644 index 00000000..9e710af8 --- /dev/null +++ b/tests/fixtures/ssti/php/unsafe_smarty_string_fetch.php @@ -0,0 +1,9 @@ +fetch("string:" . $src)` parses the inline template +// source via the `string:` resource prefix. Tainted $src yields SSTI. + +function handler() { + $src = $_GET['template']; + $smarty = new \Smarty(); + return $smarty->fetch("string:" . $src); +} diff --git a/tests/fixtures/ssti/php/unsafe_twig_create_template.php b/tests/fixtures/ssti/php/unsafe_twig_create_template.php new file mode 100644 index 00000000..17a3b05c --- /dev/null +++ b/tests/fixtures/ssti/php/unsafe_twig_create_template.php @@ -0,0 +1,10 @@ +createTemplate($src); + return $tpl->render(['user' => 'anon']); +} diff --git a/tests/fixtures/ssti/python/safe_jinja_constant.py b/tests/fixtures/ssti/python/safe_jinja_constant.py new file mode 100644 index 00000000..205cc09b --- /dev/null +++ b/tests/fixtures/ssti/python/safe_jinja_constant.py @@ -0,0 +1,9 @@ +# Safe: jinja2.Template receives a constant template source. 
Variables +# passed at render time are not template source and do not activate SSTI. +from jinja2 import Template +from flask import request + + +def handler(): + t = Template("Hello, {{ name }}") + return t.render(name=request.args.get("name")) diff --git a/tests/fixtures/ssti/python/safe_mako_lookup_constant.py b/tests/fixtures/ssti/python/safe_mako_lookup_constant.py new file mode 100644 index 00000000..413ddfd5 --- /dev/null +++ b/tests/fixtures/ssti/python/safe_mako_lookup_constant.py @@ -0,0 +1,9 @@ +# Safe: Mako TemplateLookup.get_template receives a literal template name. +# No tainted flow into the loader-path argument, no SSTI. +from mako.lookup import TemplateLookup + + +def handler(): + lookup = TemplateLookup(directories=["/srv/templates"]) + template = lookup.get_template("home.mako") + return template.render(user="anon") diff --git a/tests/fixtures/ssti/python/safe_render_template_var.py b/tests/fixtures/ssti/python/safe_render_template_var.py new file mode 100644 index 00000000..51e8c149 --- /dev/null +++ b/tests/fixtures/ssti/python/safe_render_template_var.py @@ -0,0 +1,8 @@ +# Safe-template-var: Flask `render_template("file.html", **vars)`. The +# first arg is a *file path* (constant), variables carry user input but +# never become template source. Must NOT fire SSTI. +from flask import render_template, request + + +def handler(): + return render_template("greeting.html", name=request.args.get("name")) diff --git a/tests/fixtures/ssti/python/unsafe_jinja_compile_expression.py b/tests/fixtures/ssti/python/unsafe_jinja_compile_expression.py new file mode 100644 index 00000000..90dbdaa9 --- /dev/null +++ b/tests/fixtures/ssti/python/unsafe_jinja_compile_expression.py @@ -0,0 +1,11 @@ +# Unsafe: jinja2 Environment.compile_expression accepts an arbitrary +# expression source; tainted input compiles into an executable callable. 
+from jinja2 import Environment +from flask import request + + +def handler(): + env = Environment() + expr_src = request.form["expr"] + expr = env.compile_expression(expr_src) + return str(expr({})) diff --git a/tests/fixtures/ssti/python/unsafe_jinja_get_template.py b/tests/fixtures/ssti/python/unsafe_jinja_get_template.py new file mode 100644 index 00000000..5d3ca234 --- /dev/null +++ b/tests/fixtures/ssti/python/unsafe_jinja_get_template.py @@ -0,0 +1,13 @@ +# Unsafe: Jinja2 Environment.get_template receives an attacker-controlled +# template name. Tainted name lets the attacker swap the resolved template, +# yielding arbitrary template execution. Modeled as SSTI on the loader-path +# argument. +from jinja2 import Environment, FileSystemLoader +from flask import request + + +def handler(): + name = request.args.get("page") + env = Environment(loader=FileSystemLoader("/srv/templates")) + template = env.get_template(name) + return template.render(user="anon") diff --git a/tests/fixtures/ssti/python/unsafe_jinja_template.py b/tests/fixtures/ssti/python/unsafe_jinja_template.py new file mode 100644 index 00000000..21c981d1 --- /dev/null +++ b/tests/fixtures/ssti/python/unsafe_jinja_template.py @@ -0,0 +1,10 @@ +# Unsafe: jinja2.Template receives a template *source* string built from +# request data. SSTI fires on the source argument. +from jinja2 import Template +from flask import request + + +def handler(): + src = request.form["template"] + t = Template(src) + return t.render(user="anon") diff --git a/tests/fixtures/ssti/python/unsafe_mako_lookup_get_template.py b/tests/fixtures/ssti/python/unsafe_mako_lookup_get_template.py new file mode 100644 index 00000000..c5a8c13f --- /dev/null +++ b/tests/fixtures/ssti/python/unsafe_mako_lookup_get_template.py @@ -0,0 +1,13 @@ +# Unsafe: Mako TemplateLookup.get_template receives an attacker-controlled +# template name. 
A tainted name lets the attacker pick which file under the +# loader directory becomes the rendered template — arbitrary template +# execution, modeled as SSTI. +from mako.lookup import TemplateLookup +from flask import request + + +def handler(): + name = request.args.get("name") + lookup = TemplateLookup(directories=["/srv/templates"]) + template = lookup.get_template(name) + return template.render(user="anon") diff --git a/tests/fixtures/ssti/ruby/safe_erb_constant.rb b/tests/fixtures/ssti/ruby/safe_erb_constant.rb new file mode 100644 index 00000000..b7f8e564 --- /dev/null +++ b/tests/fixtures/ssti/ruby/safe_erb_constant.rb @@ -0,0 +1,10 @@ +# Safe: ERB.new receives a constant template source. Local variables +# bound through `binding` may carry user input but do not activate SSTI. + +require "erb" + +def handler(params) + name = params[:name] + template = ERB.new("Hello, <%= name %>") + template.result(binding) +end diff --git a/tests/fixtures/ssti/ruby/safe_erb_template_var.rb b/tests/fixtures/ssti/ruby/safe_erb_template_var.rb new file mode 100644 index 00000000..8e68b293 --- /dev/null +++ b/tests/fixtures/ssti/ruby/safe_erb_template_var.rb @@ -0,0 +1,8 @@ +# Safe-template-var: render an on-disk template via Rails-style +# `render :template, locals: {...}`. The template name is a constant +# symbol; the locals carry user input but flow into a file-loaded +# template, not into a source string. + +def handler(params) + render template: "users/show", locals: { name: params[:name] } +end diff --git a/tests/fixtures/ssti/ruby/unsafe_erb_new.rb b/tests/fixtures/ssti/ruby/unsafe_erb_new.rb new file mode 100644 index 00000000..60bcdf3a --- /dev/null +++ b/tests/fixtures/ssti/ruby/unsafe_erb_new.rb @@ -0,0 +1,10 @@ +# Unsafe: ERB.new receives a tainted template *source* string from +# request params; SSTI fires on the source argument. 
+ +require "erb" + +def handler(params) + src = params[:template] + template = ERB.new(src) + template.result(binding) +end diff --git a/tests/fixtures/ssti/typescript/safe_handlebars_constant.ts b/tests/fixtures/ssti/typescript/safe_handlebars_constant.ts new file mode 100644 index 00000000..127bd8a2 --- /dev/null +++ b/tests/fixtures/ssti/typescript/safe_handlebars_constant.ts @@ -0,0 +1,7 @@ +// Safe: Handlebars.compile receives a constant template source. +import * as Handlebars from 'handlebars'; + +export function handler(req: any, res: any): void { + const compiled = Handlebars.compile('Hello, {{name}}'); + res.send(compiled({ name: req.query.name })); +} diff --git a/tests/fixtures/ssti/typescript/safe_nunjucks_render_string.ts b/tests/fixtures/ssti/typescript/safe_nunjucks_render_string.ts new file mode 100644 index 00000000..bdf51867 --- /dev/null +++ b/tests/fixtures/ssti/typescript/safe_nunjucks_render_string.ts @@ -0,0 +1,12 @@ +// Safe-template-var: nunjucks.renderString with constant template +// source; user-controlled context only. Gated SSTI classifier must NOT +// fire (payload_args=[0]). +import nunjucks from 'nunjucks'; +import type { Request, Response } from 'express'; + +export function handler(req: Request, res: Response): void { + const html = nunjucks.renderString('Hello, {{ name }}', { + name: req.query.name, + }); + res.send(html); +} diff --git a/tests/fixtures/ssti/typescript/unsafe_handlebars_compile.ts b/tests/fixtures/ssti/typescript/unsafe_handlebars_compile.ts new file mode 100644 index 00000000..80931c3c --- /dev/null +++ b/tests/fixtures/ssti/typescript/unsafe_handlebars_compile.ts @@ -0,0 +1,8 @@ +// Unsafe: Handlebars.compile receives a tainted template source. 
+import * as Handlebars from 'handlebars'; + +export function handler(req: any, res: any): void { + const tmpl: string = req.body.template; + const compiled = Handlebars.compile(tmpl); + res.send(compiled({})); +} diff --git a/tests/fixtures/ssti/typescript/unsafe_nunjucks_render_string.ts b/tests/fixtures/ssti/typescript/unsafe_nunjucks_render_string.ts new file mode 100644 index 00000000..d2d59743 --- /dev/null +++ b/tests/fixtures/ssti/typescript/unsafe_nunjucks_render_string.ts @@ -0,0 +1,10 @@ +// Unsafe: nunjucks.renderString receives a tainted template source +// from req.body; SSTI fires on the source argument. +import nunjucks from 'nunjucks'; +import type { Request, Response } from 'express'; + +export function handler(req: Request, res: Response): void { + const src: string = req.body.template; + const html: string = nunjucks.renderString(src, { user: 'anon' }); + res.send(html); +} diff --git a/tests/fixtures/xpath_injection/c/baseline_constant_xpath.c b/tests/fixtures/xpath_injection/c/baseline_constant_xpath.c new file mode 100644 index 00000000..83f87481 --- /dev/null +++ b/tests/fixtures/xpath_injection/c/baseline_constant_xpath.c @@ -0,0 +1,7 @@ +/* Baseline: expression is a compile-time constant. No taint reaches + * xmlXPathEvalExpression so no XPATH_INJECTION finding fires. */ +#include + +xmlXPathObjectPtr do_lookup(xmlXPathContextPtr ctx) { + return xmlXPathEvalExpression((xmlChar *)"//user[@role='admin']", ctx); +} diff --git a/tests/fixtures/xpath_injection/c/safe_xpath_query.c b/tests/fixtures/xpath_injection/c/safe_xpath_query.c new file mode 100644 index 00000000..a1fe93b2 --- /dev/null +++ b/tests/fixtures/xpath_injection/c/safe_xpath_query.c @@ -0,0 +1,13 @@ +/* Safe: project-local sanitize_xpath (matches the developer-named + * `sanitize_*` Sanitizer rule) clears caps on the user value before it + * reaches xmlXPathEvalExpression. 
*/ +#include +#include + +extern char *sanitize_xpath(const char *raw); + +xmlXPathObjectPtr do_lookup(xmlXPathContextPtr ctx) { + char *user_expr = getenv("USER_EXPR"); + char *safe = sanitize_xpath(user_expr); + return xmlXPathEvalExpression((xmlChar *)safe, ctx); +} diff --git a/tests/fixtures/xpath_injection/c/unsafe_xpath_query.c b/tests/fixtures/xpath_injection/c/unsafe_xpath_query.c new file mode 100644 index 00000000..52769d70 --- /dev/null +++ b/tests/fixtures/xpath_injection/c/unsafe_xpath_query.c @@ -0,0 +1,9 @@ +/* Unsafe: tainted env-string passed straight as the XPath expression to + * xmlXPathEvalExpression. XPATH_INJECTION fires on the expression arg. */ +#include +#include + +xmlXPathObjectPtr do_lookup(xmlXPathContextPtr ctx) { + char *user_expr = getenv("USER_EXPR"); + return xmlXPathEvalExpression((xmlChar *)user_expr, ctx); +} diff --git a/tests/fixtures/xpath_injection/cpp/baseline_constant_xpath.cpp b/tests/fixtures/xpath_injection/cpp/baseline_constant_xpath.cpp new file mode 100644 index 00000000..2a515939 --- /dev/null +++ b/tests/fixtures/xpath_injection/cpp/baseline_constant_xpath.cpp @@ -0,0 +1,7 @@ +// Baseline: expression is a compile-time constant. No taint reaches +// xmlXPathEvalExpression so no XPATH_INJECTION finding fires. +#include + +xmlXPathObjectPtr do_lookup(xmlXPathContextPtr ctx) { + return xmlXPathEvalExpression((xmlChar *)"//user[@role='admin']", ctx); +} diff --git a/tests/fixtures/xpath_injection/cpp/safe_xpath_query.cpp b/tests/fixtures/xpath_injection/cpp/safe_xpath_query.cpp new file mode 100644 index 00000000..af4b8608 --- /dev/null +++ b/tests/fixtures/xpath_injection/cpp/safe_xpath_query.cpp @@ -0,0 +1,13 @@ +// Safe: project-local sanitize_xpath (matches the developer-named +// `sanitize_*` Sanitizer rule) clears caps on the user value before it +// reaches xmlXPathEvalExpression. 
+#include +#include + +extern "C" char *sanitize_xpath(const char *raw); + +xmlXPathObjectPtr do_lookup(xmlXPathContextPtr ctx) { + char *user_expr = std::getenv("USER_EXPR"); + char *safe = sanitize_xpath(user_expr); + return xmlXPathEvalExpression((xmlChar *)safe, ctx); +} diff --git a/tests/fixtures/xpath_injection/cpp/unsafe_xpath_query.cpp b/tests/fixtures/xpath_injection/cpp/unsafe_xpath_query.cpp new file mode 100644 index 00000000..344da2d6 --- /dev/null +++ b/tests/fixtures/xpath_injection/cpp/unsafe_xpath_query.cpp @@ -0,0 +1,9 @@ +// Unsafe: tainted env-string passed straight as the XPath expression to +// xmlXPathEvalExpression. XPATH_INJECTION fires on the expression arg. +#include +#include + +xmlXPathObjectPtr do_lookup(xmlXPathContextPtr ctx) { + char *user_expr = std::getenv("USER_EXPR"); + return xmlXPathEvalExpression((xmlChar *)user_expr, ctx); +} diff --git a/tests/fixtures/xpath_injection/java/BaselineConstantXpath.java b/tests/fixtures/xpath_injection/java/BaselineConstantXpath.java new file mode 100644 index 00000000..debbf308 --- /dev/null +++ b/tests/fixtures/xpath_injection/java/BaselineConstantXpath.java @@ -0,0 +1,14 @@ +// Baseline: expression is a compile-time constant. No taint reaches +// `xpath.evaluate` so no XPATH_INJECTION finding fires. 
+import javax.xml.xpath.XPath; +import javax.xml.xpath.XPathConstants; +import javax.xml.xpath.XPathFactory; +import org.w3c.dom.Document; +import org.w3c.dom.NodeList; + +public class BaselineConstantXpath { + public NodeList lookup(Document doc) throws Exception { + XPath xpath = XPathFactory.newInstance().newXPath(); + return (NodeList) xpath.evaluate("//user[@role='admin']", doc, XPathConstants.NODESET); + } +} diff --git a/tests/fixtures/xpath_injection/java/ParameterizedXpath.java b/tests/fixtures/xpath_injection/java/ParameterizedXpath.java new file mode 100644 index 00000000..5e60da63 --- /dev/null +++ b/tests/fixtures/xpath_injection/java/ParameterizedXpath.java @@ -0,0 +1,26 @@ +// Parameterised XPath: user input is bound via XPathVariableResolver +// (standard JAXP parameterisation) and the expression itself is a compile-time +// constant. Even with the variable-resolver call preceding the evaluate(), +// the expression argument is not taint-bearing, so no XPATH_INJECTION +// finding fires.
+import javax.xml.namespace.QName; +import javax.xml.xpath.XPath; +import javax.xml.xpath.XPathConstants; +import javax.xml.xpath.XPathFactory; +import javax.xml.xpath.XPathVariableResolver; +import javax.servlet.http.HttpServletRequest; +import org.w3c.dom.Document; +import org.w3c.dom.NodeList; + +public class ParameterizedXpath { + public NodeList lookup(HttpServletRequest req, Document doc) throws Exception { + final String user = req.getParameter("user"); + XPath xpath = XPathFactory.newInstance().newXPath(); + xpath.setXPathVariableResolver(new XPathVariableResolver() { + public Object resolveVariable(QName name) { + return user; + } + }); + return (NodeList) xpath.evaluate("//user[name=$u]", doc, XPathConstants.NODESET); + } +} diff --git a/tests/fixtures/xpath_injection/java/SafeXPathQuery.java b/tests/fixtures/xpath_injection/java/SafeXPathQuery.java new file mode 100644 index 00000000..f91de03e --- /dev/null +++ b/tests/fixtures/xpath_injection/java/SafeXPathQuery.java @@ -0,0 +1,23 @@ +// Safe: user-supplied substring routed through the project-local +// `escapeXpath` helper before being concatenated into the XPath expression. +// The sanitizer clears the XPATH_INJECTION cap so the sink does not fire. 
+import javax.xml.xpath.XPath; +import javax.xml.xpath.XPathConstants; +import javax.xml.xpath.XPathFactory; +import javax.servlet.http.HttpServletRequest; +import org.w3c.dom.Document; +import org.w3c.dom.NodeList; + +public class SafeXPathQuery { + public static String escapeXpath(String raw) { + return raw.replace("'", "&apos;"); + } + + public NodeList lookup(HttpServletRequest req, Document doc) throws Exception { + String user = req.getParameter("user"); + String safe = escapeXpath(user); + String expr = "//user[name='" + safe + "']"; + XPath xpath = XPathFactory.newInstance().newXPath(); + return (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET); + } +} diff --git a/tests/fixtures/xpath_injection/java/TaintedParameterizedXpath.java b/tests/fixtures/xpath_injection/java/TaintedParameterizedXpath.java new file mode 100644 index 00000000..f1856689 --- /dev/null +++ b/tests/fixtures/xpath_injection/java/TaintedParameterizedXpath.java @@ -0,0 +1,36 @@ +// Tainted-expression-with-resolver: user input flows into the XPath +// expression argument, but the receiver was bound to an +// XPathVariableResolver before the evaluate() call. Phase 03's +// name-only `setXPathVariableResolver` sanitizer rule would not have +// suppressed this (the rule fires on the resolver-binding call, which +// has no flow-tied taint to clear). The receiver-config sidecar in +// `src/ssa/xpath_config.rs` flips `has_resolver` on the bound XPath +// instance and the SSA sink-emission site strips XPATH_INJECTION from +// any later evaluate() on that receiver.
+import javax.xml.namespace.QName; +import javax.xml.xpath.XPath; +import javax.xml.xpath.XPathConstants; +import javax.xml.xpath.XPathFactory; +import javax.xml.xpath.XPathVariableResolver; +import javax.servlet.http.HttpServletRequest; +import org.w3c.dom.Document; +import org.w3c.dom.NodeList; + +public class TaintedParameterizedXpath { + public NodeList lookup(HttpServletRequest req, Document doc) throws Exception { + final String user = req.getParameter("user"); + XPath xpath = XPathFactory.newInstance().newXPath(); + xpath.setXPathVariableResolver(new XPathVariableResolver() { + public Object resolveVariable(QName name) { + return user; + } + }); + // Tainted expression interpolation: user bypasses the resolver + // and reaches `evaluate` directly. Real-world parameterised + // XPath would use a constant expression with `$u` here, but the + // engineering decision modelled by the sidecar is: a bound + // resolver indicates intended parameterisation, so suppress. + String expr = "//user[name='" + user + "']"; + return (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET); + } +} diff --git a/tests/fixtures/xpath_injection/java/UnsafeXPathQuery.java b/tests/fixtures/xpath_injection/java/UnsafeXPathQuery.java new file mode 100644 index 00000000..f1f24de0 --- /dev/null +++ b/tests/fixtures/xpath_injection/java/UnsafeXPathQuery.java @@ -0,0 +1,17 @@ +// Unsafe: attacker-controlled username concatenated into an XPath expression +// passed to XPath.evaluate. The flat matcher catches the qualified call. 
+import javax.xml.xpath.XPath; +import javax.xml.xpath.XPathConstants; +import javax.xml.xpath.XPathFactory; +import javax.servlet.http.HttpServletRequest; +import org.w3c.dom.Document; +import org.w3c.dom.NodeList; + +public class UnsafeXPathQuery { + public NodeList lookup(HttpServletRequest req, Document doc) throws Exception { + String user = req.getParameter("user"); + String expr = "//user[name='" + user + "']"; + XPath xpath = XPathFactory.newInstance().newXPath(); + return (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET); + } +} diff --git a/tests/fixtures/xpath_injection/javascript/baseline_constant_xpath.js b/tests/fixtures/xpath_injection/javascript/baseline_constant_xpath.js new file mode 100644 index 00000000..d62674cf --- /dev/null +++ b/tests/fixtures/xpath_injection/javascript/baseline_constant_xpath.js @@ -0,0 +1,12 @@ +// Baseline: expression is a compile-time constant. No taint reaches +// xpath.select so no XPATH_INJECTION finding fires. +const xpath = require('xpath'); +const { DOMParser } = require('xmldom'); + +function lookup(req, res) { + const doc = new DOMParser().parseFromString(''); + const nodes = xpath.select("//user[@role='admin']", doc); + res.json(nodes); +} + +module.exports = lookup; diff --git a/tests/fixtures/xpath_injection/javascript/safe_xpath_query.js b/tests/fixtures/xpath_injection/javascript/safe_xpath_query.js new file mode 100644 index 00000000..9b150e97 --- /dev/null +++ b/tests/fixtures/xpath_injection/javascript/safe_xpath_query.js @@ -0,0 +1,20 @@ +// Safe: user-supplied substring routed through the project-local +// `escapeXpath` helper before concatenation. The sanitizer clears the +// XPATH_INJECTION cap so the sink does not fire. 
+const xpath = require('xpath'); +const { DOMParser } = require('xmldom'); + +function escapeXpath(raw) { + return raw.replace(/'/g, '&apos;').replace(/"/g, '&quot;'); +} + +function lookup(req, res) { + const doc = new DOMParser().parseFromString(''); + const user = req.query.user; + const safe = escapeXpath(user); + const expr = "//user[name='" + safe + "']"; + const nodes = xpath.select(expr, doc); + res.json(nodes); +} + +module.exports = lookup; diff --git a/tests/fixtures/xpath_injection/javascript/unsafe_xpath_query.js b/tests/fixtures/xpath_injection/javascript/unsafe_xpath_query.js new file mode 100644 index 00000000..53304040 --- /dev/null +++ b/tests/fixtures/xpath_injection/javascript/unsafe_xpath_query.js @@ -0,0 +1,14 @@ +// Unsafe: npm `xpath` package's `select` receives an expression assembled +// from req.query. XPATH_INJECTION fires on the expression argument. +const xpath = require('xpath'); +const { DOMParser } = require('xmldom'); + +function lookup(req, res) { + const doc = new DOMParser().parseFromString(''); + const user = req.query.user; + const expr = "//user[name='" + user + "']"; + const nodes = xpath.select(expr, doc); + res.json(nodes); +} + +module.exports = lookup; diff --git a/tests/fixtures/xpath_injection/php/baseline_constant_xpath.php b/tests/fixtures/xpath_injection/php/baseline_constant_xpath.php new file mode 100644 index 00000000..97b96ba4 --- /dev/null +++ b/tests/fixtures/xpath_injection/php/baseline_constant_xpath.php @@ -0,0 +1,5 @@ +xpath("//user[@role='admin']"); diff --git a/tests/fixtures/xpath_injection/php/safe_xpath_query.php b/tests/fixtures/xpath_injection/php/safe_xpath_query.php new file mode 100644 index 00000000..49bd5bb0 --- /dev/null +++ b/tests/fixtures/xpath_injection/php/safe_xpath_query.php @@ -0,0 +1,13 @@ +xpath($expr); diff --git a/tests/fixtures/xpath_injection/php/unsafe_xpath_query.php b/tests/fixtures/xpath_injection/php/unsafe_xpath_query.php new file mode 100644 index 
00000000..48df0877 --- /dev/null
+++ b/tests/fixtures/xpath_injection/php/unsafe_xpath_query.php @@ -0,0 +1,8 @@ +xpath($expr); diff --git a/tests/fixtures/xpath_injection/python/baseline_constant_xpath.py b/tests/fixtures/xpath_injection/python/baseline_constant_xpath.py new file mode 100644 index 00000000..16ceb6f8 --- /dev/null +++ b/tests/fixtures/xpath_injection/python/baseline_constant_xpath.py @@ -0,0 +1,8 @@ +# Baseline: expression is a compile-time constant. No taint reaches +# `tree.xpath` so no XPATH_INJECTION finding fires. +from lxml import etree + + +def lookup(): + tree = etree.parse("users.xml") + return tree.xpath("//user[@role='admin']") diff --git a/tests/fixtures/xpath_injection/python/safe_xpath_query.py b/tests/fixtures/xpath_injection/python/safe_xpath_query.py new file mode 100644 index 00000000..f63b2848 --- /dev/null +++ b/tests/fixtures/xpath_injection/python/safe_xpath_query.py @@ -0,0 +1,17 @@ +# Safe: user-supplied substring routed through the project-local +# `escape_xpath` helper before being concatenated into the XPath expression. +# The sanitizer clears the XPATH_INJECTION cap so the sink does not fire. +from lxml import etree +from flask import request + + +def escape_xpath(raw): + return raw.replace("'", "&apos;") + + +def lookup(): + tree = etree.parse("users.xml") + user = request.form["user"] + safe = escape_xpath(user) + expr = "//user[name='" + safe + "']" + return tree.xpath(expr) diff --git a/tests/fixtures/xpath_injection/python/unsafe_xpath_query.py b/tests/fixtures/xpath_injection/python/unsafe_xpath_query.py new file mode 100644 index 00000000..fe5511a0 --- /dev/null +++ b/tests/fixtures/xpath_injection/python/unsafe_xpath_query.py @@ -0,0 +1,12 @@ +# Unsafe: tainted form data concatenated into an XPath expression and passed +# to lxml's `tree.xpath()`. Suffix matching on `xpath` catches the +# bound-receiver call directly.
+from lxml import etree +from flask import request + + +def lookup(): + tree = etree.parse("users.xml") + user = request.form["user"] + expr = "//user[name='" + user + "']" + return tree.xpath(expr) diff --git a/tests/fixtures/xpath_injection/ruby/baseline_constant_xpath.rb b/tests/fixtures/xpath_injection/ruby/baseline_constant_xpath.rb new file mode 100644 index 00000000..a884cd03 --- /dev/null +++ b/tests/fixtures/xpath_injection/ruby/baseline_constant_xpath.rb @@ -0,0 +1,8 @@ +# Baseline: expression is a compile-time constant. No taint reaches +# `doc.xpath` so no XPATH_INJECTION finding fires. +require 'nokogiri' + +def lookup + doc = Nokogiri::XML(File.read("users.xml")) + doc.xpath("//user[@role='admin']") +end diff --git a/tests/fixtures/xpath_injection/ruby/safe_xpath_query.rb b/tests/fixtures/xpath_injection/ruby/safe_xpath_query.rb new file mode 100644 index 00000000..e31aa332 --- /dev/null +++ b/tests/fixtures/xpath_injection/ruby/safe_xpath_query.rb @@ -0,0 +1,16 @@ +# Safe: user-supplied substring routed through the project-local +# `escape_xpath` helper before interpolation. The sanitizer clears +# XPATH_INJECTION so the sink does not fire. +require 'nokogiri' + +def escape_xpath(raw) + raw.gsub("'", '&apos;').gsub('"', '&quot;') +end + +def lookup(params) + doc = Nokogiri::XML(File.read("users.xml")) + user = params[:user] + safe = escape_xpath(user) + expr = "//user[name='#{safe}']" + doc.xpath(expr) +end diff --git a/tests/fixtures/xpath_injection/ruby/unsafe_xpath_query.rb b/tests/fixtures/xpath_injection/ruby/unsafe_xpath_query.rb new file mode 100644 index 00000000..c593b467 --- /dev/null +++ b/tests/fixtures/xpath_injection/ruby/unsafe_xpath_query.rb @@ -0,0 +1,11 @@ +# Unsafe: Sinatra params concatenated into an XPath expression and passed to +# Nokogiri's `xpath` method. Suffix matching on `xpath` catches the +# bound-receiver call directly.
+require 'nokogiri' + +def lookup(params) + doc = Nokogiri::XML(File.read("users.xml")) + user = params[:user] + expr = "//user[name='#{user}']" + doc.xpath(expr) +end diff --git a/tests/fixtures/xpath_injection/typescript/baseline_constant_xpath.ts b/tests/fixtures/xpath_injection/typescript/baseline_constant_xpath.ts new file mode 100644 index 00000000..055ae06e --- /dev/null +++ b/tests/fixtures/xpath_injection/typescript/baseline_constant_xpath.ts @@ -0,0 +1,10 @@ +// Baseline: expression is a compile-time constant. No taint reaches +// xpath.select so no XPATH_INJECTION finding fires. +import * as xpath from 'xpath'; +import { DOMParser } from 'xmldom'; + +export function lookup(req: any, res: any): void { + const doc = new DOMParser().parseFromString(''); + const nodes = xpath.select("//user[@role='admin']", doc); + res.json(nodes); +} diff --git a/tests/fixtures/xpath_injection/typescript/safe_xpath_query.ts b/tests/fixtures/xpath_injection/typescript/safe_xpath_query.ts new file mode 100644 index 00000000..8d38eb30 --- /dev/null +++ b/tests/fixtures/xpath_injection/typescript/safe_xpath_query.ts @@ -0,0 +1,18 @@ +// Safe: user-supplied substring routed through `escapeXpath` before +// concatenation. The sanitizer clears XPATH_INJECTION so the sink does not +// fire. 
+import * as xpath from 'xpath'; +import { DOMParser } from 'xmldom'; + +function escapeXpath(raw: string): string { + return raw.replace(/'/g, '&apos;').replace(/"/g, '&quot;'); +} + +export function lookup(req: any, res: any): void { + const doc = new DOMParser().parseFromString(''); + const user: string = req.query.user; + const safe: string = escapeXpath(user); + const expr: string = "//user[name='" + safe + "']"; + const nodes = xpath.select(expr, doc); + res.json(nodes); +} diff --git a/tests/fixtures/xpath_injection/typescript/unsafe_xpath_query.ts b/tests/fixtures/xpath_injection/typescript/unsafe_xpath_query.ts new file mode 100644 index 00000000..e6b5e0f1 --- /dev/null +++ b/tests/fixtures/xpath_injection/typescript/unsafe_xpath_query.ts @@ -0,0 +1,12 @@ +// Unsafe: npm `xpath` package's `select` receives an expression assembled +// from req.query. XPATH_INJECTION fires on the expression argument. +import * as xpath from 'xpath'; +import { DOMParser } from 'xmldom'; + +export function lookup(req: any, res: any): void { + const doc = new DOMParser().parseFromString(''); + const user: string = req.query.user; + const expr: string = "//user[name='" + user + "']"; + const nodes = xpath.select(expr, doc); + res.json(nodes); +} diff --git a/tests/fixtures/xxe/java/IrrelevantXmlCall.java b/tests/fixtures/xxe/java/IrrelevantXmlCall.java new file mode 100644 index 00000000..62e8b2cf --- /dev/null +++ b/tests/fixtures/xxe/java/IrrelevantXmlCall.java @@ -0,0 +1,15 @@ +// Baseline: tainted body flows through a non-parser XML helper +// (StringBuilder concat). No XML parser entry point, no XXE label +// classification. Used to confirm taint-xxe doesn't fire on stray +// XML-adjacent string operations.
+import javax.servlet.http.HttpServletRequest; + +public class IrrelevantXmlCall { + public String handle(HttpServletRequest req) { + String body = req.getParameter("xml"); + StringBuilder sb = new StringBuilder(""); + sb.append(body); + sb.append(""); + return sb.toString(); + } +} diff --git a/tests/fixtures/xxe/java/SafeLog4jConfig.java b/tests/fixtures/xxe/java/SafeLog4jConfig.java new file mode 100644 index 00000000..fcf9f443 --- /dev/null +++ b/tests/fixtures/xxe/java/SafeLog4jConfig.java @@ -0,0 +1,23 @@ +// Safe counterpart to UnsafeLog4jConfig.java: DOMConfigurator-style XML +// config loader, hardened by setting FEATURE_SECURE_PROCESSING + the +// disallow-doctype-decl feature on the factory before the builder is +// produced. The xml_config sidecar records the hardening fact on the +// factory's SSA value, propagates it to the builder via +// `newDocumentBuilder()`, and the parse sink suppresses the XXE bit. +import javax.servlet.http.HttpServletRequest; +import javax.xml.XMLConstants; +import javax.xml.parsers.DocumentBuilder; +import javax.xml.parsers.DocumentBuilderFactory; +import org.w3c.dom.Document; +import java.io.File; + +public class SafeLog4jConfig { + public Document loadConfig(HttpServletRequest req) throws Exception { + String configPath = req.getParameter("config"); + DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); + factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true); + factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); + DocumentBuilder builder = factory.newDocumentBuilder(); + return builder.parse(new File(configPath)); + } +} diff --git a/tests/fixtures/xxe/java/SafeXxe.java b/tests/fixtures/xxe/java/SafeXxe.java new file mode 100644 index 00000000..b0144723 --- /dev/null +++ b/tests/fixtures/xxe/java/SafeXxe.java @@ -0,0 +1,10 @@ +// Safe: no XML parser sink reached, body just stored. 
Used as a baseline +// to confirm taint-xxe does not fire when the dangerous API is absent. +import javax.servlet.http.HttpServletRequest; + +public class SafeXxe { + public String handle(HttpServletRequest req) { + String body = req.getParameter("xml"); + return body.length() > 0 ? body : "empty"; + } +} diff --git a/tests/fixtures/xxe/java/SafeXxeConfig.java b/tests/fixtures/xxe/java/SafeXxeConfig.java new file mode 100644 index 00000000..3e8f98a1 --- /dev/null +++ b/tests/fixtures/xxe/java/SafeXxeConfig.java @@ -0,0 +1,23 @@ +// Safe: tainted XML routed through a hardened DocumentBuilder. The +// factory is configured with `FEATURE_SECURE_PROCESSING = true` before +// the builder is produced, and the produced builder inherits that +// hardening fact via the SSA xml-parser-config pass. The downstream +// `builder.parse(...)` sink call therefore sees a secure receiver and +// the XXE bit is suppressed. +import javax.servlet.http.HttpServletRequest; +import javax.xml.XMLConstants; +import javax.xml.parsers.DocumentBuilder; +import javax.xml.parsers.DocumentBuilderFactory; +import org.w3c.dom.Document; +import java.io.ByteArrayInputStream; + +public class SafeXxeConfig { + public Document handle(HttpServletRequest req) throws Exception { + String body = req.getParameter("xml"); + DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); + factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true); + factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); + DocumentBuilder builder = factory.newDocumentBuilder(); + return builder.parse(new ByteArrayInputStream(body.getBytes())); + } +} diff --git a/tests/fixtures/xxe/java/SafeXxePhi.java b/tests/fixtures/xxe/java/SafeXxePhi.java new file mode 100644 index 00000000..19f45828 --- /dev/null +++ b/tests/fixtures/xxe/java/SafeXxePhi.java @@ -0,0 +1,27 @@ +// Safe: parser variable reassigned across a branch; both branches harden +// the receiver before reaching `parse`, so the 
SSA phi-meet preserves +// the secure_processing fact. Validates Phase 07 acceptance: +// "Config fact correctly survives intra-procedural reassignment of the +// parser variable through SSA phi." +import javax.servlet.http.HttpServletRequest; +import javax.xml.XMLConstants; +import javax.xml.parsers.DocumentBuilder; +import javax.xml.parsers.DocumentBuilderFactory; +import org.w3c.dom.Document; +import java.io.ByteArrayInputStream; + +public class SafeXxePhi { + public Document handle(HttpServletRequest req, boolean useAlternate) throws Exception { + String body = req.getParameter("xml"); + DocumentBuilderFactory factory; + if (useAlternate) { + factory = DocumentBuilderFactory.newInstance(); + factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true); + } else { + factory = DocumentBuilderFactory.newInstance(); + factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true); + } + DocumentBuilder builder = factory.newDocumentBuilder(); + return builder.parse(new ByteArrayInputStream(body.getBytes())); + } +} diff --git a/tests/fixtures/xxe/java/UnsafeLog4jConfig.java b/tests/fixtures/xxe/java/UnsafeLog4jConfig.java new file mode 100644 index 00000000..884a1ec3 --- /dev/null +++ b/tests/fixtures/xxe/java/UnsafeLog4jConfig.java @@ -0,0 +1,23 @@ +// Unsafe: Log4Shell-shape XXE leg. The DOMConfigurator-style loader +// reads an XML config file path supplied by the user, then parses it +// through DocumentBuilder without enabling FEATURE_SECURE_PROCESSING or +// disallowing DOCTYPE declarations. External entities resolve, giving +// the attacker a file-disclosure / SSRF primitive on the host. Real- +// world precedent: CVE-2022-23305 / CVE-2022-23307 (Log4j 1.x JDBC / +// chainsaw config XXE). Exercises the TypeFacts-tagged builder receiver +// + xml_config sidecar end-to-end: builder is XmlParser-typed, no +// secure-processing fact recorded, parse fires the XXE sink. 
+import javax.servlet.http.HttpServletRequest; +import javax.xml.parsers.DocumentBuilder; +import javax.xml.parsers.DocumentBuilderFactory; +import org.w3c.dom.Document; +import java.io.File; + +public class UnsafeLog4jConfig { + public Document loadConfig(HttpServletRequest req) throws Exception { + String configPath = req.getParameter("config"); + DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); + DocumentBuilder builder = factory.newDocumentBuilder(); + return builder.parse(new File(configPath)); + } +} diff --git a/tests/fixtures/xxe/java/UnsafeXxe.java b/tests/fixtures/xxe/java/UnsafeXxe.java new file mode 100644 index 00000000..a9292ad0 --- /dev/null +++ b/tests/fixtures/xxe/java/UnsafeXxe.java @@ -0,0 +1,17 @@ +// Unsafe: tainted XML reaches DocumentBuilder.parse without secure-processing +// configuration. The instance receiver `builder` carries TypeKind::XmlParser +// (Phase 07) so the type-qualified `XmlParser.parse` sink rule fires. +import javax.servlet.http.HttpServletRequest; +import javax.xml.parsers.DocumentBuilder; +import javax.xml.parsers.DocumentBuilderFactory; +import org.w3c.dom.Document; +import java.io.ByteArrayInputStream; + +public class UnsafeXxe { + public Document handle(HttpServletRequest req) throws Exception { + String body = req.getParameter("xml"); + DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); + DocumentBuilder builder = factory.newDocumentBuilder(); + return builder.parse(new ByteArrayInputStream(body.getBytes())); + } +} diff --git a/tests/fixtures/xxe/javascript/irrelevant_xml_call.js b/tests/fixtures/xxe/javascript/irrelevant_xml_call.js new file mode 100644 index 00000000..9b33f608 --- /dev/null +++ b/tests/fixtures/xxe/javascript/irrelevant_xml_call.js @@ -0,0 +1,8 @@ +// Baseline: tainted body flows through a non-parser string operation. +// No XML parser entry point, no XXE label classification. 
+function handle(req, res) { + const body = req.query.xml; + res.send("" + body + ""); +} + +module.exports = { handle }; diff --git a/tests/fixtures/xxe/javascript/safe_xxe.js b/tests/fixtures/xxe/javascript/safe_xxe.js new file mode 100644 index 00000000..904beaca --- /dev/null +++ b/tests/fixtures/xxe/javascript/safe_xxe.js @@ -0,0 +1,14 @@ +// Safe: tainted XML reaches xml2js.parseString with default options. +// xml2js does not expand external entities unless explicitly configured; +// the gate's dangerous_kwargs list (`processEntities`/`explicitEntities`/ +// `strict`) is empty in the literal, so the gate suppresses the finding. +const xml2js = require("xml2js"); + +function handle(req, res) { + const body = req.query.xml; + xml2js.parseString(body, { explicitArray: false }, (err, result) => { + res.json(result); + }); +} + +module.exports = { handle }; diff --git a/tests/fixtures/xxe/javascript/unsafe_fast_xml_parser.js b/tests/fixtures/xxe/javascript/unsafe_fast_xml_parser.js new file mode 100644 index 00000000..d35d298f --- /dev/null +++ b/tests/fixtures/xxe/javascript/unsafe_fast_xml_parser.js @@ -0,0 +1,17 @@ +// Unsafe: tainted XML reaches a fast-xml-parser instance whose +// constructor was explicitly opted into entity resolution +// (`processEntities: true`). fast-xml-parser is XXE-safe by default, +// but this opt-in form is the documented unsafe escape hatch. The +// constructor-driven fact is captured in `XmlParserConfigResult` +// (`external_entities = true`) and the `parser.parse(xml)` call adds +// Cap::XXE on top of the otherwise empty sink_caps. 
+const { XMLParser } = require("fast-xml-parser"); + +function handle(req, res) { + const body = req.query.xml; + const parser = new XMLParser({ processEntities: true }); + const result = parser.parse(body); + res.json(result); +} + +module.exports = { handle }; diff --git a/tests/fixtures/xxe/javascript/unsafe_xxe.js b/tests/fixtures/xxe/javascript/unsafe_xxe.js new file mode 100644 index 00000000..4ab08eff --- /dev/null +++ b/tests/fixtures/xxe/javascript/unsafe_xxe.js @@ -0,0 +1,12 @@ +// Unsafe: tainted XML reaches xml2js.parseString with `processEntities: true`, +// activating the XXE gate. +const xml2js = require("xml2js"); + +function handle(req, res) { + const body = req.query.xml; + xml2js.parseString(body, { processEntities: true }, (err, result) => { + res.json(result); + }); +} + +module.exports = { handle }; diff --git a/tests/fixtures/xxe/php/irrelevant_xml_call.php b/tests/fixtures/xxe/php/irrelevant_xml_call.php new file mode 100644 index 00000000..c2273d17 --- /dev/null +++ b/tests/fixtures/xxe/php/irrelevant_xml_call.php @@ -0,0 +1,5 @@ +" . $xml . ""; diff --git a/tests/fixtures/xxe/php/safe_xxe.php b/tests/fixtures/xxe/php/safe_xxe.php new file mode 100644 index 00000000..71f7717d --- /dev/null +++ b/tests/fixtures/xxe/php/safe_xxe.php @@ -0,0 +1,8 @@ +title; diff --git a/tests/fixtures/xxe/php/unsafe_xxe.php b/tests/fixtures/xxe/php/unsafe_xxe.php new file mode 100644 index 00000000..bd991fd0 --- /dev/null +++ b/tests/fixtures/xxe/php/unsafe_xxe.php @@ -0,0 +1,6 @@ +title; diff --git a/tests/fixtures/xxe/python/irrelevant_xml_call.py b/tests/fixtures/xxe/python/irrelevant_xml_call.py new file mode 100644 index 00000000..67b42905 --- /dev/null +++ b/tests/fixtures/xxe/python/irrelevant_xml_call.py @@ -0,0 +1,8 @@ +# Baseline: tainted body flows through a non-parser string operation. +# No XML parser entry point, no XXE label classification. 
+from flask import request + + +def handle(): + body = request.args.get("xml") + return "" + body + "" diff --git a/tests/fixtures/xxe/python/safe_lxml.py b/tests/fixtures/xxe/python/safe_lxml.py new file mode 100644 index 00000000..05e65fb2 --- /dev/null +++ b/tests/fixtures/xxe/python/safe_lxml.py @@ -0,0 +1,10 @@ +# Safe: lxml.etree.parse is XXE-safe by default in modern lxml — external +# entities are not resolved unless `XMLParser(resolve_entities=True)` is +# passed in. No XXE rule should fire here. +import lxml.etree +from flask import request + + +def handle(): + body = request.args.get("xml") + return lxml.etree.parse(body) diff --git a/tests/fixtures/xxe/python/safe_xxe.py b/tests/fixtures/xxe/python/safe_xxe.py new file mode 100644 index 00000000..87677803 --- /dev/null +++ b/tests/fixtures/xxe/python/safe_xxe.py @@ -0,0 +1,9 @@ +# Safe: tainted XML routed through defusedxml, which strips external-entity +# resolution. Treated as a Sanitizer(XXE), so taint-xxe stays clean. +import defusedxml.ElementTree +from flask import request + +def handle(): + body = request.args.get("xml") + tree = defusedxml.ElementTree.fromstring(body) + return tree diff --git a/tests/fixtures/xxe/python/unsafe_lxml_resolve_entities.py b/tests/fixtures/xxe/python/unsafe_lxml_resolve_entities.py new file mode 100644 index 00000000..5482fc9b --- /dev/null +++ b/tests/fixtures/xxe/python/unsafe_lxml_resolve_entities.py @@ -0,0 +1,16 @@ +# Unsafe: tainted XML reaches an lxml.etree.XMLParser instance whose +# constructor was explicitly opted into entity resolution +# (`resolve_entities=True`). lxml is XXE-safe by default, but this +# opt-in form is the documented unsafe escape hatch. The +# constructor-driven fact is captured in XmlParserConfigResult +# (external_entities=True) and the parser.feed(xml) call adds +# Cap::XXE on top of the otherwise empty sink_caps. 
+from lxml import etree +from flask import request + + +def handle(): + body = request.args.get("xml") + parser = etree.XMLParser(resolve_entities=True) + parser.feed(body) + return parser.close() diff --git a/tests/fixtures/xxe/python/unsafe_xxe.py b/tests/fixtures/xxe/python/unsafe_xxe.py new file mode 100644 index 00000000..45aceaea --- /dev/null +++ b/tests/fixtures/xxe/python/unsafe_xxe.py @@ -0,0 +1,8 @@ +# Unsafe: tainted XML reaches xml.sax.parseString, a classic stdlib XXE sink +# (external general entities were resolved by default before Python 3.7.1). +import xml.sax +from flask import request + +def handle(): + body = request.args.get("xml") + return xml.sax.parseString(body, MyHandler()) diff --git a/tests/fixtures/xxe/ruby/irrelevant_xml_call.rb b/tests/fixtures/xxe/ruby/irrelevant_xml_call.rb new file mode 100644 index 00000000..78d34dd6 --- /dev/null +++ b/tests/fixtures/xxe/ruby/irrelevant_xml_call.rb @@ -0,0 +1,8 @@ +# Baseline: tainted body flows through a non-parser string operation. +# No XML parser entry point, no XXE label classification. +require "sinatra" + +get "/wrap" do + body = params[:xml] + "#{body}" +end diff --git a/tests/fixtures/xxe/ruby/safe_xxe_nokogiri.rb b/tests/fixtures/xxe/ruby/safe_xxe_nokogiri.rb new file mode 100644 index 00000000..b11a3ced --- /dev/null +++ b/tests/fixtures/xxe/ruby/safe_xxe_nokogiri.rb @@ -0,0 +1,11 @@ +# Safe: Nokogiri ≥ 1.10 is XXE-safe by default; the canonical safe-options +# constant `Nokogiri::XML::ParseOptions::DEFAULT_XML` does not include +# NOENT / DTDLOAD / DTDATTR, so the gate's `dangerous_values` list does +# not match and the call is suppressed.
+require "nokogiri" + +def handle(params) + body = params["xml"] + doc = Nokogiri::XML(body, nil, "UTF-8", Nokogiri::XML::ParseOptions::DEFAULT_XML) + doc.root.text +end diff --git a/tests/fixtures/xxe/ruby/unsafe_xxe.rb b/tests/fixtures/xxe/ruby/unsafe_xxe.rb new file mode 100644 index 00000000..3a763e2e --- /dev/null +++ b/tests/fixtures/xxe/ruby/unsafe_xxe.rb @@ -0,0 +1,9 @@ +# Unsafe: tainted XML reaches REXML::Document.new, the legacy default-vulnerable +# pure-Ruby XML parser that resolves external entities by default. +require "rexml/document" + +def handle(params) + body = params["xml"] + doc = REXML::Document.new(body) + doc.root.text +end diff --git a/tests/fixtures/xxe/ruby/unsafe_xxe_nokogiri.rb b/tests/fixtures/xxe/ruby/unsafe_xxe_nokogiri.rb new file mode 100644 index 00000000..b6e264e3 --- /dev/null +++ b/tests/fixtures/xxe/ruby/unsafe_xxe_nokogiri.rb @@ -0,0 +1,11 @@ +# Unsafe: tainted XML reaches Nokogiri::XML with the NOENT option flag, +# enabling external-entity expansion (XXE). Nokogiri ≥ 1.10 is XXE-safe +# by default, so the gate fires only when an unsafe option flag is passed +# explicitly at the activation arg position. +require "nokogiri" + +def handle(params) + body = params["xml"] + doc = Nokogiri::XML(body, nil, "UTF-8", Nokogiri::XML::ParseOptions::NOENT) + doc.root.text +end diff --git a/tests/fixtures/xxe/typescript/irrelevant_xml_call.ts b/tests/fixtures/xxe/typescript/irrelevant_xml_call.ts new file mode 100644 index 00000000..7e091756 --- /dev/null +++ b/tests/fixtures/xxe/typescript/irrelevant_xml_call.ts @@ -0,0 +1,6 @@ +// Baseline: tainted body flows through a non-parser string operation. +// No XML parser entry point, no XXE label classification. 
+export function handle(req: any, res: any): void { + const body: string = req.query.xml; + res.send("" + body + ""); +} diff --git a/tests/fixtures/xxe/typescript/safe_xxe.ts b/tests/fixtures/xxe/typescript/safe_xxe.ts new file mode 100644 index 00000000..6988d33f --- /dev/null +++ b/tests/fixtures/xxe/typescript/safe_xxe.ts @@ -0,0 +1,11 @@ +// Safe: tainted XML reaches xml2js.parseString with default options. +// xml2js does not expand external entities by default; the gate's +// dangerous_kwargs do not match this options literal. +import * as xml2js from "xml2js"; + +export function handle(req: any, res: any): void { + const body: string = req.query.xml; + xml2js.parseString(body, { explicitArray: false }, (err: any, result: any) => { + res.json(result); + }); +} diff --git a/tests/fixtures/xxe/typescript/unsafe_fast_xml_parser.ts b/tests/fixtures/xxe/typescript/unsafe_fast_xml_parser.ts new file mode 100644 index 00000000..01f3d8cb --- /dev/null +++ b/tests/fixtures/xxe/typescript/unsafe_fast_xml_parser.ts @@ -0,0 +1,15 @@ +// Unsafe: tainted XML reaches a fast-xml-parser instance whose +// constructor was explicitly opted into entity resolution +// (`processEntities: true`). fast-xml-parser is XXE-safe by default, +// but this opt-in form is the documented unsafe escape hatch. The +// constructor-driven fact is captured in `XmlParserConfigResult` +// (`external_entities = true`) and the `parser.parse(xml)` call adds +// Cap::XXE on top of the otherwise empty sink_caps. 
+import { XMLParser } from "fast-xml-parser"; + +export function handle(req: any, res: any): void { + const body: string = req.query.xml; + const parser = new XMLParser({ processEntities: true }); + const result = parser.parse(body); + res.json(result); +} diff --git a/tests/fixtures/xxe/typescript/unsafe_xxe.ts b/tests/fixtures/xxe/typescript/unsafe_xxe.ts new file mode 100644 index 00000000..5cc660d4 --- /dev/null +++ b/tests/fixtures/xxe/typescript/unsafe_xxe.ts @@ -0,0 +1,10 @@ +// Unsafe: tainted XML reaches xml2js.parseString with `processEntities: true`, +// activating the XXE gate (mirrors javascript/unsafe_xxe.js). +import * as xml2js from "xml2js"; + +export function handle(req: any, res: any): void { + const body: string = req.query.xml; + xml2js.parseString(body, { processEntities: true }, (err: any, result: any) => { + res.json(result); + }); +} diff --git a/tests/header_injection_tests.rs b/tests/header_injection_tests.rs new file mode 100644 index 00000000..7d6a593b --- /dev/null +++ b/tests/header_injection_tests.rs @@ -0,0 +1,240 @@ +//! Phase 04 integration tests for `Cap::HEADER_INJECTION`. +//! +//! Each supported language has at least two fixtures under +//! `tests/fixtures/header_injection/<lang>/`: +//! +//! * `unsafe_set_header.*` — taint flows from a request source into a +//! header-write API. Must produce >=1 `taint-header-injection` HIGH. +//! * `safe_set_header.*` — same data flow, routed through a developer-named +//! `stripCRLF` / `strip_crlf` helper. Must produce 0 findings.
+ +mod common; + +use common::count_by_prefix; +use nyx_scanner::commands::scan::Diag; +use nyx_scanner::utils::config::{AnalysisMode, Config}; +use std::path::{Path, PathBuf}; + +const RULE_PREFIX: &str = "taint-header-injection"; + +fn fixture_dir(lang: &str) -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") + .join("header_injection") + .join(lang) +} + +fn test_config() -> Config { + let mut cfg = Config::default(); + cfg.scanner.mode = AnalysisMode::Full; + cfg.scanner.read_vcsignore = false; + cfg.scanner.require_git_to_read_vcsignore = false; + cfg.scanner.enable_state_analysis = true; + cfg.scanner.enable_auth_analysis = true; + cfg.scanner.include_nonprod = true; + cfg.performance.worker_threads = Some(1); + cfg.performance.batch_size = 64; + cfg.performance.channel_multiplier = 1; + cfg +} + +fn scan_dir(path: &Path) -> Vec<Diag> { + nyx_scanner::scan_no_index(path, &test_config()).expect("scan_no_index should succeed") +} + +fn diags_for_file(dir: &Path, file_suffix: &str) -> Vec<Diag> { + scan_dir(dir) + .into_iter() + .filter(|d| { + std::path::Path::new(&d.path) + .file_name() + .and_then(|s| s.to_str()) + == Some(file_suffix) + }) + .collect() +} + +fn assert_unsafe(lang: &str, file_suffix: &str) { + let dir = fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let count = count_by_prefix(&diags, RULE_PREFIX); + assert!( + count >= 1, + "{lang}/{file_suffix}: expected >=1 {RULE_PREFIX} finding, got {count}.\n\ + All diags: {:#?}", + diags + .iter() + .map(|d| format!( + "{}:{} [{}] {}", + d.path, + d.line, + d.severity.as_db_str(), + d.id + )) + .collect::<Vec<_>>(), + ); +} + +fn assert_clean(lang: &str, file_suffix: &str) { + let dir = fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let matching: Vec<_> = diags + .iter() + .filter(|d| d.id.starts_with(RULE_PREFIX)) + .collect(); + assert!( + matching.is_empty(), + "{lang}/{file_suffix}: expected 0 {RULE_PREFIX} findings, got
{}:\n{:#?}", + matching.len(), + matching + .iter() + .map(|d| format!("{}:{} {}", d.path, d.line, d.id)) + .collect::<Vec<_>>(), + ); +} + +#[test] +fn javascript_set_header_with_tainted_value_fires() { + assert_unsafe("javascript", "unsafe_set_header.js"); +} + +#[test] +fn javascript_strip_crlf_sanitizes() { + assert_clean("javascript", "safe_set_header.js"); +} + +#[test] +fn typescript_set_header_with_tainted_value_fires() { + assert_unsafe("typescript", "unsafe_set_header.ts"); +} + +#[test] +fn typescript_strip_crlf_sanitizes() { + assert_clean("typescript", "safe_set_header.ts"); +} + +#[test] +fn java_set_header_with_tainted_value_fires() { + assert_unsafe("java", "UnsafeSetHeader.java"); +} + +#[test] +fn java_strip_crlf_sanitizes() { + assert_clean("java", "SafeSetHeader.java"); +} + +#[test] +fn python_headers_add_with_tainted_value_fires() { + assert_unsafe("python", "unsafe_set_header.py"); +} + +#[test] +fn python_strip_crlf_sanitizes() { + assert_clean("python", "safe_set_header.py"); +} + +#[test] +fn php_header_with_tainted_value_fires() { + assert_unsafe("php", "unsafe_set_header.php"); +} + +#[test] +fn php_strip_crlf_sanitizes() { + assert_clean("php", "safe_set_header.php"); +} + +#[test] +fn ruby_subscript_set_with_tainted_value_fires() { + assert_unsafe("ruby", "unsafe_subscript_set.rb"); +} + +#[test] +fn ruby_subscript_set_with_strip_crlf_sanitized() { + assert_clean("ruby", "safe_subscript_set.rb"); +} + +#[test] +fn javascript_subscript_set_with_tainted_value_fires() { + assert_unsafe("javascript", "unsafe_subscript_set.js"); +} + +#[test] +fn javascript_subscript_set_with_strip_crlf_sanitized() { + assert_clean("javascript", "safe_subscript_set.js"); +} + +#[test] +fn typescript_subscript_set_with_tainted_value_fires() { + assert_unsafe("typescript", "unsafe_subscript_set.ts"); +} + +#[test] +fn typescript_subscript_set_with_strip_crlf_sanitized() { + assert_clean("typescript", "safe_subscript_set.ts"); +} + +#[test] +fn
python_subscript_set_with_tainted_value_fires() { + assert_unsafe("python", "unsafe_subscript_set.py"); +} + +#[test] +fn python_subscript_set_with_strip_crlf_sanitized() { + assert_clean("python", "safe_subscript_set.py"); +} + +#[test] +fn go_set_header_with_tainted_value_fires() { + assert_unsafe("go", "unsafe_set_header.go"); +} + +#[test] +fn go_strip_crlf_sanitizes() { + assert_clean("go", "safe_set_header.go"); +} + +#[test] +fn rust_set_header_with_tainted_value_fires() { + assert_unsafe("rust", "unsafe_set_header.rs"); +} + +#[test] +fn rust_strip_crlf_sanitizes() { + assert_clean("rust", "safe_set_header.rs"); +} + +/// Phase 04 acceptance: PHP `header("Location: " . $url)` must surface both +/// `taint-header-injection` and `taint-open-redirect` findings on the same +/// call site. The flat HEADER_INJECTION rule (gated `=header`, payload arg 0) +/// and the OPEN_REDIRECT co-tag (gated on the `Location:` first-arg prefix) +/// share the call node, so the multi-gate SSA dispatch must emit one finding +/// per cap. The fixture lives under `open_redirect/php/`. 
+#[test] +fn php_header_location_cofires_header_injection_and_open_redirect() { + let dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") + .join("open_redirect") + .join("php"); + let diags = diags_for_file(&dir, "unsafe_redirect.php"); + let header_count = count_by_prefix(&diags, "taint-header-injection"); + let redirect_count = count_by_prefix(&diags, "taint-open-redirect"); + assert!( + header_count >= 1 && redirect_count >= 1, + "expected both taint-header-injection (>=1, got {header_count}) and \ + taint-open-redirect (>=1, got {redirect_count}) on \ + open_redirect/php/unsafe_redirect.php.\n\ + All diags: {:#?}", + diags + .iter() + .map(|d| format!( + "{}:{} [{}] {}", + d.path, + d.line, + d.severity.as_db_str(), + d.id + )) + .collect::<Vec<_>>(), + ); +} diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs index 7087f343..cb06d773 100644 --- a/tests/integration_tests.rs +++ b/tests/integration_tests.rs @@ -234,7 +234,7 @@ fn dedup_same_line_different_sinks() { )) .collect::<Vec<_>>() ); - let caps: HashSet<u16> = taint_on_target_line + let caps: HashSet<u32> = taint_on_target_line .iter() .map(|d| d.evidence.as_ref().map(|e| e.sink_caps).unwrap_or(0)) .collect(); diff --git a/tests/ldap_injection_tests.rs b/tests/ldap_injection_tests.rs new file mode 100644 index 00000000..f394ad56 --- /dev/null +++ b/tests/ldap_injection_tests.rs @@ -0,0 +1,284 @@ +//! Phase 02 integration tests for `Cap::LDAP_INJECTION`. +//! +//! Each supported language has three fixtures under +//! `tests/fixtures/ldap_injection/<lang>/`: +//! +//! * `unsafe_ldap_search.*` — taint flows from a request / env source into +//! an LDAP search/query API. Must produce exactly one +//! `taint-ldap-injection` finding, at HIGH severity. +//! * `safe_ldap_search.*` — same data flow, but routed through the +//! language-specific LDAP-filter escape sanitizer. Must produce zero +//! `taint-ldap-injection` findings. +//!
* `baseline_constant_ldap.*` — filter is a literal constant. Must +//! produce zero `taint-ldap-injection` findings. +//! +//! The Java fixture additionally relies on type-qualified resolution +//! rewriting `ctx.search` → `LdapClient.search` via the new +//! `TypeKind::LdapClient` declared-type mapping (constraint solver). +//! JS/TS, Python, Ruby, and Go fixtures rely on the same mechanism keyed +//! off the constructor (`ldap.createClient` / `ldap.initialize` / +//! `Net::LDAP.new` / `ldap.DialURL`). + +mod common; + +use common::count_by_prefix; +use nyx_scanner::commands::scan::Diag; +use nyx_scanner::utils::config::{AnalysisMode, Config}; +use std::path::{Path, PathBuf}; + +const RULE_PREFIX: &str = "taint-ldap-injection"; + +fn ldap_fixture_dir(lang: &str) -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") + .join("ldap_injection") + .join(lang) +} + +/// Test-local config override: enable `include_nonprod` so fixtures under +/// `tests/fixtures/...` (which `is_nonprod_path` would otherwise classify +/// as nonprod and downgrade by one severity tier) report their actual +/// registry severity. Mirrors `common::test_config` in every other respect. 
+fn ldap_test_config() -> Config { + let mut cfg = Config::default(); + cfg.scanner.mode = AnalysisMode::Full; + cfg.scanner.read_vcsignore = false; + cfg.scanner.require_git_to_read_vcsignore = false; + cfg.scanner.enable_state_analysis = true; + cfg.scanner.enable_auth_analysis = true; + cfg.scanner.include_nonprod = true; + cfg.performance.worker_threads = Some(1); + cfg.performance.batch_size = 64; + cfg.performance.channel_multiplier = 1; + cfg +} + +fn scan_dir(path: &Path) -> Vec<Diag> { + let cfg = ldap_test_config(); + nyx_scanner::scan_no_index(path, &cfg).expect("scan_no_index should succeed") +} + +fn diags_for_file(dir: &Path, file_suffix: &str) -> Vec<Diag> { + let all = scan_dir(dir); + // Match on the trailing path component, not a substring suffix; otherwise + // `unsafe_ldap_search.php` would be picked up by `safe_ldap_search.php`'s + // `ends_with` filter and the safe-fixture clean assertion would + // accidentally see findings from its sibling. + all.into_iter() + .filter(|d| { + std::path::Path::new(&d.path) + .file_name() + .and_then(|s| s.to_str()) + == Some(file_suffix) + }) + .collect() +} + +fn assert_unsafe(lang: &str, file_suffix: &str) { + let dir = ldap_fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let count = count_by_prefix(&diags, RULE_PREFIX); + assert_eq!( + count, + 1, + "{lang}/{file_suffix}: expected exactly 1 {RULE_PREFIX} finding, got {count}.\n\ + All diags: {:#?}", + diags + .iter() + .map(|d| format!( + "{}:{} [{}] {}", + d.path, + d.line, + d.severity.as_db_str(), + d.id + )) + .collect::<Vec<_>>(), + ); + let high = diags + .iter() + .filter(|d| d.id.starts_with(RULE_PREFIX) && d.severity.as_db_str() == "HIGH") + .count(); + assert_eq!( + high, + 1, + "{lang}/{file_suffix}: expected exactly 1 HIGH-severity {RULE_PREFIX} finding, got {high}.\n\ + All matching: {:#?}", + diags + .iter() + .filter(|d| d.id.starts_with(RULE_PREFIX)) + .map(|d| format!("{}:{} [{}]", d.path, d.line, d.severity.as_db_str())) +
.collect::<Vec<_>>(), + ); +} + +fn assert_clean(lang: &str, file_suffix: &str) { + let dir = ldap_fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let matching: Vec<_> = diags + .iter() + .filter(|d| d.id.starts_with(RULE_PREFIX)) + .collect(); + assert!( + matching.is_empty(), + "{lang}/{file_suffix}: expected 0 {RULE_PREFIX} findings, got {}:\n{:#?}", + matching.len(), + matching + .iter() + .map(|d| format!("{}:{} {}", d.path, d.line, d.id)) + .collect::<Vec<_>>(), + ); +} + +// ── Java ───────────────────────────────────────────────────────────────── + +#[test] +fn java_dir_context_search_with_tainted_filter_fires() { + assert_unsafe("java", "UnsafeLdapSearch.java"); +} + +#[test] +fn java_filter_encode_sanitizes() { + assert_clean("java", "SafeLdapSearch.java"); +} + +#[test] +fn java_baseline_constant_filter_does_not_fire() { + assert_clean("java", "BaselineConstantLdap.java"); +} + +// ── Python ─────────────────────────────────────────────────────────────── + +#[test] +fn python_search_s_with_tainted_filter_fires() { + assert_unsafe("python", "unsafe_ldap_search.py"); +} + +#[test] +fn python_escape_filter_chars_sanitizes() { + assert_clean("python", "safe_ldap_search.py"); +} + +#[test] +fn python_baseline_constant_filter_does_not_fire() { + assert_clean("python", "baseline_constant_ldap.py"); +} + +// ── PHP ────────────────────────────────────────────────────────────────── + +#[test] +fn php_ldap_search_with_tainted_filter_fires() { + assert_unsafe("php", "unsafe_ldap_search.php"); +} + +#[test] +fn php_ldap_escape_sanitizes() { + assert_clean("php", "safe_ldap_search.php"); +} + +#[test] +fn php_baseline_constant_filter_does_not_fire() { + assert_clean("php", "baseline_constant_ldap.php"); +} + +// ── JavaScript ─────────────────────────────────────────────────────────── + +#[test] +fn javascript_ldapjs_search_with_tainted_filter_fires() { + assert_unsafe("javascript", "unsafe_ldap_search.js"); +} + +#[test] +fn
javascript_ldap_escape_sanitizes() { + assert_clean("javascript", "safe_ldap_search.js"); +} + +#[test] +fn javascript_baseline_constant_filter_does_not_fire() { + assert_clean("javascript", "baseline_constant_ldap.js"); +} + +// ── TypeScript ─────────────────────────────────────────────────────────── + +#[test] +fn typescript_ldapjs_search_with_tainted_filter_fires() { + assert_unsafe("typescript", "unsafe_ldap_search.ts"); +} + +#[test] +fn typescript_ldap_escape_sanitizes() { + assert_clean("typescript", "safe_ldap_search.ts"); +} + +#[test] +fn typescript_baseline_constant_filter_does_not_fire() { + assert_clean("typescript", "baseline_constant_ldap.ts"); +} + +// ── C ─────────────────────────────────────────────────────────────────── + +#[test] +fn c_ldap_search_ext_s_with_tainted_filter_fires() { + assert_unsafe("c", "unsafe_ldap_search.c"); +} + +#[test] +fn c_sanitize_helper_clears_cap() { + assert_clean("c", "safe_ldap_search.c"); +} + +#[test] +fn c_baseline_constant_filter_does_not_fire() { + assert_clean("c", "baseline_constant_ldap.c"); +} + +// ── C++ ───────────────────────────────────────────────────────────────── + +#[test] +fn cpp_ldap_search_ext_s_with_tainted_filter_fires() { + assert_unsafe("cpp", "unsafe_ldap_search.cpp"); +} + +#[test] +fn cpp_sanitize_helper_clears_cap() { + assert_clean("cpp", "safe_ldap_search.cpp"); +} + +#[test] +fn cpp_baseline_constant_filter_does_not_fire() { + assert_clean("cpp", "baseline_constant_ldap.cpp"); +} + +// ── Ruby ──────────────────────────────────────────────────────────────── + +#[test] +fn ruby_net_ldap_search_with_tainted_filter_fires() { + assert_unsafe("ruby", "unsafe_ldap_search.rb"); +} + +#[test] +fn ruby_filter_escape_sanitizes() { + assert_clean("ruby", "safe_ldap_search.rb"); +} + +#[test] +fn ruby_baseline_constant_filter_does_not_fire() { + assert_clean("ruby", "baseline_constant_ldap.rb"); +} + +// ── Go ────────────────────────────────────────────────────────────────── + +#[test] +fn 
 go_ldap_search_request_with_tainted_filter_fires() { + assert_unsafe("go", "unsafe_ldap_search.go"); +} + +#[test] +fn go_escape_filter_sanitizes() { + assert_clean("go", "safe_ldap_search.go"); +} + +#[test] +fn go_baseline_constant_filter_does_not_fire() { + assert_clean("go", "baseline_constant_ldap.go"); +} diff --git a/tests/open_redirect_tests.rs b/tests/open_redirect_tests.rs new file mode 100644 index 00000000..6144dde0 --- /dev/null +++ b/tests/open_redirect_tests.rs @@ -0,0 +1,287 @@ +//! Phase 05 integration tests for `Cap::OPEN_REDIRECT`. +//! +//! Each supported language has three fixtures under +//! `tests/fixtures/open_redirect/<lang>/`: +//! +//! * `unsafe_redirect.*` — taint flows from a request source into a +//! redirect API. Must produce >=1 `taint-open-redirect` finding. +//! * `safe_redirect.*` — same flow routed through a developer-named +//! `validateRedirectUrl` / `validate_redirect_url` allowlist. Must +//! produce 0 findings. +//! * `safe_relative_redirect.*` — same flow routed through an +//! `ensureRelativeUrl` / `ensure_relative_url` helper that enforces a +//! leading `/` and rejects scheme-prefixed values (relative-only path). +//! Must produce 0 findings.
+ +mod common; + +use common::count_by_prefix; +use nyx_scanner::commands::scan::Diag; +use nyx_scanner::utils::config::{AnalysisMode, Config}; +use std::path::{Path, PathBuf}; + +const RULE_PREFIX: &str = "taint-open-redirect"; + +fn fixture_dir(lang: &str) -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") + .join("open_redirect") + .join(lang) +} + +fn test_config() -> Config { + let mut cfg = Config::default(); + cfg.scanner.mode = AnalysisMode::Full; + cfg.scanner.read_vcsignore = false; + cfg.scanner.require_git_to_read_vcsignore = false; + cfg.scanner.enable_state_analysis = true; + cfg.scanner.enable_auth_analysis = true; + cfg.scanner.include_nonprod = true; + cfg.performance.worker_threads = Some(1); + cfg.performance.batch_size = 64; + cfg.performance.channel_multiplier = 1; + cfg +} + +fn scan_dir(path: &Path) -> Vec<Diag> { + nyx_scanner::scan_no_index(path, &test_config()).expect("scan_no_index should succeed") +} + +fn diags_for_file(dir: &Path, file_suffix: &str) -> Vec<Diag> { + scan_dir(dir) + .into_iter() + .filter(|d| { + std::path::Path::new(&d.path) + .file_name() + .and_then(|s| s.to_str()) + == Some(file_suffix) + }) + .collect() +} + +fn assert_unsafe(lang: &str, file_suffix: &str) { + let dir = fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let count = count_by_prefix(&diags, RULE_PREFIX); + assert!( + count >= 1, + "{lang}/{file_suffix}: expected >=1 {RULE_PREFIX} finding, got {count}.\n\ + All diags: {:#?}", + diags + .iter() + .map(|d| format!( + "{}:{} [{}] {}", + d.path, + d.line, + d.severity.as_db_str(), + d.id + )) + .collect::<Vec<_>>(), + ); +} + +fn assert_clean(lang: &str, file_suffix: &str) { + let dir = fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let matching: Vec<_> = diags + .iter() + .filter(|d| d.id.starts_with(RULE_PREFIX)) + .collect(); + assert!( + matching.is_empty(), + "{lang}/{file_suffix}: expected 0 {RULE_PREFIX} findings, got {}:\n{:#?}", +
matching.len(), + matching + .iter() + .map(|d| format!("{}:{} {}", d.path, d.line, d.id)) + .collect::<Vec<_>>(), + ); +} + +#[test] +fn javascript_redirect_with_tainted_url_fires() { + assert_unsafe("javascript", "unsafe_redirect.js"); +} + +#[test] +fn javascript_validate_url_sanitizes() { + assert_clean("javascript", "safe_redirect.js"); +} + +#[test] +fn javascript_relative_only_sanitizes() { + assert_clean("javascript", "safe_relative_redirect.js"); +} + +#[test] +fn javascript_host_allowlist_sanitizes() { + // `new URL(target).host === ALLOWED_HOST` is recognised by + // PredicateKind::HostAllowlistValidated which clears Cap::OPEN_REDIRECT + // on the validated branch. + assert_clean("javascript", "safe_host_allowlist_redirect.js"); +} + +#[test] +fn typescript_redirect_with_tainted_url_fires() { + assert_unsafe("typescript", "unsafe_redirect.ts"); +} + +#[test] +fn typescript_validate_url_sanitizes() { + assert_clean("typescript", "safe_redirect.ts"); +} + +#[test] +fn typescript_relative_only_sanitizes() { + assert_clean("typescript", "safe_relative_redirect.ts"); +} + +#[test] +fn typescript_host_allowlist_sanitizes() { + assert_clean("typescript", "safe_host_allowlist_redirect.ts"); +} + +#[test] +fn python_redirect_with_tainted_url_fires() { + assert_unsafe("python", "unsafe_redirect.py"); +} + +#[test] +fn python_validate_url_sanitizes() { + assert_clean("python", "safe_redirect.py"); +} + +#[test] +fn python_relative_only_sanitizes() { + assert_clean("python", "safe_relative_redirect.py"); +} + +#[test] +fn python_host_allowlist_sanitizes() { + assert_clean("python", "safe_host_allowlist_redirect.py"); +} + +#[test] +fn java_send_redirect_with_tainted_url_fires() { + assert_unsafe("java", "UnsafeRedirect.java"); +} + +#[test] +fn java_validate_url_sanitizes() { + assert_clean("java", "SafeRedirect.java"); +} + +#[test] +fn java_relative_only_sanitizes() { + assert_clean("java", "SafeRelativeRedirect.java"); +} + +#[test] +fn
java_spring_mvc_redirect_prefix_with_tainted_url_fires() { + // Spring MVC controller-return shape: `return "redirect:" + url` is + // matched structurally at CFG construction by emitting a synthetic + // `__spring_redirect__` Sink(OPEN_REDIRECT) call before the Return. + assert_unsafe("java", "UnsafeSpringRedirect.java"); +} + +#[test] +fn java_inline_relative_check_sanitizes() { + // Inline `target.startsWith("/")` guard with no named helper; the + // RelativeUrlValidated predicate strips OPEN_REDIRECT on the true + // branch so `res.sendRedirect(target)` is not flagged. + assert_clean("java", "SafeInlineRelative.java"); +} + +#[test] +fn php_header_location_with_tainted_url_fires() { + assert_unsafe("php", "unsafe_redirect.php"); +} + +#[test] +fn php_validate_redirect_url_sanitizes() { + assert_clean("php", "safe_redirect.php"); +} + +#[test] +fn php_relative_only_sanitizes() { + assert_clean("php", "safe_relative_redirect.php"); +} + +#[test] +fn ruby_redirect_to_with_tainted_url_fires() { + assert_unsafe("ruby", "unsafe_redirect.rb"); +} + +#[test] +fn ruby_validate_redirect_url_sanitizes() { + assert_clean("ruby", "safe_redirect.rb"); +} + +#[test] +fn ruby_relative_only_sanitizes() { + assert_clean("ruby", "safe_relative_redirect.rb"); +} + +#[test] +fn go_http_redirect_with_tainted_url_fires() { + assert_unsafe("go", "unsafe_redirect.go"); +} + +#[test] +fn go_validate_redirect_url_sanitizes() { + assert_clean("go", "safe_redirect.go"); +} + +#[test] +fn go_relative_only_sanitizes() { + assert_clean("go", "safe_relative_redirect.go"); +} + +#[test] +fn rust_axum_redirect_to_with_tainted_url_fires() { + assert_unsafe("rust", "unsafe_redirect.rs"); +} + +#[test] +fn rust_validate_redirect_url_sanitizes() { + assert_clean("rust", "safe_redirect.rs"); +} + +#[test] +fn rust_relative_only_sanitizes() { + assert_clean("rust", "safe_relative_redirect.rs"); +} + +#[test] +fn rust_host_allowlist_multi_statement_sanitizes() { + // Multi-statement form: `let parsed = 
Url::parse(x)?` followed by a + // separate `if parsed.host_str() == Some(ALLOWED)` check. Recognised + // by PredicateKind::HostAllowlistValidated via the `.host_str()` + // accessor probe (no parse call needed in the condition text). + assert_clean("rust", "safe_host_allowlist_redirect.rs"); +} + +#[test] +fn go_host_allowlist_multi_statement_sanitizes() { + // Multi-statement form: `parsed, err := url.Parse(x)` followed by a + // separate `if parsed.Host == allowedHost` check. Recognised by + // PredicateKind::HostAllowlistValidated via the case-sensitive + // capital-`H` `.Host` accessor probe. + assert_clean("go", "safe_host_allowlist_redirect.go"); +} + +#[test] +fn rust_actix_location_header_with_tainted_url_fires() { + assert_unsafe("rust", "unsafe_actix_location.rs"); +} + +#[test] +fn rust_actix_content_type_header_clean() { + assert_clean("rust", "safe_actix_content_type.rs"); +} + +#[test] +fn rust_actix_location_header_chained_finish_fires() { + assert_unsafe("rust", "unsafe_actix_location_chained.rs"); +} diff --git a/tests/prototype_pollution_tests.rs b/tests/prototype_pollution_tests.rs new file mode 100644 index 00000000..52c0d1e2 --- /dev/null +++ b/tests/prototype_pollution_tests.rs @@ -0,0 +1,238 @@ +//! Phase 08 + Phase 09 integration tests for `Cap::PROTOTYPE_POLLUTION`. +//! +//! Phase 08 (library-mediated) fixtures live under +//! `tests/fixtures/prototype_pollution//`: +//! +//! * `unsafe_lodash_merge.*` — `_.merge(target, req.body)` shape; must +//! produce >=1 `taint-prototype-pollution` finding. +//! * `unsafe_object_assign.js` — `Object.assign(target, req.body)` shape; +//! must produce >=1 finding (JS-only fixture). +//! * `safe_lodash_merge_const.*` — constant-source merge; must produce 0 +//! findings. +//! +//! Phase 09 (full-SSA dynamic-key sink) fixtures live under +//! `tests/fixtures/prototype_pollution/full/`: +//! +//! * `unsafe_dynamic_key.js` — `target[req.query.k] = req.query.v`; must +//! 
produce >=1 finding via the synthetic `__index_set__` node. +//! * `safe_reject_list.js` — `if (k === "__proto__" || …) return;` guard; +//! must produce 0 findings. +//! * `safe_object_create_null.js` — receiver assigned `Object.create(null)`; +//! must produce 0 findings. +//! * `safe_allowlist.js` — `if (k === "name" || k === "id") obj[k] = v` +//! on the true arm; must produce 0 findings. + +mod common; + +use common::count_by_prefix; +use nyx_scanner::commands::scan::Diag; +use nyx_scanner::utils::config::{AnalysisMode, Config}; +use std::path::{Path, PathBuf}; + +const RULE_PREFIX: &str = "taint-prototype-pollution"; + +fn fixture_dir(lang: &str) -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") + .join("prototype_pollution") + .join(lang) +} + +fn test_config() -> Config { + let mut cfg = Config::default(); + cfg.scanner.mode = AnalysisMode::Full; + cfg.scanner.read_vcsignore = false; + cfg.scanner.require_git_to_read_vcsignore = false; + cfg.scanner.enable_state_analysis = true; + cfg.scanner.enable_auth_analysis = true; + cfg.scanner.include_nonprod = true; + cfg.performance.worker_threads = Some(1); + cfg.performance.batch_size = 64; + cfg.performance.channel_multiplier = 1; + cfg +} + +fn scan_dir(path: &Path) -> Vec<Diag> { + nyx_scanner::scan_no_index(path, &test_config()).expect("scan_no_index should succeed") +} + +fn diags_for_file(dir: &Path, file_suffix: &str) -> Vec<Diag> { + scan_dir(dir) + .into_iter() + .filter(|d| { + std::path::Path::new(&d.path) + .file_name() + .and_then(|s| s.to_str()) + == Some(file_suffix) + }) + .collect() +} + +fn assert_unsafe(lang: &str, file_suffix: &str) { + let dir = fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let count = count_by_prefix(&diags, RULE_PREFIX); + assert!( + count >= 1, + "{lang}/{file_suffix}: expected >=1 {RULE_PREFIX} finding, got {count}.\n\ + All diags: {:#?}", + diags + .iter() + .map(|d| format!( + "{}:{} [{}] {}", + d.path, + d.line,
+ d.severity.as_db_str(), + d.id + )) + .collect::<Vec<_>>(), + ); +} + +fn assert_clean(lang: &str, file_suffix: &str) { + let dir = fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let matching: Vec<_> = diags + .iter() + .filter(|d| d.id.starts_with(RULE_PREFIX)) + .collect(); + assert!( + matching.is_empty(), + "{lang}/{file_suffix}: expected 0 {RULE_PREFIX} findings, got {}:\n{:#?}", + matching.len(), + matching + .iter() + .map(|d| format!("{}:{} {}", d.path, d.line, d.id)) + .collect::<Vec<_>>(), + ); +} + +#[test] +fn javascript_lodash_merge_with_tainted_source_fires() { + assert_unsafe("javascript", "unsafe_lodash_merge.js"); +} + +#[test] +fn javascript_object_assign_with_tainted_source_fires() { + assert_unsafe("javascript", "unsafe_object_assign.js"); +} + +#[test] +fn javascript_lodash_merge_constant_source_does_not_fire() { + assert_clean("javascript", "safe_lodash_merge_const.js"); +} + +#[test] +fn javascript_object_assign_constant_source_does_not_fire() { + assert_clean("javascript", "safe_object_assign_const.js"); +} + +#[test] +fn javascript_set_value_with_tainted_key_fires() { + // `set-value` standalone helper (CVE-2019-10747). Tainted `key` / + // `value` from req.body must surface PROTOTYPE_POLLUTION. + assert_unsafe("javascript", "unsafe_set_value.js"); +} + +#[test] +fn javascript_set_value_constant_does_not_fire() { + assert_clean("javascript", "safe_set_value_const.js"); +} + +#[test] +fn javascript_dot_prop_set_with_tainted_path_fires() { + // `dot-prop` `dotProp.set(obj, path, val)` — CVE-2020-8116. + assert_unsafe("javascript", "unsafe_dot_prop_set.js"); +} + +#[test] +fn javascript_jsonpath_set_with_tainted_path_fires() { + // `jsonpath` `jp.set(obj, path, val)` — prototype chain mutation + // when `path` carries `__proto__` segments.
+ assert_unsafe("javascript", "unsafe_jsonpath_set.js"); +} + +#[test] +fn typescript_lodash_merge_with_tainted_source_fires() { + assert_unsafe("typescript", "unsafe_lodash_merge.ts"); +} + +#[test] +fn typescript_lodash_merge_constant_source_does_not_fire() { + assert_clean("typescript", "safe_lodash_merge_const.ts"); +} + +#[test] +fn typescript_object_assign_with_tainted_source_fires() { + assert_unsafe("typescript", "unsafe_object_assign.ts"); +} + +#[test] +fn typescript_object_assign_constant_source_does_not_fire() { + assert_clean("typescript", "safe_object_assign_const.ts"); +} + +// ── Bare `extend` deep-merge gate (LiteralOnly activation) ──────────────── + +#[test] +fn javascript_bare_extend_deep_with_tainted_source_fires() { + // `const { extend } = require('jquery'); extend(true, target, req.body)` + // — bare suffix matcher fires when arg 0 is literal `true`. + assert_unsafe("javascript", "unsafe_bare_extend_deep.js"); +} + +#[test] +fn javascript_bare_extend_class_extension_does_not_fire() { + // `Backbone.Model.extend({...})` — arg 0 is an object literal, not the + // deep flag, so LiteralOnly activation suppresses. + assert_clean("javascript", "safe_bare_extend_class.js"); +} + +#[test] +fn javascript_bare_extend_dynamic_arg0_does_not_fire() { + // `extend(target, req.body)` — arg 0 is dynamic; LiteralOnly skips the + // conservative ALL_ARGS_PAYLOAD branch. 
+ assert_clean("javascript", "safe_bare_extend_dynamic.js"); +} + +#[test] +fn typescript_bare_extend_deep_with_tainted_source_fires() { + assert_unsafe("typescript", "unsafe_bare_extend_deep.ts"); +} + +#[test] +fn typescript_bare_extend_class_extension_does_not_fire() { + assert_clean("typescript", "safe_bare_extend_class.ts"); +} + +// ── Phase 09: full-SSA dynamic-key sink ─────────────────────────────────── + +#[test] +fn full_ssa_dynamic_key_with_tainted_key_fires() { + assert_unsafe("full", "unsafe_dynamic_key.js"); +} + +#[test] +fn full_ssa_reject_list_guard_does_not_fire() { + assert_clean("full", "safe_reject_list.js"); +} + +#[test] +fn full_ssa_object_create_null_receiver_does_not_fire() { + assert_clean("full", "safe_object_create_null.js"); +} + +#[test] +fn full_ssa_partial_null_proto_fires_on_unsafe_branch() { + // Phase 09 flow-sensitivity regression: previous AST scan suppressed + // any same-function `Object.create(null)` assignment, masking the + // unsafe else-branch. TypeFacts phi-meet now joins to Unknown so + // the PROTOTYPE_POLLUTION sink fires. + assert_unsafe("full", "unsafe_partial_null_proto.js"); +} + +#[test] +fn full_ssa_allowlist_guard_does_not_fire() { + assert_clean("full", "safe_allowlist.js"); +} diff --git a/tests/python_proto_pollution_tests.rs b/tests/python_proto_pollution_tests.rs new file mode 100644 index 00000000..ad3282e8 --- /dev/null +++ b/tests/python_proto_pollution_tests.rs @@ -0,0 +1,87 @@ +//! Python prototype-pollution opt-in gate (`NYX_PYTHON_PROTO_POLLUTION=1`). +//! +//! Lives in its own test binary so the `GATED_REGISTRY` `Lazy` initialises +//! after this binary's startup env-var setting; merging into other test +//! files would race with their first-access initialisation. +//! +//! Fixture: +//! +//! * `unsafe_dict_update.py` — `target.update(json.loads(body))` shape; the +//! `dict.update` gate (PROTO_POLLUTION_GATES in `src/labels/python.rs`) +//! should fire once the env-var is set. 
+ +mod common; + +use common::count_by_prefix; +use nyx_scanner::commands::scan::Diag; +use nyx_scanner::utils::config::{AnalysisMode, Config}; +use std::path::{Path, PathBuf}; + +const RULE_PREFIX: &str = "taint-prototype-pollution"; + +fn fixture_dir() -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") + .join("prototype_pollution") + .join("python") +} + +fn test_config() -> Config { + let mut cfg = Config::default(); + cfg.scanner.mode = AnalysisMode::Full; + cfg.scanner.read_vcsignore = false; + cfg.scanner.require_git_to_read_vcsignore = false; + cfg.scanner.enable_state_analysis = true; + cfg.scanner.enable_auth_analysis = true; + cfg.scanner.include_nonprod = true; + cfg.performance.worker_threads = Some(1); + cfg.performance.batch_size = 64; + cfg.performance.channel_multiplier = 1; + cfg +} + +fn scan_dir(path: &Path) -> Vec<Diag> { + nyx_scanner::scan_no_index(path, &test_config()).expect("scan_no_index should succeed") +} + +fn diags_for_file(dir: &Path, file_suffix: &str) -> Vec<Diag> { + scan_dir(dir) + .into_iter() + .filter(|d| { + std::path::Path::new(&d.path) + .file_name() + .and_then(|s| s.to_str()) + == Some(file_suffix) + }) + .collect() +} + +#[test] +fn python_dict_update_with_tainted_source_fires() { + // SAFETY: env::set_var is unsafe in 2024 edition; safe here because + // this test binary's `GATED_REGISTRY` Lazy is not yet initialised + // (no other test in this binary scans before this call) and the + // setting is process-local with no other threads observing.
+ unsafe { + std::env::set_var("NYX_PYTHON_PROTO_POLLUTION", "1"); + } + let dir = fixture_dir(); + let diags = diags_for_file(&dir, "unsafe_dict_update.py"); + let count = count_by_prefix(&diags, RULE_PREFIX); + assert!( + count >= 1, + "python/unsafe_dict_update.py: expected >=1 {RULE_PREFIX} finding, got {count}.\n\ + All diags: {:#?}", + diags + .iter() + .map(|d| format!( + "{}:{} [{}] {}", + d.path, + d.line, + d.severity.as_db_str(), + d.id + )) + .collect::<Vec<_>>(), + ); +} diff --git a/tests/ssti_tests.rs b/tests/ssti_tests.rs new file mode 100644 index 00000000..05096ee0 --- /dev/null +++ b/tests/ssti_tests.rs @@ -0,0 +1,245 @@ +//! Phase 06 integration tests for `Cap::SSTI`. +//! +//! Fixtures under `tests/fixtures/ssti//`: +//! +//! * `unsafe_*` — taint flows from a request source into a template +//! compile / from_string / render call as the template *source* arg. +//! Must produce >=1 `taint-template-injection` finding. +//! * `safe_*_constant` — template source is a literal; variables at +//! render time may carry user input but do not activate SSTI. Must +//! produce 0 findings.
+ +mod common; + +use common::count_by_prefix; +use nyx_scanner::commands::scan::Diag; +use nyx_scanner::utils::config::{AnalysisMode, Config}; +use std::path::{Path, PathBuf}; + +const RULE_PREFIX: &str = "taint-template-injection"; + +fn fixture_dir(lang: &str) -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") + .join("ssti") + .join(lang) +} + +fn test_config() -> Config { + let mut cfg = Config::default(); + cfg.scanner.mode = AnalysisMode::Full; + cfg.scanner.read_vcsignore = false; + cfg.scanner.require_git_to_read_vcsignore = false; + cfg.scanner.enable_state_analysis = true; + cfg.scanner.enable_auth_analysis = true; + cfg.scanner.include_nonprod = true; + cfg.performance.worker_threads = Some(1); + cfg.performance.batch_size = 64; + cfg.performance.channel_multiplier = 1; + cfg +} + +fn scan_dir(path: &Path) -> Vec<Diag> { + nyx_scanner::scan_no_index(path, &test_config()).expect("scan_no_index should succeed") +} + +fn diags_for_file(dir: &Path, file_suffix: &str) -> Vec<Diag> { + scan_dir(dir) + .into_iter() + .filter(|d| { + std::path::Path::new(&d.path) + .file_name() + .and_then(|s| s.to_str()) + == Some(file_suffix) + }) + .collect() +} + +fn assert_unsafe(lang: &str, file_suffix: &str) { + let dir = fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let count = count_by_prefix(&diags, RULE_PREFIX); + assert!( + count >= 1, + "{lang}/{file_suffix}: expected >=1 {RULE_PREFIX} finding, got {count}.\n\ + All diags: {:#?}", + diags + .iter() + .map(|d| format!( + "{}:{} [{}] {}", + d.path, + d.line, + d.severity.as_db_str(), + d.id + )) + .collect::<Vec<_>>(), + ); +} + +fn assert_clean(lang: &str, file_suffix: &str) { + let dir = fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let matching: Vec<_> = diags + .iter() + .filter(|d| d.id.starts_with(RULE_PREFIX)) + .collect(); + assert!( + matching.is_empty(), + "{lang}/{file_suffix}: expected 0 {RULE_PREFIX} findings, got {}:\n{:#?}", +
matching.len(), + matching + .iter() + .map(|d| format!("{}:{} {}", d.path, d.line, d.id)) + .collect::<Vec<_>>(), + ); +} + +#[test] +fn javascript_handlebars_compile_with_tainted_source_fires() { + assert_unsafe("javascript", "unsafe_handlebars_compile.js"); +} + +#[test] +fn javascript_handlebars_constant_source_does_not_fire() { + assert_clean("javascript", "safe_handlebars_constant.js"); +} + +#[test] +fn typescript_handlebars_compile_with_tainted_source_fires() { + assert_unsafe("typescript", "unsafe_handlebars_compile.ts"); +} + +#[test] +fn typescript_handlebars_constant_source_does_not_fire() { + assert_clean("typescript", "safe_handlebars_constant.ts"); +} + +#[test] +fn python_jinja_template_with_tainted_source_fires() { + assert_unsafe("python", "unsafe_jinja_template.py"); +} + +#[test] +fn python_jinja_constant_source_does_not_fire() { + assert_clean("python", "safe_jinja_constant.py"); +} + +#[test] +fn python_jinja_compile_expression_with_tainted_source_fires() { + assert_unsafe("python", "unsafe_jinja_compile_expression.py"); +} + +#[test] +fn python_render_template_with_tainted_var_does_not_fire() { + assert_clean("python", "safe_render_template_var.py"); +} + +#[test] +fn python_mako_lookup_get_template_with_tainted_name_fires() { + // Mako TemplateLookup.get_template loader-path pattern. Tainted + // `name` selects which file becomes the rendered template — arbitrary + // template execution modeled as SSTI on the loader-path arg. + assert_unsafe("python", "unsafe_mako_lookup_get_template.py"); +} + +#[test] +fn python_mako_lookup_constant_name_does_not_fire() { + assert_clean("python", "safe_mako_lookup_constant.py"); +} + +#[test] +fn python_jinja_get_template_with_tainted_name_fires() { + // Jinja2 Environment.get_template loader-path pattern.
+ assert_unsafe("python", "unsafe_jinja_get_template.py"); +} + +#[test] +fn javascript_nunjucks_render_string_tainted_source_fires() { + assert_unsafe("javascript", "unsafe_nunjucks_render_string.js"); +} + +#[test] +fn javascript_nunjucks_render_string_const_template_does_not_fire() { + assert_clean("javascript", "safe_nunjucks_render_string.js"); +} + +#[test] +fn typescript_nunjucks_render_string_tainted_source_fires() { + assert_unsafe("typescript", "unsafe_nunjucks_render_string.ts"); +} + +#[test] +fn typescript_nunjucks_render_string_const_template_does_not_fire() { + assert_clean("typescript", "safe_nunjucks_render_string.ts"); +} + +#[test] +fn php_twig_create_template_with_tainted_source_fires() { + assert_unsafe("php", "unsafe_twig_create_template.php"); +} + +#[test] +fn php_twig_constant_source_does_not_fire() { + assert_clean("php", "safe_twig_constant.php"); +} + +#[test] +fn php_twig_render_with_tainted_var_does_not_fire() { + assert_clean("php", "safe_twig_template_var.php"); +} + +#[test] +fn php_smarty_string_prefix_with_tainted_source_fires() { + assert_unsafe("php", "unsafe_smarty_string_fetch.php"); +} + +#[test] +fn php_smarty_file_fetch_with_tainted_var_does_not_fire() { + assert_clean("php", "safe_smarty_file_fetch.php"); +} + +#[test] +fn java_freemarker_with_tainted_template_source_fires() { + assert_unsafe("java", "UnsafeFreemarkerTemplate.java"); +} + +#[test] +fn java_freemarker_constant_template_does_not_fire() { + assert_clean("java", "SafeFreemarkerConstant.java"); +} + +#[test] +fn java_freemarker_template_process_with_tainted_source_fires() { + assert_unsafe("java", "UnsafeFreemarkerProcess.java"); +} + +#[test] +fn ruby_erb_new_with_tainted_source_fires() { + assert_unsafe("ruby", "unsafe_erb_new.rb"); +} + +#[test] +fn ruby_erb_constant_source_does_not_fire() { + assert_clean("ruby", "safe_erb_constant.rb"); +} + +#[test] +fn ruby_render_template_with_tainted_var_does_not_fire() { + assert_clean("ruby", 
"safe_erb_template_var.rb"); +} + +#[test] +fn go_text_template_parse_with_tainted_source_fires() { + assert_unsafe("go", "unsafe_template_parse.go"); +} + +#[test] +fn go_template_constant_source_does_not_fire() { + assert_clean("go", "safe_template_constant.go"); +} + +#[test] +fn go_template_parse_files_with_tainted_var_does_not_fire() { + assert_clean("go", "safe_template_parsefiles.go"); +} diff --git a/tests/xpath_injection_tests.rs b/tests/xpath_injection_tests.rs new file mode 100644 index 00000000..2dcd1136 --- /dev/null +++ b/tests/xpath_injection_tests.rs @@ -0,0 +1,264 @@ +//! Phase 03 integration tests for `Cap::XPATH_INJECTION`. +//! +//! Each supported language has three fixtures under +//! `tests/fixtures/xpath_injection//`: +//! +//! * `unsafe_xpath_query.*` — taint flows from a request / env source into +//! an XPath evaluate / select / query API. Must produce at least one +//! `taint-xpath-injection` finding at HIGH severity. +//! * `safe_xpath_query.*` — same data flow, but routed through a +//! developer-named `escape_xpath` / `escapeXpath` / `sanitize_*` helper. +//! Must produce zero `taint-xpath-injection` findings. +//! * `baseline_constant_xpath.*` — expression is a literal constant. Must +//! produce zero `taint-xpath-injection` findings. 
+ +mod common; + +use common::count_by_prefix; +use nyx_scanner::commands::scan::Diag; +use nyx_scanner::utils::config::{AnalysisMode, Config}; +use std::path::{Path, PathBuf}; + +const RULE_PREFIX: &str = "taint-xpath-injection"; + +fn xpath_fixture_dir(lang: &str) -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") + .join("xpath_injection") + .join(lang) +} + +fn xpath_test_config() -> Config { + let mut cfg = Config::default(); + cfg.scanner.mode = AnalysisMode::Full; + cfg.scanner.read_vcsignore = false; + cfg.scanner.require_git_to_read_vcsignore = false; + cfg.scanner.enable_state_analysis = true; + cfg.scanner.enable_auth_analysis = true; + cfg.scanner.include_nonprod = true; + cfg.performance.worker_threads = Some(1); + cfg.performance.batch_size = 64; + cfg.performance.channel_multiplier = 1; + cfg +} + +fn scan_dir(path: &Path) -> Vec<Diag> { + let cfg = xpath_test_config(); + nyx_scanner::scan_no_index(path, &cfg).expect("scan_no_index should succeed") +} + +fn diags_for_file(dir: &Path, file_suffix: &str) -> Vec<Diag> { + let all = scan_dir(dir); + all.into_iter() + .filter(|d| { + std::path::Path::new(&d.path) + .file_name() + .and_then(|s| s.to_str()) + == Some(file_suffix) + }) + .collect() +} + +fn assert_unsafe(lang: &str, file_suffix: &str) { + let dir = xpath_fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let count = count_by_prefix(&diags, RULE_PREFIX); + assert!( + count >= 1, + "{lang}/{file_suffix}: expected >=1 {RULE_PREFIX} finding, got {count}.\n\ + All diags: {:#?}", + diags + .iter() + .map(|d| format!( + "{}:{} [{}] {}", + d.path, + d.line, + d.severity.as_db_str(), + d.id + )) + .collect::<Vec<_>>(), + ); + let high = diags + .iter() + .filter(|d| d.id.starts_with(RULE_PREFIX) && d.severity.as_db_str() == "HIGH") + .count(); + assert!( + high >= 1, + "{lang}/{file_suffix}: expected >=1 HIGH-severity {RULE_PREFIX} finding, got {high}.\n\ + All matching: {:#?}", + diags + .iter() +
.filter(|d| d.id.starts_with(RULE_PREFIX)) + .map(|d| format!("{}:{} [{}]", d.path, d.line, d.severity.as_db_str())) + .collect::<Vec<_>>(), + ); +} + +fn assert_clean(lang: &str, file_suffix: &str) { + let dir = xpath_fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let matching: Vec<_> = diags + .iter() + .filter(|d| d.id.starts_with(RULE_PREFIX)) + .collect(); + assert!( + matching.is_empty(), + "{lang}/{file_suffix}: expected 0 {RULE_PREFIX} findings, got {}:\n{:#?}", + matching.len(), + matching + .iter() + .map(|d| format!("{}:{} {}", d.path, d.line, d.id)) + .collect::<Vec<_>>(), + ); +} + +// ── Java ───────────────────────────────────────────────────────────────── + +#[test] +fn java_xpath_evaluate_with_tainted_expr_fires() { + assert_unsafe("java", "UnsafeXPathQuery.java"); +} + +#[test] +fn java_escape_xpath_sanitizes() { + assert_clean("java", "SafeXPathQuery.java"); +} + +#[test] +fn java_baseline_constant_expr_does_not_fire() { + assert_clean("java", "BaselineConstantXpath.java"); +} + +#[test] +fn java_parameterised_xpath_does_not_fire() { + assert_clean("java", "ParameterizedXpath.java"); +} + +#[test] +fn java_tainted_expr_with_resolver_does_not_fire() { + // Receiver-config sidecar (`src/ssa/xpath_config.rs`) clears + // XPATH_INJECTION on `xpath.evaluate(taintedExpr, ...)` when the + // bound XPath instance had `setXPathVariableResolver` called on it + // first. Without the sidecar this fixture would fire.
+ assert_clean("java", "TaintedParameterizedXpath.java"); +} + +// ── Python ─────────────────────────────────────────────────────────────── + +#[test] +fn python_xpath_with_tainted_expr_fires() { + assert_unsafe("python", "unsafe_xpath_query.py"); +} + +#[test] +fn python_escape_xpath_sanitizes() { + assert_clean("python", "safe_xpath_query.py"); +} + +#[test] +fn python_baseline_constant_expr_does_not_fire() { + assert_clean("python", "baseline_constant_xpath.py"); +} + +// ── PHP ────────────────────────────────────────────────────────────────── + +#[test] +fn php_domxpath_query_with_tainted_expr_fires() { + assert_unsafe("php", "unsafe_xpath_query.php"); +} + +#[test] +fn php_escape_xpath_sanitizes() { + assert_clean("php", "safe_xpath_query.php"); +} + +#[test] +fn php_baseline_constant_expr_does_not_fire() { + assert_clean("php", "baseline_constant_xpath.php"); +} + +// ── JavaScript ─────────────────────────────────────────────────────────── + +#[test] +fn javascript_xpath_select_with_tainted_expr_fires() { + assert_unsafe("javascript", "unsafe_xpath_query.js"); +} + +#[test] +fn javascript_escape_xpath_sanitizes() { + assert_clean("javascript", "safe_xpath_query.js"); +} + +#[test] +fn javascript_baseline_constant_expr_does_not_fire() { + assert_clean("javascript", "baseline_constant_xpath.js"); +} + +// ── TypeScript ─────────────────────────────────────────────────────────── + +#[test] +fn typescript_xpath_select_with_tainted_expr_fires() { + assert_unsafe("typescript", "unsafe_xpath_query.ts"); +} + +#[test] +fn typescript_escape_xpath_sanitizes() { + assert_clean("typescript", "safe_xpath_query.ts"); +} + +#[test] +fn typescript_baseline_constant_expr_does_not_fire() { + assert_clean("typescript", "baseline_constant_xpath.ts"); +} + +// ── Ruby ──────────────────────────────────────────────────────────────── + +#[test] +fn ruby_nokogiri_xpath_with_tainted_expr_fires() { + assert_unsafe("ruby", "unsafe_xpath_query.rb"); +} + +#[test] +fn 
ruby_escape_xpath_sanitizes() { + assert_clean("ruby", "safe_xpath_query.rb"); +} + +#[test] +fn ruby_baseline_constant_expr_does_not_fire() { + assert_clean("ruby", "baseline_constant_xpath.rb"); +} + +// ── C ─────────────────────────────────────────────────────────────────── + +#[test] +fn c_xml_xpath_eval_with_tainted_expr_fires() { + assert_unsafe("c", "unsafe_xpath_query.c"); +} + +#[test] +fn c_sanitize_helper_clears_cap() { + assert_clean("c", "safe_xpath_query.c"); +} + +#[test] +fn c_baseline_constant_expr_does_not_fire() { + assert_clean("c", "baseline_constant_xpath.c"); +} + +// ── C++ ───────────────────────────────────────────────────────────────── + +#[test] +fn cpp_xml_xpath_eval_with_tainted_expr_fires() { + assert_unsafe("cpp", "unsafe_xpath_query.cpp"); +} + +#[test] +fn cpp_sanitize_helper_clears_cap() { + assert_clean("cpp", "safe_xpath_query.cpp"); +} + +#[test] +fn cpp_baseline_constant_expr_does_not_fire() { + assert_clean("cpp", "baseline_constant_xpath.cpp"); +} diff --git a/tests/xxe_tests.rs b/tests/xxe_tests.rs new file mode 100644 index 00000000..4de085f0 --- /dev/null +++ b/tests/xxe_tests.rs @@ -0,0 +1,260 @@ +//! Phase XXE integration tests for `Cap::XXE`. +//! +//! Fixtures under `tests/fixtures/xxe//`: +//! +//! * `unsafe_xxe.*` — taint flows from a request source into a parser +//! entry point that resolves external entities (Java DocumentBuilder, +//! Python `xml.sax.parseString`, PHP `simplexml_load_string` with +//! `LIBXML_NOENT`, JS `xml2js.parseString` with `processEntities: true`, +//! Ruby `REXML::Document.new`). Must produce >=1 `taint-xxe` finding. +//! * `safe_xxe.*` — same flow routed through a hardened API +//! (defusedxml, default-options `simplexml_load_string`, etc.). +//! Must produce 0 findings. +//! +//! Layer 2 (config-check pattern via abstract-interp) is deferred — see +//! `.pitboss/play/deferred.md`. 
+ +mod common; + +use common::count_by_prefix; +use nyx_scanner::commands::scan::Diag; +use nyx_scanner::utils::config::{AnalysisMode, Config}; +use std::path::{Path, PathBuf}; + +const RULE_PREFIX: &str = "taint-xxe"; + +fn fixture_dir(lang: &str) -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") + .join("xxe") + .join(lang) +} + +fn test_config() -> Config { + let mut cfg = Config::default(); + cfg.scanner.mode = AnalysisMode::Full; + cfg.scanner.read_vcsignore = false; + cfg.scanner.require_git_to_read_vcsignore = false; + cfg.scanner.enable_state_analysis = true; + cfg.scanner.enable_auth_analysis = true; + cfg.scanner.include_nonprod = true; + cfg.performance.worker_threads = Some(1); + cfg.performance.batch_size = 64; + cfg.performance.channel_multiplier = 1; + cfg +} + +fn scan_dir(path: &Path) -> Vec<Diag> { + nyx_scanner::scan_no_index(path, &test_config()).expect("scan_no_index should succeed") +} + +fn diags_for_file(dir: &Path, file_suffix: &str) -> Vec<Diag> { + scan_dir(dir) + .into_iter() + .filter(|d| { + Path::new(&d.path) + .file_name() + .and_then(|s| s.to_str()) + == Some(file_suffix) + }) + .collect() +} + +fn assert_unsafe(lang: &str, file_suffix: &str) { + let dir = fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let count = count_by_prefix(&diags, RULE_PREFIX); + assert!( + count >= 1, + "{lang}/{file_suffix}: expected >=1 {RULE_PREFIX} finding, got {count}.\n\ + All diags: {:#?}", + diags + .iter() + .map(|d| format!( + "{}:{} [{}] {}", + d.path, + d.line, + d.severity.as_db_str(), + d.id + )) + .collect::<Vec<_>>(), + ); +} + +fn assert_clean(lang: &str, file_suffix: &str) { + let dir = fixture_dir(lang); + let diags = diags_for_file(&dir, file_suffix); + let matching: Vec<_> = diags + .iter() + .filter(|d| d.id.starts_with(RULE_PREFIX)) + .collect(); + assert!( + matching.is_empty(), + "{lang}/{file_suffix}: expected 0 {RULE_PREFIX} findings, got {}:\n{:#?}", + matching.len(), +
+ matching + .iter() + .map(|d| format!("{}:{} {}", d.path, d.line, d.id)) + .collect::<Vec<_>>(), + ); +} + +#[test] +fn java_document_builder_parse_with_tainted_xml_fires() { + assert_unsafe("java", "UnsafeXxe.java"); +} + +#[test] +fn java_no_xml_parser_clean() { + assert_clean("java", "SafeXxe.java"); +} + +/// Phase 07 acceptance: a `factory.setFeature(FEATURE_SECURE_PROCESSING, +/// true)` before `builder.parse(...)` produces zero `taint-xxe` +/// findings. The hardening fact is recorded on the factory's SSA +/// value, propagated to the builder via `newDocumentBuilder()`, and +/// consulted at the parse sink. +#[test] +fn java_set_feature_secure_processing_clean() { + assert_clean("java", "SafeXxeConfig.java"); +} + +/// Phase 07 acceptance: parser variable reassigned across two branches +/// that both harden the receiver, so the SSA phi-meet preserves +/// `secure_processing = true` and the downstream parse sink stays +/// silent. +#[test] +fn java_phi_reassigned_factory_clean() { + assert_clean("java", "SafeXxePhi.java"); +} + +/// Baseline: tainted body wrapped in a string concat, no XML parser +/// entry point. `taint-xxe` must not surface from XML-adjacent string +/// operations. +#[test] +fn java_irrelevant_xml_call_clean() { + assert_clean("java", "IrrelevantXmlCall.java"); +} + +/// Log4Shell XXE-leg shape (CVE-2022-23305 / CVE-2022-23307 lineage): +/// DOMConfigurator-style loader takes an XML config path from the +/// request, parses through an unhardened `DocumentBuilder`. Exercises +/// the TypeFacts-tagged builder receiver + xml_config sidecar end-to-end. +#[test] +fn java_log4j_config_loader_with_tainted_path_fires() { + assert_unsafe("java", "UnsafeLog4jConfig.java"); +} + +/// Log4Shell XXE-leg hardened: same DOMConfigurator-style loader but +/// `factory.setFeature(FEATURE_SECURE_PROCESSING, true)` and +/// `disallow-doctype-decl` precede the `newDocumentBuilder()` call. +
+/// xml_config sidecar propagates the hardening fact to the builder so +/// the parse sink suppresses the XXE bit. +#[test] +fn java_log4j_config_loader_secure_processing_clean() { + assert_clean("java", "SafeLog4jConfig.java"); +} + +#[test] +fn python_sax_parse_with_tainted_xml_fires() { + assert_unsafe("python", "unsafe_xxe.py"); +} + +#[test] +fn python_lxml_resolve_entities_fires() { + assert_unsafe("python", "unsafe_lxml_resolve_entities.py"); +} + +#[test] +fn python_defusedxml_sanitizes() { + assert_clean("python", "safe_xxe.py"); +} + +/// Phase 07 acceptance: `lxml.etree.parse` is XXE-safe by default in +/// modern lxml (external entity resolution requires explicit +/// `XMLParser(resolve_entities=True)`). No `taint-xxe` finding. +#[test] +fn python_lxml_default_clean() { + assert_clean("python", "safe_lxml.py"); +} + +#[test] +fn python_irrelevant_xml_call_clean() { + assert_clean("python", "irrelevant_xml_call.py"); +} + +#[test] +fn php_simplexml_load_string_with_noent_fires() { + assert_unsafe("php", "unsafe_xxe.php"); +} + +#[test] +fn php_simplexml_load_string_default_options_clean() { + assert_clean("php", "safe_xxe.php"); +} + +#[test] +fn php_irrelevant_xml_call_clean() { + assert_clean("php", "irrelevant_xml_call.php"); +} + +#[test] +fn javascript_xml2js_with_process_entities_fires() { + assert_unsafe("javascript", "unsafe_xxe.js"); +} + +#[test] +fn javascript_xml2js_default_options_clean() { + assert_clean("javascript", "safe_xxe.js"); +} + +#[test] +fn javascript_fast_xml_parser_with_process_entities_fires() { + assert_unsafe("javascript", "unsafe_fast_xml_parser.js"); +} + +#[test] +fn javascript_irrelevant_xml_call_clean() { + assert_clean("javascript", "irrelevant_xml_call.js"); +} + +#[test] +fn typescript_xml2js_with_process_entities_fires() { + assert_unsafe("typescript", "unsafe_xxe.ts"); +} + +#[test] +fn typescript_xml2js_default_options_clean() { + assert_clean("typescript", "safe_xxe.ts"); +} + +#[test] +fn 
typescript_fast_xml_parser_with_process_entities_fires() { + assert_unsafe("typescript", "unsafe_fast_xml_parser.ts"); +} + +#[test] +fn typescript_irrelevant_xml_call_clean() { + assert_clean("typescript", "irrelevant_xml_call.ts"); +} + +#[test] +fn ruby_rexml_document_with_tainted_xml_fires() { + assert_unsafe("ruby", "unsafe_xxe.rb"); +} + +#[test] +fn ruby_nokogiri_xml_with_noent_fires() { + assert_unsafe("ruby", "unsafe_xxe_nokogiri.rb"); +} + +#[test] +fn ruby_nokogiri_xml_default_options_clean() { + assert_clean("ruby", "safe_xxe_nokogiri.rb"); +} + +#[test] +fn ruby_irrelevant_xml_call_clean() { + assert_clean("ruby", "irrelevant_xml_call.rb"); +}
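The Phase 07 doc comments in `tests/xxe_tests.rs` describe a must-style dataflow fact: hardening is recorded on the factory's SSA value, carried to the builder by `newDocumentBuilder()`, merged at a phi, and checked at the parse sink. The sketch below illustrates that idea under the assumption of a single boolean fact per SSA value whose phi meet is logical AND. `XmlHardening`, `set_secure_processing`, `new_document_builder`, `phi_meet`, and `xxe_sink_fires` are hypothetical names for illustration only; Nyx's actual sidecar and SSA representation are not shown in this diff and may differ.

```rust
/// Hypothetical hardening fact attached to an XML parser factory or builder SSA value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct XmlHardening {
    secure_processing: bool,
}

/// Models `factory.setFeature(FEATURE_SECURE_PROCESSING, true)`: record the fact.
fn set_secure_processing(_factory: XmlHardening) -> XmlHardening {
    XmlHardening { secure_processing: true }
}

/// Models `factory.newDocumentBuilder()`: the builder inherits the factory's fact.
fn new_document_builder(factory: XmlHardening) -> XmlHardening {
    factory
}

/// Phi meet for a must-fact: hardened only if every incoming value is hardened.
/// An empty operand list is treated conservatively as "not hardened".
fn phi_meet(incoming: &[XmlHardening]) -> XmlHardening {
    XmlHardening {
        secure_processing: !incoming.is_empty()
            && incoming.iter().all(|f| f.secure_processing),
    }
}

/// Parse sink: a tainted argument produces a finding unless the receiver is hardened.
fn xxe_sink_fires(receiver: XmlHardening, arg_tainted: bool) -> bool {
    arg_tainted && !receiver.secure_processing
}

fn main() {
    let fresh = XmlHardening { secure_processing: false };

    // Straight-line hardening before newDocumentBuilder(): suppressed.
    let builder = new_document_builder(set_secure_processing(fresh));
    println!("straight-line fires: {}", xxe_sink_fires(builder, true));

    // Both branches harden before the join: the meet keeps the fact, suppressed.
    let joined = phi_meet(&[set_secure_processing(fresh), set_secure_processing(fresh)]);
    println!("both branches fires: {}", xxe_sink_fires(new_document_builder(joined), true));

    // Only one branch hardens: the meet drops the fact and the sink fires.
    let partial = phi_meet(&[set_secure_processing(fresh), fresh]);
    println!("one branch fires: {}", xxe_sink_fires(new_document_builder(partial), true));
}
```

Using AND rather than OR at the phi is the conservative choice for a suppression fact: an OR (may) meet would silence the sink when only one path hardens the parser, hiding a real XXE on the other path.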