new capacity bits (#67)

Eli Peter 2026-05-07 01:29:31 -04:00 committed by GitHub
parent afaffc0df6
commit 7d0e7320e2
GPG key ID: B5690EEEBB952194
261 changed files with 10591 additions and 231 deletions

.gitignore vendored

@ -14,3 +14,5 @@
.pitboss
.node_modules-target
node_modules
__pycache__/
*.pyc


@ -6,6 +6,15 @@ All notable changes to Nyx are documented here. The format is based on [Keep a C
A round of cross-file FastAPI auth, two new sink/validator classes, a ~957-FP Go DAO helper precision pass, four CVE corpus pairs, a local web UI visual refresh, and a performance pass on the auth extractor pipeline plus SCCP and the global summaries hash map.
This branch also adds seven new vulnerability classes (LDAP injection, XPath injection, header / CRLF injection, open redirect, server-side template injection, XXE, prototype pollution), a `nyx rules` CLI subcommand, two SSA configuration sidecars (XML parser hardening, XPath variable resolver), two new path-state predicates for inline open-redirect sanitisers, and a flow-sensitive `Object.create(null)` recogniser for prototype-pollution suppression.
### Detector classes
- New `Cap` bits and canonical rule ids: `Cap::LDAP_INJECTION` / `taint-ldap-injection`, `Cap::XPATH_INJECTION` / `taint-xpath-injection`, `Cap::HEADER_INJECTION` / `taint-header-injection`, `Cap::OPEN_REDIRECT` / `taint-open-redirect`, `Cap::SSTI` / `taint-template-injection`, `Cap::XXE` / `taint-xxe`, `Cap::PROTOTYPE_POLLUTION` / `taint-prototype-pollution`. Each ships with per-language sink, sanitizer, and (where applicable) gated-sink rules across JS/TS, Python, Java, PHP, Go, Ruby, Rust, and C/C++. Severity, OWASP 2021 mapping, and human-readable description live in a single `CAP_RULE_REGISTRY` table in `src/labels/mod.rs`; `cap_rule_meta()` and `rule_id_for_caps()` are the public lookups.
- `Cap` widened from `u16` to `u32` to fit the new bits. `Evidence.sink_caps` is now `u32`; `RuleInfo.cap_bits` is also `u32`. The serde decoder accepts any unsigned integer width so caches written before the bump still load. SQLite schema bumped from 3 to 4 to force a rescan, since older `source_caps` / `sanitizer_caps` / `sink_caps` blobs were emitted before any of the new bits could appear.
- `owasp_bucket_for` consults `CAP_RULE_REGISTRY` first so adding a new cap class does not require a second-table edit. The match requires an exact rule id or a recognised separator (` `, `(`, `.`) so a future `taint-ssrf-allowlist-violation` can no longer silently inherit `taint-ssrf`'s OWASP bucket. The legacy family-token table now also routes `xpath`, `header`, and `xxe` to A03 / A05.
- `issue_category_label` (dashboard badge) routes the seven new rule-id prefixes to dedicated labels: LDAP Injection, XPath Injection, Header Injection, Open Redirect, Template Injection, XXE, Prototype Pollution.
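
The separator rule described for `owasp_bucket_for` can be illustrated with a minimal sketch. This is not the code in `src/labels/mod.rs`; the function name `matches_rule_id` and the constant `RULE_ID_SEPARATORS` are hypothetical, and only the three separators named above are assumed.

```rust
/// Separators that may follow a registered rule id inside an emitted id
/// string (space, opening parenthesis, dot), as listed in the entry above.
/// The constant name is an assumption made for this sketch.
const RULE_ID_SEPARATORS: [char; 3] = [' ', '(', '.'];

/// Returns true when `emitted` is exactly `registered`, or starts with
/// `registered` immediately followed by a recognised separator.
/// Hypothetical helper, not the project's actual lookup.
pub fn matches_rule_id(emitted: &str, registered: &str) -> bool {
    match emitted.strip_prefix(registered) {
        // Exact match: nothing left after the registered id.
        Some("") => true,
        // Prefix match: the next character must be a separator.
        Some(rest) => rest.starts_with(&RULE_ID_SEPARATORS[..]),
        // Not a prefix at all.
        None => false,
    }
}
```

Under these assumptions, `matches_rule_id("taint-ssrf (source 3:7)", "taint-ssrf")` holds, while `matches_rule_id("taint-ssrf-allowlist-violation", "taint-ssrf")` does not, because `-` is not a recognised separator.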
### Changed
- Refreshed the local web UI visual system around the mint-cyan Nyx brand: warmer light surfaces, deep green accents, updated severity/confidence colors, tighter typography, smaller radii, denser cards, table, badge, button, header, and sidebar styling, and matched graph/code-viewer colors.
@ -16,6 +25,32 @@ A round of cross-file FastAPI auth, two new sink/validator classes, a ~957-FP Go
### Added
- `nyx rules list` CLI subcommand. Surfaces the same registry the dashboard's `/api/rules` page reads from: built-in cap-class entries (one per `Cap` with a canonical rule id), per-language label rules (sink / source / sanitizer), gated sinks, and any custom rules from config. Filters: `--lang <slug>`, `--kind <class|source|sink|sanitizer>`, `--class-only` for registry entries only, `--no-class` for per-language rules only. `--json` for machine output. Cap-class entries carry `language = "all"` so a language filter still surfaces them unless `--no-class` is set.
- `RuleInfo.is_class` / `RuleInfo.emission_active` flags. Cap-class entries carry `is_class = true` so dashboards can group them separately from per-language label rules. `emission_active = false` marks legacy classes (SQL_QUERY, SSRF, FILE_IO, FMT_STRING, DESERIALIZE, CODE_EXEC, CRYPTO) whose findings still surface under the catch-all `taint-unsanitised-flow` rule id; the seven new classes plus `unauthorized_id` and `data_exfil` are `emission_active = true`. The active set is pinned in `cap_rule_registry_emission_active_set_is_pinned` so a future migration of a legacy cap to its specific rule id can't drift silently.
- XML-parser configuration tracking. New `src/ssa/xml_config.rs` runs alongside type-fact analysis and carries per-receiver `secure_processing` / `disallow_doctype` / `external_entities` flags forward through copy assignments and phi joins (meet for safe flags, sticky union for the unsafe `external_entities` polarity). `xxe_safe()` queries the result at the type-qualified `XmlParser.parse` sink and strips `Cap::XXE` when the parser was provably hardened (JAXP `setFeature(FEATURE_SECURE_PROCESSING, true)`, lxml `XMLParser(resolve_entities=False, no_network=True)`, fast-xml-parser `processEntities: false`). Persisted to `OptimizeResult.xml_parser_config`.
- XPath-receiver configuration tracking. New `src/ssa/xpath_config.rs` mirrors the XML sidecar for Java's `XPath` instances: `setXPathVariableResolver(...)` flips the receiver's `has_resolver` flag, copy assignments union, phi joins meet. `xpath_safe()` strips `Cap::XPATH_INJECTION` at `xpath.evaluate(expr, ...)` / `xpath.compile(expr)` sinks when the receiver was provably bound to a resolver (parameterised XPath shape). Persisted to `OptimizeResult.xpath_config`.
- Five new `TypeKind` variants: `LdapClient` (JNDI `InitialDirContext` / `InitialLdapContext`, Spring `LdapTemplate`, ldapjs `createClient`, python-ldap `initialize`, ldap3 `Connection`), `XPathClient` (JAXP `newXPath`, lxml `etree.XPath`, npm `xpath`), `XmlParser` (JAXP factory products: `newDocumentBuilder`, `newSAXParser`, `getXMLReader`), `Template` (Apache FreeMarker `new Template(...)` / `Configuration.getTemplate`), and `NullPrototypeObject` for JS/TS values produced by `Object.create(null)`. Each is wired into `constructor_type` for return-type inference and into `TypeKind::label_prefix()` for type-qualified callee resolution. `XPathClient` is kept distinct from `DatabaseConnection` so a generic `pdo->query` SQL_QUERY sink does not collide with `xpath.query`.
- `GateActivation::LiteralOnly`. Strict literal-value activation: the gate fires only when the activation argument is a literal that matches `dangerous_values` / `dangerous_prefixes`. An unknown or dynamic activation argument suppresses the gate (no conservative `ALL_ARGS_PAYLOAD` push). Used for ambiguously named matchers where the dangerous shape is identifiable only by an explicit literal flag, e.g. bare `extend` where `jQuery.extend(true, target, src)` is the deep-merge prototype-pollution form but Backbone's `Model.extend({proto})` shares the suffix.
- Two new `PredicateKind` variants in `src/taint/path_state.rs` for inline open-redirect sanitisers. `RelativeUrlValidated` covers `x.startsWith("/")`, `x.starts_with("/")`, `x.startswith("/")`, PHP `strpos($x, "/") === 0`, and direct `x[0] === "/"`. `HostAllowlistValidated` covers `new URL(x).host === ALLOWED`, `urlparse(x).netloc == ALLOWED`, multi-statement `parsed.host_str() == "..."` for Rust, and `parsed.Host == "..."` / `parsed.Hostname() == "..."` for Go. Both are cap-aware: they clear `Cap::OPEN_REDIRECT` only on the validated branch, leaving any non-redirect taint downstream to fire on its own caps. The Go form gates on case-sensitive capital `H` so a lowercase `u.host == X` field comparison falls through to the generic `Comparison` predicate.
- `Object.create(null)` recogniser. New `is_object_create_null_call` in `cfg/literals.rs` matches `Object.create(null)` (and parenthesised, awaited, or TS type-cast wrappers) and tags `CallMeta.produces_null_proto = true` for JS/TS calls. Type-fact analysis lifts the flag to `TypeKind::NullPrototypeObject` on the returned SSA value so the synthetic `__index_set__` sink is suppressed flow-sensitively. Phi joins drop the tag back to `Unknown` so a partial null-proto receiver still fires on the unsafe path.
- CFG-layer prototype-pollution suppression on the synthetic `__index_set__` sink (JS/TS only, recognised by the existing `try_lower_subscript_write` lowering). Three flow-insensitive shapes elide the `Sink(PROTOTYPE_POLLUTION)` label before SSA sees the node: constant-key fold (literal key not in `__proto__` / `constructor` / `prototype`); reject pattern (an enclosing-block sibling `if (idx === "__proto__" || ...) return / throw / break;`); allowlist pattern (an ancestor `if (idx === "name" || idx === "id") { obj[idx] = v }`). Walks stop at the enclosing function so closure-captured guards in an outer scope can't silently authorise inner assignments.
- Spring MVC `return "redirect:" + tainted` open-redirect recogniser (Java only). New `try_lower_spring_redirect_return` in `cfg/mod.rs` matches the leftmost `+`-chain whose root is a `redirect:` string literal and emits a synthetic `__spring_redirect__` Call sink with `Sink(Cap::OPEN_REDIRECT)` between the predecessors and the Return node. Concatenated identifiers from anywhere in the right-hand chain feed the synthetic node's `arg_uses[0]`, so the existing taint pipeline carries any tainted suffix through OPEN_REDIRECT.
- Subscript-set form classification for header sinks. `response.headers["X-Foo"] = bar` / `headers["X-Foo"] = bar` (Ruby `element_reference`, JS/TS `subscript_expression`, Python `subscript`) had no `property` field on the LHS, so the existing classification path skipped it. `push_node` now walks into the subscript's `object` and classifies its member-expression text (`response.headers`, `res.headers`, `self.response.headers`), so `Cap::HEADER_INJECTION` fires on the bare bracket form alongside `setHeader` / `res.set` / `headers_mut.insert`.
- PHP literal extraction extended in `cfg/literals.rs`. `extract_const_string_arg` now folds: PHP `encapsed_string` (double-quoted) when every child is a pure-literal segment; boolean literals (`true` / `false`) so jQuery's `extend(true, target, src)` deep-merge marker activates the `LiteralOnly` gate; leading-string `binary_expression` concat (PHP `"Location: " . $url`, JS/TS `"Location: " + url`) so `dangerous_prefixes` matching activates on partially dynamic concatenations.
- PHP receiver-text strip for chain construction. `helpers::root_receiver_text` now drops the leading `$` from `variable_name` nodes so `$smarty->fetch(...)` / `$twig->createTemplate(...)` reconstruct as `Smarty.fetch` / `Environment.createTemplate` for suffix-matcher gates instead of carrying a `$smarty.fetch` form that fails the boundary rule.
- Gate-callee resolution hardening for member-source rewrites. When `first_member_label` rewrites a call's `text` to a Source like `req.body` (because the wrapper carries a member-source argument), the gate matcher now reads the call's `function` / `method` / `name` field instead, so `setValue(target, req.body, ...)` matches the `setValue` proto-pollution gate instead of the rewritten `req.body` text. Whitespace stripped from the function field so multi-line chains still match flat gate matchers.
- Ruby option-constant lookup in gate activation. Bare `scope_resolution` / `constant` nodes (`Nokogiri::XML::ParseOptions::NOENT`) now fall back to the macro-arg extractor used by C/C++/PHP, so Nokogiri XXE gates activate on idiomatic option-flag arguments rather than firing conservatively on every positional arg.
- Per-language label rules expanded to cover the seven new caps:
- JavaScript / TypeScript: ldapjs `LdapClient.search`, `escapeXpath` / `xpathEscape`, `document.evaluate` / npm `xpath.select`, `setHeader` / `res.set` / `res.append` / `res.headers[]=`, `stripCRLF` / `escapeHeader`, lodash / dot-prop / object-path deep-merge prototype-pollution gates, Handlebars / EJS / Mustache template sinks, fast-xml-parser / xml2js with `processEntities`-aware activation, `redirect` / `Location` open-redirect sinks.
- Python: python-ldap `LDAPObject.search_s`, ldap3 `Connection.search`, lxml `etree.XPath` / `lxml.etree.parse` with parser-config awareness, Flask `response.headers[]=` / `make_response`, Jinja2 `Template(...)` and Mako `Template(...)` SSTI sinks, `flask.redirect` / `aiohttp HTTPFound` open-redirect.
- Java / Kotlin: `DirContext.search`, `XPath.evaluate` / `XPath.compile`, JAXP `DocumentBuilder.parse` / `SAXParser.parse` / `XMLReader.parse`, FreeMarker `Template.process`, Spring `redirect:` view-name synthetic sink, `HttpServletResponse.setHeader` / `addHeader`.
- PHP: `ldap_search` / `ldap_list` / `ldap_read`, `DOMXPath::query` / `DOMXPath::evaluate`, `header()` with leading-prefix activation, Smarty `fetch` / Twig `createTemplate` / Blade compile + `eval` template forms, `loadXML` / `simplexml_load_string` with `LIBXML_NOENT` activation.
- Go: `go-ldap conn.Search`, `etree.Path` / `xmlpath.Compile`, `http.Header.Set` / `Response.Header().Set`, `html/template` and `text/template` `Parse(...)`, `encoding/xml.Unmarshal` / `Decoder.Decode`, `http.Redirect` with relative-URL / host-allowlist gating.
- Ruby: `Net::LDAP#search`, `Nokogiri::XML::Document#xpath`, `response.headers[]=`, `ERB.new` SSTI, `Nokogiri::XML.parse` with `NOENT` / `DTDLOAD` activation, `redirect_to` with relative-URL gate.
- C / C++: libldap `ldap_search_ext_s`, libxml2 `xmlXPathEval`, `curl_easy_setopt` with header-list activation, libxml2 `xmlReadFile` / `xmlReadMemory` with `XML_PARSE_NOENT` activation.
- Rust: actix-web `HeaderMap.insert` / `HeaderValue::from_str` header-injection gates. `Redirect::to` retagged from `Cap::SSRF` to `Cap::OPEN_REDIRECT` so the open-redirect rule fires distinctly from the SSRF rule.
- `NYX_PYTHON_PROTO_POLLUTION` env var flag. Python `dict.update` / `__dict__.update` proto-pollution gates are opt-in: bare `update` overlaps too broadly with `Counter.update` and ordinary state-mutation patterns to ship as a default sink. When the var is set to `1` / `true` / `yes` / `on` the merged slice is leaked into a `'static` reference so the registry's lifetime invariant holds.
- New per-cap integration suites: `tests/{xpath_injection,xxe,ssti,prototype_pollution,header_injection,open_redirect,ldap_injection}_tests.rs`, plus `python_proto_pollution_tests.rs` for the env-gated Python form. Per-cap fixture trees under `tests/fixtures/<class>/<lang>/` cover safe, unsafe, and irrelevant-baseline shapes for every supported language.
- FastAPI cross-file `include_router` dependency tracking. New `auth_analysis/router_facts.rs` captures per-file router declarations (`<router> = X(deps=[…])`) and `<parent>.include_router(<child_module>.<child_var>)` edges in pass 1, persists them into `GlobalSummaries::router_facts_by_module`, and resolves them into the active file's `AuthorizationModel::cross_file_router_deps` at pass 2 entry. Transitive lifts (`grandparent → parent → child`) handled by iterative index walk. Module identity is the file basename without `.py` (approximate, but sufficient for airflow-style `task_instances.router` naming). Closes the airflow execution-API shape where a child router lives in `routes/task_instances.py` and its auth is declared on the parent in `routes/__init__.py`.
- FastAPI router-level `dependencies=[...]` propagation. Module-level `router = APIRouter(dependencies=[Security(...)])` declarations are pre-walked once per file, then merged onto every `@<router>.<verb>(...)` route attached in the same file. Closes airflow's execution-API routes that re-use a single `ti_id_router` declared once at module scope.
- FastAPI `Security(callable, scopes=[...])` recognised distinctly from `Depends(callable)`. Scoped Security promotes the synthetic `AuthCheck` to `AuthCheckKind::Other` (route-level scope-checked authorization), not just Login. New scope-tracking boolean threaded through `expand_decorator_calls` and `extract_fastapi_dependencies`.
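
The phi-join behaviour described for the XML-parser sidecar (meet for safe flags, sticky union for the unsafe `external_entities` polarity) might look roughly like the sketch below. The struct `XmlParserFacts`, its `join` method, and the exact `xxe_safe` condition are assumptions for illustration; only the three flag names come from the entry above.

```rust
/// Hypothetical per-receiver facts; field names follow the changelog entry.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct XmlParserFacts {
    pub secure_processing: bool, // safe flag
    pub disallow_doctype: bool,  // safe flag
    pub external_entities: bool, // unsafe flag
}

impl XmlParserFacts {
    /// Join at a phi node. Safe flags use a meet: they survive only when
    /// both incoming paths set them. The unsafe flag is a sticky union:
    /// it stays set if either path set it.
    pub fn join(self, other: Self) -> Self {
        Self {
            secure_processing: self.secure_processing && other.secure_processing,
            disallow_doctype: self.disallow_doctype && other.disallow_doctype,
            external_entities: self.external_entities || other.external_entities,
        }
    }

    /// Assumed safety rule for this sketch only: at least one hardening
    /// flag is proven and external entities were never enabled.
    pub fn xxe_safe(self) -> bool {
        !self.external_entities && (self.secure_processing || self.disallow_doctype)
    }
}
```

With this shape, a receiver hardened on only one branch is no longer provably safe after the join, which matches the conservative direction the entry describes.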
@ -55,6 +90,15 @@ A round of cross-file FastAPI auth, two new sink/validator classes, a ~957-FP Go
### Fixed (false positives)
- `Object.create(null)` receivers no longer fire prototype-pollution at the synthetic `__index_set__` sink. Suppression is flow-sensitive via `TypeKind::NullPrototypeObject` so a phi join that only sometimes resolves to a null-proto receiver still fires on the unsafe path.
- `cfg-unguarded-sink` over-fires on JS/TS object-literal property writes guarded by an explicit `__proto__` / `constructor` / `prototype` reject `if` (early `return` / `throw` / `break`) or by an allowlist `if` whose true arm contains the assignment. Resolved at the CFG layer before the SSA sink scan.
- Spring MVC `return "redirect:" + url` flagged generic `taint-unsanitised-flow` even when the redirect destination was the load-bearing taint. Now routed through the synthetic `__spring_redirect__` sink so the finding emerges as `taint-open-redirect`.
- `$smarty->fetch(...)` / `$twig->createTemplate(...)` no longer drop their SSTI gate match on idiomatic PHP receiver shapes. Receiver text strip in `helpers::root_receiver_text` rebuilds the chain text with `.` separators.
- `setValue(target, req.body, ...)` and similar wrappers no longer gate-match on the rewritten Source `req.body` text. Gate matcher now reads the call's `function` / `method` / `name` field when a Source label override has clobbered the call text.
- JAXP / lxml / fast-xml-parser parsers hardened with `setFeature(FEATURE_SECURE_PROCESSING, true)` / `XMLParser(resolve_entities=False)` / `processEntities: false` no longer fire `taint-xxe`. Suppression runs through the new `xml_parser_config` sidecar.
- `XPath` instances bound to `setXPathVariableResolver(...)` no longer fire `taint-xpath-injection` on subsequent `xpath.evaluate(expr, ...)` sinks. Suppression runs through the new `xpath_config` sidecar.
- Inline `if (!url.startsWith("/")) reject` and `if (new URL(url).host !== ALLOWED) reject` open-redirect sanitisers now narrow the `Cap::OPEN_REDIRECT` bit on the validated branch instead of falling through to the generic `Comparison` predicate. Cap-aware: other taint downstream still fires on its own caps.
- Rust `Redirect::to` no longer fires `taint-ssrf` for what is structurally an open redirect. Retagged to `Cap::OPEN_REDIRECT` so the report classifies the issue under the correct cap.
- ~957 gitea backend DAO `go.auth.missing_ownership_check` findings (id-scalar precision pass, see Added).
- 169 of 216 openmrs `cfg-unguarded-sink` findings (JpaCriteriaQuery type, see Added). Equivalent reductions on xwiki / keycloak Hibernate DAO clusters.
- joomla and drupal `php.deser.unserialize` flagged inside `Serializable::unserialize($input)` magic-method bodies (passthrough recognition, see Added).
@ -74,6 +118,8 @@ A round of cross-file FastAPI auth, two new sink/validator classes, a ~957-FP Go
- New `cfg/cfg_tests.rs` covers ternary-branch CFG lowering shapes.
- New `summary/tests.rs` covers cross-file `include_router` summary persistence and resolution.
- Refactor passes across `auth_analysis`, `ssa/const_prop`, `ssa/type_facts`, `summary`, and the per-framework auth extractors (cleaner conditional checks, simpler function signatures, deduplicated assertions). No behaviour change.
- `parse_cap` and `CapName::FromStr` accept the new short names (`ldap_injection` / `ldapi`, `xpath_injection` / `xpathi`, `header_injection` / `crlf` / `response_splitting`, `open_redirect` / `redirect`, `ssti` / `template_injection`, `xxe`, `prototype_pollution` / `proto_pollution`, plus the existing `data_exfil` alias). The `nyx config add-rule --cap` flag and `[analysis.languages.*.rules]` entries take any of these.
- Frontend `RuleListItem` carries the new `is_class` flag so the dashboard's Rules page can group cap-class entries separately. `RuleDetailView` adds the same field.
## [0.6.1] - 2026-05-03


@ -186,7 +186,7 @@ Two passes over the filesystem, with an optional SQLite index to skip unchanged
3. **Pass 2**: re-analyze each file with cross-file context under bounded context sensitivity (k=1 inlining for intra-file callees, SCC fixpoint capped at 64 iterations, and summary fallback for callees above the inline body-size cap). A forward dataflow worklist propagates taint through the SSA lattice with guaranteed convergence. Call-graph SCCs iterate to fixed-point (within the cap) so mutually recursive functions get accurate summaries.
4. **Rank, dedupe, emit**: findings are scored by severity × evidence strength × source-kind exploitability, then emitted to console, JSON, or SARIF.
Detector families: taint (cross-file source→sink), CFG structural (auth gaps, unguarded sinks, resource leaks), state model (use-after-close, double-close, must-leak, unauthed-access), AST patterns (tree-sitter structural match). Full detector docs: [Detectors](https://elicpeter.github.io/nyx/detectors.html).
Detector families: taint (cross-file source→sink, with cap-specific rule classes for SQLi, XSS, command/code exec, deserialization, SSRF, path traversal, format string, crypto, LDAP injection, XPath injection, HTTP header / response splitting, open redirect, server-side template injection, XXE, prototype pollution, data exfiltration, and the auth fold-in), CFG structural (auth gaps, unguarded sinks, resource leaks), state model (use-after-close, double-close, must-leak, unauthed-access), AST patterns (tree-sitter structural match). Full detector docs: [Detectors](https://elicpeter.github.io/nyx/detectors.html).
---
@ -211,7 +211,7 @@ kind = "sanitizer"
cap = "html_escape"
```
Or add rules interactively: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`. Caps: `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `data_exfil`, `code_exec`, `crypto`, `unauthorized_id`, `all`. Full schema: [Configuration](https://elicpeter.github.io/nyx/configuration.html).
Or add rules interactively: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`. Caps: `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `data_exfil`, `code_exec`, `crypto`, `unauthorized_id`, `ldap_injection`, `xpath_injection`, `header_injection`, `open_redirect`, `ssti`, `xxe`, `prototype_pollution`, `all`. Full schema: [Configuration](https://elicpeter.github.io/nyx/configuration.html). Run `nyx rules list` to browse the registry from the terminal.
---


@ -275,7 +275,7 @@ Add a custom taint rule. Written to `nyx.local`.
| `--lang` | `rust`, `javascript`, `typescript`, `python`, `go`, `java`, `c`, `cpp`, `php`, `ruby` |
| `--matcher` | Function or property name to match |
| `--kind` | `source`, `sanitizer`, `sink` |
| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `all` |
| `--cap` | `env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `data_exfil`, `ldap_injection`, `xpath_injection`, `header_injection`, `open_redirect`, `ssti`, `xxe`, `prototype_pollution`, `all` |
### `nyx config add-terminator`
@ -287,6 +287,41 @@ Add a terminator function (e.g. `process.exit`). Written to `nyx.local`.
---
## `nyx rules`
Browse the built-in rule registry from the terminal. Same dataset the dashboard's Rules page reads from: cap-class entries (one per `Cap` with a canonical rule id), per-language label rules (sink / source / sanitizer), gated sinks, and any custom rules from your config.
### `nyx rules list`
```
nyx rules list [--lang <SLUG>] [--kind <KIND>] [--class-only|--no-class] [--json]
```
| Flag | Values |
|------|--------|
| `--lang` | Language slug (`javascript`, `typescript`, `python`, `java`, `php`, `go`, `ruby`, `rust`, `c`, `cpp`). Cap-class entries (`language = "all"`) still surface alongside any language filter unless `--no-class` is set. |
| `--kind` | `class` (cap-class entry), `source`, `sink`, `sanitizer` |
| `--class-only` | Show only the cap-class registry entries, suppressing per-language label rules and gated sinks. |
| `--no-class` | Suppress cap-class registry entries, show only per-language label rules and gated sinks. Conflicts with `--class-only`. |
| `--json` | Emit JSON instead of the human-readable table. Schema matches the `/api/rules` response. |
Examples:
```bash
# Browse the cap-class registry entries, including the seven new vulnerability classes
nyx rules list --class-only
# All Java sinks
nyx rules list --lang java --kind sink
# JSON output for scripted filtering
nyx rules list --json | jq '.[] | select(.cap == "ldap_injection")'
```
The `enabled` column reflects the `analysis.disabled_rules` overlay from your config, so a rule disabled in `nyx.local` still appears here, marked as disabled. Custom rules added via `nyx config add-rule` appear at the end with `is_custom: true`.
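
The interaction between `--lang`, `--class-only`, and `--no-class` can be sketched as a single row predicate. This is a hedged illustration of the documented behaviour, not the CLI's implementation; `RuleRow`, `ListFilter`, and `keep` are hypothetical names, and `--kind` is left out for brevity.

```rust
/// Hypothetical row shape: only the fields the filter needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RuleRow {
    pub id: String,
    pub language: String, // cap-class entries carry "all"
    pub is_class: bool,
}

/// Hypothetical mirror of the documented flags.
#[derive(Debug, Clone, Default)]
pub struct ListFilter {
    pub lang: Option<String>,
    pub class_only: bool,
    pub no_class: bool,
}

/// Decide whether a row survives the filter.
pub fn keep(row: &RuleRow, filter: &ListFilter) -> bool {
    if filter.class_only && !row.is_class {
        return false; // --class-only hides per-language rules
    }
    if filter.no_class && row.is_class {
        return false; // --no-class hides cap-class entries
    }
    match &filter.lang {
        // A language filter keeps rows for that language and rows
        // tagged "all", which is how cap-class entries still surface.
        Some(lang) => row.language == "all" || row.language == *lang,
        None => true,
    }
}
```

Treating `"all"` as a wildcard language keeps the filter a single comparison, and `--no-class` remains the only way to hide cap-class entries when a language is selected.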
---
## Exit codes
See [output.md](output.md#exit-codes). Summary: `0` on success (including findings without `--fail-on`), `1` when `--fail-on` trips, non-zero on scan errors.


@ -253,9 +253,14 @@ cap = "html_escape" # "env_var" | "html_escape" | "shell_escape" |
# "url_encode" | "json_parse" | "file_io" |
# "fmt_string" | "sql_query" | "deserialize" |
# "ssrf" | "data_exfil" | "code_exec" | "crypto" |
# "unauthorized_id" | "all"
# "unauthorized_id" | "ldap_injection" |
# "xpath_injection" | "header_injection" |
# "open_redirect" | "ssti" | "xxe" |
# "prototype_pollution" | "all"
```
Aliases accepted by `parse_cap` and `[..rules].cap`: `data_exfiltration` for `data_exfil`, `ldapi` for `ldap_injection`, `xpathi` for `xpath_injection`, `crlf` and `response_splitting` for `header_injection`, `redirect` for `open_redirect`, `template_injection` for `ssti`, `proto_pollution` for `prototype_pollution`.
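
As a rough illustration of the alias table above, the sketch below maps each accepted spelling to its canonical cap name. It is not the real `parse_cap`: the function name `canonical_cap` is hypothetical, it returns a string rather than a `Cap` value, and it covers only the caps listed in the alias sentence.

```rust
/// Map an accepted spelling to its canonical cap name.
/// Hypothetical helper covering only the aliases documented above;
/// the older caps (`sql_query`, `ssrf`, ...) are omitted for brevity.
pub fn canonical_cap(name: &str) -> Option<&'static str> {
    let canonical = match name {
        "data_exfil" | "data_exfiltration" => "data_exfil",
        "ldap_injection" | "ldapi" => "ldap_injection",
        "xpath_injection" | "xpathi" => "xpath_injection",
        "header_injection" | "crlf" | "response_splitting" => "header_injection",
        "open_redirect" | "redirect" => "open_redirect",
        "ssti" | "template_injection" => "ssti",
        "xxe" => "xxe",
        "prototype_pollution" | "proto_pollution" => "prototype_pollution",
        // Anything else is unknown to this sketch.
        _ => return None,
    };
    Some(canonical)
}
```

Whether the real parser normalises case or whitespace is not stated in the source, so this sketch matches exact lowercase spellings only.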
---
## Example Configurations


@ -13,11 +13,20 @@ The taint family is split into cap-specific rule classes when a sink callee carr
| Rule id | Cap | Surface |
|---|---|---|
| `taint-unsanitised-flow` | every cap except `data_exfil` and `unauthorized_id` | Default taint flow class |
| `taint-unsanitised-flow` | `sql_query`, `ssrf`, `code_exec`, `file_io`, `fmt_string`, `deserialize`, `crypto` | Catch-all class for the legacy caps that have not migrated to a dedicated rule id yet. |
| `taint-ldap-injection` | `ldap_injection` | Attacker-controlled data concatenated into an LDAP filter or DN without RFC 4515 escaping. Receivers typed as `LdapClient` (JNDI `DirContext`, Spring `LdapTemplate`, ldapjs `Client`, python-ldap `LDAPObject`, ldap3 `Connection`) and chained `.search` / `.searchByEntity` / `.search_s` form the sink set. |
| `taint-xpath-injection` | `xpath_injection` | Attacker-controlled string passed as the XPath expression to `xpath.evaluate` / `xpath.compile` / `document.evaluate` / `DOMXPath::query` / `etree.XPath`. Suppressed when the receiver was bound to an `XPathVariableResolver` (parameterised XPath shape). |
| `taint-header-injection` | `header_injection` | Attacker-controlled bytes landing in an HTTP response header without `\r\n` stripping (response splitting, cache poisoning). Covers `setHeader` / `res.set` / `res.append` / `headers["X-Foo"] = bar` / `Header().Set` / `add_header` / `setcookie` / `http.Header.Set`. |
| `taint-open-redirect` | `open_redirect` | Attacker-controlled URL driving a redirect / `Location` header without an allowlist or relative-URL check. Includes the Spring MVC `return "redirect:" + url` view-name shape via the `__spring_redirect__` synthetic sink. Suppressed by `RelativeUrlValidated` (`startsWith("/")` family) and `HostAllowlistValidated` (`new URL(x).host === ALLOWED`, `urlparse(x).netloc == ...`) inline predicates. |
| `taint-template-injection` | `ssti` | Attacker controls the *template source string* fed to a server-side renderer (Jinja2 / Mako / FreeMarker / Twig / Handlebars / EJS / Mustache / ERB / `text/template` / `html/template` / Smarty / Blade `Template(...)` / `compile(...)`), distinct from rendering a trusted template with tainted variables. |
| `taint-xxe` | `xxe` | Attacker-controlled XML reaching a parser that resolves external entities. Covers JAXP `DocumentBuilder.parse` / `SAXParser.parse` / `XMLReader.parse`, lxml `etree.parse`, Nokogiri, fast-xml-parser, xml2js, libxml2 `xmlReadFile` / `xmlReadMemory`. Suppressed when the receiver carries a hardening fact in `xml_parser_config` (`secure_processing`, `disallow_doctype`, `processEntities: false`, `LIBXML_NOENT` not set). |
| `taint-prototype-pollution` | `prototype_pollution` | Attacker-controlled key reaching an object property assignment that can mutate `Object.prototype`. JS/TS only. Covers `obj[tainted] = v` (synthetic `__index_set__` sink), library-mediated deep-merge / set helpers (`_.merge`, `_.set`, `dotProp.set`, `objectPath.set`, `setValue`), and jQuery's `extend(true, target, src)` deep-merge form via the `LiteralOnly` activation gate. Suppressed by constant-key fold (`__proto__` / `constructor` / `prototype` filtering), reject / allowlist guards on the key, and `Object.create(null)` receivers (flow-sensitive `NullPrototypeObject` type). Python equivalent (`dict.update`) is opt-in via `NYX_PYTHON_PROTO_POLLUTION=1`. |
| `taint-data-exfiltration` | `data_exfil` | Sensitive data flowing into the payload of an outbound network request (body / headers / json on `fetch`, body on `XMLHttpRequest.send`). Distinct from SSRF: the destination is fixed but attacker-influenced bytes leave the process. |
| `rs.auth.missing_ownership_check.taint` | `unauthorized_id` | Rust auth subsystem fold-in; see [auth.md](auth.md). |
A single call site can fire several of these at once when it carries multiple gates — `fetch(taintedUrl, {body: tainted})` produces both an SSRF finding (URL flow) and a `taint-data-exfiltration` finding (body flow), each with its own cap mask rather than a conflated union.
A single call site can fire several of these at once when it carries multiple gates. `fetch(taintedUrl, {body: tainted})` produces both an SSRF finding (URL flow) and a `taint-data-exfiltration` finding (body flow), each with its own cap mask rather than a conflated union.
Each cap-class entry is registered in `CAP_RULE_REGISTRY` (`src/labels/mod.rs`) with its title, severity, OWASP 2021 code, and description. Browse the registry from the CLI with `nyx rules list --class-only`, or `nyx rules list --kind class --json` for machine output.
For Rust auth-specific rules (`rs.auth.*`), see [auth.md](auth.md).


@ -135,10 +135,17 @@ Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer onl
| `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation |
| `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` |
| `ssrf` | | URL-prefix locks | `requests.get`, `fetch` URL arg, outbound HTTP destination |
| `data_exfil` | cookies, headers, env, db rows, file reads (Sensitive-tier sources only) | | `fetch` body / headers / json, `XMLHttpRequest.send` body |
| `code_exec` | | | `eval`, `exec`, `Function` |
| `crypto` | | | weak-algorithm constructors |
| `unauthorized_id` | request-bound scoped IDs (Rust auth analysis) | ownership check | row-level write |
| `ldap_injection` | | `ldap-escape` filter / dn helpers, project-local `escapeLdapFilter` | `DirContext.search`, `LdapClient.search`, `ldap_search`, `Net::LDAP#search`, `ldap_search_ext_s` |
| `xpath_injection` | | bound `XPathVariableResolver`, `escapeXpath` / `xpathEscape` helpers | `XPath.evaluate`, `DOMXPath::query`, `document.evaluate`, `xpath.select`, `etree.XPath` |
| `header_injection` | | `stripCRLF` / `escapeHeader` / `sanitizeHeader` | `setHeader`, `res.set`, `res.append`, `headers["X-Foo"] = bar`, `Header().Set`, `header()`, `setcookie` |
| `open_redirect` | | leading-slash check (`startsWith("/")`), URL-parse + host allowlist (`new URL(x).host === ALLOWED`) | `Redirect::to`, Spring `redirect:` view name, `flask.redirect`, `http.Redirect`, `redirect_to` |
| `ssti` | | | template constructors fed by tainted source: `Jinja2 Template(...)`, `freemarker.Template`, `Twig::createTemplate`, Handlebars `compile`, `ERB.new`, Mako `Template(...)` |
| `xxe` | | hardened parser config (`secure_processing`, `disallow-doctype-decl`, `processEntities: false`, `LIBXML_NOENT` not set) | `DocumentBuilder.parse`, `SAXParser.parse`, `xml2js`, `fast-xml-parser`, `lxml.etree.parse`, `xmlReadFile` |
| `prototype_pollution` | | constant-key fold, reject / allowlist guards on the key, `Object.create(null)` receivers | `obj[tainted] = v` synthetic `__index_set__`, `_.merge`, `_.set`, `dotProp.set`, `objectPath.set`, jQuery `extend(true, ...)` |
| `data_exfil` | cookies, headers, env, db rows, file reads (Sensitive-tier sources only) | | `fetch` body / headers / json, `XMLHttpRequest.send` body |
| `all` | Sources typically use `all` so they match any sink | | |
Sources typically use `cap = "all"` so they match every sink. Sinks declare the specific cap they need. Sanitizers only clear the cap they name.
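The matching rule in the paragraph above can be sketched with plain bitmasks. This is an illustration under assumptions: the constants and their bit positions are hypothetical, and the helper names `sanitize` and `sink_fires` are not taken from the Nyx source.

```rust
// Hypothetical bit assignments; the real `Cap` flags live in `src/labels/mod.rs`.
pub const SQL_QUERY: u32 = 1 << 0;
pub const SSRF: u32 = 1 << 1;
pub const HEADER_INJECTION: u32 = 1 << 2;
// A source labelled `all` carries every cap bit.
pub const ALL: u32 = u32::MAX;

/// Caps still carried by a value after it passes through a sanitizer.
/// Only the cap the sanitizer names is cleared.
pub fn sanitize(taint: u32, sanitizer_cap: u32) -> u32 {
    taint & !sanitizer_cap
}

/// A sink fires when the incoming taint still carries the cap the sink declares.
pub fn sink_fires(taint: u32, sink_cap: u32) -> bool {
    (taint & sink_cap) != 0
}
```

Under this model a value from an `all` source that passes through a header sanitizer no longer triggers a `header_injection` sink, but still triggers a `sql_query` sink.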

View file

@ -24,13 +24,22 @@ Language prefixes: `rs`, `c`, `cpp`, `go`, `java`, `js`, `ts`, `py`, `php`, `rb`
### Taint
One rule covers every source-to-sink flow. The parenthetical identifies the source location.
The taint family is split into cap-specific rule classes. The `taint-unsanitised-flow` id is the catch-all for the legacy caps that have not yet migrated to a dedicated rule id (`sql_query`, `ssrf`, `code_exec`, `file_io`, `fmt_string`, `deserialize`, `crypto`). The seven new vulnerability classes, plus auth and data-exfil, are emitted under their own rule ids. The parenthetical identifies the source location.
| Rule ID | Severity |
|---|---|
| `taint-unsanitised-flow (source L:C)` | Varies by source kind and sink capability |
| Rule ID | Cap | Severity |
|---|---|---|
| `taint-unsanitised-flow (source L:C)` | `sql_query` / `ssrf` / `code_exec` / `file_io` / `fmt_string` / `deserialize` / `crypto` | Varies |
| `taint-ldap-injection` | `ldap_injection` | High |
| `taint-xpath-injection` | `xpath_injection` | High |
| `taint-header-injection` | `header_injection` | High |
| `taint-open-redirect` | `open_redirect` | Medium |
| `taint-template-injection` | `ssti` | High |
| `taint-xxe` | `xxe` | High |
| `taint-prototype-pollution` | `prototype_pollution` | High |
| `taint-data-exfiltration` | `data_exfil` | High / Medium |
| `rs.auth.missing_ownership_check.taint` | `unauthorized_id` | High |
The matcher sets (sources, sanitizers, sinks, gated sinks) live per-language in `src/labels/<lang>.rs`. [Language maturity](language-maturity.md) gives per-language counts and what's covered.
Each cap-class entry is registered in `CAP_RULE_REGISTRY` (`src/labels/mod.rs`). Browse the registry from the CLI with `nyx rules list --class-only`, or via the dashboard's Rules page. The matcher sets (sources, sanitizers, sinks, gated sinks) live per-language in `src/labels/<lang>.rs`. [Language maturity](language-maturity.md) gives per-language counts and what's covered.
### CFG structural
@ -257,6 +266,8 @@ The tables below are generated from `src/patterns/<lang>.rs` by [`tools/docgen`]
`nyx config add-rule --cap <name>` and `[analysis.languages.*.rules]` in config accept:
`env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `all`
`env_var`, `html_escape`, `shell_escape`, `url_encode`, `json_parse`, `file_io`, `fmt_string`, `sql_query`, `deserialize`, `ssrf`, `code_exec`, `crypto`, `unauthorized_id`, `data_exfil`, `ldap_injection`, `xpath_injection`, `header_injection`, `open_redirect`, `ssti`, `xxe`, `prototype_pollution`, `all`
Source for both the enum and the `to_cap` mapping: [`src/labels/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/labels/mod.rs) (`Cap`) and [`src/utils/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/utils/config.rs) (`CapName`).
Aliases: `data_exfiltration` for `data_exfil`, `ldapi` for `ldap_injection`, `xpathi` for `xpath_injection`, `crlf` and `response_splitting` for `header_injection`, `redirect` for `open_redirect`, `template_injection` for `ssti`, `proto_pollution` for `prototype_pollution`.
Source for both the enum and the `to_cap` mapping: [`src/labels/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/labels/mod.rs) (`Cap` and `CAP_RULE_REGISTRY`) and [`src/utils/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/utils/config.rs) (`CapName`).
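The alias list above amounts to a small name-normalisation step. The sketch below shows one way to express it; the function name `canonical_cap_name` is hypothetical, and the real mapping is done by `CapName` and `to_cap` in `src/utils/config.rs`, whose shape may differ.

```rust
/// Map a user-supplied cap name or alias to its canonical name.
/// Unknown names are returned unchanged (validation is out of scope here).
pub fn canonical_cap_name(name: &str) -> &str {
    match name {
        "data_exfiltration" => "data_exfil",
        "ldapi" => "ldap_injection",
        "xpathi" => "xpath_injection",
        "crlf" | "response_splitting" => "header_injection",
        "redirect" => "open_redirect",
        "template_injection" => "ssti",
        "proto_pollution" => "prototype_pollution",
        other => other,
    }
}
```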

View file

@ -355,6 +355,7 @@ export interface RuleListItem {
enabled: boolean;
is_custom: boolean;
is_gated: boolean;
is_class: boolean;
case_sensitive: boolean;
finding_count: number;
suppression_rate: number;

View file

@ -377,7 +377,7 @@ fn build_taint_diag(
// Resolved sink capability bits, used by deduplication to distinguish
// sinks with different cap types on the same source line (e.g.
// `sink_sql(x); sink_shell(x);`).
let sink_caps_bits: u16 = cfg_graph[finding.sink]
let sink_caps_bits: u32 = cfg_graph[finding.sink]
.taint
.labels
.iter()
@ -385,7 +385,7 @@ fn build_taint_diag(
crate::labels::DataLabel::Sink(c) => Some(c.bits()),
_ => None,
})
.fold(0u16, |acc, b| acc | b);
.fold(0u32, |acc, b| acc | b);
// Cap-specific rule-id routing.
//
@ -508,6 +508,14 @@ fn build_taint_diag(
|| (finding.source_kind.sensitivity() >= crate::labels::Sensitivity::Sensitive
&& (flow_has_body_bind || source_is_credential_bearing)));
// Cap-specific rule routing. Auth-as-taint and data-exfil keep their
// pre-existing branches so the routing rules they encode (auth-finding
// namespace alignment; body-bind / source-sensitivity gate) stay
// exactly as before. New cap classes (LDAP / XPath / Header / Open
// redirect / SSTI / XXE / Prototype pollution) route through
// `cap_rule_meta()` so the canonical rule ids in the registry are the
// single source of truth. Legacy generic taint findings continue to
// emit `taint-unsanitised-flow`.
let diag_id = if effective_caps.contains(crate::labels::Cap::UNAUTHORIZED_ID) {
"rs.auth.missing_ownership_check.taint".to_string()
} else if is_data_exfil_rule {
@ -516,6 +524,25 @@ fn build_taint_diag(
source_point.row + 1,
source_point.column + 1
)
} else if let Some(meta) = [
crate::labels::Cap::LDAP_INJECTION,
crate::labels::Cap::XPATH_INJECTION,
crate::labels::Cap::HEADER_INJECTION,
crate::labels::Cap::OPEN_REDIRECT,
crate::labels::Cap::SSTI,
crate::labels::Cap::XXE,
crate::labels::Cap::PROTOTYPE_POLLUTION,
]
.iter()
.find(|c| effective_caps.contains(**c))
.and_then(|c| crate::labels::cap_rule_meta(*c))
{
format!(
"{} (source {}:{})",
meta.rule_id,
source_point.row + 1,
source_point.column + 1
)
} else {
format!(
"taint-unsanitised-flow (source {}:{})",
@ -576,6 +603,23 @@ fn build_taint_diag(
}
_ => crate::patterns::Severity::Medium,
}
} else if let Some(meta) = [
crate::labels::Cap::LDAP_INJECTION,
crate::labels::Cap::XPATH_INJECTION,
crate::labels::Cap::HEADER_INJECTION,
crate::labels::Cap::OPEN_REDIRECT,
crate::labels::Cap::SSTI,
crate::labels::Cap::XXE,
crate::labels::Cap::PROTOTYPE_POLLUTION,
]
.iter()
.find(|c| effective_caps.contains(**c))
.and_then(|c| crate::labels::cap_rule_meta(*c))
{
// New cap classes draw severity from the rule registry so a single
// edit to `CAP_RULE_REGISTRY` cascades through SARIF, the dashboard,
// and the integration suite without per-language source-kind nudges.
meta.severity
} else {
severity_for_source_kind(finding.source_kind)
};
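The routing precedence in this hunk can be summarised in a reduced sketch: the auth cap wins, then the first matching new cap class in list order, then the legacy fallback. The bit values, the `ROUTED` table, and the free function `diag_id` are assumptions for illustration. The data-exfil branch is omitted because its format string is not visible in the hunk.

```rust
// Hypothetical bit values; the real flags are `crate::labels::Cap` constants.
pub const UNAUTHORIZED_ID: u32 = 1 << 0;
pub const LDAP_INJECTION: u32 = 1 << 1;
pub const OPEN_REDIRECT: u32 = 1 << 2;

// Order matters: the first cap present in `effective_caps` picks the rule id.
const ROUTED: &[(u32, &str)] = &[
    (LDAP_INJECTION, "taint-ldap-injection"),
    (OPEN_REDIRECT, "taint-open-redirect"),
];

/// Pick a diagnostic id from the effective caps and a 0-based source point.
pub fn diag_id(effective_caps: u32, row: usize, col: usize) -> String {
    // Auth-as-taint keeps its own namespace and carries no source suffix.
    if (effective_caps & UNAUTHORIZED_ID) != 0 {
        return "rs.auth.missing_ownership_check.taint".to_string();
    }
    let base = ROUTED
        .iter()
        .find(|(bit, _)| (effective_caps & *bit) != 0)
        .map(|(_, id)| *id)
        .unwrap_or("taint-unsanitised-flow");
    // Rows and columns are reported 1-based.
    format!("{} (source {}:{})", base, row + 1, col + 1)
}
```

Because the list is scanned in order, a flow carrying two of the new caps is reported under whichever appears first in the list.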

View file

@ -206,8 +206,8 @@ pub fn run_auth_analysis_with_model(
// (when provided) for cross-file helpers that live in other files.
apply_helper_lifting(&mut model, lang, file_path, scan_root, global_summaries);
// Phase 1 caller-scope IPA: propagate route-handler-level auth
// checks DOWN to callee helper units within the same file. See
// Caller-scope IPA: propagate route-handler-level auth checks DOWN
// to callee helper units within the same file. See
// [`apply_caller_scope_propagation`] for the propagation rule.
apply_caller_scope_propagation(&mut model);
@ -547,8 +547,8 @@ fn apply_helper_lifting(
}
}
/// Phase 1 caller-scope IPA: propagate route-handler-level auth checks
/// DOWN to callee helper units within the same file.
/// Caller-scope IPA: propagate route-handler-level auth checks DOWN to
/// callee helper units within the same file.
///
/// `apply_helper_lifting` walks UPWARD: a helper that internally
/// proves ownership / membership / etc. has its summary lifted onto

View file

@ -1190,6 +1190,7 @@ fn clone_preserves_all_sub_structs() {
destination_uses: None,
gate_filters: Vec::new(),
is_constructor: false,
produces_null_proto: false,
},
taint: TaintMeta {
labels: {
@ -1841,9 +1842,12 @@ def outer(cmd):
assert_eq!(kwargs[1].0, "check");
}
/// Languages without keyword-argument grammar should leave `kwargs` empty.
/// JS object-literal positional args lift their `pair` children into
/// `kwargs` so consumers like xml_config's `processEntities` /
/// `resolve_entities` opt-in detector can read them without re-walking
/// the tree-sitter AST.
#[test]
fn call_node_kwargs_empty_for_javascript() {
fn call_node_kwargs_lifts_javascript_object_literal_pairs() {
let src = br"
function outer(cmd) {
child_process.exec(cmd, { shell: true });
@ -1861,9 +1865,12 @@ fn call_node_kwargs_empty_for_javascript() {
.is_some_and(|c| c.ends_with("exec"))
})
.expect("child_process.exec call node should exist");
let kwargs = &call_node.call.kwargs;
assert!(
call_node.call.kwargs.is_empty(),
"JS object-literal arg is not a keyword_argument — kwargs should stay empty"
kwargs
.iter()
.any(|(k, vs)| k == "shell" && vs.iter().any(|v| v == "true")),
"JS object-literal `{{ shell: true }}` should surface as kwarg, got {kwargs:?}"
);
}

View file

@ -7,7 +7,7 @@
//! Strictly additive: classes whose fields cannot be classified produce
//! a `DtoFields` with an empty `fields` map; the caller must decide
//! whether to use that as a "Dto with no inferred fields" or fall back
//! to the pre-Phase-6 Object/Unknown classification.
//! to the generic Object/Unknown classification.
use std::collections::{HashMap, HashSet};

View file

@ -35,6 +35,16 @@ pub(crate) fn root_receiver_text(n: Node, lang: &str, code: &[u8]) -> Option<Str
None => text_of(n, code),
}
}
// PHP `variable_name` text carries a leading `$` (`$smarty`, `$twig`).
// Strip it so chain text built downstream (`{recv}.{method}`) presents
// a `.`-only delimiter sequence — required by the suffix-matcher
// boundary rule, which only accepts `.`/`:` as chain separators.
// Without this strip, gate matchers like `Smarty.fetch` /
// `Environment.createTemplate` never fire on idiomatic
// `$smarty->fetch(...)` / `$twig->createTemplate(...)` shapes.
_ if lang == "php" && n.kind() == "variable_name" => {
text_of(n, code).map(|s| s.trim_start_matches('$').to_string())
}
_ => text_of(n, code),
}
}
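The effect of the sigil strip on downstream chain text can be shown with a string-only sketch. The helper `chain_text` is hypothetical and works on plain strings, whereas the real code operates on tree-sitter nodes.

```rust
/// Build dotted chain text from a receiver and a method name,
/// dropping PHP's leading `$` sigil from the receiver.
pub fn chain_text(lang: &str, receiver: &str, method: &str) -> String {
    let recv = if lang == "php" {
        receiver.trim_start_matches('$')
    } else {
        receiver
    };
    format!("{}.{}", recv, method)
}
```

For `$twig->createTemplate(...)` this yields `twig.createTemplate`, a chain with only `.` separators, while receivers in other languages are left untouched.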

View file

@ -195,6 +195,56 @@ pub(super) fn extract_destination_kwarg_pairs(
/// Extract the string-literal content at argument position `index` (0-based).
/// Returns `None` if the argument is not a string literal or the index is out of range.
/// True when `call_node` is `Object.create(null)`. The single argument may be
/// wrapped in parentheses, `await`, or a TS type assertion; otherwise it must be
/// a literal `null`, with no aliasing through intermediate variables. Caller
/// restricts to JS/TS.
pub(super) fn is_object_create_null_call(call_node: Node, code: &[u8]) -> bool {
if !matches!(call_node.kind(), "call_expression") {
return false;
}
let callee = call_node
.child_by_field_name("function")
.and_then(|f| text_of(f, code))
.unwrap_or_default();
if callee != "Object.create" {
return false;
}
let Some(args) = call_node.child_by_field_name("arguments") else {
return false;
};
let mut cursor = args.walk();
let named: Vec<Node> = args.named_children(&mut cursor).collect();
if named.len() != 1 {
return false;
}
let mut arg = named[0];
// Unwrap parens / await / TS type-assertions.
for _ in 0..4 {
match arg.kind() {
"parenthesized_expression" => {
if let Some(inner) = arg.named_child(0) {
arg = inner;
continue;
}
}
"await_expression" => {
if let Some(inner) = arg.child_by_field_name("argument") {
arg = inner;
continue;
}
}
"as_expression" | "type_assertion" => {
if let Some(inner) = arg.named_child(0) {
arg = inner;
continue;
}
}
_ => break,
}
}
arg.kind() == "null" || text_of(arg, code).as_deref() == Some("null")
}
pub(super) fn extract_const_string_arg(
call_node: Node,
index: usize,
@ -222,6 +272,37 @@ pub(super) fn extract_const_string_arg(
None
}
}
// Boolean literals — JS/TS `true`/`false` are their own node kinds; some
// grammars wrap them as identifiers carrying the keyword text. Returned
// verbatim so `dangerous_values` matching can detect deep-flag forms
// like `extend(true, target, src)`.
"true" | "false" => Some(arg.kind().to_string()),
// PHP double-quoted strings parse as `encapsed_string` whose body is
// a sequence of `string_content` / `escape_sequence` / interpolation
// nodes. Treat the string as constant only when every child is a
// pure-literal segment (no `variable_name` / `subscript_expression`
// interpolations); the returned value is the concatenation of the
// literal segments verbatim.
"encapsed_string" => {
let mut c = arg.walk();
let mut buf = String::new();
for ch in arg.named_children(&mut c) {
match ch.kind() {
"string_content" => {
if let Some(s) = text_of(ch, code) {
buf.push_str(&s);
}
}
"escape_sequence" => {
if let Some(s) = text_of(ch, code) {
buf.push_str(&s);
}
}
_ => return None,
}
}
Some(buf)
}
"template_string" => {
// Only treat as constant if no interpolation (no template_substitution children)
let mut c = arg.walk();
@ -238,6 +319,44 @@ pub(super) fn extract_const_string_arg(
None
}
}
// Concat-style binary expression with a leading string literal, e.g.
// PHP `"Location: " . $url`, JS/TS `"Location: " + url`. Returns the
// left-most literal so prefix-driven gates (`dangerous_prefixes`) can
// activate on partially-dynamic concatenations; falls through to
// `None` when the leading segment is not a string literal so
// exact-`dangerous_values` matching keeps its strict semantics.
"binary_expression" => {
let left = arg.child_by_field_name("left")?;
match left.kind() {
"string"
| "string_literal"
| "interpreted_string_literal"
| "raw_string_literal" => {
let raw = text_of(left, code)?;
if raw.len() >= 2 {
Some(raw[1..raw.len() - 1].to_string())
} else {
None
}
}
"encapsed_string" => {
let mut c = left.walk();
let mut buf = String::new();
for ch in left.named_children(&mut c) {
match ch.kind() {
"string_content" | "escape_sequence" => {
if let Some(s) = text_of(ch, code) {
buf.push_str(&s);
}
}
_ => return None,
}
}
Some(buf)
}
_ => None,
}
}
_ => None,
}
}
@ -271,6 +390,27 @@ pub(super) fn extract_const_macro_arg(
"identifier" | "name" | "qualified_name" | "scoped_identifier" => {
text_of(arg, code).map(|s| s.to_string())
}
// Ruby bare constant (`NOENT`) — leaf form.
"constant" => text_of(arg, code).map(|s| s.to_string()),
// Ruby scope-qualified constant (`Nokogiri::XML::ParseOptions::NOENT`).
// Return only the rightmost `name` segment so the gate's
// `dangerous_values` list can stay identifier-bare instead of
// enumerating every possible namespacing. Falls back to the full
// text if the `name` field is missing for any reason.
"scope_resolution" => arg
.child_by_field_name("name")
.and_then(|n| text_of(n, code))
.map(|s| s.to_string())
.or_else(|| text_of(arg, code).map(|s| s.to_string())),
// Integer literals at the activation arg position. PHP / C / C++
// commonly use plain `0` to opt into the safe-default option set
// (e.g. `simplexml_load_string($xml, "SimpleXMLElement", 0)`). The
// gate's `dangerous_values` list is identifier-only, so returning
// the literal text lets the comparison fail against `LIBXML_NOENT`
// and suppresses the conservative-fire branch.
"integer" | "integer_literal" | "number_literal" | "decimal_integer_literal" => {
text_of(arg, code).map(|s| s.to_string())
}
_ => None,
}
}
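The rightmost-segment rule for scope-qualified constants, and the way an integer literal fails the identifier comparison, can be sketched on plain strings. Both helper names are hypothetical; the real extractor reads the `name` field of a tree-sitter `scope_resolution` node rather than splitting text.

```rust
/// Reduce a scope-qualified constant to its rightmost segment,
/// e.g. `A::B::NOENT` becomes `NOENT`. Unqualified text is returned as-is.
pub fn rightmost_segment(qualified: &str) -> &str {
    qualified.rsplit("::").next().unwrap_or(qualified)
}

/// True when the extracted argument text names one of the dangerous values.
pub fn is_dangerous(arg_text: &str, dangerous_values: &[&str]) -> bool {
    dangerous_values.contains(&rightmost_segment(arg_text))
}
```

With this shape the gate's `dangerous_values` list can hold bare identifiers such as `NOENT`, and a literal `0` never matches an identifier entry such as `LIBXML_NOENT`.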
@ -728,35 +868,72 @@ pub(super) fn find_chained_inner_call<'a>(
return Some((function, inner_text));
}
// The function/method field for a chained call is a member_expression
// (JS/TS) or attribute (Python) etc.; its `object` field is the
// receiver expression. Only proceed when that receiver is itself a
// call.
let object = function.child_by_field_name("object")?;
// (JS/TS), attribute (Python), or field_expression (Rust); its
// receiver is the `object` field (JS/TS/Python) or `value` field
// (Rust). Only proceed when that receiver is itself a call.
let object = function
.child_by_field_name("object")
.or_else(|| function.child_by_field_name("value"))?;
if !matches!(lookup(lang, object.kind()), Kind::CallFn | Kind::CallMethod) {
return None;
}
// Recurse: the inner call may itself be chained
// (`axios.get(u).then(h).catch(h)`, innermost is `axios.get`).
if let Some(inner) = find_chained_inner_call(object, lang, code) {
return Some(inner);
}
// `object` is the innermost call_expression in the chain. Extract
// its callee identifier the same way `first_call_ident_with_span`
// does for a CallFn (member_expression text → "http.get").
let inner_func = object
// Decide whether `object` is itself a chained method call (its
// function/method field is a member-style expression). When yes,
// recurse one more level so deeper chains resolve to their innermost
// method (e.g. `axios.get(u).then(h).catch(h)` → `axios.get`).
// When no — the receiver is a plain function/constructor call like
// Rust's `HttpResponse::Found()` — descending one more level would
// strand us on the non-method leaf whose text would not match any
// gate matcher. Stop here and return the current `outer` level,
// which IS the innermost method call.
let object_function = object
.child_by_field_name("function")
.or_else(|| object.child_by_field_name("method"))
.or_else(|| object.child_by_field_name("name"))?;
// Multi-line dotted member expressions (`http\n .get`) include
// formatting whitespace in the source-text slice. The labels map
// keys are literal `"http.get"` etc., strip whitespace so the
// chained-call inner-gate rebinding fires for both single-line and
// multi-line chain styles. Also strips `\r` for CRLF sources.
// Motivated by upstream Parse Server CVE-2025-64430 which uses the
// multi-line `http\n .get(uri, ...)\n .on(...)` form.
let raw = text_of(inner_func, code)?;
.or_else(|| object.child_by_field_name("method"));
let object_is_chained_method = object_function
.map(|f| {
matches!(
f.kind(),
"member_expression"
| "attribute"
| "field_expression"
| "scoped_identifier"
| "scope_resolution"
) && f
.child_by_field_name("object")
.or_else(|| f.child_by_field_name("value"))
.is_some()
})
.unwrap_or(false);
if object_is_chained_method {
// Recurse: the inner call may itself be chained.
if let Some(inner) = find_chained_inner_call(object, lang, code) {
return Some(inner);
}
// `object` is the innermost call_expression in the chain. Extract
// its callee identifier the same way `first_call_ident_with_span`
// does for a CallFn (member_expression text → "http.get").
let inner_func = object
.child_by_field_name("function")
.or_else(|| object.child_by_field_name("method"))
.or_else(|| object.child_by_field_name("name"))?;
// Multi-line dotted member expressions (`http\n .get`) include
// formatting whitespace in the source-text slice. The labels map
// keys are literal `"http.get"` etc., strip whitespace so the
// chained-call inner-gate rebinding fires for both single-line and
// multi-line chain styles. Also strips `\r` for CRLF sources.
// Motivated by upstream Parse Server CVE-2025-64430 which uses the
// multi-line `http\n .get(uri, ...)\n .on(...)` form.
let raw = text_of(inner_func, code)?;
let inner_text: String = raw.chars().filter(|c| !c.is_whitespace()).collect();
return Some((object, inner_text));
}
// Receiver is a non-chained call (Rust constructor `Foo::new()` /
// `HttpResponse::Found()`, JS bare `f()`). Outer level IS the
// innermost method call — return its own function text so gate
// matching sees the method name.
let raw = text_of(function, code)?;
let inner_text: String = raw.chars().filter(|c| !c.is_whitespace()).collect();
Some((object, inner_text))
Some((outer, inner_text))
}
/// Recursively walk the receiver chain of `outer` (a CallFn / CallMethod
@ -1389,6 +1566,47 @@ pub(super) fn extract_kwargs(call_node: Node, code: &[u8]) -> Vec<(String, Vec<S
let mut cursor = args_node.walk();
for child in args_node.named_children(&mut cursor) {
let kind = child.kind();
// JS/TS object-literal positional arg: `f(x, { a: true, b: 'str' })`.
// The pairs inside the object are not tree-sitter
// `keyword_argument` nodes (those are Python/Ruby), but
// downstream consumers (xml_config's
// `lookup_kwargs(inst.cfg_node)` JS branch checking
// `processEntities`) expect these fields in the kwargs vector.
// Lift each `pair` (and `shorthand_property_identifier`) into
// the kwargs list using the property name as kwarg name and the
// raw text of the value expression as the single value.
// Boolean / numeric / string / identifier values all surface as
// their textual form, which is what xml_config's kwarg-value
// matchers (e.g. `v == "true"`) compare against.
if kind == "object" {
let mut oc = child.walk();
for pair in child.named_children(&mut oc) {
let pk = pair.kind();
if pk == "pair" {
let Some(kn) = pair.child_by_field_name("key") else {
continue;
};
let Some(vn) = pair.child_by_field_name("value") else {
continue;
};
let Some(raw_name) = text_of(kn, code) else {
continue;
};
let name = raw_name
.trim_start_matches(['"', '\''])
.trim_end_matches(['"', '\''])
.to_string();
if let Some(val_text) = text_of(vn, code) {
out.push((name, vec![val_text.to_string()]));
}
} else if pk == "shorthand_property_identifier" {
if let Some(name) = text_of(pair, code) {
out.push((name.to_string(), vec![name.to_string()]));
}
}
}
continue;
}
if kind != "keyword_argument" && kind != "named_argument" {
continue;
}
@ -1413,6 +1631,32 @@ pub(super) fn extract_kwargs(call_node: Node, code: &[u8]) -> Vec<(String, Vec<S
collect_idents_with_paths(vn, code, &mut idents, &mut paths);
let mut combined = paths;
combined.extend(idents);
// Boolean / numeric literal kwarg values (Python `True`/`False`,
// Ruby `true`/`false`/integer/float, JS `true`/`false`/number)
// do not surface through `collect_idents_with_paths` — the value
// node's kind is `true`/`false`/`integer`/`float`/`number`, not
// an identifier kind. Capture the raw text so consumers like
// `xml_config::classify_call` (which checks
// `values.iter().any(|v| v == "True" || v == "true")` for the
// lxml `resolve_entities=True` opt-in) can match.
if combined.is_empty() {
if matches!(
vn.kind(),
"true"
| "false"
| "integer"
| "float"
| "number"
| "string"
| "string_literal"
| "true_constant"
| "false_constant"
) {
if let Some(txt) = text_of(vn, code) {
combined.push(txt.trim_matches(['"', '\'']).to_string());
}
}
}
out.push((name, combined));
}
out
@ -1718,6 +1962,29 @@ pub(super) fn extract_arg_string_literals(call_node: Node, code: &[u8]) -> Vec<O
let raw = text_of(target, code);
raw.and_then(|s| strip_literal_quotes(&s, target, code))
}
// Boolean / null / numeric literal tokens — capture verbatim so
// downstream pattern-aware analysis (e.g. the XXE config-fact
// pass that needs to read the boolean polarity arg of
// `setFeature(NAME, true)`) can recover the literal text without
// re-walking the AST. Existing string-only consumers (URL
// prefix matching, etc.) are unaffected: a "true" / "false"
// token never satisfies their matching predicates.
"true"
| "false"
| "null"
| "null_literal"
| "nil"
| "nil_literal"
| "none"
| "boolean_literal"
| "true_literal"
| "false_literal"
| "decimal_integer_literal"
| "integer_literal"
| "integer"
| "number"
| "number_literal"
| "decimal_literal" => text_of(target, code).map(|s| s.to_string()),
_ => None,
};
result.push(literal);

View file

@ -70,8 +70,8 @@ use literals::{
extract_destination_field_pairs, extract_destination_kwarg_pairs, extract_kwargs,
extract_literal_rhs, extract_object_arg_property, extract_shell_array_payload_idents,
find_call_node, find_call_node_deep, find_chained_inner_call, has_keyword_arg,
has_object_arg_property, has_only_literal_args, is_parameterized_query_call,
java_chain_arg0_kind_for_method, js_chain_arg0_kind_for_method,
has_object_arg_property, has_only_literal_args, is_object_create_null_call,
is_parameterized_query_call, java_chain_arg0_kind_for_method, js_chain_arg0_kind_for_method,
js_chain_outer_method_for_inner, ruby_chain_arg0_for_method, walk_chain_inner_call_args,
};
use params::{
@ -359,6 +359,14 @@ pub struct CallMeta {
/// must not survive into the constructed object.
#[serde(default)]
pub is_constructor: bool,
/// True when this call is a literal `Object.create(null)`. The returned
/// value has no prototype chain. Consumed by TypeFacts to tag the
/// SsaValue with [`crate::ssa::type_facts::TypeKind::NullPrototypeObject`]
/// so PROTOTYPE_POLLUTION suppression can fire flow-sensitively at the
/// synthetic `__index_set__` sink. Set during CFG node construction so
/// SSA does not need to re-walk the AST.
#[serde(default)]
pub produces_null_proto: bool,
}
/// One gate's contribution at a call site whose callee matches multiple
@ -601,8 +609,7 @@ pub struct BodyMeta {
/// decorators / annotations / static type text at CFG construction
/// time. Same length as `params`; positions with no recoverable
/// type info are `None`. Strictly additive: when every entry is
/// `None`, downstream behaviour is identical to the pre-Phase-1
/// engine.
/// `None`, downstream behaviour is identical to the type-unaware path.
pub param_types: Vec<Option<crate::ssa::type_facts::TypeKind>>,
/// Per-parameter destructured-binding sibling names. Same length
/// as `params`; entry `i` lists field names bound by the same
@ -1811,6 +1818,31 @@ pub(super) fn push_node<'a>(
labels.push(l);
}
}
// Subscript-set form: `response.headers["X-Foo"] = bar`
// (Ruby `element_reference`, JS/TS `subscript_expression`,
// Python `subscript`). The LHS has no `property` field, so
// walk into the subscript's `object` and try classifying its
// member-expression text (e.g. `response.headers`). This
// lets header-injection sinks fire on the bare bracket form
// alongside the `set_header` / `headers_mut.insert` method
// shapes already covered above.
if labels.is_empty()
&& matches!(
lhs.kind(),
"subscript_expression" | "subscript" | "element_reference"
)
{
let obj = lhs
.child_by_field_name("object")
.or_else(|| lhs.child_by_field_name("value"))
.or_else(|| lhs.child(0));
if let Some(obj_node) = obj
&& let Some(obj_text) = member_expr_text(obj_node, code)
&& let Some(l) = classify(lang, &obj_text, extra)
{
labels.push(l);
}
}
}
}
@ -1933,18 +1965,45 @@ pub(super) fn push_node<'a>(
{
let gate_call = call_ast.or_else(|| find_call_node_deep(ast, lang, 4));
if let Some(cn) = gate_call {
let gate_callee_text = if call_ast.is_some() {
// Derive the gate's callee text from the call's
// `function`/`method`/`name` field, falling back to `text`.
//
// The default is `text`, which by this point reflects the
// qualified callee for method calls (`Velocity.evaluate`,
// `$smarty->fetch`) reconstructed in the `Kind::CallMethod`
// arm. When `first_member_label` rewrites `text` to a member
// Source like `req.body` (because the wrapper carries one as
// an argument), the rewrite is correct for source attribution
// but defeats gate matching against a bare callee
// (`setValue(target, req.body, …)` would gate-match
// `req.body` instead of `setValue`).
//
// Detect that case structurally: a Source label is present AND
// the call's function-field text differs from `text`. The
// function field carries the actual callee identifier; when it
// disagrees with `text`, `text` was clobbered by a member-source
// override and the function field is the right gate target.
// Whitespace is stripped to mirror `find_chained_inner_call`
// so multi-line chains (`http\n .get(...)`) still match flat
// gate matchers like `http.get`.
let function_field_text: Option<String> = cn
.child_by_field_name("function")
.or_else(|| cn.child_by_field_name("method"))
.or_else(|| cn.child_by_field_name("name"))
.and_then(|f| text_of(f, code))
.map(|t| t.chars().filter(|c| !c.is_whitespace()).collect::<String>());
let has_source_label = labels
.iter()
.any(|l| matches!(l, crate::labels::DataLabel::Source(_)));
let gate_callee_text = if let Some(ff) = function_field_text.as_deref()
&& has_source_label
&& ff != text.as_str()
{
ff.to_string()
} else if call_ast.is_some() {
text.clone()
} else {
// Inner call reached via wrapper, use the call-expression's
// function name directly. Falls back to `text` so non-call-
// expression kinds (method calls, Ruby `call` nodes, macros)
// still have a usable callee string.
cn.child_by_field_name("function")
.or_else(|| cn.child_by_field_name("method"))
.or_else(|| cn.child_by_field_name("name"))
.and_then(|f| text_of(f, code))
.unwrap_or_else(|| text.clone())
function_field_text.unwrap_or_else(|| text.clone())
};
let matches = classify_gated_sink(
lang,
@ -1953,12 +2012,15 @@ pub(super) fn push_node<'a>(
extract_const_string_arg(cn, idx, code).or_else(|| {
// C/C++ preprocessor macros and PHP `define`d constants
// surface as identifier nodes, not string literals.
// Falling back to the macro-arg extractor for those
// languages lets gates like `curl_easy_setopt` /
// `curl_setopt` activate on a `CURLOPT_POSTFIELDS`
// ident match instead of firing conservatively on
// every positional arg.
if matches!(lang, "c" | "cpp" | "c++" | "php") {
// Ruby option constants (e.g.
// `Nokogiri::XML::ParseOptions::NOENT`) surface as
// `scope_resolution` / `constant` nodes. Falling back
// to the macro-arg extractor for those languages lets
// gates like `curl_easy_setopt` / `curl_setopt` /
// `Nokogiri::XML` activate on a bare-leaf identifier
// match instead of firing conservatively on every
// positional arg.
if matches!(lang, "c" | "cpp" | "c++" | "php" | "ruby" | "rb") {
extract_const_macro_arg(cn, idx, code)
} else {
None
@ -2656,6 +2718,13 @@ pub(super) fn push_node<'a>(
|| call_ast
.is_some_and(|cn| matches!(cn.kind(), "new_expression" | "object_creation_expression"));
// Detect `Object.create(null)` so TypeFacts can tag the returned
// SsaValue with `NullPrototypeObject` for flow-sensitive
// prototype-pollution suppression. Restricted to JS/TS where
// `Object.create` is the idiomatic null-prototype constructor.
let produces_null_proto = matches!(lang, "javascript" | "typescript")
&& call_ast.is_some_and(|cn| is_object_create_null_call(cn, code));
let idx = g.add_node(NodeInfo {
kind,
call: CallMeta {
@ -2672,6 +2741,7 @@ pub(super) fn push_node<'a>(
destination_uses,
gate_filters,
is_constructor,
produces_null_proto,
},
taint: TaintMeta {
labels,
@ -2860,6 +2930,31 @@ fn try_lower_subscript_write(
*call_ordinal += 1;
let mut uses_all: Vec<String> = vec![arr_text.clone(), idx_text.clone()];
uses_all.extend(rhs_uses.iter().cloned());
// Prototype pollution sink classification on the synthetic
// `__index_set__` node for JS/TS. A tainted *key* in `obj[key] = val`
// is the pollution channel (a `__proto__` / `constructor` value flowing
// through `key` mutates `Object.prototype` globally), so the gate's
// payload arg list is `[0]` (the key only; the value at index 1 is
// benign on its own). Sanitizer recognition here is structural (no
// taint engine plumbing) and runs before label attachment, so
// suppressed shapes never enter the SSA sink scan:
// * constant string key whose literal value is not in the dangerous
// set (`__proto__` / `constructor` / `prototype`),
// * the assignment is dominated by an `if` whose condition rejects
// dangerous keys with an early `return` / `throw` / `break` /
// `continue`, or that allowlists the key against safe constants on
// its true arm.
// Receivers assigned `Object.create(null)` (no prototype chain to
// pollute) are not handled here; that suppression is flow-sensitive
// and lives in the SSA taint engine.
let mut pp_labels: smallvec::SmallVec<[DataLabel; 2]> = smallvec::SmallVec::new();
let mut pp_payload_args: Option<Vec<usize>> = None;
if matches!(lang, "javascript" | "typescript" | "js" | "ts")
&& !pp_should_suppress_index_set(assign_ast, subscript_node, &arr_text, &idx_text, code)
{
pp_labels.push(DataLabel::Sink(Cap::PROTOTYPE_POLLUTION));
pp_payload_args = Some(vec![0]);
}
let n = g.add_node(NodeInfo {
kind: StmtKind::Call,
call: CallMeta {
@ -2867,9 +2962,11 @@ fn try_lower_subscript_write(
receiver: Some(arr_text.clone()),
arg_uses: vec![vec![idx_text.clone()], rhs_uses.clone()],
call_ordinal: ord,
sink_payload_args: pp_payload_args,
..Default::default()
},
taint: TaintMeta {
labels: pp_labels,
uses: uses_all,
..Default::default()
},
@ -2883,6 +2980,477 @@ fn try_lower_subscript_write(
Some(n)
}
/// Spring MVC controller-return open-redirect recogniser. Detects the
/// shape `return "redirect:" + tainted` (Java string concatenation) and
/// emits a synthetic `__spring_redirect__` Call sink with
/// `Sink(OPEN_REDIRECT)` so the existing taint pipeline propagates the
/// concatenated suffix through the OPEN_REDIRECT cap. The synthetic
/// node sequences between `preds` and the eventual Return node.
///
/// Returns `Some(synthetic_idx)` when matched, otherwise `None`.
/// Java only — Spring's `redirect:` view-name convention has no
/// counterpart in the other supported languages, and matching the
/// literal across non-Spring code would over-fire.
fn try_lower_spring_redirect_return(
ast: Node,
preds: &[NodeIndex],
g: &mut Cfg,
lang: &str,
code: &[u8],
enclosing_func: Option<&str>,
call_ordinal: &mut u32,
) -> Option<NodeIndex> {
if lang != "java" {
return None;
}
// `return EXPR ;` — find the returned expression. tree-sitter-java
// wraps the value in a `return_statement` whose first named child
// is the expression.
let expr = ast.named_child(0)?;
// Strip parentheses.
let mut cur = expr;
while cur.kind() == "parenthesized_expression" {
cur = cur.named_child(0)?;
}
if cur.kind() != "binary_expression" {
return None;
}
let op = cur.child_by_field_name("operator")?;
let op_text = text_of(op, code)?;
if op_text != "+" {
return None;
}
// Walk leftmost descent through left-associated `+` chains so that
// `"redirect:" + a + b` still matches (the AST nests as
// `(("redirect:" + a) + b)`).
let mut leftmost = cur;
loop {
let left = leftmost.child_by_field_name("left")?;
let mut left_inner = left;
while left_inner.kind() == "parenthesized_expression" {
left_inner = left_inner.named_child(0)?;
}
if left_inner.kind() == "binary_expression" {
let op_l = left_inner.child_by_field_name("operator")?;
if text_of(op_l, code).as_deref() == Some("+") {
leftmost = left_inner;
continue;
}
}
// `left_inner` is the leftmost atom — must be a string literal
// whose constant value starts with `redirect:`.
if !matches!(left_inner.kind(), "string_literal" | "string") {
return None;
}
let lit = text_of(left_inner, code)?;
if lit.len() < 2 {
return None;
}
let inner = &lit[1..lit.len() - 1];
if !inner.starts_with("redirect:") {
return None;
}
break;
}
// Collect identifiers referenced anywhere in the original concat
// expression — the tainted URL piece is one of them. Receiver-style
// method calls (`view.toString()`) are intentionally captured via
// the bare identifier; precision improvements are deferred to the
// SSA / abstract-string layer.
let mut concat_uses: Vec<String> = Vec::new();
collect_idents(cur, code, &mut concat_uses);
if concat_uses.is_empty() {
return None;
}
let span = (ast.start_byte(), ast.end_byte());
let ord = *call_ordinal;
*call_ordinal += 1;
let mut labels: smallvec::SmallVec<[DataLabel; 2]> = smallvec::SmallVec::new();
labels.push(DataLabel::Sink(Cap::OPEN_REDIRECT));
let n = g.add_node(NodeInfo {
kind: StmtKind::Call,
call: CallMeta {
callee: Some("__spring_redirect__".to_string()),
arg_uses: vec![concat_uses.clone()],
call_ordinal: ord,
sink_payload_args: Some(vec![0]),
..Default::default()
},
taint: TaintMeta {
labels,
uses: concat_uses,
..Default::default()
},
ast: AstMeta {
span,
enclosing_func: enclosing_func.map(|s| s.to_string()),
},
..Default::default()
});
connect_all(g, preds, n, EdgeKind::Seq);
Some(n)
}
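The leftmost-literal check in `try_lower_spring_redirect_return` can be sketched without tree-sitter. This is a minimal model, assuming the `+` chain has already been flattened into a left-to-right operand list; `Operand` and `spring_redirect_uses` are hypothetical names for illustration and are not part of the commit.

```rust
/// Simplified operand of a left-associated `+` chain (hypothetical
/// model, not the tree-sitter node type used by the real lowering).
#[derive(Debug, Clone, PartialEq)]
pub enum Operand {
    /// Raw source text of a string literal, quotes included.
    Literal(String),
    /// A bare identifier.
    Ident(String),
}

/// Returns the identifiers of the chain when its leftmost operand is a
/// string literal whose unquoted value starts with `redirect:`.
pub fn spring_redirect_uses(chain: &[Operand]) -> Option<Vec<String>> {
    let Operand::Literal(raw) = chain.first()? else {
        return None;
    };
    // Need at least the two quote characters before slicing.
    if raw.len() < 2 {
        return None;
    }
    let inner = raw.get(1..raw.len() - 1)?;
    if !inner.starts_with("redirect:") {
        return None;
    }
    let uses: Vec<String> = chain
        .iter()
        .filter_map(|op| match op {
            Operand::Ident(name) => Some(name.clone()),
            Operand::Literal(_) => None,
        })
        .collect();
    // With no identifiers there is nothing tainted to track.
    if uses.is_empty() {
        None
    } else {
        Some(uses)
    }
}
```

Under this model, `"redirect:" + url` yields `Some(["url"])`, while a chain that starts with an identifier or with a non-`redirect:` literal yields `None`.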
/// Prototype-pollution suppression decisions for the synthetic
/// `__index_set__` node emitted by `try_lower_subscript_write`.
///
/// Returns `true` when the assignment is provably safe and the
/// `Cap::PROTOTYPE_POLLUTION` sink label should be elided. The three
/// CFG-layer recognised shapes are flow-insensitive AST patterns:
///
/// 1. Constant string key whose value is not one of the dangerous
/// keys (`__proto__`, `constructor`, `prototype`). A literal-keyed
/// write cannot pollute even if the value is tainted.
/// 2. Reject pattern `if (idx === "__proto__" || idx === "constructor"
/// || idx === "prototype") <return/throw/break>` enclosing the
/// assignment. The dangerous-key path terminates before reaching
/// the synthesised store.
/// 3. Allowlist pattern `if (idx === "name" || idx === "id") { obj[idx]
/// = v }`. The assignment only executes when `idx` is one of a
/// small set of known-safe constants.
///
/// The null-prototype receiver suppression (`Object.create(null)`) is
/// handled flow-sensitively in the SSA taint engine via
/// `TypeKind::NullPrototypeObject`, since AST scans cannot honour
/// branch-local re-bindings or phi joins.
///
/// Conservative: any unrecognised shape returns `false` so the sink
/// label is attached and the SSA layer decides on taint reachability.
fn pp_should_suppress_index_set(
assign_ast: Node,
subscript_node: Node,
_arr_text: &str,
idx_text: &str,
code: &[u8],
) -> bool {
// 1. Constant-key fold.
if let Some(idx_node) = subscript_node
.child_by_field_name("index")
.or_else(|| subscript_node.child_by_field_name("subscript"))
.or_else(|| {
let mut cur = subscript_node.walk();
subscript_node.named_children(&mut cur).nth(1)
})
{
if let Some(literal) = pp_string_literal_value(idx_node, code) {
return !pp_is_dangerous_proto_key(&literal);
}
}
// 2 + 3. Dominator-style guard ancestors (reject + allowlist).
if pp_is_guarded_by_proto_check(assign_ast, idx_text, code) {
return true;
}
false
}
/// Dangerous prototype-pollution key strings. Matches the literal
/// values that JS engines treat as references into the prototype chain.
fn pp_is_dangerous_proto_key(s: &str) -> bool {
matches!(s, "__proto__" | "constructor" | "prototype")
}
/// Extract the value of a JS/TS string literal node, stripping the
/// outer quote bytes (single, double, or backtick). Returns `None`
/// for non-literal nodes, template literals containing interpolation,
/// or anything that doesn't resemble a single-segment string.
fn pp_string_literal_value(n: Node, code: &[u8]) -> Option<String> {
let kind = n.kind();
if !matches!(kind, "string" | "string_literal" | "template_string") {
return None;
}
let raw = std::str::from_utf8(&code[n.start_byte()..n.end_byte()]).ok()?;
if raw.len() < 2 {
return None;
}
let bytes = raw.as_bytes();
let first = bytes[0];
let last = bytes[bytes.len() - 1];
if !matches!(first, b'"' | b'\'' | b'`') || first != last {
return None;
}
let inner = &raw[1..raw.len() - 1];
// Reject template literals carrying `${...}` interpolation — we
// can't fold those to a single concrete value.
if first == b'`' && inner.contains("${") {
return None;
}
Some(inner.to_string())
}
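The constant-key fold (shape 1) combines the two helpers above. The sketch below reproduces that logic on raw literal text instead of tree-sitter nodes; `literal_value`, `is_dangerous_key`, and `constant_key_verdict` are illustrative names, not functions from the commit.

```rust
/// Strip matching outer quotes from raw literal text. Returns `None`
/// for unquoted input, mismatched quotes, or template literals that
/// carry `${...}` interpolation.
pub fn literal_value(raw: &str) -> Option<&str> {
    let bytes = raw.as_bytes();
    if bytes.len() < 2 {
        return None;
    }
    let (first, last) = (bytes[0], bytes[bytes.len() - 1]);
    if !matches!(first, b'"' | b'\'' | b'`') || first != last {
        return None;
    }
    // Both ends are single-byte ASCII quotes, so slicing is safe.
    let inner = &raw[1..raw.len() - 1];
    if first == b'`' && inner.contains("${") {
        return None;
    }
    Some(inner)
}

/// Keys that reach into the prototype chain.
pub fn is_dangerous_key(s: &str) -> bool {
    matches!(s, "__proto__" | "constructor" | "prototype")
}

/// `Some(true)`: suppress the sink; `Some(false)`: keep it;
/// `None`: the key is not a foldable literal, so other guards decide.
pub fn constant_key_verdict(raw_key: &str) -> Option<bool> {
    literal_value(raw_key).map(|v| !is_dangerous_key(v))
}
```

For example, `obj["name"] = v` folds to `Some(true)`, `obj['__proto__'] = v` folds to `Some(false)`, and `obj[key] = v` gives `None`.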
/// Walk up from the assignment node looking for two structural guard
/// shapes:
///
/// * **Reject pattern** — a *previous sibling* `if_statement` in any
/// enclosing block whose condition is `idx === DANGEROUS [|| …]` and
/// whose consequence terminates control flow (`return` / `throw` /
/// `break` / `continue`). The dangerous-key path never reaches the
/// subsequent assignment.
/// * **Allowlist pattern** — an *ancestor* `if_statement` whose
/// condition is `idx === SAFE [|| …]` and through whose consequence
/// the descendant flows. Only the safe-key arm reaches the
/// assignment.
///
/// Both shapes must compare against the same key variable as the
/// synthetic `__index_set__` node. Stops at the enclosing function so
/// guards in an outer scope around a closure passed elsewhere don't
/// accidentally suppress inner assignments.
fn pp_is_guarded_by_proto_check(from: Node, idx_text: &str, code: &[u8]) -> bool {
let mut cur = from;
while let Some(parent) = cur.parent() {
match parent.kind() {
"function_declaration"
| "function"
| "function_expression"
| "arrow_function"
| "method_definition"
| "generator_function_declaration"
| "program"
| "source_file" => return false,
"if_statement" => {
if let Some(cond) = parent.child_by_field_name("condition") {
let consequence = parent.child_by_field_name("consequence");
if let Some(verdict) =
pp_classify_proto_guard(cond, consequence, cur, idx_text, code)
{
return verdict;
}
}
}
_ => {}
}
// Reject pattern: scan previous siblings in the parent block
// for `if (idx === DANGEROUS [|| …]) { return; }` shapes that
// dominate the assignment via early-return.
let mut sibling_cursor = parent.walk();
for sibling in parent.named_children(&mut sibling_cursor) {
if sibling.start_byte() >= cur.start_byte() {
break;
}
if sibling.kind() != "if_statement" {
continue;
}
if pp_is_reject_pattern(sibling, idx_text, code) {
return true;
}
}
cur = parent;
}
false
}
/// True when `if_node` is `if (idx === DANGEROUS [|| idx === DANGEROUS]
/// …) { return; / throw …; / break; / continue; }` shaped: every
/// disjunct compares the named key variable to a dangerous prototype
/// key, and the consequence terminates control flow.
fn pp_is_reject_pattern(if_node: Node, idx_text: &str, code: &[u8]) -> bool {
let Some(cond) = if_node.child_by_field_name("condition") else {
return false;
};
let consequence = if_node.child_by_field_name("consequence");
let clauses = pp_split_or_clauses(cond);
if clauses.is_empty() {
return false;
}
for clause in &clauses {
let Some((var, lit)) = pp_extract_eq_compare(*clause, code) else {
return false;
};
if var != idx_text || !pp_is_dangerous_proto_key(&lit) {
return false;
}
}
consequence.map(pp_block_terminates).unwrap_or(false)
}
/// Decide whether an enclosing `if` clause around an `__index_set__`
/// statement constitutes a prototype-pollution guard.
///
/// `cond` is the if's condition expression, `consequence` is the
/// optional consequence block, and `descendant` is the node on the
/// path from the if-statement down to the assignment (used to
/// distinguish "assignment lives inside the consequence" from
/// "assignment lives after the if"). `idx_text` is the textual key
/// variable used by the synthetic `__index_set__`.
///
/// Returns `Some(true)` to suppress and `None` when the if-statement
/// is not a recognised guard, so the walker continues outward.
/// `Some(false)` would keep the gate and stop the walk, but no current
/// path returns it.
fn pp_classify_proto_guard(
cond: Node,
consequence: Option<Node>,
descendant: Node,
idx_text: &str,
code: &[u8],
) -> Option<bool> {
let cond_clauses = pp_split_or_clauses(cond);
if cond_clauses.is_empty() {
return None;
}
let mut all_against_idx = true;
let mut all_dangerous = true;
let mut all_safe = true;
for clause in &cond_clauses {
let (var, lit) = pp_extract_eq_compare(*clause, code)?;
if var != idx_text {
all_against_idx = false;
break;
}
let dangerous = pp_is_dangerous_proto_key(&lit);
if dangerous {
all_safe = false;
} else {
all_dangerous = false;
}
}
if !all_against_idx {
return None;
}
let consequence_contains_descendant = consequence
.map(|c| pp_subtree_contains(c, descendant))
.unwrap_or(false);
// Allowlist pattern: every clause is `idx === SAFE` and the
// assignment lives inside the consequence (true arm).
if all_safe && consequence_contains_descendant {
return Some(true);
}
// Reject pattern: every clause is `idx === DANGEROUS` and the
// consequence terminates control flow before reaching the
// assignment. Only suppress when the assignment is *outside* the
// consequence (for an enclosing `if`, in its `else` arm).
if all_dangerous
&& !consequence_contains_descendant
&& consequence.map(pp_block_terminates).unwrap_or(false)
{
return Some(true);
}
None
}
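The decision table in `pp_classify_proto_guard` is easier to see with the AST details removed. The sketch below assumes the condition has already been split into `var === "literal"` clauses and that the two structural facts about the consequence are known; `Clause` and `classify_guard` are hypothetical names used only for illustration.

```rust
/// Keys that reach into the prototype chain.
pub fn is_dangerous_key(s: &str) -> bool {
    matches!(s, "__proto__" | "constructor" | "prototype")
}

/// One `var === "literal"` disjunct, already extracted from the
/// `if` condition.
pub struct Clause<'a> {
    pub var: &'a str,
    pub literal: &'a str,
}

/// `Some(true)` when the guard makes the keyed write safe, `None`
/// when the `if` is not a recognised guard.
pub fn classify_guard(
    clauses: &[Clause<'_>],
    key_var: &str,
    assignment_in_consequence: bool,
    consequence_terminates: bool,
) -> Option<bool> {
    // Every disjunct must test the same key variable as the write.
    if clauses.is_empty() || clauses.iter().any(|c| c.var != key_var) {
        return None;
    }
    let all_dangerous = clauses.iter().all(|c| is_dangerous_key(c.literal));
    let all_safe = clauses.iter().all(|c| !is_dangerous_key(c.literal));
    // Allowlist: only safe keys reach the write inside the true arm.
    if all_safe && assignment_in_consequence {
        return Some(true);
    }
    // Reject: dangerous keys exit before the write outside the true arm.
    if all_dangerous && !assignment_in_consequence && consequence_terminates {
        return Some(true);
    }
    None
}
```

A condition that mixes safe and dangerous literals matches neither arm and returns `None`, which keeps the sink label attached.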
/// True when `descendant`'s byte range lies within `root`'s byte
/// range, i.e. `descendant` is `root` itself or one of its transitive
/// children. Containment is checked on byte ranges rather than by
/// walking parent links.
fn pp_subtree_contains(root: Node, descendant: Node) -> bool {
let dr = (descendant.start_byte(), descendant.end_byte());
let rr = (root.start_byte(), root.end_byte());
dr.0 >= rr.0 && dr.1 <= rr.1
}
/// True when `block` (typically an `if` consequence) terminates
/// control flow on every path: the last meaningful statement is a
/// return / throw / break / continue. Conservative — falls back to
/// `false` for empty blocks or anything non-trivial.
fn pp_block_terminates(block: Node) -> bool {
// Bare statement consequence (no braces): the if's consequence is
// the terminator itself.
if pp_is_terminator(block) {
return true;
}
if !matches!(block.kind(), "statement_block" | "block") {
return false;
}
let mut cursor = block.walk();
let last_stmt = block.named_children(&mut cursor).last();
match last_stmt {
Some(s) => pp_is_terminator(s),
None => false,
}
}
/// True when `n` is a control-flow-ending statement: return / throw /
/// break / continue.
fn pp_is_terminator(n: Node) -> bool {
matches!(
n.kind(),
"return_statement" | "throw_statement" | "break_statement" | "continue_statement"
)
}
/// Split an expression by top-level `||` operators. Returns the
/// individual disjunct sub-expressions. Single (non-OR) expressions
/// yield a one-element vector. Walks `binary_expression` nodes whose
/// `operator` field is `||` and recurses into both sides.
fn pp_split_or_clauses<'a>(expr: Node<'a>) -> Vec<Node<'a>> {
let mut out = Vec::new();
pp_collect_or_clauses(expr, &mut out);
out
}
fn pp_collect_or_clauses<'a>(expr: Node<'a>, out: &mut Vec<Node<'a>>) {
let stripped = pp_unwrap_paren(expr);
if matches!(stripped.kind(), "binary_expression") {
let op = stripped
.child_by_field_name("operator")
.map(|o| o.kind())
.unwrap_or("");
if op == "||" {
if let Some(l) = stripped.child_by_field_name("left") {
pp_collect_or_clauses(l, out);
}
if let Some(r) = stripped.child_by_field_name("right") {
pp_collect_or_clauses(r, out);
}
return;
}
}
out.push(stripped);
}
fn pp_unwrap_paren(n: Node) -> Node {
let mut cur = n;
while matches!(cur.kind(), "parenthesized_expression") {
match cur.named_child(0) {
Some(inner) => cur = inner,
None => break,
}
}
cur
}
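The disjunct splitting done by `pp_split_or_clauses` and `pp_unwrap_paren` can be modelled on a tiny expression type. This is a sketch under the assumption that only `||`, parentheses, and opaque atoms matter; `Expr` and `split_or_clauses` are illustrative stand-ins for the tree-sitter nodes.

```rust
/// Minimal expression model (hypothetical, not the real AST).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Or(Box<Expr>, Box<Expr>),
    Paren(Box<Expr>),
    Atom(String),
}

/// Peel any number of nested parentheses.
fn unwrap_paren(mut e: &Expr) -> &Expr {
    while let Expr::Paren(inner) = e {
        e = &**inner;
    }
    e
}

/// Recurse through `||` nodes, pushing every non-`||` leaf in
/// left-to-right order.
fn collect<'a>(e: &'a Expr, out: &mut Vec<&'a Expr>) {
    let stripped = unwrap_paren(e);
    if let Expr::Or(l, r) = stripped {
        collect(l, out);
        collect(r, out);
        return;
    }
    out.push(stripped);
}

/// Split an expression into its top-level `||` disjuncts. A non-OR
/// expression yields a one-element vector.
pub fn split_or_clauses(e: &Expr) -> Vec<&Expr> {
    let mut out = Vec::new();
    collect(e, &mut out);
    out
}
```

With this model, `(a || b) || (c)` splits into the three atoms `a`, `b`, `c`, and a bare atom comes back unchanged as a single clause.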
/// Extract `(var_text, literal_value)` from an equality comparison
/// `var === "literal"` / `var == "literal"` (and reversed forms).
/// Returns `None` for any other shape.
fn pp_extract_eq_compare(expr: Node, code: &[u8]) -> Option<(String, String)> {
let stripped = pp_unwrap_paren(expr);
if !matches!(stripped.kind(), "binary_expression") {
return None;
}
let op = stripped
.child_by_field_name("operator")
.map(|o| o.kind())
.unwrap_or("");
if !matches!(op, "===" | "==") {
return None;
}
let left = stripped.child_by_field_name("left")?;
let right = stripped.child_by_field_name("right")?;
let left = pp_unwrap_paren(left);
let right = pp_unwrap_paren(right);
if let (Some(lv), Some(rs)) = (text_of(left, code), pp_string_literal_value(right, code)) {
if matches!(left.kind(), "identifier" | "shorthand_property_identifier") {
return Some((lv, rs));
}
}
if let (Some(rv), Some(ls)) = (text_of(right, code), pp_string_literal_value(left, code)) {
if matches!(right.kind(), "identifier" | "shorthand_property_identifier") {
return Some((rv, ls));
}
}
None
}
/// Step 1 (`pre_emit_arg_source_nodes`): scan the AST, create Source nodes,
/// wire them to `preds`, and return (effective_preds, synth_bindings,
/// uses_only_synth_names).
@ -3682,6 +4250,21 @@ pub(super) fn build_sub<'a>(
Vec::new()
} else {
// Spring MVC `return "redirect:" + url` open-redirect
// synthetic-sink emission. When matched the synthetic
// call sequences between `preds` and the Return node.
let mut effective_preds: Vec<NodeIndex> = preds.to_vec();
if let Some(synth) = try_lower_spring_redirect_return(
ast,
&effective_preds,
g,
lang,
code,
enclosing_func,
call_ordinal,
) {
effective_preds = vec![synth];
}
let ret = push_node(
g,
StmtKind::Return,
@ -3692,7 +4275,7 @@ pub(super) fn build_sub<'a>(
0,
analysis_rules,
);
connect_all(g, preds, ret, EdgeKind::Seq);
connect_all(g, &effective_preds, ret, EdgeKind::Seq);
Vec::new() // terminates this path
}
}

@ -13,7 +13,7 @@ use tree_sitter::Node;
/// of `build_cfg`. Returns the [`TypeKind::Dto`] carrying the
/// per-field type map when the class is declared in the same file;
/// returns `None` otherwise so callers can fall through to the
/// pre-Phase-6 behaviour (Object / Unknown).
/// generic Object / Unknown classification.
fn lookup_dto_class(class_name: &str) -> Option<TypeKind> {
DTO_CLASSES.with(|cell| cell.borrow().get(class_name).cloned().map(TypeKind::Dto))
}
@ -27,7 +27,7 @@ fn lookup_dto_class(class_name: &str) -> Option<TypeKind> {
/// for the JS/TS object-pattern formal `({ a, b, c })`, the entry is
/// `("a", None, ["b", "c"])`. Strictly additive: when the param is
/// not a destructured pattern (or the language has no destructure
/// concept), behaviour is identical to the pre-Phase-5 names-only path.
/// concept), behaviour is identical to the names-only path.
///
/// Closes the residual gap behind CVE-2026-25544 (PayloadCMS Drizzle
/// SQL injection): a per-parameter taint probe that seeds only the

@ -49,6 +49,7 @@ impl Commands {
match self {
Commands::Scan { explain_engine, .. } => *explain_engine,
Commands::List { .. } => true,
Commands::Rules { .. } => true,
Commands::Config { action } => {
matches!(action, ConfigAction::Show { .. } | ConfigAction::Path)
}
@ -459,6 +460,12 @@ pub enum Commands {
action: ConfigAction,
},
/// Browse the built-in rule registry (cap classes + per-language label rules)
Rules {
#[command(subcommand)]
action: RulesAction,
},
/// Start the local web UI for browsing scan results
Serve {
/// Path to scan root (defaults to current directory)
@ -525,6 +532,36 @@ pub enum ConfigAction {
},
}
#[derive(Subcommand)]
pub enum RulesAction {
/// List built-in rules
List {
/// Filter by language slug (e.g. javascript, java, python). Cap-class
/// entries (`language = "all"`) are always shown unless `--no-class`
/// is set.
#[arg(long)]
lang: Option<String>,
/// Filter by rule kind (`class`, `source`, `sink`, `sanitizer`).
#[arg(long)]
kind: Option<String>,
/// Show only the cap-class registry entries (one per vulnerability
/// class), suppressing per-language label rules.
#[arg(long, conflicts_with = "no_class")]
class_only: bool,
/// Suppress cap-class registry entries (show only per-language label
/// rules and gated sinks).
#[arg(long)]
no_class: bool,
/// Emit JSON instead of the human-readable table.
#[arg(long)]
json: bool,
},
}
#[derive(Subcommand)]
pub enum IndexAction {
/// Build or update index for current project

@ -10,6 +10,7 @@ pub mod clean;
pub mod config;
pub mod index;
pub mod list;
pub mod rules;
pub mod scan;
#[cfg(feature = "serve")]
pub mod serve;
@ -352,6 +353,9 @@ pub fn handle_command(
}
}
}
Commands::Rules { action } => {
self::rules::handle(action, config)?;
}
Commands::Serve {
path,
port,

src/commands/rules.rs (new file, 248 lines)

@ -0,0 +1,248 @@
//! `nyx rules` subcommand.
//!
//! Surfaces the rule registry from the terminal so users can enumerate
//! the same content that the dashboard's `/api/rules` endpoint and the
//! browser's Rules page show. The output composes built-in cap-class
//! entries (one per `Cap` with a canonical rule id), per-language label
//! rules (sink/source/sanitizer), gated sinks, and any custom rules
//! defined in the user's config.
use crate::cli::RulesAction;
use crate::errors::NyxResult;
use crate::labels::{self, RuleInfo};
use crate::utils::config::{Config, RuleKind};
use console::style;
pub fn handle(action: RulesAction, config: &Config) -> NyxResult<()> {
match action {
RulesAction::List {
lang,
kind,
class_only,
no_class,
json: as_json,
} => list(
config,
lang.as_deref(),
kind.as_deref(),
class_only,
no_class,
as_json,
),
}
}
fn list(
config: &Config,
lang_filter: Option<&str>,
kind_filter: Option<&str>,
class_only: bool,
no_class: bool,
as_json: bool,
) -> NyxResult<()> {
let mut rules = labels::enumerate_builtin_rules();
// Apply disabled-rules overlay so the CLI matches the dashboard view.
for rule in &mut rules {
if config.analysis.disabled_rules.contains(&rule.id) {
rule.enabled = false;
}
}
// Append custom rules from config. Mirrors the projection in
// `src/server/routes/rules.rs::build_rule_list`.
for (cfg_lang, lang_cfg) in &config.analysis.languages {
let canonical = labels::canonical_lang(cfg_lang);
for cr in &lang_cfg.rules {
let kind_str = match cr.kind {
RuleKind::Source => "source",
RuleKind::Sanitizer => "sanitizer",
RuleKind::Sink => "sink",
};
let id = labels::custom_rule_id(canonical, kind_str, &cr.matchers);
let first = cr.matchers.first().map(|s| s.as_str()).unwrap_or("?");
let title = format!("{} (custom {})", first, kind_str);
let cap = cr.cap.to_cap();
let enabled = !config.analysis.disabled_rules.contains(&id);
rules.push(RuleInfo {
id,
title,
language: canonical.to_string(),
kind: kind_str.to_string(),
cap: labels::cap_to_name(cap).to_string(),
cap_bits: cap.bits(),
matchers: cr.matchers.clone(),
case_sensitive: cr.case_sensitive,
is_custom: true,
is_gated: false,
is_class: false,
emission_active: true,
enabled,
});
}
}
// Filter.
let lang_filter_canonical = lang_filter.map(labels::canonical_lang);
rules.retain(|r| {
if class_only && !r.is_class {
return false;
}
if no_class && r.is_class {
return false;
}
if let Some(want) = lang_filter_canonical {
// Cap-class entries (`language == "all"`) are language-agnostic;
// surface them alongside any language filter unless explicitly
// suppressed via `--no-class`.
if r.language != want && r.language != "all" {
return false;
}
}
if let Some(want) = kind_filter
&& !r.kind.eq_ignore_ascii_case(want)
{
return false;
}
true
});
if as_json {
let body = serde_json::to_string_pretty(&rules)
.map_err(|e| crate::errors::NyxError::Msg(format!("rules JSON serialise: {e}")))?;
println!("{body}");
return Ok(());
}
if rules.is_empty() {
println!("{}", style("(no rules match the supplied filters)").dim());
return Ok(());
}
// Header.
println!(
"{}",
style("Rules (built-in registry, per-language labels, and custom rules from config)")
.bold()
);
println!();
// Cap-class section first, distinct from per-language entries.
let class_rules: Vec<&RuleInfo> = rules.iter().filter(|r| r.is_class).collect();
if !class_rules.is_empty() {
println!(" {}", style("Vulnerability classes").cyan().bold());
for r in &class_rules {
print_class_row(r);
}
println!();
}
let builtin_label_rules: Vec<&RuleInfo> = rules
.iter()
.filter(|r| !r.is_class && !r.is_custom)
.collect();
if !builtin_label_rules.is_empty() {
println!(" {}", style("Built-in label rules").cyan().bold());
for r in &builtin_label_rules {
print_label_row(r);
}
println!();
}
let custom_rules: Vec<&RuleInfo> = rules.iter().filter(|r| r.is_custom).collect();
if !custom_rules.is_empty() {
println!(" {}", style("Custom rules (from config)").cyan().bold());
for r in &custom_rules {
print_label_row(r);
}
println!();
}
println!(
"{}",
style(format!(
"{} class · {} built-in label · {} custom · {} total",
class_rules.len(),
builtin_label_rules.len(),
custom_rules.len(),
rules.len()
))
.dim()
);
Ok(())
}
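The filter semantics of `list` are compact but easy to misread, in particular how cap-class entries interact with `--lang`. The sketch below restates the `retain` predicate over a reduced rule record. `Rule` and `keep` are hypothetical names, and the language canonicalisation done by `labels::canonical_lang` is deliberately left out.

```rust
/// Reduced stand-in for `RuleInfo` with only the filtered fields.
pub struct Rule {
    pub language: String,
    pub kind: String,
    pub is_class: bool,
}

/// Mirrors the retain predicate: returns true when the rule survives
/// the supplied filters.
pub fn keep(
    rule: &Rule,
    lang: Option<&str>,
    kind: Option<&str>,
    class_only: bool,
    no_class: bool,
) -> bool {
    if class_only && !rule.is_class {
        return false;
    }
    if no_class && rule.is_class {
        return false;
    }
    if let Some(want) = lang {
        // Cap-class entries use the language "all" and pass any
        // language filter.
        if rule.language != want && rule.language != "all" {
            return false;
        }
    }
    if let Some(want) = kind {
        // Kind matching ignores ASCII case.
        if !rule.kind.eq_ignore_ascii_case(want) {
            return false;
        }
    }
    true
}
```

One consequence worth noting: a `--kind sink` filter already drops cap-class entries, because their kind is `class`, so `--no-class` only changes the result when no kind filter (or `--kind class`) is given.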
fn print_class_row(r: &RuleInfo) {
let status = if r.enabled {
style("on ").green().to_string()
} else {
style("off").red().dim().to_string()
};
// Forward-declared classes (registered but not yet wired through
// `ast.rs::diag_for_finding`) carry a tag so users don't expect
// findings under the class id; live findings still surface under
// the legacy `taint-unsanitised-flow` rule id.
let tag = if r.emission_active {
String::new()
} else {
format!(" {}", style("(forward-declared)").yellow())
};
println!(
" {} {:<32} {} {}{}",
status,
style(&r.id).white().bold(),
style(format!("[{}]", r.cap)).dim(),
style(&r.title).dim(),
tag,
);
}
fn print_label_row(r: &RuleInfo) {
let status = if r.enabled {
style("on ").green().to_string()
} else {
style("off").red().dim().to_string()
};
let tag = if r.is_custom {
style(" custom").yellow().to_string()
} else if r.is_gated {
style(" gated").magenta().to_string()
} else {
String::new()
};
let matchers = r.matchers.join(", ");
println!(
" {} {:<10} {:<10} {:<14}{}{}",
status,
style(&r.language).cyan(),
style(&r.kind).white(),
style(&r.cap).dim(),
tag,
style(matchers).dim(),
);
}
#[cfg(test)]
mod tests {
use super::*;
use crate::utils::config::Config;
#[test]
fn list_runs_without_panic_default_config() {
let cfg = Config::default();
// Plain list, no filters.
list(&cfg, None, None, false, false, false).unwrap();
// Class-only.
list(&cfg, None, None, true, false, false).unwrap();
// JSON output.
list(&cfg, None, None, false, false, true).unwrap();
// Lang + kind filters.
list(&cfg, Some("javascript"), Some("sink"), false, true, false).unwrap();
}
}

@ -544,14 +544,14 @@ pub(crate) fn deduplicate_taint_flows(diags: &mut Vec<Diag>) {
id.starts_with(TAINT_BASE)
}
fn sink_cap_bits(d: &Diag) -> u16 {
fn sink_cap_bits(d: &Diag) -> u32 {
d.evidence.as_ref().map(|e| e.sink_caps).unwrap_or(0)
}
// Group candidates by (path, line, severity, sink_cap_bits). Only
// `taint-unsanitised-flow` rule IDs participate; findings with other
// bases (e.g. `js.code_exec.eval`) are left untouched per guardrails.
let mut groups: HashMap<(String, usize, Severity, u16), Vec<usize>> = HashMap::new();
let mut groups: HashMap<(String, usize, Severity, u32), Vec<usize>> = HashMap::new();
for (i, d) in diags.iter().enumerate() {
if is_taint_flow(&d.id) {
groups
@ -690,8 +690,8 @@ pub const SCC_UNCONVERGED_CROSS_FILE_NOTE_PREFIX: &str = "scc_unconverged:cross-
/// file set. Semantics match [`diff_cap_snapshots`], a key that
/// appears or disappears counts as changed.
fn changed_cap_keys_of(
before: &HashMap<crate::symbol::FuncKey, (u16, u16, u16, Vec<usize>)>,
after: &HashMap<crate::symbol::FuncKey, (u16, u16, u16, Vec<usize>)>,
before: &HashMap<crate::symbol::FuncKey, (u32, u32, u32, Vec<usize>)>,
after: &HashMap<crate::symbol::FuncKey, (u32, u32, u32, Vec<usize>)>,
) -> HashSet<crate::symbol::FuncKey> {
let mut changed = HashSet::new();
for (k, v_after) in after {
@ -971,10 +971,10 @@ fn run_topo_batches(
// with a 64-iter budget; the classifier only needs the tail.
let mut delta_trajectory: smallvec::SmallVec<[u32; 4]> = smallvec::SmallVec::new();
// Phase-B worklist: files to re-analyse in this iteration.
// SCC fixpoint worklist: files to re-analyse in this iteration.
// Initialised to the full batch so iteration 0 behaves like
// the pre-Phase-B implementation; subsequent iterations
// prune to files containing a caller of a changed summary.
// the unconditional re-analysis; subsequent iterations prune
// to files containing a caller of a changed summary.
//
// Storing `PathBuf` clones (matching how the rest of the
// SCC loop identifies files) so membership tests are cheap

@ -113,22 +113,22 @@ impl ConstValue {
// ── TypeSet ─────────────────────────────────────────────────────────────
/// Bitset over [`TypeKind`] variants (12 bits used of u16).
/// Bitset over [`TypeKind`] variants (19 bits used of u32).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct TypeSet(u16);
pub struct TypeSet(u32);
impl TypeSet {
/// All 12 type bits set, no type constraint (Top).
pub const TOP: Self = Self(0x0FFF);
/// All 19 type bits set, no type constraint (Top).
pub const TOP: Self = Self(0x0007_FFFF);
/// No type bits, unsatisfiable (Bottom).
pub const BOTTOM: Self = Self(0);
pub fn singleton(kind: &TypeKind) -> Self {
Self(1u16 << type_kind_index(kind))
Self(1u32 << type_kind_index(kind))
}
pub fn contains(&self, kind: &TypeKind) -> bool {
self.0 & (1u16 << type_kind_index(kind)) != 0
self.0 & (1u32 << type_kind_index(kind)) != 0
}
/// Meet (intersection): refine type knowledge.
@ -156,7 +156,7 @@ impl TypeSet {
/// Check if this set contains exactly one type matching the given kind.
pub fn is_singleton_of(&self, kind: &TypeKind) -> bool {
self.0 != 0 && self.0 == (1u16 << type_kind_index(kind))
self.0 != 0 && self.0 == (1u32 << type_kind_index(kind))
}
/// Return the TypeKind if this is a singleton set (exactly one type).
@ -186,12 +186,21 @@ fn type_kind_index(kind: &TypeKind) -> u32 {
TypeKind::LocalCollection => 12,
TypeKind::RequestBuilder => 13,
TypeKind::JpaCriteriaQuery => 14,
TypeKind::LdapClient => 15,
TypeKind::XPathClient => 16,
TypeKind::XmlParser => 17,
TypeKind::Template => 18,
// DTO types carry per-field structural info that the bitset
// domain can't represent. Collapse to Unknown so callers still
// see "any type possible" rather than crashing on an unhandled
// variant. Same-file/cross-file Dto-aware paths read the
// structured TypeKind directly, not via this index.
TypeKind::Dto(_) => 6,
// NullPrototypeObject is a JS-only sub-kind of Object used for
// flow-sensitive prototype-pollution suppression. The bitset
// domain has no dedicated slot for it, so it shares the Object
// index and singleton recovery still maps to a meaningful TypeKind.
TypeKind::NullPrototypeObject => 3,
}
}
@ -212,6 +221,10 @@ fn type_kind_from_index(idx: u32) -> Option<TypeKind> {
12 => Some(TypeKind::LocalCollection),
13 => Some(TypeKind::RequestBuilder),
14 => Some(TypeKind::JpaCriteriaQuery),
15 => Some(TypeKind::LdapClient),
16 => Some(TypeKind::XPathClient),
17 => Some(TypeKind::XmlParser),
18 => Some(TypeKind::Template),
_ => None,
}
}
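The effect of the wider bitset and of the shared `Object` slot can be checked on a reduced model. The sketch below keeps only four kinds, using the indices from the mapping above (3, 15, 18); `Kind`, `index`, and `bits` are simplified, hypothetical stand-ins for `TypeKind`, `type_kind_index`, and the private field.

```rust
/// Reduced stand-in for `TypeKind`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Object,
    LdapClient,
    Template,
    NullPrototypeObject,
}

/// Bit index per kind. `NullPrototypeObject` shares the `Object` slot.
fn index(kind: Kind) -> u32 {
    match kind {
        Kind::Object | Kind::NullPrototypeObject => 3,
        Kind::LdapClient => 15,
        Kind::Template => 18,
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TypeSet(u32);

impl TypeSet {
    /// Bits 0..=18 set: no type constraint.
    pub const TOP: Self = Self(0x0007_FFFF);

    pub fn singleton(kind: Kind) -> Self {
        Self(1u32 << index(kind))
    }

    pub fn contains(self, kind: Kind) -> bool {
        self.0 & (1u32 << index(kind)) != 0
    }

    /// Meet (intersection): refine type knowledge.
    pub fn meet(self, other: Self) -> Self {
        Self(self.0 & other.0)
    }

    pub fn bits(self) -> u32 {
        self.0
    }
}
```

Index 18 needs a bit that a `u16` does not have, which is why the backing integer had to widen. The shared slot also means a singleton built from `NullPrototypeObject` is indistinguishable from one built from `Object` in this domain.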
@ -801,7 +814,7 @@ pub struct PathEnv {
/// Per-key meet count for widening decisions.
meet_counts: SmallVec<[(SsaValue, u8); 8]>,
/// Refinement counter (bounded per block).
refine_count: u16,
refine_count: u32,
}
impl PathEnv {
@ -837,7 +850,7 @@ impl PathEnv {
if self.unsat {
return;
}
if self.refine_count >= MAX_REFINE_PER_BLOCK as u16 {
if self.refine_count >= MAX_REFINE_PER_BLOCK as u32 {
return; // bounded
}
let canonical = self.uf.find_immutable(v);
@ -860,7 +873,7 @@ impl PathEnv {
// but `refine_single` is also invoked directly from `assume_eq`,
// `assume_neq`, and a few internal sites. Large generated inputs
// (thousands of short statements on one line) can drive millions
// of calls and overflow a plain u16 `refine_count`. Saturate to
// of calls and overflow a plain u32 `refine_count`. Saturate to
// stay within bounds; the refinement pipeline is already
// idempotent past the cap, so saturation is semantically a no-op.
self.refine_count = self.refine_count.saturating_add(1);

@ -250,6 +250,31 @@ pub fn class_name_to_type_kind(name: &str) -> Option<TypeKind> {
// Java I/O supertypes (enables hierarchy fallback for subtypes)
| "InputStream" | "OutputStream" | "Reader" | "Writer" | "PrintWriter"
| "BufferedInputStream" | "BufferedOutputStream" => Some(TypeKind::FileHandle),
// JNDI / Spring LDAP directory-service types. Field- and method-typed
// declarations (`DirContext ctx = ...`, `LdapTemplate ldapTemplate;`)
// attach this fact to the receiver SSA value so type-qualified
// resolution rewrites `ctx.search(...)` → `LdapClient.search`.
"DirContext" | "LdapContext" | "InitialDirContext" | "InitialLdapContext"
| "LdapTemplate" => Some(TypeKind::LdapClient),
// JAXP XML parser instances. Field/local declarations like
// `DocumentBuilder builder = factory.newDocumentBuilder();` route
// through this map so the receiver SSA value carries
// `TypeKind::XmlParser` and the type-qualified
// `XmlParser.parse` rule fires on `builder.parse(...)`.
"DocumentBuilder" | "SAXParser" | "XMLReader" | "SAXBuilder" => {
Some(TypeKind::XmlParser)
}
// JAXP XPath instances. `XPath xpath = factory.newXPath();`
// routes through this map so the receiver carries
// `TypeKind::XPathClient`, enabling the type-qualified
// `XPathClient.evaluate` resolution and the resolver-binding
// suppression sidecar.
"XPath" | "XPathExpression" => Some(TypeKind::XPathClient),
// Apache FreeMarker `Template` declared receiver type. Routes
// `Template tpl = ...; tpl.process(model, out)` through
// type-qualified resolution to `Template.process`, the SSTI
// sink defined in `labels/java.rs`.
"Template" => Some(TypeKind::Template),
// Python qualified type names.
// Only covers raw lowered names from isinstance(). The lowering in lower.rs
// extracts the literal type text: isinstance(x, requests.Session) produces
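
The declared-type lookup added in this hunk feeds type-qualified resolution. The sketch below shows the idea under stated assumptions: the match arms are copied from the hunk, but the trimmed-down `TypeKind` enum and the `qualified_callee` helper are hypothetical and are not the engine's actual resolver.

```rust
/// Only the variants introduced in this hunk; the real enum has more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeKind {
    LdapClient,
    XmlParser,
    XPathClient,
    Template,
}

/// Maps a declared Java class name to a receiver type fact.
pub fn class_name_to_type_kind(name: &str) -> Option<TypeKind> {
    match name {
        "DirContext" | "LdapContext" | "InitialDirContext" | "InitialLdapContext"
        | "LdapTemplate" => Some(TypeKind::LdapClient),
        "DocumentBuilder" | "SAXParser" | "XMLReader" | "SAXBuilder" => {
            Some(TypeKind::XmlParser)
        }
        "XPath" | "XPathExpression" => Some(TypeKind::XPathClient),
        "Template" => Some(TypeKind::Template),
        _ => None,
    }
}

/// Hypothetical helper: rewrite `receiver.method` to `Kind.method` when the
/// receiver's declared type is known, so rules can match on the type rather
/// than on the variable name.
pub fn qualified_callee(declared_type: &str, method: &str) -> Option<String> {
    let prefix = match class_name_to_type_kind(declared_type)? {
        TypeKind::LdapClient => "LdapClient",
        TypeKind::XmlParser => "XmlParser",
        TypeKind::XPathClient => "XPathClient",
        TypeKind::Template => "Template",
    };
    Some(format!("{prefix}.{method}"))
}
```

With this shape, `DirContext ctx; ctx.search(...)` resolves to `LdapClient.search` regardless of what the variable is called, while receivers of unknown type yield `None` and fall back to flat matching.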

@ -225,7 +225,17 @@ pub mod index {
/// * `"3"`, `ssa_function_bodies.body` changed from JSON TEXT to
/// bincode BLOB. Old JSON payloads cannot be deserialised by the
/// new engine, so they are silently rebuilt on open.
pub const SCHEMA_VERSION: &str = "3";
/// * `"4"`, `Cap` widened from u16 to u32 to accommodate cap bits
/// ≥ 14 (LDAP_INJECTION, XPATH_INJECTION, HEADER_INJECTION,
/// OPEN_REDIRECT, SSTI, XXE, PROTOTYPE_POLLUTION). The `Cap`
/// deserialiser accepts both u16- and u32-width JSON values, so
/// pre-bump caches load without crashing, but the cached
/// `source_caps` / `sanitizer_caps` / `sink_caps` blobs were
/// produced before any of these caps could appear and would
/// underreport rules that emit them. Bumping forces a rescan so
/// newly-emitted gates and sinks land in the cache with the wider
/// footprint.
pub const SCHEMA_VERSION: &str = "4";
// TODO: ADD CLEANS FOR EACH TABLE BASED ON PROJECT WHICH RUNS ON CLEAN
// TODO: ADD DROP AND GIVE A CLI PARAMETER FOR DROP
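
The effect of bumping `SCHEMA_VERSION` can be sketched as a simple comparison at open time. This is an assumption about the general shape only: `OpenAction` and `action_for_stored_version` are hypothetical names, and the real index code may store and compare the version differently.

```rust
pub const SCHEMA_VERSION: &str = "4";

/// Hypothetical outcome of opening an existing index.
#[derive(Debug, PartialEq, Eq)]
pub enum OpenAction {
    /// Stored version matches; cached rows can be reused.
    Reuse,
    /// Missing or different version; drop cached rows and rescan.
    Rebuild,
}

/// Decide what to do with a cache given the version string it was written with.
pub fn action_for_stored_version(stored: Option<&str>) -> OpenAction {
    match stored {
        Some(v) if v == SCHEMA_VERSION => OpenAction::Reuse,
        _ => OpenAction::Rebuild,
    }
}
```

Under this sketch a cache written at version `"3"` decodes without error but is still rebuilt, which matches the stated reason for the bump: old cap blobs cannot contain any of the new bits.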
@ -2899,6 +2909,8 @@ fn make_test_callee_body(
type_facts: crate::ssa::type_facts::TypeFactResult {
facts: std::collections::HashMap::new(),
},
xml_parser_config: crate::ssa::xml_config::XmlParserConfigResult::default(),
xpath_config: crate::ssa::xpath_config::XPathConfigResult::default(),
alias_result: crate::ssa::alias::BaseAliasResult::empty(),
points_to: crate::ssa::heap::PointsToResult::empty(),
module_aliases: std::collections::HashMap::new(),
@ -3765,7 +3777,7 @@ fn metadata_table_survives_clear() {
/// receiver sentinel (`u32::MAX`), the container-element marker
/// (`<elem>`), and the `overflow` flag across serialise → store →
/// load → deserialise. This is the strict-additive contract for
/// pre-Phase-5 blobs (default-empty deserialises cleanly) and the
/// older blobs without field_points_to (default-empty deserialises cleanly) and the
/// completeness check for the W3 cross-call resolver.
#[test]
fn ssa_summaries_round_trip_preserves_field_points_to() {
@ -3840,15 +3852,15 @@ fn ssa_summaries_round_trip_preserves_field_points_to() {
assert!(!sum.field_points_to.overflow);
}
/// Pre-Phase-5 blob compatibility: a summary serialised without
/// Older blob compatibility: a summary serialised without
/// `field_points_to` deserialises with the empty default, no
/// migration needed because the field is `#[serde(default)]`.
#[test]
fn ssa_summaries_pre_phase5_blob_decodes_with_empty_field_points_to() {
fn ssa_summaries_legacy_blob_decodes_with_empty_field_points_to() {
use crate::summary::ssa_summary::SsaFuncSummary;
// Hand-craft JSON without the `field_points_to` key.
let pre_phase5_json = r#"{
let legacy_json = r#"{
"param_to_return": [],
"param_to_sink": [],
"source_caps": 0,
@ -3865,7 +3877,7 @@ fn ssa_summaries_pre_phase5_blob_decodes_with_empty_field_points_to() {
"return_path_facts": [],
"typed_call_receivers": []
}"#;
let sum: SsaFuncSummary = serde_json::from_str(pre_phase5_json).unwrap();
let sum: SsaFuncSummary = serde_json::from_str(legacy_json).unwrap();
assert!(
sum.field_points_to.is_empty(),
"missing field_points_to must default to empty",

@ -217,15 +217,15 @@ pub struct Evidence {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub symbolic: Option<SymbolicVerdict>,
/// Resolved sink capability bits (u16 from `Cap::bits()`).
/// Resolved sink capability bits (u32 from `Cap::bits()`).
///
/// Used by deduplication to distinguish findings that share a
/// `(path, line, severity)` key but target different sinks (e.g.
/// `sink_sql(x); sink_shell(x);` on the same line). 0 when the sink
/// caps could not be resolved at the CFG node (e.g. pure summary
/// resolution where the caller's sink node carries no label).
#[serde(default, skip_serializing_if = "is_zero_u16")]
pub sink_caps: u16,
#[serde(default, skip_serializing_if = "is_zero_cap_bits")]
pub sink_caps: u32,
/// Engine provenance notes attached to this finding (e.g. "worklist
/// iteration budget was hit before convergence"), propagated from
@ -243,7 +243,7 @@ pub struct Evidence {
pub data_exfil_field: Option<String>,
}
fn is_zero_u16(v: &u16) -> bool {
fn is_zero_cap_bits(v: &u32) -> bool {
*v == 0
}
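
Why `sink_caps` had to widen, and how it participates in deduplication, can be shown with plain integers. This is a sketch under stated assumptions: the bit positions below are hypothetical (the real `Cap` layout is not shown in this diff), and the dedup key is simplified to `(path, line, sink_caps)` with severity left out.

```rust
/// Hypothetical bit positions, for illustration only.
pub const SQL_QUERY: u32 = 1 << 0;
pub const LDAP_INJECTION: u32 = 1 << 16;

/// Widening a value written by an older u16-based build is lossless.
pub fn widen_legacy(bits: u16) -> u32 {
    u32::from(bits)
}

/// Same predicate as in the hunk: zero means "sink caps unresolved",
/// and the field is skipped on serialisation.
pub fn is_zero_cap_bits(v: &u32) -> bool {
    *v == 0
}

/// Simplified dedup key: two findings on the same line stay distinct
/// when they target different sink capabilities.
pub fn dedup_key(path: &str, line: u32, sink_caps: u32) -> (&str, u32, u32) {
    (path, line, sink_caps)
}
```

Any bit at position 16 or above cannot be represented in a `u16`, which is the motivation for changing the field type rather than reusing spare low bits.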

@ -67,6 +67,30 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::SSRF),
case_sensitive: false,
},
// ─── LDAP injection sinks ───
//
// OpenLDAP / libldap surface: `ldap_search_s(ld, base, scope, filter, ...)`
// and the extended synchronous variant `ldap_search_ext_s(ld, base, scope, filter,
// attrs, attrsonly, serverctrls, clientctrls, timeout, sizelimit, *res)`.
// The filter argument (position 3) is the LDAP-injection vector. No
// standard libldap escape helper exists in the C surface; sanitisation is
// typically caller-implemented (`sanitize_*` covers the developer-named
// case via the existing prefix rule above).
LabelRule {
matchers: &["ldap_search_s", "ldap_search_ext_s"],
label: DataLabel::Sink(Cap::LDAP_INJECTION),
case_sensitive: false,
},
// ─── XPath injection sinks ───
//
// libxml2 evaluation entry points: `xmlXPathEvalExpression(expr, ctx)`,
// `xmlXPathEval(expr, ctx)`, `xmlXPathCompile(expr)`. The expression
// string is arg 0 and is the canonical XPath-injection vector.
LabelRule {
matchers: &["xmlXPathEvalExpression", "xmlXPathEval", "xmlXPathCompile"],
label: DataLabel::Sink(Cap::XPATH_INJECTION),
case_sensitive: false,
},
];
/// Gated sinks for C.
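
For reference, the kind of caller-implemented filter sanitiser the LDAP comment alludes to follows the RFC 4515 escaping rules. The sketch below is written in Rust for consistency with the rest of this diff rather than in C, and `escape_ldap_filter` is a hypothetical name, not a function from libldap or from this codebase.

```rust
/// Escape the RFC 4515 special characters in an LDAP filter value.
/// Each special character is replaced by a backslash plus its two-digit hex code.
pub fn escape_ldap_filter(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '\\' => out.push_str("\\5c"),
            '*' => out.push_str("\\2a"),
            '(' => out.push_str("\\28"),
            ')' => out.push_str("\\29"),
            '\0' => out.push_str("\\00"),
            _ => out.push(ch),
        }
    }
    out
}
```

A value escaped this way can be embedded in a filter such as `(uid=<value>)` without the input being able to close the parenthesis or introduce a wildcard.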

@ -89,6 +89,24 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::SSRF),
case_sensitive: false,
},
// ─── LDAP injection sinks ───
//
// OpenLDAP / libldap C interface (also used from C++ wrappers): the filter
// argument carries attacker-controlled data unless explicitly escaped.
LabelRule {
matchers: &["ldap_search_s", "ldap_search_ext_s"],
label: DataLabel::Sink(Cap::LDAP_INJECTION),
case_sensitive: false,
},
// ─── XPath injection sinks ───
//
// libxml2 (the dominant C++ XML parser surface): `xmlXPathEvalExpression`,
// `xmlXPathEval`, `xmlXPathCompile` accept the expression string as arg 0.
LabelRule {
matchers: &["xmlXPathEvalExpression", "xmlXPathEval", "xmlXPathCompile"],
label: DataLabel::Sink(Cap::XPATH_INJECTION),
case_sensitive: false,
},
];
/// Gated sinks for C++.

@ -148,6 +148,97 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::CRYPTO),
case_sensitive: false,
},
// ─── LDAP injection sinks ───
//
// go-ldap (`github.com/go-ldap/ldap/v3`): `conn, _ := ldap.DialURL(url);
// req := ldap.NewSearchRequest(base, scope, deref, sizeLimit, timeLimit,
// typesOnly, filter, attrs, controls)`. The filter argument (position 6)
// is the LDAP-injection vector; passing the request to `conn.Search(req)`
// executes the filter. Type-qualified resolution rewrites `conn.Search`
// → `LdapClient.Search` when the receiver was returned by
// `ldap.DialURL` / `ldap.Dial` / `ldap.DialTLS` (see
// [`crate::ssa::type_facts::constructor_type`]). We also tag
// `ldap.NewSearchRequest` directly so taint reaching the filter argument
// surfaces at the construction call (matches the typical FP-free shape
// where the request is built once and passed straight to `Search`).
LabelRule {
matchers: &[
"LdapClient.Search",
"LdapClient.SearchWithPaging",
"ldap.NewSearchRequest",
],
label: DataLabel::Sink(Cap::LDAP_INJECTION),
case_sensitive: true,
},
// ─── LDAP-filter sanitizer ───
//
// go-ldap exposes `ldap.EscapeFilter(s string) string` (RFC 4515 metachar
// escaping). Treat any call as clearing the LDAP_INJECTION cap.
LabelRule {
matchers: &["ldap.EscapeFilter"],
label: DataLabel::Sanitizer(Cap::LDAP_INJECTION),
case_sensitive: true,
},
// ─── Header / CRLF injection sinks ───
//
// `net/http` `ResponseWriter.Header()` returns a `Header` map; calls to
// `Set(name, val)` / `Add(name, val)` write a single header value.
// After paren-group stripping the chain text becomes
// `w.Header.Set` / `w.Header.Add`, so suffix matchers on `Header.Set` /
// `Header.Add` cover both the bound-receiver form (`w.Header().Set(...)`)
// and the documentation-style class-qualified form (`Header.Set`).
// Tainted strings without `\r\n` stripping enable response splitting.
LabelRule {
matchers: &["Header.Set", "Header.Add"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: true,
},
// ─── Header / CRLF sanitizers ───
//
// Project-local `stripCRLF` / `escapeHeader` helpers that strip `\r` and
// `\n` from a value before it is written to a response header.
LabelRule {
matchers: &["stripCRLF", "stripCrlf", "escapeHeader", "sanitizeHeader"],
label: DataLabel::Sanitizer(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Open redirect sinks ───
//
// `net/http` `http.Redirect(w, r, url, code)` writes a `Location` header
// and a 3xx status from the supplied URL. Without an allowlist check,
// a tainted `url` is the canonical Go open-redirect vector.
LabelRule {
matchers: &["http.Redirect"],
label: DataLabel::Sink(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
LabelRule {
matchers: &[
"validateRedirectUrl",
"isSafeRedirect",
"stripScheme",
"ensureRelativeUrl",
"assertRelativePath",
"isRelativeUrl",
],
label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
// ─── SSTI sinks ───
//
// `text/template` and `html/template` parse a template source string via
// `template.New(name).Parse(src)`. After paren-group stripping the chain
// text becomes `template.New.Parse`, so the suffix matcher catches both
// packages (`text/template`, `html/template`) regardless of import alias.
// `template.ParseFiles` / `ParseGlob` take file paths (path-traversal,
// not SSTI) and are intentionally excluded. `html/template`'s auto-
// escaping applies during `Execute`, not `Parse`, so a tainted source
// string still yields SSTI.
LabelRule {
matchers: &["template.New.Parse"],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: false,
},
];
/// Argument-role-aware Go sinks. Two classes coexist on the outbound HTTP
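
Several Go rules above rely on paren-group stripping followed by suffix matching. The following is a rough sketch of that idea, not the engine's matcher: both function names are hypothetical, and the rule that a suffix must start at a `.` segment boundary is an assumption made here to keep the example precise.

```rust
/// Remove every parenthesised group from a call chain, so that
/// `w.Header().Set` becomes `w.Header.Set`.
pub fn strip_paren_groups(chain: &str) -> String {
    let mut out = String::with_capacity(chain.len());
    let mut depth = 0usize;
    for ch in chain.chars() {
        match ch {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            _ if depth == 0 => out.push(ch),
            _ => {} // inside a paren group: drop the character
        }
    }
    out
}

/// True when `matcher` equals `callee` or is a suffix of it that begins
/// at a `.` boundary (assumed semantics).
pub fn suffix_matches(callee: &str, matcher: &str, case_sensitive: bool) -> bool {
    let (c, m) = if case_sensitive {
        (callee.to_string(), matcher.to_string())
    } else {
        (callee.to_lowercase(), matcher.to_lowercase())
    };
    match c.strip_suffix(m.as_str()) {
        Some(rest) => rest.is_empty() || rest.ends_with('.'),
        None => false,
    }
}
```

Under these assumptions `w.Header().Set(...)` and `template.New(name).Parse(src)` both reduce to chain text that the `Header.Set` and `template.New.Parse` matchers can hit, independent of the receiver variable's name.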

View file

@ -1,4 +1,6 @@
use crate::labels::{Cap, DataLabel, Kind, LabelRule, ParamConfig, RuntimeLabelRule};
use crate::labels::{
Cap, DataLabel, GateActivation, Kind, LabelRule, ParamConfig, RuntimeLabelRule, SinkGate,
};
use crate::utils::project::{DetectedFramework, FrameworkContext};
use phf::{Map, phf_map};
@ -265,6 +267,223 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::CODE_EXEC),
case_sensitive: false,
},
// ─── LDAP injection sinks ───
//
// JNDI / Spring LDAP search APIs accept an attacker-influenceable filter
// expression as the second positional argument (`DirContext.search(name,
// filter, controls)` / `LdapTemplate.search(base, filter, mapper)`). Without
// RFC 4515 escaping the filter can be rewritten to bypass authentication or
// exfiltrate directory entries. Type-qualified resolution rewrites
// `ctx.search(...)` → `LdapClient.search` when the receiver carries a
// `TypeKind::LdapClient` fact (set by `class_name_to_type_kind` for the
// declared types `DirContext`, `InitialDirContext`, `LdapContext`,
// `LdapTemplate`, or by `constructor_type` for `new InitialDirContext(...)`
// / `new InitialLdapContext(...)`). Direct flat matchers cover the
// documentation-style class-qualified call forms that bypass receiver
// typing.
LabelRule {
matchers: &[
"LdapClient.search",
"LdapClient.searchByEntity",
"LdapClient.searchForObject",
"LdapClient.searchForContext",
"DirContext.search",
"LdapTemplate.search",
"LdapTemplate.searchByEntity",
"LdapTemplate.searchForObject",
"LdapTemplate.searchForContext",
"ctx.search",
],
label: DataLabel::Sink(Cap::LDAP_INJECTION),
case_sensitive: true,
},
// ─── LDAP-filter sanitizers ───
//
// Spring LDAP's `LdapEncoder.filterEncode(s)` applies RFC 4515 escaping to
// metacharacters (`\`, `*`, `(`, `)`, NUL). `nameEncode` performs the
// companion DN-component escaping. Both fully clear the LDAP_INJECTION
// cap; downstream sinks see a sanitised value.
LabelRule {
matchers: &["LdapEncoder.filterEncode", "LdapEncoder.nameEncode"],
label: DataLabel::Sanitizer(Cap::LDAP_INJECTION),
case_sensitive: true,
},
// ─── XPath injection sinks ───
//
// `javax.xml.xpath.XPath.evaluate(expr, source, ...)` and the matching
// `XPathExpression.evaluate(source)` accept an attacker-influenceable
// expression string. Without parameterisation via
// `XPathVariableResolver` the expression can be rewritten to bypass
// authentication or exfiltrate document subtrees. `XPath.compile(expr)`
// is the equivalent pre-compile entry point. Direct flat matchers cover
// the documentation-style class-qualified call forms.
LabelRule {
matchers: &[
"XPath.evaluate",
"XPath.compile",
"XPathExpression.evaluate",
"xpath.evaluate",
"xpath.compile",
],
label: DataLabel::Sink(Cap::XPATH_INJECTION),
case_sensitive: false,
},
// ─── XPath escape sanitizers ───
//
// OWASP ESAPI's `Encoder.encodeForXPath(s)` escapes the XPath
// metacharacters (`'`, `"`, `[`, `]`, `(`, `)`, `,`, `=`, `<`, `>`,
// `*`). Project-local `xpathEscape` / `escapeXpath` are the common
// developer-named equivalents.
LabelRule {
matchers: &["Encoder.encodeForXPath", "xpathEscape", "escapeXpath"],
label: DataLabel::Sanitizer(Cap::XPATH_INJECTION),
case_sensitive: false,
},
// Parameterised XPath via `XPath.setXPathVariableResolver(resolver)`
// suppression is implemented as a receiver-config sidecar in
// [`crate::ssa::xpath_config::XPathConfigResult`]: a
// `setXPathVariableResolver` call on a receiver carrying
// `TypeKind::XPathClient` flips the receiver's `has_resolver` flag,
// and the SSA sink-emission site strips `Cap::XPATH_INJECTION` from
// any later `xpath.evaluate(taintedExpr, ...)` whose receiver is
// provably bound. No flat sanitizer rule is needed (and a
// name-only rule would clear the wrong call site).
// ─── Header / CRLF injection sinks ───
//
// `HttpServletResponse.setHeader(name, val)` / `addHeader(name, val)`
// accept a single header value; tainted strings without `\r\n` stripping
// let an attacker inject extra headers (response splitting).
// `addCookie(c)` carries a `Cookie` whose constructor takes a value
// string; track at the higher-level setHeader / addHeader entry points.
LabelRule {
matchers: &["setHeader", "addHeader", "addCookie"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Header / CRLF sanitizers ───
LabelRule {
matchers: &["stripCRLF", "stripCrlf", "escapeHeader", "sanitizeHeader"],
label: DataLabel::Sanitizer(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Open redirect sinks ───
//
// Servlet API: `HttpServletResponse.sendRedirect(url)`. Spring MVC
// controllers can also return a `"redirect:"` prefixed string but that
// sink shape is not modelled here.
LabelRule {
matchers: &["sendRedirect"],
label: DataLabel::Sink(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
LabelRule {
matchers: &[
"validateRedirectUrl",
"isSafeRedirect",
"stripScheme",
"ensureRelativeUrl",
"assertRelativePath",
"isRelativeUrl",
],
label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
// ─── SSTI sinks ───
//
// Apache FreeMarker `Template.process(model, writer)` renders an
// already-parsed template; the SSTI vector is when the template source
// is attacker-influenced (e.g. `new Template(name, new StringReader(src), cfg)`).
// The flat matcher fires only when the receiver chain text resolves to
// `Template.process`, typically through a `Template`-typed declared
// receiver routed via type-qualified resolution (`TypeKind::Template`,
// mapped in `class_name_to_type_kind`). Receivers without a declared
// `Template` type are not recognised.
//
// Apache Velocity `Velocity.evaluate(ctx, writer, tag, src)` is modelled
// as a gated sink in `GATED_SINKS` below so only the template-source
// arg (index 3) activates SSTI; tainted variables in the `ctx` arg
// (data) stay clean.
LabelRule {
matchers: &["Template.process"],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: true,
},
// ─── XXE sinks ───
//
// Java's stock XML parsers (JAXP) are XXE-vulnerable by default: the
// factories ship with external-entity / DTD resolution enabled and only
// become safe after `setFeature(FEATURE_SECURE_PROCESSING, true)` /
// disabling `external-general-entities` / `external-parameter-entities`.
// Tainted XML reaching any of these parser entry points is treated as
// an XXE flow; a config-check sanitizer pass (Phase XXE Layer 2) is
// out of scope for this rule and is the follow-up listed in
// `.pitboss/play/deferred.md`.
//
// Class-qualified suffix matching covers both the documentation-style
// `javax.xml.parsers.DocumentBuilder.parse(...)` form and the bound-
// receiver `XmlParser.parse(...)` form (when the receiver's TypeKind
// resolves to `XmlParser`). Bare `parse` is intentionally avoided to
// prevent collisions with `Integer.parseInt`, `LocalDate.parse`,
// generic JSON parsers, etc.
LabelRule {
matchers: &[
"DocumentBuilder.parse",
"SAXParser.parse",
"XMLReader.parse",
"SAXBuilder.build",
"XmlParser.parse",
"XmlParser.build",
],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
},
// ─── XXE config-setter sanitizers ───
//
// Phase 07: a JAXP `setFeature(...)` / `setExpandEntityReferences(...)`
// call is itself a label-level Sanitizer for `Cap::XXE` so that the
// *call's return value* (rare but exists for fluent factory APIs)
// does not carry XXE through it. The real load-bearing suppression
// is the receiver-fact path in
// [`crate::ssa::xml_config::XmlParserConfigResult`], which the SSA
// sink emission consults at every parse-class sink site. This rule
// is conservative noise reduction for downstream sinks that consume
// the setter call's value.
LabelRule {
matchers: &[
"setFeature",
"setExpandEntityReferences",
"setXIncludeAware",
"setValidating",
],
label: DataLabel::Sanitizer(Cap::XXE),
case_sensitive: true,
},
];
/// Java gated sinks. Argument-position-aware classification for callees
/// where the SSTI activation is restricted to the template-source arg
/// rather than every positional argument.
pub static GATED_SINKS: &[SinkGate] = &[
// Apache Velocity static API: `Velocity.evaluate(ctx, writer, logTag, src)`.
// Arg 3 carries the inline template source; tainted text at that
// position is SSTI. Tainted data in the context (arg 0) is rendered
// through Velocity's escape policy, not parsed as template source, so
// those flows must not activate SSTI. Activation is unconditional;
// payload_args narrows the cap to the template-source position.
SinkGate {
callee_matcher: "Velocity.evaluate",
arg_index: 3,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: true,
payload_args: &[3],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
];
pub static KINDS: Map<&'static str, Kind> = phf_map! {
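
The argument-position gating described for `Velocity.evaluate` can be reduced to a small predicate. This is a simplified sketch, assuming exact callee matching and a precomputed list of tainted argument indices; the `Gate` struct and `gate_fires` function are hypothetical and omit most `SinkGate` fields.

```rust
/// Cut-down stand-in for `SinkGate`: only the callee and payload positions.
pub struct Gate {
    pub callee: &'static str,
    pub payload_args: &'static [usize],
}

/// Only the template-source argument (index 3) is a payload position.
pub const VELOCITY_EVALUATE: Gate = Gate {
    callee: "Velocity.evaluate",
    payload_args: &[3],
};

/// The gate fires when the callee matches and at least one tainted
/// argument index is listed in `payload_args`.
pub fn gate_fires(gate: &Gate, callee: &str, tainted_args: &[usize]) -> bool {
    callee == gate.callee
        && tainted_args.iter().any(|i| gate.payload_args.contains(i))
}
```

A call with tainted data only in the context argument (index 0) therefore stays quiet, while a tainted template source at index 3 is reported.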

View file

@ -310,6 +310,178 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::SQL_QUERY),
case_sensitive: true,
},
// ─── LDAP injection sinks ───
//
// `ldapjs`: both the bound-variable idiom
// `const client = ldap.createClient({...}); client.search(...)` and the
// chained idiom `ldap.createClient({...}).search(...)` are covered by
// type-qualified receiver resolution. The receiver of the inner call is
// typed `TypeKind::LdapClient` via `ssa::type_facts::constructor_type`,
// and (for the bound-variable form) closure-captured types are forwarded
// into the per-function type-fact result by
// [`crate::taint::inject_external_type_facts`], so the qualified callee
// text resolves to `LdapClient.search` in both shapes.
LabelRule {
matchers: &["LdapClient.search"],
label: DataLabel::Sink(Cap::LDAP_INJECTION),
case_sensitive: true,
},
// ─── LDAP-filter sanitizers ───
//
// The `ldap-escape` package exports `filter` and `dn` tagged-template
// helpers (`filter`\`(uid=${input})\``). After tree-sitter lifts the
// template-tag identifier, the callee text is the function name; suffix
// matching on `ldapEscape` / `ldapescape` covers `const ldapEscape =
// require('ldap-escape')` plus default-import shapes.
LabelRule {
matchers: &[
"ldapEscape",
"ldap-escape",
"ldapescape.filter",
"ldapescape.dn",
],
label: DataLabel::Sanitizer(Cap::LDAP_INJECTION),
case_sensitive: false,
},
// ─── XPath injection sinks ───
//
// `document.evaluate(expr, contextNode, ...)` (DOM) and the npm `xpath`
// package's `xpath.select(expr, doc)` / `xpath.evaluate(expr, doc, ...)`
// accept the expression string as arg 0; concatenated user input there
// is the canonical XPath-injection vector.
LabelRule {
matchers: &[
"document.evaluate",
"xpath.select",
"xpath.evaluate",
"xpath.select1",
],
label: DataLabel::Sink(Cap::XPATH_INJECTION),
case_sensitive: false,
},
// ─── XPath escape sanitizers ───
//
// No standard library helper escapes XPath metacharacters; project-local
// `escapeXpath` / `xpathEscape` are the developer-named equivalents.
LabelRule {
matchers: &["escapeXpath", "xpathEscape", "escape_xpath"],
label: DataLabel::Sanitizer(Cap::XPATH_INJECTION),
case_sensitive: false,
},
// ─── Header / CRLF injection sinks ───
//
// Express/Fastify/Node `http` response APIs that write a single header
// value: `res.setHeader(name, val)` (case-insensitive verb), `res.set`,
// `res.header`, `res.append`. Tainted strings here without `\r\n`
// stripping let an attacker inject extra headers (response splitting).
LabelRule {
matchers: &["setHeader", "res.set", "res.header", "res.append"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// Subscript-set form: `res.headers["X-Foo"] = bar` /
// `response.headers["X-Foo"] = bar`. The LHS-subscript classification
// path in `cfg/mod.rs::push_node` walks into the subscript's `object`
// and classifies its member-expression text, so the bare bracket form
// fires alongside `setHeader` / `res.set` / `res.header` / `res.append`.
LabelRule {
matchers: &["res.headers", "response.headers", "self.response.headers"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Header / CRLF sanitizers ───
//
// Project-local `stripCRLF` / `escapeHeader` helpers that strip `\r` and
// `\n` from a value before it is written to a response header.
LabelRule {
matchers: &["stripCRLF", "stripCrlf", "escapeHeader", "sanitizeHeader"],
label: DataLabel::Sanitizer(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Prototype pollution sinks (library-mediated) ───
//
// Recursive merge / deep-assign helpers from lodash / common bundles.
// Argument-role gating (target vs src) is enforced via Destination
// activation in `GATED_SINKS` below: only taint flowing into the
// source-object arguments (positions 1+) activates; tainted-target-
// only is benign because writes to a tainted target object don't
// pollute `Object.prototype`. Flat rules here are intentionally
// empty for the merge family; see GATED_SINKS for the per-call
// gating. `_.template` is excluded — it is handled separately as
// a gated CODE_EXEC sink (Strapi CVE-2023-22621 evaluate:false
// suppression).
// ─── Open redirect sinks ───
//
// Express response redirect: `res.redirect(url)`. Browser-side
// navigation: `location.replace` / `location.assign` fire as direct
// calls; `window.location = url` / `window.location.href = url` /
// `location.href = url` fire as assignment-LHS sinks via the
// `member_expr_text` classification path in `cfg::push_node`.
// `router.navigate` covers the Angular Router (`Router.navigate`,
// `Router.navigateByUrl`) and the React-Router `useNavigate`-returned
// `navigate` function; suffix matching catches both the bound-receiver
// and direct-call shapes.
LabelRule {
matchers: &[
"res.redirect",
"location.replace",
"location.assign",
"router.navigate",
"router.navigateByUrl",
"window.location",
"window.location.href",
"location.href",
],
label: DataLabel::Sink(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
// ─── Open-redirect URL allowlist sanitizers ───
//
// Project-local helpers that allowlist hosts or enforce relative-only
// URLs. `validateRedirectUrl` / `isSafeRedirect` are the canonical
// developer-named allowlist helpers; `stripScheme` clears any absolute
// scheme and degrades the URL to a relative path. `ensureRelativeUrl`
// / `assertRelativePath` cover the leading-slash / no-scheme idiom.
LabelRule {
matchers: &[
"validateRedirectUrl",
"isSafeRedirect",
"stripScheme",
"ensureRelativeUrl",
"assertRelativePath",
"isRelativeUrl",
],
label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
// ─── SSTI sinks ───
//
// Template-engine entry points that accept the template *source string*
// as the first argument: tainted arg 0 lets the attacker drive
// arbitrary template execution. `_.template` is excluded — it has
// its own gated CODE_EXEC classifier (Strapi CVE-2023-22621) that
// respects the `evaluate:false` opt-out. `nunjucks.renderString` is
// also excluded — see GATED_SINKS below for arg-0-only payload
// gating (suppresses tainted-`ctx`-only flows).
LabelRule {
matchers: &["Handlebars.compile"],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: false,
},
// ─── XXE sinks ───
//
// libxmljs `parseXmlString` / `parseXml` resolve external entities only
// when called with `{ noent: true }` or `{ replaceEntities: true }`;
// libxmljs's own default ignores entities. The flat rule treats any call
// as a sink regardless of options, so it is conservative here. xml2js is
// gated below in GATED_SINKS to suppress the safe-default case;
// fast-xml-parser is not modelled yet (see the note there).
LabelRule {
matchers: &["libxmljs.parseXmlString", "libxmljs.parseXml"],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
},
];
/// Callee patterns that must never be classified as source/sanitizer/sink.
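
The interplay between the sanitizer and sink rules above can be pictured as bit operations on a capability mask, alongside what a project-local `stripCRLF` helper typically does. Everything here is a hedged sketch: the bit positions, `apply_sanitizer`, `sink_fires`, and `strip_crlf` are hypothetical and are not taken from the engine.

```rust
/// Hypothetical bit positions, for illustration only.
pub const HEADER_INJECTION: u32 = 1 << 0;
pub const OPEN_REDIRECT: u32 = 1 << 1;

/// A sanitizer clears the capabilities it covers from the taint mask.
pub fn apply_sanitizer(tainted: u32, cleared: u32) -> u32 {
    tainted & !cleared
}

/// A sink fires when the value still carries one of the sink's capabilities.
pub fn sink_fires(tainted: u32, sink: u32) -> bool {
    (tainted & sink) != 0
}

/// What a `stripCRLF`-style helper does at runtime: drop CR and LF so the
/// value cannot terminate the current header line.
pub fn strip_crlf(value: &str) -> String {
    value.chars().filter(|c| *c != '\r' && *c != '\n').collect()
}
```

In this model a value passed through a header sanitizer no longer triggers `setHeader`, yet still triggers `res.redirect`, because only the `HEADER_INJECTION` bit was cleared.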
@ -420,6 +592,33 @@ pub static GATED_SINKS: &[SinkGate] = &[
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// ── XML XXE gates ─────────────────────────────────────────────────────
//
// `xml2js.parseString(xml, opts, cb)` is XXE-safe by default; opts
// `{ explicitChildren: true, charkey: '__cdata' }` are benign, but
// resolving entities at the underlying sax-js layer requires user
// intent. The gate fires only when the option object literal carries
// an entity-resolution kwarg with a truthy value (or is dynamic). Only
// the XML payload (arg 0) is the protected position.
SinkGate {
callee_matcher: "xml2js.parseString",
arg_index: 1,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[
("processEntities", &["true"]),
("explicitEntities", &["true"]),
("strict", &["false"]),
],
activation: GateActivation::ValueMatch,
},
// Note: `fast-xml-parser` (`new XMLParser({...}).parse(xml)`) is XXE-safe
// by default; flagging it would require constructor-option tracking via
// TypeFacts (XmlParser type with config carry). Deferred to Layer 2.
// ── Outbound HTTP clients (SSRF) ──────────────────────────────────────
//
// Policy: SSRF fires only when taint reaches the destination-bearing
@ -797,6 +996,282 @@ pub static GATED_SINKS: &[SinkGate] = &[
object_destination_fields: &[],
},
},
// `nunjucks.renderString(src, ctx)` — Nunjucks SSTI sink. Only the
// template *source* (arg 0) lets an attacker drive template execution;
// the `ctx` data object (arg 1) is rendered via the template's escape
// policy and is not itself a code-injection vector. Gate via
// Destination-style activation with `payload_args: &[0]` so taint
// flowing only into `ctx` is suppressed.
SinkGate {
callee_matcher: "nunjucks.renderString",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: false,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// ── Prototype pollution gates ────────────────────────────────────────
//
// Library-mediated recursive merge / deep-assign helpers. Argument-
// role gating: `(target, src1, src2, ...)` — only taint reaching a
// *source* position (index 1+) can pollute `Object.prototype` via
// `__proto__` / `constructor` keys on attacker-controlled input.
// Tainted target alone is benign (it just mutates that object).
// `payload_args: &[1, 2, 3, 4, 5]` covers the canonical 1-target +
// up-to-5-source signatures used by lodash / Object.assign / jQuery
// extend; arity beyond 5 is rare in practice and would over-suppress
// only at the long tail.
SinkGate {
callee_matcher: "_.merge",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "_.mergeWith",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "_.defaultsDeep",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// `_.set(obj, path, value)` — both `path` (arg 1) and `value` (arg 2)
// can drive prototype pollution: a tainted path of `__proto__.foo`
// mutates `Object.prototype`, and a tainted value into `obj.__proto__`
// does the same. Object (arg 0) is the canonical target.
SinkGate {
callee_matcher: "_.set",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "_.setWith",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// Generic project-local deep-merge helpers. Suffix-matched so any
// `*.deepMerge` / `*.defaultsDeep` qualified call also resolves.
SinkGate {
callee_matcher: "deepMerge",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "defaultsDeep",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// `Object.assign(target, ...sources)` is safe with constant-literal
// sources (`{a: 1, b: 2}`) but dangerous with attacker-controlled
// input (`req.body`). The target (arg 0) is kept out of `payload_args`
// so that a tainted target alone does not fire.
SinkGate {
callee_matcher: "Object.assign",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// jQuery / Zepto `$.extend(target, ...sources)` and `jQuery.extend`.
// Arg 0 may be a deep-flag boolean (`true`) when the deep-merge form
// is in use, in which case the target moves to arg 1 and sources start
// at arg 2. Cover both shapes by listing args 1, 2, 3, 4 in
// `payload_args`: the literal `true` flag at arg 0 never carries taint,
// and for the shallow `$.extend(target, src)` form, src at arg 1 still fires.
SinkGate {
callee_matcher: "$.extend",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2, 3, 4],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "jQuery.extend",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2, 3, 4],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// Bare `extend` (suffix-matched) for jQuery's deep form imported as a
// bound name: `const { extend } = require('jquery'); extend(true, t, s)`.
// Suffix `extend` would over-fire on Backbone's `Model.extend(proto)` /
// `View.extend({...})` class-extension idiom, so this gate uses
// `LiteralOnly` activation: it fires only when arg 0 is the literal
// boolean `true` (the deep-flag form, never used by Backbone subclassing).
// Sources start at arg 2 because arg 0 is the flag and arg 1 is the
// target; tainting the target alone is benign.
SinkGate {
callee_matcher: "extend",
arg_index: 0,
dangerous_values: &["true"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::LiteralOnly,
},
// `set-value` standalone helper: `setValue(obj, key, val)` — historic
// CVE-2019-10747 (set-value <2.0.1) and CVE-2021-23440 (set-value <4.0.1)
// recursive set-by-path helper that did not block `__proto__` keys.
// Suffix-matched so qualified imports (`require('set-value')`) bound to
// `setValue` still resolve.
SinkGate {
callee_matcher: "setValue",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// `dot-prop` standalone helper: `dotProp.set(obj, path, val)` —
// CVE-2020-8116. Path is a dotted-string with prototype-key support;
// a tainted `path` of `__proto__.x` mutates Object.prototype.
SinkGate {
callee_matcher: "dotProp.set",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// `JSONPath` / `jsonpath-plus` `JSONPath({path: p, json: o, callback: fn})`
// historically supported a `resultType: 'value'` mode that, combined with
// `parent`/`parentProperty` writes inside the callback, can mutate the
// prototype chain. Recognise the `jp.set(obj, path, value)` family
// (jsonpath, jsonpath-plus) on the same shape as `_.set`.
SinkGate {
callee_matcher: "jp.set",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "jsonpath.set",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
];
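// A minimal sketch of how a `LiteralOnly` gate such as the bare `extend`
// entry above is meant to behave, compared with the conservative fallback
// used by other literal-inspecting gates. `Activation`, `Outcome`, and
// `gate_outcome` are hypothetical stand-ins written for illustration; they
// are not the crate's types, and the handling of a known but non-matching
// literal is an assumption.

```rust
/// Hypothetical stand-in for the two activation modes contrasted here.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Activation {
    ValueMatch,
    LiteralOnly,
}

/// What the sketch decides for one call site.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Gate does not fire.
    Suppressed,
    /// Gate fires on the gate's declared payload positions.
    Fires(Vec<usize>),
    /// Conservative fallback: every positional argument is payload.
    FiresAllArgs,
}

/// `activation_arg` is the literal text of the activation argument when it
/// is statically known, or `None` when it is dynamic or unknown.
fn gate_outcome(
    activation: Activation,
    dangerous_values: &[&str],
    payload_args: &[usize],
    activation_arg: Option<&str>,
) -> Outcome {
    match activation_arg {
        // Positive literal evidence: fire on the declared payload args.
        Some(lit) if dangerous_values.contains(&lit) => Outcome::Fires(payload_args.to_vec()),
        // Known literal that is not dangerous (assumed to suppress).
        Some(_) => Outcome::Suppressed,
        // Unknown activation arg: `LiteralOnly` opts out of the fallback.
        None if activation == Activation::LiteralOnly => Outcome::Suppressed,
        None => Outcome::FiresAllArgs,
    }
}
```

// Under these assumptions `extend(true, target, src)` fires on args 2..=5,
// while a call whose first argument is not a known literal is suppressed
// instead of falling back to the all-args payload.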
pub static KINDS: Map<&'static str, Kind> = phf_map! {

@ -66,6 +66,17 @@ pub enum GateActivation {
/// selects which attribute is being set) and `parseFromString` (activation
/// arg selects the MIME type).
ValueMatch,
/// Strict literal-value activation. The gate fires only when the
/// activation arg is a literal that matches `dangerous_values` /
/// `dangerous_prefixes`. Unknown/dynamic activation arg suppresses
/// (no conservative ALL_ARGS_PAYLOAD push).
///
/// Used for ambiguously-named matchers where the dangerous shape is
/// only identifiable by an explicit literal flag — e.g. bare `extend`
/// where the deep-merge form is `extend(true, target, src)` but
/// Backbone's `Model.extend({proto})` shares the suffix. Conservative
/// fallback would over-fire on the class-extension form.
LiteralOnly,
/// Destination-bearing flow activation. The gate fires when taint reaches
/// a declared destination location at the call site, no literal
/// inspection, no prefix heuristic.
@ -156,53 +167,83 @@ bitflags! {
/// In practice: a finding fires when a tainted value reaches a sink and
/// `(value_caps & sink_caps) != 0`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cap: u16 {
pub struct Cap: u32 {
/// Taint that originated from an environment variable read.
/// Used as a source-origin marker for env-injection rules.
const ENV_VAR = 0b0000_0000_0000_0001; // bit 0
const ENV_VAR = 1 << 0;
/// Sanitizer: the value has passed through HTML entity escaping.
/// Strips XSS risk from values that reach HTML output sinks.
const HTML_ESCAPE = 0b0000_0000_0000_0010; // bit 1
const HTML_ESCAPE = 1 << 1;
/// Sanitizer: the value has been shell-argument escaped.
/// Strips command-injection risk before shell sinks.
const SHELL_ESCAPE = 0b0000_0000_0000_0100; // bit 2
const SHELL_ESCAPE = 1 << 2;
/// Sanitizer: the value has been percent-encoded for use in a URL.
const URL_ENCODE = 0b0000_0000_0000_1000; // bit 3
const URL_ENCODE = 1 << 3;
/// Sanitizer: the value was parsed through a structured JSON decoder
/// (as opposed to `eval`-based or regex parsing).
const JSON_PARSE = 0b0000_0000_0001_0000; // bit 4
const JSON_PARSE = 1 << 4;
/// Sink: file system read or write operation (path traversal, arbitrary
/// file read/write).
const FILE_IO = 0b0000_0000_0010_0000; // bit 5
const FILE_IO = 1 << 5;
/// Sink: format string injection (e.g. `printf`-family, `String.format`).
const FMT_STRING = 0b0000_0000_0100_0000; // bit 6
const FMT_STRING = 1 << 6;
/// Sink: SQL query construction. Fires for string-concatenated queries
/// and parameterized-query builders where the query text itself is tainted.
const SQL_QUERY = 0b0000_0000_1000_0000; // bit 7
const SQL_QUERY = 1 << 7;
/// Sink: unsafe object deserialization (Java `ObjectInputStream`,
/// Python `pickle`, Ruby `Marshal`, PHP `unserialize`, etc.).
const DESERIALIZE = 0b0000_0001_0000_0000; // bit 8
const DESERIALIZE = 1 << 8;
/// Sink: server-side request forgery. Fires when attacker-controlled
/// data reaches the destination URL of an outbound HTTP request.
const SSRF = 0b0000_0010_0000_0000; // bit 9
const SSRF = 1 << 9;
/// Sink: code or command execution (shell injection, `eval`, `exec`,
/// dynamic `require`/`import`, template injection).
const CODE_EXEC = 0b0000_0100_0000_0000; // bit 10
const CODE_EXEC = 1 << 10;
/// Sink: cryptographic operation with a tainted algorithm name or seed
/// (weak-crypto / predictable-randomness patterns).
const CRYPTO = 0b0000_1000_0000_0000; // bit 11
const CRYPTO = 1 << 11;
/// Request-bound, caller-supplied identifier that has not yet been
/// validated against an ownership/membership check. Used as the
/// carrier cap for folding `auth_analysis` into the SSA/taint
/// engine.
const UNAUTHORIZED_ID = 0b0001_0000_0000_0000; // bit 12
const UNAUTHORIZED_ID = 1 << 12;
/// Cross-boundary data-exfiltration: tainted sensitive data flowing
/// into outbound request bodies, headers, or other payload-bearing
/// fields of network egress APIs. Distinct from `SSRF` (attacker
/// control over the destination URL), `DATA_EXFIL` fires when the
/// destination is fixed but attacker-influenced data leaves the
/// process via the request payload.
const DATA_EXFIL = 0b0010_0000_0000_0000; // bit 13
const DATA_EXFIL = 1 << 13;
/// Sink: LDAP search/query construction. Fires when attacker-controlled
/// data reaches a directory-service filter or DN argument without
/// LDAP-filter escaping.
const LDAP_INJECTION = 1 << 14;
/// Sink: XPath expression construction. Fires when attacker-controlled
/// data is concatenated into an XPath query rather than passed via
/// XPath variable bindings.
const XPATH_INJECTION = 1 << 15;
/// Sink: HTTP response header value (or any CRLF-sensitive output).
/// Fires when attacker-controlled data lands in a `Set-Header` /
/// header-add call without `\r\n` stripping (response splitting).
const HEADER_INJECTION = 1 << 16;
/// Sink: redirect / `Location` header destination. Fires when an
/// attacker-controlled URL reaches a redirect call without an
/// allowlist or relative-URL check.
const OPEN_REDIRECT = 1 << 17;
/// Sink: server-side template injection. Fires when the **template
/// source string** itself is attacker-controlled (e.g.
/// `Template(user_input).render()`), distinct from rendering a
/// trusted template with tainted variables.
const SSTI = 1 << 18;
/// Sink: XML external entity resolution. Fires when attacker-controlled
/// XML reaches a parser configured to resolve external entities (or
/// missing the secure-processing feature).
const XXE = 1 << 19;
/// Sink: prototype pollution. Fires when an attacker-controlled key
/// reaches an object property assignment that can mutate
/// `Object.prototype` (`__proto__`, `constructor.prototype`, deep-merge
/// helpers).
const PROTOTYPE_POLLUTION = 1 << 20;
}
}
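// A small sketch of the firing rule quoted in the doc comment above,
// `(value_caps & sink_caps) != 0`, and of why the backing integer had to
// grow past 16 bits. The constants reuse the bit positions from this diff
// as plain `u32` values; `finding_fires` and `fits_in_u16` are hypothetical
// helper names, not functions from the crate.

```rust
const DATA_EXFIL: u32 = 1 << 13;
const HEADER_INJECTION: u32 = 1 << 16;
const PROTOTYPE_POLLUTION: u32 = 1 << 20;

/// A finding fires when the value's caps and the sink's caps share a bit.
fn finding_fires(value_caps: u32, sink_caps: u32) -> bool {
    (value_caps & sink_caps) != 0
}

/// True when `bits` could still be stored in the old 16-bit representation.
fn fits_in_u16(bits: u32) -> bool {
    u16::try_from(bits).is_ok()
}
```

// Bits 0 through 13 fit the previous width; the caps added at bit 16 and
// above do not, which is why they cannot be represented in a `u16`.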
@ -214,14 +255,18 @@ impl Default for Cap {
impl serde::Serialize for Cap {
fn serialize<S: serde::Serializer>(&self, s: S) -> Result<S::Ok, S::Error> {
s.serialize_u16(self.bits())
s.serialize_u32(self.bits())
}
}
impl<'de> serde::Deserialize<'de> for Cap {
fn deserialize<D: serde::Deserializer<'de>>(d: D) -> Result<Self, D::Error> {
let bits = u16::deserialize(d)?;
Ok(Cap::from_bits_truncate(bits))
// Accept any unsigned integer width (existing JSON written with the
// u16 representation must continue to deserialise into the widened
// u32 cap field). serde-json hands these through `deserialize_u64`;
// the truncating cast preserves all currently-defined cap bits.
let bits = u64::deserialize(d)?;
Ok(Cap::from_bits_truncate(bits as u32))
}
}
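// A dependency-free sketch of the widening decode described above: read
// the widest unsigned integer, narrow to 32 bits, and drop undefined bits.
// `DEFINED_BITS` and `decode_caps` are hypothetical names; the mask stands
// in for what `Cap::from_bits_truncate` does with the 21 bits (0 through
// 20) declared in this diff.

```rust
/// Mask covering bits 0 through 20, the caps declared in this diff.
const DEFINED_BITS: u32 = (1 << 21) - 1;

/// Stand-in for `Cap::from_bits_truncate(bits as u32)`.
fn decode_caps(raw: u64) -> u32 {
    // The cast keeps the low 32 bits; the mask discards undefined bits.
    (raw as u32) & DEFINED_BITS
}
```

// A value written by the old `u16` encoder decodes unchanged, and anything
// above the defined range is silently dropped instead of failing.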
@ -370,16 +415,46 @@ static GATED_REGISTRY: Lazy<HashMap<&'static str, &'static [SinkGate]>> = Lazy::
m.insert("js", javascript::GATED_SINKS);
m.insert("typescript", typescript::GATED_SINKS);
m.insert("ts", typescript::GATED_SINKS);
m.insert("python", python::GATED_SINKS);
m.insert("py", python::GATED_SINKS);
// Python prototype-pollution gates are opt-in: `dict.update(target,
// src)` overlaps too broadly with non-pollution use of `update`
// (Counter, namespaced state mutation) to ship as a default sink.
// The `NYX_PYTHON_PROTO_POLLUTION` env var enables them; when set
// the merged slice is leaked into a `'static` reference so the
// registry's lifetime invariant holds.
let python_gates: &'static [SinkGate] = if env_python_proto_pollution() {
let mut combined: Vec<SinkGate> = python::GATED_SINKS.to_vec();
combined.extend_from_slice(python::PROTO_POLLUTION_GATES);
Box::leak(combined.into_boxed_slice())
} else {
python::GATED_SINKS
};
m.insert("python", python_gates);
m.insert("py", python_gates);
m.insert("go", go::GATED_SINKS);
m.insert("php", php::GATED_SINKS);
m.insert("c", c::GATED_SINKS);
m.insert("cpp", cpp::GATED_SINKS);
m.insert("c++", cpp::GATED_SINKS);
m.insert("ruby", ruby::GATED_SINKS);
m.insert("rb", ruby::GATED_SINKS);
m.insert("java", java::GATED_SINKS);
m.insert("rust", rust::GATED_SINKS);
m.insert("rs", rust::GATED_SINKS);
m
});
/// Feature flag for the Python prototype-pollution gates. Disabled by
/// default; set `NYX_PYTHON_PROTO_POLLUTION` to `1`, `true`, `TRUE`, `yes`,
/// or `on` (matched exactly) to enable `dict.update` / `__dict__.update`
/// proto-pollution detection.
fn env_python_proto_pollution() -> bool {
matches!(
std::env::var("NYX_PYTHON_PROTO_POLLUTION").ok().as_deref(),
Some("1") | Some("true") | Some("TRUE") | Some("yes") | Some("on")
)
}
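// The same match, sketched as a pure function so the accepted spellings
// are easy to check without touching the process environment.
// `flag_enabled` is a hypothetical helper name introduced for illustration.

```rust
/// Mirrors the `matches!` arms above for an already-read variable value.
fn flag_enabled(raw: Option<&str>) -> bool {
    matches!(
        raw,
        Some("1") | Some("true") | Some("TRUE") | Some("yes") | Some("on")
    )
}
```

// The comparison is exact, so spellings such as `True` or `YES` leave the
// gates disabled, as does an unset variable.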
/// Per-language exclusion patterns: callee text that must never be classified.
static EXCLUDES: Lazy<HashMap<&'static str, &'static [&'static str]>> = Lazy::new(|| {
let mut m = HashMap::new();
@ -725,6 +800,13 @@ pub fn parse_cap(s: &str) -> Option<Cap> {
"crypto" => Some(Cap::CRYPTO),
"unauthorized_id" => Some(Cap::UNAUTHORIZED_ID),
"data_exfil" | "data_exfiltration" => Some(Cap::DATA_EXFIL),
"ldap_injection" | "ldapi" => Some(Cap::LDAP_INJECTION),
"xpath_injection" | "xpathi" => Some(Cap::XPATH_INJECTION),
"header_injection" | "crlf" | "response_splitting" => Some(Cap::HEADER_INJECTION),
"open_redirect" | "redirect" => Some(Cap::OPEN_REDIRECT),
"ssti" | "template_injection" => Some(Cap::SSTI),
"xxe" => Some(Cap::XXE),
"prototype_pollution" | "proto_pollution" => Some(Cap::PROTOTYPE_POLLUTION),
"all" => Some(Cap::all()),
_ => None,
}
@ -1274,7 +1356,15 @@ pub fn classify_gated_sink(
// where `userAttr` is user-controlled) is itself a vulnerability
// path. Return ALL_ARGS_PAYLOAD so downstream sink scanning
// considers every positional argument.
//
// `LiteralOnly` opts out of this conservative branch: the gate
// requires positive literal evidence to fire, so unknown
// activation suppresses entirely (avoids false positives on
// ambiguously-named suffix matchers like bare `extend`).
None => {
if matches!(gate.activation, GateActivation::LiteralOnly) {
continue;
}
out.push(GateMatch {
label: gate.label,
payload_args: ALL_ARGS_PAYLOAD,
@ -1396,10 +1486,283 @@ pub fn cap_to_name(cap: Cap) -> &'static str {
Cap::CODE_EXEC => "code_exec",
Cap::CRYPTO => "crypto",
Cap::UNAUTHORIZED_ID => "unauthorized_id",
Cap::DATA_EXFIL => "data_exfil",
Cap::LDAP_INJECTION => "ldap_injection",
Cap::XPATH_INJECTION => "xpath_injection",
Cap::HEADER_INJECTION => "header_injection",
Cap::OPEN_REDIRECT => "open_redirect",
Cap::SSTI => "ssti",
Cap::XXE => "xxe",
Cap::PROTOTYPE_POLLUTION => "prototype_pollution",
_ => "unknown",
}
}
// ── Cap rule registry ────────────────────────────────────────────────────
//
// Static, single-source-of-truth metadata table keyed by [`Cap`]. Every
// vulnerability class with its own canonical rule id appears here; the
// per-language `RULES` arrays only carry the language-specific match shapes.
// Sink-cap fields on a finding (or `Cap::DATA_EXFIL` carried alongside) feed
// `cap_rule_meta()` to pick the rule id surfaced to SARIF, the dashboard,
// and `enumerate_builtin_rules()` for `nyx rules list`.
/// Static metadata for one cap-defined vulnerability class.
#[derive(Debug, Clone, Copy)]
pub struct CapRuleMeta {
pub cap: Cap,
/// Canonical rule id surfaced by finding emission (no source-suffix).
pub rule_id: &'static str,
/// Display title for `nyx rules list` and dashboard.
pub title: &'static str,
pub severity: crate::patterns::Severity,
/// OWASP 2021 code (e.g. `"A03"`).
pub owasp_code: &'static str,
/// OWASP 2021 long label (e.g. `"Injection"`).
pub owasp_label: &'static str,
pub description: &'static str,
/// `false` only for caps gated behind a config flag (e.g.
/// `Cap::UNAUTHORIZED_ID`, which still defers to the standalone
/// `auth_analysis` subsystem unless `enable_auth_as_taint` is on).
pub default_enabled: bool,
/// Whether the diag-id emission path in `ast.rs` actually surfaces
/// findings under [`Self::rule_id`]. When `false`, sink findings
/// for this cap currently surface under the legacy
/// `taint-unsanitised-flow` id (the per-language family-token
/// dispatch in [`crate::server::owasp::owasp_bucket_for`] still
/// buckets them correctly). Dashboards and `nyx rules list` consume
/// this flag to decide whether to surface the synthetic class entry
/// alongside live findings or hide it as forward-declared.
///
/// Migrating a cap from `false` → `true` requires adding it to the
/// cap-specific routing list in `ast.rs::diag_for_finding`; tests
/// that pin the legacy `taint-unsanitised-flow` rule id for that
/// cap must be updated to the cap-specific id.
pub emission_active: bool,
}
/// Registry of cap-class metadata. Keyed in cap-bit order so additions
/// stay clustered with their bitflag declarations.
pub static CAP_RULE_REGISTRY: &[CapRuleMeta] = &[
CapRuleMeta {
cap: Cap::FILE_IO,
rule_id: "taint-path-traversal",
title: "Path Traversal / Arbitrary File Access",
severity: crate::patterns::Severity::High,
owasp_code: "A01",
owasp_label: "Broken Access Control",
description: "Attacker-controlled data flows into a filesystem path without canonicalisation \
or root-confinement, allowing reads or writes outside the intended directory.",
default_enabled: true,
emission_active: false,
},
CapRuleMeta {
cap: Cap::FMT_STRING,
rule_id: "taint-format-string",
title: "Format String Injection",
severity: crate::patterns::Severity::High,
owasp_code: "A03",
owasp_label: "Injection",
description: "Attacker-controlled data is used as a format string argument (printf-family, \
String.format) and can leak memory or crash the process.",
default_enabled: true,
emission_active: false,
},
CapRuleMeta {
cap: Cap::SQL_QUERY,
rule_id: "taint-sql-injection",
title: "SQL Injection",
severity: crate::patterns::Severity::High,
owasp_code: "A03",
owasp_label: "Injection",
description: "Attacker-controlled data is concatenated into a SQL query string instead of \
being bound through a parameterised statement.",
default_enabled: true,
emission_active: false,
},
CapRuleMeta {
cap: Cap::DESERIALIZE,
rule_id: "taint-deserialization",
title: "Unsafe Deserialization",
severity: crate::patterns::Severity::High,
owasp_code: "A08",
owasp_label: "Software and Data Integrity Failures",
description: "Attacker-controlled bytes are fed to an unsafe object deserialiser \
(pickle, ObjectInputStream, Marshal, unserialize) enabling arbitrary code \
execution via crafted payloads.",
default_enabled: true,
emission_active: false,
},
CapRuleMeta {
cap: Cap::SSRF,
rule_id: "taint-ssrf",
title: "Server-Side Request Forgery",
severity: crate::patterns::Severity::High,
owasp_code: "A10",
owasp_label: "Server-Side Request Forgery",
description: "Attacker-controlled URL reaches the destination of an outbound HTTP request \
without an allowlist or scheme/host restriction.",
default_enabled: true,
emission_active: false,
},
CapRuleMeta {
cap: Cap::CODE_EXEC,
rule_id: "taint-code-execution",
title: "Code / Command Execution",
severity: crate::patterns::Severity::High,
owasp_code: "A03",
owasp_label: "Injection",
description: "Attacker-controlled data reaches an `eval`/`exec`/shell sink, dynamic \
require/import, or other arbitrary-code construct.",
default_enabled: true,
emission_active: false,
},
CapRuleMeta {
cap: Cap::CRYPTO,
rule_id: "taint-crypto-misuse",
title: "Tainted Cryptographic Parameter",
severity: crate::patterns::Severity::Medium,
owasp_code: "A02",
owasp_label: "Cryptographic Failures",
description: "Attacker-controlled data drives the algorithm name, key, or seed of a \
cryptographic primitive (weak-crypto / predictable-randomness).",
default_enabled: true,
emission_active: false,
},
CapRuleMeta {
cap: Cap::UNAUTHORIZED_ID,
rule_id: "rs.auth.missing_ownership_check.taint",
title: "Missing Ownership Check (taint variant)",
severity: crate::patterns::Severity::High,
owasp_code: "A01",
owasp_label: "Broken Access Control",
description: "Request-bound identifier reaches a privileged sink without an intervening \
ownership/membership check. Companion to the standalone `auth_analysis` \
rule; gated by `scanner.enable_auth_as_taint`.",
default_enabled: false,
emission_active: true,
},
CapRuleMeta {
cap: Cap::DATA_EXFIL,
rule_id: "taint-data-exfiltration",
title: "Sensitive Data Exfiltration",
severity: crate::patterns::Severity::High,
owasp_code: "A04",
owasp_label: "Insecure Design",
description: "Sensitive data (cookies, headers, env, db rows, files) flows into the body, \
headers, or other payload field of an outbound network request to a fixed \
destination.",
default_enabled: true,
emission_active: true,
},
// ── Cap-specific rule ids ────────────────────────────────────────────
CapRuleMeta {
cap: Cap::LDAP_INJECTION,
rule_id: "taint-ldap-injection",
title: "LDAP Injection",
severity: crate::patterns::Severity::High,
owasp_code: "A03",
owasp_label: "Injection",
description: "Attacker-controlled data is concatenated into an LDAP filter or DN without \
RFC 4515 escaping, letting the attacker rewrite the directory query.",
default_enabled: true,
emission_active: true,
},
CapRuleMeta {
cap: Cap::XPATH_INJECTION,
rule_id: "taint-xpath-injection",
title: "XPath Injection",
severity: crate::patterns::Severity::High,
owasp_code: "A03",
owasp_label: "Injection",
description: "Attacker-controlled data is concatenated into an XPath expression instead of \
passed through XPath variable bindings, letting the attacker rewrite the \
query.",
default_enabled: true,
emission_active: true,
},
CapRuleMeta {
cap: Cap::HEADER_INJECTION,
rule_id: "taint-header-injection",
title: "HTTP Header / Response Splitting",
severity: crate::patterns::Severity::High,
owasp_code: "A03",
owasp_label: "Injection",
description: "Attacker-controlled data lands in an HTTP response header without `\\r\\n` \
stripping, enabling response splitting and cache-poisoning attacks.",
default_enabled: true,
emission_active: true,
},
CapRuleMeta {
cap: Cap::OPEN_REDIRECT,
rule_id: "taint-open-redirect",
title: "Open Redirect",
severity: crate::patterns::Severity::Medium,
owasp_code: "A01",
owasp_label: "Broken Access Control",
description: "Attacker-controlled URL drives a redirect / `Location` header without an \
allowlist or relative-URL check, enabling phishing pivots.",
default_enabled: true,
emission_active: true,
},
CapRuleMeta {
cap: Cap::SSTI,
rule_id: "taint-template-injection",
title: "Server-Side Template Injection",
severity: crate::patterns::Severity::High,
owasp_code: "A03",
owasp_label: "Injection",
description: "Attacker controls the template *source string* (not just template variables) \
passed to a server-side renderer (Jinja2, Twig, Handlebars, ERB), enabling \
arbitrary expression evaluation.",
default_enabled: true,
emission_active: true,
},
CapRuleMeta {
cap: Cap::XXE,
rule_id: "taint-xxe",
title: "XML External Entity Resolution",
severity: crate::patterns::Severity::High,
owasp_code: "A05",
owasp_label: "Security Misconfiguration",
description: "Attacker-controlled XML reaches a parser configured to resolve external \
entities (or missing the secure-processing feature), enabling SSRF, file \
read, and DoS.",
default_enabled: true,
emission_active: true,
},
CapRuleMeta {
cap: Cap::PROTOTYPE_POLLUTION,
rule_id: "taint-prototype-pollution",
title: "Prototype Pollution",
severity: crate::patterns::Severity::High,
owasp_code: "A05",
owasp_label: "Security Misconfiguration",
description: "Attacker-controlled key reaches an object property assignment that can mutate \
`Object.prototype` (deep-merge / `__proto__` / dynamic subscript).",
default_enabled: true,
emission_active: true,
},
];
/// Resolve a cap to its canonical rule metadata. Returns `None` for caps
/// without a rule-emission role (origin / sanitizer markers like
/// [`Cap::ENV_VAR`], [`Cap::HTML_ESCAPE`]).
pub fn cap_rule_meta(cap: Cap) -> Option<&'static CapRuleMeta> {
CAP_RULE_REGISTRY.iter().find(|m| m.cap == cap)
}
/// Resolve a set of caps (`effective_caps`) to a single rule id. When
/// multiple bits are set, picks the first registry entry whose cap is
/// present; the registry is in bit-position order, so the lowest
/// registered bit wins. Returns `None` when no bit in the set has a
/// registered rule id.
pub fn rule_id_for_caps(effective_caps: Cap) -> Option<&'static str> {
CAP_RULE_REGISTRY
.iter()
.find(|m| effective_caps.contains(m.cap))
.map(|m| m.rule_id)
}
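// A reduced sketch of the first-match lookup above, using plain `u32` bits
// and a three-entry table. `Meta`, `REGISTRY`, and `rule_id_for` are
// hypothetical stand-ins; the rule ids and bit positions are copied from
// this diff.

```rust
struct Meta {
    cap: u32,
    rule_id: &'static str,
}

/// Entries listed in ascending bit order, like `CAP_RULE_REGISTRY`.
const REGISTRY: &[Meta] = &[
    Meta { cap: 1 << 7, rule_id: "taint-sql-injection" },
    Meta { cap: 1 << 17, rule_id: "taint-open-redirect" },
    Meta { cap: 1 << 20, rule_id: "taint-prototype-pollution" },
];

/// Returns the rule id of the first entry whose bit is set in `effective`.
fn rule_id_for(effective: u32) -> Option<&'static str> {
    REGISTRY
        .iter()
        .find(|m| effective & m.cap == m.cap)
        .map(|m| m.rule_id)
}
```

// With several bits set, the entry that appears first in the table wins,
// so a combined SQL and prototype-pollution set resolves to the SQL rule.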
/// Generate a stable rule ID from language, kind, and matchers.
pub fn rule_id(lang: &str, kind: &str, matchers: &[&str]) -> String {
let mut sorted: Vec<&str> = matchers.to_vec();
@ -1418,11 +1781,25 @@ pub struct RuleInfo {
pub language: String,
pub kind: String,
pub cap: String,
pub cap_bits: u16,
pub cap_bits: u32,
pub matchers: Vec<String>,
pub case_sensitive: bool,
pub is_custom: bool,
pub is_gated: bool,
/// Cap-class registry entry (one per `Cap` with a canonical rule id),
/// distinct from per-language sink/source/sanitizer match rules. The
/// dashboard groups these separately so the rules surface does not mix
/// "the LDAP injection class exists" with "Java's `DirContext.search`
/// is a sink for that class".
pub is_class: bool,
/// For class entries (`is_class == true`), whether the diag-id
/// emission path in `ast.rs` actually surfaces findings under
/// [`Self::id`]. When `false`, the class is registered but live
/// findings still emerge under the legacy `taint-unsanitised-flow`
/// rule id; dashboards can use this flag to suppress the synthetic
/// entry until the cap is migrated to its specific rule id.
/// Always `true` for non-class label rules.
pub emission_active: bool,
pub enabled: bool,
}
@ -1430,6 +1807,27 @@ pub struct RuleInfo {
pub fn enumerate_builtin_rules() -> Vec<RuleInfo> {
let mut out = Vec::new();
// Cap-class entries (one per registered vulnerability class). Kind
// `class` so dashboards can distinguish them from per-language
// sink/source/sanitizer entries.
for meta in CAP_RULE_REGISTRY {
out.push(RuleInfo {
id: meta.rule_id.to_string(),
title: meta.title.to_string(),
language: "all".to_string(),
kind: "class".to_string(),
cap: cap_to_name(meta.cap).to_string(),
cap_bits: meta.cap.bits(),
matchers: Vec::new(),
case_sensitive: false,
is_custom: false,
is_gated: false,
is_class: true,
emission_active: meta.emission_active,
enabled: meta.default_enabled,
});
}
for &lang in CANONICAL_LANGS {
if let Some(rules) = REGISTRY.get(lang) {
for rule in *rules {
@ -1453,6 +1851,8 @@ pub fn enumerate_builtin_rules() -> Vec<RuleInfo> {
case_sensitive: rule.case_sensitive,
is_custom: false,
is_gated: false,
is_class: false,
emission_active: true,
enabled: true,
});
}
@ -1479,6 +1879,8 @@ pub fn enumerate_builtin_rules() -> Vec<RuleInfo> {
case_sensitive: gate.case_sensitive,
is_custom: false,
is_gated: true,
is_class: false,
emission_active: true,
enabled: true,
});
}
@ -1498,6 +1900,65 @@ pub fn custom_rule_id(lang: &str, kind: &str, matchers: &[String]) -> String {
mod tests {
use super::*;
/// Pin the current set of caps whose `rule_id` is reachable via the
/// diag-id routing in `ast.rs::diag_for_finding`. When migrating a
/// legacy cap (e.g. SQL_QUERY → `taint-sql-injection`), update both
/// `ast.rs` (add the cap to the cap-specific routing list) and the
/// `emission_active: true` flag in `CAP_RULE_REGISTRY`, then update
/// this assertion. The split exists because legacy taint findings
/// historically all surfaced under the generic `taint-unsanitised-flow`
/// rule id; the seven cap-specific routes (LDAP / XPath / header /
/// open redirect / SSTI / XXE / prototype pollution) plus
/// `unauthorized_id` and `data_exfil` are the only ones wired through.
#[test]
fn cap_rule_registry_emission_active_set_is_pinned() {
let active: Vec<Cap> = CAP_RULE_REGISTRY
.iter()
.filter(|m| m.emission_active)
.map(|m| m.cap)
.collect();
let expected = [
Cap::UNAUTHORIZED_ID,
Cap::DATA_EXFIL,
Cap::LDAP_INJECTION,
Cap::XPATH_INJECTION,
Cap::HEADER_INJECTION,
Cap::OPEN_REDIRECT,
Cap::SSTI,
Cap::XXE,
Cap::PROTOTYPE_POLLUTION,
];
for c in expected {
assert!(
active.contains(&c),
"cap {:?} expected to be emission_active in CAP_RULE_REGISTRY",
c
);
}
let inactive: Vec<Cap> = CAP_RULE_REGISTRY
.iter()
.filter(|m| !m.emission_active)
.map(|m| m.cap)
.collect();
let expected_inactive = [
Cap::FILE_IO,
Cap::FMT_STRING,
Cap::SQL_QUERY,
Cap::DESERIALIZE,
Cap::SSRF,
Cap::CODE_EXEC,
Cap::CRYPTO,
];
for c in expected_inactive {
assert!(
inactive.contains(&c),
"cap {:?} expected to be emission_inactive in CAP_RULE_REGISTRY (legacy \
finding still emits as taint-unsanitised-flow)",
c
);
}
}
#[test]
fn receiver_validator_python_relative_to() {
// Bare method name fires.
@ -1781,6 +2242,33 @@ mod tests {
// from `File.open` / `IO.open` / `URI.open`, each of which has its
// own non-piping semantics. Without the sigil, the suffix-with-
// boundary matcher would over-fire on every `X.open` call.
#[test]
fn classify_javascript_set_value_is_proto_pollution_gate() {
let no_kw = |_: &str| None;
let no_kw_present = |_: &str| false;
let result = classify_gated_sink("javascript", "setValue", |_| None, no_kw, no_kw_present);
assert!(
result
.iter()
.any(|m| m.label == DataLabel::Sink(Cap::PROTOTYPE_POLLUTION)),
"expected PROTOTYPE_POLLUTION gate match for bare `setValue`, got {result:?}"
);
}
#[test]
fn classify_javascript_dot_prop_set_is_proto_pollution_gate() {
let no_kw = |_: &str| None;
let no_kw_present = |_: &str| false;
let result =
classify_gated_sink("javascript", "dotProp.set", |_| None, no_kw, no_kw_present);
assert!(
result
.iter()
.any(|m| m.label == DataLabel::Sink(Cap::PROTOTYPE_POLLUTION)),
"expected PROTOTYPE_POLLUTION gate match for `dotProp.set`, got {result:?}"
);
}
#[test]
fn classify_ruby_bare_open_is_shell_escape_sink() {
let result = classify("ruby", "open", None);
@ -2419,7 +2907,7 @@ mod tests {
);
assert_eq!(
classify("rust", "Redirect::to(next)", Some(&extras)),
Some(DataLabel::Sink(Cap::SSRF)),
Some(DataLabel::Sink(Cap::OPEN_REDIRECT)),
);
let empty = rust::framework_rules(&FrameworkContext::default());
@ -2470,7 +2958,7 @@ mod tests {
);
assert_eq!(
classify("rust", "Redirect::to(next)", Some(&extras)),
Some(DataLabel::Sink(Cap::SSRF)),
Some(DataLabel::Sink(Cap::OPEN_REDIRECT)),
);
}
}

@ -178,6 +178,143 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::DATA_EXFIL),
case_sensitive: true,
},
// ─── LDAP injection sinks ───
//
// PHP's procedural LDAP API: `ldap_search($ds, $base, $filter)`,
// `ldap_list($ds, $base, $filter)`, `ldap_read($ds, $base, $filter)`.
// The filter argument is the LDAP-injection vector when concatenated
// with attacker-controlled input.
LabelRule {
matchers: &["ldap_search", "ldap_list", "ldap_read"],
label: DataLabel::Sink(Cap::LDAP_INJECTION),
case_sensitive: false,
},
// ─── LDAP-filter sanitizer ───
//
// `ldap_escape($value, $ignore, LDAP_ESCAPE_FILTER)` applies RFC 4515
// escaping; treat any `ldap_escape` call as clearing the LDAP_INJECTION
// cap (the no-flag default also escapes filter metacharacters
// conservatively).
LabelRule {
matchers: &["ldap_escape"],
label: DataLabel::Sanitizer(Cap::LDAP_INJECTION),
case_sensitive: false,
},
// ─── XPath injection sinks ───
//
// `DOMXPath::query($expr, $ctx)` and `DOMXPath::evaluate($expr, $ctx)`
// accept the expression string as arg 0; concatenated user input there
// is the canonical PHP XPath-injection vector. `SimpleXMLElement::xpath`
// takes the same shape. Direct flat matchers cover the
// class-qualified call forms.
// Type-qualified rewrites: `$xp = new DOMXPath($doc)` tags `$xp` as
// `TypeKind::XPathClient`, so `$xp->query(...)` / `$xp->evaluate(...)`
// resolve to `XPathClient.query` / `XPathClient.evaluate`. Without
// the distinct TypeKind, bare `query` would match the SQL_QUERY sink.
LabelRule {
matchers: &[
"XPathClient.query",
"XPathClient.evaluate",
"DOMXPath::query",
"DOMXPath::evaluate",
"SimpleXMLElement::xpath",
],
label: DataLabel::Sink(Cap::XPATH_INJECTION),
case_sensitive: false,
},
// Bare `xpath` method: SimpleXMLElement instances expose `->xpath($expr)`
// and Symfony / DOMCrawler wrappers do the same. Suffix matching on
// `xpath` covers `$xml->xpath(...)` and similar bound-receiver shapes
// where the receiver type is not statically known. Case-sensitive to
// avoid collisions with the `XPath` capitalisation used by qualified
// names.
LabelRule {
matchers: &["xpath"],
label: DataLabel::Sink(Cap::XPATH_INJECTION),
case_sensitive: true,
},
// ─── XPath escape sanitizers ───
//
// No PHP standard library helper escapes XPath metacharacters; project-
// local `escape_xpath` / `xpath_escape` are the developer-named
// equivalents.
LabelRule {
matchers: &["escape_xpath", "xpath_escape"],
label: DataLabel::Sanitizer(Cap::XPATH_INJECTION),
case_sensitive: false,
},
// ─── Header / CRLF injection sinks ───
//
// PHP's `header($line)` writes a raw header line. Tainted strings
// without `\r\n` stripping let an attacker inject extra headers
// (response splitting); see GATED_SINKS for the corresponding
// OPEN_REDIRECT co-tag on `Location: ...` forms.
//
// The HEADER_INJECTION sink is intentionally implemented as a gate
// (not a flat rule) so the multi-gate SSA dispatch can co-emit it
// alongside the OPEN_REDIRECT gate on the same call site, producing
// separate findings for each cap with their canonical rule ids.
// ─── Header / CRLF sanitizers ───
LabelRule {
matchers: &["strip_crlf", "escape_header", "sanitize_header"],
label: DataLabel::Sanitizer(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Open-redirect URL allowlist sanitizers ───
//
// Mirrors the JS/TS rule. Developer-named functions that allowlist
// / scheme-strip a redirect URL clear OPEN_REDIRECT taint before it
// reaches `header("Location: …")`. PHP also commonly uses
// `snake_case` variants.
LabelRule {
matchers: &[
"validateRedirectUrl",
"isSafeRedirect",
"stripScheme",
"validate_redirect_url",
"is_safe_redirect",
"strip_scheme",
"ensure_relative_url",
"ensureRelativeUrl",
"assert_relative_path",
"assertRelativePath",
"is_relative_url",
"isRelativeUrl",
],
label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
// ─── SSTI sinks ───
//
// Twig `\Twig\Environment::createTemplate(string $template)` parses an
// arbitrary template source string at runtime; a tainted source yields
// SSTI when the resulting template is rendered. `Environment::render`
// / `Environment::load` take a *template name* (file lookup, not source)
// and are intentionally excluded. After PHP scope-resolution stripping
// the chain text covers both `$twig->createTemplate($src)` and
// `Twig\Environment::createTemplate(...)` shapes.
LabelRule {
matchers: &["Environment.createTemplate", "Twig.createTemplate"],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: true,
},
// ─── XXE sanitizers ───
//
// `libxml_disable_entity_loader(true)` (PHP <8) / `libxml_set_external_entity_loader($cb)`
// disable external-entity expansion process-wide. Treat their return
// value as XXE-cleared so config-style fixtures (`libxml_disable_entity_loader(true);
// simplexml_load_string($xml, ...)`) suppress the gate when the call is
// present in the same SSA scope. The flat-rule sanitizer is a coarse
// approximation; the real config-check pattern would track parser-instance
// hardening (deferred Layer 2).
LabelRule {
matchers: &[
"libxml_disable_entity_loader",
"libxml_set_external_entity_loader",
],
label: DataLabel::Sanitizer(Cap::XXE),
case_sensitive: false,
},
];
/// Gated sinks for PHP.
@ -193,18 +330,157 @@ pub static RULES: &[LabelRule] = &[
///
/// Identifier-based activation is enabled via the macro-arg fallback in
/// `cfg::mod::classify_gated_sink` for `lang == "php"`.
pub static GATED_SINKS: &[SinkGate] = &[SinkGate {
callee_matcher: "curl_setopt",
arg_index: 1,
dangerous_values: &["CURLOPT_POSTFIELDS", "CURLOPT_COPYPOSTFIELDS"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::DATA_EXFIL),
case_sensitive: true,
payload_args: &[2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
}];
pub static GATED_SINKS: &[SinkGate] = &[
SinkGate {
callee_matcher: "curl_setopt",
arg_index: 1,
dangerous_values: &["CURLOPT_POSTFIELDS", "CURLOPT_COPYPOSTFIELDS"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::DATA_EXFIL),
case_sensitive: true,
payload_args: &[2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// PHP `header($line)` HEADER_INJECTION sink. Modelled as a gate so
// it can coexist with the OPEN_REDIRECT gate below: the multi-gate
// SSA dispatch needs each capability declared on its own gate filter
// to emit one finding per cap. Always activates (Destination), with
// payload arg 0 only (`header()` only accepts the line as arg 0;
// args 1 and 2 are `replace` / `response_code`, not the line content).
SinkGate {
callee_matcher: "=header",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// PHP `simplexml_load_string($xml, $class, $options)` —
// XXE sink gated on the `LIBXML_NOENT` flag (or `LIBXML_DTDLOAD`,
// `LIBXML_DTDATTR`). libxml2, which PHP links against, is XXE-safe by default since 2.9.0;
// the gate fires only when the `$options` literal includes one of the
// dangerous flags. Identifier-based activation works via the macro-arg
// fallback in `cfg::mod::classify_gated_sink` for `lang == "php"`.
SinkGate {
callee_matcher: "simplexml_load_string",
arg_index: 2,
dangerous_values: &["LIBXML_NOENT", "LIBXML_DTDLOAD", "LIBXML_DTDATTR"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
SinkGate {
callee_matcher: "simplexml_load_file",
arg_index: 2,
dangerous_values: &["LIBXML_NOENT", "LIBXML_DTDLOAD", "LIBXML_DTDATTR"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// DOMDocument::loadXML($xml, $options) — same gating as
// simplexml_load_string. The chain-normalised callee text for
// `$dom->loadXML(...)` is `dom.loadXML`; suffix matching on
// `loadXML` covers the bound-receiver form.
SinkGate {
callee_matcher: "loadXML",
arg_index: 1,
dangerous_values: &["LIBXML_NOENT", "LIBXML_DTDLOAD", "LIBXML_DTDATTR"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// PHP `header($line)` co-tag for OPEN_REDIRECT.
//
// The HEADER_INJECTION gate (`=header`) above already fires for
// any `header(...)` call regardless of the line content. This gate
// adds the OPEN_REDIRECT co-tag specifically when the first argument
// is a `Location: ...` header, so the dashboard / OWASP bucket
// correctly classifies redirect-class flows independently of CRLF.
//
// Activation: arg 0 prefix `Location:` (case-insensitive). When arg
// 0 is a constant string starting with `Location:` the gate fires and
// checks payload arg 0 for taint; constants like `Content-Type: ...`
// are suppressed by the safe-literal branch. When arg 0 is a binary
// expression (`"Location: " . $url`) or otherwise dynamic, the
// value-extraction returns `None` and the gate fires conservatively
// — matching the existing convention in `setAttribute`/`parseFromString`.
SinkGate {
callee_matcher: "=header",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &["Location:"],
label: DataLabel::Sink(Cap::OPEN_REDIRECT),
case_sensitive: false,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// Smarty `$smarty->fetch($name)` — only the `string:` resource prefix
// accepts an inline template *source*; the bare form (`page.tpl`) is a
// file lookup (not SSTI). Gate activates only when arg 0's leading
// literal segment is the `string:` prefix; the constant-string suffix
// and concat (`"string:" . $src`) shapes both reach `extract_const_string_arg`'s
// leading-literal path and trigger activation. Payload is arg 0
// itself — taint reaching the template source string is the SSTI flow.
// Suffix matching catches both `Smarty.fetch` and the bound-receiver
// `$smarty->fetch(...)` forms.
SinkGate {
callee_matcher: "Smarty.fetch",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &["string:"],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: false,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// Twig `\Twig\Environment::createTemplate(string $template)` —
// gated SSTI sink. Activation is unconditional (no value gate);
// payload arg 0 is the template source string. Bare suffix
// `createTemplate` matches the idiomatic instance shape
// `$twig->createTemplate($src)` (chain text `twig.createTemplate`)
// as well as the static `Environment::createTemplate(...)` form;
// `createTemplate` is Twig-specific terminology so over-fire risk
// is low. The matching flat rule remains for documentation-style
// class-qualified call shapes.
SinkGate {
callee_matcher: "createTemplate",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: false,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
];
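The three outcomes described for the `Location:` prefix gate can be pictured with a small standalone sketch. The `PrefixGate` type and `gate_fires` function are hypothetical and are not Nyx's `SinkGate` implementation; they only assume that a known constant is compared against the prefix list and that an unknown (dynamic) argument fires conservatively, as the comments above state.

```rust
/// Hypothetical, simplified stand-in for a prefix-activated gate.
/// Not the real `SinkGate`; field names are borrowed for readability only.
struct PrefixGate {
    dangerous_prefixes: &'static [&'static str],
    case_sensitive: bool,
}

/// `const_arg` is the activation argument's constant string value when it is
/// known, or `None` when the argument is dynamic.
fn gate_fires(gate: &PrefixGate, const_arg: Option<&str>) -> bool {
    match const_arg {
        // Unknown value: fire conservatively.
        None => true,
        // Known constant: fire only if it starts with a dangerous prefix.
        Some(value) => gate.dangerous_prefixes.iter().any(|&prefix| {
            if gate.case_sensitive {
                value.starts_with(prefix)
            } else {
                value
                    .to_lowercase()
                    .starts_with(prefix.to_lowercase().as_str())
            }
        }),
    }
}
```

Under these assumptions a literal such as `location: /home` activates the gate, `Content-Type: text/html` is suppressed, and a dynamic first argument activates it.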
pub static KINDS: Map<&'static str, Kind> = phf_map! {
// control-flow

@ -61,7 +61,7 @@ pub static RULES: &[LabelRule] = &[
// pattern that follows `from flask import session`. The `=session`
// exact-match form fires only when the call is the bare top-level
// `session(...)` so accidental field projections like
// `obj.client.session` (Phase 2 chained-receiver lowering) don't get
// `obj.client.session` (chained-receiver lowering) don't get
// mis-labelled as sources.
LabelRule {
matchers: &[
@ -284,6 +284,212 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::DESERIALIZE),
case_sensitive: false,
},
// ─── LDAP injection sinks ───
//
// python-ldap exposes module-level `ldap.search_s` / `ldap.search_ext_s`
// and method-style `conn.search_s(base, scope, filter)` after `conn =
// ldap.initialize(url)`. Suffix matching on the method names catches both
// the qualified form (`ldap.search_s`, matched as a literal) and the
// bound-receiver form (`conn.search_s` ends with `search_s`). ldap3 uses
// `Connection(server, ...)` whose `.search(...)` accepts a filter kwarg /
// positional; receiver typing tags the connection as `TypeKind::LdapClient`
// so type-qualified resolution rewrites `conn.search` → `LdapClient.search`.
LabelRule {
matchers: &[
"ldap.search_s",
"ldap.search_ext_s",
"search_s",
"search_ext_s",
"LdapClient.search",
"ldap3.Connection.search",
],
label: DataLabel::Sink(Cap::LDAP_INJECTION),
case_sensitive: true,
},
// ─── LDAP-filter sanitizers ───
//
// python-ldap: `ldap.filter.escape_filter_chars(s)` and ldap3's
// `ldap3.utils.conv.escape_filter_chars(s)` both apply RFC 4515 escaping
// to filter metacharacters. Suffix matching on `escape_filter_chars`
// covers both the fully-qualified import and the bare-name destructured
// import (`from ldap.filter import escape_filter_chars`).
LabelRule {
matchers: &[
"escape_filter_chars",
"ldap.filter.escape_filter_chars",
"ldap3.utils.conv.escape_filter_chars",
],
label: DataLabel::Sanitizer(Cap::LDAP_INJECTION),
case_sensitive: false,
},
// ─── XPath injection sinks ───
//
// lxml: `tree.xpath(expr)` / `etree.XPath(expr)` accept an
// attacker-influenceable expression string. ElementTree's
// `find` / `findall` / `findtext` accept the same kind of XPath subset
// and admit injection when the path is built by string concatenation.
// Suffix matching on the bare method names catches both
// `lxml.etree._Element.xpath(...)` and `tree.xpath(...)` shapes.
LabelRule {
matchers: &[
"xpath",
"lxml.etree.XPath",
"etree.XPath",
"ElementTree.find",
"ElementTree.findall",
"ElementTree.findtext",
],
label: DataLabel::Sink(Cap::XPATH_INJECTION),
case_sensitive: true,
},
// ─── XPath escape sanitizers ───
//
// No standard library helper escapes XPath metacharacters; project-local
// `escape_xpath` / `xpath_escape` are the developer-named equivalents.
LabelRule {
matchers: &["escape_xpath", "xpath_escape"],
label: DataLabel::Sanitizer(Cap::XPATH_INJECTION),
case_sensitive: false,
},
// ─── Header / CRLF injection sinks ───
//
// Flask / Werkzeug response APIs that write a single header value:
// `response.headers.add(name, val)`, `response.set_cookie(name, val)`,
// and the bare subscript-set form `response.headers[name] = val`.
// The subscript-set form is picked up via the LHS-subscript
// classification path in `cfg/mod.rs::push_node`: the LHS object's
// member-expression text matches `response.headers` /
// `self.response.headers` and tags the assignment as a HEADER_INJECTION
// sink.
LabelRule {
matchers: &["headers.add", "headers.set", "set_cookie"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
},
LabelRule {
matchers: &["response.headers", "self.response.headers", "resp.headers"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Header / CRLF sanitizers ───
LabelRule {
matchers: &["strip_crlf", "escape_header", "sanitize_header"],
label: DataLabel::Sanitizer(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Open redirect sinks ───
//
// Flask `redirect(url)`, Django `HttpResponseRedirect(url)`, FastAPI /
// Starlette `RedirectResponse(url=...)`. Tainted URL flowing to any of
// these without an allowlist check is an open-redirect vector.
LabelRule {
matchers: &[
"redirect",
"flask.redirect",
"django.shortcuts.redirect",
"HttpResponseRedirect",
"RedirectResponse",
],
label: DataLabel::Sink(Cap::OPEN_REDIRECT),
case_sensitive: true,
},
LabelRule {
matchers: &[
"validate_redirect_url",
"is_safe_redirect",
"strip_scheme",
"ensure_relative_url",
"assert_relative_path",
"is_relative_url",
],
label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
// ─── SSTI sinks ───
//
// Template-engine constructors / `from_string` factories that accept the
// template *source string* as arg 0. `flask.render_template` takes a
// file PATH (not source) so does NOT match here — the safe API stays
// clean by name.
LabelRule {
matchers: &[
"=Template",
"jinja2.Template",
"jinja2.Environment.from_string",
"Environment.from_string",
// `compile_expression` is jinja2-specific terminology (it returns a
// callable from an inline expression source). Bare suffix lets the
// rule fire on idiomatic instance shapes (`env.compile_expression(s)`)
// without a `jinja2.Environment` TypeKind.
"compile_expression",
"mako.template.Template",
"Template.render",
],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: true,
},
// Template-loader paths: a tainted `name` lets the attacker swap the
// resolved template behind the renderer. Mako's `TemplateLookup.get_template`
// and Jinja2's `Environment.get_template` / `select_template` /
// `loader.get_source` all take a template name (path-like) as arg 0.
// Modeling these as SSTI sinks captures the loader-path attack — the
// file resolver itself becomes the gadget when the name is attacker-controlled.
LabelRule {
matchers: &[
"TemplateLookup.get_template",
"Environment.get_template",
"Environment.select_template",
"loader.get_source",
// Bare-suffix forms for the idiomatic instance shapes
// (`env.get_template(name)`, `lookup.get_template(name)`).
"get_template",
"select_template",
],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: true,
},
// ─── XXE sinks ───
//
// Python's stock `xml.sax.parseString` / `xml.sax.parse` parsers are
// XXE-vulnerable by default; `xml.dom.minidom.parseString` /
// `xml.dom.minidom.parse` likewise resolve external entities through
// the underlying expat parser unless the entity-loader is hardened.
// Each entry is the dotted-module suffix; bare `parseString` / `parse`
// are intentionally avoided to prevent collisions with JSON parsers
// (`json.loads`). `lxml.etree.fromstring` is excluded because modern lxml
// disables external entities by default, so a rule for it would over-fire here.
LabelRule {
matchers: &[
"xml.sax.parseString",
"xml.sax.parse",
"xml.dom.minidom.parseString",
"xml.dom.minidom.parse",
"xml.dom.pulldom.parseString",
"xml.dom.pulldom.parse",
],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
},
// `defusedxml.*` is the canonical hardened drop-in: every parser in
// the package strips external-entity / DTD resolution and raises on
// the patterns that would otherwise enable XXE. Treat any defusedxml
// call as an XXE sanitizer.
LabelRule {
matchers: &[
"defusedxml.ElementTree.fromstring",
"defusedxml.ElementTree.parse",
"defusedxml.minidom.parseString",
"defusedxml.minidom.parse",
"defusedxml.sax.parseString",
"defusedxml.sax.parse",
"defusedxml.pulldom.parseString",
"defusedxml.pulldom.parse",
"defusedxml.lxml.fromstring",
"defusedxml.lxml.parse",
],
label: DataLabel::Sanitizer(Cap::XXE),
case_sensitive: true,
},
];
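The rule comments above rely on two matcher shapes: a plain matcher that is compared against the end of the callee text, and a leading `=` that requests an exact match. A minimal sketch of that distinction follows. `matcher_hits` is a hypothetical helper, not the function Nyx uses, and it assumes plain `ends_with` semantics; a real implementation may additionally require a segment boundary before the suffix.

```rust
/// Hypothetical matcher check (an illustration, not Nyx's classifier).
/// A leading `=` requests an exact match; anything else is treated as a
/// suffix match on the callee text.
fn matcher_hits(matcher: &str, callee: &str, case_sensitive: bool) -> bool {
    // Split off the optional `=` exact-match marker.
    let (exact, pattern) = match matcher.strip_prefix('=') {
        Some(rest) => (true, rest),
        None => (false, matcher),
    };
    // Normalise case on both sides when the rule is case-insensitive.
    let (pattern, callee) = if case_sensitive {
        (pattern.to_string(), callee.to_string())
    } else {
        (pattern.to_lowercase(), callee.to_lowercase())
    };
    if exact {
        callee == pattern
    } else {
        callee.ends_with(pattern.as_str())
    }
}
```

With this sketch, `search_s` hits `conn.search_s`, while `=Template` hits a bare `Template` call but not `jinja2.Template`.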
/// Method-call validators that strip caps from their *receiver* (and
@ -1041,6 +1247,55 @@ pub static GATED_SINKS: &[SinkGate] = &[
},
];
/// Prototype-pollution-style gates for Python. Opt-in via the
/// `NYX_PYTHON_PROTO_POLLUTION` env var (see
/// `super::env_python_proto_pollution`); when enabled they are merged
/// into the language's `GATED_REGISTRY` slice at startup.
///
/// Coverage is deliberately narrow: the `dict.update(target, src)`
/// class-method form (where the first arg is the target and the second
/// is the source) is the canonical attack shape for `__class__` /
/// `__dict__` pollution in Python frameworks that thread user input
/// through configuration objects. The bound-method form
/// (`config.update(req_data)`) is handled by the suffix-matched
/// `dict.update` callee text only when the receiver text literally
/// equals `dict`, keeping the gate from over-firing on every `update`
/// method in the codebase.
pub static PROTO_POLLUTION_GATES: &[SinkGate] = &[
// `dict.update(target, src)` — class-method form. Argument-role
// gating: only `src` (arg 1) taint activates; tainted target alone
// is benign.
SinkGate {
callee_matcher: "dict.update",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// `obj.__dict__.update(src)` — instance-attribute pollution shape.
SinkGate {
callee_matcher: "__dict__.update",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
];
pub static KINDS: Map<&'static str, Kind> = phf_map! {
// control-flow
"if_statement" => Kind::If,

@ -1,4 +1,6 @@
use crate::labels::{Cap, DataLabel, Kind, LabelRule, ParamConfig, RuntimeLabelRule};
use crate::labels::{
Cap, DataLabel, GateActivation, Kind, LabelRule, ParamConfig, RuntimeLabelRule, SinkGate,
};
use crate::utils::project::{DetectedFramework, FrameworkContext};
use phf::{Map, phf_map};
@ -226,10 +228,30 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::SQL_QUERY),
case_sensitive: true,
},
// Open redirect: redirect_to with user-controlled destination.
// Open redirect: redirect_to (Rails) / redirect (Sinatra) with
// user-controlled destination. `redirect` is a top-level Sinatra
// helper; case-sensitive matching keeps it from over-firing on
// unrelated identifiers. `redirect_to` is the Rails canonical.
LabelRule {
matchers: &["redirect_to"],
label: DataLabel::Sink(Cap::SSRF),
label: DataLabel::Sink(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
LabelRule {
matchers: &["redirect"],
label: DataLabel::Sink(Cap::OPEN_REDIRECT),
case_sensitive: true,
},
LabelRule {
matchers: &[
"validate_redirect_url",
"is_safe_redirect",
"strip_scheme",
"ensure_relative_url",
"assert_relative_path",
"is_relative_url",
],
label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
// Path traversal: file serving with user-controlled path.
@ -244,6 +266,173 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::HTML_ESCAPE),
case_sensitive: false,
},
// ─── LDAP injection sinks ───
//
// `Net::LDAP.new(host:, ...).search(base:, filter:, ...)` is the canonical
// net-ldap shape. Type-qualified resolution rewrites `ldap.search` →
// `LdapClient.search` when the receiver was constructed via `Net::LDAP.new`
// / `Net::LDAP.open` (see [`crate::ssa::type_facts::constructor_type`]).
// The chained literal form `Net::LDAP.new(...).search(...)` is also caught
// by the suffix matcher `Net::LDAP.search` after `()` stripping (the
// post-strip text is `Net::LDAP.new.search`, which ends in `.search`; the
// explicit `LDAP.search` keyword form `Net::LDAP.search(filter)` matches
// the same matcher directly).
LabelRule {
matchers: &["LdapClient.search", "Net::LDAP.search"],
label: DataLabel::Sink(Cap::LDAP_INJECTION),
case_sensitive: true,
},
// ─── LDAP-filter sanitizer ───
//
// `Net::LDAP::Filter.escape(value)` applies RFC 4515 escaping; treat any
// call as clearing the LDAP_INJECTION cap.
LabelRule {
matchers: &["Net::LDAP::Filter.escape"],
label: DataLabel::Sanitizer(Cap::LDAP_INJECTION),
case_sensitive: true,
},
// ─── XPath injection sinks ───
//
// `Nokogiri::XML::Node#xpath(expr)`, `at_xpath(expr)`, and `search(expr)`
// accept the expression string as arg 0; concatenated user input there is
// the canonical Nokogiri XPath-injection vector. Suffix matching on the
// bare method names catches the bound-receiver form (`doc.xpath(expr)`).
// Only `xpath` and `at_xpath` are listed below; bare `search` is not
// matched by this rule.
LabelRule {
matchers: &["xpath", "at_xpath"],
label: DataLabel::Sink(Cap::XPATH_INJECTION),
case_sensitive: true,
},
// ─── XPath escape sanitizers ───
//
// No Nokogiri / stdlib helper escapes XPath metacharacters; project-local
// `escape_xpath` / `xpath_escape` are the developer-named equivalents.
LabelRule {
matchers: &["escape_xpath", "xpath_escape"],
label: DataLabel::Sanitizer(Cap::XPATH_INJECTION),
case_sensitive: false,
},
// ─── Header / CRLF injection sinks ───
//
// Rack `Response#set_header(name, value)` / `add_header(name, value)`
// and `ActionDispatch::Response#headers[]=` write a single header value.
// The subscript-set form `response.headers["X-Foo"] = bar` is picked up
// via the LHS-subscript classification path in `cfg/mod.rs`: when the
// LHS object's member-expression text matches `response.headers` (or a
// synonym), the assignment is tagged as a HEADER_INJECTION sink.
// Tainted strings without `\r\n` stripping enable response splitting.
LabelRule {
matchers: &["set_header", "add_header"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
},
LabelRule {
matchers: &["response.headers", "res.headers", "self.response.headers"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
},
LabelRule {
matchers: &["strip_crlf", "escape_header", "sanitize_header"],
label: DataLabel::Sanitizer(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── SSTI sinks ───
//
// `ERB.new(template_source)` and `Liquid::Template.parse(source)` accept
// the template *source string* as arg 0; tainted source there yields
// arbitrary template execution at the corresponding `result(binding)` /
// `render` step. `=ERB.new` exact-matcher syntax limits the rule to the
// direct call (the leading `=` is the same convention used elsewhere in
// this file for Kernel-style globals like `=open`).
LabelRule {
matchers: &["=ERB.new", "Liquid::Template.parse"],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: true,
},
// ─── XXE sinks ───
//
// `REXML::Document.new(xml)` instantiates the (legacy, default-vulnerable)
// pure-Ruby XML parser; an attacker-controlled `xml` is XXE.
//
// Nokogiri (`Nokogiri::XML(xml)` / `Nokogiri::XML::Document.parse(xml)`)
// is XXE-safe by default since 1.10; resolving external entities
// requires explicitly opting in via `Nokogiri::XML::ParseOptions::NOENT`
// (or `DTDLOAD` / `DTDATTR`). Option-flagged detection lives in
// `GATED_SINKS` below.
LabelRule {
matchers: &["REXML::Document.new"],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
},
];
/// Ruby gated sinks. Argument-role-aware classification for callees that
/// are XXE-safe by default but become unsafe when the caller passes an
/// option flag that re-enables external-entity resolution.
///
/// Activation uses the bare-leaf comparison: scope-qualified constants like
/// `Nokogiri::XML::ParseOptions::NOENT` are reduced to the rightmost
/// `name` segment by the `scope_resolution` branch in
/// `cfg::literals::extract_const_macro_arg`, so the
/// `dangerous_values` list stays identifier-bare.
///
/// Default-arg semantics: Ruby `Nokogiri::XML(xml)` with no options arg
/// reaches the gate's `None` activation branch (the activation arg
/// position simply doesn't exist), which falls through to a conservative
/// fire. Callers wishing to suppress the gate explicitly should pass a
/// safe options literal at the activation position (e.g.
/// `Nokogiri::XML::ParseOptions::DEFAULT_XML`); any non-dangerous
/// scope-qualified constant disables the gate.
pub static GATED_SINKS: &[SinkGate] = &[
// `Nokogiri::XML(xml, url=nil, encoding=nil, options=NIL)` — top-level
// module method. arg 3 carries the parse-option flag literal.
//
// tree-sitter-ruby parses `Nokogiri::XML(args)` as a `call` whose
// `receiver` field is the `Nokogiri` constant and `method` field is
// the `XML` constant (with `::` as the call operator). `push_node`'s
// `CallMethod` path joins these as `{receiver}.{method}` → matchable
// suffix `Nokogiri.XML`.
SinkGate {
callee_matcher: "Nokogiri.XML",
arg_index: 3,
dangerous_values: &["NOENT", "DTDLOAD", "DTDATTR"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// `Nokogiri::XML::Document.parse(xml, url=nil, encoding=nil, options=NIL)`
// — receiver is the scope_resolution `Nokogiri::XML::Document` (text of
// the whole receiver is preserved verbatim) and method is `parse`, so
// the constructed callee text is `Nokogiri::XML::Document.parse`.
SinkGate {
callee_matcher: "Nokogiri::XML::Document.parse",
arg_index: 3,
dangerous_values: &["NOENT", "DTDLOAD", "DTDATTR"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// `Nokogiri::HTML(html, ..., options)` shares the same option flags as
// the XML helper. Same callee normalization as `Nokogiri.XML`.
SinkGate {
callee_matcher: "Nokogiri.HTML",
arg_index: 3,
dangerous_values: &["NOENT", "DTDLOAD", "DTDATTR"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
];
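The bare-leaf comparison described in the `GATED_SINKS` doc comment can be sketched in isolation. Both functions below are hypothetical illustrations and do not reproduce `cfg::literals::extract_const_macro_arg`; the sketch assumes the options argument arrives as a single scope-qualified constant (combined flags such as `A | B` are out of scope here) and that a missing argument fires conservatively.

```rust
/// Reduce a scope-qualified constant such as
/// `Nokogiri::XML::ParseOptions::NOENT` to its rightmost segment.
fn bare_leaf(constant: &str) -> &str {
    // `rsplit` always yields at least one item, so the fallback is defensive.
    constant.rsplit("::").next().unwrap_or(constant)
}

/// Hypothetical activation check for an option-flag gate.
/// `options_arg` is `None` when the call has no argument at that position.
fn xxe_gate_fires(options_arg: Option<&str>, dangerous_values: &[&str]) -> bool {
    match options_arg {
        // No options argument: conservative fire.
        None => true,
        // Compare only the bare leaf against the identifier-bare list.
        Some(text) => dangerous_values.contains(&bare_leaf(text)),
    }
}
```

In this sketch `Nokogiri::XML::ParseOptions::NOENT` activates the gate and `Nokogiri::XML::ParseOptions::DEFAULT_XML` suppresses it, matching the behaviour the doc comment describes.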
pub static KINDS: Map<&'static str, Kind> = phf_map! {

@ -1,4 +1,6 @@
use crate::labels::{Cap, DataLabel, Kind, LabelRule, ParamConfig, RuntimeLabelRule};
use crate::labels::{
Cap, DataLabel, GateActivation, Kind, LabelRule, ParamConfig, RuntimeLabelRule, SinkGate,
};
use crate::utils::project::{DetectedFramework, FrameworkContext};
use phf::{Map, phf_map};
@ -245,6 +247,89 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::DESERIALIZE),
case_sensitive: false,
},
// ─── Header / CRLF injection sinks ───
//
// `http::HeaderMap::insert(name, val)` / `append(...)` write a single
// header value. The canonical idiom is `response.headers_mut().insert(...)`
// (axum, actix-web `HttpResponse.headers_mut`, hyper `Response::headers_mut`).
// After paren-group stripping the chain text becomes
// `response.headers_mut.insert`, so suffix matchers on
// `headers_mut.insert` / `headers_mut.append` cover the bound-receiver
// form regardless of the response builder's concrete type. Tainted
// strings without CRLF stripping enable response splitting.
LabelRule {
matchers: &["headers_mut.insert", "headers_mut.append"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
},
LabelRule {
matchers: &["strip_crlf", "escape_header", "sanitize_header"],
label: DataLabel::Sanitizer(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Open redirect sinks ───
//
// axum / rocket `Redirect::to(url)` / `Redirect::permanent(url)` /
// `Redirect::temporary(url)` build a 3xx response with the URL in the
// `Location` header. Without an allowlist check, a tainted `url` is
// the canonical Rust open-redirect vector. Listed unconditionally (not
// gated on framework detection) so non-framework helpers / re-exports
// still surface; the framework-conditional rules below intentionally
// do not duplicate this label. Actix
// `HttpResponse::Found().header("Location", x)` is covered by the
// existing `header` HEADER_INJECTION sink and any Location-line
// co-tagging is deferred to the abstract-string-domain pattern hook.
LabelRule {
matchers: &["Redirect::to", "Redirect::permanent", "Redirect::temporary"],
label: DataLabel::Sink(Cap::OPEN_REDIRECT),
case_sensitive: true,
},
LabelRule {
matchers: &[
"validate_redirect_url",
"is_safe_redirect",
"strip_scheme",
"ensure_relative_url",
"assert_relative_path",
"is_relative_url",
],
label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
];
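The `strip_crlf` / `escape_header` / `sanitize_header` matchers name project-local helpers rather than a library API. As an illustration of what such a helper might do, here is a minimal sketch; the body is an assumption and not code from this repository. It removes carriage returns and line feeds so a value cannot end the current header line and begin a new one.

```rust
/// Hypothetical project-local sanitizer (illustrative only).
/// Drops every CR and LF character from a header value.
fn strip_crlf(value: &str) -> String {
    value
        .chars()
        .filter(|c| *c != '\r' && *c != '\n')
        .collect()
}
```

A value such as `"ok\r\nSet-Cookie: x=1"` collapses to a single line, which is the property the HEADER_INJECTION sanitizer label relies on.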
/// Rust gated sinks. Argument-position-aware classification for callees
/// where activation depends on a literal arg value rather than the bare
/// callee name.
pub static GATED_SINKS: &[SinkGate] = &[
// actix-web `HttpResponse::Found().header("Location", url)` (and other
// builder variants like `Ok().header(...)`, `MovedPermanently().header(...)`).
// After chain normalisation the callee text is e.g.
// `HttpResponse.Found.header`; suffix matching on `header` covers every
// builder variant.
//
// Activation: arg 0 case-insensitive equality with `"Location"`. When
// arg 0 is a constant string equal to `Location` the gate fires and
// checks payload arg 1 for taint; constants like `"Content-Type"` are
// suppressed by the safe-literal branch. When arg 0 is dynamic the
// gate fires conservatively (per the existing `setAttribute` /
// `parseFromString` convention).
//
// Mirrors PHP's `=header` Location gate; the Rust analog is split
// across two args (`name`, `value`) instead of PHP's single `Location: ...`
// line.
SinkGate {
callee_matcher: "header",
arg_index: 0,
dangerous_values: &["Location"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::OPEN_REDIRECT),
case_sensitive: true,
payload_args: &[1],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
];
pub static KINDS: Map<&'static str, Kind> = phf_map! {
@ -337,11 +422,8 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec<RuntimeLabelRule> {
label: DataLabel::Sink(Cap::HTML_ESCAPE),
case_sensitive: true,
});
rules.push(RuntimeLabelRule {
matchers: vec!["Redirect::to".into()],
label: DataLabel::Sink(Cap::SSRF),
case_sensitive: true,
});
// `Redirect::to` is declared unconditionally as Sink(OPEN_REDIRECT)
// in `RULES` above; no framework-conditional duplicate needed.
}
if ctx.has(DetectedFramework::ActixWeb) {
@ -395,11 +477,8 @@ pub fn framework_rules(ctx: &FrameworkContext) -> Vec<RuntimeLabelRule> {
label: DataLabel::Sink(Cap::HTML_ESCAPE),
case_sensitive: true,
});
rules.push(RuntimeLabelRule {
matchers: vec!["Redirect::to".into()],
label: DataLabel::Sink(Cap::SSRF),
case_sensitive: true,
});
// `Redirect::to` is declared unconditionally as Sink(OPEN_REDIRECT)
// in `RULES` above; no framework-conditional duplicate needed.
}
rules

View file

@ -255,6 +255,113 @@ pub static RULES: &[LabelRule] = &[
label: DataLabel::Sink(Cap::SQL_QUERY),
case_sensitive: true,
},
// ─── LDAP injection sinks ───
//
// Mirror of `labels/javascript.rs`; ldapjs / ts-ldapjs has the same
// `client.search(...)` shape. Type-qualified resolution covers both
// `const client = ldap.createClient({...}); client.search(...)` (bound
// variable, type forwarded from the parent body via
// [`crate::taint::inject_external_type_facts`]) and the chained
// `ldap.createClient({...}).search(...)` form.
LabelRule {
matchers: &["LdapClient.search"],
label: DataLabel::Sink(Cap::LDAP_INJECTION),
case_sensitive: true,
},
// ─── LDAP-filter sanitizers ───
LabelRule {
matchers: &[
"ldapEscape",
"ldap-escape",
"ldapescape.filter",
"ldapescape.dn",
],
label: DataLabel::Sanitizer(Cap::LDAP_INJECTION),
case_sensitive: false,
},
// ─── XPath injection sinks ─── (mirrors `labels/javascript.rs`)
LabelRule {
matchers: &[
"document.evaluate",
"xpath.select",
"xpath.evaluate",
"xpath.select1",
],
label: DataLabel::Sink(Cap::XPATH_INJECTION),
case_sensitive: false,
},
// ─── XPath escape sanitizers ─── (mirrors `labels/javascript.rs`)
LabelRule {
matchers: &["escapeXpath", "xpathEscape", "escape_xpath"],
label: DataLabel::Sanitizer(Cap::XPATH_INJECTION),
case_sensitive: false,
},
// ─── Header / CRLF injection sinks ─── (mirrors `labels/javascript.rs`)
LabelRule {
matchers: &["setHeader", "res.set", "res.header", "res.append"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// Subscript-set form (mirrors `labels/javascript.rs`).
LabelRule {
matchers: &["res.headers", "response.headers", "self.response.headers"],
label: DataLabel::Sink(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Header / CRLF sanitizers ─── (mirrors `labels/javascript.rs`)
LabelRule {
matchers: &["stripCRLF", "stripCrlf", "escapeHeader", "sanitizeHeader"],
label: DataLabel::Sanitizer(Cap::HEADER_INJECTION),
case_sensitive: false,
},
// ─── Prototype pollution sinks ─── (mirrors `labels/javascript.rs`)
//
// Argument-role gating is enforced via Destination activation in
// `GATED_SINKS` below: only taint flowing into source-object
// arguments (positions 1+) activates; tainted-target alone is
// benign. Flat rules here are intentionally empty for the merge
// family.
// ─── Open redirect sinks ─── (mirrors `labels/javascript.rs`)
LabelRule {
matchers: &[
"res.redirect",
"location.replace",
"location.assign",
"router.navigate",
"router.navigateByUrl",
"window.location",
"window.location.href",
"location.href",
],
label: DataLabel::Sink(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
LabelRule {
matchers: &[
"validateRedirectUrl",
"isSafeRedirect",
"stripScheme",
"ensureRelativeUrl",
"assertRelativePath",
"isRelativeUrl",
],
label: DataLabel::Sanitizer(Cap::OPEN_REDIRECT),
case_sensitive: false,
},
// ─── SSTI sinks ─── (mirrors `labels/javascript.rs`; `_.template`
// and `nunjucks.renderString` excluded — gated classifiers in
// GATED_SINKS)
LabelRule {
matchers: &["Handlebars.compile"],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: false,
},
// ─── XXE sinks ─── (mirrors `labels/javascript.rs`)
LabelRule {
matchers: &["libxmljs.parseXmlString", "libxmljs.parseXml"],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
},
];
/// Callee patterns that must never be classified as source/sanitizer/sink.
@ -309,6 +416,23 @@ pub static GATED_SINKS: &[SinkGate] = &[
dangerous_kwargs: &[],
activation: GateActivation::ValueMatch,
},
// ── XML XXE gates, mirrors `labels/javascript.rs` ────────────────────
SinkGate {
callee_matcher: "xml2js.parseString",
arg_index: 1,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::XXE),
case_sensitive: true,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[
("processEntities", &["true"]),
("explicitEntities", &["true"]),
("strict", &["false"]),
],
activation: GateActivation::ValueMatch,
},
// ── Outbound HTTP clients (SSRF), see javascript.rs for rationale ────
SinkGate {
callee_matcher: "fetch",
@ -603,6 +727,189 @@ pub static GATED_SINKS: &[SinkGate] = &[
object_destination_fields: &[],
},
},
// `nunjucks.renderString(src, ctx)` — Nunjucks SSTI sink. Only the
// template *source* (arg 0) lets an attacker drive template
// execution; the `ctx` data object (arg 1) is rendered via the
// template's escape policy and is not itself a code-injection
// vector. Gate via Destination-style activation with
// `payload_args: &[0]` so taint flowing only into `ctx` is
// suppressed. Mirrors `labels/javascript.rs`.
SinkGate {
callee_matcher: "nunjucks.renderString",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::SSTI),
case_sensitive: false,
payload_args: &[0],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// ── Prototype pollution gates ────────────────────────────────────────
//
// Mirrors `labels/javascript.rs` GATED_SINKS proto-pollution block.
// Argument-role gating: `(target, src1, src2, ...)`, only source
// positions trigger. See the JS module for the rationale and the
// `payload_args` width choice.
SinkGate {
callee_matcher: "_.merge",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "_.mergeWith",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "_.defaultsDeep",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "_.set",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "_.setWith",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "deepMerge",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "defaultsDeep",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: false,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "Object.assign",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "$.extend",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2, 3, 4],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
SinkGate {
callee_matcher: "jQuery.extend",
arg_index: 0,
dangerous_values: &[],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[1, 2, 3, 4],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::Destination {
object_destination_fields: &[],
},
},
// Bare `extend` (suffix-matched) — see labels/javascript.rs for full
// rationale. `LiteralOnly` activation requires arg 0 to be literal `true`
// so Backbone's `Model.extend({proto})` class-extension form does not
// fire (its arg 0 is an object literal, not a boolean).
SinkGate {
callee_matcher: "extend",
arg_index: 0,
dangerous_values: &["true"],
dangerous_prefixes: &[],
label: DataLabel::Sink(Cap::PROTOTYPE_POLLUTION),
case_sensitive: true,
payload_args: &[2, 3, 4, 5],
keyword_name: None,
dangerous_kwargs: &[],
activation: GateActivation::LiteralOnly,
},
];
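The argument-role gating used by the prototype-pollution gates above can be sketched in a few lines. This is a minimal model, assuming taint per argument is already known as a `bool` slice. `destination_gate_fires` and `literal_only_gate_fires` are hypothetical names and do not correspond to functions in the Nyx code base.

```rust
/// Hypothetical helper: `tainted[i]` says whether call argument `i` carries
/// taint; `payload_args` lists the source-object positions that count.
/// Positions beyond the actual argument list are treated as untainted.
fn destination_gate_fires(tainted: &[bool], payload_args: &[usize]) -> bool {
    payload_args
        .iter()
        .any(|&i| tainted.get(i).copied().unwrap_or(false))
}

/// Hypothetical model of the bare `extend` gate: it additionally requires
/// arg 0 to be the literal `true` (the deep-merge form) before the payload
/// positions are consulted. `arg0_literal` is `None` for non-literal args.
fn literal_only_gate_fires(
    arg0_literal: Option<&str>,
    tainted: &[bool],
    payload_args: &[usize],
) -> bool {
    arg0_literal == Some("true") && destination_gate_fires(tainted, payload_args)
}
```

With `payload_args` of `[1, 2, 3, 4, 5]`, a call shaped like `_.merge(taintedTarget, cleanSrc)` does not fire, whereas `_.merge(cleanTarget, taintedSrc)` does.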
pub static KINDS: Map<&'static str, Kind> = phf_map! {

View file

@ -1181,7 +1181,12 @@ fn type_kind_tag(k: &TypeKind) -> String {
TypeKind::LocalCollection => "LocalCollection".into(),
TypeKind::RequestBuilder => "RequestBuilder".into(),
TypeKind::JpaCriteriaQuery => "JpaCriteriaQuery".into(),
TypeKind::LdapClient => "LdapClient".into(),
TypeKind::XPathClient => "XPathClient".into(),
TypeKind::XmlParser => "XmlParser".into(),
TypeKind::Template => "Template".into(),
TypeKind::Dto(_) => "Dto".into(),
TypeKind::NullPrototypeObject => "NullPrototypeObject".into(),
}
}
@ -1538,6 +1543,8 @@ pub fn analyse_function_taint(
receiver_seed: None,
const_values: Some(&opt.const_values),
type_facts: Some(&opt.type_facts),
xml_parser_config: Some(&opt.xml_parser_config),
xpath_config: Some(&opt.xpath_config),
ssa_summaries: None,
extra_labels: None,
callee_bodies: None,

View file

@ -138,6 +138,7 @@ pub struct RuleListItem {
pub enabled: bool,
pub is_custom: bool,
pub is_gated: bool,
pub is_class: bool,
pub case_sensitive: bool,
pub finding_count: usize,
pub suppression_rate: f64,
@ -156,6 +157,7 @@ pub struct RuleDetailView {
pub enabled: bool,
pub is_custom: bool,
pub is_gated: bool,
pub is_class: bool,
pub finding_count: usize,
pub suppression_rate: f64,
pub example_findings: Vec<RelatedFindingView>,

View file

@ -25,6 +25,20 @@ fn extract_family(rule_id: &str) -> &str {
rule_id
}
/// True when `rule_id` either equals `prefix` or starts with `prefix`
/// followed by one of the recognised separator characters used by the
/// finding-id emitter. Prevents `taint-ssrf-allowlist-violation`
/// from silently inheriting `taint-ssrf`'s OWASP bucket.
fn matches_cap_rule_id(rule_id: &str, prefix: &str) -> bool {
if !rule_id.starts_with(prefix) {
return false;
}
matches!(
rule_id.as_bytes().get(prefix.len()),
None | Some(b' ') | Some(b'(') | Some(b'.')
)
}
/// Return the OWASP 2021 (code, label) pair for a given rule id, or `None` if unmapped.
pub fn owasp_bucket_for(rule_id: &str) -> Option<(&'static str, &'static str)> {
let family = extract_family(rule_id);
@ -32,6 +46,27 @@ pub fn owasp_bucket_for(rule_id: &str) -> Option<(&'static str, &'static str)> {
return None;
}
// Cap-class rule ids carry their canonical OWASP code in
// `CAP_RULE_REGISTRY`; consult that first so adding a new cap class
// does not require updating two tables. The legacy family-token
// dispatch below covers per-language tree-sitter pattern rules
// (`js.xss.outer_html` style) that have no cap entry.
//
// Match shape: exact equality, or registry id followed by a separator
// that the emitter actually uses (` ` for ` (source 1:1)` suffixes,
// `(` for `(source 1:1)` style without a leading space, `.` for
// dotted variants like `rs.auth.missing_ownership_check.taint`).
// Plain `starts_with` would silently bucket a future
// `taint-ssrf-allowlist-violation` under the SSRF entry; the
// separator gate keeps unrelated suffixes from inheriting a parent
// bucket.
if let Some(meta) = crate::labels::CAP_RULE_REGISTRY
.iter()
.find(|m| matches_cap_rule_id(rule_id, m.rule_id))
{
return Some((meta.owasp_code, meta.owasp_label));
}
Some(match family {
// A01, Broken Access Control
"auth" | "csrf" | "mass_assign" | "path" | "redirect" => ("A01", "Broken Access Control"),
@ -39,10 +74,10 @@ pub fn owasp_bucket_for(rule_id: &str) -> Option<(&'static str, &'static str)> {
"crypto" | "secrets" => ("A02", "Cryptographic Failures"),
// A03, Injection (covers SQLi, XSS, command, code-eval, template, NoSQL, LDAP, reflection,
// and engine-level taint findings without a more specific family tag).
"sqli" | "xss" | "cmdi" | "code_exec" | "template" | "nosql" | "ldap" | "reflection"
| "taint" => ("A03", "Injection"),
// A05, Security Misconfiguration (TLS verify off, cookie flags, prototype pollution)
"config" | "transport" | "prototype" => ("A05", "Security Misconfiguration"),
"sqli" | "xss" | "cmdi" | "code_exec" | "template" | "nosql" | "ldap" | "xpath"
| "header" | "reflection" | "taint" => ("A03", "Injection"),
// A05, Security Misconfiguration (TLS verify off, cookie flags, prototype pollution, XXE)
"config" | "transport" | "prototype" | "xxe" => ("A05", "Security Misconfiguration"),
// A08, Software and Data Integrity Failures
"deser" => ("A08", "Software and Data Integrity Failures"),
// A09, Logging & Monitoring Failures
@ -112,6 +147,30 @@ fn issue_category_label(rule_id: &str) -> &'static str {
if rule_id.starts_with("taint-data-exfiltration") {
return "Data Exfiltration";
}
// Cap-class rule ids share the `taint` family token but each represents
// a distinct vulnerability class. Match them before falling through
// to family-based dispatch so the dashboard surfaces the right badge.
if rule_id.starts_with("taint-ldap-injection") {
return "LDAP Injection";
}
if rule_id.starts_with("taint-xpath-injection") {
return "XPath Injection";
}
if rule_id.starts_with("taint-header-injection") {
return "Header Injection";
}
if rule_id.starts_with("taint-open-redirect") {
return "Open Redirect";
}
if rule_id.starts_with("taint-template-injection") {
return "Template Injection";
}
if rule_id.starts_with("taint-xxe") {
return "XXE";
}
if rule_id.starts_with("taint-prototype-pollution") {
return "Prototype Pollution";
}
match extract_family(rule_id) {
"sqli" => "SQL Injection",
"xss" => "Cross-Site Scripting",
@ -229,6 +288,40 @@ mod tests {
assert_eq!(out[2].count, 2);
}
#[test]
fn cap_rule_id_match_requires_separator() {
// Exact match → bucketed.
assert_eq!(
owasp_bucket_for("taint-ssrf"),
Some(("A10", "Server-Side Request Forgery"))
);
// Suffix after recognised separators is bucketed.
assert_eq!(
owasp_bucket_for("taint-ssrf (source 1:1)"),
Some(("A10", "Server-Side Request Forgery"))
);
assert_eq!(
owasp_bucket_for("taint-ssrf(source 1:1)"),
Some(("A10", "Server-Side Request Forgery"))
);
// Dotted suffix (used by `rs.auth.missing_ownership_check.taint`).
assert_eq!(
owasp_bucket_for("rs.auth.missing_ownership_check.taint"),
Some(("A01", "Broken Access Control"))
);
// Hyphenated suffix without separator must NOT silently inherit
// the parent bucket. Falls through to the family-token table,
// where `ssrf` still resolves to A10, so use a hypothetical
// sibling that would only resolve via the cap registry.
assert_eq!(
owasp_bucket_for("taint-ldap-injection-allowlist"),
// Family token "taint" → A03; without separator gating this
// would have inherited the LDAP entry's A03 anyway, but the
// important property is that the registry match was rejected.
Some(("A03", "Injection"))
);
}
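The separator rule exercised by the test above can be reproduced standalone. This is a minimal sketch, assuming a hypothetical two-entry `REGISTRY` slice in place of `CAP_RULE_REGISTRY` (whose real contents are not shown in this diff). `registry_code` is an illustrative helper, not part of the code base.

```rust
/// Hypothetical stand-in for `CAP_RULE_REGISTRY`: (rule id, OWASP code).
const REGISTRY: &[(&str, &str)] = &[
    ("taint-ssrf", "A10"),
    ("taint-ldap-injection", "A03"),
];

/// Same shape as `matches_cap_rule_id` in the diff: exact match, or the
/// prefix followed by one of the separators ` `, `(`, `.`.
fn matches_cap_rule_id(rule_id: &str, prefix: &str) -> bool {
    if !rule_id.starts_with(prefix) {
        return false;
    }
    matches!(
        rule_id.as_bytes().get(prefix.len()),
        None | Some(b' ') | Some(b'(') | Some(b'.')
    )
}

/// Illustrative lookup: first registry entry whose id matches under the
/// separator rule, or `None` when only a bare prefix matches.
fn registry_code(rule_id: &str) -> Option<&'static str> {
    REGISTRY
        .iter()
        .find(|(id, _)| matches_cap_rule_id(rule_id, id))
        .map(|(_, code)| *code)
}
```

Under this sketch a hyphenated id such as `taint-ssrf-allowlist-violation` is rejected by the registry lookup and would fall through to whatever family-token dispatch follows.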
#[test]
fn issue_category_label_routes_data_exfil_to_dedicated_bucket() {
// `taint-data-exfiltration` shares the `taint` family token with

View file

@ -53,6 +53,8 @@ fn build_rule_list(state: &AppState) -> Vec<RuleInfo> {
case_sensitive: cr.case_sensitive,
is_custom: true,
is_gated: false,
is_class: false,
emission_active: true,
enabled,
});
}
@ -89,6 +91,7 @@ async fn list_rules(State(state): State<AppState>) -> Json<Vec<RuleListItem>> {
enabled: r.enabled,
is_custom: r.is_custom,
is_gated: r.is_gated,
is_class: r.is_class,
case_sensitive: r.case_sensitive,
finding_count: count,
suppression_rate: rate,
@ -134,6 +137,7 @@ async fn get_rule(
enabled: rule.enabled,
is_custom: rule.is_custom,
is_gated: rule.is_gated,
is_class: rule.is_class,
finding_count: total,
suppression_rate: rate,
example_findings: examples,

View file

@ -31,6 +31,8 @@ pub mod param_points_to;
pub mod pointsto;
pub mod static_map;
pub mod type_facts;
pub mod xml_config;
pub mod xpath_config;
#[allow(unused_imports)]
pub use ir::*;
@ -51,6 +53,20 @@ pub struct OptimizeResult {
pub const_values: HashMap<SsaValue, const_prop::ConstLattice>,
/// Type fact analysis results.
pub type_facts: type_facts::TypeFactResult,
/// XML-parser configuration facts: per-receiver SSA value
/// `secure_processing` / `disallow_doctype` / `external_entities`
/// flags carried forward from setter calls and constructor kwargs.
/// Consumed by the SSA taint engine to suppress XXE on parse-class
/// sinks whose receiver was provably hardened.
#[serde(default)]
pub xml_parser_config: xml_config::XmlParserConfigResult,
/// XPath-receiver configuration facts: per-receiver SSA value
/// `has_resolver` flag set by `setXPathVariableResolver` calls.
/// Consumed by the SSA taint engine to suppress XPATH_INJECTION on
/// `evaluate` / `compile` sinks whose receiver was provably bound
/// to a variable resolver (parameterised XPath shape).
#[serde(default)]
pub xpath_config: xpath_config::XPathConfigResult,
/// Base-variable alias groups from copy propagation.
pub alias_result: alias::BaseAliasResult,
/// Points-to analysis: per-SSA-value abstract heap object sets.
@ -100,6 +116,17 @@ pub fn optimize_ssa_with_param_types(
let type_facts =
type_facts::analyze_types_with_param_types(body, cfg, &cp.values, lang, param_types);
// 5b. XML-parser config analysis. Tracks per-receiver hardening
// flags so XXE sinks can be suppressed when the parser was provably
// configured for secure processing.
let xml_parser_config = xml_config::analyze_xml_parser_config(body, cfg, &cp.values, lang);
// 5c. XPath-receiver config analysis. Tracks a per-receiver
// `has_resolver` flag so `XPath.evaluate(taintedExpr, ...)` sinks
// can be suppressed when the receiver was bound to an
// `XPathVariableResolver` (parameterised-XPath shape).
let xpath_config = xpath_config::analyze_xpath_config(body, cfg, lang);
// 6. Points-to analysis (uses allocation site detection + SSA def-use)
let points_to = heap::analyze_points_to(body, cfg, lang);
@ -113,6 +140,8 @@ pub fn optimize_ssa_with_param_types(
OptimizeResult {
const_values: cp.values,
type_facts,
xml_parser_config,
xpath_config,
alias_result,
points_to,
module_aliases,

View file

@ -52,12 +52,55 @@ pub enum TypeKind {
/// where openmrs / xwiki / keycloak Hibernate DAOs build queries
/// via `cb.createQuery(Foo.class)` + `Root` / `Predicate` API.
JpaCriteriaQuery,
/// An LDAP directory-service client / connection (`DirContext`,
/// `LdapTemplate`, `Net::LDAP`, `ldap3.Connection`, `ldap.createClient`,
/// `ldap.DialURL`, etc.). Distinct from `DatabaseConnection` so the
/// type-qualified `LdapClient.search` rule fires only on directory
/// search APIs rather than every DB receiver with a `search` method.
LdapClient,
/// An XPath query / evaluation client (`DOMXPath`, `XPath`,
/// `XPathExpression`, `lxml.etree.XPath`, etc.). Distinct from
/// `DatabaseConnection` so the type-qualified `XPathClient.query` /
/// `XPathClient.evaluate` rules fire only on XPath APIs rather than
/// every receiver with a generic `query` / `evaluate` method (avoids
/// collision with PHP `$pdo->query` SQL_QUERY sink).
XPathClient,
/// A pre-parsed template object whose `process` / `merge` /
/// `render` method renders bound data through an already-compiled
/// template body. The SSTI vector is when the template *source*
/// fed to the constructor / factory was attacker-influenced; the
/// render-time call site is the sink. Currently populated by
/// `new freemarker.template.Template(...)` and `cfg.getTemplate(...)`;
/// the type-qualified resolver rewrites `tpl.process(...)` → `Template.process` so
/// the existing flat SSTI rule fires on idiomatic
/// `Template tpl = new Template(...); tpl.process(model, out)`
/// shapes.
Template,
/// An XML parser instance produced by a JAXP factory call
/// (`DocumentBuilderFactory.newDocumentBuilder()`,
/// `SAXParserFactory.newSAXParser()`, `XMLReaderFactory.createXMLReader()`).
/// `DOMXPath` and friends keep their own `XPathClient` tag. Used so
/// the type-qualified `XmlParser.parse` rule fires on instance-style
/// calls (`builder.parse(input)`) without needing a flat-rule
/// matcher per concrete subclass. Also gates the XXE config-fact
/// suppression: only XmlParser-typed receivers consult the
/// [`crate::ssa::xml_config::XmlParserConfigResult`] sidecar.
XmlParser,
/// A framework-injected DTO body whose field types are known.
/// Populated when a parameter is recognised as a typed extractor and
/// the DTO class / struct / Pydantic model is resolvable in scope.
/// Strictly additive: without a DTO definition, callers fall back
/// to name-only resolution.
Dto(DtoFields),
/// An object created with `Object.create(null)` — has no prototype
/// chain, so subscript-write keys cannot pollute `Object.prototype`.
/// Populated for JS/TS values whose constructor call is
/// `Object.create(null)`. The PROTOTYPE_POLLUTION suppression at the
/// synthetic `__index_set__` sink consults this fact (via SSA receiver
/// value) so the suppression is flow-sensitive: if a phi join leaves
/// the receiver only sometimes null-prototyped, the fact widens to
/// `Unknown` and the sink fires on the unsafe path.
NullPrototypeObject,
}
/// Structural carrier for a recognised DTO type. Maps
@ -99,6 +142,10 @@ impl TypeKind {
Self::Url => Some("URL"),
Self::RequestBuilder => Some("RequestBuilder"),
Self::JpaCriteriaQuery => Some("JpaCriteriaQuery"),
Self::LdapClient => Some("LdapClient"),
Self::XPathClient => Some("XPathClient"),
Self::XmlParser => Some("XmlParser"),
Self::Template => Some("Template"),
_ => None,
}
}
@ -288,9 +335,11 @@ pub fn is_safe_query_object_arg(
/// authoritative, and consumers see Unknown instead of a wrong
/// type tag.
///
/// `_args` and `_consts` are kept on the signature so we can later
/// add arg-shape narrowing when class-literal lowering captures
/// `Foo.class` as an arg-use.
/// `_args` and `_consts` allow arg-shape narrowing when an arg's
/// constant value distinguishes overloads. Reserved for future Java
/// `createQuery(Foo.class)` shape (the `Object.create(null)` case is
/// driven by the `produces_null_proto` CFG flag instead, since a
/// literal `null` arg leaves no SSA value to inspect).
fn arg_aware_call_type(
lang: Lang,
callee: &str,
@ -392,6 +441,40 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> {
"createCriteriaUpdate" | "createCriteriaDelete" | "createTupleQuery" | "subquery" => {
Some(TypeKind::JpaCriteriaQuery)
}
// LDAP directory-service clients. `new InitialDirContext(env)` /
// `new InitialLdapContext(env, ctls)` instantiate the JNDI LDAP
// provider; `new LdapTemplate(...)` / `LdapTemplate.<init>` is the
// Spring LDAP wrapper. Both expose `search` / `searchByEntity` /
// `searchForObject` overloads where filter/DN strings are LDAP
// injection sinks.
"InitialDirContext" | "InitialLdapContext" | "LdapTemplate" => {
Some(TypeKind::LdapClient)
}
// JAXP factory-produced XML parser instances. Each is
// XXE-vulnerable by default until hardened with
// `setFeature(FEATURE_SECURE_PROCESSING, true)` (or
// disallow-doctype-decl, etc.). The
// [`crate::ssa::xml_config::XmlParserConfigResult`] sidecar
// suppresses the XXE bit at the type-qualified `XmlParser.parse`
// sink when the receiver carries a hardening fact.
"newDocumentBuilder" | "newSAXParser" | "getXMLReader" | "newXMLReader"
| "createXMLReader" => Some(TypeKind::XmlParser),
// `XPathFactory.newXPath()` returns a JAXP `XPath` instance.
// Mapping it to `XPathClient` lets the type-qualified resolver
// pick up `xpath.evaluate(...)` against the existing
// `XPathClient.evaluate` rule and lets the
// [`crate::ssa::xpath_config::XPathConfigResult`] sidecar
// suppress XPATH_INJECTION when the receiver was bound to an
// `XPathVariableResolver`.
"newXPath" => Some(TypeKind::XPathClient),
// Apache FreeMarker `new Template(name, reader, cfg)` /
// `cfg.getTemplate(name)`. The `Template` instance's
// `.process(model, out)` is an SSTI sink when the
// constructor source / template body came from tainted
// input. Type-qualified resolution rewrites
// `tpl.process(...)` → `Template.process` against the
// existing flat rule in `labels/java.rs`.
"Template" | "getTemplate" => Some(TypeKind::Template),
_ => None,
},
Lang::JavaScript | Lang::TypeScript => match suffix {
@ -409,6 +492,12 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> {
// `elementsMap.get(id)`, `origIdToDuplicateId.get(...)`,
// `groupIdMapForOperation.set(...)` shapes).
"Map" | "Set" | "WeakMap" | "WeakSet" | "Array" => Some(TypeKind::LocalCollection),
// ldapjs client factory: `ldap.createClient({ url: '…' })` returns
// a Client whose `search(base, opts, cb)` is an LDAP injection
// sink. Match the qualified callee text rather than the bare
// `createClient` suffix to avoid widening to unrelated factories
// with the same verb name.
"createClient" if callee.contains("ldap") => Some(TypeKind::LdapClient),
_ => None,
},
Lang::Python => {
@ -429,6 +518,15 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> {
} else if suffix == "open" && !callee.contains('.') {
// Bare `open()` is file I/O in Python
Some(TypeKind::FileHandle)
} else if callee == "ldap.initialize"
|| callee == "ldap3.Connection"
|| (callee.ends_with(".initialize") && callee.contains("ldap"))
{
// python-ldap: `conn = ldap.initialize(url)` returns an
// LDAPObject whose `search_s` / `search_ext_s` methods are
// LDAP-injection sinks. ldap3: `Connection(server, ...)`
// returns a Connection with a `search()` method.
Some(TypeKind::LdapClient)
} else {
None
}
@ -442,6 +540,10 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> {
Some(TypeKind::FileHandle)
} else if callee.contains("url.") && suffix == "Parse" {
Some(TypeKind::Url)
} else if callee.contains("ldap.") && matches!(suffix, "Dial" | "DialURL" | "DialTLS") {
// go-ldap (`github.com/go-ldap/ldap/v3`): `conn, _ := ldap.DialURL(url)`
// returns `*ldap.Conn` whose `Search(req)` is an LDAP-injection sink.
Some(TypeKind::LdapClient)
} else {
None
}
@ -451,6 +553,10 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> {
"curl_init" => Some(TypeKind::HttpClient),
"fopen" => Some(TypeKind::FileHandle),
"SplFileObject" => Some(TypeKind::FileHandle),
// DOMXPath: `$xp = new DOMXPath($doc)`. `$xp->query($expr)` /
// `$xp->evaluate($expr)` are XPath-injection sinks; without a
// distinct TypeKind they collide with the bare `query` SQL sink.
"DOMXPath" => Some(TypeKind::XPathClient),
_ => None,
},
Lang::C => match suffix {
@ -524,6 +630,11 @@ pub(crate) fn constructor_type(lang: Lang, callee: &str) -> Option<TypeKind> {
Some(TypeKind::DatabaseConnection)
} else if after_colons.starts_with("File.") && matches!(suffix, "open" | "new") {
Some(TypeKind::FileHandle)
} else if callee.contains("Net::LDAP") && matches!(suffix, "new" | "open") {
// net-ldap gem: `Net::LDAP.new(host: ...)` / `Net::LDAP.open`
// returns a connection whose `search(base:, filter:)` accepts
// an attacker-influenceable filter expression.
Some(TypeKind::LdapClient)
} else {
None
}
@ -768,8 +879,7 @@ pub fn analyze_types(
/// Same as [`analyze_types`] but seeds [`SsaOp::Param`] values with
/// per-position [`TypeKind`] facts from `param_types` (parallel-vec to
/// the function's BodyMeta.params). An entry of `None` (or an out-of-
/// range index) leaves the value at the default Param fact (Unknown),
/// preserving the pre-Phase-3 behaviour.
/// range index) leaves the value at the default Param fact (Unknown).
pub fn analyze_types_with_param_types(
body: &SsaBody,
cfg: &Cfg,
@ -810,8 +920,7 @@ pub fn analyze_types_with_param_types(
SsaOp::Param { index } => {
// Seed from the function's BodyMeta.param_types when
// a TypeKind was recovered at CFG construction time.
// Out-of-range / None entries fall back to Unknown,
// matching the pre-Phase-3 behaviour.
// Out-of-range / None entries fall back to Unknown.
match param_types.get(*index).and_then(|t| t.clone()) {
Some(tk) => TypeFact::from_kind(tk),
None => TypeFact::unknown(),
@ -820,7 +929,19 @@ pub fn analyze_types_with_param_types(
SsaOp::SelfParam => TypeFact::from_kind(TypeKind::Object),
SsaOp::CatchParam => TypeFact::from_kind(TypeKind::Object),
SsaOp::Call { callee, args, .. } => {
if let Some(ty) = lang.and_then(|l| constructor_type(l, callee)) {
// CFG marks `Object.create(null)` (and future
// null-prototype constructors) at lowering time.
// Honour it ahead of generic constructor / arg-aware
// dispatch so the returned SsaValue carries
// `NullPrototypeObject` for prototype-pollution
// suppression.
let null_proto = cfg
.node_weight(inst.cfg_node)
.map(|ni| ni.call.produces_null_proto)
.unwrap_or(false);
if null_proto {
TypeFact::from_kind(TypeKind::NullPrototypeObject)
} else if let Some(ty) = lang.and_then(|l| constructor_type(l, callee)) {
TypeFact::from_kind(ty)
} else if let Some(ty) =
lang.and_then(|l| arg_aware_call_type(l, callee, args, consts))
@ -1667,7 +1788,7 @@ mod tests {
/// Param values seeded from `param_types` must surface
/// the right TypeKind for downstream sink suppression. An out-of-
/// range index falls back to Unknown (the pre-Phase-3 default).
/// range index falls back to Unknown.
#[test]
fn param_types_seed_param_value_facts() {
use crate::cfg::Cfg;
@ -1728,7 +1849,7 @@ mod tests {
// Index 99 is out of range → falls back to Unknown.
assert_eq!(result.get_type(SsaValue(1)), Some(&TypeKind::Unknown));
// Empty slice = pre-Phase-3 behaviour.
// Empty slice = type-unaware fallback (analyze_types path).
let result2 = analyze_types(&body, &cfg, &consts, Some(Lang::Java));
assert_eq!(result2.get_type(SsaValue(0)), Some(&TypeKind::Unknown));
}
@ -2364,7 +2485,7 @@ mod tests {
));
}
// ── JPA Criteria query suppression (Phase: real-repo openmrs FP) ───
// ── JPA Criteria query suppression (real-repo openmrs FP) ─────────
//
// These tests pin the `TypeKind::JpaCriteriaQuery` variant + the
// `is_safe_query_object_arg` predicate + the

614
src/ssa/xml_config.rs Normal file
View file

@ -0,0 +1,614 @@
//! Per-SSA-value XML-parser configuration tracking.
//!
//! Tracks "is this XML parser configured to disable external entities / DTD
//! resolution" facts on parser-receiver SSA values. When a parse-class sink
//! is reached and the receiver is provably configured for secure processing,
//! the XXE bit is stripped from the sink's cap mask.
//!
//! The pass is intentionally a small forward dataflow run alongside type-fact
//! analysis. It does NOT flow through the SSA taint engine's worklist. Phi
//! nodes propagate the meet of operand configs (a flag is "set" only when all
//! reaching operands set it), and copy assignments propagate the receiver's
//! config. Recognised setter calls update the receiver's config in place;
//! identity-style transformer calls that produce a child parser (e.g.
//! `factory.newDocumentBuilder()`) inherit the receiver's config into the
//! result value.
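The phi-meet described in the module comment above can be sketched as a fold over the operand configs. This is a minimal model that mirrors the three flags of the `XmlParserConfig` struct defined later in this file. `Config` and `phi_config` are hypothetical names, and treating an empty operand list as the all-`false` default is an assumption of the sketch.

```rust
/// Simplified copy of the three-flag parser config.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Config {
    secure_processing: bool,
    disallow_doctype: bool,
    external_entities: bool,
}

impl Config {
    /// Hardening flags need agreement from both sides; the unsafe
    /// `external_entities` flag survives if either side set it.
    fn meet(self, other: Self) -> Self {
        Config {
            secure_processing: self.secure_processing && other.secure_processing,
            disallow_doctype: self.disallow_doctype && other.disallow_doctype,
            external_entities: self.external_entities || other.external_entities,
        }
    }

    /// Hardened only when a safe flag is set and entities were not re-enabled.
    fn is_secure(self) -> bool {
        (self.secure_processing || self.disallow_doctype) && !self.external_entities
    }
}

/// Hypothetical helper: fold the meet over every operand of a phi node.
/// Assumption: no operands yields the default (unhardened) config.
fn phi_config(operands: &[Config]) -> Config {
    let mut iter = operands.iter().copied();
    let first = iter.next().unwrap_or_default();
    iter.fold(first, Config::meet)
}
```

A join of two hardened branches stays hardened, while a join with an unconfigured branch, or with one that re-enabled external entities, does not.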
use std::collections::HashMap;
use super::const_prop::ConstLattice;
use super::ir::*;
use crate::cfg::Cfg;
use crate::symbol::Lang;
use serde::{Deserialize, Serialize};
/// Receiver-instance config carried forward from setter calls.
///
/// All flags default to `false` (parser may be unsafe). A `true` flag
/// means: we have proven this parser was hardened along this control-flow
/// path. The XXE-suppression check is `secure_processing ||
/// disallow_doctype` — either gate is sufficient to neutralise external
/// entity resolution in JAXP / lxml / xml2js.
///
/// `external_entities` is the *unsafe* polarity: when set to `true`, the
/// parser was explicitly opted into external-entity resolution (e.g.
/// `XMLParser(resolve_entities=True)`). A parse call with this flag
/// retains XXE even if the language default would otherwise be safe.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Serialize, Deserialize)]
pub struct XmlParserConfig {
pub secure_processing: bool,
pub disallow_doctype: bool,
pub external_entities: bool,
}
impl XmlParserConfig {
/// True when the parser is provably hardened against XXE.
pub fn is_secure(&self) -> bool {
(self.secure_processing || self.disallow_doctype) && !self.external_entities
}
/// Phi-meet: a safe flag survives only when *both* operands set it; the
/// unsafe `external_entities` flag survives when *either* operand sets
/// it. Used when the parser variable was reassigned across branches.
fn meet(&self, other: &Self) -> Self {
XmlParserConfig {
secure_processing: self.secure_processing && other.secure_processing,
disallow_doctype: self.disallow_doctype && other.disallow_doctype,
// Unsafe polarity: ANY branch enabling external entities
// contaminates the join. Conservative w.r.t. XXE.
external_entities: self.external_entities || other.external_entities,
}
}
/// Union: caller updates the same receiver across multiple setter
/// calls. All known-safe flags accumulate; unsafe is sticky.
fn union(&self, other: &Self) -> Self {
XmlParserConfig {
secure_processing: self.secure_processing || other.secure_processing,
disallow_doctype: self.disallow_doctype || other.disallow_doctype,
external_entities: self.external_entities || other.external_entities,
}
}
}
/// Result of XML-parser config analysis.
#[derive(Clone, Debug, Default, Serialize, Deserialize)]
pub struct XmlParserConfigResult {
pub configs: HashMap<SsaValue, XmlParserConfig>,
}
impl XmlParserConfigResult {
/// True when the value carries a config fact proving secure processing.
pub fn is_secure(&self, v: SsaValue) -> bool {
self.configs.get(&v).is_some_and(|c| c.is_secure())
}
/// True when the value was explicitly opted into external-entity
/// resolution (e.g. lxml `resolve_entities=True`).
pub fn is_unsafe_explicit(&self, v: SsaValue) -> bool {
self.configs.get(&v).is_some_and(|c| c.external_entities)
}
}
/// Suppress the `Cap::XXE` bit when the receiver of an XXE-class sink
/// was provably hardened. Returns `true` when XXE should be stripped
/// from the sink's cap mask.
///
/// Conservative defaults:
/// * No receiver SSA value (free function) → returns `false` (cannot
/// prove safety, fall through to existing classification).
/// * Receiver carries no config fact → returns `false`.
/// * `external_entities` flag is set → returns `false` even if a safe
/// flag is also set, since the unsafe opt-in dominates.
pub fn xxe_safe(receiver: Option<SsaValue>, xml_config: &XmlParserConfigResult) -> bool {
let Some(rv) = receiver else {
return false;
};
xml_config.is_secure(rv)
}
/// Per-call analysis result: how this call mutates the parser-config
/// universe.
#[allow(dead_code)] // SeedResult reserved for future constructor-driven seeding
enum ConfigEffect {
/// No effect on parser configuration.
None,
/// Update the call's receiver in place by OR-ing the supplied config
/// into its current config. Used for setter calls
/// (`factory.setFeature(FEATURE_SECURE_PROCESSING, true)`).
UpdateReceiver(XmlParserConfig),
/// Inherit the receiver's config into the call's result value.
/// Used for identity-style transformer calls
/// (`factory.newDocumentBuilder()` returns a builder that shares
/// the factory's hardening state).
InheritFromReceiver,
/// Initialise the call's result value with the supplied config.
/// Used for constructor calls whose options reveal the unsafe-explicit
/// opt-in (`new XMLParser({ processEntities: true })`,
/// `lxml.etree.XMLParser(resolve_entities=True)`).
SeedResult(XmlParserConfig),
}
/// Classify a Call instruction's effect on the parser-config universe.
///
/// `arg_const` looks up the const-lattice value for an SSA arg position
/// (returns `None` if the position is out of range or the SSA value is
/// not a known constant). Setter detection consults arg-0 (the feature
/// name) and arg-1 (the boolean flag).
///
/// `arg_idents` is the matching CFG-level [`info.call.arg_uses`] vector
/// (per-position identifier text from the source AST). Used to recover
/// non-literal feature names like `XMLConstants.FEATURE_SECURE_PROCESSING`
/// or bare identifiers (`FEATURE_SECURE_PROCESSING`, `Boolean.TRUE`)
/// that const-propagation cannot fold to a literal.
///
/// `arg_literals` is the matching CFG-level
/// [`info.call.arg_string_literals`] vector (per-position literal text;
/// strings, booleans, and null/nil/None tokens). Used to recover the
/// boolean polarity of `setFeature(NAME, true)` since SSA lowering does
/// not bind boolean arg literals to any SSA value (`arg_uses` skips them
/// because they are not identifiers).
fn classify_call(
lang: Lang,
callee: &str,
args: &[smallvec::SmallVec<[SsaValue; 2]>],
receiver: Option<SsaValue>,
consts: &HashMap<SsaValue, ConstLattice>,
arg_idents: &[Vec<String>],
arg_literals: &[Option<String>],
) -> ConfigEffect {
let suffix = callee.rsplit(['.', ':']).next().unwrap_or(callee);
// Helper: lookup the const lattice for arg N's first SSA value.
let arg_const = |n: usize| -> Option<&ConstLattice> {
args.get(n)
.and_then(|vals| vals.first())
.and_then(|v| consts.get(v))
};
// Helper: text of the const lattice (for string/identifier comparison).
let arg_text = |n: usize| -> Option<String> {
match arg_const(n)? {
ConstLattice::Str(s) => Some(s.clone()),
ConstLattice::Bool(b) => Some(b.to_string()),
ConstLattice::Int(i) => Some(i.to_string()),
_ => None,
}
};
// Helper: textual identifier(s) at arg N from the CFG node. Non-literal
// feature names (`XMLConstants.FEATURE_SECURE_PROCESSING`, bare
// `FEATURE_SECURE_PROCESSING`, etc.) surface here.
let arg_ident_text = |n: usize| -> Vec<&str> {
arg_idents
.get(n)
.map(|v| v.iter().map(|s| s.as_str()).collect())
.unwrap_or_default()
};
let arg_bool = |n: usize| -> Option<bool> {
if let Some(b) = arg_const(n).and_then(|c| match c {
ConstLattice::Bool(b) => Some(*b),
ConstLattice::Str(s) => match s.as_str() {
"True" | "true" => Some(true),
"False" | "false" => Some(false),
_ => None,
},
_ => None,
}) {
return Some(b);
}
// Fallback: tree-sitter classifies `true` / `false` as bare
// identifiers in some grammars. Inspect the arg's use list.
for tok in arg_ident_text(n) {
match tok {
"true" | "True" | "Boolean.TRUE" => return Some(true),
"false" | "False" | "Boolean.FALSE" => return Some(false),
_ => {}
}
}
// Fallback: literal tokens lifted by `extract_arg_string_literals`
// (booleans / null / numeric tokens). Java `setFeature(NAME, true)`
// does not bind the `true` token to any SSA value, but the literal
// surfaces here so the polarity can still be read.
if let Some(Some(lit)) = arg_literals.get(n) {
match lit.as_str() {
"true" | "True" | "Boolean.TRUE" => return Some(true),
"false" | "False" | "Boolean.FALSE" => return Some(false),
_ => {}
}
}
None
};
match lang {
Lang::Java => match suffix {
// `factory.setFeature(NAME, BOOL)`, the canonical JAXP
// hardening switch. Five feature names matter:
// * `FEATURE_SECURE_PROCESSING` (XMLConstants.FEATURE_SECURE_PROCESSING)
// * `http://apache.org/xml/features/disallow-doctype-decl`
// * `http://xml.org/sax/features/external-general-entities`
// * `http://xml.org/sax/features/external-parameter-entities`
// * any feature name containing `load-external-dtd`
// The first two harden by being SET TRUE; the remaining three
// harden by being SET FALSE.
"setFeature" => {
if receiver.is_none() {
return ConfigEffect::None;
}
let name_lit = arg_text(0).unwrap_or_default();
let name_idents = arg_ident_text(0);
let value = arg_bool(1);
let any_ident = |needle: &str| name_idents.iter().any(|s| s.contains(needle));
let mut cfg = XmlParserConfig::default();
if name_lit == "FEATURE_SECURE_PROCESSING"
|| name_lit.contains("XMLConstants.FEATURE_SECURE_PROCESSING")
|| name_lit.contains("javax.xml.XMLConstants/feature/secure-processing")
|| any_ident("FEATURE_SECURE_PROCESSING")
{
if value == Some(true) {
cfg.secure_processing = true;
}
} else if name_lit.contains("disallow-doctype-decl")
|| any_ident("disallow-doctype-decl")
{
if value == Some(true) {
cfg.disallow_doctype = true;
}
} else if (name_lit.contains("external-general-entities")
|| name_lit.contains("external-parameter-entities")
|| name_lit.contains("load-external-dtd")
|| any_ident("external-general-entities")
|| any_ident("external-parameter-entities")
|| any_ident("load-external-dtd"))
&& value == Some(false)
{
cfg.disallow_doctype = true;
}
if cfg == XmlParserConfig::default() {
ConfigEffect::None
} else {
ConfigEffect::UpdateReceiver(cfg)
}
}
// `factory.setExpandEntityReferences(false)` —
// DocumentBuilderFactory legacy hardening switch.
"setExpandEntityReferences" => {
if receiver.is_none() {
return ConfigEffect::None;
}
if arg_bool(0) == Some(false) {
ConfigEffect::UpdateReceiver(XmlParserConfig {
disallow_doctype: true,
..Default::default()
})
} else {
ConfigEffect::None
}
}
// `factory.newDocumentBuilder()` / `factory.newSAXParser()` /
// `parser.getXMLReader()` propagate the hardening state from
// the factory (receiver) onto the produced parser instance
// (return value). Without this propagation, a hardened
// factory's child builder would parse with no config.
"newDocumentBuilder" | "newSAXParser" | "getXMLReader" | "newXMLReader" => {
if receiver.is_some() {
ConfigEffect::InheritFromReceiver
} else {
ConfigEffect::None
}
}
_ => ConfigEffect::None,
},
Lang::Python => {
// `lxml.etree.XMLParser(resolve_entities=False)`: the lxml
// parser default resolves entities; the keyword argument
// changes that. Kwargs land in `info.call.kwargs`, not in the
// positional args, so const-propagation does not generally
// see the value here. The constructor is handled by the
// kwargs inspection in Pass 1 of `analyze_xml_parser_config`;
// at this layer a Python call never has an effect, and a
// parser with neither keyword keeps the default-empty config.
ConfigEffect::None
}
_ => ConfigEffect::None,
}
}
/// Run the XML-parser config analysis on an SSA body.
pub fn analyze_xml_parser_config(
body: &SsaBody,
cfg: &Cfg,
consts: &HashMap<SsaValue, ConstLattice>,
lang: Option<Lang>,
) -> XmlParserConfigResult {
let Some(lang) = lang else {
return XmlParserConfigResult::default();
};
let mut configs: HashMap<SsaValue, XmlParserConfig> = HashMap::new();
// Helper: read the kwargs attached to the original CFG node for the
// call instruction at hand. Used for languages where parser
// hardening flags arrive as keyword arguments (Python lxml).
let lookup_kwargs = |node_idx: petgraph::graph::NodeIndex| -> Vec<(String, Vec<String>)> {
cfg.node_weight(node_idx)
.map(|ni| ni.call.kwargs.clone())
.unwrap_or_default()
};
// Helper: read the positional arg-use identifier vectors (e.g.
// `XMLConstants.FEATURE_SECURE_PROCESSING` surfaces as a dotted path
// here even when const-prop folds it to nothing).
let lookup_arg_idents = |node_idx: petgraph::graph::NodeIndex| -> Vec<Vec<String>> {
cfg.node_weight(node_idx)
.map(|ni| ni.call.arg_uses.clone())
.unwrap_or_default()
};
// Helper: read the per-position literal-token vector
// (`arg_string_literals` lifts strings, booleans, null tokens, and
// numeric tokens — see `extract_arg_string_literals`).
let lookup_arg_literals = |node_idx: petgraph::graph::NodeIndex| -> Vec<Option<String>> {
cfg.node_weight(node_idx)
.map(|ni| ni.call.arg_string_literals.clone())
.unwrap_or_default()
};
// Pass 1 — direct effects from Call instructions in source order.
// Setter updates and constructor seeds are effectively monotone
// (we OR safe flags onto the receiver / value), so a single pass is
// sufficient when phi nodes only appear after the setter. Pass 2
// below handles phi/copy propagation.
for block in &body.blocks {
for inst in block.body.iter() {
if let SsaOp::Call {
callee,
args,
receiver,
..
} = &inst.op
{
// Python lxml.etree.XMLParser(resolve_entities=...): the
// kwarg lives on the CFG node's `kwargs` list, not in
// the SSA Call args. Inspect it directly.
if matches!(lang, Lang::Python)
&& (callee.ends_with("etree.XMLParser")
|| callee.rsplit(['.', ':']).next() == Some("XMLParser"))
{
let kwargs = lookup_kwargs(inst.cfg_node);
for (name, values) in &kwargs {
if name == "resolve_entities" {
// Look up the literal text on the matching
// argument; tree-sitter-python keywords surface
// the value identifier in the `values` slot.
if values.iter().any(|v| v == "True" || v == "true") {
let entry = configs.entry(inst.value).or_default();
entry.external_entities = true;
} else if values.iter().any(|v| v == "False" || v == "false") {
let entry = configs.entry(inst.value).or_default();
entry.disallow_doctype = true;
}
}
if name == "no_network" && values.iter().any(|v| v == "True" || v == "true")
{
let entry = configs.entry(inst.value).or_default();
entry.disallow_doctype = true;
}
}
continue;
}
// JS/TS: `new XMLParser({ processEntities: true, ... })`.
// The fast-xml-parser constructor's option-object fields
// are not exposed via const-prop, but the CFG layer
// captures string-literal kwargs in the call's
// `arg_string_literals` for object-literal positions.
// For now, mark the result as unsafe-explicit only when
// the static-kwargs list carries `processEntities=true`.
if matches!(lang, Lang::JavaScript | Lang::TypeScript)
&& callee.ends_with("XMLParser")
{
let kwargs = lookup_kwargs(inst.cfg_node);
for (name, values) in &kwargs {
if name == "processEntities" && values.iter().any(|v| v == "true") {
let entry = configs.entry(inst.value).or_default();
entry.external_entities = true;
}
}
continue;
}
let arg_idents = lookup_arg_idents(inst.cfg_node);
let arg_literals = lookup_arg_literals(inst.cfg_node);
match classify_call(
lang,
callee,
args,
*receiver,
consts,
&arg_idents,
&arg_literals,
) {
ConfigEffect::None => {}
ConfigEffect::UpdateReceiver(delta) => {
if let Some(rv) = *receiver {
let entry = configs.entry(rv).or_default();
*entry = entry.union(&delta);
}
}
ConfigEffect::InheritFromReceiver => {
if let Some(rv) = *receiver
&& let Some(parent) = configs.get(&rv).copied()
{
let entry = configs.entry(inst.value).or_default();
*entry = entry.union(&parent);
}
}
ConfigEffect::SeedResult(seed) => {
let entry = configs.entry(inst.value).or_default();
*entry = entry.union(&seed);
}
}
}
}
}
// Pass 2: fixed-point propagation through copy assignments and phi
// joins. The loop is capped at 6 rounds; in practice 2-3 rounds
// suffice on intra-procedural shapes.
for _ in 0..6 {
let mut changed = false;
for block in &body.blocks {
for inst in &block.phis {
if let SsaOp::Phi(operands) = &inst.op {
let mut acc: Option<XmlParserConfig> = None;
for (_, val) in operands {
let cfg_val = configs.get(val).copied().unwrap_or_default();
acc = Some(match acc {
None => cfg_val,
Some(prev) => prev.meet(&cfg_val),
});
}
if let Some(joined) = acc
&& joined != XmlParserConfig::default()
{
let prev = configs.get(&inst.value).copied();
if prev != Some(joined) {
configs.insert(inst.value, joined);
changed = true;
}
}
}
}
for inst in &block.body {
if let SsaOp::Assign(uses) = &inst.op
&& uses.len() == 1
&& let Some(src_cfg) = configs.get(&uses[0]).copied()
&& src_cfg != XmlParserConfig::default()
{
let prev = configs.get(&inst.value).copied().unwrap_or_default();
let new_cfg = prev.union(&src_cfg);
if Some(new_cfg) != configs.get(&inst.value).copied() {
configs.insert(inst.value, new_cfg);
changed = true;
}
}
// InheritFromReceiver may need a re-pass when the
// receiver's config was set after the call itself was
// visited (e.g. the call appears in a later block whose
// dominator chain only resolves on the second iteration).
if let SsaOp::Call {
callee,
receiver: Some(rv),
..
} = &inst.op
{
let suffix = callee.rsplit(['.', ':']).next().unwrap_or(callee);
let inherit = matches!(lang, Lang::Java)
&& matches!(
suffix,
"newDocumentBuilder" | "newSAXParser" | "getXMLReader" | "newXMLReader"
);
if inherit && let Some(parent) = configs.get(rv).copied() {
let prev = configs.get(&inst.value).copied().unwrap_or_default();
let new_cfg = prev.union(&parent);
if Some(new_cfg) != configs.get(&inst.value).copied()
&& new_cfg != XmlParserConfig::default()
{
configs.insert(inst.value, new_cfg);
changed = true;
}
}
}
}
}
if !changed {
break;
}
}
XmlParserConfigResult { configs }
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn default_config_is_unsafe() {
let c = XmlParserConfig::default();
assert!(!c.is_secure());
}
#[test]
fn secure_processing_alone_is_safe() {
let c = XmlParserConfig {
secure_processing: true,
..Default::default()
};
assert!(c.is_secure());
}
#[test]
fn external_entities_overrides_safe_flag() {
let c = XmlParserConfig {
secure_processing: true,
external_entities: true,
..Default::default()
};
assert!(!c.is_secure());
}
#[test]
fn meet_keeps_only_intersection_of_safe_flags() {
let a = XmlParserConfig {
secure_processing: true,
disallow_doctype: true,
..Default::default()
};
let b = XmlParserConfig {
secure_processing: true,
..Default::default()
};
let m = a.meet(&b);
assert!(m.secure_processing);
assert!(!m.disallow_doctype);
}
#[test]
fn meet_propagates_unsafe_flag() {
let a = XmlParserConfig {
secure_processing: true,
..Default::default()
};
let b = XmlParserConfig {
external_entities: true,
..Default::default()
};
let m = a.meet(&b);
// Unsafe sticky → no longer secure even though one branch was.
assert!(!m.is_secure());
}
#[test]
fn xxe_safe_returns_false_without_receiver() {
let result = XmlParserConfigResult::default();
assert!(!xxe_safe(None, &result));
}
#[test]
fn xxe_safe_uses_receiver_config() {
let mut configs = HashMap::new();
configs.insert(
SsaValue(7),
XmlParserConfig {
secure_processing: true,
..Default::default()
},
);
let result = XmlParserConfigResult { configs };
assert!(xxe_safe(Some(SsaValue(7)), &result));
assert!(!xxe_safe(Some(SsaValue(8)), &result));
}
}
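
The join rules in `xml_config.rs` are easier to check in isolation. The following is a minimal standalone sketch, not the code from this commit: the struct and its three methods restate what the diff shows, while `secure_after_join` and `apply_setters` are hypothetical helpers introduced only for illustration. It assumes nothing beyond the standard library.

```rust
// Standalone restatement of the config lattice from `xml_config.rs`.
// Field and method names follow the diff; the two free functions at the
// bottom are hypothetical helpers that exist only in this sketch.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct XmlParserConfig {
    pub secure_processing: bool,
    pub disallow_doctype: bool,
    pub external_entities: bool,
}

impl XmlParserConfig {
    /// Either safe flag is enough, unless the unsafe opt-in is present.
    pub fn is_secure(&self) -> bool {
        (self.secure_processing || self.disallow_doctype) && !self.external_entities
    }

    /// Phi join: safe flags are AND-ed, the unsafe flag is OR-ed.
    pub fn meet(&self, other: &Self) -> Self {
        XmlParserConfig {
            secure_processing: self.secure_processing && other.secure_processing,
            disallow_doctype: self.disallow_doctype && other.disallow_doctype,
            external_entities: self.external_entities || other.external_entities,
        }
    }

    /// Setter accumulation on one receiver: every flag is OR-ed.
    pub fn union(&self, other: &Self) -> Self {
        XmlParserConfig {
            secure_processing: self.secure_processing || other.secure_processing,
            disallow_doctype: self.disallow_doctype || other.disallow_doctype,
            external_entities: self.external_entities || other.external_entities,
        }
    }
}

/// Hypothetical helper: is the parser still provably hardened after two
/// branches are joined at a phi node?
pub fn secure_after_join(a: XmlParserConfig, b: XmlParserConfig) -> bool {
    a.meet(&b).is_secure()
}

/// Hypothetical helper: fold a sequence of setter deltas onto a receiver
/// that starts from the default (unhardened) config.
pub fn apply_setters(deltas: &[XmlParserConfig]) -> XmlParserConfig {
    deltas
        .iter()
        .fold(XmlParserConfig::default(), |acc, d| acc.union(d))
}
```

Under these definitions, joining a hardened branch with an untouched one loses the proof, so the XXE bit would stay on the sink. Two setters on the same receiver accumulate, and an explicit external-entity opt-in defeats any safe flag under both `meet` and `union`.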

235
src/ssa/xpath_config.rs Normal file

@ -0,0 +1,235 @@
//! Per-SSA-value XPath-receiver configuration tracking.
//!
//! Mirrors [`crate::ssa::xml_config`] but for `XPath` instances rather
//! than JAXP parser instances. Tracks "is this XPath receiver bound to
//! an `XPathVariableResolver`" along the control-flow path: when a
//! resolver has been bound, subsequent `xpath.evaluate(expr, ...)` calls
//! are treated as parameterised and the `XPATH_INJECTION` bit is
//! stripped from the sink's cap mask.
//!
//! Same engine shape as [`crate::ssa::xml_config::XmlParserConfigResult`]:
//! a small forward dataflow run alongside type-fact analysis. Phi nodes
//! propagate the meet of operand configs (a flag is "set" only when all
//! reaching operands set it), copy assignments propagate the receiver's
//! config, and `setXPathVariableResolver` calls update the receiver's
//! config in place.
use std::collections::HashMap;
use super::ir::*;
use crate::cfg::Cfg;
use crate::symbol::Lang;
use serde::{Deserialize, Serialize};
/// Receiver-instance config carried forward from `setXPathVariableResolver`
/// calls. All flags default to `false` (resolver not bound). A `true`
/// flag means: we have proven this XPath receiver was configured for
/// parameterised evaluation along this control-flow path.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, Serialize, Deserialize)]
pub struct XPathReceiverConfig {
/// True when `xpath.setXPathVariableResolver(...)` has been called
/// on this receiver. Set by Pass 1 on the receiver SSA value;
/// propagated through phi joins (meet) and copy assignments (union).
pub has_resolver: bool,
}
impl XPathReceiverConfig {
/// True when the receiver is provably bound to a variable resolver.
pub fn is_parameterised(&self) -> bool {
self.has_resolver
}
/// Phi-meet: a flag survives only when *both* operands set it. Used
/// when the XPath variable was reassigned across branches and only
/// some branches bound a resolver.
fn meet(&self, other: &Self) -> Self {
XPathReceiverConfig {
has_resolver: self.has_resolver && other.has_resolver,
}
}
/// Union: the flag is set when *either* side sets it. Used for copy
/// propagation, which preserves the source value's flags on top of
/// anything already recorded for the destination value.
fn union(&self, other: &Self) -> Self {
XPathReceiverConfig {
has_resolver: self.has_resolver || other.has_resolver,
}
}
}
/// Result of XPath-receiver config analysis.
#[derive(Clone, Debug, Default, Serialize, Deserialize)]
pub struct XPathConfigResult {
pub configs: HashMap<SsaValue, XPathReceiverConfig>,
}
impl XPathConfigResult {
/// True when the value carries a config fact proving resolver
/// binding.
pub fn is_parameterised(&self, v: SsaValue) -> bool {
self.configs.get(&v).is_some_and(|c| c.is_parameterised())
}
}
/// Suppress the `Cap::XPATH_INJECTION` bit when the receiver of an XPath
/// `evaluate` / `compile` sink was provably bound to a variable
/// resolver. Returns `true` when XPATH_INJECTION should be stripped
/// from the sink's cap mask.
///
/// Conservative defaults:
/// * No receiver SSA value (free function) → returns `false` (cannot
/// prove safety, fall through to existing classification).
/// * Receiver carries no config fact → returns `false`.
pub fn xpath_safe(receiver: Option<SsaValue>, xpath_config: &XPathConfigResult) -> bool {
let Some(rv) = receiver else {
return false;
};
xpath_config.is_parameterised(rv)
}
/// Run the XPath-receiver config analysis on an SSA body.
///
/// Currently models Java's `setXPathVariableResolver` only — the only
/// language-level resolver-binding API for XPath in the existing
/// detection corpus. PHP's `DOMXPath::registerPhpFunctions()` is a
/// different mechanism (PHP function registration) and not modelled
/// here.
pub fn analyze_xpath_config(body: &SsaBody, cfg: &Cfg, lang: Option<Lang>) -> XPathConfigResult {
let Some(lang) = lang else {
return XPathConfigResult::default();
};
if !matches!(lang, Lang::Java) {
return XPathConfigResult::default();
}
let mut configs: HashMap<SsaValue, XPathReceiverConfig> = HashMap::new();
// Pass 1: direct effects from Call instructions in source order.
// `setXPathVariableResolver` updates the call's receiver in place.
// The argument is not inspected: ruling out a `null` resolver would
// require a const-prop fact, so every call is assumed to bind a
// non-null resolver (matches the XML parser-config setter semantics).
for block in &body.blocks {
for inst in block.body.iter() {
if let SsaOp::Call {
callee, receiver, ..
} = &inst.op
{
let suffix = callee.rsplit(['.', ':']).next().unwrap_or(callee);
if suffix == "setXPathVariableResolver"
&& let Some(rv) = receiver
{
let entry = configs.entry(*rv).or_default();
entry.has_resolver = true;
}
}
}
}
if configs.is_empty() {
return XPathConfigResult::default();
}
// Pass 2: fixed-point propagation through copy assignments and
// phi joins. The loop is capped at 6 rounds; in practice 2-3 rounds
// suffice on intra-procedural shapes.
let _ = cfg; // CFG retained for parity with `xml_config`; reserved for
// future kwarg-driven seeds (e.g. constructor options).
for _ in 0..6 {
let mut changed = false;
for block in &body.blocks {
for inst in &block.phis {
if let SsaOp::Phi(operands) = &inst.op {
let mut acc: Option<XPathReceiverConfig> = None;
for (_, val) in operands {
let cfg_val = configs.get(val).copied().unwrap_or_default();
acc = Some(match acc {
None => cfg_val,
Some(prev) => prev.meet(&cfg_val),
});
}
if let Some(joined) = acc
&& joined != XPathReceiverConfig::default()
{
let prev = configs.get(&inst.value).copied();
if prev != Some(joined) {
configs.insert(inst.value, joined);
changed = true;
}
}
}
}
for inst in &block.body {
if let SsaOp::Assign(uses) = &inst.op
&& uses.len() == 1
&& let Some(src_cfg) = configs.get(&uses[0]).copied()
&& src_cfg != XPathReceiverConfig::default()
{
let prev = configs.get(&inst.value).copied().unwrap_or_default();
let new_cfg = prev.union(&src_cfg);
if Some(new_cfg) != configs.get(&inst.value).copied() {
configs.insert(inst.value, new_cfg);
changed = true;
}
}
}
}
if !changed {
break;
}
}
XPathConfigResult { configs }
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn default_config_is_unparameterised() {
let c = XPathReceiverConfig::default();
assert!(!c.is_parameterised());
}
#[test]
fn has_resolver_marks_parameterised() {
let c = XPathReceiverConfig { has_resolver: true };
assert!(c.is_parameterised());
}
#[test]
fn meet_keeps_intersection() {
let a = XPathReceiverConfig { has_resolver: true };
let b = XPathReceiverConfig {
has_resolver: false,
};
let m = a.meet(&b);
assert!(!m.has_resolver);
}
#[test]
fn meet_both_set_keeps_set() {
let a = XPathReceiverConfig { has_resolver: true };
let b = XPathReceiverConfig { has_resolver: true };
let m = a.meet(&b);
assert!(m.has_resolver);
}
#[test]
fn xpath_safe_returns_false_without_receiver() {
let result = XPathConfigResult::default();
assert!(!xpath_safe(None, &result));
}
#[test]
fn xpath_safe_uses_receiver_config() {
let mut configs = HashMap::new();
configs.insert(SsaValue(7), XPathReceiverConfig { has_resolver: true });
let result = XPathConfigResult { configs };
assert!(xpath_safe(Some(SsaValue(7)), &result));
assert!(!xpath_safe(Some(SsaValue(8)), &result));
}
}
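
The two-pass shape shared by `xml_config.rs` and `xpath_config.rs` (seed facts from setter calls, then propagate through phis and copies to a capped fixed point) can be sketched on a toy IR. Everything below is an assumption made for illustration: `Op`, `Inst`, and `propagate` are hypothetical stand-ins for the real `SsaOp` / `SsaBody` types, and plain `u32` ids replace `SsaValue`. Only the propagation rules are taken from the diff.

```rust
use std::collections::HashMap;

/// Hypothetical miniature IR standing in for the real `SsaOp`.
pub enum Op {
    /// `value = phi(operands...)`
    Phi(Vec<u32>),
    /// `value = src` (single-use copy assignment)
    Copy(u32),
}

pub struct Inst {
    pub value: u32,
    pub op: Op,
}

/// Propagate one boolean fact (think `has_resolver`) from `seeds`, the
/// values on which the setter was observed directly, through phis and
/// copies. The round cap of 6 mirrors the loops in the diff.
pub fn propagate(insts: &[Inst], seeds: &[u32]) -> HashMap<u32, bool> {
    let mut facts: HashMap<u32, bool> = seeds.iter().map(|v| (*v, true)).collect();
    for _ in 0..6 {
        let mut changed = false;
        for inst in insts {
            let derived = match &inst.op {
                // Meet: every operand must carry the fact; an operand
                // with no recorded fact counts as `false`.
                Op::Phi(operands) => {
                    !operands.is_empty()
                        && operands
                            .iter()
                            .all(|o| facts.get(o).copied().unwrap_or(false))
                }
                // Union: the source's fact is OR-ed into the destination.
                Op::Copy(src) => facts.get(src).copied().unwrap_or(false),
            };
            // Only non-default facts are ever written, as in the diff.
            if derived && facts.get(&inst.value) != Some(&true) {
                facts.insert(inst.value, true);
                changed = true;
            }
        }
        if !changed {
            break;
        }
    }
    facts
}
```

When a phi is visited before the copy that feeds one of its operands, the fact only reaches the phi on the second round, which is why the loop iterates rather than running once. A phi with one unbound operand never receives the fact, so the corresponding sink would keep its `XPATH_INJECTION` bit.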

View file

@ -49,7 +49,7 @@ pub struct SinkSite {
impl SinkSite {
/// Dedup key: two sites with the same `(file_rel, line, col, cap)`
/// describe the same consumption and collapse on merge.
pub(crate) fn dedup_key(&self) -> (&str, u32, u32, u16) {
pub(crate) fn dedup_key(&self) -> (&str, u32, u32, u32) {
(self.file_rel.as_str(), self.line, self.col, self.cap.bits())
}
@ -277,18 +277,18 @@ pub struct FuncSummary {
pub param_names: Vec<String>,
// ── Taint behaviour ──────────────────────────────────────────────────
// Stored as raw `u16` so serde doesn't need to know about `bitflags`.
// Stored as raw `u32` so serde doesn't need to know about `bitflags`.
/// Caps this function **introduces**, i.e. the return value carries
/// freshly-tainted data even if no argument was tainted.
pub source_caps: u16,
pub source_caps: u32,
/// Caps this function **cleans**, passing tainted data through this
/// function strips the corresponding bits.
pub sanitizer_caps: u16,
pub sanitizer_caps: u32,
/// Caps this function **consumes unsafely**, calling it with tainted
/// arguments that still carry these bits is a finding.
pub sink_caps: u16,
pub sink_caps: u32,
/// Which parameter indices (0-based) flow through to the return value.
#[serde(default)]
@ -1163,7 +1163,7 @@ impl GlobalSummaries {
/// Returns `(source_caps, sanitizer_caps, sink_caps, propagating_params)`
/// per key. Used by the SCC fixed-point loop to detect when an iteration
/// has not changed any summary, i.e. convergence.
pub fn snapshot_caps(&self) -> HashMap<FuncKey, (u16, u16, u16, Vec<usize>)> {
pub fn snapshot_caps(&self) -> HashMap<FuncKey, (u32, u32, u32, Vec<usize>)> {
self.by_key
.iter()
.map(|(k, s)| {

View file

@ -283,7 +283,7 @@ pub struct SsaFuncSummary {
///
/// Default-empty (most functions don't field-mutate their params)
/// and elided from serialised output via `skip_serializing_if` so
/// pre-Phase-5 summaries deserialise cleanly without migration.
/// older summaries without this field deserialise cleanly without migration.
/// Built by extraction in `summary_extract.rs` when the per-body
/// [`crate::pointer::PointsToFacts`] are available
/// (`NYX_POINTER_ANALYSIS=1`); empty otherwise.

View file

@ -9,7 +9,7 @@ fn cap_sites(cap: Cap) -> SmallVec<[SinkSite; 1]> {
smallvec![SinkSite::cap_only(cap)]
}
fn make(name: &str, src: u16, san: u16, sink: u16) -> FuncSummary {
fn make(name: &str, src: u32, san: u32, sink: u32) -> FuncSummary {
FuncSummary {
name: name.into(),
file_path: "test.rs".into(),
@ -263,7 +263,7 @@ fn lookup_same_lang_returns_all_matches() {
}
#[test]
fn u16_caps_round_trip_serde() {
fn cap_bits_round_trip_serde() {
let summary = FuncSummary {
name: "dangerous".into(),
file_path: "test.rs".into(),
@ -292,9 +292,96 @@ fn u16_caps_round_trip_serde() {
assert!(!json.contains("propagates_taint"));
}
/// Every new cap class persists across the serde JSON round-trip used
/// for SQLite blob storage and the `/debug` endpoint. Catches a
/// width-mismatch (cap bits truncated to u16) as a hard fail rather than
/// silent zeroing of the upper bits.
#[test]
fn new_cap_classes_round_trip_serde() {
let new_caps = Cap::LDAP_INJECTION
| Cap::XPATH_INJECTION
| Cap::HEADER_INJECTION
| Cap::OPEN_REDIRECT
| Cap::SSTI
| Cap::XXE
| Cap::PROTOTYPE_POLLUTION;
// Sanity: bit-width must accommodate every new cap.
assert_ne!(
new_caps.bits(),
0,
"every new cap must carry a non-zero bit"
);
assert_eq!(
new_caps.bits().count_ones(),
7,
"exactly seven bits must be set across the new caps"
);
// Bit collisions with existing caps would mask a finding.
let existing = Cap::ENV_VAR
| Cap::HTML_ESCAPE
| Cap::SHELL_ESCAPE
| Cap::URL_ENCODE
| Cap::JSON_PARSE
| Cap::FILE_IO
| Cap::FMT_STRING
| Cap::SQL_QUERY
| Cap::DESERIALIZE
| Cap::SSRF
| Cap::CODE_EXEC
| Cap::CRYPTO
| Cap::UNAUTHORIZED_ID
| Cap::DATA_EXFIL;
assert!(
(existing & new_caps).is_empty(),
"new caps must not collide"
);
let summary = FuncSummary {
name: "all_new_classes".into(),
file_path: "fixture.rs".into(),
lang: "rust".into(),
param_count: 0,
param_names: vec![],
source_caps: 0,
sanitizer_caps: 0,
sink_caps: new_caps.bits(),
propagating_params: vec![],
propagates_taint: false,
tainted_sink_params: vec![],
callees: vec![],
..Default::default()
};
// serde JSON round-trip (the on-disk SQLite format).
let json = serde_json::to_string(&summary).unwrap();
let back: FuncSummary = serde_json::from_str(&json).unwrap();
assert_eq!(back.sink_caps, new_caps.bits());
assert!(back.sink_caps().contains(Cap::LDAP_INJECTION));
assert!(back.sink_caps().contains(Cap::PROTOTYPE_POLLUTION));
// Cap registry must surface a rule id for each new cap.
for cap in [
Cap::LDAP_INJECTION,
Cap::XPATH_INJECTION,
Cap::HEADER_INJECTION,
Cap::OPEN_REDIRECT,
Cap::SSTI,
Cap::XXE,
Cap::PROTOTYPE_POLLUTION,
] {
let meta = crate::labels::cap_rule_meta(cap)
.unwrap_or_else(|| panic!("missing CAP_RULE_REGISTRY entry for {cap:?}"));
assert!(meta.rule_id.starts_with("taint-"));
assert!(!meta.title.is_empty());
assert!(!meta.description.is_empty());
}
}
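
The failure mode this test guards against can be reproduced without serde. The sketch below is illustrative only: `OLD_CAP` and `NEW_CAP` use made-up bit positions (the real layout lives in the `Cap` bitflags definition, which this diff does not show), and the three functions are hypothetical stand-ins for a storage field of the given width.

```rust
use std::convert::TryFrom;

// Hypothetical bit positions, chosen only so that one constant fits in
// 16 bits and the other does not. They are NOT the real `Cap` values.
pub const OLD_CAP: u32 = 1 << 3;
pub const NEW_CAP: u32 = 1 << 18;

/// Simulates a summary field that is still `u16`: the `as` cast drops
/// every bit above bit 15 without any error.
pub fn store_as_u16(bits: u32) -> u32 {
    (bits as u16) as u32
}

/// Simulates the widened `u32` field: all bits survive.
pub fn store_as_u32(bits: u32) -> u32 {
    bits
}

/// A checked narrowing turns the silent loss into an explicit `None`.
pub fn checked_narrow(bits: u32) -> Option<u16> {
    u16::try_from(bits).ok()
}
```

With these assumed positions, a mask carrying both caps comes back from the `u16` path with only the old cap set, which is the silent zeroing of the upper bits that the round-trip assertion above would surface as a test failure.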
#[test]
fn backward_compat_u8_json_deserializes() {
// Old u8-range values still deserialize correctly into u16 fields
// Old u8-range values still deserialize correctly into u32 fields
let json = r#"{
"name": "old_func",
"file_path": "legacy.py",
@ -948,6 +1035,8 @@ fn make_callee_body(
type_facts: crate::ssa::type_facts::TypeFactResult {
facts: std::collections::HashMap::new(),
},
xml_parser_config: crate::ssa::xml_config::XmlParserConfigResult::default(),
xpath_config: crate::ssa::xpath_config::XPathConfigResult::default(),
alias_result: crate::ssa::alias::BaseAliasResult::empty(),
points_to: crate::ssa::heap::PointsToResult::empty(),
module_aliases: std::collections::HashMap::new(),
@ -1413,7 +1502,7 @@ fn fs_with(
arity: usize,
kind: FuncKind,
disambig: Option<u32>,
sink_bits: u16,
sink_bits: u32,
) -> (FuncKey, FuncSummary) {
let key = FuncKey {
lang: Lang::Java,
@ -1611,7 +1700,7 @@ fn interop_lookup_returns_none_when_disambig_none_matches_many() {
// and only disambig distinguishes them, the relaxed interop lookup must
// return None rather than picking arbitrarily.
let mut gs = GlobalSummaries::new();
let mk = |disambig: u32, bits: u16| {
let mk = |disambig: u32, bits: u32| {
let k = FuncKey {
lang: Lang::Go,
namespace: "lib.go".into(),
@ -2102,7 +2191,7 @@ fn method_summary(
container: &str,
name: &str,
arity: usize,
sink_bits: u16,
sink_bits: u32,
) -> (FuncKey, FuncSummary) {
fs_with(
namespace,
@ -2119,7 +2208,7 @@ fn free_summary(
namespace: &str,
name: &str,
arity: usize,
sink_bits: u16,
sink_bits: u32,
) -> (FuncKey, FuncSummary) {
fs_with(
namespace,
@ -2912,7 +3001,7 @@ fn legacy_summary(
param_names: Vec<String>,
kind: FuncKind,
container: &str,
sink: u16,
sink: u32,
) -> FuncSummary {
FuncSummary {
name: name.into(),
@ -3778,7 +3867,7 @@ fn cross_file_devirt_does_not_union_unrelated_findbyids() {
use crate::labels::Cap;
use crate::symbol::FuncKey;
fn method_summary(name: &str, container: &str, file: &str, sink_caps: u16) -> FuncSummary {
fn method_summary(name: &str, container: &str, file: &str, sink_caps: u32) -> FuncSummary {
FuncSummary {
name: name.into(),
file_path: file.into(),
@ -3989,7 +4078,7 @@ mod hierarchy_widened_tests {
container: &str,
name: &str,
arity: usize,
sink_bits: u16,
sink_bits: u32,
hierarchy_edges: Vec<(String, String)>,
) -> (FuncKey, FuncSummary) {
let (key, mut summary) = fs_with(

@ -580,9 +580,19 @@ pub(crate) fn analyse_file_with_lowered(
f.source.index(),
!f.path_validated,
f.path_hash,
f.effective_sink_caps.bits(),
)
});
all_findings.dedup_by_key(|f| {
(
f.body_id,
f.sink,
f.source,
f.path_validated,
f.path_hash,
f.effective_sink_caps.bits(),
)
});
all_findings.dedup_by_key(|f| (f.body_id, f.sink, f.source, f.path_validated, f.path_hash));
// 5. Assign stable finding IDs now that `body_id` has been set and
// the dedup has picked the final set of distinct flows. The ID
@ -679,9 +689,118 @@ fn containment_order(bodies: &[BodyCfg]) -> Vec<usize> {
order
}
/// Build a `var_name → TypeKind` map from a body's optimised SSA + type-fact
/// result. Used by [`analyse_multi_body`] to forward closure-captured types
/// from a parent body into its children, so that bound-variable receiver
/// idioms (`const c = ldap.createClient(...); function f() { c.search(...) }`)
/// pick up `TypeKind::LdapClient` on the inner reference via the
/// [`ssa_transfer::resolve_type_qualified_labels`] receiver scan.
///
/// Conflict policy: if the same `var_name` reaches multiple SSA values with
/// distinct `TypeKind`s the entry is dropped — propagating an ambiguous type
/// into a child body would fabricate facts, while dropping it just falls back
/// to the existing structural resolution paths.
fn extract_named_type_facts(
ssa: &crate::ssa::SsaBody,
type_facts: &crate::ssa::type_facts::TypeFactResult,
) -> HashMap<String, crate::ssa::type_facts::TypeKind> {
use crate::ssa::type_facts::TypeKind;
let mut acc: HashMap<String, TypeKind> = HashMap::new();
let mut conflicts: HashSet<String> = HashSet::new();
for block in &ssa.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
let Some(name) = inst.var_name.as_deref() else {
continue;
};
if conflicts.contains(name) {
continue;
}
let Some(kind) = type_facts.get_type(inst.value) else {
continue;
};
if matches!(kind, TypeKind::Unknown) {
continue;
}
match acc.get(name) {
Some(existing) if existing != kind => {
acc.remove(name);
conflicts.insert(name.to_string());
}
Some(_) => {}
None => {
acc.insert(name.to_string(), kind.clone());
}
}
}
}
acc
}
/// Inject parent-known closure-capture types into a per-body
/// [`crate::ssa::type_facts::TypeFactResult`].
///
/// Scoped lowering ([`crate::ssa::lower_to_ssa_with_params`]) injects a
/// `SsaOp::Param` (or `SsaOp::SelfParam`) at the entry block for every
/// free / closure-captured variable read by the body. The per-body type
/// analysis can only seed declared formal-parameter types (via
/// `BodyMeta.param_types`); free variables are left as `TypeKind::Unknown`
/// because their definition lives in an enclosing body whose SSA is not
/// in scope.
///
/// This pass walks the entry block's synthetic prologue and, for each
/// external Param whose name resolves in `parent_var_types`, inserts the
/// matching [`crate::ssa::type_facts::TypeFact`] into `type_facts.facts`.
/// Strictly additive: existing facts (e.g. a fact already produced by
/// `BodyMeta.param_types` seeding for a real formal that happens to share
/// a name) are never overwritten.
fn inject_external_type_facts(
ssa: &crate::ssa::SsaBody,
type_facts: &mut crate::ssa::type_facts::TypeFactResult,
parent_var_types: &HashMap<String, crate::ssa::type_facts::TypeKind>,
) {
use crate::ssa::ir::SsaOp;
use crate::ssa::type_facts::TypeFact;
if parent_var_types.is_empty() || ssa.blocks.is_empty() {
return;
}
for inst in ssa.blocks[0].body.iter() {
if !matches!(inst.op, SsaOp::Param { .. } | SsaOp::SelfParam) {
continue;
}
if type_facts.facts.contains_key(&inst.value) {
// `analyze_types_with_param_types` may have already typed this
// value via a non-Unknown entry from BodyMeta.param_types; in
// that case the formal-parameter declaration wins. Note: the
// analysis seeds an Unknown placeholder for unparameterised
// Param ops, so we still need to override Unknown entries.
if !matches!(
type_facts.facts.get(&inst.value).map(|f| &f.kind),
Some(crate::ssa::type_facts::TypeKind::Unknown)
) {
continue;
}
}
let Some(name) = inst.var_name.as_deref() else {
continue;
};
let Some(kind) = parent_var_types.get(name) else {
continue;
};
let nullable = matches!(kind, crate::ssa::type_facts::TypeKind::Null);
type_facts.facts.insert(
inst.value,
TypeFact {
kind: kind.clone(),
nullable,
},
);
}
}
/// Analyse a single body with an optional parent seed.
///
/// Shared logic extracted from `analyse_multi_body` to avoid deep nesting.
#[allow(clippy::type_complexity)]
fn analyse_body_with_seed(
body: &BodyCfg,
lang: Lang,
@ -698,9 +817,11 @@ fn analyse_body_with_seed(
seed: Option<&HashMap<ssa_transfer::BindingKey, crate::taint::domain::VarTaint>>,
import_bindings: Option<&crate::cfg::ImportBindings>,
cross_file_bodies: Option<&std::collections::HashMap<FuncKey, ssa_transfer::CalleeSsaBody>>,
parent_var_types: Option<&HashMap<String, crate::ssa::type_facts::TypeKind>>,
) -> (
Vec<Finding>,
Option<HashMap<ssa_transfer::BindingKey, crate::taint::domain::VarTaint>>,
Option<HashMap<String, crate::ssa::type_facts::TypeKind>>,
) {
let cfg = &body.graph;
let entry = body.entry;
@ -757,12 +878,21 @@ fn analyse_body_with_seed(
match ssa_result {
Ok(mut ssa_body) => {
let opt = crate::ssa::optimize_ssa_with_param_types(
let mut opt = crate::ssa::optimize_ssa_with_param_types(
&mut ssa_body,
cfg,
Some(lang),
&body.meta.param_types,
);
// Forward parent-body type facts onto closure-captured Param ops
// before any consumer reads `opt.type_facts`. This is the lever
// that makes bound-variable receiver idioms work in scoped bodies
// (`let c = ldap.createClient(...); function f() { c.search(...) }`)
// — without it the inner `c` SSA value stays Unknown because the
// per-body type-fact pass cannot see the enclosing definition.
if let Some(pvt) = parent_var_types {
inject_external_type_facts(&ssa_body, &mut opt.type_facts, pvt);
}
if tracing::enabled!(tracing::Level::TRACE) {
tracing::trace!(
func = body.meta.name.as_deref().unwrap_or("<anon>"),
@ -811,6 +941,8 @@ fn analyse_body_with_seed(
receiver_seed: None,
const_values: Some(&opt.const_values),
type_facts: Some(&opt.type_facts),
xml_parser_config: Some(&opt.xml_parser_config),
xpath_config: Some(&opt.xpath_config),
ssa_summaries,
extra_labels,
base_aliases: Some(&opt.alias_result),
@ -909,7 +1041,16 @@ fn analyse_body_with_seed(
&transfer,
body_id,
);
(findings, Some(exit_state))
// Snapshot named TypeKinds so child bodies can pick up
// closure-captured types (e.g. an outer `LdapClient` flowing
// into an inner function via free-variable read).
let named_types = extract_named_type_facts(&ssa_body, &opt.type_facts);
let named_types = if named_types.is_empty() {
None
} else {
Some(named_types)
};
(findings, Some(exit_state), named_types)
}
Err(e) => {
// SSA lowering produced no analyzable body. We still surface
@ -929,7 +1070,7 @@ fn analyse_body_with_seed(
// Drain the collector so the note does not bleed into the
// next body (which will call reset on entry, but be explicit).
let _ = ssa_transfer::take_body_engine_notes();
(Vec::new(), None)
(Vec::new(), None, None)
}
}
}
@ -967,6 +1108,14 @@ fn analyse_multi_body(
HashMap<ssa_transfer::BindingKey, crate::taint::domain::VarTaint>,
> = HashMap::new();
// Per-body `var_name → TypeKind` snapshots, used to forward closure-
// captured types from parent bodies into their children's type-fact
// results. Only populated when a body produces a non-empty set of
// typed named values, i.e. it has at least one named SSA value with
// a concrete `TypeKind` after optimisation.
let mut body_var_types: HashMap<BodyId, HashMap<String, crate::ssa::type_facts::TypeKind>> =
HashMap::new();
// ── Pass 1: lexical containment propagation ──────────────────────
for &idx in &order {
let body = &file_cfg.bodies[idx];
@ -975,8 +1124,12 @@ fn analyse_multi_body(
.meta
.parent_body_id
.and_then(|pid| body_exit_states.get(&pid));
let parent_var_types = body
.meta
.parent_body_id
.and_then(|pid| body_var_types.get(&pid));
let (findings, exit_state) = analyse_body_with_seed(
let (findings, exit_state, var_types) = analyse_body_with_seed(
body,
lang,
namespace,
@ -990,6 +1143,7 @@ fn analyse_multi_body(
parent_seed,
import_bindings,
cross_file_bodies,
parent_var_types,
);
tracing::debug!(
body_id = body.meta.id.0,
@ -1003,6 +1157,9 @@ fn analyse_multi_body(
if let Some(es) = exit_state {
body_exit_states.insert(body.meta.id, es);
}
if let Some(vt) = var_types {
body_var_types.insert(body.meta.id, vt);
}
}
// ── Pass 2: JS/TS iterative convergence ──────────────────────────
@ -1163,8 +1320,12 @@ fn analyse_multi_body(
.meta
.parent_body_id
.and_then(|pid| body_exit_states.get(&pid));
let parent_var_types = body
.meta
.parent_body_id
.and_then(|pid| body_var_types.get(&pid));
let (findings, exit_state) = analyse_body_with_seed(
let (findings, exit_state, var_types) = analyse_body_with_seed(
body,
lang,
namespace,
@ -1178,11 +1339,15 @@ fn analyse_multi_body(
parent_seed,
import_bindings,
cross_file_bodies,
parent_var_types,
);
// Phase-B: replace (not append) this body's findings
// in the cache. Previous rounds' findings for this
// body are superseded by the new round's output.
findings_by_body.insert(body.meta.id, findings);
if let Some(vt) = var_types {
body_var_types.insert(body.meta.id, vt);
}
if let Some(es) = exit_state {
// Phase-C Gauss-Seidel: immediately publish this
// body's filtered exit into `current_seed` and
@ -2073,6 +2238,8 @@ fn augment_summaries_with_child_sinks(
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: Some(summaries),
extra_labels: None,
base_aliases: None,
@ -2135,6 +2302,8 @@ fn augment_summaries_with_child_sinks(
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: Some(summaries),
extra_labels: None,
base_aliases: None,
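
The conflict policy documented on `extract_named_type_facts` (drop a name once it is seen with two distinct concrete kinds, and never re-admit it) can be illustrated in isolation. The following is a minimal sketch only: `Kind` and `merge_named_kinds` are hypothetical stand-ins introduced for this example and are not the analyser's real `TypeKind` or SSA types.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for the analyser's `TypeKind` (assumption: only
// the variants needed for the illustration are modelled).
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq)]
enum Kind {
    Unknown,
    LdapClient,
    Str,
}

/// Collapse `(var_name, kind)` observations into a map, dropping any name
/// that is observed with two different concrete kinds.
fn merge_named_kinds(observations: &[(&str, Kind)]) -> HashMap<String, Kind> {
    let mut acc: HashMap<String, Kind> = HashMap::new();
    let mut conflicts: HashSet<String> = HashSet::new();
    for (name, kind) in observations {
        // Unknown carries no information; conflicted names stay dropped.
        if *kind == Kind::Unknown || conflicts.contains(*name) {
            continue;
        }
        match acc.get(*name) {
            Some(existing) if existing != kind => {
                // Ambiguous: remove the entry and remember the conflict so
                // a later observation cannot re-insert the name.
                acc.remove(*name);
                conflicts.insert((*name).to_string());
            }
            Some(_) => {}
            None => {
                acc.insert((*name).to_string(), kind.clone());
            }
        }
    }
    acc
}
```

The separate `conflicts` set is what makes the result independent of observation order: without it, a third observation of a conflicted name would silently re-enter the map with whichever kind came last.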

@ -30,6 +30,26 @@ pub enum PredicateKind {
/// and the **false branch is the validated path**. Use inverted polarity
/// when applying branch predicates.
ShellMetaValidated,
/// Inline relative-URL validation: `x.startsWith("/")` / `x.starts_with("/")`
/// / `x.startswith("/")` / `strpos(x, "/") === 0`. The TRUE branch
/// constrains `x` to a relative path (no scheme, no `//host`), which is
/// the standard inline form of an open-redirect sanitiser when the
/// developer didn't extract a named helper. Cap-aware: clears
/// [`crate::labels::Cap::OPEN_REDIRECT`] only on the validated branch
/// so non-redirect sinks downstream still fire on the residual taint.
/// Mirrors [`ShellMetaValidated`](Self::ShellMetaValidated) but with
/// non-inverted polarity (true branch is the validated path).
RelativeUrlValidated,
/// Inline URL-parse + host-allowlist validation:
/// `new URL(x).host === ALLOWED` (JS/TS),
/// `urlparse(x).netloc == ALLOWED` (Python),
/// `urlparse(x).hostname == ALLOWED` (Python).
/// The TRUE branch constrains the parsed URL's host to a developer-chosen
/// allowlist value, the canonical multi-statement open-redirect sanitiser
/// for absolute URLs. Cap-aware: clears
/// [`crate::labels::Cap::OPEN_REDIRECT`] only on the validated branch so
/// non-redirect sinks downstream still fire on residual taint.
HostAllowlistValidated,
/// Bounded-length rejection: `x.len() > N` / `x.length < N` with N >= 2.
///
/// Commonly paired with `ShellMetaValidated` in OR-chain rejection
@ -178,6 +198,324 @@ fn is_metachar_regex_class(text: &str) -> bool {
false
}
/// Check whether `text` is an inline relative-URL validation: a leading-
/// slash check on a string variable. Recognised shapes:
///
/// * `<X>.startsWith("/")` — JS/TS/Java/Kotlin
/// * `<X>.starts_with("/")` — Rust
/// * `<X>.startswith("/")` — Python
/// * `strpos($X, "/") === 0` / `mb_strpos(...)` — PHP
/// * `<X>[0] === "/"` / `<X>[0] == '/'` — JS/TS direct index
///
/// Negation prefixes (`!`, `not`) are NOT stripped; the caller's
/// classification path handles those uniformly via the predicate
/// polarity inversion machinery.
fn is_leading_slash_check(text: &str) -> bool {
let lower = text.to_ascii_lowercase();
// Method-call form: `.startswith("/")` covers JS/TS/Java (`startsWith`
// lower-cases to `startswith`), Python (`startswith`), Rust
// (`starts_with` → `starts_with` after lower). Keep the variants
// explicit so we don't miss the underscore form.
for method in [".startswith(", ".starts_with("] {
if let Some(idx) = lower.find(method) {
let args_start = idx + method.len();
if let Some(needle) = extract_first_string_arg(&lower[args_start..]) {
if needle == "/" {
return true;
}
}
}
}
// PHP `strpos($x, "/") === 0` / `mb_strpos($x, "/") === 0`: leading-
// slash detection via offset-zero substring match. Both equality
// forms (`===`, `==`) are accepted; the `0` literal is the load-bearing
// bit. Conservative: requires the trailing `=== 0` / `== 0` comparison;
// a bare `strpos(...)` (truthy check) is not recognised.
for prefix in ["strpos(", "mb_strpos("] {
if let Some(start) = lower.find(prefix) {
let after = &lower[start + prefix.len()..];
// Find the closing paren of the strpos call.
let mut depth = 1usize;
let bytes = after.as_bytes();
let mut close = None;
let mut i = 0;
while i < bytes.len() {
match bytes[i] {
b'(' => depth += 1,
b')' => {
depth -= 1;
if depth == 0 {
close = Some(i);
break;
}
}
_ => {}
}
i += 1;
}
let Some(close) = close else { continue };
let args = &after[..close];
// Need at least one comma so we have two args.
let mut depth = 0i32;
let mut comma = None;
for (j, ch) in args.char_indices() {
match ch {
'(' | '[' | '{' => depth += 1,
')' | ']' | '}' => depth -= 1,
',' if depth == 0 => {
comma = Some(j);
break;
}
_ => {}
}
}
let Some(comma) = comma else { continue };
let second = args[comma + 1..].trim();
// Strip optional surrounding quotes.
let needle = second.trim_matches(|c: char| c == '"' || c == '\'');
if needle != "/" {
continue;
}
// Tail after the strpos `)` should compare against 0 with
// `===` / `==`. Allow whitespace.
let tail = after[close + 1..].trim_start();
if let Some(rest) = tail.strip_prefix("===").or_else(|| tail.strip_prefix("==")) {
if rest.trim() == "0" {
return true;
}
}
}
}
// Direct subscript form: `<X>[0] === '/'` / `<X>[0] == "/"`.
// Conservative: the literal `[0]` immediately followed by an
// equality op and a single-char `/` literal.
for op in ["===", "=="] {
let probe = format!("[0] {}", op);
if let Some(idx) = lower.find(&probe) {
let after = lower[idx + probe.len()..].trim_start();
if after.starts_with("'/'") || after.starts_with("\"/\"") {
return true;
}
}
// Without spaces around the operator: `[0]==='/'`.
let probe_tight = format!("[0]{}", op);
if let Some(idx) = lower.find(&probe_tight) {
let after = lower[idx + probe_tight.len()..].trim_start();
if after.starts_with("'/'") || after.starts_with("\"/\"") {
return true;
}
}
}
false
}
/// Check whether `text` is an inline URL-parse + host-allowlist validation.
///
/// Recognises the canonical multi-statement open-redirect sanitiser shapes:
///
/// * `new URL(<X>).host === ALLOWED` / `new URL(<X>).hostname === ALLOWED`
/// / `new URL(<X>).origin === ALLOWED` (JS/TS) — accepts `==` and `===`.
/// * `urlparse(<X>).netloc == ALLOWED` / `urlparse(<X>).hostname == ALLOWED`
/// (Python `urllib.parse.urlparse` and the `urlparse.urlparse` legacy alias)
/// — accepts `==`.
/// * `urllib.parse.urlparse(<X>).netloc == ALLOWED` (qualified Python form).
/// * `<parsed>.host_str() == ALLOWED` (Rust `url::Url::host_str()`).
/// * `<parsed>.Host == ALLOWED` / `<parsed>.Hostname() == ALLOWED`
/// (Go `*url.URL` — case-sensitive capital `H`).
///
/// The Rust/Go forms intentionally do not look for the parse call in the
/// condition text — those parse on a separate line (`let parsed = Url::parse(x)?`,
/// `parsed, err := url.Parse(x)`) and the validated branch then references
/// `parsed` directly as the redirect target. Distinctive accessor names
/// (`.host_str()`, capital-`H` `.Host`/`.Hostname()`) gate the match so a bare
/// `u.host == X` (lowercase, ambiguous) still falls through to `Comparison`.
///
/// The right-hand side may be a string literal or a bare identifier
/// (`ALLOWED_HOST` / `cfg.allowed_origin`) — what matters is that the
/// validation pins the parsed host to one fixed value, locking off the
/// scheme/authority that would otherwise let the redirect leave the trusted
/// origin. The membership form
/// `ALLOWED_HOSTS.includes(new URL(<X>).host)` / `urlparse(<X>).host in ALLOWED`
/// is intentionally NOT recognised here; it falls through to
/// `AllowlistCheck`, whose generic validated-must mechanic already clears
/// every cap for the matched receiver / member token.
///
/// Negation prefixes are not stripped; the caller's polarity-inversion
/// machinery handles `!`-wrapped forms uniformly.
fn is_host_allowlist_check(text: &str) -> bool {
let lower = text.to_ascii_lowercase();
// Need an equality operator so we know the host is being pinned to a
// specific allowed value (not e.g. assigned, indexed, or used as a key).
if !(lower.contains("==") || lower.contains("!=")) {
return false;
}
let has_parse_call = lower.contains("new url(")
|| lower.contains("urlparse(")
|| lower.contains("url.parse(")
|| lower.contains("urllib.parse.urlparse(");
if has_parse_call {
// Need a host-style accessor on the parse result.
return lower.contains(".host")
|| lower.contains(".hostname")
|| lower.contains(".netloc")
|| lower.contains(".origin");
}
// Multi-statement form: parse happened on a prior line. Match
// distinctive Rust/Go accessor names so we don't misclassify a
// generic `obj.host == X` field comparison.
//
// Rust: `parsed.host_str() == Some("x")`
// Go: `parsed.Host == "x"` / `parsed.Hostname() == "x"`
//
// `.host_str()` is Rust-specific (lowercase-stable identifier).
// `.Host`/`.Hostname()` use case-sensitive capital `H` to avoid
// matching lowercase `u.host` (which `host_allowlist_requires_parse_call`
// explicitly excludes).
if lower.contains(".host_str(") {
return true;
}
if has_capital_host_accessor(text) {
return true;
}
false
}
/// Test whether `text` contains a Go-style capital-`H` URL host accessor:
/// `.Host` (followed by `==`/`!=`, optionally after whitespace) or `.Hostname(`.
fn has_capital_host_accessor(text: &str) -> bool {
if text.contains(".Hostname(") {
return true;
}
let mut rest = text;
while let Some(pos) = rest.find(".Host") {
let after = &rest[pos + ".Host".len()..];
// Reject `.Hostname` (handled above) and any continuation that
// would make `.Host` part of a longer identifier (`.Hostess` etc.).
let next = after.chars().next();
let is_terminator = match next {
None => true,
Some(c) => !c.is_ascii_alphanumeric() && c != '_',
};
if is_terminator {
// Require an equality op directly after the accessor (leading
// whitespace allowed) so it's a comparison, not e.g. an assignment target.
let trimmed = after.trim_start();
if trimmed.starts_with("==") || trimmed.starts_with("!=") {
return true;
}
}
rest = after;
}
false
}
/// Extract the parse-call argument from a host-allowlist condition.
///
/// Inline form (single-statement parse + check, JS/TS/Python):
/// recognises `new URL(<X>)`, `urlparse(<X>)`, `URL.parse(<X>)`,
/// `urllib.parse.urlparse(<X>)`. Returns `Some("X")` when the argument is a
/// bare identifier (with optional `&` or PHP `$` sigil stripped).
///
/// Multi-statement form (Rust/Go): recognises the receiver of `.host_str()`,
/// case-sensitive `.Host`/`.Hostname()` and returns the receiver identifier
/// (the parsed-URL var), which is what downstream code redirects on.
///
/// Returns `None` for nested expressions / multi-arg calls so branch
/// narrowing doesn't widen to a non-existent var. Mirrors the conservative
/// target shape used by [`extract_validation_target`].
fn extract_host_allowlist_target(text: &str) -> Option<String> {
let lower = text.to_ascii_lowercase();
for probe in [
"new url(",
"urllib.parse.urlparse(",
"urlparse(",
"url.parse(",
] {
if let Some(idx) = lower.find(probe) {
let args_start = idx + probe.len();
if args_start <= text.len() {
if let Some(first_arg) = first_call_arg(&text[args_start..]) {
let first_arg = first_arg.strip_prefix('&').unwrap_or(first_arg).trim();
let first_arg = first_arg.strip_prefix('$').unwrap_or(first_arg);
if !first_arg.is_empty() && is_identifier(first_arg) {
return Some(first_arg.to_string());
}
}
}
}
}
// Multi-statement form: receiver of the host accessor is the
// parsed-URL var. Walk the original text (case-sensitive for Go).
extract_host_accessor_receiver(text)
}
/// Walk `text` for `<receiver>.host_str(` (Rust), `<receiver>.Host` followed
/// by `==`/`!=` (Go), or `<receiver>.Hostname(` (Go). Returns `Some(receiver)`
/// when the receiver is a bare identifier (optionally with a `&` borrow prefix
/// stripped, e.g. Rust `&parsed.host_str()`); `None` otherwise.
fn extract_host_accessor_receiver(text: &str) -> Option<String> {
let probes: &[(&str, bool)] = &[
(".host_str(", false), // Rust, case-stable
(".Hostname(", false), // Go
(".Host", true), // Go, requires `==`/`!=` after
];
for (probe, requires_eq) in probes {
if let Some(idx) = text.find(probe) {
if *requires_eq {
let after = &text[idx + probe.len()..];
// Reject `.Hostname` (handled by its own probe) and any
// longer-identifier continuation.
if let Some(c) = after.chars().next()
&& (c.is_ascii_alphanumeric() || c == '_')
{
continue;
}
let trimmed = after.trim_start();
if !(trimmed.starts_with("==") || trimmed.starts_with("!=")) {
continue;
}
}
let before = &text[..idx];
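// Receiver = trailing identifier of `before`, optionally
// preceded by `&` (Rust borrow). `parsed.foo.host_str()`
// would yield `foo`, which is not a parse var, so we
// conservatively reject any receiver qualified by `.` or `::`.
// `trailing_identifier` only returns `[A-Za-z0-9_]` characters,
// so the qualifier has to be checked on the text preceding it.
let recv = trailing_identifier(before)?;
let qualifier = &before[..before.len() - recv.len()];
if qualifier.ends_with('.') || qualifier.ends_with(':') {
return None;
}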
return Some(recv);
}
}
None
}
/// Walk back from the end of `s` and return the trailing identifier token.
///
/// `&parsed` → `Some("parsed")`, `foo.bar` → `Some("bar")`,
/// `()` → `None`. Used by [`extract_host_accessor_receiver`] to pull the
/// parsed-URL var out of `parsed.host_str() == ...`.
fn trailing_identifier(s: &str) -> Option<String> {
let bytes = s.as_bytes();
let mut end = bytes.len();
while end > 0 {
let c = bytes[end - 1];
if c.is_ascii_alphanumeric() || c == b'_' {
end -= 1;
} else {
break;
}
}
if end == bytes.len() {
return None;
}
let ident = &s[end..];
if ident.is_empty() || ident.as_bytes()[0].is_ascii_digit() {
return None;
}
Some(ident.to_string())
}
/// Check whether `text` looks like a bounded-length rejection:
/// `x.len() > N`, `x.len() < N`, `x.length >= N`, etc. where `N` is an
/// integer literal >= 2. Excludes `> 0` / `>= 1` / `< 1`, those are
@ -330,6 +668,28 @@ pub fn classify_condition(text: &str) -> PredicateKind {
return PredicateKind::ShellMetaValidated;
}
// ── Inline relative-URL validation ──────────────────────────────────
//
// `x.startsWith("/")` (JS/TS/Java/Kotlin), `x.starts_with("/")` (Rust),
// `x.startswith("/")` (Python), `strpos($x, "/") === 0` (PHP).
// The TRUE branch constrains `x` to a leading-slash relative path —
// the canonical inline open-redirect sanitiser. Matched BEFORE
// AllowlistCheck (which would otherwise capture `.starts_with(`).
if is_leading_slash_check(text) {
return PredicateKind::RelativeUrlValidated;
}
// ── Host-allowlist URL-parse validation ─────────────────────────────
//
// `new URL(x).host === ALLOWED` (JS/TS), `urlparse(x).netloc == ALLOWED`
// (Python), etc. Matched BEFORE AllowlistCheck and BEFORE the generic
// Comparison branch so the equality operator doesn't classify
// generically. The membership form `ALLOWED.includes(new URL(x).host)`
// has no equality operator, so it is not captured here and still
// reaches AllowlistCheck.
if is_host_allowlist_check(text) {
return PredicateKind::HostAllowlistValidated;
}
// ── Allowlist / membership checks ────────────────────────────────────
if lower.contains(".includes(")
|| lower.contains(".include?(")
@ -552,6 +912,19 @@ pub fn classify_condition_with_target(text: &str) -> (PredicateKind, Option<Stri
let target = extract_validation_target(text);
(kind, target)
}
PredicateKind::RelativeUrlValidated => {
// Receiver of `.startsWith("/")` / `.startswith("/")` /
// `.starts_with("/")`, or first arg of `strpos($x, "/")`.
// Same machinery as ShellMetaValidated.
let target = extract_validation_target(text);
(kind, target)
}
PredicateKind::HostAllowlistValidated => {
// Argument of the parse call: `new URL(x).host` → `x`,
// `urlparse(x).netloc` → `x`.
let target = extract_host_allowlist_target(text);
(kind, target)
}
PredicateKind::Comparison => {
// `x === '/login'`, `x == 5`, `null != obj`, when exactly one
// side is a literal, extract the identifier side as the target.
@ -1731,6 +2104,150 @@ mod tests {
assert!(is_bounded_length_check("x.len() > 2"));
assert!(is_bounded_length_check("x.len() <= 256"));
}
// ── HostAllowlistValidated ────────────────────────────────────────────
#[test]
fn classify_host_allowlist_js_strict_eq() {
assert_eq!(
classify_condition("new URL(target).host === ALLOWED_HOST"),
PredicateKind::HostAllowlistValidated
);
assert_eq!(
classify_condition("new URL(target).hostname === \"trusted.example.com\""),
PredicateKind::HostAllowlistValidated
);
assert_eq!(
classify_condition("new URL(target).origin === ALLOWED_ORIGIN"),
PredicateKind::HostAllowlistValidated
);
}
#[test]
fn classify_host_allowlist_python_urlparse() {
assert_eq!(
classify_condition("urlparse(target).netloc == ALLOWED_HOST"),
PredicateKind::HostAllowlistValidated
);
assert_eq!(
classify_condition("urllib.parse.urlparse(target).hostname == \"trusted.example.com\""),
PredicateKind::HostAllowlistValidated
);
}
#[test]
fn target_host_allowlist_extracts_parse_arg_js() {
let (kind, target) =
classify_condition_with_target("new URL(target).host === ALLOWED_HOST");
assert_eq!(kind, PredicateKind::HostAllowlistValidated);
assert_eq!(target.as_deref(), Some("target"));
}
#[test]
fn target_host_allowlist_extracts_parse_arg_python() {
let (kind, target) =
classify_condition_with_target("urlparse(target).netloc == ALLOWED_HOST");
assert_eq!(kind, PredicateKind::HostAllowlistValidated);
assert_eq!(target.as_deref(), Some("target"));
}
#[test]
fn host_allowlist_requires_parse_call() {
// Bare `.host == X` without a parse call is not host-allowlist.
let kind = classify_condition("u.host == ALLOWED_HOST");
assert_ne!(kind, PredicateKind::HostAllowlistValidated);
}
#[test]
fn host_allowlist_requires_equality_op() {
// `new URL(x)` without an equality op is not host-allowlist.
let kind = classify_condition("new URL(target).host");
assert_ne!(kind, PredicateKind::HostAllowlistValidated);
}
// ── Multi-statement form: Rust `.host_str()` ──────────────────────────
#[test]
fn classify_host_allowlist_rust_host_str() {
assert_eq!(
classify_condition("parsed.host_str() == Some(\"trusted.example.com\")"),
PredicateKind::HostAllowlistValidated
);
}
#[test]
fn target_host_allowlist_rust_host_str_extracts_receiver() {
let (kind, target) =
classify_condition_with_target("parsed.host_str() == Some(\"trusted.example.com\")");
assert_eq!(kind, PredicateKind::HostAllowlistValidated);
assert_eq!(target.as_deref(), Some("parsed"));
}
#[test]
fn target_host_allowlist_rust_host_str_strips_amp_deref() {
// `&parsed.host_str()` is not idiomatic but we still pull out the
// receiver via the trailing-identifier walk.
let (kind, target) =
classify_condition_with_target("&parsed.host_str() == Some(\"trusted.com\")");
assert_eq!(kind, PredicateKind::HostAllowlistValidated);
assert_eq!(target.as_deref(), Some("parsed"));
}
// ── Multi-statement form: Go `.Host` / `.Hostname()` ──────────────────
#[test]
fn classify_host_allowlist_go_capital_host() {
assert_eq!(
classify_condition("parsed.Host == \"trusted.example.com\""),
PredicateKind::HostAllowlistValidated
);
}
#[test]
fn classify_host_allowlist_go_hostname_method() {
assert_eq!(
classify_condition("parsed.Hostname() == \"trusted.example.com\""),
PredicateKind::HostAllowlistValidated
);
}
#[test]
fn target_host_allowlist_go_extracts_receiver() {
let (kind, target) =
classify_condition_with_target("parsed.Host == \"trusted.example.com\"");
assert_eq!(kind, PredicateKind::HostAllowlistValidated);
assert_eq!(target.as_deref(), Some("parsed"));
}
#[test]
fn target_host_allowlist_go_hostname_extracts_receiver() {
let (kind, target) =
classify_condition_with_target("parsed.Hostname() == \"trusted.example.com\"");
assert_eq!(kind, PredicateKind::HostAllowlistValidated);
assert_eq!(target.as_deref(), Some("parsed"));
}
#[test]
fn host_allowlist_rejects_lowercase_host_field() {
// `.host` (lowercase) without a parse call must NOT match — that
// shape is too generic (could be any struct field named `host`).
let kind = classify_condition("u.host == ALLOWED_HOST");
assert_ne!(kind, PredicateKind::HostAllowlistValidated);
}
#[test]
fn host_allowlist_rejects_capital_host_without_eq() {
// `parsed.Host` used as a side-effect call argument, not a guard.
let kind = classify_condition("log(parsed.Host)");
assert_ne!(kind, PredicateKind::HostAllowlistValidated);
}
#[test]
fn host_allowlist_rejects_capital_host_substring_in_identifier() {
// `.Hostess` is NOT `.Host` — must not match.
let kind = classify_condition("party.Hostess == \"alice\"");
assert_ne!(kind, PredicateKind::HostAllowlistValidated);
}
}
#[cfg(test)]
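
The "cap-aware" behaviour described for `RelativeUrlValidated` and `HostAllowlistValidated` (clear only `OPEN_REDIRECT` on the validated branch, leave every other capability tainted) comes down to a single mask operation. The sketch below is an illustration under assumptions: the bit positions and the `caps_after_branch` helper are hypothetical, since the real `Cap` layout and branch-narrowing code are not shown in this diff.

```rust
// Hypothetical bit positions, chosen only for the example.
const SQL_QUERY: u32 = 1 << 0;
const OPEN_REDIRECT: u32 = 1 << 1;

/// Return the capability bits still tainted after a branch predicate.
/// Only `validated_cap` is cleared, and only on the validated branch.
fn caps_after_branch(tainted: u32, validated_cap: u32, on_validated_branch: bool) -> u32 {
    if on_validated_branch {
        tainted & !validated_cap
    } else {
        tainted
    }
}
```

With a value tainted for both `SQL_QUERY` and `OPEN_REDIRECT`, the validated branch of a leading-slash check would leave `SQL_QUERY` set, so a SQL sink further down that branch would still report.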

@ -277,7 +277,14 @@ pub fn ssa_events_to_findings(
ssa: &SsaBody,
cfg: &Cfg,
) -> Vec<crate::taint::Finding> {
type FindingDedupKey = (usize, usize, Option<(String, u32, u32)>);
// The dedup key includes `cap_bits` so the multi-gate dispatch can
// co-emit separate findings for distinct capabilities at the same
// (origin, sink) pair (e.g. PHP `header("Location: " . $url)` fires
// both HEADER_INJECTION and OPEN_REDIRECT, attributed by the gate
// filters' per-cap masks). Single-cap call sites are unaffected:
// every event in that case carries the same `sink_caps`, so the key
// collapses identically with or without the extra component.
type FindingDedupKey = (usize, usize, Option<(String, u32, u32)>, u32);
let mut findings = Vec::new();
let mut seen: HashSet<FindingDedupKey> = HashSet::new();
@ -345,12 +352,14 @@ pub fn ssa_events_to_findings(
.as_ref()
.map(|l| (l.file_rel.clone(), l.line, l.col));
for (val, caps, origins) in &event.tainted_values {
let cap_specificity = (*caps & event.sink_caps).bits().count_ones() as u8;
let effective_caps = event.sink_caps & *caps;
let cap_specificity = effective_caps.bits().count_ones() as u8;
for origin in origins {
if seen.insert((
origin.node.index(),
event.sink_node.index(),
loc_key.clone(),
effective_caps.bits(),
)) {
let hop_count = block_distance(ssa, origin.node, event.sink_node);
let flow_steps = reconstruct_flow_path(*val, origin, event.sink_node, ssa, cfg);
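
A minimal sketch of why the extra key component matters, written as a standalone illustration rather than the project's actual code. The `Event` struct, the bit positions, and both helper functions are assumptions made for this example; the real `Cap` layout and event type are not shown in this hunk.

```rust
use std::collections::HashSet;

// Hypothetical bit positions; the real `Cap` layout is not visible here.
const HEADER_INJECTION: u32 = 1 << 0;
const OPEN_REDIRECT: u32 = 1 << 1;

/// Simplified stand-in for a sink event at one (origin, sink) pair.
struct Event {
    origin: usize,
    sink: usize,
    sink_caps: u32,
    value_caps: u32,
}

/// Count findings that survive dedup when the key carries the effective
/// cap bits (`sink_caps & value_caps`), as in the widened key.
fn dedup_with_caps(events: &[Event]) -> usize {
    let mut seen: HashSet<(usize, usize, u32)> = HashSet::new();
    events
        .iter()
        .filter(|e| seen.insert((e.origin, e.sink, e.sink_caps & e.value_caps)))
        .count()
}

/// Same count with the older key shape that ignores caps.
fn dedup_without_caps(events: &[Event]) -> usize {
    let mut seen: HashSet<(usize, usize)> = HashSet::new();
    events
        .iter()
        .filter(|e| seen.insert((e.origin, e.sink)))
        .count()
}
```

With two events at the same pair carrying different caps, the cap-aware key keeps both while the older key keeps one; with identical caps both keys collapse to a single finding, which matches the "single-cap call sites are unaffected" note in the comment.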

View file

@ -21,7 +21,7 @@ pub(super) const MAX_INLINE_BLOCKS: usize = 500;
/// Compact cache key: per-arg-position cap bits (sorted, non-empty
/// only). Origin identity is not part of the key.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub(crate) struct ArgTaintSig(pub(super) SmallVec<[(usize, u16); 4]>);
pub(crate) struct ArgTaintSig(pub(super) SmallVec<[(usize, u32); 4]>);
/// Call-site-adapted result of inline-analyzing a callee. Built fresh
/// per call site so origins point to the current caller's chain.
@ -79,7 +79,7 @@ pub(crate) struct ReturnShape {
impl CachedInlineShape {
/// Cap bits of the return value, or zero if this shape records "no
/// return taint". Used by [`inline_cache_fingerprint`].
fn return_caps_bits(&self) -> u16 {
fn return_caps_bits(&self) -> u32 {
self.0.as_ref().map(|s| s.caps.bits()).unwrap_or(0)
}
}
@ -101,7 +101,7 @@ pub(crate) fn inline_cache_clear_epoch(cache: &mut InlineCache) {
#[allow(dead_code)]
pub(crate) fn inline_cache_fingerprint(
cache: &InlineCache,
) -> HashMap<(FuncKey, ArgTaintSig), u16> {
) -> HashMap<(FuncKey, ArgTaintSig), u32> {
cache
.iter()
.map(|(k, v)| (k.clone(), v.return_caps_bits()))

View file

@ -105,6 +105,18 @@ pub struct SsaTaintTransfer<'a> {
/// Type facts from type analysis.
/// Used for type-aware sink filtering (e.g., suppress SQL injection for int-typed values).
pub type_facts: Option<&'a crate::ssa::type_facts::TypeFactResult>,
/// XML-parser config facts. Used to suppress XXE bits at parse-class
/// sinks whose receiver was provably hardened
/// (`setFeature(FEATURE_SECURE_PROCESSING, true)`, etc.). Strictly
/// additive: `None` falls back to the existing flat / gated XXE
/// classification.
pub xml_parser_config: Option<&'a crate::ssa::xml_config::XmlParserConfigResult>,
/// XPath-receiver config facts. Used to suppress XPATH_INJECTION at
/// `evaluate` / `compile` sinks whose receiver was provably bound to
/// an `XPathVariableResolver` (parameterised-XPath shape). Strictly
/// additive: `None` falls back to the existing flat / gated XPATH
/// classification.
pub xpath_config: Option<&'a crate::ssa::xpath_config::XPathConfigResult>,
/// Precise per-function SSA summaries for intra-file callee resolution.
/// Checked before legacy FuncSummary resolution.
///
@ -1207,6 +1219,85 @@ fn apply_branch_predicates(
}
}
// RelativeUrlValidated: TRUE branch is the validated path
// (`x.startsWith("/")` succeeded → `x` cannot redirect off-host).
// Cap-aware: clear `Cap::OPEN_REDIRECT` only; non-redirect sinks
// (XSS / SQLi / FILE_IO) downstream still fire on residual taint.
if kind == PredicateKind::RelativeUrlValidated && polarity {
for var in condition_vars {
let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new();
for (val, _) in state.values.iter() {
if let Some(name) = ssa
.value_defs
.get(val.0 as usize)
.and_then(|vd| vd.var_name.as_deref())
{
if name == var {
to_clear.push(*val);
}
}
}
for val in to_clear {
if let Some(taint) = state.get(val).cloned() {
let new_caps = taint.caps & !Cap::OPEN_REDIRECT;
if new_caps.is_empty() {
state.remove(val);
} else {
state.set(
val,
VarTaint {
caps: new_caps,
origins: taint.origins,
uses_summary: taint.uses_summary,
},
);
}
}
}
}
}
// HostAllowlistValidated: TRUE branch is the validated path
// (`new URL(x).host === ALLOWED` succeeded → `x` cannot redirect off-host).
// Cap-aware: clear `Cap::OPEN_REDIRECT` only; non-redirect sinks downstream
// still fire on the residual taint caps. Mirrors the
// `RelativeUrlValidated` handler exactly; the only difference is the
// recogniser shape (multi-statement parse + host comparison instead of
// an inline leading-slash check).
if kind == PredicateKind::HostAllowlistValidated && polarity {
for var in condition_vars {
let mut to_clear: SmallVec<[SsaValue; 4]> = SmallVec::new();
for (val, _) in state.values.iter() {
if let Some(name) = ssa
.value_defs
.get(val.0 as usize)
.and_then(|vd| vd.var_name.as_deref())
{
if name == var {
to_clear.push(*val);
}
}
}
for val in to_clear {
if let Some(taint) = state.get(val).cloned() {
let new_caps = taint.caps & !Cap::OPEN_REDIRECT;
if new_caps.is_empty() {
state.remove(val);
} else {
state.set(
val,
VarTaint {
caps: new_caps,
origins: taint.origins,
uses_summary: taint.uses_summary,
},
);
}
}
}
}
}
// ShellMetaValidated: inverted polarity. The FALSE branch (no metachar
// found) is the validated path; the TRUE branch is the rejection path.
//
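
The cap-aware clearing used by both redirect predicates above can be sketched in isolation. This is an illustrative reduction, not the project's code: the bit values and the `clear_open_redirect` helper are assumptions, and `None` stands in for the `state.remove(val)` branch.

```rust
// Hypothetical bit values; only the masking pattern is the point.
const HTML_ESCAPE: u32 = 1 << 0;
const OPEN_REDIRECT: u32 = 1 << 1;

/// Strip only the open-redirect bit on the validated branch.
/// Returns `None` when no caps remain (the value would be dropped from
/// the taint state), otherwise the residual caps that downstream
/// non-redirect sinks can still fire on.
fn clear_open_redirect(caps: u32) -> Option<u32> {
    let remaining = caps & !OPEN_REDIRECT;
    if remaining == 0 {
        None
    } else {
        Some(remaining)
    }
}
```

A value tainted for both XSS and redirect keeps its `HTML_ESCAPE` bit after validation, so an HTML sink further down the validated branch would still report.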
@ -2203,6 +2294,8 @@ fn inline_analyse_callee(
receiver_seed: receiver_seed.as_ref(),
const_values: Some(&callee_body.opt.const_values),
type_facts: Some(&callee_body.opt.type_facts),
xml_parser_config: Some(&callee_body.opt.xml_parser_config),
xpath_config: Some(&callee_body.opt.xpath_config),
ssa_summaries: transfer.ssa_summaries,
extra_labels: transfer.extra_labels,
base_aliases: Some(&callee_body.opt.alias_result),
@ -5891,6 +5984,34 @@ fn collect_block_events(
sink_caps &= !Cap::DATA_EXFIL;
}
// Receiver-type-incompatibility stripping. When the receiver's type
// proves a structurally-attached cap cannot apply (e.g. an
// `LdapClient` receiver carrying an `HTML_ESCAPE` Sink label that was
// attached to the CFG node by a `*.send`/`*.json`-style suffix
// matcher), drop the offending bits *before* the type-qualified-
// resolution branch below, so that branch is reachable on the
// remaining empty `sink_caps` and can re-anchor a precise sink class
// (`LdapClient.search` → `Cap::LDAP_INJECTION`). Both the
// flow-sensitive type from `path_env` and the static type from
// `type_facts` are consulted; the static path is what enables
// closure-captured receivers (parent body → child body via
// [`crate::taint::inject_external_type_facts`]) to participate.
if let SsaOp::Call {
receiver: Some(rv), ..
} = &inst.op
{
if let Some(ref env) = state.path_env {
if let Some(kind) = env.get(*rv).types.as_singleton() {
sink_caps &= !receiver_incompatible_sink_caps(&kind, sink_caps);
}
}
if let Some(tf) = transfer.type_facts {
if let Some(kind) = tf.get_type(*rv) {
sink_caps &= !receiver_incompatible_sink_caps(kind, sink_caps);
}
}
}
// Type-qualified sink resolution: when normal sink resolution found nothing,
// try using the receiver's inferred type to construct a qualified callee name.
if sink_caps.is_empty() {
@ -5954,6 +6075,39 @@ fn collect_block_events(
}
}
// ADD XXE on opt-in. When the receiver was constructed
// with an explicit external-entity opt-in
// (`new XMLParser({ processEntities: true })`,
// `lxml.etree.XMLParser(resolve_entities=True)`), the subsequent
// `parser.parse(xml)` is an XXE flow even though the callee
// carries no flat XXE rule (fast-xml-parser and lxml are
// XXE-safe by default). Runs BEFORE the empty check below so a
// previously-empty sink_caps becomes non-empty and downstream
// emission proceeds. The complementary `xxe_safe` suppress path
// still runs after this; a call where the receiver was both
// opt-in AND later hardened by a setter results in net-zero
// (suppress strips what we added).
if let SsaOp::Call {
receiver: Some(rv),
callee: callee_str,
..
} = &inst.op
{
if let Some(xc) = transfer.xml_parser_config {
if xc.is_unsafe_explicit(*rv) {
let suffix = callee_str
.rsplit(['.', ':'])
.next()
.unwrap_or(callee_str.as_str());
// `feed` covers Python lxml incremental parsing
// (`parser.feed(body); parser.close()`).
if matches!(suffix, "parse" | "parseString" | "parseFromString" | "feed") {
sink_caps |= Cap::XXE;
}
}
}
}
if sink_caps.is_empty() {
// Callback pattern: check if callee has source_to_callback and the
// actual callback argument has a matching param_to_sink.
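
The suffix test in the XXE opt-in block can be exercised on its own. The sketch below assumes a hypothetical `is_parse_class_call` helper and a made-up `XXE` bit value; it uses a closure pattern in place of the `['.', ':']` array but splits on the same two separators.

```rust
// Hypothetical bit value for illustration only.
const XXE: u32 = 1 << 0;

/// True when the last path segment of `callee` (split on `.` or `:`)
/// is one of the parse-class method names listed in the hunk above.
fn is_parse_class_call(callee: &str) -> bool {
    let suffix = callee
        .rsplit(|c: char| c == '.' || c == ':')
        .next()
        .unwrap_or(callee);
    matches!(suffix, "parse" | "parseString" | "parseFromString" | "feed")
}

/// Add the XXE bit only when the receiver explicitly opted in to
/// external entities and the call is a parse-class method.
fn add_xxe_on_opt_in(sink_caps: u32, receiver_opted_in: bool, callee: &str) -> u32 {
    if receiver_opted_in && is_parse_class_call(callee) {
        sink_caps | XXE
    } else {
        sink_caps
    }
}
```

Because the check looks only at the final segment, `parser.parse`, `etree::XMLParser::feed`, and a bare `parse` all qualify, while `parser.close` does not.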
@ -6055,17 +6209,89 @@ fn collect_block_events(
continue;
}
// Receiver type incompatibility check.
// If the receiver's flow-sensitive type proves it cannot be the kind
// of object the sink expects (e.g., Int receiver → not an HTTP response
// sink), strip those sink caps.
if let Some(ref env) = state.path_env {
if sink_caps.is_empty() {
continue;
}
// XXE config-fact suppression. A parse-class sink whose receiver
// was provably hardened (`setFeature(FEATURE_SECURE_PROCESSING,
// true)`, `setExpandEntityReferences(false)`, etc.) is not an XXE
// flow. Drop the bit before downstream sink emission. Runs after
// type-qualified resolution / module alias resolution so the XXE
// bit added by `XmlParser.parse` resolution is visible here.
if sink_caps.intersects(Cap::XXE) {
if let SsaOp::Call {
receiver: Some(rv), ..
} = &inst.op
{
if let Some(kind) = env.get(*rv).types.as_singleton() {
sink_caps &= !receiver_incompatible_sink_caps(&kind, sink_caps);
if let Some(xc) = transfer.xml_parser_config {
if crate::ssa::xml_config::xxe_safe(Some(*rv), xc) {
sink_caps &= !Cap::XXE;
}
}
}
}
if sink_caps.is_empty() {
continue;
}
// XPath resolver-binding suppression. An XPath `evaluate` /
// `compile` sink whose receiver was provably bound to an
// `XPathVariableResolver` is treated as parameterised and the
// XPATH_INJECTION bit is stripped. Mirrors the XXE config-fact
// shape above. Only fires when the receiver also carries
// `TypeKind::XPathClient` (gates the suppression behind
// type-fact disambiguation so a generic `obj.evaluate(...)`
// matched as XPATH_INJECTION via name-only labelling does not
// accidentally clear).
if sink_caps.intersects(Cap::XPATH_INJECTION) {
if let SsaOp::Call {
receiver: Some(rv), ..
} = &inst.op
{
if let Some(xpc) = transfer.xpath_config {
let receiver_is_xpath = transfer
.type_facts
.and_then(|tf| tf.get_type(*rv))
.map(|kind| matches!(kind, crate::ssa::type_facts::TypeKind::XPathClient))
.unwrap_or(false);
if receiver_is_xpath && crate::ssa::xpath_config::xpath_safe(Some(*rv), xpc) {
sink_caps &= !Cap::XPATH_INJECTION;
}
}
}
}
if sink_caps.is_empty() {
continue;
}
// Prototype-pollution suppression (flow-sensitive).
// `Object.create(null)` produces a `NullPrototypeObject`-typed
// value; subscript writes to such an object cannot pollute
// `Object.prototype` because there is no prototype chain.
// Receiver SsaValue is read off the synthetic `__index_set__`
// Call op; phi joins downgrade to `Unknown` via `TypeFact::meet`
// so an if/else where only one branch initialises with
// `Object.create(null)` keeps the PROTOTYPE_POLLUTION bit on
// the unsafe path.
if sink_caps.intersects(Cap::PROTOTYPE_POLLUTION) {
if let SsaOp::Call {
callee,
receiver: Some(rv),
..
} = &inst.op
{
if callee == "__index_set__" {
let receiver_is_null_proto = transfer
.type_facts
.and_then(|tf| tf.get_type(*rv))
.map(|kind| {
matches!(kind, crate::ssa::type_facts::TypeKind::NullPrototypeObject)
})
.unwrap_or(false);
if receiver_is_null_proto {
sink_caps &= !Cap::PROTOTYPE_POLLUTION;
}
}
}
}
@ -6436,7 +6662,7 @@ fn pick_primary_sink_sites(
return Vec::new();
};
let mut out: Vec<SinkSite> = Vec::new();
let mut seen: HashSet<(String, u32, u32, u16)> = HashSet::new();
let mut seen: HashSet<(String, u32, u32, u32)> = HashSet::new();
for (param_idx, sites) in param_to_sink_sites {
let Some(arg_vals) = args.get(*param_idx) else {
continue;
@ -6475,7 +6701,7 @@ fn pick_primary_sink_sites_from_resolved(
return Vec::new();
}
let mut out: Vec<SinkSite> = Vec::new();
let mut seen: HashSet<(String, u32, u32, u16)> = HashSet::new();
let mut seen: HashSet<(String, u32, u32, u32)> = HashSet::new();
for (_, sites) in param_to_sink_sites {
for site in sites {
if site.line == 0 {
@ -8127,13 +8353,36 @@ fn type_safe_for_taint_sink(kind: &crate::ssa::type_facts::TypeKind, cap: Cap) -
fn receiver_incompatible_sink_caps(kind: &crate::ssa::type_facts::TypeKind, sink_caps: Cap) -> Cap {
use crate::ssa::type_facts::TypeKind;
let mut remove = Cap::empty();
// HTML_ESCAPE requires HTTP response-like receiver
if sink_caps.intersects(Cap::HTML_ESCAPE) {
// HTML_ESCAPE / OPEN_REDIRECT / HEADER_INJECTION all require an HTTP
// response-like receiver: each is a write-side rule that fires when
// attacker data is rendered into / written onto the response stream
// (`*.send` / `*.redirect` / `*.setHeader` / etc.). Receivers proven
// to be a different class — directory-service connections (LDAP),
// database connections, file handles, in-memory collections, query-
// builder objects, URL values, HTTP clients (request-side), and so on
// — cannot host these sinks even when a same-named matcher
// (`*.send`, `*.set`, `*.append`) attaches the label by suffix.
let response_like_caps = Cap::HTML_ESCAPE | Cap::OPEN_REDIRECT | Cap::HEADER_INJECTION;
if sink_caps.intersects(response_like_caps) {
match kind {
TypeKind::HttpResponse => {} // compatible
TypeKind::Unknown | TypeKind::Object => {} // could be response
_ => {
remove |= Cap::HTML_ESCAPE;
remove |= sink_caps & response_like_caps;
}
}
}
// LDAP_INJECTION strictly requires a directory-service receiver.
// Non-LdapClient receivers carrying the cap by accident (e.g. a
// generic `*.search` suffix matcher firing on a Vec/HashMap) get the
// bit stripped. Unknown/Object stay untouched so type-fact gaps
// don't silently drop real sinks.
if sink_caps.intersects(Cap::LDAP_INJECTION) {
match kind {
TypeKind::LdapClient => {} // compatible
TypeKind::Unknown | TypeKind::Object => {} // could be ldap
_ => {
remove |= Cap::LDAP_INJECTION;
}
}
}
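
A reduced model of the stripping rule, assuming a five-variant `TypeKind` and invented bit positions (the real enum has more variants and the real function takes `Cap`). It is meant to show the shape of the decision, not to reproduce the implementation.

```rust
// Reduced stand-in for the real `TypeKind`; variants beyond these five
// are omitted for brevity.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TypeKind {
    HttpResponse,
    LdapClient,
    Unknown,
    Object,
    Int,
}

// Hypothetical bit positions.
const HTML_ESCAPE: u32 = 1 << 0;
const OPEN_REDIRECT: u32 = 1 << 1;
const HEADER_INJECTION: u32 = 1 << 2;
const LDAP_INJECTION: u32 = 1 << 3;

/// Return the bits that the receiver type proves cannot apply.
fn receiver_incompatible(kind: TypeKind, sink_caps: u32) -> u32 {
    let response_like = HTML_ESCAPE | OPEN_REDIRECT | HEADER_INJECTION;
    let mut remove = 0;
    if (sink_caps & response_like) != 0 {
        match kind {
            // Compatible, or too imprecise to rule anything out.
            TypeKind::HttpResponse | TypeKind::Unknown | TypeKind::Object => {}
            _ => remove |= sink_caps & response_like,
        }
    }
    if (sink_caps & LDAP_INJECTION) != 0 {
        match kind {
            TypeKind::LdapClient | TypeKind::Unknown | TypeKind::Object => {}
            _ => remove |= LDAP_INJECTION,
        }
    }
    remove
}
```

The caller applies the result as `sink_caps &= !receiver_incompatible(kind, sink_caps)`. Leaving `Unknown` and `Object` untouched keeps gaps in type inference from silently dropping real sinks.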
@ -9364,7 +9613,7 @@ fn resolve_callee_full(
}
// 0.5) Cross-file SSA summaries (GlobalSummaries.ssa_by_key) with
// optional Phase-6 hierarchy fan-out.
// optional class-hierarchy fan-out.
//
// When the call has an authoritative receiver type AND
// `GlobalSummaries::install_hierarchy` has been called AND the
@ -9468,7 +9717,7 @@ fn resolve_callee_full(
}
}
// 2) Global same-language (FuncSummary path) with Phase-6 hierarchy
// 2) Global same-language (FuncSummary path) with class-hierarchy
// fan-out. Same semantics as step 0.5 but on coarse FuncSummary
// entries: when the SSA path missed because no implementer had an
// SSA summary, we widen the FuncSummary lookup symmetrically.

View file

@ -246,6 +246,8 @@ pub fn extract_ssa_func_summary_full(
receiver_seed: None,
const_values: None,
type_facts: local_type_facts_ref,
xml_parser_config: None,
xpath_config: None,
ssa_summaries,
extra_labels: None,
base_aliases: None,
@ -792,6 +794,8 @@ pub fn extract_ssa_func_summary_full(
receiver_seed: None,
const_values: None,
type_facts: local_type_facts_ref,
xml_parser_config: None,
xpath_config: None,
ssa_summaries,
extra_labels: None,
base_aliases: None,

View file

@ -93,6 +93,8 @@ mod cross_file_tests {
type_facts: crate::ssa::type_facts::TypeFactResult {
facts: std::collections::HashMap::new(),
},
xml_parser_config: crate::ssa::xml_config::XmlParserConfigResult::default(),
xpath_config: crate::ssa::xpath_config::XPathConfigResult::default(),
alias_result: crate::ssa::alias::BaseAliasResult::empty(),
points_to: crate::ssa::heap::PointsToResult::empty(),
module_aliases: std::collections::HashMap::new(),
@ -251,7 +253,7 @@ mod inline_cache_epoch_tests {
ArgTaintSig(SmallVec::new())
}
fn shape(caps_bits: u16) -> CachedInlineShape {
fn shape(caps_bits: u32) -> CachedInlineShape {
CachedInlineShape(Some(ReturnShape {
caps: Cap::from_bits_retain(caps_bits),
internal_origins: SmallVec::new(),
@ -448,7 +450,7 @@ mod binding_key_tests {
// ── seed_lookup ────────────────────────────────────────────────────
fn taint(caps: u16) -> VarTaint {
fn taint(caps: u32) -> VarTaint {
VarTaint {
caps: Cap::from_bits_truncate(caps),
origins: smallvec![],
@ -989,6 +991,8 @@ mod goto_succ_propagation_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -1079,6 +1083,8 @@ mod goto_succ_propagation_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -1516,10 +1522,10 @@ mod receiver_candidates_field_proj_tests {
#[test]
fn field_proj_receiver_walks_to_typed_root_in_go() {
// Go is not Rust, so pre-Phase-4 the candidate walk would have
// returned ONLY the immediate receiver (v2 = FieldProj). With
// We walk through FieldProj.receiver to recover v0 (the
// typed root `c`).
// Go is not Rust, so before the FieldProj walk fix the candidate
// walk would have returned ONLY the immediate receiver
// (v2 = FieldProj). We now walk through FieldProj.receiver to
// recover v0 (the typed root `c`).
let body = body_with_field_proj_chain();
let cands =
super::super::receiver_candidates_for_type_lookup(SsaValue(2), Some(&body), Lang::Go);
@ -1709,7 +1715,7 @@ mod fanout_merge_tests {
];
let m = merge_resolved_summaries_fanout(a, b);
let mut sorted: Vec<(usize, u16)> = m
let mut sorted: Vec<(usize, u32)> = m
.param_to_sink
.iter()
.map(|(i, c)| (*i, c.bits()))
@ -2032,6 +2038,8 @@ mod field_write_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -2114,6 +2122,8 @@ mod field_write_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -2180,6 +2190,8 @@ mod field_write_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -2324,6 +2336,8 @@ mod field_write_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -2420,6 +2434,8 @@ mod container_elem_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -2697,6 +2713,8 @@ mod container_elem_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -2833,6 +2851,8 @@ mod container_elem_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -3387,6 +3407,8 @@ mod field_taint_origin_cap_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -3673,6 +3695,8 @@ mod pointer_lattice_worklist_tests {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,

View file

@ -45,6 +45,8 @@ fn ssa_analyse_rust(src: &[u8]) -> Vec<Finding> {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -1669,10 +1671,10 @@ fn cpp_builder_chain_const_host_silent() {
/// inline member-function bodies inside a
/// `class_specifier` must be extracted as separate functions and
/// intra-file calls must resolve to their bodies. Pre-Phase-4, the
/// `class_specifier` AST kind was unmapped in cpp KINDS, so the CFG
/// walker treated the entire class as a leaf `Seq` node and never
/// descended into inline methods.
/// intra-file calls must resolve to their bodies. Before the cpp KINDS
/// fix the `class_specifier` AST kind was unmapped, so the CFG walker
/// treated the entire class as a leaf `Seq` node and never descended
/// into inline methods.
#[test]
fn cpp_inline_class_method_resolves() {
let src = b"#include <cstdlib>\nclass Inner {\npublic:\n void run(const char* arg) { std::system(arg); }\n};\nint main() {\n char* input = std::getenv(\"X\");\n Inner inner;\n inner.run(input);\n return 0;\n}\n";
@ -3768,6 +3770,8 @@ fn assert_ssa_integration(src: &[u8]) {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -3904,6 +3908,8 @@ fn integ_php_echo_simple_var() {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,
@ -3972,6 +3978,8 @@ fn integ_c_curl_handle_ssrf() {
receiver_seed: None,
const_values: None,
type_facts: None,
xml_parser_config: None,
xpath_config: None,
ssa_summaries: None,
extra_labels: None,
base_aliases: None,

View file

@ -74,6 +74,14 @@ pub enum CapName {
Crypto,
/// Request-bound identifier not yet ownership-checked.
UnauthorizedId,
DataExfil,
LdapInjection,
XpathInjection,
HeaderInjection,
OpenRedirect,
Ssti,
Xxe,
PrototypePollution,
All,
}
@ -94,6 +102,14 @@ impl CapName {
Self::CodeExec => Cap::CODE_EXEC,
Self::Crypto => Cap::CRYPTO,
Self::UnauthorizedId => Cap::UNAUTHORIZED_ID,
Self::DataExfil => Cap::DATA_EXFIL,
Self::LdapInjection => Cap::LDAP_INJECTION,
Self::XpathInjection => Cap::XPATH_INJECTION,
Self::HeaderInjection => Cap::HEADER_INJECTION,
Self::OpenRedirect => Cap::OPEN_REDIRECT,
Self::Ssti => Cap::SSTI,
Self::Xxe => Cap::XXE,
Self::PrototypePollution => Cap::PROTOTYPE_POLLUTION,
Self::All => Cap::all(),
}
}
@ -115,6 +131,14 @@ impl fmt::Display for CapName {
Self::CodeExec => write!(f, "code_exec"),
Self::Crypto => write!(f, "crypto"),
Self::UnauthorizedId => write!(f, "unauthorized_id"),
Self::DataExfil => write!(f, "data_exfil"),
Self::LdapInjection => write!(f, "ldap_injection"),
Self::XpathInjection => write!(f, "xpath_injection"),
Self::HeaderInjection => write!(f, "header_injection"),
Self::OpenRedirect => write!(f, "open_redirect"),
Self::Ssti => write!(f, "ssti"),
Self::Xxe => write!(f, "xxe"),
Self::PrototypePollution => write!(f, "prototype_pollution"),
Self::All => write!(f, "all"),
}
}
@ -137,11 +161,21 @@ impl FromStr for CapName {
"code_exec" => Ok(Self::CodeExec),
"crypto" => Ok(Self::Crypto),
"unauthorized_id" => Ok(Self::UnauthorizedId),
"data_exfil" | "data_exfiltration" => Ok(Self::DataExfil),
"ldap_injection" | "ldapi" => Ok(Self::LdapInjection),
"xpath_injection" | "xpathi" => Ok(Self::XpathInjection),
"header_injection" | "crlf" | "response_splitting" => Ok(Self::HeaderInjection),
"open_redirect" | "redirect" => Ok(Self::OpenRedirect),
"ssti" | "template_injection" => Ok(Self::Ssti),
"xxe" => Ok(Self::Xxe),
"prototype_pollution" | "proto_pollution" => Ok(Self::PrototypePollution),
"all" => Ok(Self::All),
_ => Err(format!(
"invalid cap name: {s:?} (expected env_var, html_escape, shell_escape, \
url_encode, json_parse, file_io, fmt_string, sql_query, deserialize, \
ssrf, code_exec, crypto, unauthorized_id, all)"
ssrf, code_exec, crypto, unauthorized_id, data_exfil, ldap_injection, \
xpath_injection, header_injection, open_redirect, ssti, xxe, \
prototype_pollution, all)"
)),
}
}
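
The alias handling added to `FromStr` means several spellings parse to one variant while `Display` always prints the canonical name. A trimmed-down sketch with three of the new variants follows; the enum here is a local stand-in for illustration, not the crate's `CapName`.

```rust
use std::fmt;
use std::str::FromStr;

// Local three-variant stand-in for the real enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CapName {
    HeaderInjection,
    OpenRedirect,
    Ssti,
}

impl fmt::Display for CapName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Always emit the canonical spelling.
        let s = match self {
            Self::HeaderInjection => "header_injection",
            Self::OpenRedirect => "open_redirect",
            Self::Ssti => "ssti",
        };
        f.write_str(s)
    }
}

impl FromStr for CapName {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Aliases mirror the match arms in the hunk above.
        match s {
            "header_injection" | "crlf" | "response_splitting" => Ok(Self::HeaderInjection),
            "open_redirect" | "redirect" => Ok(Self::OpenRedirect),
            "ssti" | "template_injection" => Ok(Self::Ssti),
            _ => Err(format!("invalid cap name: {s:?}")),
        }
    }
}
```

One consequence of this shape is that parsing an alias and printing it back normalises the name, so `redirect` round-trips to `open_redirect`.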

View file

@ -1,6 +1,6 @@
{
"required_findings": [
{ "id_prefix": "taint-unsanitised-flow", "min_count": 1 }
{ "id_prefix": "taint-open-redirect", "min_count": 1 }
],
"forbidden_findings": [],
"noise_budget": {

View file

@ -0,0 +1,18 @@
// Safe: query value routed through the project-local `stripCRLF` helper
// before being written to the response header.
package main
import (
"net/http"
"strings"
)
func stripCRLF(raw string) string {
return strings.ReplaceAll(strings.ReplaceAll(raw, "\r", ""), "\n", "")
}
func handler(w http.ResponseWriter, r *http.Request) {
lang := r.URL.Query().Get("lang")
safe := stripCRLF(lang)
w.Header().Set("X-Lang", safe)
}

View file

@ -0,0 +1,12 @@
// Unsafe: net/http `ResponseWriter.Header().Set` receives a value built from
// `r.URL.Query().Get`. HEADER_INJECTION fires on the value argument.
package main
import (
"net/http"
)
func handler(w http.ResponseWriter, r *http.Request) {
lang := r.URL.Query().Get("lang")
w.Header().Set("X-Lang", lang)
}

View file

@ -0,0 +1,16 @@
// Safe: request parameter routed through the project-local `stripCRLF`
// helper before being written to the response header.
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class SafeSetHeader {
public static String stripCRLF(String raw) {
return raw.replace("\r", "").replace("\n", "");
}
public void handle(HttpServletRequest req, HttpServletResponse res) {
String lang = req.getParameter("lang");
String safe = stripCRLF(lang);
res.setHeader("X-Lang", safe);
}
}

View file

@ -0,0 +1,11 @@
// Unsafe: HttpServletResponse.setHeader receives a value built from a
// request parameter. HEADER_INJECTION fires on the value argument.
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class UnsafeSetHeader {
public void handle(HttpServletRequest req, HttpServletResponse res) {
String lang = req.getParameter("lang");
res.setHeader("X-Lang", lang);
}
}

View file

@ -0,0 +1,14 @@
// Safe: req.query.lang routed through the project-local `stripCRLF` helper
// before being written to the response header.
function stripCRLF(raw) {
return raw.replace(/[\r\n]/g, '');
}
function handler(req, res) {
const lang = req.query.lang;
const safe = stripCRLF(lang);
res.setHeader('X-Lang', safe);
res.end();
}
module.exports = handler;

View file

@ -0,0 +1,14 @@
// Safe: req.query.lang routed through the project-local `stripCRLF` helper
// (a registered HEADER_INJECTION sanitizer) before the subscript-set, so
// taint-header-injection stays clean.
function stripCRLF(raw) {
return raw.replace(/[\r\n]/g, '');
}
function handler(req, res) {
const lang = req.query.lang;
res.headers["X-Forwarded-By"] = stripCRLF(lang);
res.end();
}
module.exports = handler;

View file

@ -0,0 +1,9 @@
// Unsafe: Express `res.setHeader` receives a value built from req.query.
// HEADER_INJECTION fires on the value argument.
function handler(req, res) {
const lang = req.query.lang;
res.setHeader('X-Lang', lang);
res.end();
}
module.exports = handler;

View file

@ -0,0 +1,11 @@
// Unsafe: tainted req.query value flows into the bare-subscript header set
// `res.headers["X-Forwarded-By"] = lang`. The LHS-subscript classification
// path matches `res.headers` as a HEADER_INJECTION sink so this form fires
// alongside the explicit `setHeader` / `res.set` method-call shapes.
function handler(req, res) {
const lang = req.query.lang;
res.headers["X-Forwarded-By"] = lang;
res.end();
}
module.exports = handler;

View file

@ -0,0 +1,10 @@
<?php
// Safe: $_GET['lang'] routed through the project-local `strip_crlf` helper
// before concatenation.
function strip_crlf($raw) {
return str_replace(["\r", "\n"], ["", ""], $raw);
}
$lang = $_GET['lang'];
$safe = strip_crlf($lang);
header("X-Lang: " . $safe);

View file

@ -0,0 +1,6 @@
<?php
// Unsafe: $_GET['lang'] concatenated into a `header()` line. The bare
// `header` matcher (exact-match sigil) fires on the call. Tainted input
// without `\r\n` stripping permits response splitting.
$lang = $_GET['lang'];
header("X-Lang: " . $lang);

View file

@ -0,0 +1,15 @@
# Safe: request arg routed through `strip_crlf` before being added to the
# response headers.
from flask import request, make_response
def strip_crlf(raw):
return raw.replace("\r", "").replace("\n", "")
def handler():
lang = request.args.get("lang")
safe = strip_crlf(lang)
resp = make_response("ok")
resp.headers.add("X-Lang", safe)
return resp

View file

@ -0,0 +1,15 @@
# Safe: request arg routed through `strip_crlf` (a registered
# HEADER_INJECTION sanitizer) before the subscript-set, so
# taint-header-injection stays clean.
from flask import request, make_response
def strip_crlf(raw):
return raw.replace("\r", "").replace("\n", "")
def handler():
lang = request.args.get("lang")
response = make_response("ok")
response.headers["X-Forwarded-By"] = strip_crlf(lang)
return response

View file

@ -0,0 +1,10 @@
# Unsafe: Flask response.headers.add receives a value built from request
# args. HEADER_INJECTION fires on the value argument.
from flask import request, make_response
def handler():
lang = request.args.get("lang")
resp = make_response("ok")
resp.headers.add("X-Lang", lang)
return resp

View file

@ -0,0 +1,13 @@
# Unsafe: tainted request value flows into the bare-subscript header set
# `response.headers["X-Forwarded-By"] = lang`. The LHS-subscript
# classification path matches `response.headers` / `resp.headers` as a
# HEADER_INJECTION sink so this form fires alongside the explicit
# `headers.add` / `set_cookie` method-call shapes.
from flask import request, make_response
def handler():
lang = request.args.get("lang")
response = make_response("ok")
response.headers["X-Forwarded-By"] = lang
return response

View file

@ -0,0 +1,7 @@
# Safe: tainted request value routed through `strip_crlf` (a registered
# HEADER_INJECTION sanitizer) before the subscript-set, so taint-header-injection
# stays clean.
def handle(params, response)
lang = params["lang"]
response.headers["X-Forwarded-By"] = strip_crlf(lang)
end

View file

@ -0,0 +1,9 @@
# Unsafe: tainted request value flows into the bare-subscript header set
# `response.headers["X-Forwarded-By"] = lang`. The LHS-subscript
# classification path matches `response.headers` as a HEADER_INJECTION
# sink so this form fires alongside the explicit `set_header` /
# `add_header` method-call shapes.
def handle(params, response)
lang = params["lang"]
response.headers["X-Forwarded-By"] = lang
end

View file

@ -0,0 +1,14 @@
// Safe: env value routed through the project-local `strip_crlf` helper
// before being written to the response header.
use std::env;
fn strip_crlf(raw: &str) -> String {
raw.replace('\r', "").replace('\n', "")
}
fn handler(response: &mut http::Response<()>) {
let lang = env::var("LANG").unwrap_or_default();
let safe = strip_crlf(&lang);
let value = http::HeaderValue::from_str(&safe).unwrap();
response.headers_mut().insert("X-Lang", value);
}

View file

@ -0,0 +1,9 @@
// Unsafe: tainted env value flows into `response.headers_mut().insert`.
// HEADER_INJECTION fires on the value argument.
use std::env;
fn handler(response: &mut http::Response<()>) {
let lang = env::var("LANG").unwrap_or_default();
let value = http::HeaderValue::from_str(&lang).unwrap();
response.headers_mut().insert("X-Lang", value);
}

View file

@ -0,0 +1,12 @@
// Safe: req.query.lang routed through `stripCRLF` before being written to
// the response header.
function stripCRLF(raw: string): string {
return raw.replace(/[\r\n]/g, '');
}
export function handler(req: any, res: any): void {
const lang: string = req.query.lang;
const safe: string = stripCRLF(lang);
res.setHeader('X-Lang', safe);
res.end();
}

View file

@ -0,0 +1,12 @@
// Safe: req.query.lang routed through the project-local `stripCRLF` helper
// (a registered HEADER_INJECTION sanitizer) before the subscript-set, so
// taint-header-injection stays clean.
function stripCRLF(raw: string): string {
return raw.replace(/[\r\n]/g, '');
}
export function handler(req: any, res: any): void {
const lang: string = req.query.lang;
res.headers["X-Forwarded-By"] = stripCRLF(lang);
res.end();
}

View file

@ -0,0 +1,7 @@
// Unsafe: Express `res.setHeader` receives a value built from req.query.
// HEADER_INJECTION fires on the value argument.
export function handler(req: any, res: any): void {
const lang: string = req.query.lang;
res.setHeader('X-Lang', lang);
res.end();
}

View file

@ -0,0 +1,9 @@
// Unsafe: tainted req.query value flows into the bare-subscript header set
// `res.headers["X-Forwarded-By"] = lang`. The LHS-subscript classification
// path matches `res.headers` as a HEADER_INJECTION sink so this form fires
// alongside the explicit `setHeader` / `res.set` method-call shapes.
export function handler(req: any, res: any): void {
const lang: string = req.query.lang;
res.headers["X-Forwarded-By"] = lang;
res.end();
}

View file

@ -1,6 +1,6 @@
{
"required_findings": [
{ "id_prefix": "taint-unsanitised-flow", "min_count": 1 }
{ "id_prefix": "taint-open-redirect", "min_count": 1 }
],
"forbidden_findings": [],
"noise_budget": {

View file

@ -0,0 +1,12 @@
/* Baseline: filter is a string literal, no LDAP_INJECTION finding. */
#include <ldap.h>
int do_lookup(LDAP *ld) {
LDAPMessage *res = NULL;
return ldap_search_ext_s(
ld,
"ou=people,dc=example,dc=com",
LDAP_SCOPE_SUBTREE,
"(objectClass=person)",
NULL, 0, NULL, NULL, NULL, 0, &res);
}

View file

@ -0,0 +1,19 @@
/* Safe: project-local sanitize_ldap_filter (matches the developer-named
* `sanitize_*` Sanitizer rule) clears caps on the user value before it
* reaches ldap_search_ext_s. */
#include <ldap.h>
#include <stdlib.h>
extern char *sanitize_ldap_filter(const char *raw);
int do_lookup(LDAP *ld) {
char *user_filter = getenv("USER_FILTER");
char *safe = sanitize_ldap_filter(user_filter);
LDAPMessage *res = NULL;
return ldap_search_ext_s(
ld,
"ou=people,dc=example,dc=com",
LDAP_SCOPE_SUBTREE,
safe,
NULL, 0, NULL, NULL, NULL, 0, &res);
}
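
The "sanitizer clears caps" wording in these fixtures can be read as a bitmask operation. The following is a rough sketch under that reading only: the bit positions, `sanitize`, and `sink_fires` are invented for illustration and are not Nyx's actual values or functions.

```python
from enum import IntFlag


class Cap(IntFlag):
    # Bit positions are made up for this sketch; they are not Nyx's values.
    LDAP_INJECTION = 1 << 0
    HEADER_INJECTION = 1 << 1


def sanitize(caps: Cap, cleared: Cap) -> Cap:
    """Return the taint caps left after a sanitizer clears `cleared`."""
    return caps & ~cleared


def sink_fires(caps: Cap, sink_cap: Cap) -> bool:
    """A sink reports only if the value still carries the sink's cap."""
    return bool(caps & sink_cap)


tainted = Cap.LDAP_INJECTION | Cap.HEADER_INJECTION
after = sanitize(tainted, Cap.LDAP_INJECTION)
```

Under this model the safe fixture stays quiet because the LDAP bit is gone by the time the value reaches the filter argument, while any other cap on the value would still be reported at its own sink.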

View file

@ -0,0 +1,15 @@
/* Unsafe: tainted env-string passed straight as the LDAP filter argument
 * to ldap_search_ext_s. LDAP_INJECTION fires on the filter (zero-based arg 3). */
#include <ldap.h>
#include <stdlib.h>
int do_lookup(LDAP *ld) {
char *user_filter = getenv("USER_FILTER");
LDAPMessage *res = NULL;
return ldap_search_ext_s(
ld,
"ou=people,dc=example,dc=com",
LDAP_SCOPE_SUBTREE,
user_filter,
NULL, 0, NULL, NULL, NULL, 0, &res);
}

View file

@ -0,0 +1,12 @@
// Baseline: literal filter, no taint reaches the sink.
#include <ldap.h>
int do_lookup(LDAP* ld) {
LDAPMessage* res = nullptr;
return ldap_search_ext_s(
ld,
"ou=people,dc=example,dc=com",
LDAP_SCOPE_SUBTREE,
"(objectClass=person)",
nullptr, 0, nullptr, nullptr, nullptr, 0, &res);
}

View file

@ -0,0 +1,18 @@
// Safe: developer-named sanitize_* helper clears caps on the user value
// before it reaches ldap_search_ext_s.
#include <cstdlib>
#include <ldap.h>
extern const char* sanitize_ldap_filter(const char* raw);
int do_lookup(LDAP* ld) {
const char* user_filter = std::getenv("USER_FILTER");
const char* safe = sanitize_ldap_filter(user_filter);
LDAPMessage* res = nullptr;
return ldap_search_ext_s(
ld,
"ou=people,dc=example,dc=com",
LDAP_SCOPE_SUBTREE,
safe,
nullptr, 0, nullptr, nullptr, nullptr, 0, &res);
}

View file

@ -0,0 +1,15 @@
// Unsafe: tainted env value passed straight as the LDAP filter argument to
// ldap_search_ext_s. LDAP_INJECTION fires on the filter (zero-based argument 3).
#include <cstdlib>
#include <ldap.h>
int do_lookup(LDAP* ld) {
const char* user_filter = std::getenv("USER_FILTER");
LDAPMessage* res = nullptr;
return ldap_search_ext_s(
ld,
"ou=people,dc=example,dc=com",
LDAP_SCOPE_SUBTREE,
user_filter,
nullptr, 0, nullptr, nullptr, nullptr, 0, &res);
}

View file

@ -0,0 +1,20 @@
// Baseline: filter is a literal string, no taint reaches NewSearchRequest.
package ldap_baseline
import (
"github.com/go-ldap/ldap/v3"
)
func Lookup() {
conn, _ := ldap.DialURL("ldap://example.com")
req := ldap.NewSearchRequest(
"ou=people,dc=example,dc=com",
ldap.ScopeWholeSubtree,
ldap.NeverDerefAliases,
0, 0, false,
"(objectClass=person)",
[]string{"cn"},
nil,
)
conn.Search(req)
}

View file

@ -0,0 +1,27 @@
// Safe: ldap.EscapeFilter applies RFC 4515 escaping before the user value
// is interpolated into the filter. Sanitizer(LDAP_INJECTION) clears the cap.
package ldap_safe
import (
"fmt"
"net/http"
"github.com/go-ldap/ldap/v3"
)
func Lookup(w http.ResponseWriter, r *http.Request) {
conn, _ := ldap.DialURL("ldap://example.com")
user := r.FormValue("user")
safe := ldap.EscapeFilter(user)
filter := fmt.Sprintf("(uid=%s)", safe)
req := ldap.NewSearchRequest(
"ou=people,dc=example,dc=com",
ldap.ScopeWholeSubtree,
ldap.NeverDerefAliases,
0, 0, false,
filter,
[]string{"cn"},
nil,
)
conn.Search(req)
}

View file

@ -0,0 +1,28 @@
// Unsafe: form value concatenated into an LDAP filter passed to
// ldap.NewSearchRequest, then executed via conn.Search. The construction
// call is tagged Cap::LDAP_INJECTION on the filter argument so the finding
// fires here regardless of the eventual conn.Search execution site.
package ldap_unsafe
import (
"fmt"
"net/http"
"github.com/go-ldap/ldap/v3"
)
func Lookup(w http.ResponseWriter, r *http.Request) {
conn, _ := ldap.DialURL("ldap://example.com")
user := r.FormValue("user")
filter := fmt.Sprintf("(uid=%s)", user)
req := ldap.NewSearchRequest(
"ou=people,dc=example,dc=com",
ldap.ScopeWholeSubtree,
ldap.NeverDerefAliases,
0, 0, false,
filter,
[]string{"cn"},
nil,
)
conn.Search(req)
}

View file

@ -0,0 +1,14 @@
// Baseline: the filter is a compile-time constant; no taint reaches the sink
// and no LDAP_INJECTION finding fires. Guards the rule against firing on
// safe-by-construction call sites that simply happen to hit a search API.
import javax.naming.directory.DirContext;
import javax.naming.directory.SearchControls;
public class BaselineConstantLdap {
private DirContext ctx;
public Object lookup() throws Exception {
String filter = "(objectClass=person)";
return ctx.search("ou=people,dc=example,dc=com", filter, new SearchControls());
}
}

View file

@ -0,0 +1,19 @@
// Safe: the user-supplied substring is run through Spring LDAP's
// LdapEncoder.filterEncode (RFC 4515 escape) before being assembled into the
// filter. The Sanitizer(LDAP_INJECTION) clears the cap and the sink does not
// fire.
import javax.naming.directory.DirContext;
import javax.naming.directory.SearchControls;
import javax.servlet.http.HttpServletRequest;
import org.springframework.ldap.support.LdapEncoder;
public class SafeLdapSearch {
private DirContext ctx;
public Object lookup(HttpServletRequest req) throws Exception {
String user = req.getParameter("user");
String safe = LdapEncoder.filterEncode(user);
String filter = "(uid=" + safe + ")";
return ctx.search("ou=people,dc=example,dc=com", filter, new SearchControls());
}
}

View file

@ -0,0 +1,17 @@
// Unsafe: attacker-controlled username concatenated into an LDAP filter passed
// to DirContext.search. The receiver `ctx` carries TypeKind::LdapClient via
// the declared `DirContext` type so type-qualified resolution rewrites the
// callee to `LdapClient.search` and the LDAP_INJECTION sink fires.
import javax.naming.directory.DirContext;
import javax.naming.directory.SearchControls;
import javax.servlet.http.HttpServletRequest;
public class UnsafeLdapSearch {
private DirContext ctx;
public Object lookup(HttpServletRequest req) throws Exception {
String user = req.getParameter("user");
String filter = "(uid=" + user + ")";
return ctx.search("ou=people,dc=example,dc=com", filter, new SearchControls());
}
}
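
A toy model of the type-qualified resolution the comment above describes, where the receiver's declared type decides which sink rule a bare method name maps to. Both tables and both function names are assumptions made for this sketch; the diff does not show how Nyx stores or looks up these facts.

```python
from typing import Optional

# Assumed lookup tables for this sketch only.
TYPE_KINDS = {"DirContext": "LdapClient"}
SINKS = {"LdapClient.search": "LDAP_INJECTION"}


def resolve_callee(receiver_type: Optional[str], method: str) -> str:
    """Rewrite `method` to `Kind.method` when the receiver type is known."""
    kind = TYPE_KINDS.get(receiver_type)
    return f"{kind}.{method}" if kind else method


def sink_cap(receiver_type: Optional[str], method: str) -> Optional[str]:
    """Return the cap name of the matching sink rule, or None if none matches."""
    return SINKS.get(resolve_callee(receiver_type, method))
```

In this model `sink_cap("DirContext", "search")` matches the LDAP rule, while a `search` call on an unrelated or unknown receiver type resolves to the bare name and matches nothing, which is what keeps a generic method name like `search` from firing everywhere.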

View file

@ -0,0 +1,11 @@
// Baseline: filter is a literal constant; no taint reaches the search call.
const ldap = require('ldapjs');
const client = ldap.createClient({ url: 'ldap://example.com' });
function lookup(_req, res) {
const filter = '(objectClass=person)';
client.search('ou=people,dc=example,dc=com', { filter: filter }, (err) => { res.json({ ok: !err }); });
}
module.exports = lookup;

View file

@ -0,0 +1,16 @@
// Safe: the `ldapEscape` helper imported from ldap-escape escapes the
// user-controlled substring before it lands in the filter expression. Mirrors
// the unsafe sibling's bound-variable shape so only the sanitiser introduction differs.
const ldap = require('ldapjs');
const ldapEscape = require('ldap-escape');
const client = ldap.createClient({ url: 'ldap://example.com' });
function lookup(req, res) {
const user = req.query.user;
const safe = ldapEscape(user);
const filter = '(uid=' + safe + ')';
client.search('ou=people,dc=example,dc=com', { filter: filter }, (err) => { res.json({ ok: !err }); });
}
module.exports = lookup;

View file

@ -0,0 +1,16 @@
// Unsafe: ldapjs `client.search` receives a filter assembled from req.query.
// Bound-variable idiom: the closure-captured `client` carries
// `TypeKind::LdapClient` (forwarded from the top-level body to the function
// body by `taint::inject_external_type_facts`), so type-qualified receiver
// resolution rewrites `client.search` → `LdapClient.search`.
const ldap = require('ldapjs');
const client = ldap.createClient({ url: 'ldap://example.com' });
function lookup(req, res) {
const user = req.query.user;
const filter = '(uid=' + user + ')';
client.search('ou=people,dc=example,dc=com', { filter: filter }, (err) => { res.json({ ok: !err }); });
}
module.exports = lookup;

View file

@ -0,0 +1,4 @@
<?php
// Baseline: filter is a literal string, no taint reaches the sink.
$ds = ldap_connect("ldap://example.com");
$result = ldap_search($ds, "ou=people,dc=example,dc=com", "(objectClass=person)");

View file

@ -0,0 +1,9 @@
<?php
// Safe: ldap_escape() with LDAP_ESCAPE_FILTER (or default) sanitises the user
// substring before it lands in the filter. Sanitizer(LDAP_INJECTION) clears
// the cap so the sink does not fire.
$ds = ldap_connect("ldap://example.com");
$user = $_GET['user'];
$safe = ldap_escape($user, "", LDAP_ESCAPE_FILTER);
$filter = "(uid=" . $safe . ")";
$result = ldap_search($ds, "ou=people,dc=example,dc=com", $filter);

View file

@ -0,0 +1,7 @@
<?php
// Unsafe: $_GET['user'] concatenated into an LDAP filter and passed straight
// to ldap_search. LDAP_INJECTION fires on the filter argument.
$ds = ldap_connect("ldap://example.com");
$user = $_GET['user'];
$filter = "(uid=" . $user . ")";
$result = ldap_search($ds, "ou=people,dc=example,dc=com", $filter);

View file

@ -0,0 +1,10 @@
# Baseline: filter is a compile-time constant. No taint reaches `search_s` so
# no LDAP_INJECTION finding fires.
import ldap
def lookup():
    conn = ldap.initialize("ldap://example.com")
    return conn.search_s(
        "ou=people,dc=example,dc=com", ldap.SCOPE_SUBTREE, "(objectClass=person)"
    )

View file

@ -0,0 +1,14 @@
# Safe: user-supplied substring run through `escape_filter_chars` (RFC 4515)
# before being concatenated into the filter. The sanitizer clears the
# LDAP_INJECTION cap so the sink does not fire.
import ldap
from ldap.filter import escape_filter_chars
from flask import request
def lookup():
    conn = ldap.initialize("ldap://example.com")
    user = request.form["user"]
    safe = escape_filter_chars(user)
    flt = "(uid=" + safe + ")"
    return conn.search_s("ou=people,dc=example,dc=com", ldap.SCOPE_SUBTREE, flt)
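
To make the RFC 4515 escaping mentioned in these fixtures concrete, here is a standalone sketch of the value-escaping rule: the five special characters are replaced by a backslash and two hex digits. `escape_filter_value` is a hypothetical re-implementation for illustration, not python-ldap's code, and the payload is an invented example.

```python
# RFC 4515 special characters and their backslash-hex escapes.
_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape one assertion value for use inside an LDAP search filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


payload = "*)(objectClass=*"

# Unescaped: the payload closes the uid clause and appends its own clause.
unsafe = "(uid=" + payload + ")"

# Escaped: the payload is treated as literal characters inside the uid value.
safe = "(uid=" + escape_filter_value(payload) + ")"
```

Escaping has to be applied to the user-supplied substring before concatenation, as the safe fixtures do; escaping the assembled filter would also escape the parentheses that give the filter its structure.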

Some files were not shown because too many files have changed in this diff.