Python FP and docs updates (#58)

* refactor: Update comments for clarity and add expectations.json files for performance metrics
* feat: Implement FP guard for JS/TS local-collection receivers to suppress missing ownership checks
* feat: Enhance Rust parameter handling to classify local collections and prevent false ownership checks
* refactor: Simplify code formatting for better readability in multiple files
* refactor: Improve UTF-8 sequence length handling and enhance clarity in loop iteration
* feat: Update Java and Python patterns to include new security rules
* refactor: Improve comment clarity and consistency across multiple Rust files
* refactor: Simplify code formatting for improved readability in integration tests and module files
* refactor: Improve comment formatting and enhance clarity in assertions across multiple files
2026-07-21 21:31:03 +02:00 · 2026-04-29 19:53:34 -04:00 · 2026-04-29 19:53:34 -04:00 · a438886217
commit a438886217
parent 4db0805de6
291 changed files with 9485 additions and 3851 deletions
--- a/docs/detectors.md
+++ b/docs/detectors.md
@@ -9,6 +9,16 @@ Nyx ships four independent detector families. They run together in `--mode full`
 | [State model](detectors/state.md) | `state-*` | Per-function state lattice | Use-after-close, double-close, leaks, unauthenticated access |
 | [AST patterns](detectors/patterns.md) | `<lang>.<cat>.<name>` | Tree-sitter structural match | Banned APIs, weak crypto, dangerous constructs |

+The taint family is split into cap-specific rule classes when a sink callee carries multiple vulnerability classes:
+
+| Rule id | Cap | Surface |
+|---|---|---|
+| `taint-unsanitised-flow` | every cap except `data_exfil` and `unauthorized_id` | Default taint flow class |
+| `taint-data-exfiltration` | `data_exfil` | Sensitive data flowing into the payload of an outbound network request (body / headers / json on `fetch`, body on `XMLHttpRequest.send`). Distinct from SSRF: the destination is fixed but attacker-influenced bytes leave the process. |
+| `rs.auth.missing_ownership_check.taint` | `unauthorized_id` | Rust auth subsystem fold-in; see [auth.md](auth.md). |
+
+A single call site can fire several of these at once when it carries multiple gates — `fetch(taintedUrl, {body: tainted})` produces both an SSRF finding (URL flow) and a `taint-data-exfiltration` finding (body flow), each with its own cap mask rather than a conflated union.
+
 For Rust auth-specific rules (`rs.auth.*`), see [auth.md](auth.md).

 ## How they combine
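
As a rough illustration of the per-gate behaviour described in the `docs/detectors.md` hunk above, the sketch below treats each sink argument as its own gate and emits one finding per tainted gate. It is written in Python for brevity and is not Nyx's implementation: `Cap`, `FETCH_GATES`, `rule_id_for`, and `findings_for_call` are hypothetical names, and the bit layout is assumed. Only the capability names and rule ids come from the table.

```python
from enum import IntFlag, auto


class Cap(IntFlag):
    # Hypothetical bit layout; only the capability names come from the docs.
    SSRF = auto()
    DATA_EXFIL = auto()
    UNAUTHORIZED_ID = auto()
    SQL_QUERY = auto()


def rule_id_for(cap: Cap) -> str:
    """Map a single capability to the rule id listed in the docs table."""
    if cap == Cap.DATA_EXFIL:
        return "taint-data-exfiltration"
    if cap == Cap.UNAUTHORIZED_ID:
        return "rs.auth.missing_ownership_check.taint"
    # Every other cap falls into the default taint flow class.
    return "taint-unsanitised-flow"


# Assumed gate table for `fetch`: each argument position carries its own cap.
FETCH_GATES = {"url": Cap.SSRF, "body": Cap.DATA_EXFIL}


def findings_for_call(gates: dict, tainted_args: set) -> list:
    """Emit one (rule_id, cap) finding per tainted gate.

    Each finding keeps the cap of its own gate instead of a union of
    every cap that the sink callee carries.
    """
    return [
        (rule_id_for(cap), cap)
        for arg, cap in gates.items()
        if arg in tainted_args
    ]


if __name__ == "__main__":
    # Models fetch(taintedUrl, {body: tainted}): both gates are tainted.
    for rule_id, cap in findings_for_call(FETCH_GATES, {"url", "body"}):
        print(rule_id, cap.name)
```

Under these assumptions, a call where only the body is tainted yields a single `taint-data-exfiltration` finding, while tainting both arguments yields two findings with separate cap masks.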
--- a/docs/detectors/taint.md
+++ b/docs/detectors/taint.md
@@ -134,7 +134,8 @@ Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer onl
 | `fmt_string` | | | `printf(var)` |
 | `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation |
 | `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` |
-| `ssrf` | | URL-prefix locks | `requests.get`, `fetch`, `HttpClient.send` |
+| `ssrf` | | URL-prefix locks | `requests.get`, `fetch` URL arg, outbound HTTP destination |
+| `data_exfil` | | | `fetch` body / headers / json, `XMLHttpRequest.send` body |
 | `code_exec` | | | `eval`, `exec`, `Function` |
 | `crypto` | | | weak-algorithm constructors |
 | `unauthorized_id` | request-bound scoped IDs (Rust auth analysis) | ownership check | row-level write |
--- a/docs/rules.md
+++ b/docs/rules.md
@@ -112,12 +112,14 @@ The tables below are generated from `src/patterns/<lang>.rs` by [`tools/docgen`]
 | `go.crypto.md5` | Low | A | Medium |
 | `go.crypto.sha1` | Low | A | Medium |

-### Java: 8 patterns
+### Java: 10 patterns

 | Rule ID | Severity | Tier | Confidence |
 |---|---|---|---|
 | `java.cmdi.runtime_exec` | High | A | High |
+| `java.code_exec.text4shell_interpolator` | High | A | High |
 | `java.deser.readobject` | High | A | High |
+| `java.deser.snakeyaml_unsafe_constructor` | High | A | High |
 | `java.reflection.class_forname` | Medium | A | High |
 | `java.reflection.method_invoke` | Medium | A | High |
 | `java.sqli.execute_concat` | Medium | B | Medium |
@@ -168,7 +170,7 @@ The tables below are generated from `src/patterns/<lang>.rs` by [`tools/docgen`]
 | `php.crypto.rand` | Low | A | Medium |
 | `php.crypto.sha1` | Low | A | Medium |

-### Python: 13 patterns
+### Python: 14 patterns

 | Rule ID | Severity | Tier | Confidence |
 |---|---|---|---|
@@ -182,6 +184,7 @@ The tables below are generated from `src/patterns/<lang>.rs` by [`tools/docgen`]
 | `py.code_exec.compile` | Medium | A | High |
 | `py.deser.shelve_open` | Medium | A | High |
 | `py.sqli.execute_format` | Medium | B | Medium |
+| `py.sqli.text_format` | Medium | B | Medium |
 | `py.xss.jinja_from_string` | Medium | A | High |
 | `py.crypto.md5` | Low | A | Medium |
 | `py.crypto.sha1` | Low | A | Medium |