Critical bug fixes and recall improvements (#68)

2026-07-21 21:31:03 +02:00 · 2026-05-11 12:42:39 -04:00 · 2026-05-11 12:42:39 -04:00 · 55247b7fcd
commit 55247b7fcd
parent 7d0e7320e2
352 changed files with 60069 additions and 900 deletions
--- a/docs/advanced-analysis.md
+++ b/docs/advanced-analysis.md
@ -101,7 +101,7 @@ origin-attribution.
 taint flow to the return value is fully validated by a dominating
 predicate (regex allowlist, type check, validation call) on every
 return path. At call sites, each tainted argument passed to a
-validated position — and the call's own return value — are marked
+validated position, and the call's own return value, are marked
 `validated_must` / `validated_may` in the caller's SSA taint state,
 the same way an inline `if (!regex.test(x)) throw …` would validate
 the surviving branch. Sound because the summary is recorded only when
--- a/docs/auth.md
+++ b/docs/auth.md
@ -1,6 +1,6 @@
 # Auth analysis

-**Rust today.** Other languages have rule scaffolding in [`src/auth_analysis/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/config.rs) (Python, Ruby, Go, Java, JavaScript, TypeScript), but only Rust has benchmark corpus coverage and the precision work to back it. Treat findings on other languages as preview; the rule prefix (`py.auth.*`, `js.auth.*`, `rb.auth.*`, `go.auth.*`, `java.auth.*`) is reserved but the matchers haven't been validated against real codebases yet.
+**Rust is the stable target.** Python and Go have shipped precision work as of 0.7.0 (FastAPI cross-file dependencies, Go DAO-helper filtering, same-file caller-scope IPA) and are usable on real codebases. Ruby, Java, JavaScript, and TypeScript have rule scaffolding in [`src/auth_analysis/config.rs`](https://github.com/elicpeter/nyx/blob/master/src/auth_analysis/config.rs) but no benchmark corpus yet; treat findings there as preview.

 ## What it catches

@ -19,18 +19,44 @@ Handlers registered through attribute macros (`#[get("/path")]`, `#[routes::path

 ## Caller-scope-entity exemption

-`<entity>.id` / `<entity>.pk` is not flagged when `<entity>` is a unit parameter named after a multi-tenant scope primitive: `organization` / `org`, `project`, `team`, `workspace`, `tenant`, `account`, `community`, `group`, `repository` / `repo`, `company`. The argument represents the caller's scope, not a user-controlled target, so internal helpers like `def get_environments(request, organization): Environment.objects.filter(organization_id=organization.id, …)` inherit the caller's authorization. Other field names (`.name`, `.slug`) still flag, and `user` / `member` / `actor` are deliberately excluded — those are handled by the actor-context recogniser.
+`<entity>.id` / `<entity>.pk` is not flagged when `<entity>` is a unit parameter named after a multi-tenant scope primitive: `organization` / `org`, `project`, `team`, `workspace`, `tenant`, `account`, `community`, `group`, `repository` / `repo`, `company`. The argument represents the caller's scope, not a user-controlled target, so internal helpers like `def get_environments(request, organization): Environment.objects.filter(organization_id=organization.id, …)` inherit the caller's authorization. Other field names (`.name`, `.slug`) still flag, and `user` / `member` / `actor` are deliberately excluded; those are handled by the actor-context recogniser.

 ## Project-level web-framework gate (Rust)

 In Rust, the `context_inputs` and param-name arms of the user-input heuristic are gated by a project-level web-framework signal. The signal is three-valued:

- `Some(true)` — the project's `Cargo.toml` names `axum`, `actix-web`, or `rocket`, OR the file directly imports one (`axum::`, `actix_web::`, `rocket::`, `axum_extra::`). Heuristics stay on.
- `Some(false)` — `Cargo.toml` was inspected and named no web framework, AND the file does not directly import one. Heuristics off; only `RouteHandler` classification (concrete route-registration evidence) survives.
- `None` — no detection ran (single-file scan with no project root). Heuristics on; behavior unchanged.
+- `Some(true)`: the project's `Cargo.toml` names `axum`, `actix-web`, or `rocket`, OR the file directly imports one (`axum::`, `actix_web::`, `rocket::`, `axum_extra::`). Heuristics stay on.
+- `Some(false)`: `Cargo.toml` was inspected and named no web framework, AND the file does not directly import one. Heuristics off; only `RouteHandler` classification (concrete route-registration evidence) survives.
+- `None`: no detection ran (single-file scan with no project root). Heuristics on; behavior unchanged.

 This avoids a class of FPs in non-web Rust crates where a debug-session handle named `session` would trip on `session.update(cx, …)`-style desktop-app code. Other languages keep prior behavior; the gate is currently Rust-only.

+## Python: FastAPI cross-file dependencies
+
+FastAPI's `include_router` chain is resolved across files. A child router declared in `routes/task_instances.py` and attached on a parent in `routes/__init__.py` inherits the parent's `dependencies=[...]`.
+
+- Module-level `router = APIRouter(dependencies=[Security(...)])` is pre-walked once per file and merged onto every `@<router>.<verb>(...)` route attached in the same file.
+- `<parent>.include_router(<child_module>.<child_var>)` edges are captured per file in pass 1, persisted into `GlobalSummaries::router_facts_by_module`, and lifted onto the active file's `AuthorizationModel::cross_file_router_deps` at pass 2 entry. Transitive lifts (grandparent to parent to child) iterate to fixpoint.
+- `Security(callable, scopes=[...])` is recognised distinctly from `Depends(callable)` and promotes the synthetic `AuthCheck` to `AuthCheckKind::Other` (route-level scope-checked authorization). Bare `Depends(callable)` is still a Login-only check.
+
+Module identity is the file basename without `.py`. This is sufficient for airflow-style `task_instances.router` naming; a project with two files of the same name in different subtrees will currently collide.
+
+## Go: DAO-helper id-scalar precision pass
+
+For non-route Go units, a parameter whose declared type is a bounded primitive scalar (`int64`, `uint32`, `string`, `bool`, `byte`, `rune`, `float64`, etc.) and whose name is id-shaped (`id`, `*Id`, `*_id`, `*ids`) is dropped from `unit.params` before ownership-check evaluation.
+
+Real Go HTTP handlers always carry a framework-request-typed param (`*http.Request`, `*gin.Context`, `echo.Context`, `*fiber.Ctx`); per-framework route extractors set `include_id_like_typed=true` so id-shaped path params survive on real routes. The filter only fires when the unit was not classified as a route handler, so helpers like `func GetRunByRepoAndID(ctx, repoID, runID int64)` are recognised as DAO callees and the ownership check is expected at the calling route handler, not inside the helper.
+
+## Same-file caller-scope IPA
+
+When a private helper is called only from authorized route handlers in the same file, the caller's auth checks lift onto the helper as synthetic `is_route_level=true` `AuthCheck` entries.
+
+- Iterated to a small fixpoint so transitive chains (route to mid_helper to leaf_helper) are covered.
+- Refuses to authorize helpers with no in-file caller, helpers called from a mix of authorized and unauthorized callers, and helpers called only from un-lifted helpers.
+- Cross-file equivalent is deferred.
+
+This closes the FastAPI / Django / Flask shape where a route authenticates via decorator or dependency, then delegates to a private helper that performs the sink.
+
 ## Sink classification

 The same call name can be safe on a local collection and dangerous on a database. The detector categorises each candidate sink before deciding whether to flag:
--- a/docs/detectors/taint.md
+++ b/docs/detectors/taint.md
@ -160,7 +160,7 @@ Some detector classes need to know not just *that* a value is attacker-influence
 | `Sensitive` | `Cookie`, `Header`, `EnvironmentConfig`, `FileSystem`, `Database`, `CaughtException`, `Unknown` | Operator-bound state that should not leak across boundaries. |
 | `Secret` | (reserved for explicit credential sources) | Highest tier; treated identically to `Sensitive` today. |

-`Cap::DATA_EXFIL` only fires when the contributing source is at least `Sensitive`. Plain user input flowing into an outbound `fetch` body is suppressed at finding-emission time — the canonical false-positive class for API gateways and telemetry forwarders that proxy `req.body`. SSRF and other classes are unaffected; the gate is scoped to `DATA_EXFIL`.
+`Cap::DATA_EXFIL` only fires when the contributing source is at least `Sensitive`. Plain user input flowing into an outbound `fetch` body is suppressed at finding-emission time. That is the canonical false-positive class for API gateways and telemetry forwarders that proxy `req.body`. SSRF and other classes are unaffected; the gate is scoped to `DATA_EXFIL`.

 If a project legitimately classifies a request body as sensitive (e.g. an internal forwarder where `req.body` carries a pre-authenticated user token), override via custom rules in `nyx.conf`:

@ -177,7 +177,7 @@ Or re-classify the source itself with a custom Source rule whose name matches on

 ## DATA_EXFIL suppression layers

-Three knobs ship out of the box so projects can match the cap to their architecture without per-call suppressions.
+Three suppression knobs ship by default so projects can match the cap to their architecture without per-call suppressions.

 ### 1. Forwarding-wrapper sanitizer convention

@ -215,7 +215,7 @@ trusted_destinations = [
 ]
 ```

-Use full origins or origin-pinned paths so a partial-host match across unrelated origins cannot occur. `https://api.` would also match `https://api.evil.example.com/` — the entry must include the path separator (`/`) at the end of the host.
+Use full origins or origin-pinned paths so a partial-host match across unrelated origins cannot occur. `https://api.` would also match `https://api.evil.example.com/`, so the entry must include the path separator (`/`) at the end of the host.

 The match consults the abstract string domain: a literal URL is a static prefix; a template literal `\`https://api.internal/${id}\`` exposes the prefix `https://api.internal/`; a fully dynamic URL has no prefix and the cap fires as usual.

@ -228,7 +228,7 @@ Some projects forward user-bound payloads as a matter of architecture. Turn the
 enabled = false
 ```

-`enabled = false` strips `Cap::DATA_EXFIL` from sink caps before event emission, so no `taint-data-exfiltration` finding reaches the report. The decision is per-project — other projects loaded by the same `nyx serve` instance keep their own settings.
+`enabled = false` strips `Cap::DATA_EXFIL` from sink caps before event emission, so no `taint-data-exfiltration` finding reaches the report. The decision is per-project; other projects loaded by the same `nyx serve` instance keep their own settings.

 ## DATA_EXFIL sinks per language

--- a/docs/recall-validation.md
+++ b/docs/recall-validation.md
@ -0,0 +1,237 @@
+# Recall validation runbook
+
+The recall-validation harness freezes a finding-shape baseline against
+real-world OSS targets so future engine work can prove "actually lifts
+recall on real code", not just "tests pass". This runbook covers
+re-running the validation against a fresh OSS release.
+
+## Targets
+
+| Target            | Clone URL                                  | Recall items exercised |
+|-------------------|--------------------------------------------|------------------------|
+| `cal_com`         | https://github.com/calcom/cal.com          | 1, 5, 6, 7             |
+| `vercel_commerce` | https://github.com/vercel/commerce         | 1, 4, 7                |
+| `shadcn_examples` | https://github.com/shadcn-ui/ui            | 4, 7                   |
+| `blitz_apps`      | https://github.com/blitz-js/blitz          | 1, 3, 6                |
+
+Item numbering is from `.pitboss/RECALL_GAPS.md`.
+
+## Files
+
+| File                                          | Role                                    |
+|-----------------------------------------------|-----------------------------------------|
+| `scripts/validate_recall.sh`                  | runner (capture + diff modes)           |
+| `tests/recall_targets/<target>.json`          | per-target baseline                     |
+| `tests/recall_gaps.rs::validate_real_world_targets` | schema-validity test (`#[ignore]`)|
+| `tests/recall_gaps_baseline.json`             | corpus regression baseline              |
+
+Baselines live next to the harness rather than under `.pitboss/`:
+pitboss implementer agents are forbidden to write under `.pitboss/`,
+so the baseline files were placed beside the test that consumes them.
+
+## Baseline schema
+
+```json
+{
+  "_doc": "...",
+  "target": "cal_com",
+  "clone_url": "https://github.com/calcom/cal.com",
+  "exercises_recall_items": [1, 5, 6, 7],
+  "captured_against": "real-scan @ <sha>",
+  "captured_on": "YYYY-MM-DD",
+  "pinned_commit": "<sha>",
+  "findings": [
+    {
+      "rule_id": "taint-unsanitised-flow",
+      "path_suffix": "packages/...",
+      "line": 130,
+      "severity": "High",
+      "verdict": "TP" | "FP" | "needs_review",
+      "note": "..."
+    }
+  ]
+}
+```
+
+The diff key is `(rule_id, path_suffix, line)`. The `verdict` field
+must be one of `TP`, `FP`, or `needs_review`; unknown verdicts are
+rejected by the schema test.
+
+## Usage
+
+### Diff a fresh scan against the frozen baseline
+
+```bash
+scripts/validate_recall.sh cal_com /path/to/cal.com
+```
+
+Output is a JSON object `{ added, removed, unchanged, *_total }`
+keyed by `rule_id`. Use this to spot intentional recall lift
+(`added`) and regressions (`removed`).
+
+### Refresh the baseline after an intentional recall lift
+
+```bash
+scripts/validate_recall.sh cal_com /path/to/cal.com --capture
+```
+
+This overwrites `tests/recall_targets/cal_com.json` with the current
+scan output. Every finding is re-marked `verdict: "needs_review"`;
+hand-label `TP`/`FP` afterwards as you triage.
+
+### Schema-validity check
+
+```bash
+cargo test --release --test recall_gaps -- --ignored validate_real_world_targets
+```
+
+Loads each per-target JSON, asserts the required keys exist, and
+asserts every finding carries a valid verdict label.
+
+## Refresh procedure
+
+1. Clone or pull the target repo into `~/oss/<target>` (or wherever).
+2. Build nyx: `cargo build --release`.
+3. Run the diff in plain mode to see what changed:
+   `scripts/validate_recall.sh <target> ~/oss/<target>`.
+4. If the lift is intentional, recapture:
+   `scripts/validate_recall.sh <target> ~/oss/<target> --capture`.
+5. Spot-check a handful of new findings. Open the file at
+   `path_suffix:line` and confirm the source-to-sink flow is real.
+   Hand-label them `TP`/`FP`.
+6. Commit the updated `tests/recall_targets/<target>.json`.
+
+## Known captured baselines (2026-05-08)
+
+| Target            | Pinned commit | Findings | TP | FP | needs_review |
+|-------------------|---------------|----------|----|----|--------------|
+| `cal_com`         | `d278d6c9`    | 662      | 0  | 4  | 658          |
+| `vercel_commerce` | unknown       | 0 (placeholder) |    |    |              |
+| `shadcn_examples` | unknown       | 0 (placeholder) |    |    |              |
+| `blitz_apps`      | unknown       | 0 (placeholder) |    |    |              |
+
+The `cal_com` capture used commit `d278d6c9bc535bf3f2c6ba0607654f78dd74d6ee`
+(`refactor: remove dead insights references (#29029)`). The 4 `FP`
+labels are `ts.crypto.math_random` hits inside `apps/web/playwright/`
+test fixtures, which are not a security context.
+
+The other three targets ship as placeholders (empty `findings`).
+Nobody has cloned them locally yet. Run `validate_recall.sh
+<target> <clone> --capture` to populate. The schema test still passes
+because `[]` is a valid `findings` array with zero entries to check.
+
+## Perf baseline
+
+The frozen JS-target perf snapshot lives in
+`tests/recall_targets/perf_after.txt`. Compare against the
+`captured_against` snapshot in `tests/recall_gaps_baseline.json`
+(`corpus_finding_lines.findings_total` = 1121, captured at master
+`ea82ea98`). The acceptance bar: scanner throughput on the existing
+`tests/fixtures/` corpus must regress by no more than 15%. Future
+recall work uses the same corpus and the same record file to measure
+its own perf delta.
+
+## Cross-language runbook
+
+The JS-target baselines above only cover JS/TS. Cross-language
+baselines mirror that work against real-world non-JS targets so
+multi-language engine changes can be measured against actual code,
+not just synthetic fixtures. Per-lang baselines live under
+`tests/recall_targets/xlang/<lang>/<target>.json` and the runner
+accepts a `--lang` flag to select the target set.
+
+### Cross-language targets
+
+| Lang   | Target       | Clone URL                                    | Pinned commit (capture) | Findings | Notes |
+|--------|--------------|----------------------------------------------|-------------------------|----------|-------|
+| php    | phpmyadmin   | https://github.com/phpmyadmin/phpmyadmin     | `ddf4e993`              | 119      | DBA UI; XSS / `php.deser` / `cfg-unguarded-sink` heavy. |
+| php    | joomla       | https://github.com/joomla/joomla-cms         | `7e8527d0`              | 83       | CMS; `php.deser.unserialize` and `php.path.include_variable` clusters. |
+| php    | drupal       | https://github.com/drupal/drupal             | `92aa759e`              | 635      | CMS / DI container; `cfg-unguarded-sink` (198) and `taint-prototype-pollution` (121) dominant. |
+| php    | nextcloud    | https://github.com/nextcloud/server          | `5c0fe4c3`              | 262      | File-sync platform; `cfg-resource-leak` / `state-resource-leak` heavy. |
+| java   | openmrs      | https://github.com/openmrs/openmrs-core      | `f9c76db2`              | 273      | Hibernate-heavy; JPA Criteria fix from `project_realrepo_openmrs.md` already applied. |
+| python | airflow      | https://github.com/apache/airflow            | `3d42610a`              | 892      | Scheduler / DAG runner; `cfg-unguarded-sink` (252) and `taint-unsanitised-flow` (179) lead. |
+| python | flask        | https://github.com/pallets/flask             | placeholder             | 0        | Smaller-surface Python framework; capture deferred. |
+| go     | gin          | https://github.com/gin-gonic/gin             | `d3ffc998`              | 20       | HTTP framework test corpus; `taint-header-injection` and TLS skip-verify in tests. |
+| rust   | axum         | https://github.com/tokio-rs/axum             | placeholder             | 0        | Not cloned in pitboss sandbox at capture time; populate locally. |
+| ruby   | rails        | https://github.com/rails/rails               | placeholder             | 0        | Capture against the `actionpack/` subtree once cloned. |
+
+Captures dated `2026-05-09` (UTC). Counts are deduplicated tuples
+`(rule_id, path_suffix, line)`. Duplicate raw findings collapse on
+the diff key, so the schema-test count and diff-mode `unchanged_total`
+may differ from the `findings | length` total by a handful of
+duplicate sites. The diff key is what matters for regression
+detection.
+
+### Per-lang TP/FP splits
+
+Every captured finding ships with `verdict: "needs_review"` from
+`--capture`. Hand-triage is bounded but pending; none of the cross-
+language captures are sweep-labelled yet. Use the per-lang dominant
+rule_id clusters above as the priority queue:
+
+- **PHP**: `cfg-unguarded-sink` and `taint-prototype-pollution` are
+  the FP-dominant clusters across drupal / nextcloud / phpmyadmin
+  (CMS routing + JS object construction). `php.deser.unserialize` is
+  the highest-value TP cluster on joomla (17) and drupal (83). See
+  `project_realrepo_joomla.md` 2026-05-03 for the magic-method
+  passthrough fix that already filters one shape.
+- **Java**: `taint-unsanitised-flow` (61) and `state-resource-leak`
+  (60) are openmrs's leading clusters. The JPA Criteria-API fix
+  already absorbed the `cfg-unguarded-sink` cluster (216 to 24);
+  remaining Hibernate / Spring resource-management FPs are the next
+  triage target.
+- **Python**: `cfg-unguarded-sink` (252) on airflow is dominated by
+  Airflow's scheduler / DB plumbing; `py.auth.token_override_*`
+  (83) and `py.auth.missing_ownership_check` (61) are the auth-rule
+  noise typical of an admin/operator codebase.
+- **Go**: gin's 20 findings are mostly test-corpus artifacts
+  (`gin_test.go`, `routes_test.go`); 4 of 4 `go.transport.insecure_skip_verify`
+  hits are inside `gin*_test.go` and are legitimate test setup.
+- **Rust / Ruby**: placeholder. Capture once a local clone exists.
+
+### `--lang` runner usage
+
+```bash
+# diff mode (default)
+scripts/validate_recall.sh --lang php drupal /Users/me/oss/drupal
+scripts/validate_recall.sh --lang java openmrs /Users/me/oss/openmrs
+
+# capture / refresh
+scripts/validate_recall.sh --lang go gin /Users/me/oss/gin --capture
+```
+
+Output is the same `{ added, removed, unchanged, *_total }` JSON shape
+as the JS-target diff. The diff key is `(rule_id, path_suffix, line)`.
+
+### Cross-language refresh procedure
+
+1. Clone or update the target into `~/oss/<target>` (or wherever).
+2. Build nyx: `cargo build --release`.
+3. Diff vs the frozen baseline:
+   `scripts/validate_recall.sh --lang <lang> <target> ~/oss/<target>`.
+4. If the lift is intentional, recapture with `--capture`.
+5. Spot-check new findings; hand-label `TP`/`FP`.
+6. Commit the updated `tests/recall_targets/xlang/<lang>/<target>.json`.
+
+### Sandbox-capture caveat
+
+Pitboss implementer agents run sandboxed without network egress, so
+target repos that are not already present under `~/oss/` ship as
+placeholders (`pinned_commit: "unknown"`, `findings: []`). The
+current cross-language baselines cover php / java / python / go
+(every target whose repo was already cloned locally) and ship
+placeholders for `rust/axum`, `ruby/rails`, and `python/flask`. The
+schema test in `validate_real_world_targets` passes against
+placeholders because `[]` is a valid `findings` array.
+
+## What lives where (quick reference)
+
+- Targets list and recall-item mapping in this file.
+- Per-target JS findings under `tests/recall_targets/<target>.json`.
+- Per-target cross-lang findings under `tests/recall_targets/xlang/<lang>/<target>.json`.
+- Diff/capture runner at `scripts/validate_recall.sh` (accepts `--lang`).
+- Schema-validity test at `tests/recall_gaps.rs::validate_real_world_targets`.
+- Corpus regression baseline at `tests/recall_gaps_baseline.json`.
+- Perf records at `tests/recall_targets/perf_after.txt` (JS-target
+  snapshot) and `tests/recall_targets/perf_after_xlang.txt`
+  (cross-language delta).