Performance and precision pass (#64)

Eli Peter 2026-05-04 19:58:04 -04:00 committed by GitHub
parent c7c5e0f3a1
commit fb698d2c27
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
97 changed files with 9932 additions and 517 deletions

@ -22,6 +22,9 @@ Real disclosed CVEs reduced to minimal reproducers, vulnerable + patched pair pe
| CVE-2017-18342 | Python | PyYAML | MIT | Deserialization | detected |
| CVE-2025-69662 | Python | geopandas | BSD-3-Clause | SQL Injection | detected |
| CVE-2026-33626 | Python | LMDeploy | Apache-2.0 | SSRF | detected |
| CVE-2024-23334 | Python | aiohttp | Apache-2.0 | path_traversal | detected |
| CVE-2023-6568 | Python | MLflow | Apache-2.0 | XSS | detected |
| CVE-2024-21513 | Python | LangChain Experimental | MIT | code_exec | detected |
| CVE-2019-14939 | JavaScript | mongo-express | MIT | code_exec | detected |
| CVE-2025-64430 | JavaScript | Parse Server | Apache-2.0 | SSRF | detected |
| CVE-2023-22621 | JavaScript | Strapi | MIT | code_exec (SSTI)| detected |
@ -42,6 +45,7 @@ Real disclosed CVEs reduced to minimal reproducers, vulnerable + patched pair pe
| CVE-2023-38337 | Ruby | rswag | MIT | path_traversal | detected |
| CVE-2017-9841 | PHP | PHPUnit | BSD-3-Clause | code_exec | detected |
| CVE-2018-15133 | PHP | Laravel | MIT | Deserialization | detected |
| CVE-2026-33486 | PHP | Roadiz CMS | MIT | SSRF | detected |
| CVE-2018-20997 | Rust | tar-rs | MIT OR Apache-2.0 | path_traversal | detected |
| CVE-2022-36113 | Rust | cargo | MIT OR Apache-2.0 | path_traversal | detected |
| CVE-2023-42456 | Rust | sudo-rs | Apache-2.0 | path_traversal | detected |
@ -49,10 +53,12 @@ Real disclosed CVEs reduced to minimal reproducers, vulnerable + patched pair pe
| CVE-2024-32884 | Rust | gitoxide | Apache-2.0 OR MIT | CMDI | detected |
| CVE-2025-53549 | Rust | matrix-rust-sdk | Apache-2.0 | SQL Injection | detected |
| CVE-2016-3714 | C | ImageMagick (ImageTragick) | ImageMagick License | CMDI | detected |
| CVE-2017-1000117 | C | git (ssh:// argv injection)| GPL-2.0 | cmdi (argv-inj) | deferred |
| CVE-2019-18634 | C | sudo (pwfeedback) | ISC | memory_safety | detected |
| CVE-2019-13132 | C++ | ZeroMQ libzmq | MPL-2.0 | memory_safety | detected |
| CVE-2022-1941 | C++ | Protocol Buffers | BSD-3-Clause | memory_safety | detected |
| CVE-2026-25544 | TypeScript | Payload (Drizzle adapter) | MIT | sql_injection | deferred |
| CVE-2026-42353 | JavaScript | i18next-http-middleware | MIT | path_traversal | detected |
Deferred entries are real bugs Nyx can't yet detect. The fixture stays committed with `disabled: true` in ground truth so the gap remains visible.
@ -77,6 +83,11 @@ Most recent first. Metrics are rule-level on the corpus size at that point.
| Date | Change | Corpus | P | R | F1 |
|------------|------------------------------------------------------------------------------|--------|-------|-------|-------|
| 2026-05-04 | C cvehunt session-0014: CVE-2017-1000117 (git ssh:// hostname-as-argv injection) added to the corpus as disabled. Three-layer C engine gap: (a) array-element taint propagation through `args[i] = ssh_host;` writes, (b) missing `c.cmdi.exec*` AST patterns in `src/patterns/c.rs`, (c) sanitizer recognition of the upstream `if (ssh_host[0] == '-') die(...)` dash-prefix guard | 565 | 1.000 | 1.000 | 1.000 |
| 2026-05-04 | JS/TS array-method validator-callback narrowing (`try_array_method_validator_callback_narrowing` in `src/taint/ssa_transfer/mod.rs`) — `<arr>.filter(<isSafeXxx>)` / `.find` / `.findLast` strips `Cap::all()` from the call result when the callback resolves to a `BooleanTrueIsValid` validator; CVE-2026-42353 (i18next-http-middleware path traversal) re-enabled in ground truth, deferred queue cleared | 563 | 1.000 | 1.000 | 1.000 |
| 2026-05-04 | JS/TS ternary-RHS source-classification fix in `src/cfg/conditions.rs::lower_ternary_branch` (segment-strip first_member_label on the branch AST): `let arr = cond ? req.query.lng : "";` now propagates taint through the diamond's join phi instead of lowering both branches to labelless Assign-with-empty-uses; CVE-2026-42353 (i18next-http-middleware path traversal / SSRF) added to the corpus as disabled, pending an Array.prototype.filter(known_validator_callback) precision bridge | 561 | 1.000 | 1.000 | 1.000 |
| 2026-05-04 | PHP class-method body taint analysis (`declaration_list` / `interface_declaration` / `trait_declaration` / `enum_declaration` mapped to `Kind::Block` in `src/labels/php.rs`); PHP `unary_op_expression` recognised as negation in `detect_negation`; camelCase normalisation in `classify_condition` so `isSafeRemoteUrl(x)` classifies as ValidationCall the same as `is_safe_remote_url(x)`; PHP `$`-sigil stripping in `extract_validation_target`; `fopen` added as PHP SSRF sink; CVE-2026-33486 (roadiz/documents `DownloadedFile::fromUrl(file://)` SSRF/LFI) added | 555 | 1.000 | 1.000 | 1.000 |
| 2026-05-04 | Python Tier B `py.xss.make_response_format` AST pattern (Flask `make_response(<f-string>)` / `make_response(<concat>)`); CVE-2023-6568 (mlflow reflected XSS) and CVE-2024-21513 (langchain VectorSQLDatabaseChain `_try_eval` over DB rows) added | 550 | 1.000 | 1.000 | 1.000 |
| 2026-05-03 | Go for-range loop binding now defined from `range_clause` child of `for_statement` (was: tree-sitter wraps the binding/iterable on a child node; only direct `left`/`right` fields were consulted, so taint never reached the loop binding). gin sources extended to `c.QueryArray` / `c.GetQueryArray` / `c.PostFormArray` / `c.GetPostFormArray`. goqu raw SQL literal builders `goqu.L` / `goqu.Lit` recognised as SQL_QUERY sinks. CVE-2026-41422 (daptin aggregate API) detected | 521 | 1.000 | 1.000 | 1.000 |
| 2026-05-02 | TS regex-allowlist `<*regex*>.test(value)` / `<*pattern*>.test(value)` recognised as ValidationCall whose target is the first arg (overrides default receiver-as-target); conservative on receiver names so non-regex `*.test()` callees stay Unknown. CVE-2026-25544 (Payload drizzle SQL injection) lands in the corpus as disabled, pending validated-flow propagation through SSA derivation / helper-summary returns | 499 | 1.000 | 1.000 | 1.000 |
| 2026-05-02 | JS arrow `assignment_pattern` default-param extraction + JS object-literal kwarg fallback for gated sinks + double-call (`f()(x)`) chained-inner rebinding; lodash `_.template` modeled as gated CODE_EXEC sink suppressed by `{ evaluate: false }`; CVE-2023-22621 (Strapi SSTI) detected | 494 | — | — | — |
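
The 2026-05-04 row describes `.filter` / `.find` / `.findLast` narrowing. A toy model can illustrate it. This is a minimal Python sketch under stated assumptions, not Nyx's Rust code. Taint is modeled as a set of capability names. `ALL_CAPS`, `Value`, and `array_method_result` are hypothetical stand-ins for `Cap::all()`, the SSA value, and the logic in `try_array_method_validator_callback_narrowing`.

```python
from dataclasses import dataclass, field

# Hypothetical capability universe standing in for `Cap::all()`.
ALL_CAPS = frozenset({"path", "sql", "ssrf", "xss"})
# Array methods whose result only holds elements the callback accepted.
NARROWING_METHODS = {"filter", "find", "findLast"}


@dataclass
class Value:
    caps: frozenset = field(default_factory=frozenset)


def array_method_result(receiver: Value, method: str, callback: str,
                        validators: set) -> Value:
    """Taint of `<receiver>.<method>(<callback>)` in this toy model."""
    if method in NARROWING_METHODS and callback in validators:
        # Every surviving element passed a returns-true-if-valid check,
        # so all capabilities are stripped from the call result.
        return Value(receiver.caps - ALL_CAPS)
    # Unknown callback or non-narrowing method: taint flows through.
    return Value(receiver.caps)
```

In this model, two cases leave the result tainted: an unrecognised callback (for example `Boolean`) and a non-narrowing method such as `.map`. That is the conservative default.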

@ -1,15 +1,18 @@
package main
// Real-repo precision (2026-05-03): recall guard for the 2026-05-03
// type-aware Go param filter.
// Real-repo precision (2026-05-03): recall guard for the type-aware Go
// param filter (2026-05-03 + 2026-05-03 expansion).
//
// Even after `ctx context.Context` is dropped from `unit.params`, an
// id-shaped param (`id string`) keeps the unit on the hook ─
// `is_external_input_param_name` recognises id-shapes ahead of the
// framework-name allow-list. This fixture asserts that the type-aware
// filter doesn't over-suppress: a helper that takes the canonical
// `(ctx, id)` shape and consumes `id` at a bare-receiver data-layer
// sink must still fire `go.auth.missing_ownership_check`.
// 2026-05-03 update: the engine now drops id-like scalar params from
// `unit.params` for non-route units (gitea `models/...` DAO cluster,
// ~957 FPs). This fixture asserts that the route-aware path keeps
// firing on the real vulnerable shape: a gin route handler whose body
// passes an id-shaped path param straight into a bare-receiver
// data-layer call with no preceding ownership check.
//
// `function_params_route_handler` runs with `include_id_like_typed =
// true`, so even after the DAO-shape filter the id-like scalar param
// survives in `unit.params` for `RouteHandler` units, so the rule fires.
import "context"
@ -18,10 +21,16 @@ type Repo struct{}
func (r *Repo) Find(id string) interface{} { return nil }
func (r *Repo) Save(id string, val string) {}
type ginEngine struct{}
func (g *ginEngine) GET(path string, handler interface{}) {}
func (g *ginEngine) POST(path string, handler interface{}) {}
// `ctx context.Context` is dropped by the type-aware Go param filter
// (stdlib non-user-input). `id string` survives ─ id-shape opens the
// gate. `repo.Find(id)` is a bare-identifier read indicator with no
// preceding ownership check. Rule must fire.
// (stdlib non-user-input). `id string` survives because the gin
// extractor promotes this unit to `RouteHandler` and route-aware param
// extraction keeps id-like names. `repo.Find(id)` is a bare-identifier
// read indicator with no preceding ownership check — rule fires.
func GetByID(ctx context.Context, repo *Repo, id string) interface{} {
_ = ctx
return repo.Find(id)
@ -32,3 +41,9 @@ func UpdateByID(ctx context.Context, repo *Repo, id string, val string) {
_ = ctx
repo.Save(id, val)
}
// Gin route binding promotes both handlers to `RouteHandler` kind.
func registerRoutes(r *ginEngine) {
r.GET("/items/:id", GetByID)
r.POST("/items/:id", UpdateByID)
}

@ -10,12 +10,30 @@ package main
// remain canonical data-layer sinks and must continue to fire
// `go.auth.missing_ownership_check` when invoked with a scoped
// identifier (`id` parameter) without a preceding ownership check.
//
// 2026-05-03 update: previously the helper signature alone
// (`func GetByID(ctx, repo, id string)`) was the recall guard. After
// the Go DAO-helper precision pass (id-like scalar params dropped from
// `unit.params` for non-route units) the helper-only shape no longer
// passes `unit_has_user_input_evidence`. That is correct: the gitea
// `models/...` cluster proved that internal DAO helpers should not
// flag. This fixture is now a real route-handler shape: the gin
// extractor recognises `r.GET(..., GetByID)` as a route registration,
// promotes the unit to `RouteHandler`, and `function_params_route_handler`
// keeps the id-like scalar param so the rule still fires on the actual
// vulnerable form (HTTP route binding directly to a DAO call with no
// preceding auth check).
type Repo struct{}
func (r *Repo) Find(id string) interface{} { return nil }
func (r *Repo) Save(id string, val string) {}
type ginEngine struct{}
func (g *ginEngine) GET(path string, handler interface{}) {}
func (g *ginEngine) POST(path string, handler interface{}) {}
// `repo.Find(id)` — bare-identifier receiver, name matches the `Find`
// read indicator. Still classifies as `DbCrossTenantRead` and still
// fires the ownership check because no auth check precedes it.
@ -27,3 +45,13 @@ func GetByID(ctx interface{}, repo *Repo, id string) interface{} {
func UpdateByID(ctx interface{}, repo *Repo, id string, val string) {
repo.Save(id, val)
}
// Route registration: gin extractor recognises `r.GET(...)` /
// `r.POST(...)`, attaches `GetByID` / `UpdateByID` as the route
// handlers, and promotes their units to `AnalysisUnitKind::RouteHandler`.
// The id-like scalar param `id string` survives into `unit.params` via
// `function_params_route_handler` (route-aware, `include_id_like_typed = true`).
func registerRoutes(r *ginEngine) {
r.GET("/items/:id", GetByID)
r.POST("/items/:id", UpdateByID)
}
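
Both gin fixtures depend on route promotion, which can be approximated textually. The following Python sketch is hypothetical: `ROUTE_CALL` and `route_handlers` are illustrative names, and the real extractor works on the syntax tree, not regexes. It collects the handler identifier passed as the second argument of `<router>.GET(...)` / `.POST(...)`. Those are the functions whose units would be promoted to `RouteHandler`.

```python
import re

HTTP_METHODS = ("GET", "POST", "PUT", "PATCH", "DELETE")
# `<ident>.<METHOD>("<path>", <handlerIdent>)`, assuming exactly one handler.
ROUTE_CALL = re.compile(
    r'\b\w+\.(%s)\(\s*"[^"]*"\s*,\s*(\w+)\s*\)' % "|".join(HTTP_METHODS)
)


def route_handlers(go_source: str) -> set:
    """Names of functions registered as route handlers in `go_source`."""
    return {m.group(2) for m in ROUTE_CALL.finditer(go_source)}
```

Registrations that pass middleware before the handler (`r.GET(path, mw, h)`) do not match this single-handler pattern. That is one reason a text scan is only an approximation.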

@ -0,0 +1,87 @@
package main
// Real-repo precision (2026-05-03): distilled from
// /Users/elipeter/oss/gitea/models/actions/{run,run_job,runner,artifact,
// run_attempt,task,variable}.go and ~957 sibling helpers across gitea's
// `models/...` data-access layer. Same shape over-fires on minio's
// `cmd/iam-*-store` and is the canonical Go ORM/DAO helper signature.
//
// Pattern: a model-layer helper takes the canonical Go first-param
// `ctx context.Context` (stdlib cancellation / deadline / value-bag,
// NOT an HTTP request) plus one or more id-like scalar parameters
// (`repoID, runID int64`, `id int64`, …). The helper itself is
// **never** registered as a route handler — gitea's HTTP routes live
// in `routers/`, and the bound route handler runs the auth check
// before calling into `models/`. The DAO helper inherits trust from
// its single caller surface and must not flag
// `go.auth.missing_ownership_check`.
//
// Engine fix (2026-05-03, src/auth_analysis/extract/common.rs::
// collect_param_names Go arm): for non-route units (default
// `include_id_like_typed = false`), drop id-like param names whose
// declared type is a bounded primitive scalar (`int*` / `uint*` /
// `string` / `bool` / `byte` / `rune` / `float*`). Real Go HTTP
// handlers always carry a framework-request-typed param
// (`*http.Request`, `*gin.Context`, `echo.Context`, `*fiber.Ctx`,
// `*context.APIContext`, …) and are recognised by the per-framework
// route extractors which call `function_params_route_handler`
// (`include_id_like_typed = true`) — those bypass the filter so id-shaped
// path params survive on real routes (see
// `auth/vuln_apicontext_findbyid.go` and
// `auth/vuln_repo_findbyid_no_auth.go` for the recall guards).
//
// Conservative scope: only **bounded primitive scalar** types trigger
// the drop. Pointer types (`*Runner`), struct-by-value, slice (`[]T`),
// generic and qualified types are payload shapes whose injection
// surface is unknown — id-like names on those keep their place in
// `unit.params`.
import "context"
type ActionRun struct{ ID int64 }
type ActionRunJob struct{ ID int64 }
type ActionRunner struct{ ID int64 }
type modelDB struct{}
func (m *modelDB) Find(ctx context.Context, id int64) interface{} { return nil }
func (m *modelDB) DeleteByID(ctx context.Context, id int64) error { return nil }
func (m *modelDB) UpdateRunJob(ctx context.Context, j *ActionRunJob) {}
var db = &modelDB{}
// `(ctx context.Context, repoID, runID int64)` — multi-name single-type
// declaration with all bounded scalar params. After the fix:
// `unit.params` is empty; `unit_has_user_input_evidence` returns false;
// `check_ownership_gaps` skips the unit entirely.
func GetRunByRepoAndID(ctx context.Context, repoID, runID int64) (*ActionRun, error) {
_ = db.Find(ctx, runID)
_ = repoID
return &ActionRun{ID: runID}, nil
}
// Single id-like scalar param. Same DAO-helper shape, must not flag
// even though `db.DeleteByID` and `GetRunnerByID` both look like
// canonical mutation/read indicators.
func DeleteRunner(ctx context.Context, id int64) error {
if _, err := GetRunnerByID(ctx, id); err != nil {
return err
}
return db.DeleteByID(ctx, id)
}
func GetRunnerByID(ctx context.Context, id int64) (*ActionRunner, error) {
_ = db.Find(ctx, id)
return &ActionRunner{ID: id}, nil
}
// Mixed-arity helper: `userID int64` (id-like + scalar, dropped) plus
// `cfg *ActionRun` (non-scalar payload, kept). `cfg` is not id-like
// and doesn't match the Go-narrowed framework-name allow-list, so the
// unit still has no evidence and the rule does not flag.
func SetOwnerActionsConfig(ctx context.Context, userID int64, cfg *ActionRun) error {
_ = userID
_ = cfg
_ = ctx
return nil
}
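
The comment above describes the Go arm of `collect_param_names`. It can be modeled as a filter over `(name, declared_type)` pairs. This is a minimal Python sketch that assumes a simple id-like naming heuristic. `is_id_like` is hypothetical; the engine's own id-shape matcher is not shown in this diff.

```python
import re

# Bounded primitive scalars per the fixture comment: int*, uint*, string,
# bool, byte, rune, float*.
BOUNDED_SCALAR = re.compile(
    r"^(u?int(8|16|32|64)?|uintptr|string|bool|byte|rune|float(32|64))$"
)


def is_id_like(name: str) -> bool:
    # Assumed heuristic: `id`, `repoID`, `runId`, `user_id`.
    return name.lower() == "id" or name.endswith(("ID", "Id", "_id"))


def collect_param_names(params, include_id_like_typed=False):
    """Keep the params that can carry user input.

    `params` is a list of (name, declared_type) pairs from a Go signature.
    """
    kept = []
    for name, ty in params:
        if ty == "context.Context":
            continue  # stdlib cancellation/value bag, not an HTTP request
        if (not include_id_like_typed and is_id_like(name)
                and BOUNDED_SCALAR.match(ty)):
            continue  # DAO-helper shape on a non-route unit
        kept.append(name)
    return kept
```

The comment attributes the `include_id_like_typed=True` mode to `function_params_route_handler`. In that mode, the `(ctx, repo, id string)` route shape keeps `id`.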

@ -0,0 +1,47 @@
import jakarta.persistence.EntityManager;
import jakarta.persistence.criteria.CriteriaBuilder;
import jakarta.persistence.criteria.CriteriaQuery;
import jakarta.persistence.criteria.Root;
import org.hibernate.Session;
import java.util.List;
// Distilled from openmrs's
// api/src/main/java/org/openmrs/api/db/hibernate/HibernateCohortDAO.java
// (`getCohorts` / `getCohort`) and HibernateAdministrationDAO. The JPA
// CriteriaBuilder pattern builds a structural `CriteriaQuery<Foo>` via
// `cb.createQuery(Foo.class)` plus `Root` / `Predicate` / `cb.equal` /
// `cb.like` etc., then hands the structural query object to
// `session.createQuery(cq)` / `em.createQuery(cq)` for execution. No
// string concatenation occurs; JPA emits parameterized SQL by
// construction. Engine must propagate the
// `TypeKind::JpaCriteriaQuery` fact through the `cb.createQuery`
// receiver-text recogniser, then suppress the structural
// `cfg-unguarded-sink` finding at the `session.createQuery(cq)` /
// `em.createQuery(cq)` site via `sink_args_jpa_criteria_query_safe`.
public class SafeJpaCriteriaQuery {
private final Session session;
private final EntityManager em;
public SafeJpaCriteriaQuery(Session session, EntityManager em) {
this.session = session;
this.em = em;
}
public List<Cohort> getCohorts(String nameFragment) {
CriteriaBuilder cb = session.getCriteriaBuilder();
CriteriaQuery<Cohort> cq = cb.createQuery(Cohort.class);
Root<Cohort> root = cq.from(Cohort.class);
cq.where(cb.like(cb.lower(root.get("name")), nameFragment));
return session.createQuery(cq).getResultList();
}
public Cohort getCohortByName(String name) {
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<Cohort> cq = cb.createQuery(Cohort.class);
Root<Cohort> root = cq.from(Cohort.class);
cq.where(cb.equal(root.get("name"), name));
return em.createQuery(cq).getSingleResult();
}
public static class Cohort {}
}
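
The fixture comment describes a JPA Criteria suppression. It amounts to tracking which local variables hold a structural `CriteriaQuery`. This minimal Python sketch models that as a scan over `(variable, right-hand side)` assignment pairs. `criteria_query_vars` and `sink_arg_is_criteria_query` are hypothetical names. The engine's real `TypeKind::JpaCriteriaQuery` propagation works over its IR, not source text.

```python
def criteria_query_vars(assignments):
    """Variables holding a structural CriteriaQuery after these assignments.

    `assignments` is an ordered list of (var, rhs_text) pairs in one method.
    """
    builders, queries = set(), set()
    for var, rhs in assignments:
        # Any reassignment kills a previously established fact.
        builders.discard(var)
        queries.discard(var)
        receiver, _, method = rhs.split("(", 1)[0].rpartition(".")
        if method == "getCriteriaBuilder":
            builders.add(var)
        elif method == "createQuery" and receiver in builders:
            # `cb.createQuery(Foo.class)` on a known CriteriaBuilder.
            queries.add(var)
    return queries


def sink_arg_is_criteria_query(arg, assignments):
    # `session.createQuery(arg)` is suppressed only for structural queries.
    return arg in criteria_query_vars(assignments)
```

A string-based `session.createQuery(hql)` never produces the fact, because `session` is not a tracked builder. Its sink stays live.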

@ -0,0 +1,17 @@
// Regression guard for the ternary-RHS source-classification fix in
// `src/cfg/conditions.rs::lower_ternary_branch`. Pre-fix, push_node only
// did suffix/prefix matching on the branch text, so `req.query.lng` did
// not classify as a Source (rule matcher is `req.query`, neither matches
// `req.query.lng`). Both ternary branches lowered to labelless
// Assign-with-empty-uses, the join phi saw no taint, and downstream sinks
// missed the flow. Motivated by GHSA-jfgf-83c5-2c4m / CVE-2026-42353
// (i18next-http-middleware path traversal / SSRF via user-controlled
// language and namespace parameters).
const fs = require('fs');
const express = require('express');
const app = express();
app.get('/locales/resources.json', (req, res) => {
let lng = req.query.lng ? req.query.lng : 'en';
fs.readFileSync(`/locales/${lng}/common.json`);
});

@ -0,0 +1,13 @@
// Companion precision guard to path_traversal_ternary_source.js. When
// both ternary branches are constant strings, the segment-strip
// classifier in `lower_ternary_branch` should not synthesise a Source
// label, so the assigned variable carries no taint and the downstream
// sink does not fire.
const fs = require('fs');
const express = require('express');
const app = express();
app.get('/page', (req, res) => {
const tier = req.query.premium ? 'premium' : 'standard';
fs.readFileSync(`/static/${tier}/index.html`);
});
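
Both ternary fixtures exercise the same rule. A branch is a source if stripping trailing member segments reaches a known source path, and a constant branch never is. This Python sketch assumes a small hard-coded matcher set; `SOURCE_MATCHERS` is hypothetical. It models only the branch-classification and join-phi step, not the full `lower_ternary_branch` lowering.

```python
# Hypothetical source matchers; the real rule set is per-framework.
SOURCE_MATCHERS = {"req.query", "req.body", "req.params"}


def classify_branch(text: str):
    """Source label for one ternary branch, or None if it is not a source."""
    if text[:1] in ("'", '"', "`"):
        return None  # string literal: never a source
    segments = text.split(".")
    # Strip trailing member segments until a known source path matches.
    while segments:
        candidate = ".".join(segments)
        if candidate in SOURCE_MATCHERS:
            return candidate
        segments.pop()
    return None


def ternary_join_is_tainted(consequent: str, alternate: str) -> bool:
    # The join phi is tainted if either branch carries a source label.
    return any(classify_branch(b) is not None for b in (consequent, alternate))
```

The tainted condition `req.query.premium` in the precision fixture plays no part here. Only the branch values reach the join, so `'premium' : 'standard'` stays clean.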

@ -0,0 +1,25 @@
<?php
// Vulnerable counterpart to safe_serializable_magic_method_unserialize.php.
// The enclosing method is named `unserialize` but the call argument is NOT
// the formal parameter — the developer is passing user input directly to
// PHP's `\unserialize()`. The Serializable magic-method recogniser is
// designed to refuse this shape (the call's argument must be a bare
// reference to the method's single formal parameter). Must still fire
// `php.deser.unserialize`.
class Mishandled {
public function unserialize($input): void {
// BUG: ignores $input, reads from superglobal.
$this->payload = unserialize($_GET['blob']);
}
}
class WrappedThenUnserialize {
// Wrapped argument inside magic method — conservative: still fires.
// Real-world cache / session pass-throughs surface here so the rule
// keeps its signal on `unserialize(trim($input))` /
// `unserialize(base64_decode($input))` shapes.
public function unserialize($input): void {
$this->payload = unserialize(trim($input));
}
}

@ -0,0 +1,30 @@
<?php
// Regression for the PHP `if (!validator($x))` early-return narrowing fix
// (src/cfg/mod.rs detect_negation now recognises tree-sitter-php's
// `unary_op_expression` for `!`) PLUS the camelCase normalisation in
// classify_condition (src/taint/path_state.rs to_snake_lower). Before
// either fix, the camelCase validator name didn't classify as
// ValidationCall, and even if it did, the `!`-prefix wasn't seen as
// negation so the True branch (which is the rejection arm) was treated
// as the validated path, leaving `$url` un-validated past the
// early-return. Pairs with CVE-2026-33486 patched fixture.
class SafeImporter
{
public static function fetchRemote(): void
{
$url = $_REQUEST['url'];
if (!self::isSafeRemoteUrl($url)) {
return;
}
// Use file_get_contents (an SSRF sink that doesn't open a long-lived
// resource) so the regression specifically pins SSRF narrowing
// without conflating with state-resource-leak from fopen.
file_get_contents($url);
}
private static function isSafeRemoteUrl(string $u): bool
{
return true;
}
}
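
This regression pins two PHP fixes, camelCase normalisation and `!` negation, and both can be sketched in a few lines of Python. `to_snake_lower` reuses the name from the fixture comment, but its body here is an assumption. So are the `VALIDATION_PREFIXES` list and the `classify_condition` / `validated_branch` helpers.

```python
import re

# Assumed validator-name prefixes, compared in snake_case.
VALIDATION_PREFIXES = ("is_safe", "is_valid", "validate")


def to_snake_lower(name: str) -> str:
    # Insert `_` at each lowercase/digit -> uppercase boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def classify_condition(callee: str) -> str:
    # Drop PHP scope qualifiers such as `self::` or `$this->`.
    bare = callee.rsplit("::", 1)[-1].rsplit("->", 1)[-1]
    if to_snake_lower(bare).startswith(VALIDATION_PREFIXES):
        return "ValidationCall"
    return "Unknown"


def validated_branch(negated: bool) -> bool:
    # Which branch (True/False) carries the validated value. For
    # `if (!v($x)) { return; }` the True arm rejects, so it is False.
    return not negated
```

With normalisation in place, `isSafeRemoteUrl` and `is_safe_remote_url` land in the same class. Once negation is applied, the code after the early return sits on the validated arm.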

@ -0,0 +1,40 @@
<?php
// `Serializable::unserialize($input)` magic-method body — the legacy
// PHP `Serializable` interface contract (deprecated since PHP 8.1).
// PHP itself invokes `\unserialize($attacker_bytes)` and then dispatches
// to this method during instance restoration; the body's `\unserialize($x)`
// call is part of the deserialization machinery and cannot be removed
// without breaking the interface. The actionable signal lives at the
// class level (the class implements deprecated `Serializable` — fix is
// to migrate to `__serialize` / `__unserialize`), not at this call
// site.
//
// Distilled from
// joomla/administrator/components/com_finder/src/Indexer/Result.php:488
// joomla/libraries/src/Input/Cli.php:112 joomla/libraries/src/Input/Input.php:210.
class IndexerResult implements \Serializable {
private array $data = [];
public function unserialize($serialized): void {
$this->data = unserialize($serialized);
}
}
class CliInput implements \Serializable {
public string $executable = '';
public array $args = [];
public array $options = [];
public function unserialize($input): void {
[$this->executable, $this->args, $this->options] = unserialize($input);
}
}
class CaseFolded implements \Serializable {
private mixed $payload = null;
public function UnSerialize($payload) {
$this->payload = unserialize($payload);
}
}
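
The two `unserialize` fixtures spell out a suppression condition that can be written as a single predicate. This is a hypothetical Python sketch; `is_serializable_passthrough` is not an engine function. It checks only the three conditions the fixture comments state:

- the method is named `unserialize`, compared case-insensitively;
- the method has exactly one formal parameter;
- the call argument is a bare reference to that parameter.

```python
def is_serializable_passthrough(method_name: str, formal_params: list,
                                call_arg: str) -> bool:
    """Whether an `unserialize(<call_arg>)` call inside a method is the
    Serializable pass-through shape that the rule should not flag."""
    return (
        method_name.lower() == "unserialize"  # PHP method names ignore case
        and len(formal_params) == 1
        and call_arg.strip() == formal_params[0]  # bare formal, no wrapper
    )
```

Wrapped arguments such as `trim($input)` and superglobal reads both fail the bare-reference check, so they keep firing `php.deser.unserialize`.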

@ -0,0 +1,17 @@
<?php
// Regression for the PHP class-method body analysis fix
// (declaration_list / interface_declaration / trait_declaration mapped to
// Kind::Block in src/labels/php.rs). Before the fix, taint never crossed
// `class { method { ... } }` because the body of `method` was never
// reached during function extraction, leaving `$_REQUEST → fopen` flows
// inside class methods invisible to taint analysis. Pairs with
// CVE-2026-33486 (roadiz/documents `DownloadedFile::fromUrl`).
class MediaImporter
{
public static function fetchRemote(): void
{
$url = $_REQUEST['url'];
fopen($url, 'r');
}
}

@ -0,0 +1,44 @@
"""Recall guard for the Phase 1 caller-scope IPA fix.
Same shape as `safe_caller_scope_helper_under_authorized_route.py`, but
the router carries no route-level auth dep (`router = APIRouter()`).
The helper's `session.add` is reached from a route handler with no
authorization, so the engine MUST still fire
`missing_ownership_check` (and `token_override_without_validation`)
on the helper's sink.
Triggers `apply_caller_scope_propagation`'s soundness rule: a helper's
caller list must contain at least one caller with route-level non-Login
auth checks. When no caller is authorized, no propagation happens and
the helper's sinks fire as expected.
"""
from typing import Annotated
from uuid import UUID
from fastapi import APIRouter, Body
# Bare router — no Security dep at the boundary.
ti_id_router = APIRouter()
def _create_state_update(
*,
task_instance_id: UUID,
payload: dict,
session,
) -> None:
if payload.get("kind") == "reschedule":
session.add({"id": task_instance_id, "data": payload})
@ti_id_router.patch("/{task_instance_id}/state")
def ti_update_state(
task_instance_id: UUID,
payload: Annotated[dict, Body()],
session,
) -> None:
_create_state_update(
task_instance_id=task_instance_id,
payload=payload,
session=session,
)
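
The soundness rule for caller-scope propagation can be sketched as a predicate over the in-file callers of a helper. In this hypothetical Python model, each caller is summarised as a set of route-level auth-check kind names. The non-Login kinds listed come from the airflow-distilled fixture in this commit. `inherits_caller_scope` is illustrative rather than the body of `apply_caller_scope_propagation`.

```python
# Route-level check kinds that count as authorization; `LoginGuard` alone
# does not.
NON_LOGIN_KINDS = {"Other", "Membership", "Ownership", "AdminGuard"}


def inherits_caller_scope(callers: list) -> bool:
    """`callers` holds one set of route-level check kinds per in-file caller."""
    if not callers:
        return False  # no callers: `all([])` would be vacuously True
    return all(kinds & NON_LOGIN_KINDS for kinds in callers)
```

The explicit empty-list check matters because `all()` over no callers is `True`. Without it, a helper with no in-file callers would inherit authorization it never received.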

@ -1,19 +1,26 @@
"""
Vulnerable counterpart to safe_fastapi_route_dependencies_auth.py: same
shape but with NO `dependencies=[Depends(...)]` keyword arg on the route
decorator. The FastAPI ownership-check rule must still fire the
recognizer must not blanket-suppress every FastAPI route, only those
with an actual dependency-injected auth check.
FastAPI route shape but with NO `dependencies=[Depends(...)]` keyword
arg on the route decorator. The ownership-check rule must still fire
the dependency-injection recogniser must not blanket-suppress every
FastAPI route, only those with an actual dependency-injected auth
check.
Sink uses a qualified Django-style ORM call so the post-fix
classifier still recognises it (`receiver_is_simple_chain` requires a
non-chained receiver dot).
"""
from fastapi import FastAPI
router = FastAPI()
class Connection:
objects = None
@router.delete("/{connection_id}")
def delete_connection(connection_id: str, session):
def delete_connection(connection_id: str):
"""No auth — must still fire missing_ownership_check."""
connection = session.scalar(select(Connection).filter_by(conn_id=connection_id))
if connection is None:
raise HTTPException(404, "not found")
session.delete(connection)
Connection.objects.filter(id=connection_id).delete()
return {"ok": True}

@ -0,0 +1,27 @@
"""SQLAlchemy variant of vuln_fastapi_route_no_dependencies.py: same FastAPI
route shape with NO `dependencies=[Depends(...)]` keyword arg, but the sink
is a real-world airflow-style SQLAlchemy queryset chain
`session.scalar(select(C).filter_by(conn_id=user_input))`.
Pre-fix the chain reduced to bare `["filter_by"]` and was suppressed by
`receiver_is_simple_chain`, blocking recall on this real-repo airflow shape.
The member_chain Python `function`-field traversal + `db_query_builder_roots`
extension restores recall.
Recall guard: ownership-check rule must fire on the chained query, since
the caller has no auth check.
"""
from fastapi import FastAPI
from sqlalchemy import select
router = FastAPI()
class Connection:
pass
@router.delete("/{connection_id}")
def delete_connection(connection_id: str, session):
"""No auth — must fire missing_ownership_check on the chained query."""
return session.scalar(select(Connection).filter_by(conn_id=connection_id))

@ -0,0 +1,38 @@
"""Recall counterpart to safe_fastapi_route_security_scopes.py.
Precision guard for the Security-without-scopes path: a bare
`Security(callable)` with no `scopes=[...]` kwarg, or with an empty
`scopes=[]`, is NOT promoted from LoginGuard to AuthorizationCheck:
the OAuth2 scope semantic only fires when scopes is non-empty. Without
scope enforcement the wrapper is functionally equivalent to
`Depends(callable)` plus a bare login check, so `missing_ownership_check`
must still fire on a downstream id-targeted ORM filter.
Recall guard: ownership-check rule must fire. Security with no scopes
is conservative (treated as login-only), so the route is not promoted
to authorized.
"""
from fastapi import FastAPI, Security
def require_auth():
pass
router = FastAPI()
class TaskInstance:
pass
@router.patch(
"/{task_instance_id}/run",
dependencies=[Security(require_auth, scopes=[])],
)
def ti_run(task_instance_id: str, session):
return session.scalar(select(TaskInstance).filter_by(id=task_instance_id))
def select(_):
pass
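
The scopes rule for `Security(...)` can be sketched with Python's standard `ast` module. This is a hypothetical classifier; `classify_security_dep` is not part of the engine. It promotes a dependency to `AuthorizationCheck` only when `scopes=` is a literal, non-empty list or tuple. That mirrors the conservative behaviour the fixture describes.

```python
import ast


def classify_security_dep(expr_src: str) -> str:
    """Classify one entry of a FastAPI `dependencies=[...]` list."""
    call = ast.parse(expr_src, mode="eval").body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            and call.func.id == "Security"):
        return "Unknown"
    for kw in call.keywords:
        # Only a literal, non-empty scopes list/tuple promotes the check.
        if (kw.arg == "scopes" and isinstance(kw.value, (ast.List, ast.Tuple))
                and kw.value.elts):
            return "AuthorizationCheck"
    return "LoginGuard"
```

Here, a `scopes` value that is not a literal (for example a module-level constant) falls back to `LoginGuard`. This fixture does not say whether the engine resolves such names.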

@ -0,0 +1,46 @@
"""Recall guard for the router-level Security-prop fix. When a router
is declared with NO `dependencies=` kwarg (`router = APIRouter(...)`),
attached routes that don't supply inline deps are genuinely
unauthorized, so the engine must still flag id-targeted writes as
`missing_ownership_check`. Without the gate the router-level extractor
would over-fire by treating every router as auth-providing.
Distilled from airflow
`task_instances.py:1036-1082` where `router = VersionedAPIRouter()`
(bare, no deps) attaches `@router.get("/states", ...)`; the route is
auth-attached only via the cross-file `include_router` chain in
`routes/__init__.py`, which is a separate gap (see deep_engine_fixes.md).
For the per-file case where the router has no router-level deps
declared, the route is genuinely unguarded, so a missing finding here
would be an ownership-check false negative.
"""
from cadwyn import VersionedAPIRouter
# Bare router — no router-level dependencies declared.
router = VersionedAPIRouter()
class TaskInstance:
pass
@router.get("/states/{run_id}/{task_id}")
def get_task_instance_states(run_id: str, task_id: str, session):
rows = session.scalars(
select(TaskInstance)
.where(TaskInstance.run_id == run_id)
.where(TaskInstance.task_id == task_id)
).all()
[
run_id_task_state_map[task.run_id].update(
{task.task_id: task.state}
)
for task in rows
]
def select(_):
pass
run_id_task_state_map = {}

@ -0,0 +1,28 @@
# py-auth-realrepo-XXX (vuln pair): same bare-`set()` / `dict()` /
# `defaultdict()` local collection shape as
# safe_local_set_update_no_orm.py, but the helper *also* runs an
# id-targeted ORM query whose filter argument is a user-supplied id
# (`team_id` in the function signature, no caller-scope-entity
# exemption applies).
#
# Recall guard: the bare-callee constructor recogniser must only
# suppress the InMemoryLocal `.update` / `.add` calls — the
# id-targeted ORM `.filter(id=team_id)` must still fire
# `py.auth.missing_ownership_check`.
class Team:
pass
def get_team_with_history(request, team_id):
seen_ids = set()
audit = dict()
seen_ids.add(team_id)
audit["team"] = team_id
return Team.objects.filter(id=team_id).first()
def archive_team(request, team_id):
pending = set()
pending.add(team_id)
Team.objects.filter(id=team_id).delete()
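
The bare-constructor recogniser in this fixture's comment can be sketched as two small functions. In this hypothetical Python model, `in_memory_locals` and `is_suppressed` are illustrative names. Only `.add` / `.update` on a local bound to a bare `set()`, `dict()` or `defaultdict(...)` call is suppressed. Every other call, including the ORM `.filter(...)`, is left to the normal sink classifier.

```python
# Bare-callee constructors per the fixture comment; a module-qualified
# `collections.defaultdict(...)` is deliberately not matched here.
LOCAL_CONSTRUCTORS = {"set", "dict", "defaultdict"}
LOCAL_MUTATORS = {"add", "update"}


def in_memory_locals(assignments):
    """Names bound to a bare `set()` / `dict()` / `defaultdict(...)` call."""
    return {var for var, rhs in assignments
            if rhs.split("(", 1)[0] in LOCAL_CONSTRUCTORS}


def is_suppressed(callee: str, local_names: set) -> bool:
    # Suppress only `.add` / `.update` whose receiver is one of those locals.
    receiver, _, method = callee.rpartition(".")
    return receiver in local_names and method in LOCAL_MUTATORS
```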

@ -0,0 +1,13 @@
# py-path-traversal-no-relative-to: regression guard companion to
# safe_relative_to_validator.py. Same source/sink shape but without
# the `filepath.relative_to(base)` validator — taint must propagate.
from pathlib import Path
from flask import request, send_file
def download() -> None:
base = Path("/var/www/static")
rel_url = request.args.get("path")
filepath = base.joinpath(rel_url).resolve()
send_file(str(filepath))

@ -0,0 +1,76 @@
# py-auth-realrepo-011: bare-identifier callees without a receiver dot
# are never DB / ORM operations. Distilled from sentry
# `src/sentry/tasks/statistical_detectors.py` (line 743:
# `org_ids = list({p.organization_id for p in projects})`),
# `src/sentry/utils/query.py:90` (`events = list(method(...))`),
# `src/sentry/api/helpers/group_index/delete.py` (bare `delete_group_list`,
# `create_audit_entry`, `create_audit_entries` helper calls), and
# `src/sentry/seer/autofix/coding_agent.py` (bare `update_coding_agent_state`).
#
# Before the fix, the verb-name fallback in `classify_sink_class`
# matched bare callees `list`, `filter`, `update`, `create`, `add`,
# `delete` against the Python read/mutation indicator vocabulary and
# classified them as `DbCrossTenantRead` / `DbMutation`. Combined with
# the user-input-evidence precondition (`request: Request` triggers it),
# every internal helper calling one of these builtins / locally-defined
# helpers produced a `py.auth.missing_ownership_check` finding.
#
# A real ORM / DB call always carries a receiver
# (`Model.objects.filter(...)`, `repo.find(id)`, `db.query(...)`); a
# bare-identifier callee is a Python builtin or a locally-defined
# helper, neither of which has the cross-tenant read / mutation
# semantics the rule is checking for. The fix gates the verb fallback
# on `receiver_is_simple_chain(callee)` (callee contains a dot AND the
# receiver isn't itself a call expression).
from typing import Any, Iterable
def fetch_continuous_examples(raw_examples):
    # `list(...)` is a Python builtin; no DB op happens here.
    project_ids = list({pid for pid, _ in raw_examples.keys()})
    return project_ids
def detect_function_change_points(projects, start, transactions_per_project=None):
    # Bare `list({...})` set-comprehension materialisation; both args
    # come from internally-supplied `projects` collection iteration.
    org_ids = list({p.organization_id for p in projects})
    project_ids = list({p.id for p in projects})
    return org_ids, project_ids
def delete_group_list(request, project, group_list, delete_type):
    # Bare-name local helper invocation: `create_audit_entries` is a
    # function defined in the same module, not a DB write. Used to
    # fire `py.auth.missing_ownership_check`.
    transaction_id = "tx"
    create_audit_entries(request, project, group_list, delete_type, transaction_id)
def create_audit_entries(request, project, group_list, delete_type, transaction_id):
    for group in group_list:
        # Bare `create_audit_entry` is a helper, not a DB INSERT.
        create_audit_entry(
            request=request,
            target_object=group.id,
            event="ISSUE_DELETE",
            data={"issue_id": group.id, "project_slug": project.slug},
        )
def create_audit_entry(**kwargs):
    pass
def update_coding_agent_state(state, action):
    # Bare `update_*` helper called inside an outer task. Python lets
    # you name local helpers freely; the verb prefix does not imply a
    # DB mutation.
    pass
def materialise_filter_chain(events: Iterable[Any]):
    # `filter(...)` here is the Python builtin function, not an ORM
    # `.filter(...)` on a queryset; bare-name builtin / local-helper
    # calls like this are endemic in real-repo Python code.
    return list(filter(lambda e: e is not None, events))
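A minimal Python sketch of the receiver gate described above follows. It is an illustration under stated assumptions, not the analyzer's code: the real `receiver_is_simple_chain` / `classify_sink_class` live in the Rust analyzer, the verb sets below are an assumed subset of its vocabulary, and "receiver is a call expression" is approximated by a string check for `(`.

```python
from typing import Optional

# Assumed verb subsets for illustration only; the analyzer's real Python
# read/mutation vocabulary is larger and is not reproduced here.
READ_VERBS = {"list", "filter"}
MUTATION_VERBS = {"update", "create", "add", "delete"}


def receiver_is_simple_chain(callee: str) -> bool:
    """Callee has a receiver (contains a dot) and that receiver is not
    itself a call expression (approximated here as: no '(' in it)."""
    if "." not in callee:
        return False
    receiver, _, _ = callee.rpartition(".")
    return "(" not in receiver


def classify_by_verb(callee: str) -> Optional[str]:
    """Verb-name fallback, gated on the receiver check."""
    if not receiver_is_simple_chain(callee):
        return None  # bare builtin or local helper: no DB semantics assumed
    verb = callee.rpartition(".")[2]
    if verb in READ_VERBS:
        return "DbCrossTenantRead"
    if verb in MUTATION_VERBS:
        return "DbMutation"
    return None
```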


@ -0,0 +1,63 @@
"""Distilled from airflow
`airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py:516-628`:
The route handler `ti_update_state` is route-level authorized via the
`ti_id_router = VersionedAPIRouter(dependencies=[Security(require_auth,
scopes=["ti:self"])])` declaration (closed by the session-0010 fix).
The handler then delegates the actual `session.add(TaskReschedule(...))`
sink to a private helper `_create_ti_state_update_query_and_update_state`
that has no inline auth check of its own.
Pre-fix the helper fired `missing_ownership_check` +
`token_override_without_validation` at the helper's body sink because
`check_ownership_gaps` is scoped per AnalysisUnit: the caller's
route-level auth check did not propagate to the callee.
The Phase 1 caller-scope IPA fix (`apply_caller_scope_propagation` in
`src/auth_analysis/mod.rs`) walks the call graph DOWN: when every
in-file caller of a helper carries route-level non-Login auth
(Other / Membership / Ownership / AdminGuard), the helper inherits the
caller's checks via synthetic `is_route_level=true` AuthChecks. This
covers the airflow shape exactly; both findings clear post-fix.
Precision guard: helper must NOT fire `missing_ownership_check` or
`token_override_without_validation` despite holding the auth-relevant
sinks (`session.add` with caller-passed scoped id).
"""
from typing import Annotated
from uuid import UUID
from fastapi import APIRouter, Body, Security
def require_auth():
    pass
# Router-level Security carries the JWT scope check on every attached
# route at runtime. Closes the prior session-0010 gap.
ti_id_router = APIRouter(
    dependencies=[Security(require_auth, scopes=["ti:self"])],
)
def _create_state_update(
    *,
    task_instance_id: UUID,
    payload: dict,
    session,
) -> None:
    """Helper: caller-scope IPA must propagate route-level auth into here."""
    if payload.get("kind") == "reschedule":
        session.add({"id": task_instance_id, "data": payload})
@ti_id_router.patch("/{task_instance_id}/state")
def ti_update_state(
    task_instance_id: UUID,
    payload: Annotated[dict, Body()],
    session,
) -> None:
    _create_state_update(
        task_instance_id=task_instance_id,
        payload=payload,
        session=session,
    )
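The caller-scope propagation rule in the docstring above can be sketched as a single pass over an in-file call graph. Everything here is hypothetical: `helpers_inheriting_auth`, the dict-based call graph, and the string auth kinds are illustrative stand-ins for `apply_caller_scope_propagation` in `src/auth_analysis/mod.rs`, whose actual data structures (and whether it propagates transitively) are not shown in this diff.

```python
from typing import Dict, List, Set

# Route-level auth kinds that may propagate to callees; LoginGuard is excluded.
PROPAGATING_KINDS = {"Other", "Membership", "Ownership", "AdminGuard"}


def helpers_inheriting_auth(
    callers_of: Dict[str, List[str]],
    route_auth: Dict[str, Set[str]],
) -> Set[str]:
    """Return helpers whose every in-file caller carries route-level
    non-Login auth. Single pass; no transitive propagation."""
    inherited: Set[str] = set()
    for helper, callers in callers_of.items():
        if not callers:
            continue  # no in-file caller: nothing to inherit from
        if all(route_auth.get(c, set()) & PROPAGATING_KINDS for c in callers):
            inherited.add(helper)
    return inherited
```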


@ -0,0 +1,44 @@
"""Distilled from airflow
`airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py:101-117`:
FastAPI route declares its auth dependency as
`dependencies=[Security(require_auth, scopes=["token:execution"])]`.
`Security(...)` is FastAPI's OAuth2-scope-checked variant of `Depends(...)`:
the JWT must carry one of the listed scopes, so the route is fully
authorized at the boundary.
Pre-fix `is_depends_callee` only matched `Depends`; `Security(...)` was
ignored, leaving the route as if no auth dep were declared. Even after
recognising the marker, `require_auth` is a registered login-guard, and a
`LoginGuard` AuthCheckKind would have been filtered by
`has_prior_subject_auth`, so the route would still fire
`missing_ownership_check`. The deeper fix promotes a scoped Security
wrapper to `AuthCheckKind::Other` so the route counts as authorized for
ownership / membership checks at any sink the handler reaches.
Precision guard: route must NOT fire `missing_ownership_check` even
though the handler does an id-targeted ORM filter.
"""
from fastapi import FastAPI, Security
def require_auth(scopes):
    pass
router = FastAPI()
class TaskInstance:
    pass
@router.patch(
    "/{task_instance_id}/run",
    dependencies=[Security(require_auth, scopes=["token:execution", "token:workload"])],
)
def ti_run(task_instance_id: str, session):
    return session.scalar(select(TaskInstance).filter_by(id=task_instance_id))
def select(_):
    pass
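Recognising the scoped `Security(...)` marker can be sketched with Python's standard `ast` module. This is an assumption-laden illustration, not the analyzer's `is_depends_callee` logic (which is Rust): `scoped_security_deps` is a hypothetical helper that only matches a bare `Security` name whose first argument is a plain identifier and whose `scopes=` keyword is a non-empty list or tuple literal.

```python
import ast
from typing import List


def scoped_security_deps(source: str) -> List[str]:
    """Names of dependencies wrapped in Security(dep, scopes=[...]) with a
    non-empty scopes literal, anywhere in the module."""
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id != "Security" or not node.args:
            continue
        for kw in node.keywords:
            if (
                kw.arg == "scopes"
                and isinstance(kw.value, (ast.List, ast.Tuple))
                and kw.value.elts  # empty scopes do not count as scoped
                and isinstance(node.args[0], ast.Name)
            ):
                found.append(node.args[0].id)
    return found
```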


@ -0,0 +1,61 @@
"""Distilled from airflow
`airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py:89-318`:
FastAPI declares its auth dependency once at the router constructor
`ti_id_router = VersionedAPIRouter(dependencies=[Security(require_auth,
scopes=["ti:self"])])` and every per-task route attaches via
`@ti_id_router.<verb>(...)` with no inline deps. FastAPI propagates
router-level dependencies to every attached route at runtime, so the
JWT-validated scope check guards every `session.add` / row-update sink
the handler body reaches.
Pre-fix the FastAPI dep extractor only walked the per-route decorator's
`dependencies=[...]` kwarg; router-constructor `dependencies=` was
dropped, so every `@ti_id_router.<verb>` route without inline deps fired
`missing_ownership_check` + `token_override_without_validation` despite
being authorized.
The fix walks module-level `<router> = APIRouter(...)` /
`VersionedAPIRouter(...)` / `FastAPI(...)` assignments, captures the
router's `dependencies=[...]` into a per-router map, and merges them
into the per-route middleware list when the decorator's prefix matches.
A scoped Security wrapper synthesises matching TokenExpiry +
TokenRecipient checks (the JWT-validation semantics) so the
token-override rule recognises the route too.
Precision guard: route must NOT fire `missing_ownership_check` /
`token_override_without_validation` even though the handler writes
through an id-targeted state update.
"""
from fastapi import Security
from cadwyn import VersionedAPIRouter
def require_auth(scopes):
    pass
# Router-level Security with non-empty scopes. Every route attached to
# this router inherits the dep; no inline declaration needed.
ti_id_router = VersionedAPIRouter(
    dependencies=[
        Security(require_auth, scopes=["ti:self"]),
    ],
)
class Log:
    pass
class TaskInstance:
    pass
@ti_id_router.patch("/{task_instance_id}/state")
def ti_update_state(task_instance_id: str, session):
    session.add(
        Log(
            task_instance_id=task_instance_id,
            event="state_update",
        )
    )
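The router-level merge described in the docstring above can be approximated with `ast` as follows. `route_dependencies` and `ROUTER_CTORS` are hypothetical; the sketch only handles module-level `name = Ctor(...)` assignments and `@name.<verb>(...)` decorators on plain `def` handlers, matches routes to routers by receiver name rather than by the analyzer's prefix check, and ignores `include_router(...)` composition.

```python
import ast
from typing import Dict, List

ROUTER_CTORS = {"APIRouter", "VersionedAPIRouter", "FastAPI"}


def _dep_names(call: ast.Call) -> List[str]:
    """First-argument names of each entry in a `dependencies=[...]` kwarg."""
    names: List[str] = []
    for kw in call.keywords:
        if kw.arg == "dependencies" and isinstance(kw.value, ast.List):
            for elt in kw.value.elts:
                if isinstance(elt, ast.Call) and elt.args and isinstance(elt.args[0], ast.Name):
                    names.append(elt.args[0].id)
    return names


def route_dependencies(source: str) -> Dict[str, List[str]]:
    """Map each decorated handler to router-level deps + inline deps."""
    tree = ast.parse(source)
    router_deps: Dict[str, List[str]] = {}
    for stmt in tree.body:  # module-level `<router> = Ctor(...)` only
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and isinstance(stmt.value, ast.Call)
            and isinstance(stmt.value.func, ast.Name)
            and stmt.value.func.id in ROUTER_CTORS
        ):
            router_deps[stmt.targets[0].id] = _dep_names(stmt.value)
    result: Dict[str, List[str]] = {}
    for stmt in tree.body:
        if not isinstance(stmt, ast.FunctionDef):
            continue
        for dec in stmt.decorator_list:
            if (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Attribute)
                and isinstance(dec.func.value, ast.Name)
                and dec.func.value.id in router_deps
            ):
                # Router-level deps first, then the decorator's own deps.
                result[stmt.name] = router_deps[dec.func.value.id] + _dep_names(dec)
    return result
```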


@ -0,0 +1,58 @@
# py-auth-realrepo-XXX: bare-callee Python container constructors
# (`set()` / `dict()` / `defaultdict()`) bind a non-sink local
# collection. Subsequent method calls on the bound var
# (`verified_ids.update(..)`, `cache[k] = v`, `requested_teams.add(..)`)
# are in-memory mutations, not ORM/DB writes, so the route handler
# must NOT fire `py.auth.missing_ownership_check`.
#
# Distilled from sentry `src/sentry/api/helpers/teams.py::get_teams`:
#
# def get_teams(request, organization, teams=None):
#       requested_teams = set(request.GET.getlist("team", []) ...)
#       verified_ids: set[int] = set()
#       ...
#       verified_ids.update(myteams)  # <-- LOCAL set update
#       requested_teams.update(verified_ids)
#       teams_query = Team.objects.filter(
#           id__in=requested_teams, organization_id=organization.id
#       )
#
# Without the bare-callee constructor recogniser, `set()` / `dict()`
# go untracked, the bound vars miss `non_sink_vars`, and the
# `.update(..)` / `.add(..)` calls classify as `DbMutation` —
# triggering the false missing-ownership-check finding. See
# `AuthAnalysisRules::is_non_sink_constructor_callee` and the
# `assignment` arm in `collect_unit_state`.
from collections import Counter, defaultdict
from collections.abc import Iterable
class Organization:
    pass
class Team:
    pass
def get_teams(request, organization: Organization, teams: Iterable[int] | None = None):
    requested_teams = set(request.GET.getlist("team", []) if teams is None else teams)
    verified_ids: set[int] = set()
    seen_counter = Counter()
    cache = defaultdict(list)
    metadata = dict()
    pending = list()
    if "myteams" in requested_teams:
        requested_teams.remove("myteams")
        myteams = request.access.team_ids_with_membership
        verified_ids.update(myteams)
        requested_teams.update(verified_ids)
        seen_counter.update(myteams)
        cache["my"].append(myteams)
        metadata["count"] = len(myteams)
        pending.append(myteams)
    return Team.objects.filter(
        id__in=requested_teams, organization_id=organization.id
    )
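A sketch of the bare-callee constructor recogniser, again in Python `ast` terms and with hypothetical names (`non_sink_vars`, `NON_SINK_CTORS`); the real `AuthAnalysisRules::is_non_sink_constructor_callee` and its constructor list may differ. It collects names bound by plain or annotated assignment from a receiver-less call to a known container constructor; method calls on those names would then be treated as in-memory mutations rather than DB writes.

```python
import ast
from typing import Set

# Assumed constructor set, taken from the fixture above.
NON_SINK_CTORS = {"set", "dict", "list", "defaultdict", "Counter"}


def non_sink_vars(func_source: str) -> Set[str]:
    """Names bound (plain or annotated) from a bare-callee container ctor."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(func_source)):
        if isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets, value = [node.target], node.value
        else:
            continue
        if (
            isinstance(value, ast.Call)
            and isinstance(value.func, ast.Name)  # bare callee, no receiver
            and value.func.id in NON_SINK_CTORS
        ):
            names.update(t.id for t in targets if isinstance(t, ast.Name))
    return names
```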


@ -0,0 +1,19 @@
# py-safe-relative-to: pathlib `relative_to(base)` raise-on-escape
# pattern recognised as a receiver-side FILE_IO validator. Captures
# the canonical Python path-containment idiom — the receiver is proven
# contained in `base` if execution reaches the next statement.
# Motivated by CVE-2024-23334 patched fixture.
from pathlib import Path
from flask import request, send_file
def download() -> None:
    base = Path("/var/www/static")
    rel_url = request.args.get("path")
    filepath = base.joinpath(rel_url).resolve()
    try:
        filepath.relative_to(base)
    except ValueError:
        return
    send_file(str(filepath))
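The raise-on-escape idiom can be exercised without touching the filesystem by using `PurePosixPath` plus `posixpath.normpath`. This is a lexical approximation only (hypothetical helper `is_contained`): unlike `Path.resolve()` it does not follow symlinks, so it shows how `relative_to` behaves but is not a drop-in replacement for the fixture's check.

```python
import posixpath
from pathlib import PurePosixPath


def is_contained(base: str, rel_url: str) -> bool:
    """Lexical stand-in for resolve() + relative_to(base); ignores symlinks."""
    base_path = PurePosixPath(posixpath.normpath(base))
    candidate = PurePosixPath(posixpath.normpath(posixpath.join(base, rel_url)))
    try:
        candidate.relative_to(base_path)  # raises ValueError on escape
    except ValueError:
        return False
    return True
```

Note that an absolute `rel_url` replaces the base entirely in `posixpath.join`, which the containment check then rejects.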


@ -9,9 +9,8 @@
// `src/auth_analysis/extract/common.rs::value_is_self_scoped_session_id_chain`
// which extends `collect_self_actor_id_binding` to recognise
// session-scoped chains beyond the existing `actor_var.id` shape.
async function getCachedApiKeys(_userId: number) {
return [];
}
declare const prisma: any;
declare function getServerSession(): Promise<any>;
export const Page = async () => {
const session = await getServerSession();
@ -21,6 +20,6 @@ export const Page = async () => {
}
const userId = session.user.id;
const apiKeys = await getCachedApiKeys(userId);
const apiKeys = await prisma.apiKey.findMany({ where: { userId } });
return apiKeys;
};


@ -1,13 +1,13 @@
// Vulnerable counterpart to `safe_session_user_id_copy.ts`: the
// `userId` is bound from a route param (`req.params.targetUserId`,
// not from the session), so the rule must still flag the missing
// ownership check on the downstream prisma call.
async function deleteApiKeysFromUserId(_userId: number) {}
export const Handler = async (req: any, _res: any) => {
const session = await getServerSession();
if (!session) return;
const userId = req.params.targetUserId;
await deleteApiKeysFromUserId(userId);
// `targetUserId` is a foreign id parameter (route param, not the
// caller's session-id copy), so the rule must still flag the missing
// ownership check on the downstream qualified prisma call.
declare const prisma: {
apiKey: {
deleteMany(args: { where: { userId: string } }): Promise<void>;
};
};
export async function deleteApiKeysFromUserId(targetUserId: string) {
await prisma.apiKey.deleteMany({ where: { userId: targetUserId } });
}


@ -0,0 +1,84 @@
// Nyx CVE benchmark fixture.
//
// CVE: CVE-2017-1000117
// Project: git (git/git)
// License: GPL-2.0-only (https://github.com/git/git/blob/v2.7.6/COPYING)
// Advisory: https://nvd.nist.gov/vuln/detail/CVE-2017-1000117
// Patched: commit 820d7650cc6705fbb73c8caf9aef47394be5ed72 (in v2.7.6)
// "connect: reject ssh hostname that begins with a dash",
// connect.c:757-758 of the post-fix tree.
//
// Same trims as vulnerable.c (see that header). The patch under test is
// the verbatim 2-line gate added immediately after `get_host_and_port` /
// `get_port` in upstream:
//
// if (ssh_host[0] == '-')
// die("strange hostname '%s' blocked", ssh_host);
//
// In this fixture the upstream `die()` is replaced with `exit(1)` (a
// `noreturn` libc primitive) — the patched-fix simplification. The
// taint-flow consequence is identical: the dash-prefixed `ssh_host` is
// rejected before any argv assembly or exec call, so no user-tainted
// value reaches `execvp`.
//
// Patched-fix simplification:
// - `die("strange hostname '%s' blocked", ssh_host)` rendered as
// `fprintf(stderr, "strange hostname '%s' blocked\n", ssh_host);
//     exit(1);`. Upstream `die()` is a vararg wrapper that ultimately
// calls `exit(128)`. The flow-killing property (the function never
// returns when the gate fires) is preserved.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
static void get_host_and_port_min(char **host, const char **port) {
char *colon, *end;
if (*host == NULL) return;
end = strchr(*host, '/');
if (end) *end = '\0';
colon = strchr(*host, ':');
if (colon) { *colon = '\0'; *port = colon + 1; }
}
int do_ssh_connect(char *url) {
const char *ssh;
char *ssh_host = url;
const char *port = NULL;
get_host_and_port_min(&ssh_host, &port);
if (!port) port = "22";
if (ssh_host[0] == '-') {
fprintf(stderr, "strange hostname '%s' blocked\n", ssh_host);
exit(1);
}
ssh = getenv("GIT_SSH_COMMAND");
if (!ssh) {
ssh = getenv("GIT_SSH");
if (!ssh) ssh = "ssh";
}
const char *args[8];
int nargs = 0;
args[nargs++] = ssh;
if (port) {
args[nargs++] = "-p";
args[nargs++] = port;
}
args[nargs++] = ssh_host;
args[nargs++] = "git-upload-pack";
args[nargs++] = NULL;
return execvp(args[0], (char *const *)args);
}
int main(void) {
char url_buf[1024];
if (!fgets(url_buf, sizeof url_buf, stdin)) return 1;
size_t len = strlen(url_buf);
if (len && url_buf[len - 1] == '\n') url_buf[len - 1] = '\0';
return do_ssh_connect(url_buf);
}
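For illustration, the same gate expressed in Python: `build_ssh_argv` is a hypothetical helper that assembles the argv the way `do_ssh_connect` does and raises instead of calling `exit(1)` when the host starts with `-`. Like the upstream patch, it inspects only the first character of the host.

```python
from typing import List, Optional


def build_ssh_argv(ssh_host: str, port: Optional[str] = None, ssh: str = "ssh") -> List[str]:
    """Assemble argv like do_ssh_connect(), refusing dash-prefixed hosts."""
    if ssh_host.startswith("-"):
        # Mirrors the upstream gate: a leading '-' would be parsed by ssh
        # as an option (e.g. -oProxyCommand=...), not as a hostname.
        raise ValueError(f"strange hostname '{ssh_host}' blocked")
    argv = [ssh]
    if port:
        argv += ["-p", port]
    argv += [ssh_host, "git-upload-pack"]
    return argv
```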


@ -0,0 +1,96 @@
// Nyx CVE benchmark fixture.
//
// CVE: CVE-2017-1000117
// Project: git (git/git)
// License: GPL-2.0-only (https://github.com/git/git/blob/v2.7.6/COPYING)
// Advisory: https://nvd.nist.gov/vuln/detail/CVE-2017-1000117
// Vulnerable: tag v2.7.5 (parent c8dd1e3bb115), connect.c:733-793 of
// git_connect() — pre-fix tip before commit
// 820d7650cc6705fbb73c8caf9aef47394be5ed72 ("connect: reject
// ssh hostname that begins with a dash") landed.
//
// Pre-2.7.6 git accepted `ssh://-oProxyCommand=...@host/repo` URLs and
// passed the unsanitised `ssh_host` (derived from the URL host part) as
// an argv element to ssh. When `ssh_host` started with a dash, ssh
// interpreted it as an option (`-oProxyCommand=…`), giving the attacker
// a code-execution primitive whenever a user cloned an attacker-supplied
// URL or fetched an attacker-controlled submodule. The fix added a
// hostname-starts-with-dash rejection in connect.c immediately before
// the args were assembled.
//
// Trims:
// - Removed PROTO_LOCAL / git-daemon arms (connect.c:721 else-branch
// and the upstream `else { transport_check_allowed("file"); }` after
// the SSH block) — not on the disclosed flow path.
// - Removed `flags & CONNECT_DIAG_URL` early-exit (lines 744-755) and
// `tortoiseplink` / `putty` shell detection (lines 776-783) — they
// do not influence whether the dash-prefixed `ssh_host` reaches argv.
// - Inlined upstream's `start_command(conn)` (which fork+execvp's
// `conn->args.argv`) directly as `execvp(args[0], (char *const *)args)`
//   on the local `args` array. start_command is the heavyweight
// run-command helper; the disclosed sink behavior is identical.
// - Inlined upstream's `argv_array_push(&conn->args, …)` as plain
//   pointer assignment into a fixed-size `args[8]` buffer. argv_array
// is a strvec wrapper; the dispatched argv shape is unchanged.
// - Replaced `parse_connect_url(url, ...)` + `get_host_and_port()` URL
//   parser with a minimal `get_host_and_port_min()` that NUL-terminates
//   the host at the first `/` and splits off a `:port` suffix; the
//   disclosed flow only requires that `ssh_host` be a substring of the
//   source URL, which this preserves byte-for-byte.
// - Source statement: upstream takes the URL from `argv[1]` of the git
// binary; the fixture uses `fgets(url_buf, ..., stdin)` — a recognised
// C taint source — so the file scans standalone without depending on
// argv-source modeling.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
static void get_host_and_port_min(char **host, const char **port) {
char *colon, *end;
if (*host == NULL) return;
end = strchr(*host, '/');
if (end) *end = '\0';
colon = strchr(*host, ':');
if (colon) { *colon = '\0'; *port = colon + 1; }
}
int do_ssh_connect(char *url) {
// Load-bearing block copied verbatim from connect.c:733-793 of the
// pre-fix git tree (tag v2.7.5 / parent c8dd1e3bb115). The
// dash-prefix check that landed in the fix is intentionally absent.
const char *ssh;
char *ssh_host = url;
const char *port = NULL;
get_host_and_port_min(&ssh_host, &port);
if (!port) port = "22";
ssh = getenv("GIT_SSH_COMMAND");
if (!ssh) {
ssh = getenv("GIT_SSH");
if (!ssh) ssh = "ssh";
}
const char *args[8];
int nargs = 0;
args[nargs++] = ssh;
if (port) {
args[nargs++] = "-p";
args[nargs++] = port;
}
args[nargs++] = ssh_host;
args[nargs++] = "git-upload-pack";
args[nargs++] = NULL;
return execvp(args[0], (char *const *)args);
}
int main(void) {
char url_buf[1024];
if (!fgets(url_buf, sizeof url_buf, stdin)) return 1;
size_t len = strlen(url_buf);
if (len && url_buf[len - 1] == '\n') url_buf[len - 1] = '\0';
return do_ssh_connect(url_buf);
}


@ -0,0 +1,60 @@
// Nyx CVE benchmark fixture (patched).
//
// CVE: CVE-2026-42353
// GHSA: GHSA-jfgf-83c5-2c4m
// Project: i18next-http-middleware (i18next/i18next-http-middleware)
// License: MIT (https://github.com/i18next/i18next-http-middleware/blob/master/licence)
// Patched: 65301c194593d46a84623b64e5fde2f51d3550f6 lib/utils.js:1-22, lib/index.js:243-250
// Release: v3.9.3
//
// Patch adds `utils.isSafeIdentifier` (denylist allowing any legitimate
// i18next language code shape, rejecting `..`, path separators, control
// chars, prototype keys, empty strings, and values longer than 128) and
// inserts `languages = languages.filter(utils.isSafeIdentifier)` and the
// equivalent for `namespaces` before they reach the backend connector.
//
// Trims: same scaffolding trims as the vulnerable counterpart.
//
// Patched-form simplification: same template-literal inline of the
// backend's interpolator + readFileSync as the vulnerable side. The
// `utils.isSafeIdentifier` body is copied verbatim from
// `lib/utils.js:13-22` of the patched commit; the prototype-pollution
// denylist (UNSAFE_KEYS check) and length / control-char / `..` /
// separator rejections are all load-bearing for the precision-side
// claim.
const fs = require('fs');
const express = require('express');
const app = express();
const UNSAFE_KEYS = ['__proto__', 'constructor', 'prototype'];
function isSafeIdentifier (v) {
if (typeof v !== 'string') return false;
if (v.length === 0 || v.length > 128) return false;
if (UNSAFE_KEYS.indexOf(v) > -1) return false;
if (v.indexOf('..') > -1) return false;
if (v.indexOf('/') > -1 || v.indexOf('\\') > -1) return false;
// eslint-disable-next-line no-control-regex
if (/[\x00-\x1F\x7F]/.test(v)) return false;
return true;
}
app.get('/locales/resources.json', (req, res) => {
let languages = req.query.lng
? req.query.lng.split(' ')
: [];
let namespaces = req.query.ns
? req.query.ns.split(' ')
: [];
// Drop user-supplied values containing patterns that could trigger
// path traversal / SSRF / prototype pollution when forwarded to the
// backend connector. See: https://www.i18next.com/how-to/faq#how-should-the-language-codes-be-formatted
languages = languages.filter(isSafeIdentifier);
namespaces = namespaces.filter(isSafeIdentifier);
const lng = languages[0];
const ns = namespaces[0];
const filename = `/locales/${lng}/${ns}.json`;
fs.readFileSync(filename);
});
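A Python port of the `isSafeIdentifier` denylist above, useful for checking which inputs survive the filter. It is a sketch, not part of the upstream package; one caveat is that JavaScript's `length` counts UTF-16 code units while Python's `len` counts code points, so the 128 limit can differ for non-BMP characters.

```python
import re

UNSAFE_KEYS = {"__proto__", "constructor", "prototype"}
_CONTROL = re.compile(r"[\x00-\x1F\x7F]")


def is_safe_identifier(v: object) -> bool:
    """Python port of the fixture's isSafeIdentifier denylist."""
    if not isinstance(v, str):
        return False
    if len(v) == 0 or len(v) > 128:
        return False
    if v in UNSAFE_KEYS:
        return False
    if ".." in v:
        return False
    if "/" in v or "\\" in v:
        return False
    if _CONTROL.search(v):
        return False
    return True
```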


@ -0,0 +1,56 @@
// Nyx CVE benchmark fixture.
//
// CVE: CVE-2026-42353
// GHSA: GHSA-jfgf-83c5-2c4m
// Project: i18next-http-middleware (i18next/i18next-http-middleware)
// License: MIT (https://github.com/i18next/i18next-http-middleware/blob/master/licence)
// Advisory: https://github.com/i18next/i18next-http-middleware/security/advisories/GHSA-jfgf-83c5-2c4m
// Vulnerable: a1d92a8f03292644d1c6fa83f1b77121d39daf4d lib/index.js:229-234,246-261
//
// Pre-3.9.3 `getResourcesHandler` pulled `lng` and `ns` directly from
// `options.getQuery(req)` (default: `req => req.query`) and forwarded the
// split values into `i18next.services.backendConnector.load(...)` with no
// sanitisation. Paired with `i18next-fs-backend`, the backend's
// `Backend.read` calls `interpolator.interpolate(loadPath, { lng, ns })`
// which substitutes the unsanitised values into a path template and then
// `readFileSync(filename)`, so a request like
// `GET /locales/resources.json?lng=../../etc/passwd&ns=root` reads
// attacker-chosen files off disk. The advisory also flags the SSRF
// variant when paired with `i18next-http-backend`; we model the
// fs-backend path here because it is the more direct sink-flow shape.
//
// Trims: getResourcesHandler's caching headers (lib/index.js:213-227),
// route-params fallback (L237-244), Response/JSON envelope branch
// (L264-268), the full Backend class wrapper (read/save/create/queue/
// debounce — only the inline interpolation + readFileSync are
// load-bearing), `extendOptionsWithDefaults`, the Backend constructor
// path, `loadPath` typeof-function escape hatch, getResourceBundle
// roundtrip, and the express-router/middleware mount glue.
//
// Patched-form simplification: the upstream interpolator is
// `i18next.services.interpolator.interpolate(loadPath, { lng, ns })`;
// here it is inlined as a template literal because the interpolator
// just substitutes `{{lng}}` and `{{ns}}` placeholders into `loadPath`
// (the default loadPath is `/locales/{{lng}}/{{ns}}.json`). The
// substitution is character-for-character equivalent for the
// load-bearing flow path (lng/ns into the string).
const fs = require('fs');
const express = require('express');
const app = express();
app.get('/locales/resources.json', (req, res) => {
let languages = req.query.lng
? req.query.lng.split(' ')
: [];
let namespaces = req.query.ns
? req.query.ns.split(' ')
: [];
// Inline the backend's read() and forEach loop's body verbatim,
// collapsing the call into the array-index access used by the
// recall test (see disabled_reason in ground_truth.json).
const lng = languages[0];
const ns = namespaces[0];
const filename = `/locales/${lng}/${ns}.json`;
fs.readFileSync(filename);
});
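To see why the inlined template literal is a traversal sink, the placeholder substitution can be modelled in Python (hypothetical helpers `interpolate` and `effective_path`) and the result normalised with `posixpath.normpath`. Because the template always appends `.json`, this sketch only demonstrates the escape from `/locales/`; which files are reachable in practice depends on the deployed `loadPath`.

```python
import posixpath

LOAD_PATH = "/locales/{{lng}}/{{ns}}.json"  # default loadPath per the header above


def interpolate(load_path: str, lng: str, ns: str) -> str:
    """Plain placeholder substitution, as the fixture's template literal does."""
    return load_path.replace("{{lng}}", lng).replace("{{ns}}", ns)


def effective_path(lng: str, ns: str) -> str:
    """Path after '..' segments are collapsed lexically."""
    return posixpath.normpath(interpolate(LOAD_PATH, lng, ns))
```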


@ -0,0 +1,72 @@
<?php
// Nyx CVE benchmark fixture (patched counterpart).
//
// CVE: CVE-2026-33486
// Project: Roadiz CMS (roadiz/core-bundle-dev-app, lib/Documents)
// License: MIT (https://github.com/roadiz/core-bundle-dev-app/blob/main/LICENSE.md)
// Advisory: https://github.com/advisories/GHSA-rc55-58f4-687g
// Patched: 7904f690a51b88b1c72c02149ebdf85fa81f19f2
// lib/Documents/src/DownloadedFile.php:66-86
//
// Patched fixture: the upstream fix prepends an `if (!self::isSafeRemoteUrl($url))
// return null;` early-return guard before any `fopen($url, ...)` call. The
// guard rejects non-`http`/`https` schemes, blocks `localhost`, and
// requires every resolved IP to be a global (non-private/non-reserved)
// address. Once the early-return clears, `$url` is provably restricted
// to a remote URL, so the subsequent `fopen($url, 'r', false, $ctx)` is
// no longer reachable with attacker-controlled `file://` payloads.
//
// Trims: same as the vulnerable counterpart, plus the caller-fold
// (upstream parameter `string $url` is kept as a `$url = $_REQUEST['url']`
// assignment at the top of `fromUrl`).
//
// Patched-fix simplifications (two):
//
// 1. The upstream `isSafeRemoteUrl` body uses `parse_url` +
// `dns_get_record` + `filter_var(FILTER_VALIDATE_IP,
// FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE)`. The
// fixture inlines `return true;` because the precision-relevant
// signal is the early-return-on-validator-false shape, not the
// validator's body.
//
// 2. Upstream's sink line is `fopen($url, 'r', false, $streamContext)`.
// The fixture replaces that with `file_get_contents($url)` (also an
// SSRF sink in `src/labels/php.rs`) to avoid conflating SSRF-flow
// suppression with `state-resource-leak` from fopen's open-but-not-
// fclose'd handle. fopen's SSRF capability is independently
// exercised by the vulnerable counterpart in this directory.
namespace RZ\Roadiz\Documents;
class DownloadedFile
{
public static function fromUrl(?string $originalName = null): ?DownloadedFile
{
$url = $_REQUEST['url']; // caller-fold for upstream `fromUrl(string $url, ...)` parameter
try {
if (!self::isSafeRemoteUrl($url)) {
return null;
}
$baseName = static::sanitizeFilename(pathinfo($url, PATHINFO_BASENAME));
file_get_contents($url);
} catch (\RuntimeException) {
return null;
}
return null;
}
public static function sanitizeFilename(?string $string): string
{
return $string ?? '';
}
private static function isSafeRemoteUrl(string $url): bool
{
return true;
}
}
DownloadedFile::fromUrl();
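The fixture stubs `isSafeRemoteUrl` to `return true;`. A rough Python analogue of the three checks listed in the patched header (scheme allowlist, `localhost` block, every resolved IP global) is sketched below with `urllib.parse.urlsplit` and `ipaddress`. It is not the upstream PHP code: `ipaddress`'s `is_global` is not an exact match for `FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE`, and the resolver is injected (hypothetically, e.g. a wrapper around `socket.getaddrinfo`) so the check runs without DNS. Resolving separately from the later fetch also leaves a DNS-rebinding window that this sketch does not address.

```python
import ipaddress
from typing import Callable, Iterable
from urllib.parse import urlsplit


def is_safe_remote_url(url: str, resolve: Callable[[str], Iterable[str]]) -> bool:
    """Scheme allowlist, localhost block, and all-resolved-IPs-global check."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # rejects file://, php://, gopher://, ...
    host = parts.hostname  # lower-cased, brackets stripped for IPv6
    if not host or host == "localhost":
        return False
    ips = list(resolve(host))
    if not ips:
        return False
    return all(ipaddress.ip_address(ip).is_global for ip in ips)
```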


@ -0,0 +1,61 @@
<?php
// Nyx CVE benchmark fixture.
//
// CVE: CVE-2026-33486
// Project: Roadiz CMS (roadiz/core-bundle-dev-app, lib/Documents)
// License: MIT (https://github.com/roadiz/core-bundle-dev-app/blob/main/LICENSE.md)
// Advisory: https://github.com/advisories/GHSA-rc55-58f4-687g
// Vulnerable: 17ddb5934cdfe9aa617707081ca4765dc988b1d6
// lib/Documents/src/DownloadedFile.php:66-73
//
// SSRF (and LFI via `file://` scheme) in `DownloadedFile::fromUrl()`. The
// `$url` parameter is fed straight to `fopen($url, 'r')` with no scheme
// allowlist or host validation. A backend caller (Podcast/OEmbed media
// importer, Documents `Add a document → Import from URL` UI) drives the
// helper with attacker-controlled URLs, so a `file:///app/.env` payload
// silently reads the host filesystem into the Documents library.
//
// Trims: removed `getOriginalFilename` / `setOriginalFilename` accessors
// (lines 14-25), `__construct` (lines 28-31), and the post-fopen body
// (`tempnam` / `stream_copy_to_stream` / `setOriginalFilename` /
// `guessExtension` / `isReadable` checks, lines 75-103); they sit after
// the SSRF sink and are not on the source-to-sink flow path. The
// `sanitizeFilename` helper that touches `$url` first is preserved and
// trimmed to a minimal stub since `pathinfo(..., PATHINFO_BASENAME)`
// does not validate the URL scheme.
//
// Caller-fold: the upstream caller chain
// `MediaFinders/AbstractPodcastFinder::*::AbstractFinder::process` →
// `DownloadedFile::fromUrl(string $url)` is folded into a single
// `$url = $_REQUEST['url']` assignment at the top of `fromUrl` so the
// fixture stays single-file. The load-bearing `fopen($url, 'r')` line
// remains verbatim from upstream L70.
namespace RZ\Roadiz\Documents;
class DownloadedFile
{
public static function fromUrl(?string $originalName = null): ?DownloadedFile
{
$url = $_REQUEST['url']; // caller-fold for upstream `fromUrl(string $url, ...)` parameter
try {
$baseName = static::sanitizeFilename(pathinfo($url, PATHINFO_BASENAME));
$distantResource = fopen($url, 'r');
if (false === $distantResource) {
return null;
}
} catch (\RuntimeException) {
return null;
}
return null;
}
public static function sanitizeFilename(?string $string): string
{
return $string ?? '';
}
}
DownloadedFile::fromUrl();


@ -0,0 +1,30 @@
# Nyx CVE benchmark fixture.
#
# CVE: CVE-2023-6568
# Project: MLflow (mlflow/mlflow)
# License: Apache-2.0 (https://github.com/mlflow/mlflow/blob/master/LICENSE.txt)
# Advisory: https://nvd.nist.gov/vuln/detail/CVE-2023-6568
# Patched: 28ff3f94994941e038f2172c6484b65dc4db6ca1 mlflow/server/auth/__init__.py:744-770
#
# The fix replaces the f-string interpolation of the attacker-controlled
# `content_type` header with a static error message. No tainted value
# reaches `make_response`, so the reflected-XSS sink is silent.
from flask import request, make_response
def catch_mlflow_exception(fn):
    return fn
@catch_mlflow_exception
def create_user():
    content_type = request.headers.get("Content-Type")
    if content_type == "application/json":
        return make_response({"user": "ok"})
    else:
        message = (
            "Invalid content type. Must be one of: "
            "application/x-www-form-urlencoded, application/json"
        )
        return make_response(message, 400)


@ -0,0 +1,45 @@
# Nyx CVE benchmark fixture.
#
# CVE: CVE-2023-6568
# Project: MLflow (mlflow/mlflow)
# License: Apache-2.0 (https://github.com/mlflow/mlflow/blob/master/LICENSE.txt)
# Advisory: https://nvd.nist.gov/vuln/detail/CVE-2023-6568
# Vulnerable: 28ff3f94994941e038f2172c6484b65dc4db6ca1~1 mlflow/server/auth/__init__.py:744-766
#
# Reflected Cross-Site Scripting in MLflow's auth server `create_user`
# handler. When a request arrived with an unrecognised `Content-Type`
# header, the handler reflected the attacker-controlled header value
# into a Flask response via an f-string and `make_response(...)`.
# Because `make_response` returns the response unmodified (no escaping)
# and Werkzeug serves the bytes back to the browser as text/html, the
# header reflection becomes XSS in the browser.
#
# Trims:
# - imports / module-level setup (config, store, blueprints L1-30) —
# scaffolding only.
# - non-`create_user` handlers (`get_user`, `update_user_password`,
# `update_user_admin`, all later in the file) — same `make_response`
# call shape but with non-tainted inputs; not the disclosed sink.
# - `flash` / `alert` paths inside `create_user` (form-urlencoded and
# application/json branches) — those branches do not produce the
# reflected XSS; only the `else` branch does.
#
# Verbatim load-bearing lines: `content_type = request.headers.get(
# "Content-Type")` (source) and `return make_response(f"Invalid content
# type: '{content_type}'", 400)` (sink) are byte-for-byte from
# mlflow/server/auth/__init__.py at the pre-fix SHA.
from flask import request, make_response
def catch_mlflow_exception(fn):
    return fn
@catch_mlflow_exception
def create_user():
    content_type = request.headers.get("Content-Type")
    if content_type == "application/json":
        return make_response({"user": "ok"})
    else:
        return make_response(f"Invalid content type: '{content_type}'", 400)
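The upstream fix drops the header value from the message entirely. If a message did need to echo it, HTML-escaping is the usual alternative; the sketch below (hypothetical helper `invalid_content_type_message`, using the standard-library `html.escape`) shows that approach and is not what MLflow shipped.

```python
import html


def invalid_content_type_message(content_type: str) -> str:
    """Reflect the header only after HTML-escaping it (an alternative to
    the static message the upstream fix chose)."""
    return f"Invalid content type: '{html.escape(content_type)}'"
```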


@ -0,0 +1,26 @@
# Nyx CVE benchmark fixture.
#
# CVE: CVE-2024-21513
# Project: LangChain Experimental (langchain-ai/langchain)
# License: MIT (https://github.com/langchain-ai/langchain/blob/master/LICENSE)
# Advisory: https://nvd.nist.gov/vuln/detail/CVE-2024-21513
# Patched: 7b13292e3544b2f5f2bfb8a27a062ea2b0c34561
# libs/experimental/langchain_experimental/sql/vector_sql.py:79-83
#
# The fix removes the `_try_eval` helper entirely and returns the raw
# `db._execute(...)` result without invoking `eval(...)` at all. No
# `eval` sink remains, so `py.code_exec.eval` is silent.
from typing import Any, Dict, List, Union
class SQLDatabase:
    def _execute(self, cmd: str, fetch: str = "all") -> Any:
        ...
def get_result_from_sqldb(
    db: SQLDatabase, cmd: str
) -> Union[str, List[Dict[str, Any]], Dict[str, Any]]:
    result = db._execute(cmd, fetch="all") # type: ignore
    return result


@ -0,0 +1,56 @@
# Nyx CVE benchmark fixture.
#
# CVE: CVE-2024-21513
# Project: LangChain Experimental (langchain-ai/langchain)
# License: MIT (https://github.com/langchain-ai/langchain/blob/master/LICENSE)
# Advisory: https://nvd.nist.gov/vuln/detail/CVE-2024-21513
# Vulnerable: 7b13292e3544b2f5f2bfb8a27a062ea2b0c34561~1
# libs/experimental/langchain_experimental/sql/vector_sql.py:79-98
#
# `langchain_experimental.sql.vector_sql.VectorSQLDatabaseChain` ran
# every value returned from a SQL query through Python's built-in
# `eval(...)` so that string-shaped numbers / lists were converted into
# Python objects. An attacker who could control the database content
# (for example by writing into a vector store backing the chain) could
# return a value such as `__import__("os").system("rm -rf /")` and the
# chain would `eval` it, achieving arbitrary code execution on the
# server hosting the chain.
#
# Trims:
# - imports / non-load-bearing module decls (L1-30 of upstream).
# - `parse(self, text: str)` output-parser method (L70-77) and the
# `VectorSQLDatabaseChain` class body (L101-200) — neither is on
# the disclosed source→sink path.
# - SQLAlchemy / SQLDatabase type hints simplified to `Any` to avoid
# pulling the upstream type chain into the fixture.
#
# Verbatim load-bearing lines: the `_try_eval` helper definition and
# the two dict / list comprehensions inside `get_result_from_sqldb`
# that call `_try_eval(v)` on each query-result value are
# byte-for-byte from vector_sql.py at the pre-fix SHA.

from typing import Any, Dict, List, Union


class SQLDatabase:
    def _execute(self, cmd: str, fetch: str = "all") -> Any:
        ...


def _try_eval(x: Any) -> Any:
    try:
        return eval(x)
    except Exception:
        return x


def get_result_from_sqldb(
    db: SQLDatabase, cmd: str
) -> Union[str, List[Dict[str, Any]], Dict[str, Any]]:
    result = db._execute(cmd, fetch="all") # type: ignore
    if isinstance(result, list):
        return [{k: _try_eval(v) for k, v in dict(d._asdict()).items()} for d in result]
    else:
        return {
            k: _try_eval(v) for k, v in dict(result._asdict()).items() # type: ignore
        }
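This is not part of the fixture. It is a minimal sketch showing how string-shaped literals could be converted without `eval`. Python's `ast.literal_eval` accepts only literal syntax (numbers, strings, bytes, tuples, lists, dicts, sets, booleans, `None`). It raises on anything else, including call expressions such as `__import__(...)`. The helper name `safe_try_eval` is hypothetical. This is also not what the upstream fix did: the patched fixture drops the conversion entirely.

```python
import ast
from typing import Any


def safe_try_eval(x: Any) -> Any:
    """Hypothetical eval-free variant of `_try_eval`.

    Only str values are parsed, and only as Python literals. Anything else,
    or anything that fails to parse as a literal, is returned unchanged.
    """
    if not isinstance(x, str):
        return x
    try:
        return ast.literal_eval(x)
    except (ValueError, TypeError, SyntaxError, RecursionError, MemoryError):
        # Non-literal input (names, calls, attribute access) lands here.
        return x


print(safe_try_eval("[1, 2, 3]"))  # [1, 2, 3]
print(safe_try_eval('__import__("os").system("id")'))  # __import__("os").system("id")
```

One trade-off remains: any text value that happens to parse as a literal is still converted. A column holding the string `"True"`, for example, comes back as the bool `True`. Returning the raw values, as the patched fixture does, avoids that ambiguity.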

@ -0,0 +1,57 @@
# Nyx CVE benchmark fixture.
#
# CVE: CVE-2024-23334
# Project: aiohttp (aio-libs/aiohttp)
# License: Apache-2.0 (https://github.com/aio-libs/aiohttp/blob/master/LICENSE.txt)
# Advisory: https://github.com/aio-libs/aiohttp/security/advisories/GHSA-5h86-8mv2-jq9f
# Patched: 1c335944d6a8b1298baf179b7c0b3069f10c514b aiohttp/web_urldispatcher.py:644-668
#
# The fix splits the previously-unified resolve+containment check so
# that ``relative_to(self._directory)`` runs on *both* arms of the
# ``follow_symlinks`` branch. In the follow-symlinks arm the joined
# path is normalised with ``os.path.normpath`` before it is resolved,
# so a ``..`` segment that climbs out of the static directory raises
# ``ValueError`` from ``relative_to`` and is converted to
# ``HTTPNotFound``; symlinks that live inside the directory are still
# followed to their targets by the later ``resolve()``.
#
# Trims: same as vulnerable.py.
#
# Verbatim load-bearing lines: the rebuilt ``follow_symlinks`` branch
# in ``_handle`` (L644-660), the new ``unresolved_path = self._directory
# .joinpath(filename)`` step, and the ``normalized_path.relative_to(
# self._directory)`` guard are byte-for-byte from
# web_urldispatcher.py:644-660 of the fix commit.

import os
from pathlib import Path

from aiohttp import web
from aiohttp.web import FileResponse, HTTPForbidden, HTTPNotFound, Request, StreamResponse


class StaticResource:
    def __init__(self, directory: str, follow_symlinks: bool = True) -> None:
        self._directory = Path(directory)
        self._follow_symlinks = follow_symlinks
        self._chunk_size = 256 * 1024

    async def _handle(self, request: Request) -> StreamResponse:
        rel_url = request.match_info["filename"]
        try:
            filename = Path(rel_url)
            if filename.anchor:
                raise HTTPForbidden()
            unresolved_path = self._directory.joinpath(filename)
            if self._follow_symlinks:
                normalized_path = Path(os.path.normpath(unresolved_path))
                normalized_path.relative_to(self._directory)
                filepath = normalized_path.resolve()
            else:
                filepath = unresolved_path.resolve()
                filepath.relative_to(self._directory)
        except (ValueError, FileNotFoundError) as error:
            raise HTTPNotFound() from error
        except HTTPForbidden:
            raise
        if filepath.is_file():
            return FileResponse(filepath, chunk_size=self._chunk_size)
        raise HTTPNotFound
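This is not part of the fixture. It is a minimal, filesystem-free sketch of the containment check in the patched `follow_symlinks` arm. The helper `passes_containment_check` and the `/srv/static` root are hypothetical. The sketch uses `PurePosixPath` and `posixpath.normpath` so that the result does not depend on the host OS. It also stops before the `.resolve()` call, which in the fixture only runs once the check has passed.

```python
import posixpath
from pathlib import PurePosixPath


def passes_containment_check(directory: str, rel_url: str) -> bool:
    """Hypothetical pure-path restatement of the patched follow_symlinks arm."""
    base = PurePosixPath(directory)
    filename = PurePosixPath(rel_url)
    if filename.anchor:
        # Absolute names are rejected up front (HTTPForbidden in the fixture).
        return False
    unresolved_path = base.joinpath(filename)
    # Collapse "." and ".." lexically, before any symlink resolution.
    normalized_path = PurePosixPath(posixpath.normpath(str(unresolved_path)))
    try:
        # Component-wise containment test; raises ValueError when outside.
        normalized_path.relative_to(base)
    except ValueError:
        # The fixture maps this to HTTPNotFound.
        return False
    return True


print(passes_containment_check("/srv/static", "css/site.css"))  # True
print(passes_containment_check("/srv/static", "../../etc/passwd"))  # False
print(passes_containment_check("/srv/static", "css/../../static-private/x"))  # False
```

The last call shows why the check compares path components through `relative_to` rather than string prefixes. `/srv/static-private` shares a string prefix with `/srv/static`, but it is not inside it. Because `normpath` is purely lexical, symlinks stored inside the static directory are unaffected by the check and are still followed by the later `.resolve()`.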

@ -0,0 +1,62 @@
# Nyx CVE benchmark fixture.
#
# CVE: CVE-2024-23334
# Project: aiohttp (aio-libs/aiohttp)
# License: Apache-2.0 (https://github.com/aio-libs/aiohttp/blob/master/LICENSE.txt)
# Advisory: https://github.com/aio-libs/aiohttp/security/advisories/GHSA-5h86-8mv2-jq9f
# Vulnerable: 33ccdfb0a12690af5bb49bda2319ec0907fa7827 aiohttp/web_urldispatcher.py:633-648
#
# aiohttp's StaticResource._handle resolved the requested filename
# under the configured static directory and then verified containment
# only when ``follow_symlinks`` was False. When ``follow_symlinks=True``
# the ``filepath.relative_to(self._directory)`` check was skipped, so a
# ``filename`` containing ``..`` segments (for example
# ``../../etc/passwd``) resolved outside the static directory and served
# any file on the filesystem that the worker process could read.
#
# Trims:
# - ``append_version`` branch (L575-588) — separate code path that
# does not feed FileResponse on the disclosed flow.
# - ``HTTPNotFound`` / ``Exception`` handling fall-through after the
# try block (L646-654 of upstream) — irrelevant to source→sink.
# - ``_directory_as_html`` directory-listing branch (L658-708) —
# only ``FileResponse`` is the disclosed sink path.
#
# Verbatim load-bearing lines: the ``rel_url = request.match_info[
# "filename"]`` source, the ``filepath = self._directory.joinpath(
# filename).resolve()`` path composition, the missing ``relative_to``
# guard inside the ``if not self._follow_symlinks`` branch, and the
# ``return FileResponse(filepath, chunk_size=self._chunk_size)`` sink
# are byte-for-byte from web_urldispatcher.py:633-648 and L666-668.

from pathlib import Path

from aiohttp import web
from aiohttp.web import FileResponse, HTTPForbidden, HTTPNotFound, Request, StreamResponse


class StaticResource:
    def __init__(self, directory: str, follow_symlinks: bool = True) -> None:
        self._directory = Path(directory)
        self._follow_symlinks = follow_symlinks
        self._chunk_size = 256 * 1024

    async def _handle(self, request: Request) -> StreamResponse:
        rel_url = request.match_info["filename"]
        try:
            filename = Path(rel_url)
            if filename.anchor:
                # rel_url is an absolute name like
                # /static/\\machine_name\c$ or /static/D:\path
                # where the static dir is totally different
                raise HTTPForbidden()
            filepath = self._directory.joinpath(filename).resolve()
            if not self._follow_symlinks:
                filepath.relative_to(self._directory)
        except (ValueError, FileNotFoundError) as error:
            raise HTTPNotFound() from error
        except HTTPForbidden:
            raise
        if filepath.is_file():
            return FileResponse(filepath, chunk_size=self._chunk_size)
        raise HTTPNotFound
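This is not part of the fixture. It is a minimal sketch of why the `follow_symlinks=True` arm was exploitable. The hypothetical helper below replays the vulnerable composition on pure paths. It uses `posixpath.normpath` as a stand-in for `Path.resolve()`, since the two agree when no symlinks are involved, and assumes a static root of `/srv/static`. The anchor check rejects absolute names, but a relative name made of `..` segments passes it and lands outside the root.

```python
import posixpath
from pathlib import PurePosixPath


def compose_like_vulnerable(directory: str, rel_url: str) -> PurePosixPath:
    """Hypothetical: join + normalise with no relative_to() guard afterwards."""
    filename = PurePosixPath(rel_url)
    if filename.anchor:
        # The only pre-check in the vulnerable arm; relative names pass it.
        raise PermissionError("absolute name rejected")
    joined = PurePosixPath(directory).joinpath(filename)
    # Stand-in for .resolve(): collapses ".." without consulting the filesystem.
    return PurePosixPath(posixpath.normpath(str(joined)))


print(PurePosixPath("../../etc/passwd").anchor == "")  # True
print(compose_like_vulnerable("/srv/static", "../../etc/passwd"))  # /etc/passwd
```

Nothing between this composition and the `FileResponse(filepath, ...)` sink compares the result with the static root.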

@ -1,6 +1,6 @@
{
"benchmark_version": "1.0",
"timestamp": "2026-05-03T17:00:35Z",
"timestamp": "2026-05-04T17:11:50Z",
"scanner_version": "0.6.1",
"scanner_config": {
"analysis_mode": "Full",
@ -9,10 +9,10 @@
"state_analysis_enabled": true,
"worker_threads": 1
},
"ground_truth_hash": "sha256:1d6ed97196d3ff0844320a79ac607983245dd73af5455bcf77f6ac6a212c5e45",
"corpus_size": 533,
"cases_run": 532,
"cases_skipped": 1,
"ground_truth_hash": "sha256:414494ab1b6881a9b78eca38e26561231f78767480399fda73a477e23a9fcbaa",
"corpus_size": 565,
"cases_run": 562,
"cases_skipped": 3,
"outcomes": [
{
"case_id": "c-buf-001",
@ -1656,6 +1656,40 @@
"security_finding_count": 1,
"non_security_finding_count": 0
},
{
"case_id": "cve-js-2026-42353-patched",
"file": "cve_corpus/javascript/CVE-2026-42353/patched.js",
"language": "javascript",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "cve-js-2026-42353-vulnerable",
"file": "cve_corpus/javascript/CVE-2026-42353/vulnerable.js",
"language": "javascript",
"vuln_class": "path_traversal",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": null,
"matched_rule_ids": [
"taint-unsanitised-flow (source 44:9)"
],
"unexpected_rule_ids": [],
"all_finding_ids": [
"taint-unsanitised-flow (source 44:9)"
],
"security_finding_count": 1,
"non_security_finding_count": 0
},
{
"case_id": "cve-php-2017-9841-patched",
"file": "cve_corpus/php/CVE-2017-9841/patched.php",
@ -1728,6 +1762,43 @@
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
"case_id": "cve-php-2026-33486-patched",
"file": "cve_corpus/php/CVE-2026-33486/patched.php",
"language": "php",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "cve-php-2026-33486-vulnerable",
"file": "cve_corpus/php/CVE-2026-33486/vulnerable.php",
"language": "php",
"vuln_class": "ssrf",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"taint-unsanitised-flow (source 40:9)"
],
"unexpected_rule_ids": [
"state-resource-leak"
],
"all_finding_ids": [
"state-resource-leak",
"taint-unsanitised-flow (source 40:9)"
],
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
"case_id": "cve-py-2017-18342-patched",
"file": "cve_corpus/python/CVE-2017-18342/patched.py",
@ -1800,6 +1871,113 @@
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
"case_id": "cve-py-2023-6568-patched",
"file": "cve_corpus/python/CVE-2023-6568/patched.py",
"language": "python",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "cve-py-2023-6568-vulnerable",
"file": "cve_corpus/python/CVE-2023-6568/vulnerable.py",
"language": "python",
"vuln_class": "xss",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"py.xss.make_response_format",
"taint-unsanitised-flow (source 41:20)"
],
"unexpected_rule_ids": [],
"all_finding_ids": [
"py.xss.make_response_format",
"taint-unsanitised-flow (source 41:20)"
],
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
"case_id": "cve-py-2024-21513-patched",
"file": "cve_corpus/python/CVE-2024-21513/patched.py",
"language": "python",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "cve-py-2024-21513-vulnerable",
"file": "cve_corpus/python/CVE-2024-21513/vulnerable.py",
"language": "python",
"vuln_class": "code_exec",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"py.code_exec.eval"
],
"unexpected_rule_ids": [
"cfg-unguarded-sink"
],
"all_finding_ids": [
"cfg-unguarded-sink",
"py.code_exec.eval"
],
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
"case_id": "cve-py-2024-23334-patched",
"file": "cve_corpus/python/CVE-2024-23334/patched.py",
"language": "python",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "cve-py-2024-23334-vulnerable",
"file": "cve_corpus/python/CVE-2024-23334/vulnerable.py",
"language": "python",
"vuln_class": "path_traversal",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"taint-unsanitised-flow (source 45:9)"
],
"unexpected_rule_ids": [],
"all_finding_ids": [
"taint-unsanitised-flow (source 45:9)"
],
"security_finding_count": 1,
"non_security_finding_count": 0
},
{
"case_id": "cve-py-2025-69662-patched",
"file": "cve_corpus/python/CVE-2025-69662/patched.py",
@ -3084,6 +3262,21 @@
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "go-safe-realrepo-019",
"file": "go/safe/safe_dao_helper_id_scalar.go",
"language": "go",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "go-sqli-001",
"file": "go/sqli/sqli_concat.go",
@ -3811,6 +4004,21 @@
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "java-safe-realrepo-openmrs-001",
"file": "java/safe/SafeJpaCriteriaQuery.java",
"language": "java",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "java-safe-stmt-execute-validated",
"file": "java/safe/safe_statement_execute_pattern_validated.java",
@ -4260,6 +4468,25 @@
"security_finding_count": 1,
"non_security_finding_count": 0
},
{
"case_id": "js-path_traversal-ternary-source-001",
"file": "javascript/path_traversal/path_traversal_ternary_source.js",
"language": "javascript",
"vuln_class": "path_traversal",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": null,
"matched_rule_ids": [
"taint-unsanitised-flow (source 15:29)"
],
"unexpected_rule_ids": [],
"all_finding_ids": [
"taint-unsanitised-flow (source 15:29)"
],
"security_finding_count": 1,
"non_security_finding_count": 0
},
{
"case_id": "js-pathprune-safe-001",
"file": "javascript/path_pruning/safe_early_return.js",
@ -4575,6 +4802,21 @@
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "js-safe-ternary-const-branches",
"file": "javascript/safe/safe_ternary_const_branches.js",
"language": "javascript",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "js-sqli-001",
"file": "javascript/sqli/sqli_concat.js",
@ -4960,6 +5202,29 @@
"security_finding_count": 3,
"non_security_finding_count": 0
},
{
"case_id": "php-deser-003",
"file": "php/deser/deser_unserialize_method_named_unserialize_with_user_input.php",
"language": "php",
"vuln_class": "deser",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": null,
"matched_rule_ids": [
"php.deser.unserialize",
"taint-unsanitised-flow (source 13:38)",
"php.deser.unserialize"
],
"unexpected_rule_ids": [],
"all_finding_ids": [
"php.deser.unserialize",
"taint-unsanitised-flow (source 13:38)",
"php.deser.unserialize"
],
"security_finding_count": 3,
"non_security_finding_count": 0
},
{
"case_id": "php-interproc-001",
"file": "php/interprocedural/interproc_taint_propagation.php",
@ -5295,6 +5560,36 @@
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "php-safe-020",
"file": "php/safe/safe_serializable_magic_method_unserialize.php",
"language": "php",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "php-safe-camelcase-validator-001",
"file": "php/safe/safe_camelcase_validator_negated.php",
"language": "php",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "php-safe-filter-001",
"file": "php/safe/safe_filter_input.php",
@ -5392,6 +5687,28 @@
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
"case_id": "php-ssrf-002",
"file": "php/ssrf/ssrf_class_method_fopen.php",
"language": "php",
"vuln_class": "ssrf",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"taint-unsanitised-flow (source 14:9)"
],
"unexpected_rule_ids": [
"cfg-resource-leak"
],
"all_finding_ids": [
"cfg-resource-leak",
"taint-unsanitised-flow (source 14:9)"
],
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
"case_id": "php-ssrf-safe-001",
"file": "php/ssrf/safe_ssrf_hardcoded.php",
@ -5639,6 +5956,184 @@
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "py-auth-realrepo-011",
"file": "python/safe/safe_bare_callee_no_receiver.py",
"language": "python",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "py-auth-realrepo-012",
"file": "python/safe/safe_local_set_update_no_orm.py",
"language": "python",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "py-auth-realrepo-013",
"file": "python/auth/vuln_local_set_with_user_id_query.py",
"language": "python",
"vuln_class": "auth",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"py.auth.missing_ownership_check",
"py.auth.missing_ownership_check"
],
"unexpected_rule_ids": [],
"all_finding_ids": [
"py.auth.missing_ownership_check",
"py.auth.missing_ownership_check"
],
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
"case_id": "py-auth-realrepo-014",
"file": "python/auth/vuln_fastapi_route_no_dependencies_sqla.py",
"language": "python",
"vuln_class": "auth",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"py.auth.missing_ownership_check"
],
"unexpected_rule_ids": [],
"all_finding_ids": [
"py.auth.missing_ownership_check"
],
"security_finding_count": 1,
"non_security_finding_count": 0
},
{
"case_id": "py-auth-realrepo-015",
"file": "python/safe/safe_fastapi_route_security_scopes.py",
"language": "python",
"vuln_class": "auth",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "py-auth-realrepo-016",
"file": "python/auth/vuln_fastapi_route_security_no_scopes.py",
"language": "python",
"vuln_class": "auth",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"py.auth.missing_ownership_check"
],
"unexpected_rule_ids": [],
"all_finding_ids": [
"py.auth.missing_ownership_check"
],
"security_finding_count": 1,
"non_security_finding_count": 0
},
{
"case_id": "py-auth-realrepo-017",
"file": "python/safe/safe_fastapi_router_level_security_scopes.py",
"language": "python",
"vuln_class": "auth",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "py-auth-realrepo-018",
"file": "python/auth/vuln_fastapi_router_no_dependencies.py",
"language": "python",
"vuln_class": "auth",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"py.auth.missing_ownership_check"
],
"unexpected_rule_ids": [
"py.auth.token_override_without_validation"
],
"all_finding_ids": [
"py.auth.missing_ownership_check",
"py.auth.token_override_without_validation"
],
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
"case_id": "py-auth-realrepo-019",
"file": "python/safe/safe_caller_scope_helper_under_authorized_route.py",
"language": "python",
"vuln_class": "auth",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "py-auth-realrepo-020",
"file": "python/auth/vuln_caller_scope_helper_under_bare_route.py",
"language": "python",
"vuln_class": "auth",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"py.auth.missing_ownership_check"
],
"unexpected_rule_ids": [
"py.auth.token_override_without_validation"
],
"all_finding_ids": [
"py.auth.missing_ownership_check",
"py.auth.token_override_without_validation"
],
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
"case_id": "py-cmdi-001",
"file": "python/cmdi/cmdi_direct.py",
@ -5962,6 +6457,25 @@
"security_finding_count": 1,
"non_security_finding_count": 0
},
{
"case_id": "py-path_traversal-no-relative-to",
"file": "python/path_traversal/path_traversal_no_relative_to.py",
"language": "python",
"vuln_class": "path_traversal",
"is_vulnerable": true,
"outcome_file_level": "TP",
"outcome_rule_level": "TP",
"outcome_location_level": "TP",
"matched_rule_ids": [
"taint-unsanitised-flow (source 11:15)"
],
"unexpected_rule_ids": [],
"all_finding_ids": [
"taint-unsanitised-flow (source 11:15)"
],
"security_finding_count": 1,
"non_security_finding_count": 0
},
{
"case_id": "py-pathprune-safe-001",
"file": "python/path_pruning/safe_early_return.py",
@ -6187,6 +6701,21 @@
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "py-safe-relative-to-validator",
"file": "python/safe/safe_relative_to_validator.py",
"language": "python",
"vuln_class": "safe",
"is_vulnerable": false,
"outcome_file_level": "TN",
"outcome_rule_level": "TN",
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"security_finding_count": 0,
"non_security_finding_count": 0
},
{
"case_id": "py-sqli-001",
"file": "python/sqli/sqli_concat.py",
@ -6343,11 +6872,14 @@
"matched_rule_ids": [
"taint-unsanitised-flow (source 4:12)"
],
"unexpected_rule_ids": [],
"unexpected_rule_ids": [
"py.xss.make_response_format"
],
"all_finding_ids": [
"py.xss.make_response_format",
"taint-unsanitised-flow (source 4:12)"
],
"security_finding_count": 1,
"security_finding_count": 2,
"non_security_finding_count": 0
},
{
@ -8494,9 +9026,11 @@
"outcome_location_level": null,
"matched_rule_ids": [],
"unexpected_rule_ids": [],
"all_finding_ids": [],
"all_finding_ids": [
"ts.quality.any_annotation"
],
"security_finding_count": 0,
"non_security_finding_count": 0
"non_security_finding_count": 1
},
{
"case_id": "ts-auth-realrepo-002",
@ -8512,12 +9046,10 @@
],
"unexpected_rule_ids": [],
"all_finding_ids": [
"ts.quality.any_annotation",
"ts.quality.any_annotation",
"js.auth.missing_ownership_check"
],
"security_finding_count": 1,
"non_security_finding_count": 2
"non_security_finding_count": 0
},
{
"case_id": "ts-auth-realrepo-003",
@ -9512,19 +10044,19 @@
}
],
"aggregate_file_level": {
"tp": 261,
"tp": 275,
"fp": 0,
"fn_": 0,
"tn": 271,
"tn": 287,
"precision": 1.0,
"recall": 1.0,
"f1": 1.0
},
"aggregate_rule_level": {
"tp": 261,
"tp": 275,
"fp": 0,
"fn_": 0,
"tn": 271,
"tn": 287,
"precision": 1.0,
"recall": 1.0,
"f1": 1.0
@ -9552,7 +10084,7 @@
"tp": 30,
"fp": 0,
"fn_": 0,
"tn": 35,
"tn": 36,
"precision": 1.0,
"recall": 1.0,
"f1": 1.0
@ -9561,34 +10093,34 @@
"tp": 23,
"fp": 0,
"fn_": 0,
"tn": 22,
"tn": 23,
"precision": 1.0,
"recall": 1.0,
"f1": 1.0
},
"javascript": {
"tp": 23,
"tp": 25,
"fp": 0,
"fn_": 0,
"tn": 30,
"tn": 32,
"precision": 1.0,
"recall": 1.0,
"f1": 1.0
},
"php": {
"tp": 19,
"tp": 22,
"fp": 0,
"fn_": 0,
"tn": 20,
"tn": 23,
"precision": 1.0,
"recall": 1.0,
"f1": 1.0
},
"python": {
"tp": 29,
"tp": 38,
"fp": 0,
"fn_": 0,
"tn": 32,
"tn": 41,
"precision": 1.0,
"recall": 1.0,
"f1": 1.0
@ -9623,10 +10155,10 @@
},
"by_vuln_class": {
"auth": {
"tp": 20,
"tp": 25,
"fp": 0,
"fn_": 0,
"tn": 0,
"tn": 3,
"precision": 1.0,
"recall": 1.0,
"f1": 1.0
@ -9650,7 +10182,7 @@
"f1": 1.0
},
"code_exec": {
"tp": 4,
"tp": 5,
"fp": 0,
"fn_": 0,
"tn": 0,
@ -9686,7 +10218,7 @@
"f1": 1.0
},
"deser": {
"tp": 8,
"tp": 9,
"fp": 0,
"fn_": 0,
"tn": 0,
@ -9731,7 +10263,7 @@
"f1": 1.0
},
"path_traversal": {
"tp": 28,
"tp": 32,
"fp": 0,
"fn_": 0,
"tn": 0,
@ -9761,7 +10293,7 @@
"tp": 0,
"fp": 0,
"fn_": 0,
"tn": 271,
"tn": 284,
"precision": 1.0,
"recall": 1.0,
"f1": 1.0
@ -9794,7 +10326,7 @@
"f1": 1.0
},
"ssrf": {
"tp": 30,
"tp": 32,
"fp": 0,
"fn_": 0,
"tn": 0,
@ -9803,7 +10335,7 @@
"f1": 1.0
},
"xss": {
"tp": 23,
"tp": 24,
"fp": 0,
"fn_": 0,
"tn": 0,
@ -9814,31 +10346,31 @@
},
"by_confidence": {
">=High": {
"tp": 88,
"fp": 100,
"fn_": 173,
"tn": 171,
"precision": 0.46808510638297873,
"recall": 0.3371647509578544,
"f1": 0.3919821826280624
"tp": 85,
"fp": 114,
"fn_": 190,
"tn": 173,
"precision": 0.4271356783919598,
"recall": 0.3090909090909091,
"f1": 0.3586497890295359
},
">=Low": {
"tp": 90,
"fp": 120,
"fn_": 171,
"tn": 151,
"precision": 0.42857142857142855,
"recall": 0.3448275862068966,
"f1": 0.3821656050955414
"tp": 86,
"fp": 142,
"fn_": 189,
"tn": 145,
"precision": 0.37719298245614036,
"recall": 0.31272727272727274,
"f1": 0.341948310139165
},
">=Medium": {
"tp": 90,
"fp": 116,
"fn_": 171,
"tn": 155,
"precision": 0.4368932038834951,
"recall": 0.3448275862068966,
"f1": 0.38543897216274087
"tp": 86,
"fp": 133,
"fn_": 189,
"tn": 154,
"precision": 0.3926940639269406,
"recall": 0.31272727272727274,
"f1": 0.3481781376518218
}
}
}
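The `by_confidence` scores match the standard confusion-matrix definitions: precision = tp / (tp + fp), recall = tp / (tp + fn), and F1 is their harmonic mean. The sketch below recomputes the `>=High` row; `tn` plays no part in any of the three. The `safe` class row reports 1.0 when a denominator is zero, so the helper mirrors that convention. How the benchmark itself handles 0/0 is an assumption read off that row.

```python
def scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from confusion-matrix counts."""
    # Mirror the report's apparent convention of 1.0 for an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


p, r, f1 = scores(tp=85, fp=114, fn=190)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
# precision=0.4271 recall=0.3091 f1=0.3586
```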

@ -0,0 +1,53 @@
//! Cross-file FastAPI `include_router(child)` parent-dep propagation.
//!
//! Distilled from airflow
//! `airflow-core/src/airflow/api_fastapi/execution_api/routes/`:
//! `__init__.py` declares
//! `authenticated_router = VersionedAPIRouter(dependencies=[Security(require_auth)])`
//! and lifts every per-file child router via
//! `authenticated_router.include_router(<child>.router, ...)`. FastAPI's
//! runtime propagates the parent's `dependencies=[...]` onto every route
//! attached to the child router, including bare child routers declared
//! without inline deps.
//!
//! Pre-fix: the per-file router-dep extractor only saw inline declarations,
//! so bare child routers (`router = VersionedAPIRouter()`) fired
//! `missing_ownership_check` / `token_override_without_validation`
//! despite being authorized via the cross-file `include_router` chain.
//!
//! Post-fix: pass 1 persists per-file `PerFileRouterFacts` (router-level
//! deps + include_router edges) into
//! `GlobalSummaries.router_facts_by_module`; pass 2 resolves the
//! cross-file lift via `resolve_cross_file_router_deps_for_file` and
//! pre-populates `AuthorizationModel.cross_file_router_deps` before the
//! FlaskExtractor runs. Cross-file `Security(...)` markers are flagged
//! scoped-equivalent (architectural intent of include_router auth
//! scoping), so `inject_middleware_auth` promotes the kind to `Other`
//! and ownership checks see the route as authorized.
//!
//! Recall guard: `public_health.py` is attached to `execution_api_router`
//! which has NO `dependencies=[...]` kwarg. Routes there are genuinely
//! unauthorized — `missing_ownership_check` must still fire. Without
//! this guard, an over-broad cross-file lift (e.g. blanket "every
//! include_router target inherits any parent's auth") would silently
//! suppress real findings.

mod common;

use common::{scan_fixture_dir, validate_expectations};
use nyx_scanner::utils::config::AnalysisMode;
use std::path::{Path, PathBuf};

fn fixture_path(name: &str) -> PathBuf {
    Path::new(env!("CARGO_MANIFEST_DIR"))
        .join("tests")
        .join("fixtures")
        .join(name)
}

#[test]
fn fastapi_cross_file_include_router_lifts_parent_security_onto_child_router() {
    let dir = fixture_path("auth_analysis_fastapi_cross_file_include_router");
    let diags = scan_fixture_dir(&dir, AnalysisMode::Full);
    validate_expectations(&diags, &dir);
}
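The lift described in the test's doc comment can be pictured as a walk over `include_router` edges. The sketch below is a hypothetical Python model. It is not Nyx's `PerFileRouterFacts` or `resolve_cross_file_router_deps_for_file`. Routers are plain string ids, and each include edge is a `(parent, child)` pair. A router counts as covered only when every mount chain that reaches it carries at least one dependency, and that rule is what keeps the `public_health.py` recall guard firing.

```python
from dataclasses import dataclass, field


@dataclass
class RouterGraph:
    """Hypothetical cross-file router facts: inline deps plus include edges."""

    # router id -> dependencies declared inline on that router
    own_deps: dict[str, list[str]] = field(default_factory=dict)
    # (parent id, child id) for each `parent.include_router(child)` call
    includes: list[tuple[str, str]] = field(default_factory=list)

    def dep_chains(self, router: str, seen: frozenset = frozenset()) -> list[list[str]]:
        """Return one inherited dependency list per include chain mounting `router`."""
        own = self.own_deps.get(router, [])
        parents = [p for p, c in self.includes if c == router and p not in seen]
        if not parents:
            # Top-level (or unmounted) router: only its inline deps apply.
            return [own]
        chains = []
        for parent in parents:
            # Parent deps come first, then the child's own inline deps.
            for inherited in self.dep_chains(parent, seen | {router}):
                chains.append(inherited + own)
        return chains

    def is_covered(self, router: str) -> bool:
        """Covered only if every mount chain carries at least one dependency."""
        return all(self.dep_chains(router))


graph = RouterGraph(
    own_deps={"routes.authenticated_router": ["Security(require_auth)"]},
    includes=[
        ("routes.execution_api_router", "public_health.router"),
        ("routes.authenticated_router", "task_instances.router"),
        ("routes.authenticated_router", "dag_runs.router"),
        ("routes.execution_api_router", "routes.authenticated_router"),
    ],
)
print(graph.is_covered("task_instances.router"))  # True
print(graph.is_covered("public_health.router"))  # False
```

The all-chains rule is deliberately conservative. A router mounted both under an authenticated parent and under a bare one stays uncovered.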

@ -0,0 +1,33 @@
{
"required_findings": [
{ "id_prefix": "py.auth.missing_ownership_check", "min_count": 1 }
],
"forbidden_findings": [
{
"id_prefix": "py.auth.missing_ownership_check",
"file_glob": "**/task_instances.py"
},
{
"id_prefix": "py.auth.missing_ownership_check",
"file_glob": "**/dag_runs.py"
},
{
"id_prefix": "py.auth.token_override_without_validation",
"file_glob": "**/task_instances.py"
},
{
"id_prefix": "py.auth.token_override_without_validation",
"file_glob": "**/dag_runs.py"
}
],
"noise_budget": {
"max_total_findings": 8,
"max_high_findings": 4
},
"performance_expectations": {
"max_ms_no_index": 1500,
"max_ms_index_cold": 2000,
"max_ms_index_warm": 800,
"ci_mode": "lenient"
}
}
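The following is a minimal sketch of how an expectations file like this one might be evaluated. It is hypothetical: the real `validate_expectations` helper lives in the test's `common` module, which is not shown in this diff. The semantics below are assumptions read off the field names: a prefix match on the rule id, `fnmatchcase` for `file_glob`, and severities compared as strings. The sketch ignores `performance_expectations`.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Finding(NamedTuple):
    rule_id: str
    path: str
    severity: str  # e.g. "High", "Medium", "Low"


def check(findings: list[Finding], spec: dict) -> list[str]:
    """Return human-readable violations; an empty list means the fixture passes."""
    problems = []
    for req in spec.get("required_findings", []):
        hits = [f for f in findings if f.rule_id.startswith(req["id_prefix"])]
        if len(hits) < req.get("min_count", 1):
            problems.append(f"missing {req['id_prefix']} (got {len(hits)})")
    for rule in spec.get("forbidden_findings", []):
        glob = rule.get("file_glob", "*")
        for f in findings:
            if f.rule_id.startswith(rule["id_prefix"]) and fnmatchcase(f.path, glob):
                problems.append(f"forbidden {f.rule_id} in {f.path}")
    budget = spec.get("noise_budget", {})
    if len(findings) > budget.get("max_total_findings", len(findings)):
        problems.append("noise budget exceeded (total)")
    high = sum(1 for f in findings if f.severity == "High")
    if high > budget.get("max_high_findings", high):
        problems.append("noise budget exceeded (High)")
    return problems
```

With this fixture's spec, a single `missing_ownership_check` in `routes/public_health.py` passes. The same rule firing in `routes/task_instances.py` is reported as forbidden.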

@ -0,0 +1,30 @@
# Distilled from airflow `airflow-core/src/airflow/api_fastapi/execution_api/routes/__init__.py`.
# Parent file declares an authorized router carrying scoped Security deps,
# then attaches every per-file child router via `include_router(...)`.
# FastAPI runtime lifts the parent's `dependencies=[...]` onto every route
# attached to the child router — including bare child routers declared
# without inline deps — so routes inside child files inherit the auth
# automatically.
#
# Pre-fix the per-file router-dep extractor only saw inline declarations;
# bare child routers fired `missing_ownership_check` /
# `token_override_without_validation` despite being authorized via the
# `include_router` parent. The cross-file router-fact index resolves the
# parent-child lift at pass 2 entry.

from cadwyn import VersionedAPIRouter
from fastapi import APIRouter, Security

from . import task_instances, dag_runs, public_health
from .security import require_auth

execution_api_router = APIRouter()
execution_api_router.include_router(public_health.router, prefix="/health", tags=["Health"])

# All routes attached to this router are authenticated via Security(require_auth).
authenticated_router = VersionedAPIRouter(dependencies=[Security(require_auth)])
authenticated_router.include_router(
    task_instances.router, prefix="/task-instances", tags=["Task Instances"]
)
authenticated_router.include_router(dag_runs.router, prefix="/dag-runs", tags=["Dag Runs"])
execution_api_router.include_router(authenticated_router)

@ -0,0 +1,30 @@
"""Second bare child router — same shape as task_instances.py."""
from typing import Annotated

from fastapi import Body
from cadwyn import VersionedAPIRouter

router = VersionedAPIRouter()


@router.put("/{dag_run_id}/clear")
def clear_dag_run(
    dag_run_id: str,
    body: Annotated[dict, Body()],
):
    """Bare-child route — auth via parent's include_router lift."""
    session = _get_session()
    session.add(
        DagRunRow(dag_run_id=dag_run_id, cleared=body.get("clear", False))
    )
    session.commit()


def _get_session():
    raise NotImplementedError


class DagRunRow:
    def __init__(self, dag_run_id: str, cleared: bool) -> None:
        self.dag_run_id = dag_run_id
        self.cleared = cleared

@ -0,0 +1,52 @@
"""Public router — NOT attached via authenticated_router, no auth lift.
The parent file declares
`execution_api_router.include_router(public_health.router, prefix="/health")`
where `execution_api_router = APIRouter()` has NO dependencies. Every
route here is genuinely public no inline auth, no cross-file lift.
The vulnerability counterpart in this fixture: the route below writes
a row keyed by an id-like path param, with no auth covering it.
The auth analysis must still fire `missing_ownership_check` here
recall guard for the cross-file resolution. If the cross-file lift
over-applies (e.g. blanket "any router covered by include_router gets
the parent's deps" without checking that the parent itself has deps),
this finding would silently disappear and we would lose the vuln
detection."""
from typing import Annotated
from fastapi import Body
from cadwyn import VersionedAPIRouter
router = VersionedAPIRouter()
@router.put("/{log_id}/payload")
def public_update_log(
log_id: str,
body: Annotated[dict, Body()],
):
"""Public route — no auth covers this id-targeted write.
`log_id` is a path param the route accepted from the URL. The
write is keyed by that id with no ownership check exactly the
shape `py.auth.missing_ownership_check` is designed to flag.
"""
session = _get_session()
session.add(
HealthLogRow(
log_id=log_id,
payload=body.get("payload", ""),
)
)
session.commit()
def _get_session():
raise NotImplementedError
class HealthLogRow:
def __init__(self, log_id: str, payload: str) -> None:
self.log_id = log_id
self.payload = payload
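The public-route docstring above describes the shape `py.auth.missing_ownership_check` targets: a write keyed by an id-like path param on a route that no auth covers. Below is a minimal sketch of that predicate. It assumes the scanner has already reduced each route to a hypothetical `RouteFacts` summary holding the path template, whether inline or lifted auth covers the route, and which params key a write. `RouteFacts`, `path_params`, and the `_id`-suffix heuristic are illustrative only and are not the analyzer's actual implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Illustrative heuristic: a param named `id` or ending in `_id` looks like a record key.
ID_LIKE = re.compile(r"(^id$|_id$)")


@dataclass
class RouteFacts:
    """Hypothetical per-route summary a scanner might build."""
    path: str        # FastAPI-style template, e.g. "/{log_id}/payload"
    has_auth: bool   # inline deps OR deps lifted from a parent router
    write_keys: set[str] = field(default_factory=set)  # params that key a write


def path_params(path: str) -> set[str]:
    """Extract `{name}` placeholders from a path template."""
    return set(re.findall(r"\{(\w+)\}", path))


def missing_ownership_check(route: RouteFacts) -> list[str]:
    """Id-like path params that key a write on a route no auth covers."""
    if route.has_auth:
        return []
    keyed = path_params(route.path) & route.write_keys
    return sorted(p for p in keyed if ID_LIKE.search(p))


public = RouteFacts("/{log_id}/payload", has_auth=False, write_keys={"log_id"})
covered = RouteFacts("/{task_instance_id}/state", has_auth=True,
                     write_keys={"task_instance_id"})
print(missing_ownership_check(public))   # ['log_id']
print(missing_ownership_check(covered))  # []
```

Suppose `has_auth` is computed after the cross-file lift. The public `/{log_id}/payload` route still yields `['log_id']`. A bare child whose parent router carries `Security(require_auth)` yields nothing.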

@ -0,0 +1,13 @@
"""Stub for the auth dependency callable referenced by the parent router."""
from typing import Annotated
def require_auth():
"""Validates a bearer JWT, raises HTTPException(401) on failure.
Real airflow uses a more elaborate version that talks to a JWT
validator and the token-recipient table; for this fixture the
declaration-only stub is enough: the auth analysis cares about
the route-level wrapper, not the body.
"""
return None
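The stub's docstring says the real dependency validates a bearer JWT and raises `HTTPException(401)` on failure. Below is a stdlib-only sketch of that behavior. It assumes an HS256-signed token and a shared secret. `Unauthorized` is a hypothetical stand-in for FastAPI's `HTTPException`. This is not Airflow's validator, which the docstring notes also consults the token-recipient table.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


class Unauthorized(Exception):
    """Hypothetical stand-in for fastapi.HTTPException(status_code=401)."""
    status_code = 401


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def require_auth(authorization: Optional[str], secret: bytes) -> dict:
    """Validate an HS256 bearer JWT and return its claims, or raise Unauthorized."""
    if not authorization or not authorization.startswith("Bearer "):
        raise Unauthorized("missing bearer token")
    token = authorization[len("Bearer "):]
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError:  # bad segment count, bad base64, or bad JSON
        raise Unauthorized("malformed token") from None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise Unauthorized("unexpected algorithm")
    if not isinstance(claims, dict):
        raise Unauthorized("malformed claims")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(signature, expected):
        raise Unauthorized("bad signature")
    if "exp" in claims and claims["exp"] < time.time():
        raise Unauthorized("token expired")
    return claims
```

In a FastAPI app, logic like this would sit behind `Security(...)` on the router. That router-level declaration is the part the auth analysis reads, so the fixture's declaration-only stub is sufficient.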

@ -0,0 +1,60 @@
"""Bare child router — auth comes from `__init__.py` via include_router.
Pre-fix: every `@router.<verb>(...)` route in this file fired
`missing_ownership_check` because `router = VersionedAPIRouter()`
declares no inline `dependencies=[...]`. The auth declaration lives
on `__init__.py`'s `authenticated_router = VersionedAPIRouter(
dependencies=[Security(require_auth)])` and is lifted onto this file
via `authenticated_router.include_router(task_instances.router)`.
Post-fix: cross-file router-fact resolution at pass 2 entry detects
the include_router edge targeting this file's `router` var, looks up
`authenticated_router`'s deps in the parent's `local_router_deps`
map, and folds them into this file's per-route auth attribution.
The route below must NOT fire `missing_ownership_check` /
`token_override_without_validation`."""
from typing import Annotated
from fastapi import Body
from cadwyn import VersionedAPIRouter
from .security import require_auth as _require_auth_unused # noqa: F401 (parity with airflow)
router = VersionedAPIRouter()
@router.patch("/{task_instance_id}/state")
def patch_task_instance_state(
task_instance_id: str,
body: Annotated[dict, Body()],
):
"""Bare-child route — relies on parent router's Security(require_auth).
Operations: writes a row keyed by user-supplied `task_instance_id`.
Without cross-file router-dep resolution, this is the canonical FP
shape: the auth check lives in `__init__.py`, while the sink lives here.
"""
new_state = body.get("state", "")
# Simulated session.add — write keyed by an id-like param the route
# accepted from the URL path. A bare in-file scan would mark this
# as missing_ownership_check on the assumption that `task_instance_id`
# is unauthorized user input.
session = _get_session()
session.add(
TaskInstanceRow(
task_instance_id=task_instance_id,
state=new_state,
)
)
session.commit()
def _get_session():
"""Stub: supplies the session object used by the route's write above."""
raise NotImplementedError
class TaskInstanceRow:
def __init__(self, task_instance_id: str, state: str) -> None:
self.task_instance_id = task_instance_id
self.state = state
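The post-fix resolution described in this file's docstring can be sketched as a single pass over include edges: find the `include_router` edge, look up the parent's deps in `local_router_deps`, and fold them into the child. `IncludeEdge`, `lift_router_deps`, and the file keys in the example are hypothetical. The sketch also encodes the recall guard from the public-router fixture, so an include from a parent with no deps lifts nothing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IncludeEdge:
    """`parent_var.include_router(<child module>.child_var)` found in parent_file."""
    parent_file: str
    parent_var: str
    child_file: str
    child_var: str


def lift_router_deps(
    local_router_deps: dict[str, dict[str, list[str]]],
    edges: list[IncludeEdge],
) -> dict[str, dict[str, list[str]]]:
    """Fold each parent router's inline deps into the child router it includes.

    `local_router_deps` maps file -> router variable -> deps declared inline.
    Returns a new map; the input is not mutated.
    """
    effective = {
        f: {var: list(deps) for var, deps in routers.items()}
        for f, routers in local_router_deps.items()
    }
    for edge in edges:
        parent_deps = local_router_deps.get(edge.parent_file, {}).get(edge.parent_var, [])
        if not parent_deps:
            # Recall guard: a bare APIRouter() parent lifts nothing onto its child.
            continue
        child = effective.setdefault(edge.child_file, {}).setdefault(edge.child_var, [])
        for dep in parent_deps:
            if dep not in child:
                child.append(dep)
    return effective


# Illustrative layout mirroring the fixture docstrings (file keys are assumptions).
deps = {
    "__init__.py": {
        "authenticated_router": ["Security(require_auth)"],
        "execution_api_router": [],
    },
    "task_instances.py": {"router": []},
    "public_health.py": {"router": []},
}
edges = [
    IncludeEdge("__init__.py", "authenticated_router", "task_instances.py", "router"),
    IncludeEdge("__init__.py", "execution_api_router", "public_health.py", "router"),
]
out = lift_router_deps(deps, edges)
print(out["task_instances.py"]["router"])  # ['Security(require_auth)']
print(out["public_health.py"]["router"])   # []
```

This handles only one level of nesting. A parent that itself inherits deps from a grandparent router would need the pass repeated, reading from `effective`, until no child list changes.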

@ -50,6 +50,16 @@ def trigger_sql_fstring(cursor, user):
def trigger_sqlalchemy_text_fstring(connection, user):
connection.execute(text(f"SELECT * FROM users WHERE name = '{user}'"))
# py.xss.make_response_format
def trigger_make_response_fstring(request, make_response):
content_type = request.headers.get("Content-Type")
return make_response(f"Invalid content type: '{content_type}'", 400)
# py.xss.make_response_format (concat variant)
def trigger_make_response_concat(request, make_response):
name = request.args.get("name")
return make_response("<h1>Hello " + name + "</h1>")
# py.crypto.md5
def trigger_md5(data):
hashlib.md5(data)
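The new `py.xss.make_response_format` triggers cover two shapes: an f-string with interpolation, and a `+` concatenation around a string literal, each passed as the body argument to `make_response`. Below is a minimal stdlib-only sketch of a syntactic matcher for those shapes, built on Python's `ast` module. The function names are hypothetical, and this is not the scanner's rule implementation, which this diff does not show.

```python
import ast


def _is_formatted(node: ast.AST) -> bool:
    """True for an f-string with interpolation, or a `+` chain touching a str literal."""
    if isinstance(node, ast.JoinedStr):
        return any(isinstance(v, ast.FormattedValue) for v in node.values)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        sides = (node.left, node.right)
        return any(
            (isinstance(s, ast.Constant) and isinstance(s.value, str)) or _is_formatted(s)
            for s in sides
        )
    return False


def _callee_name(func: ast.expr) -> str:
    # Matches both `make_response(...)` and `flask.make_response(...)`.
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def find_make_response_format(source: str) -> list:
    """Line numbers of make_response calls whose body argument is formatted."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and _callee_name(node.func) == "make_response"
            and node.args
            and _is_formatted(node.args[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)  # ast.walk is breadth-first, not source order


SAMPLE = '''
def trigger_make_response_fstring(request, make_response):
    content_type = request.headers.get("Content-Type")
    return make_response(f"Invalid content type: '{content_type}'", 400)

def trigger_make_response_concat(request, make_response):
    name = request.args.get("name")
    return make_response("<h1>Hello " + name + "</h1>")

def safe(make_response):
    return make_response("static body", 200)
'''
print(find_make_response_format(SAMPLE))  # [4, 8]
```

The match is purely syntactic. It does not follow taint or recognize sanitizers, so `make_response("<h1>" + html.escape(name))` would still be reported.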

@ -301,6 +301,9 @@ fn positive_python() {
// py.sqli.text_format must fire on the SQLAlchemy text() shape.
"py.sqli.execute_format",
"py.sqli.text_format",
// CVE-2023-6568 (mlflow) reflected XSS via make_response f-string;
// also catches the `+`-concat shape in xss_reflected.py.
"py.xss.make_response_format",
],
);
}