nyx/docs/language-maturity.md

# Language Maturity Matrix

Nyx supports ten languages, but support depth is not uniform. This page gives an
honest per-language picture so you can calibrate expectations before depending
on Nyx for a given stack.

The classifications here are grounded in three concrete signals:

1. **Rule depth**: how many distinct source / sanitizer / sink matchers exist
   for the language in `src/labels/<lang>.rs`, and how many vulnerability
   classes (Cap bits) those matchers cover.
2. **Benchmark results**: rule-level precision / recall / F1 on the 305-case
   corpus (267 synthetic + 14 real-CVE pairs + 10 auth fixtures) in
   [`tests/benchmark/RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md),
   last measured 2026-04-23 with scanner version 0.5.0.
3. **Known weak spots**: FPs and FNs the maintainers have deliberately left
   in the benchmark rather than suppressed, documented release-by-release in
   [`RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md).

All parser integrations use tree-sitter and are stable; parsing is not a
differentiator between tiers. The differentiators are rule depth, cross-file
confidence, and modeled idioms.

---

## Tier Summary

| Tier | Languages | What to expect |
|------|-----------|----------------|
| **Stable** | Python, JavaScript, TypeScript | Deep rule sets, gated sinks (argument-role-aware), framework detection, extensive fixtures, and the bulk of advanced-analysis (SSA, context-sensitivity, symbolic execution) coverage. Safe to depend on in CI gates. |
| **Beta** | Go, Java, Ruby, PHP | Solid mid-depth rule sets with known narrower class coverage. No gated sinks yet. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation) are incomplete. Usable in CI, but review FP/FN lists before tightening gates. |
| **Preview** | C, C++ | Pattern-only coverage. Pointer aliasing, function pointers, array-element taint, and STL container flows are not modeled. Suitable for finding obvious unsafe API uses; do not use as a sole SAST gate. Pair with clang-tidy / Clang Static Analyzer / Infer. |
| **Experimental** | Rust | Full source coverage relative to the framework ecosystem, but several FPs persist on adversarial safe cases pending engine work (match-arm guards, structural sinks with type facts). Appropriate for spot-checks and contribution but not yet recommended as a sole SAST dependency. |

---

## Per-Language Detail

### Stable tier

#### Python: 100% P / 100% R / 100% F1 *(29-case corpus)*

- **Rule depth**: 5 source families, 7 sanitizer families, 21 sink matchers
  spanning HTML, URL, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Framework context**: Flask, Django, argparse source matchers; `flask_request`
  import-alias support.
- **Advanced analysis**: gated sinks (`Popen`, `subprocess.run/call` with
  activation-arg awareness), most SSA-equivalence and symbolic-execution
  fixtures target Python.
- **Fixtures**: 125 under `tests/fixtures/` plus 30 benchmark cases.
- **Blind spots**: f-string interpolation is not explicitly modeled as a
  distinct taint-producing construct; string-formatting flows are caught by
  the general concatenation path.

#### JavaScript: 93.8% P / 100% R / 96.8% F1 *(27-case corpus)*

- **Rule depth**: 3 source families, 10 sanitizer families, 24 sink matchers
  spanning HTML, URL, JSON, Shell, SQL, Code, SSRF, and File I/O.
- **Advanced analysis**: gated sinks (`setAttribute`, `parseFromString`),
  two-level SSA solve for top-level + per-function scopes (`analyse_ssa_js_two_level`),
  prefix-locked SSRF suppression via StringFact.
- **Framework context**: Express, Koa, Fastify (via in-file import scan when
  `package.json` is absent).
- **Fixtures**: 238 under `tests/fixtures/`; the largest corpus of any
  language.
- **Blind spots**: template literals are lowered through concatenation rather
  than modeled as a first-class taint operator; dynamic property access
  (`obj[user]`) is conservatively treated.

#### TypeScript: 100% P / 100% R / 100% F1 *(35-case corpus, most recent measurement)*

- **Rule depth**: Shares the JS ruleset (3 sources, 10 sanitizers, 24 sinks)
  plus TS-specific grammar handling.
- **Advanced analysis**: TSX and JSX grammars wired as of 2026-04-20;
  discriminated-union narrowing, generic erasure, decorator flow, and
  interface dispatch are all validated against adversarial type-system
  stressors.
- **Framework context**: Fastify detection via `detect_in_file_frameworks`
  (import-driven, no `package.json` required).
- **Fixtures**: 39 test fixtures plus 35 benchmark cases.
- **Blind spots**: 0 known open weak spots as of 2026-04-20. `as any` casts
  and `any`-typed flows are handled conservatively (treated as tainted).

### Beta tier

#### Go: 94.1% P / 100% R / 97.0% F1 *(28-case corpus)*

- **Rule depth**: 4 source families, 4 sanitizer families, 9 sink matchers
  covering HTML, URL, Shell, SQL, SSRF, Crypto, and File I/O.
- **Framework context**: Gin, Echo source matchers.
- **Known gaps**: no gated sinks, no deserialization class, allowlist
  early-return patterns in path-pruning benchmark cases still produce FPs
  (`go-pathprune-safe-001`). `fmt.Sprintf` is deliberately not a sink.

#### Java: 92.9% P / 100% R / 96.3% F1 *(23-case corpus)*

- **Rule depth**: 3 source families, 8 sanitizer families, 10 sink matchers
  covering HTML, URL, Shell, SQL, Code, SSRF, and Deserialization.
- **Framework context**: Spring, JPA, Hibernate ORM rules; JNDI injection
  sinks.
- **Known gaps**: no gated sinks. Variable-receiver method calls
  (`client.send(...)` vs `HttpClient.send(...)`) rely on type-qualified
  resolution from receiver-type inference; flows where the receiver type
  cannot be inferred are missed (`java-ssrf-002` historically persisted as
  FN; closed via type facts but fragile on unusual builder chains).

#### Ruby: 100% P / 92.3% R / 96.0% F1 *(24-case corpus)*

- **Rule depth**: 3 source families, 7 sanitizer families, 15 sink matchers
  covering HTML, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Framework context**: Rails helpers (`sanitize_sql`, `permit`, `require`).
- **Known gaps**: string interpolation inside shell and SQL strings is
  recognized structurally but not modeled as a distinct operator.
  `begin/rescue/ensure` exception-edge wiring is documented as deferred
  (structurally incompatible with `build_try()`). One FN persists on an
  interprocedural taint propagation case due to rule-ID mismatch, not a
  missed flow (`rb-interproc-001`).

#### PHP: 86.7% P / 100% R / 92.9% F1 *(24-case corpus)*

- **Rule depth**: 3 source families (`$_GET`, `$_POST`, `$_REQUEST`
  superglobals), 7 sanitizer families, 10 sink matchers covering HTML, URL,
  Shell, SQL, Code, SSRF, File I/O, and Deserialization.
- **Known gaps**: no gated sinks. Limited framework context (Laravel raw
  methods only). Interprocedural sanitizer-wrapping case
  (`php-interproc-safe-001`) persists as FP. `echo` language-construct
  detection is wired but its inner-argument propagation is narrower than
  function-call sinks.

### Preview tier

C and C++ are labeled **Preview** (not Experimental) to convey a specific
shape of limitation: the parser and existing rules produce useful findings
on obvious unsafe-API uses, but the engine **structurally cannot model**
several pervasive C/C++ constructs. Running Nyx on a C/C++ codebase and
seeing a clean report should not be read as a clean audit. Pair Nyx with
clang-tidy, the Clang Static Analyzer, or Infer for production use.

**Not modeled** (common to both C and C++):

- Pointer aliasing. Taint through `*p`, `p->field`, arbitrary pointer
  arithmetic, and aliased writes are not tracked.
- Function pointers and callback dispatch. Indirect calls through
  `void (*fn)(char *)` resolve to no callee.
- Array-element taint. Writes to `buf[i]` do not propagate taint to `buf`
  in the general case; structural taint chains involving `fgets` → array →
  `system` have rule-ID matching issues (`c-cmdi-004`).
- STL container operations (C++ only). `std::vector`, `std::map`,
  `std::string` methods are not taint-aware; `c_str()` breaks taint chains
  (`cpp-cmdi-003`).
- Lambdas and nested classes (C++ only). Not modeled.
- Complex socket setup (C++ only). E.g. `connect()` chains are not detected
  (`cpp-ssrf-002`).

#### C: 85.7% P / 100% R / 92.3% F1 *(20-case corpus)*

- **Rule depth**: 3 source families, **2** sanitizer families (prefix-based
  only), 5 sink matchers spanning Shell, File, SSRF, and Format-String.
- **Known gaps**: no framework rules, no gated sinks. Path-validation via
  `strstr()` is not recognized as a guard (`c-safe-006`). Forward-declared
  sanitizers are not tracked (`c-safe-008`).

#### C++: 80.0% P / 100% R / 88.9% F1 *(20-case corpus)*

- **Rule depth**: Clones the C ruleset (3 sources, 2 sanitizers, 5 sinks) and
  adds `std::cin` / `std::getline` sources.
- **Known gaps**: same sanitizer-recognition gaps as C. See the "Not
  modeled" list above for structural gaps (STL containers, `c_str()`,
  `connect()`, lambdas, nested classes).

### Experimental tier

#### Rust: 76.0% P / 100% R / 86.4% F1 *(31-case adversarial corpus)*

- **Rule depth**: 6 source families, **2** sanitizer families (prefix and
  type-coercion), 11 sink matchers covering HTML, Shell, SQL, SSRF,
  Deserialization, and File I/O. Extensive framework source coverage
  (Axum, Actix, Rocket); the most of any language on the source side.
- **Recent additions (2026-04-20)**: new SQL class (`rusqlite`, `sqlx`,
  `diesel`, `postgres`), new Deserialization class (`serde_yaml`,
  `bincode`, `rmp_serde`, `ciborium`, `ron`, `toml`), expanded file I/O
  (`fs::remove_file/dir/rename/copy`), `reqwest` SSRF builder chain.
- **Known gaps**:
  - `rs-safe-003`: structural `cfg-unguarded-sink` fires when a tainted
    variable is *declared* in scope but not used in the sink; intentional
    for high-risk sinks.
  - `rs-safe-009`: match-arm guards don't surface as `StmtKind::If`, so
    `classify_condition` never sees the character-class validation.
  - `safe_direct_sanitizer.rs`: still FP because the SSA lowering for
    an OR-chain rejection (`if a || b || c { return X }`) joins both
    return paths into a single block, losing the early-return
    semantics.  Distinct from the merged-return-block defect closed in
    2026-04-24; tracked separately.
- **Closed by the 2026-04-23 PathFact domain**
  (`src/abstract_interp/path_domain.rs`): `rs-safe-007` (`.replace("..",
  "")` sanitiser), `rs-safe-008` (negative-validation return pattern),
  `rs-safe-010` (static-map lookup; still handled by the dedicated
  static-map analysis, but PathFact does not interfere), new `rs-safe-012`
  (`.contains("..")` + `.starts_with('/')` intraprocedural rejection),
  new `rs-safe-015` (`Path::new(p).is_absolute()` typed rejection), plus a
  new `rs-path-006` negative-guard to prevent over-suppression.
- **Closed by the 2026-04-24 per-return-path PathFact landing**
  (`PathFactReturnEntry` on `SsaFuncSummary` + structural
  variant-wrapper transparency + non-data-return skipping +
  path-fact-proven leaf detection in
  `trace_tainted_leaf_values`):
  `rs-safe-014` (Option-returning user sanitiser),
  new `rs-safe-016` (cross-function `.contains("..")` rejection),
  `CVE-2018-20997` patched (tar-rs zip-slip),
  `CVE-2022-36113` patched (cargo `.cargo-ok` symlink),
  `CVE-2024-24576` patched (BatBadBut argv injection).
- **Not yet covered**: unsafe FFI / `std::mem::transmute` (no rules), Tokio
  `process::Command` async variants (not distinguished from sync),
  `hyper` / `surf` / `ureq` SSRF clients (reqwest family only), and Rocket /
  Actix positive cases (rules exist but no benchmark fixtures yet).

---

## How the tiers were assigned

A language lands in **Stable** when all three hold:

- Rule set covers ≥ 8 vulnerability classes with both source and sink
  matchers, and at least one class has argument-role-aware gating.
- Benchmark F1 ≥ 95% on a corpus of ≥ 25 cases.
- Advanced analysis (SSA lowering, context-sensitivity, symbolic-execution)
  is exercised by fixtures for the language.

A language lands in **Beta** when benchmark F1 ≥ 90% but at least one of the
Stable criteria fails; usually narrower cap coverage or absence of gated
sinks.

A language lands in **Preview** when the engine structurally cannot model
constructs that are pervasive in typical codebases for that language
(pointer aliasing, function pointers, array-element taint, STL containers
for C/C++). Pattern-only coverage is useful but not sufficient as a sole
SAST gate.

A language lands in **Experimental** when rule depth is clearly narrower
(≤ 5 sinks and ≤ 2 sanitizers), or benchmark F1 < 90%, or documented weak
spots require engine changes rather than rule additions to close, but the
engine does not have the pervasive structural blind spots of the Preview
tier.

---

## What this means for you

- **CI gates**: safe to set strict `--fail-on HIGH` gates on Stable-tier
  languages. On Beta-tier, expect occasional FP triage; the weak-spot lists
  above tell you exactly what to skim for. On Preview- and Experimental-tier,
  treat Nyx findings as a starting point for manual review rather than
  authoritative; Preview-tier languages in particular have structural
  blind spots that a clean report will not disclose.
- **Rule contributions**: the shortest path to raising a language's tier is
  contributing sink matchers and gated-sink registrations. Label files live
  at `src/labels/<lang>.rs`; benchmark cases live at
  `tests/benchmark/corpus/<lang>/`.
- **Scope planning**: if your primary stack is C, C++, or Rust, Nyx will
  surface real findings, but you should budget for review time and consider
  combining Nyx with a language-specific tool (e.g. `cargo-audit`,
  `clang-tidy`) until those tiers mature.

The benchmark thresholds in `tests/benchmark_test.rs` are deliberately set
~5 pp below current baselines so any drop in a language's F1 fails CI. Tier
promotions require sustained benchmark performance, not just rule additions.