Prerelease cleanup (#46)

* feat: Add const_bound_vars tracking to prevent false positives in ownership checks
* feat: Introduce field interner and typed bounded vars for enhanced type tracking
* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking
* feat: Centralize method name extraction with bare_method_name helper
* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch
* feat: Enhance C++ taint tracking with additional container operations and inline method resolution
* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking
* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis
* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations
* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details
* test: Add comprehensive tests for lattice algebra laws and SSA edge cases
* feat: Add destructured session user handling and safe user ID access patterns
* feat: Implement row-population reverse-walk for enhanced authorization checks
* feat: Enhance authorization checks with local alias chain for self-actor types
* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction
* feat: Implement chained method call inner-gate rebinding for SSRF prevention
* feat: Add observability and error modules, enhance debug functionality, and implement theme context
* feat: Remove Auth Analysis page and update navigation to redirect to Explorer
* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor
* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor
* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity
* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build

  The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while the unit test at the bottom of the file was gated only `#[cfg(test)]`. Since `cfg(test)` is set in release builds with `--tests` but `cfg(debug_assertions)` is not, `cargo build --release --tests` failed with E0425. Removing the gate fixes the build; the body is `debug_assert!` only, so the helper is free in release. Also drop the gate at the call site to avoid a `dead_code` warning when the lib is built without `--tests`.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(closure-capture): flip JS/TS fixtures to required-finding

  The JS and TS closure-capture fixtures pinned the old broken behaviour via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now correctly traces taint through the closure boundary (env source captured by an arrow function, sunk via `child_process.exec` inside the body), so the formerly-forbidden finding is a true positive. Match the Python sibling's shape (`required_findings` with `id_prefix` + `min_count` plus a small `noise_budget`) and rewrite the companion READMEs and the phase8_fragility_tests doc-comments from "known gap" to "regression guard".

  Verified:
  - cargo test --release --test phase8_fragility_tests → 8/8 pass
  - cargo test --release --lib bfs_assertion → pass
  - corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0), unchanged

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis
* feat: Introduce health module and enhance health score computation with calibration tests
* feat: Add expectations configuration and cleanup .gitignore for log files
* feat: Implement theme selection and enhance settings panel for triage sync
* feat: Suppress false positives for strcpy calls with literal sources in AST
* feat: Update analyse_function_ssa to return body CFG for accurate analysis
* feat: Add bug report and feature request templates for improved issue tracking
* feat: removed dev scripts
* feat: update README.md for clarity and consistency in fixture descriptions
* feat: removed dev docs
* feat: clean up error handling and UI elements for improved user experience
* feat: adjust button sizes in HeaderBar for better UI consistency
* feat: enhance taint analysis with additional context for sanitizer and taint findings
* cargo fmt
* prettier
* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts
* feat: add script to frame PNG screenshots with brand gradient
* feat: add fuzzing support with new targets and CI workflows
* refactor: streamline match expressions and improve formatting in CLI and output handling
* feat: enhance configuration display with detailed output options
* feat: stage demo configuration for improved CLI screenshot output
* feat: expose merge_configs function for user-configurable settings
* refactor: simplify code structure and improve readability in config handling
* refactor: improve descriptions for vulnerability patterns in various languages
* feat: update MIT License section with additional usage details and copyright information
* feat: update screenshots
* refactor: update build process and paths for frontend assets
* feat: add cross-file taint fuzzing target and supporting dictionary
* refactor: clean up formatting and comments in fuzz configuration and example files
* refactor: remove outdated comments and clean up CI configuration files
* chore: update changelog dates and improve formatting in documentation
* refactor: update Cargo.toml and CI configuration for improved packaging and build process
* refactor: enhance quote-stripping logic to prevent panics and add regression tests

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
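The cfg-gating problem described in the `fix(ssa)` entry can be sketched roughly as follows. This is a minimal illustration, not the code in `src/ssa/lower.rs`: the helper name `debug_assert_ordering`, its signature, and the `lower_blocks` caller are hypothetical stand-ins.

```rust
// Before the fix (kept as a comment so this sketch still compiles):
//
//   #[cfg(debug_assertions)]   // not set by `cargo build --release`
//   fn debug_assert_ordering(order: &[u32]) { /* ... */ }
//
//   #[cfg(test)]               // set by `cargo build --release --tests`
//   mod tests { /* calls debug_assert_ordering, so E0425 in release */ }

/// Hypothetical helper: checks that block ids are in non-decreasing order.
/// It is left ungated on purpose. `debug_assert!` still type-checks its
/// condition in release builds but does not evaluate it, so the helper
/// does no work there.
fn debug_assert_ordering(order: &[u32]) {
    debug_assert!(
        order.windows(2).all(|w| w[0] <= w[1]),
        "blocks are not in the expected order"
    );
}

/// Hypothetical caller standing in for the lowering pass.
fn lower_blocks(mut blocks: Vec<u32>) -> Vec<u32> {
    blocks.sort_unstable();
    // Ungated call site: the helper is always used, so no `dead_code`
    // warning appears when the crate is built without `--tests`.
    debug_assert_ordering(&blocks);
    blocks
}
```

With both the definition and the call site ungated, the same source compiles under debug, release, and release-with-tests profiles.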
2026-06-21 20:18:06 +02:00 · 2026-04-29 00:58:38 -04:00 · 2026-04-29 00:58:38 -04:00 · 82f18184b1
commit 82f18184b1
parent 79c29b394d
348 changed files with 48731 additions and 2925 deletions
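The benchmark figure quoted in the commit message (F1 = 0.9976 with TP=205, FP=1, FN=0) follows from the standard precision / recall / F1 definitions. The sketch below is only a worked check of that arithmetic; the function names are illustrative and are not taken from the Nyx benchmark harness.

```rust
/// Precision = TP / (TP + FP); returns 0.0 when there are no positives.
fn precision(tp: u32, fp: u32) -> f64 {
    let denom = tp + fp;
    if denom == 0 { 0.0 } else { tp as f64 / denom as f64 }
}

/// Recall = TP / (TP + FN); returns 0.0 when there is nothing to find.
fn recall(tp: u32, fn_: u32) -> f64 {
    let denom = tp + fn_;
    if denom == 0 { 0.0 } else { tp as f64 / denom as f64 }
}

/// F1 is the harmonic mean of precision and recall, which simplifies to
/// 2*TP / (2*TP + FP + FN).
fn f1(tp: u32, fp: u32, fn_: u32) -> f64 {
    let denom = 2 * tp + fp + fn_;
    if denom == 0 { 0.0 } else { (2 * tp) as f64 / denom as f64 }
}
```

For the commit-message numbers, `f1(205, 1, 0)` is 410/411, which rounds to 0.9976. The per-language Go figures in the diff (92.3% P / 96.0% R / 94.1% F1 with 2 FPs and 1 FN) are consistent with TP=24, although that TP count is inferred from the percentages rather than stated in the source.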
--- a/docs/advanced-analysis.md
+++ b/docs/advanced-analysis.md
@@ -1,11 +1,15 @@
 # Advanced Analysis

-Nyx ships four optional analysis passes that layer on top of the core SSA
-taint engine. Each pass is independently switchable via config
-(`[analysis.engine]` in `nyx.conf` / `nyx.local`), a matching CLI flag pair,
-or; as a legacy last-resort override for library users with no CLI entry
-point; a `NYX_*` environment variable. All four are **on by default**: turning
-them off trades precision for speed.
+Nyx layers several analysis passes on top of the core SSA taint engine.
+Most are switchable via config (`[analysis.engine]` in `nyx.conf` /
+`nyx.local`), a matching CLI flag pair, or, as a last-resort override for
+library users with no CLI entry point, a `NYX_*` environment variable. The
+five precision-tuning passes (abstract interpretation, context sensitivity,
+symbolic execution, constraint solving, field-sensitive points-to) are
+**on by default** because the benchmark numbers in
+[language-maturity.md](language-maturity.md) are measured with them on.
+The demand-driven backwards walk and hierarchy fan-out sit alongside but
+are not user-toggleable in the same way.

 See [`Configuration`](configuration.md#analysisengine) for the full config
 surface and CLI flag table. This page explains what each pass does, why it
@@ -81,6 +85,77 @@ origin-attribution.

 ---

+## Field-sensitive points-to
+
+**What it does.** Runs a Steensgaard-style alias analysis that interns field
+accesses as their own abstract locations. `c.mu` becomes `Field(c, mu)`,
+distinct from `c` itself; a write to `obj.cache` and a read from
+`obj.cache` in different methods both land on the same abstract location;
+subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic
+`__index_get__` / `__index_set__` calls so the engine can model them
+through the same container store/load primitives used for STL containers,
+Python lists, JS arrays, and similar.
+
+**Why it helps.** It splits a class of false positives that the
+whole-variable taint model produced. Before this pass, `obj.field =
+tainted; sink(obj.other_field)` would taint `obj` as a whole and fire on
+the safe field; the receiver-type / sub-field distinction is also what
+lets the resource-lifecycle pass attribute a `c.mu.Lock()` to the lock
+field rather than to its container. Cross-method field flow (writer in
+one method, reader in another) shows up only when fields have stable
+identity independent of the parent value.
+
+**How to turn it off.**
+
+| Surface | Value |
+|---|---|
+| Env var | `NYX_POINTER_ANALYSIS=0` |
+
+The pass is **on by default** as of 2026-04-26. The env-var override is
+kept for one release so you can compare against the pre-pointer baseline,
+then will be removed.
+
+**Limitations.** This is not a general escape analysis. Function pointers
+and arbitrary indirect calls still resolve to no callee, and deep alias
+chains through `*p` / `p->field` in C/C++ are not tracked beyond the
+direct field case. The points-to set per value is capped at
+`--max-pointsto` (default 32); when truncation happens, an engine note
+records the precision loss.
+
+**Source**: [`src/pointer/`](https://github.com/elicpeter/nyx/tree/master/src/pointer/).
+
+---
+
+## Hierarchy fan-out for virtual dispatch
+
+**What it does.** Builds a per-language type-hierarchy index in pass 1
+(extends, implements, impl-for, includes; the exact construct depends on
+the language) and uses it in pass 2 to widen method-call resolution. When
+a call's receiver is statically typed as a super-class, trait, or
+interface, the resolver returns every concrete implementer it has seen
+in the codebase rather than just the first match.
+
+**Why it helps.** Without it, a call like `repository.findById(id)` where
+`repository` is typed as the interface gets resolved against whatever the
+single-result resolver finds first; if the matching implementer is in
+another file the call effectively goes opaque. With the hierarchy, the
+taint engine sees the union of every implementer's transform and the
+flow shows up regardless of which file holds the concrete class.
+
+**Limitations.** Fan-out is capped at 8 implementers per call site; over
+that, the tail is silently dropped (a debug log records the cap hit) and
+the call is treated as a non-deterministic union of the kept
+implementers. Languages that use structural / implicit interface
+satisfaction (Go) are deliberately skipped because per-file extraction
+is intractable; those calls fall back to the single-result resolver. The
+extractor covers Java, Rust, TS/JS/TSX, Python, Ruby, PHP, and C++.
+
+**Source**: [`src/cfg/hierarchy.rs`](https://github.com/elicpeter/nyx/blob/master/src/cfg/hierarchy.rs)
+and [`src/summary/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/summary/mod.rs)
+(`TypeHierarchyIndex`, `resolve_callee_widened`).
+
+---
+
 ## Symbolic execution

 **What it does.** Builds a symbolic expression tree per tainted SSA value,
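The capped hierarchy fan-out documented in the `docs/advanced-analysis.md` hunk above can be sketched as follows. This is a simplified illustration under stated assumptions: `HierarchyIndex`, `record`, and `resolve_widened` are hypothetical names, and the real `TypeHierarchyIndex` / `resolve_callee_widened` in `src/summary/mod.rs` may differ in shape. Only the cap of 8 implementers and the single-result fallback are taken from the documentation text.

```rust
use std::collections::BTreeMap;

/// Cap taken from the docs: at most 8 implementers per call site.
const MAX_FANOUT: usize = 8;

/// Hypothetical index: super-type name -> concrete implementers seen in pass 1.
#[derive(Default)]
struct HierarchyIndex {
    implementers: BTreeMap<String, Vec<String>>,
}

impl HierarchyIndex {
    /// Pass 1: record that `concrete` extends or implements `super_type`.
    fn record(&mut self, super_type: &str, concrete: &str) {
        let list = self.implementers.entry(super_type.to_string()).or_default();
        if !list.iter().any(|c| c == concrete) {
            list.push(concrete.to_string());
        }
    }

    /// Pass 2: widen a method call to every known implementer, up to the cap.
    /// Returns the callee names and whether the tail was dropped.
    fn resolve_widened(&self, receiver_type: &str, method: &str) -> (Vec<String>, bool) {
        match self.implementers.get(receiver_type) {
            Some(impls) => {
                let truncated = impls.len() > MAX_FANOUT;
                let callees = impls
                    .iter()
                    .take(MAX_FANOUT)
                    .map(|c| format!("{c}::{method}"))
                    .collect();
                (callees, truncated)
            }
            // No hierarchy entry: fall back to a single direct callee.
            None => (vec![format!("{receiver_type}::{method}")], false),
        }
    }
}
```

A call such as `repository.findById(id)` with `repository` typed as an interface would then resolve to one callee per recorded implementer, and the `truncated` flag marks the point where a real engine could emit its debug log.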
--- a/docs/detectors/taint.md
+++ b/docs/detectors/taint.md
@@ -25,8 +25,7 @@ One rule ID, parameterized by the source location. Suppressions can target eithe
 ## What it can't detect

 - **Library calls without summaries.** If a callee has no summary (no source, binary-only dependency), Nyx treats it as neither propagating nor sanitizing. This is conservative for sanitization but lossy for propagation.
-- **Taint through struct fields and containers.** Taint attaches to whole variables. `obj.field = tainted; sink(obj.other_field)` can produce a false positive because `obj` itself is tainted.
-- **Aliasing.** `let y = &x; sink(*y)` tracks `y` separately from `x`. Can cause FNs.
+- **Deep pointer aliasing.** `let y = &x; sink(*y)` works through one level, but arbitrary chains of pointer arithmetic and aliased writes (`*p`, `p->field` in C/C++) are not tracked end-to-end. Function pointers and indirect calls resolve to no callee.
 - **Implicit flows.** Taint follows explicit data, not branching signal. `if (secret) x = 1 else x = 0` does not taint `x`.
 - **Globals and statics across functions.** Not tracked across function boundaries.

@@ -35,7 +34,7 @@ One rule ID, parameterized by the source location. Suppressions can target eithe
 | Scenario | Why | Mitigation |
 |---|---|---|
 | Custom sanitizer not recognised | Only built-in + configured sanitizers match | Add a custom sanitizer rule in config |
-| Taint through struct fields | Variable-level tracking, not field-level | No fix yet; field-sensitivity is planned |
+| Container holds mixed-typed items the engine cannot tell apart | A `vector<int>` of port numbers and a `vector<string>` of user input share the same store/load model | Sanitize the values on the way in (numeric parse / explicit validator) so the values themselves carry no cap, not just the container |
 | Dead branches | Path-insensitive within a function | Constraint solving catches trivially infeasible combos; path-validated findings are scored lower |
 | Library wrapper re-introduces taint | Wrapper opaque, or summary marks it as propagating | Summarize the wrapper explicitly or add it as a sanitizer |
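The field-level distinction that this diff removes from the limitations list (`obj.field = tainted; sink(obj.other_field)`) can be illustrated with a small sketch. It shows only the idea of interning field accesses as their own abstract locations, in the spirit of the `Field(c, mu)` notation used in the docs. It does not implement Steensgaard-style unification, and every type and method name here is hypothetical rather than taken from `src/pointer/`.

```rust
use std::collections::{HashMap, HashSet};

/// Abstract location: a whole variable, or one field of a parent location.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Loc {
    Var(String),
    Field(Box<Loc>, String),
}

/// Convenience constructor for `base.name`.
fn field(base: &str, name: &str) -> Loc {
    Loc::Field(Box::new(Loc::Var(base.to_string())), name.to_string())
}

/// Interns locations so that equal access paths share one stable id.
#[derive(Default)]
struct LocInterner {
    ids: HashMap<Loc, u32>,
}

impl LocInterner {
    fn intern(&mut self, loc: Loc) -> u32 {
        let next = self.ids.len() as u32;
        *self.ids.entry(loc).or_insert(next)
    }
}

/// Taint is tracked per interned location, not per whole variable.
#[derive(Default)]
struct TaintState {
    interner: LocInterner,
    tainted: HashSet<u32>,
}

impl TaintState {
    fn write_tainted(&mut self, loc: Loc) {
        let id = self.interner.intern(loc);
        self.tainted.insert(id);
    }

    fn is_tainted(&mut self, loc: Loc) -> bool {
        let id = self.interner.intern(loc);
        self.tainted.contains(&id)
    }
}
```

Because `field("obj", "field")` and `field("obj", "other_field")` intern to different ids, a tainted write to the first does not mark the second, while a later read of `obj.field` from another method lands on the same id as the write.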

--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@@ -14,6 +14,10 @@ A scan runs in two passes over the file tree, with an optional SQLite index that

 Two extra layers tune precision around calls. **Context-sensitive inlining** (k=1) re-runs intra-file callees with the actual argument taint at the call site, so a helper called once with tainted input and once with sanitized input produces the right result for each call. **SCC fixed-point**: when a group of mutually-recursive functions forms a strongly-connected component in the call graph, the engine iterates summaries to a joint fixed-point (capped at 64 iterations). SCCs that span files are also handled.

+When a method call has a receiver typed as a super-class, trait, or interface, **hierarchy fan-out** widens the resolved callee set to every concrete implementer the engine has seen. A class diagram extracted in pass 1 (Java extends/implements, Rust impl-for, TS/JS extends, Python bases, Ruby includes, PHP extends/implements, C++ inheritance) feeds an index that the call resolver consults during pass 2. The fan-out is capped at 8 implementers per call site; over-fanning is a precision tax, not a soundness issue.
+
+A separate **field-sensitive points-to** pass tracks abstract locations down to the field level, so `c.mu.Lock()` is a lock on `Field(c, mu)` rather than on `c` as a whole. That distinction is what lets the resource-lifecycle and taint passes tell `obj.field = tainted; sink(obj.other_field)` apart from the conservative whole-variable approximation. Subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic `__index_get__` / `__index_set__` calls so the same container model handles them. Set `NYX_POINTER_ANALYSIS=0` to fall back to the pre-pointer-pass behaviour for one release if you need to compare baselines.
+
 ## Optional analyses on top

 These run on top of the forward taint pass. They're independently switchable via `[analysis.engine]` config or matching CLI flags. See [advanced-analysis.md](advanced-analysis.md) for the full description and tradeoffs.
@@ -22,6 +26,8 @@ These run on top of the forward taint pass. They're independently switchable via
 |---|---|---|
 | Abstract interpretation | Carries interval and string prefix/suffix bounds alongside taint. Suppresses findings on proven-bounded integers and locked-prefix URLs | on |
 | Context sensitivity | k=1 inlining for intra-file callees | on |
+| Field-sensitive points-to | Distinguishes `obj.field` from `obj` itself, so a tainted write to one field does not poison reads from another. Also gives the resource-lifecycle pass per-field locks | on |
+| Hierarchy fan-out | When a method call's receiver is typed as a super-class, trait, or interface, widens callee resolution to every concrete implementer the engine has seen | on |
 | Constraint solving | Drops paths whose accumulated branch predicates are unsatisfiable. Optional Z3 backend with `--features smt` | on |
 | Symbolic execution | Builds an expression tree per tainted value. Produces a witness string at the sink. Detects sanitization patterns the taint engine alone would miss | on |
 | Backwards analysis | After the forward pass, walks backwards from each sink to confirm or invalidate the flow. Annotates findings as `backwards-confirmed`, `backwards-infeasible`, or `backwards-budget-exhausted` | off |
--- a/docs/language-maturity.md
+++ b/docs/language-maturity.md
@@ -9,28 +9,34 @@ The classifications here are grounded in three concrete signals:
 1. **Rule depth**: how many distinct source / sanitizer / sink matchers exist
   for the language in `src/labels/<lang>.rs`, and how many vulnerability
   classes (Cap bits) those matchers cover.
-2. **Benchmark results**: rule-level precision / recall / F1 on the 305-case
-   corpus (267 synthetic + 14 real-CVE pairs + 10 auth fixtures) in
+2. **Benchmark results**: rule-level precision / recall / F1 on the 433-case
+   corpus in
   [`tests/benchmark/RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md),
-   last measured 2026-04-23 with scanner version 0.5.0.
+   last measured 2026-04-29 with scanner version 0.5.0.
 3. **Known weak spots**: FPs and FNs the maintainers have deliberately left
-   in the benchmark rather than suppressed, documented release-by-release in
+   in the benchmark rather than suppressed, plus structural engine
+   limitations the corpus does not stress, documented release-by-release in
   [`RESULTS.md`](https://github.com/elicpeter/nyx/blob/master/tests/benchmark/RESULTS.md).

-All parser integrations use tree-sitter and are stable; parsing is not a
-differentiator between tiers. The differentiators are rule depth, cross-file
-confidence, and modeled idioms.
+As of 2026-04-29 the synthetic corpus has effectively saturated: nine of ten
+languages report rule-level F1 = 100.0% and Go reports 94.1% (two FPs and
+one FN on a real-CVE SSRF case, `cve-go-2023-3188-vulnerable`). Aggregate
+rule-level P=0.991, R=0.995, F1=0.993. That means F1 alone no longer
+differentiates tiers, so the differentiators are **rule depth**,
+**gated-sink coverage**, and **structural idioms the corpus does not fully
+stress** (deep pointer aliasing in C/C++, framework-specific context). All
+parser integrations use tree-sitter and are stable; parsing is not a
+differentiator.

 ---

 ## Tier Summary

-| Tier | Languages | What to expect |
-|------|-----------|----------------|
-| **Stable** | Python, JavaScript, TypeScript | Deep rule sets, gated sinks (argument-role-aware), framework detection, extensive fixtures, and the bulk of advanced-analysis (SSA, context-sensitivity, symbolic execution) coverage. Safe to depend on in CI gates. |
-| **Beta** | Go, Java, Ruby, PHP | Solid mid-depth rule sets with known narrower class coverage. No gated sinks yet. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation) are incomplete. Usable in CI, but review FP/FN lists before tightening gates. |
-| **Preview** | C, C++ | Pattern-only coverage. Pointer aliasing, function pointers, array-element taint, and STL container flows are not modeled. Suitable for finding obvious unsafe API uses; do not use as a sole SAST gate. Pair with clang-tidy / Clang Static Analyzer / Infer. |
-| **Experimental** | Rust | Full source coverage relative to the framework ecosystem, but several FPs persist on adversarial safe cases pending engine work (match-arm guards, structural sinks with type facts). Appropriate for spot-checks and contribution but not yet recommended as a sole SAST dependency. |
+| Tier | Languages | F1 | What to expect |
+|------|-----------|----|----------------|
+| **Stable** | Python, JavaScript, TypeScript | 100% | Deep rule sets, gated sinks (argument-role-aware), framework detection, extensive fixtures, and the bulk of advanced-analysis (SSA two-level solve, context-sensitivity, symbolic execution, abstract interpretation) coverage. Safe to depend on in CI gates. |
+| **Beta** | Go, Java, PHP, Ruby, Rust | 94.1% to 100% | Solid mid-depth rule sets with narrower cap coverage and **no gated sinks**. Cross-file flows work; some idioms (variable-typed method receivers, framework context, string interpolation, match-arm guards) are partially modeled. Usable in CI; review FP/FN lists before tightening gates. |
+| **Preview** | C, C++ | 100% on synthetic corpus | Recent work taught the engine to follow taint through `std::vector` / `std::string` / map containers (including `c_str()`), through fluent builder chains like `Socket::builder().host(h).connect()`, and through inline class member functions. Function pointers and deeper pointer aliasing through `*p` / `p->field` are still not tracked. Rule-level scores against a corpus of obvious unsafe-API uses look perfect, but that is not the same as a clean audit on a real codebase. Pair with clang-tidy, Clang Static Analyzer, or Infer. |

 ---

@@ -38,7 +44,7 @@ confidence, and modeled idioms.

 ### Stable tier

-#### Python: 100% P / 100% R / 100% F1 *(29-case corpus)*
+#### Python: 100% P / 100% R / 100% F1 *(46-case corpus)*

 - **Rule depth**: 5 source families, 7 sanitizer families, 21 sink matchers
  spanning HTML, URL, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
@@ -47,52 +53,59 @@ confidence, and modeled idioms.
 - **Advanced analysis**: gated sinks (`Popen`, `subprocess.run/call` with
  activation-arg awareness), most SSA-equivalence and symbolic-execution
  fixtures target Python.
-- **Fixtures**: 125 under `tests/fixtures/` plus 30 benchmark cases.
+- **Fixtures**: 125 under `tests/fixtures/` plus 42 benchmark cases.
 - **Blind spots**: f-string interpolation is not explicitly modeled as a
  distinct taint-producing construct; string-formatting flows are caught by
  the general concatenation path.

-#### JavaScript: 93.8% P / 100% R / 96.8% F1 *(27-case corpus)*
+#### JavaScript: 100% P / 100% R / 100% F1 *(42-case corpus)*

 - **Rule depth**: 3 source families, 10 sanitizer families, 24 sink matchers
  spanning HTML, URL, JSON, Shell, SQL, Code, SSRF, and File I/O.
 - **Advanced analysis**: gated sinks (`setAttribute`, `parseFromString`),
-  two-level SSA solve for top-level + per-function scopes (`analyse_ssa_js_two_level`),
-  prefix-locked SSRF suppression via StringFact.
+  two-level SSA solve for top-level + per-function scopes
+  (`analyse_ssa_js_two_level`), prefix-locked SSRF suppression via
+  StringFact, abstract-interpretation interval tracking.
 - **Framework context**: Express, Koa, Fastify (via in-file import scan when
  `package.json` is absent).
-- **Fixtures**: 238 under `tests/fixtures/`; the largest corpus of any
+- **Fixtures**: 238 under `tests/fixtures/`; the largest fixture set of any
  language.
 - **Blind spots**: template literals are lowered through concatenation rather
  than modeled as a first-class taint operator; dynamic property access
  (`obj[user]`) is conservatively treated.

-#### TypeScript: 100% P / 100% R / 100% F1 *(35-case corpus, most recent measurement)*
+#### TypeScript: 100% P / 100% R / 100% F1 *(47-case corpus)*

 - **Rule depth**: Shares the JS ruleset (3 sources, 10 sanitizers, 24 sinks)
  plus TS-specific grammar handling.
-- **Advanced analysis**: TSX and JSX grammars wired as of 2026-04-20;
+- **Advanced analysis**: TSX and JSX grammars wired;
  discriminated-union narrowing, generic erasure, decorator flow, and
  interface dispatch are all validated against adversarial type-system
  stressors.
 - **Framework context**: Fastify detection via `detect_in_file_frameworks`
  (import-driven, no `package.json` required).
-- **Fixtures**: 39 test fixtures plus 35 benchmark cases.
-- **Blind spots**: 0 known open weak spots as of 2026-04-20. `as any` casts
-  and `any`-typed flows are handled conservatively (treated as tainted).
+- **Fixtures**: 39 test fixtures plus 42 benchmark cases.
+- **Blind spots**: `as any` casts and `any`-typed flows are handled
+  conservatively (treated as tainted).

 ### Beta tier

-#### Go: 94.1% P / 100% R / 97.0% F1 *(28-case corpus)*
+#### Go: 92.3% P / 96.0% R / 94.1% F1 *(53-case corpus, 2 FPs, 1 FN)*

 - **Rule depth**: 4 source families, 4 sanitizer families, 9 sink matchers
  covering HTML, URL, Shell, SQL, SSRF, Crypto, and File I/O.
 - **Framework context**: Gin, Echo source matchers.
-- **Known gaps**: no gated sinks, no deserialization class, allowlist
-  early-return patterns in path-pruning benchmark cases still produce FPs
-  (`go-pathprune-safe-001`). `fmt.Sprintf` is deliberately not a sink.
+- **Open weak spots**: `cve-go-2023-3188-vulnerable` (owncast SSRF) goes
+  undetected, and two safe Go fixtures (`go-safe-007`, `go-safe-009`) draw
+  spurious SQLi and CMDi findings respectively. These are the only
+  imperfect language scores in the current corpus.
+- **Known gaps**: no gated sinks, no deserialization class. `fmt.Sprintf`
+  is deliberately not a sink. Cap coverage is narrower than the Stable
+  tier and argument-role-aware sink modeling is not yet implemented for Go,
+  so production CI gates may surface additional FPs the corpus does not
+  exercise.

-#### Java: 92.9% P / 100% R / 96.3% F1 *(23-case corpus)*
+#### Java: 100% P / 100% R / 100% F1 *(35-case corpus)*

 - **Rule depth**: 3 source families, 8 sanitizer families, 10 sink matchers
  covering HTML, URL, Shell, SQL, Code, SSRF, and Deserialization.
@@ -101,10 +114,19 @@ confidence, and modeled idioms.
 - **Known gaps**: no gated sinks. Variable-receiver method calls
  (`client.send(...)` vs `HttpClient.send(...)`) rely on type-qualified
  resolution from receiver-type inference; flows where the receiver type
-  cannot be inferred are missed (`java-ssrf-002` historically persisted as
-  FN; closed via type facts but fragile on unusual builder chains).
+  cannot be inferred are conservatively over-tainted on unusual builder
+  chains.

-#### Ruby: 100% P / 92.3% R / 96.0% F1 *(24-case corpus)*
+#### PHP: 100% P / 100% R / 100% F1 *(37-case corpus)*
+
+- **Rule depth**: 3 source families (`$_GET`, `$_POST`, `$_REQUEST`
+  superglobals), 7 sanitizer families, 10 sink matchers covering HTML, URL,
+  Shell, SQL, Code, SSRF, File I/O, and Deserialization.
+- **Known gaps**: no gated sinks. Limited framework context (Laravel raw
+  methods only). `echo` language-construct detection is wired but its
+  inner-argument propagation is narrower than function-call sinks.
+
+#### Ruby: 100% P / 100% R / 100% F1 *(39-case corpus)*

 - **Rule depth**: 3 source families, 7 sanitizer families, 15 sink matchers
  covering HTML, Shell, SQL, Code, SSRF, File I/O, and Deserialization.
@@ -112,154 +134,168 @@ confidence, and modeled idioms.
 - **Known gaps**: string interpolation inside shell and SQL strings is
  recognized structurally but not modeled as a distinct operator.
  `begin/rescue/ensure` exception-edge wiring is documented as deferred
-  (structurally incompatible with `build_try()`). One FN persists on an
-  interprocedural taint propagation case due to rule-ID mismatch, not a
-  missed flow (`rb-interproc-001`).
+  (structurally incompatible with `build_try()`). The previous open
+  `rb-interproc-001` FN closed in the 2026-04-28 baseline after the
+  Ruby `Kernel#open` CMDI sink and exact-match sigil work landed.

-#### PHP: 86.7% P / 100% R / 92.9% F1 *(24-case corpus)*
+#### Rust: 100% P / 100% R / 100% F1 *(70-case adversarial corpus)*

-- **Rule depth**: 3 source families (`$_GET`, `$_POST`, `$_REQUEST`
-  superglobals), 7 sanitizer families, 10 sink matchers covering HTML, URL,
-  Shell, SQL, Code, SSRF, File I/O, and Deserialization.
-- **Known gaps**: no gated sinks. Limited framework context (Laravel raw
-  methods only). Interprocedural sanitizer-wrapping case
-  (`php-interproc-safe-001`) persists as FP. `echo` language-construct
-  detection is wired but its inner-argument propagation is narrower than
-  function-call sinks.
-
-### Preview tier
-
-C and C++ are labeled **Preview** (not Experimental) to convey a specific
-shape of limitation: the parser and existing rules produce useful findings
-on obvious unsafe-API uses, but the engine **structurally cannot model**
-several pervasive C/C++ constructs. Running Nyx on a C/C++ codebase and
-seeing a clean report should not be read as a clean audit. Pair Nyx with
-clang-tidy, the Clang Static Analyzer, or Infer for production use.
-
-**Not modeled** (common to both C and C++):
-
-- Pointer aliasing. Taint through `*p`, `p->field`, arbitrary pointer
-  arithmetic, and aliased writes are not tracked.
-- Function pointers and callback dispatch. Indirect calls through
-  `void (*fn)(char *)` resolve to no callee.
-- Array-element taint. Writes to `buf[i]` do not propagate taint to `buf`
-  in the general case; structural taint chains involving `fgets` → array →
-  `system` have rule-ID matching issues (`c-cmdi-004`).
-- STL container operations (C++ only). `std::vector`, `std::map`,
-  `std::string` methods are not taint-aware; `c_str()` breaks taint chains
-  (`cpp-cmdi-003`).
-- Lambdas and nested classes (C++ only). Not modeled.
-- Complex socket setup (C++ only). E.g. `connect()` chains are not detected
-  (`cpp-ssrf-002`).
-
-#### C: 85.7% P / 100% R / 92.3% F1 *(20-case corpus)*
-
-- **Rule depth**: 3 source families, **2** sanitizer families (prefix-based
-  only), 5 sink matchers spanning Shell, File, SSRF, and Format-String.
-- **Known gaps**: no framework rules, no gated sinks. Path-validation via
-  `strstr()` is not recognized as a guard (`c-safe-006`). Forward-declared
-  sanitizers are not tracked (`c-safe-008`).
-
-#### C++: 80.0% P / 100% R / 88.9% F1 *(20-case corpus)*
-
-- **Rule depth**: Clones the C ruleset (3 sources, 2 sanitizers, 5 sinks) and
-  adds `std::cin` / `std::getline` sources.
-- **Known gaps**: same sanitizer-recognition gaps as C. See the "Not
-  modeled" list above for structural gaps (STL containers, `c_str()`,
-  `connect()`, lambdas, nested classes).
-
-### Experimental tier
-
-#### Rust: 76.0% P / 100% R / 86.4% F1 *(31-case adversarial corpus)*
+Rust holds the largest per-language adversarial corpus and was promoted
+from Experimental to Beta in the 2026-04-25 measurement after the PathFact
+landings closed every previously-open `rs-safe-*` regression.

 - **Rule depth**: 6 source families, **2** sanitizer families (prefix and
  type-coercion), 11 sink matchers covering HTML, Shell, SQL, SSRF,
  Deserialization, and File I/O. Extensive framework source coverage
-  (Axum, Actix, Rocket); the most of any language on the source side.
-- **Recent additions (2026-04-20)**: new SQL class (`rusqlite`, `sqlx`,
-  `diesel`, `postgres`), new Deserialization class (`serde_yaml`,
-  `bincode`, `rmp_serde`, `ciborium`, `ron`, `toml`), expanded file I/O
+  (Axum, Actix, Rocket); the most of any language on the source side. The
+  narrow sanitizer count is the primary reason Rust is not in the Stable
+  tier. Engine-side path/typed sanitizer recognition (PathFact) compensates,
+  but the ruleset itself is shallow.
+- **Recent additions**: SQL class (`rusqlite`, `sqlx`, `diesel`,
+  `postgres`), Deserialization class (`serde_yaml`, `bincode`,
+  `rmp_serde`, `ciborium`, `ron`, `toml`), expanded file I/O
  (`fs::remove_file/dir/rename/copy`), `reqwest` SSRF builder chain.
-- **Known gaps**:
-  - `rs-safe-003`: structural `cfg-unguarded-sink` fires when a tainted
-    variable is *declared* in scope but not used in the sink; intentional
-    for high-risk sinks.
-  - `rs-safe-009`: match-arm guards don't surface as `StmtKind::If`, so
-    `classify_condition` never sees the character-class validation.
-  - `safe_direct_sanitizer.rs`: still FP because the SSA lowering for
-    an OR-chain rejection (`if a || b || c { return X }`) joins both
-    return paths into a single block, losing the early-return
-    semantics.  Distinct from the merged-return-block defect closed in
-    2026-04-24; tracked separately.
- **Closed by the 2026-04-23 PathFact domain**
-  (`src/abstract_interp/path_domain.rs`): `rs-safe-007` (`.replace("..",
-  "")` sanitiser), `rs-safe-008` (negative-validation return pattern),
-  `rs-safe-010` (static-map lookup; still handled by the dedicated
-  static-map analysis, but PathFact does not interfere), new `rs-safe-012`
-  (`.contains("..")` + `.starts_with('/')` intraprocedural rejection),
-  new `rs-safe-015` (`Path::new(p).is_absolute()` typed rejection), plus a
-  new `rs-path-006` negative-guard to prevent over-suppression.
- **Closed by the 2026-04-24 per-return-path PathFact landing**
-  (`PathFactReturnEntry` on `SsaFuncSummary` + structural
-  variant-wrapper transparency + non-data-return skipping +
-  path-fact-proven leaf detection in
-  `trace_tainted_leaf_values`):
-  `rs-safe-014` (Option-returning user sanitiser),
-  new `rs-safe-016` (cross-function `.contains("..")` rejection),
-  `CVE-2018-20997` patched (tar-rs zip-slip),
-  `CVE-2022-36113` patched (cargo `.cargo-ok` symlink),
-  `CVE-2024-24576` patched (BatBadBut argv injection).
+- **Closed by recent PathFact landings**
+  (`src/abstract_interp/path_domain.rs` + per-return-path PathFact entries
+  on `SsaFuncSummary`): `rs-safe-007` (`.replace("..","")` sanitiser),
+  `rs-safe-008` (negative-validation return), `rs-safe-009` (match-arm
+  guards via condition lifting), `rs-safe-010` (static-map lookup),
+  `rs-safe-012` (`.contains("..")` + `.starts_with('/')` rejection),
+  `rs-safe-014` (Option-returning user sanitiser), `rs-safe-015`
+  (`Path::new(p).is_absolute()` typed rejection), `rs-safe-016`
+  (cross-function `.contains("..")` rejection), and CVE patches
+  `CVE-2018-20997`, `CVE-2022-36113`, `CVE-2024-24576`.
 - **Not yet covered**: unsafe FFI / `std::mem::transmute` (no rules), Tokio
  `process::Command` async variants (not distinguished from sync),
-  `hyper` / `surf` / `ureq` SSRF clients (reqwest family only), and Rocket /
-  Actix positive cases (rules exist but no benchmark fixtures yet).
+  `hyper` / `surf` / `ureq` SSRF clients (reqwest family only).
+
+### Preview tier
+
+C and C++ remain **Preview** despite reporting 100% rule-level F1 on the
+synthetic corpus. A run of additions in late April taught the engine to
+follow taint through several constructs that used to be hard cutoffs (STL
+containers, builder chains, inline member functions, the wider `std::sto*`
+family), so the gap between "passes the synthetic corpus" and "would catch
+the same flow on a real codebase" is narrower than it used to be. It is not
+zero. The biggest remaining gaps are deep pointer aliasing and function
+pointers, both of which are pervasive in real C/C++ code. Treat a clean
+report as a starting point, not an audit. Pair Nyx with clang-tidy, the
+Clang Static Analyzer, or Infer for production use.
+
+**What now works** (added in late April):
+
+- STL container flow. `vec.push_back(tainted)` followed by
+  `vec.front().c_str()` carries taint into a downstream `system()` sink.
+  `std::map::insert_or_assign`, `find`, `count`, `at`, and `data` all
+  participate in the container store/load model.
+- Inline class member functions. `class C { void run(...) { ... } };`
+  bodies are now extracted as their own functions, so an intra-file call
+  like `inner.run(input)` resolves to the body summary. Same fix covers
+  `struct_specifier`, `union_specifier`, `enum_specifier`,
+  `template_declaration`, and `extern "C"` blocks.
+- Lambda passthrough. `auto echo = [](const char* s) { return s; };` carries
+  argument taint into the result via the engine's default call-argument
+  propagation.
+- Builder chains. `Socket::builder().host(user).port(8080).connect()`
+  resolves the chained returns and fires on `.connect()` when `user` is
+  tainted; the safe variant with a hardcoded host stays quiet.
+- Wider numeric sanitizer family. The full `std::sto*` set (including
+  `stoll`, `stoull`, `stold`) and the C-stdlib forms (`atoi`, `atof`,
+  `strtol`, etc.) clear all caps when they're called.
+- More header / source extensions. `.cc`, `.cxx`, `.hpp`, `.hxx`, `.hh`,
+  and `.h++` are recognized as C++ on top of `.cpp` and `.c++`. `.h` is
+  intentionally still routed to C since it's ambiguous without a build
+  system.
+
+**Still not modeled** (common to both C and C++):
+
+- Deep pointer aliasing. Taint through `*p`, `p->field`, and arbitrary
+  pointer arithmetic is not tracked through arbitrary aliased writes.
+  Field-sensitive points-to (see [Advanced analysis](advanced-analysis.md))
+  handles the "lock on a sub-field" case but is not a general escape
+  analysis.
+- Function pointers and callback dispatch. An indirect call through
+  `void (*fn)(char *)` resolves to no callee, so cross-pointer flows are
+  invisible.
+- Array-element taint by index. Writes to `buf[i]` do not always propagate
+  taint to `buf` as a whole; the recent subscript-handling work helps the
+  general case but doesn't make `buf` an alias for every element.
+- Nested classes beyond one level (C++ only).
+
+#### C: 100% P / 100% R / 100% F1 *(30-case corpus)*
+
+- **Rule depth**: 3 source families, **2** sanitizer families (the
+  `sanitize_*` prefix and numeric-parse functions), 5 sink matchers spanning
+  Shell, File, SSRF, and Format-String.
+- **Known gaps**: no framework rules, no gated sinks. The structural
+  limitations listed above are the dominant concern; rule additions alone
+  will not lift this language out of the Preview tier.
+
+#### C++: 100% P / 100% R / 100% F1 *(33-case corpus, plus 6 new fixtures for STL / builder / inline-method flows)*
+
+- **Rule depth**: Builds on the C ruleset with `std::cin` / `std::getline`
+  sources and a wider numeric-sanitizer set covering the full `std::sto*`
+  family (3 sources, 3 sanitizer families, 5 sinks).
+- **Known gaps**: still no framework rules and no gated sinks. The
+  structural blind spots are now narrower than they were a release ago
+  (see "What now works" above), but function pointers and the harder
+  pointer-aliasing patterns still produce false negatives.

 ---

 ## How the tiers were assigned

+Because rule-level F1 has saturated for nine of ten languages, the tier
+boundaries are drawn primarily on **rule depth** and **engine coverage of
+real-world idioms** rather than on benchmark scores alone.
+
 A language lands in **Stable** when all three hold:

 - Rule set covers ≥ 8 vulnerability classes with both source and sink
-  matchers, and at least one class has argument-role-aware gating.
+  matchers, and at least one class has argument-role-aware **gated-sink**
+  modeling (e.g. `setAttribute("href", url)` only flags href-like attrs).
 - Benchmark F1 ≥ 95% on a corpus of ≥ 25 cases.
- Advanced analysis (SSA lowering, context-sensitivity, symbolic-execution)
-  is exercised by fixtures for the language.
+- Advanced analysis (SSA lowering, context-sensitivity, symbolic execution,
+  abstract interpretation) is exercised by fixtures for the language.

-A language lands in **Beta** when benchmark F1 ≥ 90% but at least one of the
-Stable criteria fails; usually narrower cap coverage or absence of gated
-sinks.
+A language lands in **Beta** when benchmark F1 is in the mid-90s or higher
+on a meaningful corpus but at least one Stable criterion fails. Typical
+gaps: absence of gated sinks, or sanitizer rule depth narrow enough that
+the engine compensates structurally rather than via the ruleset.

-A language lands in **Preview** when the engine structurally cannot model
-constructs that are pervasive in typical codebases for that language
-(pointer aliasing, function pointers, array-element taint, STL containers
-for C/C++). Pattern-only coverage is useful but not sufficient as a sole
-SAST gate.
+A language lands in **Preview** when the engine has documented structural
+blind spots for constructs that are pervasive in typical codebases for that
+language. For C and C++ that means deep pointer aliasing, function
+pointers, and array-element taint; STL container flow and builder chains
+have moved out of the blind-spot list. Synthetic-corpus F1 is not a
+reliable signal for Preview-tier languages: a clean report can coexist
+with structural gaps.

-A language lands in **Experimental** when rule depth is clearly narrower
-(≤ 5 sinks and ≤ 2 sanitizers), or benchmark F1 < 90%, or documented weak
-spots require engine changes rather than rule additions to close, but the
-engine does not have the pervasive structural blind spots of the Preview
-tier.
+(The previous **Experimental** tier was retired in the 2026-04-25
+measurement when Rust's adversarial corpus reached 100% F1; no language
+currently sits in that tier.)

 ---

 ## What this means for you

 - **CI gates**: safe to set strict `--fail-on HIGH` gates on Stable-tier
-  languages. On Beta-tier, expect occasional FP triage; the weak-spot lists
-  above tell you exactly what to skim for. On Preview- and Experimental-tier,
-  treat Nyx findings as a starting point for manual review rather than
-  authoritative; Preview-tier languages in particular have structural
-  blind spots that a clean report will not disclose.
+  languages. On Beta-tier, expect occasional FP triage on production code
+  (the synthetic corpus does not cover every framework idiom); the
+  weak-spot lists above tell you what to skim for. On Preview-tier, treat
+  Nyx findings as a starting point for manual review rather than
+  authoritative. STL container flow and builder chains are tracked now,
+  but deep pointer aliasing and function pointers are not, so a clean
+  report does not tell you what the engine could not see.
 - **Rule contributions**: the shortest path to raising a language's tier is
  contributing sink matchers and gated-sink registrations. Label files live
  at `src/labels/<lang>.rs`; benchmark cases live at
  `tests/benchmark/corpus/<lang>/`.
- **Scope planning**: if your primary stack is C, C++, or Rust, Nyx will
-  surface real findings, but you should budget for review time and consider
-  combining Nyx with a language-specific tool (e.g. `cargo-audit`,
-  `clang-tidy`) until those tiers mature.
+- **Scope planning**: if your primary stack is C or C++, Nyx will surface
+  real findings on obvious unsafe-API uses, but budget for review time and
+  combine Nyx with `clang-tidy` or the Clang Static Analyzer. Rust is now
+  Beta-tier and suitable as a CI gate; pair with `cargo-audit` for
+  dependency CVEs.

 The benchmark thresholds in `tests/benchmark_test.rs` are deliberately set
 ~5 pp below current baselines so any drop in a language's F1 fails CI. Tier
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -10,7 +10,7 @@ First run builds a SQLite index under `.nyx/`; later runs skip files whose conte

 ## What a finding looks like

-<p align="center"><img src="../assets/screenshots/docs/cli-scan-quickstart.png" alt="nyx scan output: two HIGH taint flows (Python os.system, JavaScript document.write) framed by the brand purple gradient" width="900"/></p>
+<p align="center"><img src="../assets/screenshots/cli-scan.png" alt="nyx scan output: HIGH taint flows from req.params.user, req.query.url, and req.query.path into exec/fetch/fs.readFileSync, framed by the brand purple gradient" width="900"/></p>

 The same scan in console form:

@@ -23,7 +23,7 @@ The same scan in console form:
      Sink:   os.system

  6:5  ✖ [HIGH] py.cmdi.os_system  (Score: 64, Confidence: High)
-      Os.system() — shell command execution
+      os.system() runs a shell command

 /tmp/demo/xss_document_write.js
  5:5  ✖ [HIGH] taint-unsanitised-flow (source 3:18)  (Score: 81, Confidence: High)
@@ -33,7 +33,7 @@ The same scan in console form:
      Sink:   document.write

  5:5  ⚠ [MEDIUM] js.xss.document_write  (Score: 34, Confidence: High)
-      Document.write() — XSS sink
+      document.write() is an XSS sink

 warning 'demo' generated 10 issues.
 Finished in 0.054s.
--- a/docs/serve.md
+++ b/docs/serve.md
@@ -46,6 +46,44 @@ If you forward the port over SSH or expose it through a reverse proxy, the host-

 The numeric `:id` for finding URLs is the position index in the current scan, not a stable fingerprint. Bookmarks across scans aren't reliable; rely on file path + line.

+### Overview and Health Score
+
+The overview is the landing page after a scan. It shows severity counts, top affected files, OWASP coverage, and a 0 to 100 Health Score with a letter grade.
+
+#### How the Health Score is calculated
+
+Two things drive the score: the density of risk in the codebase, and hard guardrails that decide what the grade can mean.
+
+Each finding contributes weight = `severity_base × confidence_factor × verdict_factor × context_factor`:
+
+- Severity base: HIGH 10, MEDIUM 3, LOW (security) 0.5
+- Confidence: High 1.0, Medium 0.6, Low 0.3
+- Symex verdict: Confirmed 1.2, NotAttempted 1.0, Inconclusive 0.7, Infeasible 0.1
+- Context: cross-file taint flow 1.15, intra-file flow 1.0, AST-only or no flow 0.75, test path 0.3
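
The per-finding weight above can be sketched as a product of four table lookups. This is a minimal illustration, not Nyx's implementation: the function name and dictionary keys are hypothetical, and it assumes each finding takes exactly one context factor (the text does not say how, for example, a cross-file flow in a test path combines).

```python
# Factor tables copied from the list above; key names are hypothetical.
SEVERITY_BASE = {"HIGH": 10.0, "MEDIUM": 3.0, "LOW": 0.5}
CONFIDENCE = {"High": 1.0, "Medium": 0.6, "Low": 0.3}
VERDICT = {
    "Confirmed": 1.2,
    "NotAttempted": 1.0,
    "Inconclusive": 0.7,
    "Infeasible": 0.1,
}
CONTEXT = {
    "cross_file": 1.15,  # cross-file taint flow
    "intra_file": 1.0,   # intra-file flow
    "ast_only": 0.75,    # AST-only or no flow
    "test_path": 0.3,    # finding sits under a test path
}


def finding_weight(severity: str, confidence: str, verdict: str, context: str) -> float:
    """Multiply the four factors for a single finding.

    Assumes exactly one context key applies per finding.
    """
    return (
        SEVERITY_BASE[severity]
        * CONFIDENCE[confidence]
        * VERDICT[verdict]
        * CONTEXT[context]
    )
```

Under these assumptions a confirmed, high-confidence, cross-file HIGH weighs about 13.8, while a MEDIUM with low confidence, an Infeasible verdict, and a test-path context weighs about 0.027.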
+
+Quality lints (rule IDs containing `.quality.`) skip the per-finding weight and instead apply a saturating drag, capped at 15 points (so 1000 unwrap lints don't grade worse than 300 do). Total weight gets divided by `sqrt(files / 100)`, clamped between 1 and roughly 22, so a 100-file repo and a 50000-file repo see different denominators but a monorepo can't dilute its way out of a real HIGH.
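
The size normalization can be sketched as a clamped square root. The helper name is hypothetical, and the upper bound of 22.0 is an assumption standing in for the "roughly 22" stated above; the real clamp value may differ slightly.

```python
import math


def size_denominator(files: int, upper: float = 22.0) -> float:
    """Return sqrt(files / 100), clamped to the range [1, upper].

    `upper` defaults to 22.0 as an approximation of the documented
    "roughly 22" bound; treat it as an assumption.
    """
    raw = math.sqrt(files / 100)
    return max(1.0, min(upper, raw))
```

With this sketch, any repo of 100 files or fewer divides by 1, a 10000-file repo divides by 10, and anything past roughly 50000 files hits the upper clamp.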
+
+The result feeds a log curve into a 0 to 100 base, minus the quality drag. Then HIGH guardrails apply, keyed on the *credibility-adjusted* HIGH count rather than the raw count:
+
+| effective HIGH | ceiling |
+|---|---|
+| 0 | 100 |
+| 1 | 85 |
+| 2 | 78 |
+| 3 to 5 | 68 |
+| 6 to 10 | 58 |
+| 11+ | 45 |
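
The guardrail table can be read as a step function that caps the base score. The sketch below is illustrative only: the function names are hypothetical, and it assumes the credibility-adjusted HIGH count has already been reduced to a whole number, which the text does not spell out.

```python
def high_ceiling(effective_high: int) -> int:
    """Map a credibility-adjusted HIGH count to the score ceiling in the table."""
    if effective_high <= 0:
        return 100
    if effective_high == 1:
        return 85
    if effective_high == 2:
        return 78
    if effective_high <= 5:
        return 68
    if effective_high <= 10:
        return 58
    return 45


def apply_ceiling(base_score: float, effective_high: int) -> float:
    """Cap a base score (already net of quality drag) at the guardrail ceiling."""
    return min(base_score, high_ceiling(effective_high))
```

For example, a base score of 92 with one effective HIGH is capped at 85, while a base score of 40 with the same count stays at 40 because the ceiling only limits scores from above.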
+
+A repo with zero effective HIGHs never grades below a C (70). That floor is the structural promise that the score isn't an automated F-machine for projects that have lots of LOW noise but no critical issues.
+
+Modifiers in the ±5 range nudge the result for trend (only after the second scan), triage coverage (only when total findings ≥ 20), reintroduced findings, and stale HIGHs more than 30 days old.
+
+#### What the score doesn't measure
+
+It's a Nyx-finding-pressure metric, not a security audit. Score 100 means Nyx didn't find anything under its current rules and language coverage; it doesn't certify the absence of vulnerabilities. The score doesn't see runtime config, IAM, secret stores, dependency CVEs, or anything outside the source tree being scanned. A repo of mostly Kotlin (where Nyx coverage is thin) will score artificially well because most of the code never gets evaluated.
+
+The current ceilings are calibrated for v0.5 scanner false-positive rates. As symex coverage and rule precision improve, the ceilings tighten. Calibration data and the rationale behind each tunable live in [health-score-audit.md](health-score-audit.md).
+
 ### Findings and Finding detail

 The findings list is filterable by severity, confidence, category, language, rule ID, and triage state.