mirror of
https://github.com/elicpeter/nyx.git
synced 2026-06-06 19:35:13 +02:00
* feat: Add const_bound_vars tracking to prevent false positives in ownership checks
* feat: Introduce field interner and typed bounded vars for enhanced type tracking
* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking
* feat: Centralize method name extraction with bare_method_name helper
* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch
* feat: Enhance C++ taint tracking with additional container operations and inline method resolution
* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking
* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis
* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations
* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details
* test: Add comprehensive tests for lattice algebra laws and SSA edge cases
* feat: Add destructured session user handling and safe user ID access patterns
* feat: Implement row-population reverse-walk for enhanced authorization checks
* feat: Enhance authorization checks with local alias chain for self-actor types
* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction
* feat: Implement chained method call inner-gate rebinding for SSRF prevention
* feat: Add observability and error modules, enhance debug functionality, and implement theme context
* feat: Remove Auth Analysis page and update navigation to redirect to Explorer
* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor
* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor
* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity
* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build
The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while
the unit test at the bottom of the file was gated only `#[cfg(test)]`.
Since `cfg(test)` is set in release builds with `--tests` but
`cfg(debug_assertions)` is not, `cargo build --release --tests` failed
with E0425. Removing the gate fixes the build; the body is `debug_assert!`
only, so the helper is free in release. Also drop the gate at the call
site to avoid a `dead_code` warning when the lib is built without
`--tests`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(closure-capture): flip JS/TS fixtures to required-finding
The JS and TS closure-capture fixtures pinned the old broken behaviour
via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now
correctly traces taint through the closure boundary (env source captured
by an arrow function, sunk via `child_process.exec` inside the body), so
the formerly-forbidden finding is a true positive.
Match the Python sibling's shape — `required_findings` with
`id_prefix` + `min_count` plus a small `noise_budget` — and rewrite the
companion READMEs and the phase8_fragility_tests doc-comments from
"known gap" to "regression guard".
Verified:
- cargo test --release --test phase8_fragility_tests → 8/8 pass
- cargo test --release --lib bfs_assertion → pass
- corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0) — unchanged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis
* feat: Introduce health module and enhance health score computation with calibration tests
* feat: Add expectations configuration and cleanup .gitignore for log files
* feat: Implement theme selection and enhance settings panel for triage sync
* feat: Suppress false positives for strcpy calls with literal sources in AST
* feat: Update analyse_function_ssa to return body CFG for accurate analysis
* feat: Add bug report and feature request templates for improved issue tracking
* feat: removed dev scripts
* feat: update README.md for clarity and consistency in fixture descriptions
* feat: removed dev docs
* feat: clean up error handling and UI elements for improved user experience
* feat: adjust button sizes in HeaderBar for better UI consistency
* feat: enhance taint analysis with additional context for sanitizer and taint findings
* cargo fmt
* prettier
* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts
* feat: add script to frame PNG screenshots with brand gradient
* feat: add fuzzing support with new targets and CI workflows
* refactor: streamline match expressions and improve formatting in CLI and output handling
* feat: enhance configuration display with detailed output options
* feat: stage demo configuration for improved CLI screenshot output
* feat: expose merge_configs function for user-configurable settings
* refactor: simplify code structure and improve readability in config handling
* refactor: improve descriptions for vulnerability patterns in various languages
* feat: update MIT License section with additional usage details and copyright information
* feat: update screenshots
* refactor: update build process and paths for frontend assets
* feat: add cross-file taint fuzzing target and supporting dictionary
* refactor: clean up formatting and comments in fuzz configuration and example files
* refactor: remove outdated comments and clean up CI configuration files
* chore: update changelog dates and improve formatting in documentation
* refactor: update Cargo.toml and CI configuration for improved packaging and build process
* refactor: enhance quote-stripping logic to prevent panics and add regression tests
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
296 lines
13 KiB
Markdown
296 lines
13 KiB
Markdown
# Advanced Analysis
|
||
|
||
Nyx layers several analysis passes on top of the core SSA taint engine.
|
||
Most are switchable via config (`[analysis.engine]` in `nyx.conf` /
|
||
`nyx.local`), a matching CLI flag pair, or, as a last-resort override for
|
||
library users with no CLI entry point, a `NYX_*` environment variable. The
|
||
five precision-tuning passes (abstract interpretation, context sensitivity,
|
||
symbolic execution, constraint solving, field-sensitive points-to) are
|
||
**on by default** because the benchmark numbers in
|
||
[language-maturity.md](language-maturity.md) are measured with them on.
|
||
The demand-driven backwards walk and hierarchy fan-out sit alongside but
|
||
are not user-toggleable in the same way.
|
||
|
||
See [`Configuration`](configuration.md#analysisengine) for the full config
|
||
surface and CLI flag table. This page explains what each pass does, why it
|
||
helps, how to disable it, and what it does not cover.
|
||
|
||
---
|
||
|
||
## Abstract interpretation
|
||
|
||
**What it does.** Propagates interval and string abstract domains through the
|
||
SSA worklist alongside taint. Integer values carry `[lo, hi]` bounds;
|
||
string values carry a prefix and suffix (plus a bit domain for known-zero /
|
||
known-one bits). Values are joined at merge points and widened at loop
|
||
heads so the worklist always terminates.
|
||
|
||
**Why it helps.** Lets Nyx suppress some findings that are obviously safe
|
||
given the abstract value; a proven-bounded integer does not flow into a
|
||
SQL sink as an injection risk; an SSRF sink whose URL prefix is locked to a
|
||
trusted host stays quiet. This turns a large class of FPs on numeric and
|
||
locked-prefix paths into true negatives.
|
||
|
||
**How to turn it off.**
|
||
|
||
| Surface | Value |
|
||
|---|---|
|
||
| Config | `abstract_interpretation = false` under `[analysis.engine]` |
|
||
| CLI flag | `--no-abstract-interp` |
|
||
| Env var (legacy) | `NYX_ABSTRACT_INTERP=0` |
|
||
|
||
**Limitations.** The interval domain is 64-bit signed; very wide or
|
||
overflow-producing arithmetic degrades to `⊤` (unbounded). String prefix /
|
||
suffix tracking is concat-only; it does not model reordering, reversal, or
|
||
character-level regex constraints. Loop widening deliberately drops
|
||
changing bounds rather than chasing fixpoints.
|
||
|
||
**Source**: [`src/abstract_interp/`](https://github.com/elicpeter/nyx/tree/master/src/abstract_interp/).
|
||
|
||
---
|
||
|
||
## Context-sensitive analysis
|
||
|
||
**What it does.** Adds k=1 call-site-sensitive taint propagation for
|
||
intra-file callees. When a function is invoked, Nyx reanalyzes the callee
|
||
body with the actual per-argument taint signature of the call site,
|
||
producing call-site-specific return taint. Results are cached by
|
||
`(function_name, ArgTaintSig)` so repeated calls with the same signature
|
||
are free.
|
||
|
||
**Why it helps.** A helper called once with a tainted argument and once
|
||
with a sanitized argument produces two different findings; without k=1
|
||
sensitivity, the conservative union of both call sites would be applied
|
||
to the sanitized call, producing a spurious finding there.
|
||
|
||
**How to turn it off.**
|
||
|
||
| Surface | Value |
|
||
|---|---|
|
||
| Config | `context_sensitive = false` under `[analysis.engine]` |
|
||
| CLI flag | `--no-context-sensitive` |
|
||
| Env var (legacy) | `NYX_CONTEXT_SENSITIVE=0` |
|
||
|
||
**Limitations.** Intra-file only. Cross-file callees are resolved via
|
||
summaries (see `src/summary/`) rather than re-inlined. Depth is capped at
|
||
k=1 to prevent cache blow-up and re-entrancy; higher k would require a
|
||
different cache key design. Callee bodies larger than the internal
|
||
`MAX_INLINE_BLOCKS` threshold fall back to the summary path. Cache keys
|
||
hash per-argument `Cap` bits but not source-origin identity, so two
|
||
callers with identical caps but different origins share cached
|
||
origin-attribution.
|
||
|
||
**Source**: [`src/taint/ssa_transfer.rs`](https://github.com/elicpeter/nyx/blob/master/src/taint/ssa_transfer.rs)
|
||
(`ArgTaintSig`, `InlineCache`, `inline_analyse_callee`).
|
||
|
||
---
|
||
|
||
## Field-sensitive points-to
|
||
|
||
**What it does.** Runs a Steensgaard-style alias analysis that interns field
|
||
accesses as their own abstract locations. `c.mu` becomes `Field(c, mu)`,
|
||
distinct from `c` itself; a write to `obj.cache` and a read from
|
||
`obj.cache` in different methods both land on the same abstract location;
|
||
subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic
|
||
`__index_get__` / `__index_set__` calls so the engine can model them
|
||
through the same container store/load primitives used for STL containers,
|
||
Python lists, JS arrays, and similar.
|
||
|
||
**Why it helps.** It splits a class of false positives that the
|
||
whole-variable taint model produced. Before this pass, `obj.field =
|
||
tainted; sink(obj.other_field)` would taint `obj` as a whole and fire on
|
||
the safe field; the receiver-type / sub-field distinction is also what
|
||
lets the resource-lifecycle pass attribute a `c.mu.Lock()` to the lock
|
||
field rather than to its container. Cross-method field flow (writer in
|
||
one method, reader in another) shows up only when fields have stable
|
||
identity independent of the parent value.
|
||
|
||
**How to turn it off.**
|
||
|
||
| Surface | Value |
|
||
|---|---|
|
||
| Env var | `NYX_POINTER_ANALYSIS=0` |
|
||
|
||
The pass is **on by default** as of 2026-04-26. The env-var override is
|
||
kept for one release so you can compare against the pre-pointer baseline,
|
||
then will be removed.
|
||
|
||
**Limitations.** This is not a general escape analysis. Function pointers
|
||
and arbitrary indirect calls still resolve to no callee, and deep alias
|
||
chains through `*p` / `p->field` in C/C++ are not tracked beyond the
|
||
direct field case. The points-to set per value is capped at
|
||
`--max-pointsto` (default 32); when truncation happens, an engine note
|
||
records the precision loss.
|
||
|
||
**Source**: [`src/pointer/`](https://github.com/elicpeter/nyx/tree/master/src/pointer/).
|
||
|
||
---
|
||
|
||
## Hierarchy fan-out for virtual dispatch
|
||
|
||
**What it does.** Builds a per-language type-hierarchy index in pass 1
|
||
(extends, implements, impl-for, includes; the exact construct depends on
|
||
the language) and uses it in pass 2 to widen method-call resolution. When
|
||
a call's receiver is statically typed as a super-class, trait, or
|
||
interface, the resolver returns every concrete implementer it has seen
|
||
in the codebase rather than just the first match.
|
||
|
||
**Why it helps.** Without it, a call like `repository.findById(id)` where
|
||
`repository` is typed as the interface gets resolved against whatever the
|
||
single-result resolver finds first; if the matching implementer is in
|
||
another file the call effectively goes opaque. With the hierarchy, the
|
||
taint engine sees the union of every implementer's transform and the
|
||
flow shows up regardless of which file holds the concrete class.
|
||
|
||
**Limitations.** Fan-out is capped at 8 implementers per call site; over
|
||
that, the tail is silently dropped (a debug log records the cap hit) and
|
||
the call is treated as a non-deterministic union of the kept
|
||
implementers. Languages that use structural / implicit interface
|
||
satisfaction (Go) are deliberately skipped because per-file extraction
|
||
is intractable; those calls fall back to the single-result resolver. The
|
||
extractor covers Java, Rust, TS/JS/TSX, Python, Ruby, PHP, and C++.
|
||
|
||
**Source**: [`src/cfg/hierarchy.rs`](https://github.com/elicpeter/nyx/blob/master/src/cfg/hierarchy.rs)
|
||
and [`src/summary/mod.rs`](https://github.com/elicpeter/nyx/blob/master/src/summary/mod.rs)
|
||
(`TypeHierarchyIndex`, `resolve_callee_widened`).
|
||
|
||
---
|
||
|
||
## Symbolic execution
|
||
|
||
**What it does.** Builds a symbolic expression tree per tainted SSA value,
|
||
generates a witness string for each taint finding (the concrete-looking
|
||
shape of the dangerous value at the sink), and detects sanitization
|
||
patterns that the taint engine alone would miss. Supports string
|
||
operations (`trim`, `replace`, `toLower`, `substring`, `strlen`, …),
|
||
arithmetic, concatenation, phi nodes, and opaque calls.
|
||
|
||
**Why it helps.** Raises finding quality. A taint finding with a rendered
|
||
witness like `"SELECT * FROM t WHERE id=" + userInput` is substantially
|
||
easier to triage than one without. Also powers some confidence-gating for
|
||
downstream display.
|
||
|
||
**How to turn it off.**
|
||
|
||
| Surface | Value |
|
||
|---|---|
|
||
| Config | `symex.enabled = false` under `[analysis.engine]` |
|
||
| CLI flag | `--no-symex` |
|
||
| Env var (legacy) | `NYX_SYMEX=0` |
|
||
|
||
Two nested switches refine the scope without disabling symex entirely:
|
||
|
||
| Setting | CLI | Env | Default | Effect |
|
||
|---|---|---|---|---|
|
||
| `symex.cross_file` | `--no-cross-file-symex` | `NYX_CROSS_FILE_SYMEX=0` | on | Consult cross-file SSA bodies so symex can reason about callees defined in other files |
|
||
| `symex.interprocedural` | `--no-symex-interproc` | `NYX_SYMEX_INTERPROC=0` | on | Intra-file interprocedural symex (k ≥ 2 via frame stack) |
|
||
|
||
**Limitations.** Expression trees are bounded at `MAX_EXPR_DEPTH=32`;
|
||
deeper expressions degrade to `Unknown` rather than growing unboundedly.
|
||
Sanitizer detection is informational: string-replace sanitizer patterns
|
||
are reported as witness metadata, not used to clear taint.
|
||
|
||
**Source**: [`src/symex/`](https://github.com/elicpeter/nyx/tree/master/src/symex/).
|
||
|
||
---
|
||
|
||
## Demand-driven analysis
|
||
|
||
**What it does.** After the forward pass-2 taint analysis finishes, runs a
|
||
*backwards* walk from each sink's tainted SSA operands. The walk follows
|
||
reverse SSA-edge transfer (phi fan-out, `Assign` operand-fanout, `Call`
|
||
body-expansion or arg-fanout) until it reaches a taint source, proves
|
||
the flow infeasible via an accumulated path predicate, or exhausts its
|
||
budget. Each forward finding is then annotated with the aggregate verdict:
|
||
|
||
- `backwards-confirmed`; a matching source was reached. Finding picks
|
||
up a small confidence boost and the note appears in
|
||
`evidence.symbolic.cutoff_notes`.
|
||
- `backwards-infeasible`; every walk proved the flow unreachable.
|
||
Finding is capped to Low confidence and a user-readable limiter is
|
||
attached.
|
||
- `backwards-budget-exhausted`; the walk hit `BACKWARDS_VALUE_BUDGET`
|
||
without a verdict. Recorded as a limiter so operators can see when
|
||
the pass could not keep up.
|
||
- Inconclusive outcomes are a no-op: the forward finding is untouched.
|
||
|
||
Because the backwards walk can consult `GlobalSummaries.bodies_by_key`
|
||
(populated by the cross-file callee body persistence layer) it closes
|
||
across file boundaries; when a callee body is not loadable the walk
|
||
falls back to fanning out over the call's arguments so local reach-back
|
||
is still possible.
|
||
|
||
**Why it helps.** Inverts the analysis direction so budget follows
|
||
questions the scanner actually cares about; "does any source reach
|
||
*this* sink?"; instead of proving every potential source-to-sink
|
||
path. Corroborated findings are a stronger signal than forward-only
|
||
ones, and proven-infeasible flows provide a principled way to lower
|
||
confidence on forward false positives without silently dropping them.
|
||
|
||
**How to turn it on.** Defaults off so the benchmark floor is preserved
|
||
while the pass stabilises.
|
||
|
||
| Surface | Value |
|
||
|---|---|
|
||
| Config | `backwards_analysis = true` under `[analysis.engine]` |
|
||
| CLI flag | `--backwards-analysis` / `--no-backwards-analysis` |
|
||
| Env var (legacy) | `NYX_BACKWARDS=1` |
|
||
|
||
**Limitations (first cut).** Reverse call-graph expansion past a
|
||
`ReachedParam` is deferred; the walk terminates at function parameters
|
||
rather than crossing back into callers. Path-constraint pruning is
|
||
conservative: only the accumulated `PredicateSummary` bits are consulted,
|
||
not the full symbolic predicate stack. Depth-bounded at k=2 for
|
||
cross-function body expansion. See `DEFAULT_BACKWARDS_DEPTH`,
|
||
`BACKWARDS_VALUE_BUDGET`, and `MAX_BACKWARDS_CALLEE_BLOCKS` in
|
||
`src/taint/backwards.rs` for the exact bounds.
|
||
|
||
**Source**: [`src/taint/backwards.rs`](https://github.com/elicpeter/nyx/blob/master/src/taint/backwards.rs).
|
||
|
||
---
|
||
|
||
## Constraint solving
|
||
|
||
**What it does.** Collects path constraints at each branch in SSA and
|
||
propagates them alongside taint. Prunes paths whose accumulated constraint
|
||
set is unsatisfiable; a taint flow guarded by `if x < 0 && x > 10` is
|
||
dropped rather than surfaced. Optionally delegates the satisfiability
|
||
check to Z3 when Nyx is built with the `smt` Cargo feature.
|
||
|
||
**Why it helps.** Removes a class of FPs rooted in clearly-infeasible
|
||
control-flow combinations. Without path constraints, a taint flow that
|
||
only occurs when mutually-exclusive branches are simultaneously taken can
|
||
still produce a finding.
|
||
|
||
**How to turn it off.**
|
||
|
||
| Surface | Value |
|
||
|---|---|
|
||
| Config | `constraint_solving = false` under `[analysis.engine]` |
|
||
| CLI flag | `--no-constraint-solving` |
|
||
| Env var (legacy) | `NYX_CONSTRAINT=0` |
|
||
|
||
The SMT backend is a separate switch:
|
||
|
||
| Setting | CLI | Env | Default | Effect |
|
||
|---|---|---|---|---|
|
||
| `symex.smt` | `--no-smt` | `NYX_SMT=0` | on when built with `smt` feature | Delegate satisfiability checks to Z3; ignored if Nyx was built without `smt` |
|
||
|
||
**Limitations.** The default path-constraint domain is syntactic;
|
||
trivially-inconsistent pairs are caught without an SMT solver, but richer
|
||
algebraic unsatisfiability requires the `smt` feature (Z3). Without `smt`,
|
||
Nyx ships a lightweight satisfiability check that catches literal
|
||
contradictions but not deeper reasoning.
|
||
|
||
**Source**: [`src/constraint/`](https://github.com/elicpeter/nyx/tree/master/src/constraint/).
|
||
|
||
---
|
||
|
||
## Combining the switches
|
||
|
||
The defaults (all on) are the configuration Nyx is benchmarked against.
|
||
Turning any switch off trades precision for speed and may move findings
|
||
relative to the published baseline; CI regression gates assume defaults.
|
||
If you need a minimal-overhead scan (for very large repositories or a
|
||
pre-commit fast path), the AST-only scan mode (`--mode ast`) skips CFG,
|
||
taint, and all four advanced passes entirely and is the right tool.
|