nyx/docs/advanced-analysis.md
Eli Peter 82f18184b1
Prerelease cleanup (#46)
* feat: Add const_bound_vars tracking to prevent false positives in ownership checks

* feat: Introduce field interner and typed bounded vars for enhanced type tracking

* feat: Add typed_call_receivers and typed_bounded_dto_fields for enhanced type tracking

* feat: Centralize method name extraction with bare_method_name helper

* feat: Implement Phase-6 hierarchy fan-out for runtime virtual dispatch

* feat: Enhance C++ taint tracking with additional container operations and inline method resolution

* feat: Introduce field-sensitive points-to analysis for enhanced resource tracking

* feat: Implement Pointer-Phase 6 subscript handling for enhanced container analysis

* test: Add comprehensive tests for JavaScript control flow constructs and lattice operations

* docs: Update advanced analysis documentation with field-sensitive points-to and hierarchy fan-out details

* test: Add comprehensive tests for lattice algebra laws and SSA edge cases

* feat: Add destructured session user handling and safe user ID access patterns

* feat: Implement row-population reverse-walk for enhanced authorization checks

* feat: Enhance authorization checks with local alias chain for self-actor types

* feat: Introduce ActiveRecord query safety checks and enhance snippet extraction

* feat: Implement chained method call inner-gate rebinding for SSRF prevention

* feat: Add observability and error modules, enhance debug functionality, and implement theme context

* feat: Remove Auth Analysis page and update navigation to redirect to Explorer

* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor

* feat: Optimize SSA lowering by sharing results between taint engine and artifact extractor

* feat: Reset path-safe-suppressed spans before lowering to maintain analysis integrity

* fix(ssa): ungate debug_assert_bfs_ordering for release-tests build

The helper at src/ssa/lower.rs was gated `#[cfg(debug_assertions)]` while
the unit test at the bottom of the file was gated only `#[cfg(test)]`.
Since `cfg(test)` is set in release builds with `--tests` but
`cfg(debug_assertions)` is not, `cargo build --release --tests` failed
with E0425. Removing the gate fixes the build; the body is `debug_assert!`
only, so the helper is free in release. Also drop the gate at the call
site to avoid a `dead_code` warning when the lib is built without
`--tests`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(closure-capture): flip JS/TS fixtures to required-finding

The JS and TS closure-capture fixtures pinned the old broken behaviour
via `forbidden_findings: [{ "id_prefix": "taint-" }]`. The engine now
correctly traces taint through the closure boundary (env source captured
by an arrow function, sunk via `child_process.exec` inside the body), so
the formerly-forbidden finding is a true positive.

Match the Python sibling's shape — `required_findings` with
`id_prefix` + `min_count` plus a small `noise_budget` — and rewrite the
companion READMEs and the phase8_fragility_tests doc-comments from
"known gap" to "regression guard".

Verified:
- cargo test --release --test phase8_fragility_tests → 8/8 pass
- cargo test --release --lib bfs_assertion → pass
- corpus benchmark F1 = 0.9976 (TP=205, FP=1, FN=0) — unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: Add OWASP mapping and baseline mutation hooks for enhanced security analysis

* feat: Introduce health module and enhance health score computation with calibration tests

* feat: Add expectations configuration and cleanup .gitignore for log files

* feat: Implement theme selection and enhance settings panel for triage sync

* feat: Suppress false positives for strcpy calls with literal sources in AST

* feat: Update analyse_function_ssa to return body CFG for accurate analysis

* feat: Add bug report and feature request templates for improved issue tracking

* feat: removed dev scripts

* feat: update README.md for clarity and consistency in fixture descriptions

* feat: removed dev docs

* feat: clean up error handling and UI elements for improved user experience

* feat: adjust button sizes in HeaderBar for better UI consistency

* feat: enhance taint analysis with additional context for sanitizer and taint findings

* cargo fmt

* prettier

* refactor: simplify conditional checks and improve code readability in AST and screenshot capture scripts

* feat: add script to frame PNG screenshots with brand gradient

* feat: add fuzzing support with new targets and CI workflows

* refactor: streamline match expressions and improve formatting in CLI and output handling

* feat: enhance configuration display with detailed output options

* feat: stage demo configuration for improved CLI screenshot output

* feat: expose merge_configs function for user-configurable settings

* refactor: simplify code structure and improve readability in config handling

* refactor: improve descriptions for vulnerability patterns in various languages

* feat: update MIT License section with additional usage details and copyright information

* feat: update screenshots

* refactor: update build process and paths for frontend assets

* feat: add cross-file taint fuzzing target and supporting dictionary

* refactor: clean up formatting and comments in fuzz configuration and example files

* refactor: remove outdated comments and clean up CI configuration files

* chore: update changelog dates and improve formatting in documentation

* refactor: update Cargo.toml and CI configuration for improved packaging and build process

* refactor: enhance quote-stripping logic to prevent panics and add regression tests

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:58:38 -04:00

13 KiB
Raw Permalink Blame History

Advanced Analysis

Nyx layers several analysis passes on top of the core SSA taint engine. Most are switchable via config ([analysis.engine] in nyx.conf / nyx.local), a matching CLI flag pair, or, as a last-resort override for library users with no CLI entry point, a NYX_* environment variable. The five precision-tuning passes (abstract interpretation, context sensitivity, symbolic execution, constraint solving, field-sensitive points-to) are on by default because the benchmark numbers in language-maturity.md are measured with them on. The demand-driven backwards walk and hierarchy fan-out sit alongside but are not user-toggleable in the same way.

See Configuration for the full config surface and CLI flag table. This page explains what each pass does, why it helps, how to disable it, and what it does not cover.


Abstract interpretation

What it does. Propagates interval and string abstract domains through the SSA worklist alongside taint. Integer values carry [lo, hi] bounds; string values carry a prefix and suffix (plus a bit domain for known-zero / known-one bits). Values are joined at merge points and widened at loop heads so the worklist always terminates.

Why it helps. Lets Nyx suppress some findings that are obviously safe given the abstract value; a proven-bounded integer does not flow into a SQL sink as an injection risk; an SSRF sink whose URL prefix is locked to a trusted host stays quiet. This turns a large class of FPs on numeric and locked-prefix paths into true negatives.

How to turn it off.

Surface Value
Config abstract_interpretation = false under [analysis.engine]
CLI flag --no-abstract-interp
Env var (legacy) NYX_ABSTRACT_INTERP=0

Limitations. The interval domain is 64-bit signed; very wide or overflow-producing arithmetic degrades to (unbounded). String prefix / suffix tracking is concat-only; it does not model reordering, reversal, or character-level regex constraints. Loop widening deliberately drops changing bounds rather than chasing fixpoints.

Source: src/abstract_interp/.


Context-sensitive analysis

What it does. Adds k=1 call-site-sensitive taint propagation for intra-file callees. When a function is invoked, Nyx reanalyzes the callee body with the actual per-argument taint signature of the call site, producing call-site-specific return taint. Results are cached by (function_name, ArgTaintSig) so repeated calls with the same signature are free.

Why it helps. A helper called once with a tainted argument and once with a sanitized argument produces two different findings; without k=1 sensitivity, the conservative union of both call sites would be applied to the sanitized call, producing a spurious finding there.

How to turn it off.

Surface Value
Config context_sensitive = false under [analysis.engine]
CLI flag --no-context-sensitive
Env var (legacy) NYX_CONTEXT_SENSITIVE=0

Limitations. Intra-file only. Cross-file callees are resolved via summaries (see src/summary/) rather than re-inlined. Depth is capped at k=1 to prevent cache blow-up and re-entrancy; higher k would require a different cache key design. Callee bodies larger than the internal MAX_INLINE_BLOCKS threshold fall back to the summary path. Cache keys hash per-argument Cap bits but not source-origin identity, so two callers with identical caps but different origins share cached origin-attribution.

Source: src/taint/ssa_transfer.rs (ArgTaintSig, InlineCache, inline_analyse_callee).


Field-sensitive points-to

What it does. Runs a Steensgaard-style alias analysis that interns field accesses as their own abstract locations. c.mu becomes Field(c, mu), distinct from c itself; a write to obj.cache and a read from obj.cache in different methods both land on the same abstract location; subscript reads and writes (arr[i], map[k] = v) lower to synthetic __index_get__ / __index_set__ calls so the engine can model them through the same container store/load primitives used for STL containers, Python lists, JS arrays, and similar.

Why it helps. It splits a class of false positives that the whole-variable taint model produced. Before this pass, obj.field = tainted; sink(obj.other_field) would taint obj as a whole and fire on the safe field; the receiver-type / sub-field distinction is also what lets the resource-lifecycle pass attribute a c.mu.Lock() to the lock field rather than to its container. Cross-method field flow (writer in one method, reader in another) shows up only when fields have stable identity independent of the parent value.

How to turn it off.

Surface Value
Env var NYX_POINTER_ANALYSIS=0

The pass is on by default as of 2026-04-26. The env-var override is kept for one release so you can compare against the pre-pointer baseline, then will be removed.

Limitations. This is not a general escape analysis. Function pointers and arbitrary indirect calls still resolve to no callee, and deep alias chains through *p / p->field in C/C++ are not tracked beyond the direct field case. The points-to set per value is capped at --max-pointsto (default 32); when truncation happens, an engine note records the precision loss.

Source: src/pointer/.


Hierarchy fan-out for virtual dispatch

What it does. Builds a per-language type-hierarchy index in pass 1 (extends, implements, impl-for, includes; the exact construct depends on the language) and uses it in pass 2 to widen method-call resolution. When a call's receiver is statically typed as a super-class, trait, or interface, the resolver returns every concrete implementer it has seen in the codebase rather than just the first match.

Why it helps. Without it, a call like repository.findById(id) where repository is typed as the interface gets resolved against whatever the single-result resolver finds first; if the matching implementer is in another file the call effectively goes opaque. With the hierarchy, the taint engine sees the union of every implementer's transform and the flow shows up regardless of which file holds the concrete class.

Limitations. Fan-out is capped at 8 implementers per call site; over that, the tail is silently dropped (a debug log records the cap hit) and the call is treated as a non-deterministic union of the kept implementers. Languages that use structural / implicit interface satisfaction (Go) are deliberately skipped because per-file extraction is intractable; those calls fall back to the single-result resolver. The extractor covers Java, Rust, TS/JS/TSX, Python, Ruby, PHP, and C++.

Source: src/cfg/hierarchy.rs and src/summary/mod.rs (TypeHierarchyIndex, resolve_callee_widened).


Symbolic execution

What it does. Builds a symbolic expression tree per tainted SSA value, generates a witness string for each taint finding (the concrete-looking shape of the dangerous value at the sink), and detects sanitization patterns that the taint engine alone would miss. Supports string operations (trim, replace, toLower, substring, strlen, …), arithmetic, concatenation, phi nodes, and opaque calls.

Why it helps. Raises finding quality. A taint finding with a rendered witness like "SELECT * FROM t WHERE id=" + userInput is substantially easier to triage than one without. Also powers some confidence-gating for downstream display.

How to turn it off.

Surface Value
Config symex.enabled = false under [analysis.engine]
CLI flag --no-symex
Env var (legacy) NYX_SYMEX=0

Two nested switches refine the scope without disabling symex entirely:

Setting CLI Env Default Effect
symex.cross_file --no-cross-file-symex NYX_CROSS_FILE_SYMEX=0 on Consult cross-file SSA bodies so symex can reason about callees defined in other files
symex.interprocedural --no-symex-interproc NYX_SYMEX_INTERPROC=0 on Intra-file interprocedural symex (k ≥ 2 via frame stack)

Limitations. Expression trees are bounded at MAX_EXPR_DEPTH=32; deeper expressions degrade to Unknown rather than growing unboundedly. Sanitizer detection is informational: string-replace sanitizer patterns are reported as witness metadata, not used to clear taint.

Source: src/symex/.


Demand-driven analysis

What it does. After the forward pass-2 taint analysis finishes, runs a backwards walk from each sink's tainted SSA operands. The walk follows reverse SSA-edge transfer (phi fan-out, Assign operand-fanout, Call body-expansion or arg-fanout) until it reaches a taint source, proves the flow infeasible via an accumulated path predicate, or exhausts its budget. Each forward finding is then annotated with the aggregate verdict:

  • backwards-confirmed; a matching source was reached. Finding picks up a small confidence boost and the note appears in evidence.symbolic.cutoff_notes.
  • backwards-infeasible; every walk proved the flow unreachable. Finding is capped to Low confidence and a user-readable limiter is attached.
  • backwards-budget-exhausted; the walk hit BACKWARDS_VALUE_BUDGET without a verdict. Recorded as a limiter so operators can see when the pass could not keep up.
  • Inconclusive outcomes are a no-op: the forward finding is untouched.

Because the backwards walk can consult GlobalSummaries.bodies_by_key (populated by the cross-file callee body persistence layer) it closes across file boundaries; when a callee body is not loadable the walk falls back to fanning out over the call's arguments so local reach-back is still possible.

Why it helps. Inverts the analysis direction so budget follows questions the scanner actually cares about; "does any source reach this sink?"; instead of proving every potential source-to-sink path. Corroborated findings are a stronger signal than forward-only ones, and proven-infeasible flows provide a principled way to lower confidence on forward false positives without silently dropping them.

How to turn it on. Defaults off so the benchmark floor is preserved while the pass stabilises.

Surface Value
Config backwards_analysis = true under [analysis.engine]
CLI flag --backwards-analysis / --no-backwards-analysis
Env var (legacy) NYX_BACKWARDS=1

Limitations (first cut). Reverse call-graph expansion past a ReachedParam is deferred; the walk terminates at function parameters rather than crossing back into callers. Path-constraint pruning is conservative: only the accumulated PredicateSummary bits are consulted, not the full symbolic predicate stack. Depth-bounded at k=2 for cross-function body expansion. See DEFAULT_BACKWARDS_DEPTH, BACKWARDS_VALUE_BUDGET, and MAX_BACKWARDS_CALLEE_BLOCKS in src/taint/backwards.rs for the exact bounds.

Source: src/taint/backwards.rs.


Constraint solving

What it does. Collects path constraints at each branch in SSA and propagates them alongside taint. Prunes paths whose accumulated constraint set is unsatisfiable; a taint flow guarded by if x < 0 && x > 10 is dropped rather than surfaced. Optionally delegates the satisfiability check to Z3 when Nyx is built with the smt Cargo feature.

Why it helps. Removes a class of FPs rooted in clearly-infeasible control-flow combinations. Without path constraints, a taint flow that only occurs when mutually-exclusive branches are simultaneously taken can still produce a finding.

How to turn it off.

Surface Value
Config constraint_solving = false under [analysis.engine]
CLI flag --no-constraint-solving
Env var (legacy) NYX_CONSTRAINT=0

The SMT backend is a separate switch:

Setting CLI Env Default Effect
symex.smt --no-smt NYX_SMT=0 on when built with smt feature Delegate satisfiability checks to Z3; ignored if Nyx was built without smt

Limitations. The default path-constraint domain is syntactic; trivially-inconsistent pairs are caught without an SMT solver, but richer algebraic unsatisfiability requires the smt feature (Z3). Without smt, Nyx ships a lightweight satisfiability check that catches literal contradictions but not deeper reasoning.

Source: src/constraint/.


Combining the switches

The defaults (all on) are the configuration Nyx is benchmarked against. Turning any switch off trades precision for speed and may move findings relative to the published baseline; CI regression gates assume defaults. If you need a minimal-overhead scan (for very large repositories or a pre-commit fast path), the AST-only scan mode (--mode ast) skips CFG, taint, and all four advanced passes entirely and is the right tool.