This commit is contained in:
Eli Peter 2026-06-05 10:16:30 -05:00 committed by GitHub
parent 55247b7fcd
commit 991c84a1eb
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
1464 changed files with 225448 additions and 1985 deletions

View file

@ -6,6 +6,23 @@ If you're going to act on a finding, it helps to know how the scanner got there.
A scan runs in two passes over the file tree, with an optional SQLite index that lets the second scan skip files whose content hash hasn't changed.
```mermaid
flowchart TD
Walk["Walk file tree"] --> Pass1["Pass 1 per file<br/>tree-sitter parse, CFG, SSA"]
Pass1 --> Summaries["Per-function summaries<br/>sources, sinks, sanitizers, returns, points-to"]
Pass1 --> Hierarchy["Type hierarchy index<br/>extends, implements, impl-for, includes"]
Summaries --> Global["GlobalSummaries map<br/>plus optional SQLite cache"]
Hierarchy --> Global
Global --> Pass2["Pass 2 per file<br/>cross-file context"]
Pass2 --> Taint["Forward SSA taint worklist<br/>finite lattice, guaranteed convergence"]
Pass2 --> Calls["Call precision<br/>k=1 inline, summaries, SCC fixed-point"]
Taint --> Findings["Findings with evidence<br/>source, path, sink, engine notes"]
Calls --> Findings
Findings --> Rank["Rank and dedupe<br/>severity, confidence, score"]
Rank --> Verify["Dynamic verification<br/>sandboxed harnesses, verdicts"]
Verify --> Emit["Emit<br/>console, JSON, SARIF, UI"]
```
**Pass 1, per file.** Tree-sitter parses the file. Nyx builds an intra-procedural control-flow graph, lowers it to SSA, and extracts a summary per function describing what that function does at the boundary: which arguments flow to sinks, which sources it reads from, which sinks it calls, what taint it strips, what it returns. Summaries are persisted to SQLite ([`src/summary/`](https://github.com/elicpeter/nyx/tree/master/src/summary/), [`src/database.rs`](https://github.com/elicpeter/nyx/blob/master/src/database.rs)).
**Summary merge.** All per-file summaries get unioned into a global map keyed by qualified function name.
@ -18,6 +35,8 @@ When a method call has a receiver typed as a super-class, trait, or interface, *
A separate **field-sensitive points-to** pass tracks abstract locations down to the field level, so `c.mu.Lock()` is a lock on `Field(c, mu)` rather than on `c` as a whole. That distinction is what lets the resource-lifecycle and taint passes tell `obj.field = tainted; sink(obj.other_field)` apart from the conservative whole-variable approximation. Subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic `__index_get__` / `__index_set__` calls so the same container model handles them. Set `NYX_POINTER_ANALYSIS=0` to fall back to the pre-pointer-pass behaviour for baseline comparison.
**Dynamic verification.** After ranking and dedupe, default builds verify Medium and High confidence findings unless `--no-verify` or `scanner.verify = false` is set. The verifier derives a small harness from the finding, runs it in a sandbox against curated payloads, and stores the result on `evidence.dynamic_verdict`. `Confirmed` means a vulnerable payload fired and its benign control stayed clean. `NotConfirmed` means the harness ran but did not fire, not that the finding is closed.
## Optional analyses on top
These run on top of the forward taint pass. They're independently switchable via `[analysis.engine]` config or matching CLI flags. See [advanced-analysis.md](advanced-analysis.md) for the full description and tradeoffs.
@ -47,6 +66,6 @@ Findings whose engine notes indicate a bound was hit can be filtered with `--req
## What you get out
Each finding carries the source location, the sink location, the path in between (when symex produced one), the rule ID, severity, attack-surface score, confidence level, and a list of engine notes describing any precision loss along the way. Console output is human-readable; JSON and SARIF carry the full evidence object for tooling.
Each finding carries the source location, the sink location, the path in between (when symex produced one), the rule ID, severity, attack-surface score, confidence level, dynamic verdict when one was attempted, and a list of engine notes describing any precision loss along the way. Console output is human-readable; JSON and SARIF carry the full evidence object for tooling.
For the JSON shape and SARIF mapping, see [output.md](output.md).