Dynamic (#77)

2026-07-03 20:41:00 +02:00 · 2026-06-05 10:16:30 -05:00 · 2026-06-05 10:16:30 -05:00 · 991c84a1eb
commit 991c84a1eb
parent 55247b7fcd
1464 changed files with 225448 additions and 1985 deletions
--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@@ -6,6 +6,23 @@ If you're going to act on a finding, it helps to know how the scanner got there.

 A scan runs in two passes over the file tree, with an optional SQLite index that lets the second scan skip files whose content hash hasn't changed.

+```mermaid
+flowchart TD
+    Walk["Walk file tree"] --> Pass1["Pass 1 per file<br/>tree-sitter parse, CFG, SSA"]
+    Pass1 --> Summaries["Per-function summaries<br/>sources, sinks, sanitizers, returns, points-to"]
+    Pass1 --> Hierarchy["Type hierarchy index<br/>extends, implements, impl-for, includes"]
+    Summaries --> Global["GlobalSummaries map<br/>plus optional SQLite cache"]
+    Hierarchy --> Global
+    Global --> Pass2["Pass 2 per file<br/>cross-file context"]
+    Pass2 --> Taint["Forward SSA taint worklist<br/>finite lattice, guaranteed convergence"]
+    Pass2 --> Calls["Call precision<br/>k=1 inline, summaries, SCC fixed-point"]
+    Taint --> Findings["Findings with evidence<br/>source, path, sink, engine notes"]
+    Calls --> Findings
+    Findings --> Rank["Rank and dedupe<br/>severity, confidence, score"]
+    Rank --> Verify["Dynamic verification<br/>sandboxed harnesses, verdicts"]
+    Verify --> Emit["Emit<br/>console, JSON, SARIF, UI"]
+```
+
 **Pass 1, per file.** Tree-sitter parses the file. Nyx builds an intra-procedural control-flow graph, lowers it to SSA, and extracts a summary per function describing what that function does at the boundary: which arguments flow to sinks, which sources it reads from, which sinks it calls, what taint it strips, what it returns. Summaries are persisted to SQLite ([`src/summary/`](https://github.com/elicpeter/nyx/tree/master/src/summary/), [`src/database.rs`](https://github.com/elicpeter/nyx/blob/master/src/database.rs)).

 **Summary merge.** All per-file summaries get unioned into a global map keyed by qualified function name.
@@ -18,6 +35,8 @@ When a method call has a receiver typed as a super-class, trait, or interface, *

 A separate **field-sensitive points-to** pass tracks abstract locations down to the field level, so `c.mu.Lock()` is a lock on `Field(c, mu)` rather than on `c` as a whole. That distinction is what lets the resource-lifecycle and taint passes tell `obj.field = tainted; sink(obj.other_field)` apart from the conservative whole-variable approximation. Subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic `__index_get__` / `__index_set__` calls so the same container model handles them. Set `NYX_POINTER_ANALYSIS=0` to fall back to the pre-pointer-pass behaviour for baseline comparison.

+**Dynamic verification.** After ranking and dedupe, default builds verify Medium- and High-confidence findings unless `--no-verify` or `scanner.verify = false` is set. The verifier derives a small harness from the finding, runs it in a sandbox against curated payloads, and stores the result on `evidence.dynamic_verdict`. `Confirmed` means a vulnerable payload fired and its benign control stayed clean. `NotConfirmed` means the harness ran but no payload fired; it does not mean the finding is closed.
+
 ## Optional analyses on top

 These run on top of the forward taint pass. They're independently switchable via `[analysis.engine]` config or matching CLI flags. See [advanced-analysis.md](advanced-analysis.md) for the full description and tradeoffs.
@@ -47,6 +66,6 @@ Findings whose engine notes indicate a bound was hit can be filtered with `--req

 ## What you get out

-Each finding carries the source location, the sink location, the path in between (when symex produced one), the rule ID, severity, attack-surface score, confidence level, and a list of engine notes describing any precision loss along the way. Console output is human-readable; JSON and SARIF carry the full evidence object for tooling.
+Each finding carries the source location, the sink location, the path in between (when symex produced one), the rule ID, severity, attack-surface score, confidence level, the dynamic verdict (when verification was attempted), and a list of engine notes describing any precision loss along the way. Console output is human-readable; JSON and SARIF carry the full evidence object for tooling.

 For the JSON shape and SARIF mapping, see [output.md](output.md).
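
The verdict rule in the added "Dynamic verification" paragraph can be sketched in Rust as below. This is a minimal illustration, not Nyx's implementation: the type and function names (`Confidence`, `PayloadRun`, `should_verify`, `classify`) are hypothetical, as are the `Low` confidence level and the `Inconclusive` variant, which covers a case the paragraph does not describe (the benign control also fired). Only `Confirmed`, `NotConfirmed`, and the Medium/High gating come from the doc text.

```rust
/// Confidence levels. `Medium` and `High` are named in the docs;
/// `Low` is an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Confidence {
    Low,
    Medium,
    High,
}

/// Verdict stored on a finding. `Inconclusive` is hypothetical:
/// the docs only describe `Confirmed` and `NotConfirmed`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DynamicVerdict {
    Confirmed,
    NotConfirmed,
    Inconclusive,
}

/// Result of running one vulnerable payload and its benign control
/// through the harness (hypothetical shape).
#[derive(Debug, Clone, Copy)]
struct PayloadRun {
    vulnerable_fired: bool,
    benign_fired: bool,
}

/// Gating rule from the docs: verify Medium and High confidence findings
/// unless verification is switched off (`--no-verify` / `scanner.verify = false`).
fn should_verify(confidence: Confidence, verify_enabled: bool) -> bool {
    verify_enabled && matches!(confidence, Confidence::Medium | Confidence::High)
}

/// Map harness runs to a verdict. Returns `None` when nothing was run,
/// i.e. no verdict was attempted.
fn classify(runs: &[PayloadRun]) -> Option<DynamicVerdict> {
    if runs.is_empty() {
        return None;
    }
    // Confirmed: some vulnerable payload fired while its benign control stayed clean.
    if runs.iter().any(|r| r.vulnerable_fired && !r.benign_fired) {
        return Some(DynamicVerdict::Confirmed);
    }
    // Assumption: if anything fired but never cleanly, the harness cannot
    // tell payloads apart, so the sketch reports that separately.
    if runs.iter().any(|r| r.vulnerable_fired || r.benign_fired) {
        return Some(DynamicVerdict::Inconclusive);
    }
    // The harness ran and nothing fired. The finding is not thereby closed.
    Some(DynamicVerdict::NotConfirmed)
}
```

Returning `Option` keeps "not attempted" distinct from `NotConfirmed`, which matches the doc's wording that a verdict is present only when one was attempted.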