fix failing ci + update docs

2026-06-12 19:55:14 +02:00 · 2026-06-05 09:56:04 -05:00 · 2026-06-05 09:56:04 -05:00 · 061e1f981c
commit 061e1f981c
parent db35cdff2c
7 changed files with 201 additions and 76 deletions
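
The `docs/output.md` change in this commit documents the JSON output as an object with a `findings` key, where it was previously described as a bare array. A consumer that has to read both shapes might look roughly like the sketch below. It is an illustration only, not code from the Nyx repository: `load_findings`, `summary_key`, and `tally_verdicts` are hypothetical helper names, and the sketch assumes reports follow the shape shown in the docs, with status strings such as `PartiallyConfirmed` mapping onto snake_case summary keys such as `partially_confirmed`.

```python
import json
import re


def load_findings(report_text):
    """Return the findings list from either documented JSON shape."""
    data = json.loads(report_text)
    if isinstance(data, list):
        # Older shape: the report itself is the array of findings.
        return data
    # Newer shape: findings live under the top-level "findings" key.
    return data.get("findings", [])


def summary_key(status):
    """Map a status like 'PartiallyConfirmed' to 'partially_confirmed'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", status).lower()


def tally_verdicts(findings):
    """Count dynamic verdict statuses across findings that carry one."""
    counts = {}
    for finding in findings:
        verdict = finding.get("evidence", {}).get("dynamic_verdict")
        if not verdict:
            # No verdict was attached to this finding.
            continue
        key = summary_key(verdict["status"])
        counts[key] = counts.get(key, 0) + 1
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    report = json.dumps({
        "findings": [
            {"evidence": {"dynamic_verdict": {"status": "Confirmed"}}},
            {"evidence": {"notes": ["source_kind:UserInput"]}},
        ],
        "chains": [],
    })
    print(tally_verdicts(load_findings(report)))
```

If the docs are accurate, the tally should line up with the non-zero entries of the top-level `dynamic_verification` object, so a consumer could use a comparison between the two as a sanity check.
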
--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@@ -18,7 +18,9 @@ flowchart TD
    Pass2 --> Calls["Call precision<br/>k=1 inline, summaries, SCC fixed-point"]
    Taint --> Findings["Findings with evidence<br/>source, path, sink, engine notes"]
    Calls --> Findings
-    Findings --> Emit["Rank, dedupe, emit<br/>console, JSON, SARIF, UI"]
+    Findings --> Rank["Rank and dedupe<br/>severity, confidence, score"]
+    Rank --> Verify["Dynamic verification<br/>sandboxed harnesses, verdicts"]
+    Verify --> Emit["Emit<br/>console, JSON, SARIF, UI"]
 ```

 **Pass 1, per file.** Tree-sitter parses the file. Nyx builds an intra-procedural control-flow graph, lowers it to SSA, and extracts a summary per function describing what that function does at the boundary: which arguments flow to sinks, which sources it reads from, which sinks it calls, what taint it strips, what it returns. Summaries are persisted to SQLite ([`src/summary/`](https://github.com/elicpeter/nyx/tree/master/src/summary/), [`src/database.rs`](https://github.com/elicpeter/nyx/blob/master/src/database.rs)).
@@ -33,6 +35,8 @@ When a method call has a receiver typed as a super-class, trait, or interface, *

 A separate **field-sensitive points-to** pass tracks abstract locations down to the field level, so `c.mu.Lock()` is a lock on `Field(c, mu)` rather than on `c` as a whole. That distinction is what lets the resource-lifecycle and taint passes tell `obj.field = tainted; sink(obj.other_field)` apart from the conservative whole-variable approximation. Subscript reads and writes (`arr[i]`, `map[k] = v`) lower to synthetic `__index_get__` / `__index_set__` calls so the same container model handles them. Set `NYX_POINTER_ANALYSIS=0` to fall back to the pre-pointer-pass behaviour for baseline comparison.

+**Dynamic verification.** After ranking and dedupe, default builds verify Medium and High confidence findings unless `--no-verify` or `scanner.verify = false` is set. The verifier derives a small harness from the finding, runs it in a sandbox against curated payloads, and stores the result on `evidence.dynamic_verdict`. `Confirmed` means a vulnerable payload fired and its benign control stayed clean. `NotConfirmed` means the harness ran but did not fire, not that the finding is closed.
+
 ## Optional analyses on top

 These run on top of the forward taint pass. They're independently switchable via `[analysis.engine]` config or matching CLI flags. See [advanced-analysis.md](advanced-analysis.md) for the full description and tradeoffs.
@@ -62,6 +66,6 @@ Findings whose engine notes indicate a bound was hit can be filtered with `--req

 ## What you get out

-Each finding carries the source location, the sink location, the path in between (when symex produced one), the rule ID, severity, attack-surface score, confidence level, and a list of engine notes describing any precision loss along the way. Console output is human-readable; JSON and SARIF carry the full evidence object for tooling.
+Each finding carries the source location, the sink location, the path in between (when symex produced one), the rule ID, severity, attack-surface score, confidence level, dynamic verdict when one was attempted, and a list of engine notes describing any precision loss along the way. Console output is human-readable; JSON and SARIF carry the full evidence object for tooling.

 For the JSON shape and SARIF mapping, see [output.md](output.md).
--- a/docs/output.md
+++ b/docs/output.md
@@ -69,48 +69,71 @@ Use --include-quality, --max-low, or --all to adjust.

 ## JSON

-Machine-readable JSON array. Each finding is an object:
+Machine-readable JSON object. The main keys are:
+
+| Key | Type | Description |
+|-----|------|-------------|
+| `findings` | array | Finding objects |
+| `chains` | array | Composed exploit chains, when emitted |
+| `dynamic_verification` | object | Counts of attached dynamic verdicts, by status |
+| `verdict_diff` | object | Baseline comparison, only when `--baseline` is used |

 ```json
-[
-  {
-    "path": "src/handler.rs",
-    "line": 12,
-    "col": 5,
-    "severity": "High",
-    "id": "taint-unsanitised-flow (source 5:11)",
-    "path_validated": false,
-    "labels": [
-      ["Source", "env::var(\"CMD\") at 5:11"],
-      ["Sink", "Command::new(\"sh\").arg(\"-c\")"]
-    ],
-    "confidence": "High",
-    "evidence": {
-      "source": {
-        "path": "src/handler.rs",
-        "line": 5,
-        "col": 11,
-        "kind": "source",
-        "snippet": "env::var(\"CMD\")"
+{
+  "findings": [
+    {
+      "path": "src/handler.rs",
+      "line": 12,
+      "col": 5,
+      "severity": "High",
+      "id": "taint-unsanitised-flow (source 5:11)",
+      "path_validated": false,
+      "labels": [
+        ["Source", "env::var(\"CMD\") at 5:11"],
+        ["Sink", "Command::new(\"sh\").arg(\"-c\")"]
+      ],
+      "confidence": "High",
+      "evidence": {
+        "source": {
+          "path": "src/handler.rs",
+          "line": 5,
+          "col": 11,
+          "kind": "source",
+          "snippet": "env::var(\"CMD\")"
+        },
+        "sink": {
+          "path": "src/handler.rs",
+          "line": 12,
+          "col": 5,
+          "kind": "sink",
+          "snippet": "Command::new(\"sh\")"
+        },
+        "notes": ["source_kind:EnvironmentConfig"],
+        "dynamic_verdict": {
+          "finding_id": "a3b12f0c91e04420",
+          "status": "Confirmed",
+          "triggered_payload": "cmdi-echo-marker"
+        }
      },
-      "sink": {
-        "path": "src/handler.rs",
-        "line": 12,
-        "col": 5,
-        "kind": "sink",
-        "snippet": "Command::new(\"sh\")"
-      },
-      "notes": ["source_kind:EnvironmentConfig"]
-    },
-    "rank_score": 76.0,
-    "rank_reason": [
-      ["severity_base", "60"],
-      ["analysis_kind", "10"],
-      ["source_kind", "5"],
-      ["evidence_count", "1"]
-    ]
+      "rank_score": 76.0,
+      "rank_reason": [
+        ["severity_base", "60"],
+        ["analysis_kind", "10"],
+        ["source_kind", "5"],
+        ["evidence_count", "1"]
+      ]
+    }
+  ],
+  "chains": [],
+  "dynamic_verification": {
+    "total": 1,
+    "confirmed": 1,
+    "partially_confirmed": 0,
+    "not_confirmed": 0,
+    "inconclusive": 0,
+    "unsupported": 0
  }
-]
+}
 ```

 ### Field descriptions
@@ -132,6 +155,7 @@ Machine-readable JSON array. Each finding is an object:
 | `rank_score` | float | no | Attack-surface score (omitted when ranking disabled) |
 | `rank_reason` | array | no | Score breakdown (omitted when ranking disabled) |
 | `rollup` | object | no | Rollup data when findings are grouped (see below) |
+| `chain_member_of` | int | no | Stable hash of the emitted chain this finding belongs to |

 Fields marked "no" are omitted when empty/null/false to keep output compact.

@@ -155,9 +179,40 @@ The `evidence` field provides structured provenance data:
 | `sanitizers` | array | Sanitizer spans |
 | `state` | object | State-machine evidence (machine, subject, from_state, to_state) |
 | `notes` | array | Free-form notes (e.g. `"source_kind:UserInput"`, `"path_validated"`) |
+| `dynamic_verdict` | object | Dynamic verification result, when verification ran or was skipped for a typed reason |

 All fields are omitted when empty/null.

+### Dynamic verdict object
+
+`evidence.dynamic_verdict` uses this shape:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `finding_id` | string | Stable 16-character hex finding id |
+| `status` | string | `Confirmed`, `PartiallyConfirmed`, `NotConfirmed`, `Inconclusive`, or `Unsupported` |
+| `triggered_payload` | string | Payload label for `Confirmed` verdicts |
+| `reason` | object/string | Typed reason for `Unsupported` |
+| `inconclusive_reason` | object/string | Typed reason for `Inconclusive` |
+| `detail` | string | Extra build, sandbox, or policy detail |
+| `attempts` | array | Per-payload attempt summaries |
+| `toolchain_match` | string | `exact` or `drift` |
+| `differential` | object | Vulnerable versus benign control result, when both ran |
+| `hardening_outcome` | object | Process-backend hardening result, when recorded |
+
+The top-level `dynamic_verification` object counts verdict statuses across the emitted findings:
+
+```json
+{
+  "total": 4,
+  "confirmed": 2,
+  "partially_confirmed": 0,
+  "not_confirmed": 1,
+  "inconclusive": 0,
+  "unsupported": 1
+}
+```
+
 ### Rollup object

 When a finding is a rollup (grouped from multiple occurrences), the `rollup` field is present:
@@ -195,7 +250,8 @@ The SARIF output includes:
 - **Tool metadata**: Nyx name and version
 - **Rules**: Rule ID, description, severity mapping
 - **Results**: One result per finding with location, message, and properties
-- **Properties**: Each result includes `category` and optionally `confidence` and `rollup.count`
+- **Properties**: Each result includes `category` and optionally `confidence`, `rollup.count`, and `nyx_dynamic_verdict`
+- **Fingerprints**: Dynamic verdict status is added as `partialFingerprints.dynamic_verdict_status` when present
 - **Related locations**: Rollup findings include example locations in `relatedLocations`
 - **Artifacts**: File paths referenced by findings

--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -6,7 +6,7 @@ After `cargo install nyx-scanner` (or dropping a release binary on your PATH), p
 nyx scan ./my-project
 ```

-First run builds a SQLite index under `.nyx/`; later runs skip files whose content hash hasn't changed.
+First run builds a SQLite index under `.nyx/`; later runs skip files whose content hash hasn't changed. Default builds also verify Medium and High confidence findings in a sandbox. Use `--no-verify` when you want a static-only local loop.

 ## What a finding looks like

@@ -21,6 +21,7 @@ The same scan in console form:

      Source: request.args.get (5:11)
      Sink:   os.system
+      [DYN: confirmed via cmdi-echo-marker-python]

  6:5  ✖ [HIGH] py.cmdi.os_system  (Score: 64, Confidence: High)
      os.system() runs a shell command
@@ -31,12 +32,15 @@ The same scan in console form:

      Source: req.query.content (3:18)
      Sink:   document.write
+      [DYN: confirmed via xss-script-marker]

  5:5  ⚠ [MEDIUM] js.xss.document_write  (Score: 34, Confidence: High)
      document.write() is an XSS sink

+Dynamic verification: 4 verdicts (2 confirmed, 0 partially confirmed, 1 not confirmed, 0 inconclusive, 1 unsupported)
+
 warning 'demo' generated 10 issues.
-Finished in 0.054s.
+Finished in 1.842s.
 ```

 Each finding is one line of header plus evidence. Fields that matter:
@@ -48,6 +52,7 @@ Each finding is one line of header plus evidence. Fields that matter:
 | Score | Attack-surface ranking (severity + analysis kind + source kind + evidence). Higher is more exploitable |
 | Confidence | `High`, `Medium`, `Low`. Drops for AST-only matches, capped widened flows, and lowered-to-Low backwards-infeasible findings |
 | Source / Sink | Where tainted data entered and where the dangerous call happened |
+| `[DYN: ...]` | Dynamic verifier result, when Nyx built and ran a harness for the finding |

 Two rules firing on the same line (the taint finding plus the AST pattern) is normal. The pattern matches the structural presence of `document.write`; the taint rule adds the evidence that `req.query.content` actually reached it. Both carry distinct rule IDs so suppressions can target one without the other.

@@ -85,14 +90,17 @@ nyx scan . --require-converged

 `--require-converged` keeps `under-report` findings (the emitted flow is still real) but drops over-reports and widenings. Intended for strict gates where a noisy finding is worse than nothing.

-## Skip dataflow for a fast first pass
+## Skip work for a fast first pass

 ```bash
 nyx scan . --mode ast
+nyx scan . --no-verify
 ```

 AST-only mode runs tree-sitter patterns without building a CFG or running taint. It's fast and still catches banned-API uses, weak crypto, and obvious XSS sinks, but it can't tell `eval("1+1")` apart from `eval(userInput)`. Use it as a pre-commit filter, not as a CI gate replacement.

+`--no-verify` keeps the static engine on but skips sandboxed execution. Use it when you are iterating locally and only need the analyzer result.
+
 ## Next

 - [CLI reference](cli.md) for every flag and subcommand.