nyx/src/server/debug.rs

1377 lines
44 KiB
Rust

2026-04-25 17:59:11 -04:00
//! Debug view-model types and on-demand analysis pipeline.
//!
//! Provides serializable "view" structs that mirror internal engine types
//! (CFG, SSA, taint state, etc.) without requiring the engine types themselves
//! to derive `Serialize`. Also provides helper functions that re-run the
//! analysis pipeline on a single file/function for debug inspection.
use crate::ast::build_cfg_for_file;
use crate::callgraph::{CallGraph, CallGraphAnalysis};
use crate::cfg::{Cfg, EdgeKind, FileCfg, FuncSummaries, StmtKind};
use crate::constraint::{CompOp, ConditionExpr, ConstValue, Operand};
use crate::labels::{Cap, DataLabel};
use crate::ssa::ir::*;
use crate::ssa::{self, OptimizeResult};
use crate::state::symbol::SymbolInterner;
use crate::summary::GlobalSummaries;
use crate::summary::ssa_summary::{SsaFuncSummary, TaintTransform};
use crate::symbol::{FuncKey, Lang};
use crate::symex::state::SymbolicState;
use crate::taint::domain::VarTaint;
use crate::taint::ssa_transfer::{SsaTaintEvent, SsaTaintState, SsaTaintTransfer};
use crate::utils::config::Config;
use axum::http::StatusCode;
use petgraph::graph::NodeIndex;
use petgraph::visit::{EdgeRef, IntoNodeReferences};
use serde::Serialize;
use std::collections::VecDeque;
use std::path::Path;
// ─────────────────────────────────────────────────────────────────────────────
// Line-number helper
// ─────────────────────────────────────────────────────────────────────────────
/// Convert a byte offset to a 1-based line number.
///
/// Offsets past the end of `bytes` are clamped to the buffer length.
fn byte_offset_to_line(bytes: &[u8], offset: usize) -> usize {
let offset = offset.min(bytes.len());
bytes[..offset].iter().filter(|&&b| b == b'\n').count() + 1
}
// ─────────────────────────────────────────────────────────────────────────────
// Cap → human-readable names
// ─────────────────────────────────────────────────────────────────────────────
/// List the names of the capability flags set in `c`, in a fixed order.
fn cap_names(c: Cap) -> Vec<String> {
let mut names = Vec::new();
if c.contains(Cap::ENV_VAR) {
names.push("ENV_VAR".into());
}
if c.contains(Cap::HTML_ESCAPE) {
names.push("HTML_ESCAPE".into());
}
if c.contains(Cap::SHELL_ESCAPE) {
names.push("SHELL_ESCAPE".into());
}
if c.contains(Cap::URL_ENCODE) {
names.push("URL_ENCODE".into());
}
if c.contains(Cap::JSON_PARSE) {
names.push("JSON_PARSE".into());
}
if c.contains(Cap::FILE_IO) {
names.push("FILE_IO".into());
}
if c.contains(Cap::FMT_STRING) {
names.push("FMT_STRING".into());
}
if c.contains(Cap::SQL_QUERY) {
names.push("SQL_QUERY".into());
}
if c.contains(Cap::DESERIALIZE) {
names.push("DESERIALIZE".into());
}
if c.contains(Cap::SSRF) {
names.push("SSRF".into());
}
if c.contains(Cap::CODE_EXEC) {
names.push("CODE_EXEC".into());
}
if c.contains(Cap::CRYPTO) {
names.push("CRYPTO".into());
}
names
}
/// Render a label as `Kind(CAP_A|CAP_B)`, e.g. `Sink(SQL_QUERY)`.
fn label_str(l: &DataLabel) -> String {
match l {
DataLabel::Source(c) => format!("Source({})", cap_names(*c).join("|")),
DataLabel::Sanitizer(c) => format!("Sanitizer({})", cap_names(*c).join("|")),
DataLabel::Sink(c) => format!("Sink({})", cap_names(*c).join("|")),
}
}
// ═════════════════════════════════════════════════════════════════════════════
// View-model types
// ═════════════════════════════════════════════════════════════════════════════
// ── Function list ────────────────────────────────────────────────────────────
#[derive(Debug, Serialize)]
pub struct FunctionInfo {
pub name: String,
pub namespace: String,
pub param_count: usize,
pub line: usize,
pub source_caps: Vec<String>,
pub sanitizer_caps: Vec<String>,
pub sink_caps: Vec<String>,
}
// ── CFG ──────────────────────────────────────────────────────────────────────
#[derive(Debug, Serialize)]
pub struct CfgNodeView {
pub id: usize,
pub kind: String,
pub span: (usize, usize),
pub line: usize,
pub defines: Option<String>,
pub uses: Vec<String>,
pub callee: Option<String>,
pub labels: Vec<String>,
pub condition_text: Option<String>,
pub enclosing_func: Option<String>,
}
#[derive(Debug, Serialize)]
pub struct CfgEdgeView {
pub source: usize,
pub target: usize,
pub kind: String,
}
#[derive(Debug, Serialize)]
pub struct CfgGraphView {
pub nodes: Vec<CfgNodeView>,
pub edges: Vec<CfgEdgeView>,
pub entry: usize,
}
impl CfgGraphView {
pub fn from_cfg(cfg: &Cfg, entry: NodeIndex, bytes: &[u8]) -> Self {
let nodes = cfg
.node_references()
.map(|(idx, info)| CfgNodeView {
id: idx.index(),
kind: stmt_kind_str(info.kind),
span: info.ast.span,
line: byte_offset_to_line(bytes, info.ast.span.0),
defines: info.taint.defines.clone(),
uses: info.taint.uses.clone(),
callee: info.call.callee.clone(),
labels: info.taint.labels.iter().map(label_str).collect(),
condition_text: info.condition_text.clone(),
enclosing_func: info.ast.enclosing_func.clone(),
})
.collect();
let edges = cfg
.edge_references()
.map(|e| CfgEdgeView {
source: e.source().index(),
target: e.target().index(),
kind: edge_kind_str(*e.weight()),
})
.collect();
CfgGraphView {
nodes,
edges,
entry: entry.index(),
}
}
/// Build a CFG view for a single function by looking up its dedicated
/// `BodyCfg` in the `FileCfg`. This replaces the old BFS-filter approach
/// that walked the supergraph filtered by `enclosing_func`.
pub fn from_cfg_function(file_cfg: &FileCfg, func_name: &str, bytes: &[u8]) -> Option<Self> {
// Find the BodyCfg whose meta.name matches the requested function.
let body = file_cfg
.bodies
.iter()
.find(|b| b.meta.name.as_deref() == Some(func_name))?;
Some(Self::from_cfg(&body.graph, body.entry, bytes))
}
}
/// Name of a CFG statement kind as shown in the debug view.
fn stmt_kind_str(k: StmtKind) -> String {
match k {
StmtKind::Entry => "Entry",
StmtKind::Exit => "Exit",
StmtKind::Seq => "Seq",
StmtKind::If => "If",
StmtKind::Loop => "Loop",
StmtKind::Break => "Break",
StmtKind::Continue => "Continue",
StmtKind::Return => "Return",
StmtKind::Throw => "Throw",
StmtKind::Call => "Call",
}
.into()
}
/// Name of a CFG edge kind as shown in the debug view.
fn edge_kind_str(k: EdgeKind) -> String {
match k {
EdgeKind::Seq => "Seq",
EdgeKind::True => "True",
EdgeKind::False => "False",
EdgeKind::Back => "Back",
EdgeKind::Exception => "Exception",
}
.into()
}
// ── SSA ──────────────────────────────────────────────────────────────────────
#[derive(Debug, Serialize)]
pub struct SsaInstView {
pub value: u32,
pub op: String,
pub operands: Vec<String>,
pub var_name: Option<String>,
pub span: (usize, usize),
pub line: usize,
}
#[derive(Debug, Serialize)]
pub struct SsaBlockView {
pub id: u32,
pub phis: Vec<SsaInstView>,
pub body: Vec<SsaInstView>,
pub terminator: String,
pub preds: Vec<u32>,
pub succs: Vec<u32>,
}
#[derive(Debug, Serialize)]
pub struct SsaBodyView {
pub blocks: Vec<SsaBlockView>,
pub entry: u32,
pub num_values: usize,
}
impl SsaBodyView {
pub fn from_ssa(ssa: &SsaBody, bytes: &[u8]) -> Self {
let blocks = ssa
.blocks
.iter()
.map(|block| {
let phis = block.phis.iter().map(|i| inst_view(i, bytes)).collect();
let body = block.body.iter().map(|i| inst_view(i, bytes)).collect();
let terminator = terminator_str(&block.terminator);
SsaBlockView {
id: block.id.0,
phis,
body,
terminator,
preds: block.preds.iter().map(|b| b.0).collect(),
succs: block.succs.iter().map(|b| b.0).collect(),
}
})
.collect();
SsaBodyView {
blocks,
entry: ssa.entry.0,
num_values: ssa.num_values(),
}
}
}
fn inst_view(inst: &SsaInst, bytes: &[u8]) -> SsaInstView {
let (op, operands) = op_view(&inst.op);
SsaInstView {
value: inst.value.0,
op,
operands,
var_name: inst.var_name.clone(),
span: inst.span,
line: byte_offset_to_line(bytes, inst.span.0),
}
}
fn op_view(op: &SsaOp) -> (String, Vec<String>) {
match op {
SsaOp::Phi(operands) => {
let ops: Vec<String> = operands
.iter()
.map(|(bid, val)| format!("B{}:v{}", bid.0, val.0))
.collect();
("Phi".into(), ops)
}
SsaOp::Assign(uses) => {
let ops: Vec<String> = uses.iter().map(|v| format!("v{}", v.0)).collect();
("Assign".into(), ops)
}
SsaOp::Call {
callee,
args,
receiver,
} => {
let mut ops = Vec::new();
if let Some(rv) = receiver {
ops.push(format!("recv=v{}", rv.0));
}
ops.push(format!("callee={}", callee));
for (i, arg) in args.iter().enumerate() {
let vs: Vec<String> = arg.iter().map(|v| format!("v{}", v.0)).collect();
ops.push(format!("arg{}=[{}]", i, vs.join(",")));
}
("Call".into(), ops)
}
SsaOp::Source => ("Source".into(), vec![]),
SsaOp::Const(text) => {
let ops = text.iter().cloned().collect();
("Const".into(), ops)
}
SsaOp::Param { index } => ("Param".into(), vec![format!("{}", index)]),
SsaOp::SelfParam => ("SelfParam".into(), vec![]),
SsaOp::CatchParam => ("CatchParam".into(), vec![]),
SsaOp::Nop => ("Nop".into(), vec![]),
SsaOp::Undef => ("Undef".into(), vec![]),
}
}
fn terminator_str(t: &Terminator) -> String {
match t {
Terminator::Goto(bid) => format!("goto B{}", bid.0),
Terminator::Branch {
true_blk,
false_blk,
condition,
..
} => {
let cond_str = condition
.as_ref()
.map(|c| format!("{:?}", c))
.unwrap_or_else(|| "?".into());
format!("branch {} -> B{}, B{}", cond_str, true_blk.0, false_blk.0)
}
Terminator::Switch {
scrutinee,
targets,
default,
..
} => {
let ts: Vec<String> = targets.iter().map(|t| format!("B{}", t.0)).collect();
format!(
"switch v{} -> [{}] default B{}",
scrutinee.0,
ts.join(", "),
default.0,
)
}
Terminator::Return(v) => match v {
Some(val) => format!("return v{}", val.0),
None => "return".into(),
},
Terminator::Unreachable => "unreachable".into(),
}
}
// ── Taint ────────────────────────────────────────────────────────────────────
#[derive(Debug, Serialize)]
pub struct TaintValueView {
pub ssa_value: u32,
pub var_name: Option<String>,
pub caps: Vec<String>,
pub uses_summary: bool,
}
#[derive(Debug, Serialize)]
pub struct TaintBlockStateView {
pub block_id: u32,
pub values: Vec<TaintValueView>,
pub validated_must: u64,
pub validated_may: u64,
}
#[derive(Debug, Serialize)]
pub struct TaintEventView {
pub sink_node: usize,
pub sink_caps: Vec<String>,
pub tainted_values: Vec<TaintValueView>,
pub all_validated: bool,
pub uses_summary: bool,
}
#[derive(Debug, Serialize)]
pub struct TaintAnalysisView {
pub block_states: Vec<TaintBlockStateView>,
pub events: Vec<TaintEventView>,
/// Whether cross-file global summaries were available from the DB.
pub cross_file_context: bool,
/// Whether SSA-level summaries were loaded (a subset of the cross-file context).
pub ssa_summaries_available: bool,
}
impl TaintAnalysisView {
pub fn from_results(
events: &[SsaTaintEvent],
block_states: &[Option<SsaTaintState>],
ssa: &SsaBody,
cross_file_context: bool,
ssa_summaries_available: bool,
) -> Self {
let block_states_view: Vec<TaintBlockStateView> = block_states
.iter()
.enumerate()
.filter_map(|(i, state_opt)| {
let state = state_opt.as_ref()?;
let values: Vec<TaintValueView> = state
.values
.iter()
.map(|(sv, taint)| taint_value_view(*sv, taint, ssa))
.collect();
Some(TaintBlockStateView {
block_id: i as u32,
values,
validated_must: state.validated_must.bits(),
validated_may: state.validated_may.bits(),
})
})
.collect();
let events_view: Vec<TaintEventView> = events
.iter()
.map(|e| {
let tainted_values: Vec<TaintValueView> = e
.tainted_values
.iter()
.map(|(sv, caps, _origins)| TaintValueView {
ssa_value: sv.0,
var_name: ssa
.value_defs
.get(sv.0 as usize)
.and_then(|d| d.var_name.clone()),
caps: cap_names(*caps),
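// Event tuples carry no per-value summary flag; the event-level
// flag is reported on `TaintEventView::uses_summary` instead.
uses_summary: false,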
})
.collect();
TaintEventView {
sink_node: e.sink_node.index(),
sink_caps: cap_names(e.sink_caps),
tainted_values,
all_validated: e.all_validated,
uses_summary: e.uses_summary,
}
})
.collect();
TaintAnalysisView {
block_states: block_states_view,
events: events_view,
cross_file_context,
ssa_summaries_available,
}
}
}
fn taint_value_view(sv: SsaValue, taint: &VarTaint, ssa: &SsaBody) -> TaintValueView {
TaintValueView {
ssa_value: sv.0,
var_name: ssa
.value_defs
.get(sv.0 as usize)
.and_then(|d| d.var_name.clone()),
caps: cap_names(taint.caps),
uses_summary: taint.uses_summary,
}
}
// ── Abstract Interpretation ──────────────────────────────────────────────────
#[derive(Debug, Serialize)]
pub struct AbstractValueView {
pub ssa_value: u32,
pub var_name: Option<String>,
pub interval_lo: Option<i64>,
pub interval_hi: Option<i64>,
pub string_prefix: Option<String>,
pub string_suffix: Option<String>,
pub known_zero: u64,
pub known_one: u64,
}
#[derive(Debug, Serialize)]
pub struct AbstractBlockView {
pub block_id: u32,
pub values: Vec<AbstractValueView>,
}
#[derive(Debug, Serialize)]
pub struct TypeFactView {
pub ssa_value: u32,
pub var_name: Option<String>,
pub type_kind: String,
pub nullable: bool,
}
#[derive(Debug, Serialize)]
pub struct ConstValueViewEntry {
pub ssa_value: u32,
pub var_name: Option<String>,
pub value: String,
}
#[derive(Debug, Serialize)]
pub struct AbstractInterpView {
pub blocks: Vec<AbstractBlockView>,
pub type_facts: Vec<TypeFactView>,
pub const_values: Vec<ConstValueViewEntry>,
}
impl AbstractInterpView {
pub fn from_taint_states(
block_states: &[Option<SsaTaintState>],
ssa: &SsaBody,
opt: &OptimizeResult,
) -> Self {
let blocks: Vec<AbstractBlockView> = block_states
.iter()
.enumerate()
.filter_map(|(i, state_opt)| {
let state = state_opt.as_ref()?;
let abs_state = state.abstract_state.as_ref()?;
let values: Vec<AbstractValueView> = (0..ssa.num_values() as u32)
.filter_map(|v| {
let av = abs_state.get(SsaValue(v));
if av.is_top() {
return None;
}
Some(AbstractValueView {
ssa_value: v,
var_name: ssa
.value_defs
.get(v as usize)
.and_then(|d| d.var_name.clone()),
interval_lo: av.interval.lo,
interval_hi: av.interval.hi,
string_prefix: av.string.prefix.clone(),
string_suffix: av.string.suffix.clone(),
known_zero: av.bits.known_zero,
known_one: av.bits.known_one,
})
})
.collect();
if values.is_empty() {
return None;
}
Some(AbstractBlockView {
block_id: i as u32,
values,
})
})
.collect();
// Type facts from optimization pass
let mut type_facts: Vec<TypeFactView> = opt
.type_facts
.facts
.iter()
.filter(|(_, tf)| !matches!(tf.kind, crate::ssa::type_facts::TypeKind::Unknown))
.map(|(sv, tf)| TypeFactView {
ssa_value: sv.0,
var_name: ssa
.value_defs
.get(sv.0 as usize)
.and_then(|d| d.var_name.clone()),
type_kind: format!("{:?}", tf.kind),
nullable: tf.nullable,
})
.collect();
type_facts.sort_by_key(|v| v.ssa_value);
// Const values from constant propagation
let mut const_values: Vec<ConstValueViewEntry> = opt
.const_values
.iter()
.filter(|(_, cl)| {
!matches!(
cl,
crate::ssa::const_prop::ConstLattice::Top
| crate::ssa::const_prop::ConstLattice::Varying
)
})
.map(|(sv, cl)| {
let value = match cl {
crate::ssa::const_prop::ConstLattice::Str(s) => format!("\"{}\"", s),
crate::ssa::const_prop::ConstLattice::Int(n) => format!("{}", n),
crate::ssa::const_prop::ConstLattice::Bool(b) => format!("{}", b),
crate::ssa::const_prop::ConstLattice::Null => "null".into(),
_ => unreachable!("Top and Varying are filtered out above"),
};
ConstValueViewEntry {
ssa_value: sv.0,
var_name: ssa
.value_defs
.get(sv.0 as usize)
.and_then(|d| d.var_name.clone()),
value,
}
})
.collect();
const_values.sort_by_key(|v| v.ssa_value);
AbstractInterpView {
blocks,
type_facts,
const_values,
}
}
}
// ── Symbolic Execution ───────────────────────────────────────────────────────
#[derive(Debug, Serialize)]
pub struct SymexValueView {
pub ssa_value: u32,
pub var_name: Option<String>,
pub expression: String,
}
#[derive(Debug, Serialize)]
pub struct PathConstraintView {
pub block: u32,
pub condition: String,
pub polarity: bool,
}
#[derive(Debug, Serialize)]
pub struct SymexView {
pub values: Vec<SymexValueView>,
pub path_constraints: Vec<PathConstraintView>,
pub tainted_roots: Vec<u32>,
}
impl SymexView {
pub fn from_symbolic_state(state: &SymbolicState, ssa: &SsaBody) -> Self {
let mut values: Vec<SymexValueView> = state
.iter_values()
.map(|(&v, sym)| SymexValueView {
ssa_value: v.0,
var_name: ssa
.value_defs
.get(v.0 as usize)
.and_then(|d| d.var_name.clone()),
expression: format!("{}", sym),
})
.collect();
values.sort_by_key(|v| v.ssa_value);
let path_constraints = state
.path_constraints()
.iter()
.map(|pc| PathConstraintView {
block: pc.block.0,
condition: format_condition_expr(&pc.condition),
polarity: pc.polarity,
})
.collect();
let mut tainted_roots: Vec<u32> = state.tainted_values().iter().map(|v| v.0).collect();
tainted_roots.sort_unstable();
SymexView {
values,
path_constraints,
tainted_roots,
}
}
}
// ── Call Graph ───────────────────────────────────────────────────────────────
#[derive(Debug, Serialize)]
pub struct CallGraphNodeView {
pub id: usize,
pub name: String,
pub file: String,
pub lang: String,
pub namespace: String,
pub arity: Option<usize>,
}
#[derive(Debug, Serialize)]
pub struct CallGraphEdgeView {
pub source: usize,
pub target: usize,
pub call_site: String,
}
#[derive(Debug, Serialize)]
pub struct CallGraphView {
pub nodes: Vec<CallGraphNodeView>,
pub edges: Vec<CallGraphEdgeView>,
pub sccs: Vec<Vec<usize>>,
pub unresolved_count: usize,
pub ambiguous_count: usize,
}
impl CallGraphView {
pub fn from_call_graph(cg: &CallGraph, analysis: &CallGraphAnalysis) -> Self {
let nodes: Vec<CallGraphNodeView> = cg
.graph
.node_references()
.map(|(idx, fk)| CallGraphNodeView {
id: idx.index(),
name: fk.name.clone(),
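// The namespace is reused as the file; the tests below use a path
// such as `src/helper.js` for it.
file: fk.namespace.clone(),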
lang: format!("{:?}", fk.lang),
namespace: fk.namespace.clone(),
arity: fk.arity,
})
.collect();
let edges: Vec<CallGraphEdgeView> = cg
.graph
.edge_references()
.map(|e| CallGraphEdgeView {
source: e.source().index(),
target: e.target().index(),
call_site: e.weight().call_site.clone(),
})
.collect();
let sccs: Vec<Vec<usize>> = analysis
.sccs
.iter()
.filter(|scc| scc.len() > 1) // Only show non-trivial SCCs
.map(|scc| scc.iter().map(|n| n.index()).collect())
.collect();
CallGraphView {
nodes,
edges,
sccs,
unresolved_count: cg.unresolved_not_found.len(),
ambiguous_count: cg.unresolved_ambiguous.len(),
}
}
}
// ── Summaries ────────────────────────────────────────────────────────────────
#[derive(Debug, Serialize)]
pub struct FuncSummaryView {
pub name: String,
pub file_path: String,
pub lang: String,
pub namespace: String,
pub arity: Option<usize>,
pub param_count: usize,
pub source_caps: Vec<String>,
pub sanitizer_caps: Vec<String>,
pub sink_caps: Vec<String>,
pub propagates_taint: bool,
pub propagating_params: Vec<usize>,
pub tainted_sink_params: Vec<usize>,
pub callees: Vec<CalleeSiteView>,
pub ssa_summary: Option<SsaSummaryView>,
}
#[derive(Debug, Serialize)]
pub struct CalleeSiteView {
pub name: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub arity: Option<usize>,
#[serde(skip_serializing_if = "Option::is_none")]
pub receiver: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub qualifier: Option<String>,
#[serde(skip_serializing_if = "is_zero_u32")]
pub ordinal: u32,
}
/// `skip_serializing_if` predicate: omit `ordinal` from the output when it is zero.
fn is_zero_u32(n: &u32) -> bool {
*n == 0
}
#[derive(Debug, Serialize)]
pub struct SsaSummaryView {
pub param_to_return: Vec<ParamReturnView>,
pub param_to_sink: Vec<ParamSinkView>,
pub source_caps: Vec<String>,
}
#[derive(Debug, Serialize)]
pub struct ParamReturnView {
pub param_index: usize,
pub transform: String,
}
#[derive(Debug, Serialize)]
pub struct ParamSinkView {
pub param_index: usize,
pub sink_caps: Vec<String>,
}
impl FuncSummaryView {
pub fn from_global(
key: &FuncKey,
summary: &crate::summary::FuncSummary,
ssa_summary: Option<&SsaFuncSummary>,
) -> Self {
let ssa_view = ssa_summary.map(|ss| SsaSummaryView {
param_to_return: ss
.param_to_return
.iter()
.map(|(idx, transform)| ParamReturnView {
param_index: *idx,
transform: transform_str(transform),
})
.collect(),
param_to_sink: ss
.param_to_sink_caps()
.into_iter()
.map(|(idx, caps)| ParamSinkView {
param_index: idx,
sink_caps: cap_names(caps),
})
.collect(),
source_caps: cap_names(ss.source_caps),
});
FuncSummaryView {
name: key.name.clone(),
file_path: summary.file_path.clone(),
lang: format!("{:?}", key.lang),
namespace: key.namespace.clone(),
arity: key.arity,
param_count: summary.param_count,
source_caps: cap_names(Cap::from_bits_truncate(summary.source_caps)),
sanitizer_caps: cap_names(Cap::from_bits_truncate(summary.sanitizer_caps)),
sink_caps: cap_names(Cap::from_bits_truncate(summary.sink_caps)),
propagates_taint: summary.propagates_taint,
propagating_params: summary.propagating_params.clone(),
tainted_sink_params: summary.tainted_sink_params.clone(),
callees: summary
.callees
.iter()
.map(|c| CalleeSiteView {
name: c.name.clone(),
arity: c.arity,
receiver: c.receiver.clone(),
qualifier: c.qualifier.clone(),
ordinal: c.ordinal,
})
.collect(),
ssa_summary: ssa_view,
}
}
}
fn transform_str(t: &TaintTransform) -> String {
match t {
TaintTransform::Identity => "Identity".into(),
TaintTransform::StripBits(caps) => format!("StripBits({})", cap_names(*caps).join("|")),
TaintTransform::AddBits(caps) => format!("AddBits({})", cap_names(*caps).join("|")),
}
}
// ═════════════════════════════════════════════════════════════════════════════
// On-demand analysis pipeline
// ═════════════════════════════════════════════════════════════════════════════
/// Result of parsing + CFG construction for a single file.
pub struct FileAnalysis {
pub file_cfg: crate::cfg::FileCfg,
pub lang: Lang,
pub bytes: Vec<u8>,
}
impl FileAnalysis {
/// Top-level body's graph (backward-compatible accessor).
pub fn cfg(&self) -> &Cfg {
&self.file_cfg.toplevel().graph
}
pub fn entry(&self) -> NodeIndex {
self.file_cfg.toplevel().entry
}
pub fn summaries(&self) -> &FuncSummaries {
&self.file_cfg.summaries
}
}
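/// Parse a file and build its CFG.
///
/// Returns `INTERNAL_SERVER_ERROR` if CFG construction or reading the file
/// fails, and `BAD_REQUEST` if `build_cfg_for_file` produces no CFG for the file.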
pub fn analyse_file(file_path: &Path, config: &Config) -> Result<FileAnalysis, StatusCode> {
let result =
build_cfg_for_file(file_path, config).map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
match result {
Some((file_cfg, lang)) => {
let bytes = std::fs::read(file_path).map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
Ok(FileAnalysis {
file_cfg,
lang,
bytes,
})
}
None => Err(StatusCode::BAD_REQUEST),
}
}
/// Extract function info list from local summaries.
pub fn function_list(analysis: &FileAnalysis) -> Vec<FunctionInfo> {
analysis
.summaries()
.iter()
.map(|(key, summary)| FunctionInfo {
name: key.name.clone(),
namespace: key.namespace.clone(),
param_count: summary.param_count,
line: byte_offset_to_line(&analysis.bytes, analysis.cfg()[summary.entry].ast.span.0),
source_caps: cap_names(summary.source_caps),
sanitizer_caps: cap_names(summary.sanitizer_caps),
sink_caps: cap_names(summary.sink_caps),
})
.collect()
}
/// Lower a single function to SSA and optimize it.
pub fn analyse_function_ssa(
analysis: &FileAnalysis,
func_name: &str,
) -> Result<(SsaBody, OptimizeResult), StatusCode> {
// Find the function body by name from the per-body CFGs.
let body = analysis
.file_cfg
.bodies
.iter()
.find(|b| b.meta.name.as_deref() == Some(func_name))
.ok_or(StatusCode::NOT_FOUND)?;
let ssa_result = crate::ssa::lower::lower_to_ssa_with_params(
&body.graph,
body.entry,
Some(func_name),
false,
&body.meta.params,
);
let mut ssa = ssa_result.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
let opt = ssa::optimize_ssa(&mut ssa, &body.graph, Some(analysis.lang));
Ok((ssa, opt))
}
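/// Run taint analysis on a function's SSA body.
///
/// Returns the taint events followed by the per-block entry states and the
/// per-block exit states, in that order.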
pub fn analyse_function_taint(
ssa: &SsaBody,
cfg: &Cfg,
lang: Lang,
summaries: &FuncSummaries,
global_summaries: Option<&GlobalSummaries>,
opt: &OptimizeResult,
) -> (
Vec<SsaTaintEvent>,
Vec<Option<SsaTaintState>>,
Vec<Option<SsaTaintState>>,
) {
let interner = SymbolInterner::default();
let empty_interop = vec![];
let transfer = SsaTaintTransfer {
lang,
namespace: "",
interner: &interner,
local_summaries: summaries,
global_summaries,
interop_edges: &empty_interop,
owner_body_id: crate::cfg::BodyId(0),
parent_body_id: None,
global_seed: None,
param_seed: None,
receiver_seed: None,
const_values: Some(&opt.const_values),
type_facts: Some(&opt.type_facts),
ssa_summaries: None,
extra_labels: None,
callee_bodies: None,
inline_cache: None,
base_aliases: Some(&opt.alias_result),
context_depth: 0,
callback_bindings: None,
points_to: Some(&opt.points_to),
dynamic_pts: None,
import_bindings: None,
promisify_aliases: None,
module_aliases: if opt.module_aliases.is_empty() {
None
} else {
Some(&opt.module_aliases)
},
static_map: None,
auto_seed_handler_params: matches!(lang, Lang::JavaScript | Lang::TypeScript),
cross_file_bodies: global_summaries.and_then(|gs| gs.bodies_by_key()),
};
crate::taint::ssa_transfer::run_ssa_taint_full_with_exits(ssa, cfg, &transfer)
}
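/// Run symbolic execution on a function's SSA body and return the final state.
///
/// Each reachable block is transferred exactly once, in BFS order from the
/// entry block, so loop bodies are not re-executed.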
pub fn analyse_function_symex(
ssa: &SsaBody,
cfg: &Cfg,
lang: Lang,
opt: &OptimizeResult,
global_summaries: Option<&GlobalSummaries>,
) -> SymbolicState {
let mut state = SymbolicState::new();
state.seed_from_const_values(&opt.const_values);
let summary_ctx = global_summaries.map(|gs| crate::symex::transfer::SymexSummaryCtx {
global_summaries: gs,
lang,
namespace: "",
type_facts: Some(&opt.type_facts),
});
let heap_ctx = crate::symex::transfer::SymexHeapCtx {
points_to: &opt.points_to,
ssa,
lang,
const_values: &opt.const_values,
};
// BFS over blocks from entry to cover all reachable blocks.
let mut visited = std::collections::HashSet::new();
let mut queue = VecDeque::new();
queue.push_back(ssa.entry);
visited.insert(ssa.entry);
while let Some(bid) = queue.pop_front() {
let block = ssa.block(bid);
crate::symex::transfer::transfer_block(
&mut state,
block,
cfg,
ssa,
summary_ctx.as_ref(),
Some(&heap_ctx),
None, // no interproc context
Some(lang),
);
for &succ in &block.succs {
if visited.insert(succ) {
queue.push_back(succ);
}
}
}
state
}
/// Extract `GlobalSummaries` from a single file on-demand (no DB required).
pub fn analyse_file_summaries(
file_path: &Path,
config: &Config,
) -> Result<GlobalSummaries, StatusCode> {
let bytes = std::fs::read(file_path).map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
let (func_summaries, ssa_rows, _ssa_bodies, auth_rows) =
crate::ast::extract_all_summaries_from_bytes(&bytes, file_path, config, None)
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
let mut global = crate::summary::merge_summaries(func_summaries, None);
for (key, ssa_summary) in ssa_rows {
global.insert_ssa(key, ssa_summary);
}
for (key, auth_summary) in auth_rows {
global.insert_auth(key, auth_summary);
}
Ok(global)
}
/// Format a `ConditionExpr` as a human-readable string.
fn format_condition_expr(cond: &ConditionExpr) -> String {
match cond {
ConditionExpr::Comparison { lhs, op, rhs } => {
let op_str = match op {
CompOp::Eq => "==",
CompOp::Neq => "!=",
CompOp::Lt => "<",
CompOp::Gt => ">",
CompOp::Le => "<=",
CompOp::Ge => ">=",
};
format!("{} {} {}", format_operand(lhs), op_str, format_operand(rhs))
}
ConditionExpr::NullCheck { var, is_null } => {
if *is_null {
format!("v{} == null", var.0)
} else {
format!("v{} != null", var.0)
}
}
ConditionExpr::TypeCheck {
var,
type_name,
positive,
} => {
if *positive {
format!("typeof v{} === \"{}\"", var.0, type_name)
} else {
format!("typeof v{} !== \"{}\"", var.0, type_name)
}
}
ConditionExpr::BoolTest { var } => format!("v{}", var.0),
ConditionExpr::Unknown => "?".to_string(),
}
}
fn format_operand(op: &Operand) -> String {
match op {
Operand::Value(v) => format!("v{}", v.0),
Operand::Const(c) => match c {
ConstValue::Int(n) => format!("{}", n),
ConstValue::Str(s) => format!("\"{}\"", s),
ConstValue::Bool(b) => format!("{}", b),
ConstValue::Null => "null".to_string(),
},
Operand::Unknown => "?".to_string(),
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::utils::config::Config;
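// Sanity check for the line-number helper that every view in this module
// relies on. A minimal sketch: the input bytes are made up for illustration,
// and the expected values follow from counting `\n` bytes before the offset.
#[test]
fn byte_offset_to_line_counts_newlines_and_clamps() {
let src = b"a\nb\nc";
// Offsets on the first line, including the newline byte itself, map to 1.
assert_eq!(byte_offset_to_line(src, 0), 1);
assert_eq!(byte_offset_to_line(src, 1), 1);
// The first byte after a newline starts the next line.
assert_eq!(byte_offset_to_line(src, 2), 2);
assert_eq!(byte_offset_to_line(src, 4), 3);
// Offsets past the end are clamped instead of panicking.
assert_eq!(byte_offset_to_line(src, 100), 3);
}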
#[test]
fn taint_debug_uses_exit_states_for_single_block_flows() {
let dir = tempfile::tempdir().unwrap();
let path = dir.path().join("app.js");
std::fs::write(
&path,
r#"
function demo() {
const cmd = process.env.CRON_JOB_CMD;
eval(cmd);
}
"#,
)
.unwrap();
let config = Config::default();
let analysis = analyse_file(&path, &config).expect("file should analyse");
let (ssa, opt) =
analyse_function_ssa(&analysis, "demo").expect("function should lower to SSA");
let body = analysis
.file_cfg
.bodies
.iter()
.find(|b| b.meta.name.as_deref() == Some("demo"))
.expect("should find demo function body");
let (events, _entry_states, exit_states) = analyse_function_taint(
&ssa,
&body.graph,
analysis.lang,
analysis.summaries(),
None,
&opt,
);
assert!(
!events.is_empty(),
"expected the test fixture to produce at least one taint event"
);
assert!(
exit_states
.iter()
.flatten()
.any(|state| !state.values.is_empty()),
"exit-state debug view should show tainted SSA values even for single-block functions"
);
let view = TaintAnalysisView::from_results(&events, &exit_states, &ssa, false, false);
assert!(
view.block_states
.iter()
.any(|state| !state.values.is_empty()),
"serialized debug taint view should expose the populated exit states"
);
}
#[test]
fn taint_view_without_global_summaries_marks_no_cross_file_context() {
let dir = tempfile::tempdir().unwrap();
let path = dir.path().join("local.js");
std::fs::write(
&path,
r#"
function sink() {
const x = process.env.SECRET;
eval(x);
}
"#,
)
.unwrap();
let config = Config::default();
let analysis = analyse_file(&path, &config).expect("file should analyse");
let (ssa, opt) =
analyse_function_ssa(&analysis, "sink").expect("function should lower to SSA");
let body = analysis
.file_cfg
.bodies
.iter()
.find(|b| b.meta.name.as_deref() == Some("sink"))
.expect("should find sink function body");
let (events, _entry_states, exit_states) = analyse_function_taint(
&ssa,
&body.graph,
analysis.lang,
analysis.summaries(),
None, // no global summaries
&opt,
);
let view = TaintAnalysisView::from_results(&events, &exit_states, &ssa, false, false);
assert!(!view.cross_file_context);
assert!(!view.ssa_summaries_available);
// The local analysis should still find the taint event
assert!(
!view.events.is_empty(),
"local taint should still find events"
);
}
#[test]
fn taint_view_with_global_summaries_marks_cross_file_context() {
let dir = tempfile::tempdir().unwrap();
let path = dir.path().join("consumer.js");
std::fs::write(
&path,
r#"
function consume() {
const x = process.env.SECRET;
eval(x);
}
"#,
)
.unwrap();
let config = Config::default();
let analysis = analyse_file(&path, &config).expect("file should analyse");
let (ssa, opt) =
analyse_function_ssa(&analysis, "consume").expect("function should lower to SSA");
let body = analysis
.file_cfg
.bodies
.iter()
.find(|b| b.meta.name.as_deref() == Some("consume"))
.expect("should find consume function body");
// Create non-empty global summaries to simulate having run a scan
let mut global = crate::summary::GlobalSummaries::default();
let key = crate::symbol::FuncKey {
lang: crate::symbol::Lang::JavaScript,
namespace: "src/helper.js".into(),
name: "getInput".into(),
arity: Some(0),
..Default::default()
};
global.insert_ssa(
key,
crate::summary::ssa_summary::SsaFuncSummary {
param_to_return: vec![],
param_to_sink: vec![],
source_caps: crate::labels::Cap::all(),
param_to_sink_param: vec![],
param_container_to_return: vec![],
param_to_container_store: vec![],
return_type: None,
return_abstract: None,
source_to_callback: vec![],
receiver_to_return: None,
receiver_to_sink: Cap::empty(),
abstract_transfer: vec![],
param_return_paths: vec![],
points_to: Default::default(),
return_path_facts: smallvec::SmallVec::new(),
},
);
let cross_file = !global.is_empty();
let ssa_avail = !global.snapshot_ssa().is_empty();
let (events, _entry_states, exit_states) = analyse_function_taint(
&ssa,
&body.graph,
analysis.lang,
analysis.summaries(),
Some(&global),
&opt,
);
let view =
TaintAnalysisView::from_results(&events, &exit_states, &ssa, cross_file, ssa_avail);
assert!(view.cross_file_context);
assert!(view.ssa_summaries_available);
}
#[test]
fn cfg_function_view_does_not_bleed_into_sibling_functions() {
let dir = tempfile::tempdir().unwrap();
let path = dir.path().join("admin.js");
std::fs::write(
&path,
r#"
const db = require("../db");

async function writeAuditLog({ actorId, action, targetType, targetId, metadata }) {
await db.query(
`
INSERT INTO audit_logs (actor_id, action, target_type, target_id, metadata)
VALUES ($1, $2, $3, $4, $5)
`,
[actorId, action, targetType, targetId, metadata]
);
}
async function recentAuditLogs() {
const result = await db.query(
`
SELECT a.*, u.full_name AS actor_name
FROM audit_logs a
LEFT JOIN users u ON u.id = a.actor_id
ORDER BY a.created_at DESC
LIMIT 20
`
);
return result.rows;
}
"#,
)
.unwrap();
let config = Config::default();
let analysis = analyse_file(&path, &config).expect("file should analyse");
let view =
CfgGraphView::from_cfg_function(&analysis.file_cfg, "writeAuditLog", &analysis.bytes)
.expect("function view should exist");
assert!(
!view.nodes.is_empty(),
"expected writeAuditLog to produce CFG nodes"
);
assert!(
view.nodes
.iter()
.all(|node| node.enclosing_func.as_deref() == Some("writeAuditLog")),
"function-scoped CFG view should only contain writeAuditLog nodes"
);
assert!(
view.nodes.iter().any(|node| node.line == 4),
"expected function entry/header for writeAuditLog"
);
assert!(
view.nodes.iter().any(|node| node.line == 5),
"expected db.query call inside writeAuditLog"
);
assert!(
view.nodes.iter().all(|node| node.line < 13),
"sibling function nodes should not appear in writeAuditLog view"
);
}
}