mirror of https://github.com/elicpeter/nyx.git
synced 2026-06-21 20:18:06 +02:00
Release/0.5.0 (#35)
* feat: Introduce function-scoped variable interning for state analysis with new tests and fixtures * feat: Add Phase 26 symbolic execution enhancements with bitwise operator support, abstract interpretation refinements, and new taint analysis tests * feat: Refine state analysis to handle factory-pattern resource returns with mixed-path tests and leak detection enhancements * feat: Add Phase 27 debug views with symbolic execution, abstract interpretation, SSA, and call graph viewers; integrate with debug layout and styles * feat: Add Phase 31 type-qualified symbolic resolution with receiver-based callee disambiguation and testing * feat: Extend symbolic execution with state iteration, enhanced debug views, and debounced input handling * feat: Add Phase 13 resource and auth pattern extensions with new tests and fixtures * feat: Introduce CFG debug graph renderer with compact mode, toolbar, and DAG layout integration * feat: Add Phase 28 encoding and decoding transform modeling with structural symex enhancements and new taint analysis tests * feat: Extend abstract interpretation with type facts and constant value tracking in debug views and server logic * feat: Add linear path handling and witness extraction to symbolic execution with Phase 28 transform mismatch detection * feat: Refine Go auth and sanitizer handling with enhanced rules, state updates, and benchmark improvements * feat: Enable auth-state analysis by default and update relevant tests in benchmark config * test: Update state_tests to reflect default enablement of auth-state analysis and add auth suppression test * docs: update CHANGELOG.md * feat: Introduce per-index taint tracking in `HeapState` with `HeapSlot`, overflow handling, and revised SSA transfers * feat: Introduce C/C++ language labels and refine heap state tracking in SSA transfers * feat: Implement per-index array slot tracking in symbolic heap with overflow collapse * feat: Add implicit definition handling for uninitialized declarations 
in SSA value allocation * feat: Refactor function parameters and constants for improved clarity and maintainability * refactor: Reorder module imports and improve formatting for consistency * refactor: Fix formatting erorrs * refactor: Fix clippy warnings * refactor: Fix fmt warnings (again) * chore: Update dependencies and improve feature configuration * Add comprehensive tests for undertested modules (#36) (COPILOT) * Add comprehensive tests for undertested modules Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083 * Add comprehensive tests for ext, project, walk, and errors modules Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083 --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * chore: Update dependencies and improve feature configuration * fix: formatting errors in new tests * chore: Update license list in about.toml * chore: made functions input inline * chore: updated cfg graph to take up the full page * chore: add Prettier configuration and update code formatting * Add frontend test suite with Vitest (111 tests) (#37) * Add Vitest test suite for frontend - 111 tests across utils, components, hooks, and graph utilities Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/7cf0dba2-ecff-4740-ba4d-92717e74a0b7 * ci: add frontend test step to CI workflow Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/5bc0ac9f-0a32-4d03-9cb7-7a15aea53fca --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter 
<54954007+elicpeter@users.noreply.github.com> * chore: simplify array initialization in test files for consistency * ran typecheck * feat: add AnalysisWorkspace component and integrate it into CfgViewerPage * feat: update routing in AppLayout and improve empty state message in ExplorerPage * feat: enhance scan progress tracking with additional metrics and stages * feat: update license information and add license check script * feat: implement cross-file symbolic execution with callee body persistence * feat: replace dagre graphs with Graphology + ELK + Sigma for more advanced call stack and cfg rendering * feat: ensure CFG function view is scoped to the selected function, preventing bleed into sibling functions * feat: enhance resource tracking with proxy method summaries and improve finding extraction * feat: add terminal function exit detection for accurate resource leak analysis * feat: add warnings for loops and functions without bodies to improve error recovery * feat: update lambda expression handling to ensure proper function classification and control flow * feat: remove bounded formatting/string ops and add JSON.parse sanitizer for improved data handling * feat: add inline return taint analysis and regression tests for improved security checks * feat: add engine version management and migration handling for database schema updates * feat: enhance first_call_ident to skip nested function bodies and add regression tests * feat: enhance callee name resolution with two-segment normalization and disambiguation * feat: add cross-file context flags and debug assertions for taint analysis * feat: refactor taint analysis structure to unify context handling and improve clarity * feat: enhance dead code elimination to preserve Sink, Source, and Sanitizer labels with new tests * docs: updated CHANGELOG.md * fmt: formatting fixes * fix: fixed frontend formatting and lint warnings * fix: optimized ci * fix: optimized ci * Add comprehensive multi-file test coverage to 
Nyx (#38) * Initial checklist for multi-file test suite expansion Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * Add 12 new multi-file test fixtures with TP/TN/near-miss coverage Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * deleted root repo * rebuilt to test for regressions --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Co-authored-by: elipeter <elicpeter@gmail.com> * feat: enhance import alias resolution and taint tracking * feat: implement security hardening with CSRF protection and path validation * feat: add support for import alias bindings in Python, PHP, and Rust * feat: enhance CFG analysis modes and improve code readability * feat: add detection for parameterized SQL queries to enhance security * feat: add safe internal redirect handling and enhance session destroy validation * feat: implement security improvements by addressing vulnerabilities in execAsync, session management, and file downloads * feat: enhance taint detection by adding support for inline source member expressions in call arguments * feat: implement pre-emission of Source nodes for inline source member expressions in call arguments * feat: add support for Throw statement in control flow and error handling * feat: add debug and echo endpoints with potential information leakage * feat: implement internal redirect suppression and enhance taint detection * feat: implement module alias tracking for dynamic dispatch in JS/TS * feat: add authorization analysis module with Express support * feat: add authorization analysis module with Express support * feat: add tests for admin guard requirements and clean checks in authorization 
analysis * feat: integrate Koa and Fastify frameworks into authorization analysis * feat: add Flask and Django support to authorization analysis module * feat: add support for Rails and Sinatra frameworks in authorization analysis * feat: add support for Axum, ActixWeb, and Rocket frameworks in authorization analysis * feat: add support for ActixWeb, Axum, and Rocket frameworks in authorization analysis * feat: add support for Rails and Sinatra in authorization analysis * chore: add .DS_Store to .gitignore * refactor: simplify conditional checks and improve readability in multiple files * refactor: update usage of Option methods for improved clarity and consistency * refactor: improve code readability by simplifying conditional checks and formatting * refactor: improve code formatting and readability by simplifying conditional checks * refactor: simplify conditional checks and improve readability in multiple files * refactor: simplify conditional checks in axum.rs for improved readability * feat: add CodeQL analysis configuration for enhanced security scanning * test: add comprehensive tests for `src/output.rs` SARIF builder (#39) * chore: start test coverage improvement work Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * test: add comprehensive tests for src/output.rs SARIF builder Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * refactor: improve code formatting and readability in output.rs --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Co-authored-by: elipeter <elicpeter@gmail.com> * refactor: improve code formatting and readability in output.rs * Potential fix for code scanning alert no. 
210: Uncontrolled data used in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * Potential fix for code scanning alert no. 211: Uncontrolled data used in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * refactor: enhance triage file path handling with improved error management and validation * refactor: updated func summaries for richer detail * refactor: update SSA summary extraction to use canonical FuncKey for distinct entries * refactor: enhance callee metadata structure to support arity, receiver, and qualifier for better overload resolution * refactor: add support for keyword arguments in function calls and enhance receiver extraction for method-style calls * refactor: implement new Flask routes for safe and unsafe shell command execution * refactor: separate receiver handling in SSA operations and enhance taint propagation * refactor: improve arity handling by using arg_uses for positional argument count and enhance witness scoring for tainted arguments * refactor: implement auth decorator extraction and classification for multiple languages * refactor: enhance Rust module path resolution and use map handling for cross-file disambiguation * refactor: introduce CalleeQuery struct for structured callee resolution and enhance resolver logic * refactor: implement same-file identity collision handling for `runTask` to ensure correct resolver behavior * refactor: standardize default struct initialization across multiple files * feat: add scripts for formatting checks and auto-fixes with test summaries * refactor: simplify character splitting and enhance namespace qualifier handling * refactor: improve documentation clarity and enhance code readability in resolver logic * refactor: replace default struct initialization with explicit field assignments for clarity * feat: enhance anonymous function naming by 
deriving context-based bindings * refactor: streamline match expressions for improved readability and performance * refactor: streamline match expressions for improved readability and performance * refactor: replace loop with while let for improved clarity and performance * feat: add SSA constant propagation support to analysis context for improved accuracy * feat: add SSA constant propagation support to analysis context for improved accuracy * feat: implement shell metacharacter validation and bounded-length checks in Rust analysis * feat: add static map analysis for command injection suppression and type safety * refactor: simplify match statements and reduce line breaks for improved readability * feat(summary): phase 1/5 SinkSite data model for primary sink-location attribution Introduce SinkSite (file_rel, line, col, snippet, cap) carrying the primary sink source-location through function summaries. Swap SsaFuncSummary.param_to_sink and FuncSummary.param_to_sink from a coarse Cap map to a deduped SmallVec<[SinkSite; 1]> per parameter, with a backward-compatible cap_sites() helper and serde defaults so pre-phase-1 on-disk rows continue to deserialise cleanly. Extraction: SinkSiteLocator bundles the tree/bytes/file_rel needed by extract_ssa_func_summary; ParsedFile::extract_ssa_artifacts wires the locator in for the persisted pass-1 path, while pass-2 intra-file transient summaries fall back to cap-only sites (behavior unchanged). Merge: GlobalSummaries::insert now unions sink sites with (file_rel, line, col, cap) dedup via shared union_param_sink_sites helper. Database: JSON-serialised summary columns carry the new shape automatically; no schema change needed. Phase 2 will consume SinkSite in build_taint_diag() to overwrite the caller-site Finding.line with the callee's sink line when resolved via summary. Phase 1 keeps behavior unchanged: scanning tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs still produces the same (wrong) line 10 finding. 
Adds round-trip tests covering SinkSite solo, SsaFuncSummary with sink sites, legacy-JSON default handling for both summary types, and merge dedup. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(taint): phase 2/5 thread SinkSite into SsaTaintEvent and Finding Plumb Phase 1's SinkSite through the event pipeline into Findings, no output change yet. SsaTaintEvent gains `primary_sink_site: Option<SinkSite>`; when the main or callback sink-emission path has non-empty `param_to_sink_sites`, filter to sites whose `(line != 0) && (cap ∩ sink_caps != ∅)` and emit one event per distinct site — the multi-primary collapse keeps each downstream Finding single-primary. Resolution: ResolvedSummary and SinkInfo gain mirror `param_to_sink_sites` fields, populated from `SsaFuncSummary.param_to_sink` (SSA + callback paths) and `FuncSummary.param_to_sink` (global paths). Label, local-summary, and interop resolution paths leave the field empty — they only ever had cap-level info to begin with. Finding: new `primary_location: Option<SinkLocation>` with `file_rel/line/col`. `ssa_events_to_findings` maps `event.primary_sink_site` → `Finding.primary_location`, filtering cap-only sites (`line == 0`) to `None` so the (0,0) sentinel never leaks to formatters. Dedup key extended with the primary location so multi-site events aren't collapsed back together. Invariants (debug_assert!): * every SinkSite reaching emission has `line != 0 && cap ∩ sink_caps != ∅` — enforced by the pick_primary_sink_sites* filters; * every populated Finding.primary_location has `line != 0` AND non-empty `file_rel` — the cap-only → None translation upstream guarantees this. Deliberately independent of `uses_summary`: that flag tracks whether the *taint chain* used a summary, whereas primary attribution requires only that the *sink* itself was summary-resolved. 
A local source reaching a cross-file sink produces `uses_summary=false` alongside a populated primary_location — documented on Finding.primary_location, covered by `cross_file_sink_finding_carries_primary_location`. build_taint_diag, SARIF/JSON/explanation formatters, and the benchmark scorer remain untouched: finding.line still comes from `cfg_graph[finding.sink]`, so cmdi_indirect.rs still reports line 10 and the benchmark's rs-cmdi-003 row still shows FN in the LOC column. Tests: `cross_file_sink_finding_carries_primary_location` (proves plumbing via a synthetic FuncSummary carrying a SinkSite at 42:5) and `cross_file_sink_cap_only_site_leaves_primary_location_none` (regression guard against cap-only sites surfacing). All 1566 lib tests + integration tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(output): phase 3/5 consume primary sink location in diag + SARIF When a finding's primary_location (populated in phase 2 from a callee summary's SinkSite) names the dangerous instruction inside a callee body, attribute the diagnostic line to that location instead of the caller's call site. The call site is demoted to a Call step in flow_steps, and a synthetic Sink step at the primary location is appended so analysts still see the full trace. Changes: - Add scan_root parameter to build_taint_diag so file_rel can be resolved back to an absolute path via a shared resolve_file_rel helper. Empty file_rel (single-file scans where namespace == "") resolves to the file under analysis. - Extend SinkLocation with snippet, carried from the upstream SinkSite so the formatter needs no second file read. - Relax the ssa_events_to_findings debug_assert to allow empty file_rel, which is valid when scan root equals the file itself. - SARIF: emit data-flow as codeFlows[0].threadFlows[0].locations[]; locations[0] already reflects the primary sink position via the updated diag line/col. 
Acceptance: scan on tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs now reports line 5 (Command::new) as the primary sink, with the call site at line 10 visible in flow_steps. Two expect.json fixtures updated (must_match line_range widened): - javascript/taint/context_sensitive_call: 12-14 -> 7-14 (line 8 is the real sink inside run()). - rust/cfg/closure_async: 10-10 -> 10-11 (line 11 is Command::new inside the closure). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(bench): phase 4/5 validate primary sink attribution across corpus Extend the benchmark scorer and ground truth to lock in phase 3's primary-location behavior, and add fixtures that exercise the new capability end-to-end. Scorer (tests/benchmark_test.rs): - Add optional `expected_call_site_lines: Option<Vec<[usize; 2]>>` on Case. When present, score_location_level additionally requires at least one flow_step in the finding's evidence trace to fall within ±2 of the call-site range. When absent, the check is skipped — fully forward-compatible with existing fixtures. - Retain ±2 tolerance on expected_sink_lines (compared against the now-primary Diag.line post-phase-3). Ground truth edits: - rs-cmdi-cross-001: expected_sink_lines [8,8] -> [9,9]. Line 8 is the transform::wrap call site (a cross-file propagator, not a sink); line 9 is Command::new, the real sink. The ±2 tolerance happened to mask this stale attribution but it was semantically wrong — phase 4 is the right time to correct it. Also adds expected_call_site_lines [8,8] so the new field is exercised on an existing cross-file case. - rs-cmdi-003: adds expected_call_site_lines [10,10] (run_cmd call). This fixture's sink (Command::new inside run_cmd at line 5) was the motivating case for phases 1-3; adding the call-site assertion guards against regression to caller-line attribution. 
New fixtures: - rust/cmdi/cmdi_indirect_multisink.rs (rs-cmdi-009): helper run_both takes two tainted params and invokes two Command sinks on consecutive lines. Locks in that primary line lands inside the helper (lines 5-6), not at the caller (line 12). Notes document that SinkSite is currently one-per-callee so both findings today collapse onto the first sink; expected_sink_lines=[5,6] and expected_call_site_lines=[12,12] stay valid either way. - python/cmdi/cross_indirect_sink/{app.py,helper.py} (py-cmdi-cross- 004): sink os.system lives in helper.py (cross-file), caller in app.py reads env source and calls run_cmd. Verifies phase 3's cross-file primary attribution: Diag.path = helper.py, Diag.line = 5, with app.py:7 recorded in flow_steps as a Call step. Acceptance: - `cargo test --test benchmark_test -- --ignored --nocapture` passes. - rs-cmdi-003 is TP/TP/TP (the target flip FN->TP at LOC). All pre-existing TP/TP/TP fixtures remain TP/TP/TP; 2 new fixtures are TP/TP/TP. - Aggregate rule-level: TP=158 FP=10 FN=1 TN=97, P=0.940 R=0.994 F1=0.966 on the 266-case corpus (was TP=156 FP=10 FN=1 TN=97 on 264 pre-phase-4, delta is the +2 new cases both resolving TP). - Full `cargo test` green (1566 lib tests + all integration tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(taint): phase 5/5 lock Finding.primary_location contract via regression test Add a regression test in src/taint/ssa_transfer.rs that wires up a synthetic SsaFuncSummary with a SinkSite at other.rs:42:10 and drives the three emission stages (pick_primary_sink_sites → emit_ssa_taint_events → ssa_events_to_findings) against a minimal caller SSA body. Asserts the resulting Finding.primary_location is exactly that triple. The existing integration tests in src/taint/tests.rs cover the coarse FuncSummary path end-to-end through analyse_file. 
This test locks in the lower-level SSA-side plumbing so a future refactor that silently drops the site between pick → emit → findings fails here rather than only at the benchmark layer. Also refreshes tests/benchmark/results/latest.json (timestamp only; rs-cmdi-003 remains TP/TP/TP and the aggregate P/R/F1 are unchanged from phase 4). Closes the primary sink-location attribution feature (phases 1-5/5): * Phase 1 — SinkSite data model on summaries. * Phase 2 — SinkSite threaded into SsaTaintEvent and Finding. * Phase 3 — diag + SARIF consume primary_location. * Phase 4 — benchmark validates primary_call_site_lines across corpus. * Phase 5 — regression test locks the event→finding contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: clean up formatting and improve readability in multiple files * refactor: simplify type definition for deduplication key in findings * test(harness): add must_not_match expectation for FP regression guards Extends ExpectedFinding with must_not_match field that asserts a diagnostic must NOT fire — presence is a hard failure. Non-consuming scan so it coexists with must_match entries on the same rule_id. Adds forbidden_violations accumulator and updates summary line. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(regression): update expectations to ensure must_not_match for various taint and resource leak rules * feat: implement auto-seeding for JS/TS handler parameters to enhance taint tracking * feat: update switch statement handling to improve control flow analysis * feat: implement promisify alias handling for JS/TS to enhance taint tracking * feat: enhance taint tracking by refining expectation handling and adding mode filtering * feat: refine SQL handling in stream processing and enhance auto-seeding for handler parameters * feat: update taint tracking rules to enforce full mode matching and improve flow analysis * feat: enhance Ruby subshell handling to improve taint tracking and flow analysis * feat: update xss_response expectations to refine taint flow analysis and enhance regression guarding * feat: refine framework detection and update expectation handling for Echo and Sinatra * feat: implement max_count for taint tracking expectations and deduplicate findings * feat: add strict_unexpected handling for taint-unsanitised-flow in expectation files * feat: enhance deduplication of taint-unsanitised-flow findings by collapsing based on line and severity * feat: add strict_unexpected handling for taint-unsanitised-flow in multiple expectation files * feat: add structural invariant checks for SSA bodies * feat: ensure deterministic phi emission order using BTreeSet * feat: enhance handling of terminators to ensure authoritative flow through successor edges * feat: enhance Goto terminator handling to ensure all successors are marked executable * feat: refactor code for improved readability and organization * feat: simplify predicate checks and enhance readability in SSA handling * feat: implement per-file parse timeout and enhance file size handling * feat: migrate analysis engine toggles from environment variables to configuration file * feat: remove unnecessary whitespace in 
hostile_input_tests.rs * feat: remove unnecessary whitespace in hostile_input_tests.rs * feat: update dependencies and enhance documentation on language maturity * feat: enhance security headers and improve request body limits * feat: implement sink capability bits for deduplication and enhance evidence tagging * feat: implement dynamic activation handling for gated sinks and enhance validation logic * feat: enhance configuration documentation and clarify inline analysis cache behavior * feat: implement panic recovery during analysis to continue scans past errors * feat: add expectations configuration for taint analysis and performance metrics * feat: enhance error handling and logging during file reading and mutex locking * feat: add cross-file body loading tests and plumbing for CF-1 phase * feat: implement cross-file k=1 context-sensitive inline taint analysis with new tests and fixtures * feat: implement indexed-scan parity in cross-file inline analysis with new dropdown and copy functionality * feat: enhance classification span handling in CFG and AST for improved source attribution * feat: add new Express routes for handling user input and telemetry data * feat: implement ternary expression handling in CFG with diamond structure for JS/TS * feat: implement Phase CF-3 abstract-domain transfer channels in summaries * feat: add support for string-prefix transfer in cross-file calls and update tests * docs: reduce RESULTS.md doc size * feat: implement Phase CF-4 per-return-path summary decomposition with tests * feat: update parameter handling in pass1 and refactor SsaFuncSummary initialization * feat: implement Phase CF-5 for cross-file SCC joint fixed-point convergence with new flags and tests * feat: implement Phase CF-6 with parameter-granularity points-to summaries and associated tests * refactor: update comments and documentation for clarity and consistency * style: format code for consistency and readability * refactor: simplify verdict handling and 
improve edge checking logic * refactor: optimize path and identifier collection by avoiding unnecessary cloning * chore: update Cargo.toml for Rust version 1.85 and add ignored files; modify CHANGELOG and README for clarity on state analysis defaults * refactor: update documentation and improve clarity in configuration files * refactor: update documentation and improve clarity in configuration files * feat: add JS/TS pass-2 convergence tests and expectations configuration * feat: add Phase 5 regression tests for inline cache origin attribution and update related logic * feat: implement Phase 7 deduplication and alternative path linking for taint findings * feat: implement structural DFS index for anonymous functions and update naming conventions * feat: add Phase 8 regression tests for container-element taint in JS and Python * feat: add engine-depth profiles and explain-engine option for CLI * feat: update expectations and add new README fixtures for multi-file scan regression * feat: implement Phase 11 callback-alias and factory patterns with regression tests * feat: implement Terminator::Switch for multi-way dispatch and add regression tests * feat: add real-CVE benchmark fixtures for CVE-2023-48022, CVE-2019-14939, and CVE-2023-26159 with corresponding patched variants * refactor: extract cfg and ssa_transfer to submodules * refactor: cargo fmt * refactor: remove unnecessary blank line in cfg_tests.rs * refactor: remove unnecessary planning file * chore: update Rust version to 1.88 and bump dependencies in Cargo files * feat: enhance triage UI with new layout and controls, update README for clarity * feat: enhance triage UI with new layout and controls, update README for clarity * chore: remove outdated section from README for version 0.5.0 * docs: improve clarity and consistency in README content * chore: add "GPL-3.0-or-later" to license options in about.toml * chore: update license handling in about.toml and check-licenses.mjs * style: format code for 
improved readability in TriagePage component * style: format code for improved readability in TriagePage component * chore: enhance license handling and improve body_id scoping in seed lookup * feat: introduce owner and parent body IDs for enhanced seed scoping * feat: implement direction-aware engine provenance with new CLI flag for strict CI gating * feat: add Undef SSA operation for improved control-flow handling * style: improve code formatting for consistency and readability in multiple files * feat: add 16-function chain SCC across multiple files for enhanced analysis * style: simplify code formatting for improved readability in multiple files * fix: update CapHitReason default implementation and improve README clarity * docs: enhance README with detailed explanations of taint analysis and limitations * docs: refine README for clarity and consistency in taint analysis section * style: improve code formatting for better readability in NewScanModal and scans * fix: update cargo-about command to use --offline for deterministic license generation * fix: update cargo-about command to use --offline for deterministic license generation * ci: add step to prime cargo registry cache for deterministic license generation * feat: add support for non-sink collections in authorization analysis * feat: enhance authorization checks with row-level ownership equality and binding tracking * feat: implement self-scoped user handling and enhance ownership checks * refactor: simplify assertions and formatting in authorization analysis tests * fix: normalize line endings in THIRDPARTY-LICENSES.html generation and update README with AI disclosure * docs: update AI disclosure section for clarity and conciseness * feat: add AI Contribution Policy and update contributing guidelines for AI assistance disclosure * feat: enhance authorization analysis with SSA-derived variable type classification * feat: implement auth_finding_to_diag function for enhanced security diagnostics * feat: add 
args_value_refs to CallSite struct for enhanced argument tracking * feat: add args_value_refs to CallSite struct for enhanced argument tracking * feat: add direction-aware engine provenance with LossDirection classification and new CLI flag * feat: simplify strip_cap_from_call_args call by removing unnecessary line breaks * feat: enhance error message handling in cli_validation_tests for better Windows compatibility * feat: optimize release profile settings in Cargo.toml and update CodeQL configuration * feat: enhance release build process with SBOM generation and SLSA provenance * feat: update actions/checkout and actions/setup-node to v6, enhance CLI options, and improve auth-check summaries * feat: introduce PathFact handling for path safety checks and rejection logic * feat: introduce PathFact handling for path safety checks and rejection logic * feat: update benchmark data and enhance path sanitization logic with new safety checks * feat: document AI assistance in frontend UI development and human review process * feat: add return path facts for enhanced path safety checks and update documentation * chore: update release date for version 0.5.0 in CHANGELOG.md * chore: clean up ci.yml by removing outdated comments and clarifying steps * feat: implement cross-language path sanitizers and validators for enhanced security * feat: enhance SSA value usage tracking by including block terminators and improve path safety checks * feat: enhance switch statement handling by adding per-case path constraints and support for exclusive cases * refactor: simplify conditional formatting and improve code readability in executor and lower modules * feat: add vulnerable examples for various languages demonstrating authentication and sanitization issues * feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers * feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers * 
feat: add transform classifiers for Java, Go, and Ruby with corresponding tests * refactor: clarify comments on reassign-to-constant idiom and sink behavior in guards.rs --------- Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
parent c4ce08b452
commit 41128177d2
2144 changed files with 201812 additions and 8927 deletions
321
src/ssa/alias.rs
Normal file
|
|
@ -0,0 +1,321 @@
|
|||
use std::collections::HashMap;
|
||||
|
||||
use serde::{Deserialize, Serialize};
|
||||
use smallvec::SmallVec;
|
||||
|
||||
use super::ir::*;
|
||||
|
||||
/// Maximum members per alias group to bound analysis cost.
|
||||
const MAX_ALIAS_GROUP_SIZE: usize = 16;
|
||||
|
||||
/// Result of base-variable alias analysis.
|
||||
///
|
||||
/// Maps variable base names that are known to reference the same object.
|
||||
/// Two names in the same group are must-aliases: a copy `b = a` (with no
|
||||
/// semantic labels) means `b` and `a` reference the same value, so field
|
||||
/// paths like `b.data` and `a.data` are interchangeable.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub struct BaseAliasResult {
|
||||
/// base_name → canonical name. All aliases map to the same canonical.
|
||||
canonical: HashMap<String, String>,
|
||||
/// canonical_name → all member base names (including the canonical itself).
|
||||
members: HashMap<String, SmallVec<[String; 4]>>,
|
||||
}
|
||||
|
||||
impl BaseAliasResult {
|
||||
/// An empty result (no aliases detected).
|
||||
pub fn empty() -> Self {
|
||||
Self {
|
||||
canonical: HashMap::new(),
|
||||
members: HashMap::new(),
|
||||
}
|
||||
}
|
||||
|
||||
/// True when no aliases were found.
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.members.is_empty()
|
||||
}
|
||||
|
||||
/// Get all must-alias base names for `base` (including itself).
|
||||
/// Returns `None` if the name has no known aliases.
|
||||
pub fn aliases_of(&self, base: &str) -> Option<&[String]> {
|
||||
let canon = self.canonical.get(base)?;
|
||||
self.members.get(canon).map(|v| v.as_slice())
|
||||
}
|
||||
|
||||
/// Check if two base names are must-aliases.
|
||||
pub fn are_aliases(&self, a: &str, b: &str) -> bool {
|
||||
if a == b {
|
||||
return true;
|
||||
}
|
||||
match (self.canonical.get(a), self.canonical.get(b)) {
|
||||
(Some(ca), Some(cb)) => ca == cb,
|
||||
_ => false,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Compute base-variable alias groups from the copy propagation replacement map.
|
||||
///
|
||||
/// For each entry `(dst_val, src_val)` where copy prop replaced `dst` with
|
||||
/// `src`, looks up the original variable names. If both are plain identifiers
|
||||
/// (no dots — i.e. not field paths), they are registered as base aliases.
|
||||
/// Transitive closure is computed so `b = a; c = b` yields group `{a, b, c}`.
|
||||
pub fn compute_base_aliases(
|
||||
copy_map: &HashMap<SsaValue, SsaValue>,
|
||||
body: &SsaBody,
|
||||
) -> BaseAliasResult {
|
||||
if copy_map.is_empty() {
|
||||
return BaseAliasResult::empty();
|
||||
}
|
||||
|
||||
// Union-Find for transitive closure (string-keyed, small N).
|
||||
let mut parent: HashMap<String, String> = HashMap::new();
|
||||
|
||||
fn find(parent: &mut HashMap<String, String>, x: &str) -> String {
|
||||
if !parent.contains_key(x) {
|
||||
return x.to_string();
|
||||
}
|
||||
let mut root = x.to_string();
|
||||
// Chase to root (with iteration cap for safety).
|
||||
for _ in 0..100 {
|
||||
match parent.get(&root) {
|
||||
Some(p) if p != &root => root = p.clone(),
|
||||
_ => break,
|
||||
}
|
||||
}
|
||||
// Path compression.
|
||||
let mut cur = x.to_string();
|
||||
for _ in 0..100 {
|
||||
match parent.get(&cur) {
|
||||
Some(p) if p != &root => {
|
||||
let next = p.clone();
|
||||
parent.insert(cur, root.clone());
|
||||
cur = next;
|
||||
}
|
||||
_ => break,
|
||||
}
|
||||
}
|
||||
root
|
||||
}
|
||||
|
||||
fn union(parent: &mut HashMap<String, String>, a: &str, b: &str) {
|
||||
let ra = find(parent, a);
|
||||
let rb = find(parent, b);
|
||||
if ra != rb {
|
||||
// Arbitrary root choice — alphabetically smaller becomes root
|
||||
// for determinism.
|
||||
if ra < rb {
|
||||
parent.insert(rb, ra);
|
||||
} else {
|
||||
parent.insert(ra, rb);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Collect alias pairs from the copy map.
|
||||
for (&dst, &src) in copy_map {
|
||||
let dst_idx = dst.0 as usize;
|
||||
let src_idx = src.0 as usize;
|
||||
if dst_idx >= body.value_defs.len() || src_idx >= body.value_defs.len() {
|
||||
continue;
|
||||
}
|
||||
|
||||
let dst_name = match &body.value_defs[dst_idx].var_name {
|
||||
Some(n) => n.as_str(),
|
||||
None => continue,
|
||||
};
|
||||
let src_name = match &body.value_defs[src_idx].var_name {
|
||||
Some(n) => n.as_str(),
|
||||
None => continue,
|
||||
};
|
||||
|
||||
// Only alias plain idents — dotted paths (field accesses) are tracked
|
||||
// independently in SSA and handled by field-aware suppression.
|
||||
if dst_name.contains('.') || src_name.contains('.') {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Skip self-aliases.
|
||||
if dst_name == src_name {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Ensure both exist in the parent map.
|
||||
parent
|
||||
.entry(dst_name.to_string())
|
||||
.or_insert_with(|| dst_name.to_string());
|
||||
parent
|
||||
.entry(src_name.to_string())
|
||||
.or_insert_with(|| src_name.to_string());
|
||||
|
||||
union(&mut parent, dst_name, src_name);
|
||||
}
|
||||
|
||||
if parent.is_empty() {
|
||||
return BaseAliasResult::empty();
|
||||
}
|
||||
|
||||
// Build groups from union-find.
|
||||
let mut groups: HashMap<String, SmallVec<[String; 4]>> = HashMap::new();
|
||||
let all_names: Vec<String> = parent.keys().cloned().collect();
|
||||
for name in &all_names {
|
||||
let root = find(&mut parent, name);
|
||||
groups.entry(root).or_default().push(name.clone());
|
||||
}
|
||||
|
||||
// Remove singleton groups (no aliases) and enforce size limit.
|
||||
groups.retain(|_, members| members.len() > 1);
|
||||
for members in groups.values_mut() {
|
||||
members.sort();
|
||||
members.truncate(MAX_ALIAS_GROUP_SIZE);
|
||||
}
|
||||
|
||||
// Build canonical map.
|
||||
let mut canonical: HashMap<String, String> = HashMap::new();
|
||||
for (root, members) in &groups {
|
||||
for member in members {
|
||||
canonical.insert(member.clone(), root.clone());
|
||||
}
|
||||
}
|
||||
|
||||
BaseAliasResult {
|
||||
canonical,
|
||||
members: groups,
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use petgraph::graph::NodeIndex;
|
||||
use std::collections::HashMap;
|
||||
|
||||
/// Helper: create a ValueDef with the given var_name.
|
||||
fn vdef(name: &str) -> ValueDef {
|
||||
ValueDef {
|
||||
var_name: Some(name.to_string()),
|
||||
cfg_node: NodeIndex::new(0),
|
||||
block: BlockId(0),
|
||||
}
|
||||
}
|
||||
|
||||
fn vdef_none() -> ValueDef {
|
||||
ValueDef {
|
||||
var_name: None,
|
||||
cfg_node: NodeIndex::new(0),
|
||||
block: BlockId(0),
|
||||
}
|
||||
}
|
||||
|
||||
fn make_body(defs: Vec<ValueDef>) -> SsaBody {
|
||||
SsaBody {
|
||||
blocks: vec![],
|
||||
entry: BlockId(0),
|
||||
value_defs: defs,
|
||||
cfg_node_map: HashMap::new(),
|
||||
exception_edges: vec![],
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_simple_alias_detection() {
|
||||
// v0 = "a", v1 = "b"; copy_map: v1 → v0 ⇒ {a, b}
|
||||
let body = make_body(vec![vdef("a"), vdef("b")]);
|
||||
let mut copy_map = HashMap::new();
|
||||
copy_map.insert(SsaValue(1), SsaValue(0));
|
||||
|
||||
let result = compute_base_aliases(&copy_map, &body);
|
||||
assert!(!result.is_empty());
|
||||
assert!(result.are_aliases("a", "b"));
|
||||
assert!(result.are_aliases("b", "a"));
|
||||
|
||||
let aliases = result.aliases_of("a").unwrap();
|
||||
assert_eq!(aliases.len(), 2);
|
||||
assert!(aliases.contains(&"a".to_string()));
|
||||
assert!(aliases.contains(&"b".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_transitive_aliases() {
|
||||
// v0="a", v1="b", v2="c"; copy_map: v1→v0, v2→v1 ⇒ {a, b, c}
|
||||
let body = make_body(vec![vdef("a"), vdef("b"), vdef("c")]);
|
||||
let mut copy_map = HashMap::new();
|
||||
copy_map.insert(SsaValue(1), SsaValue(0));
|
||||
copy_map.insert(SsaValue(2), SsaValue(1));
|
||||
|
||||
let result = compute_base_aliases(&copy_map, &body);
|
||||
assert!(result.are_aliases("a", "b"));
|
||||
assert!(result.are_aliases("b", "c"));
|
||||
assert!(result.are_aliases("a", "c"));
|
||||
|
||||
let aliases = result.aliases_of("c").unwrap();
|
||||
assert_eq!(aliases.len(), 3);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_no_alias_for_none_names() {
|
||||
// v0=None, v1="b"; copy_map: v1→v0 ⇒ no aliases
|
||||
let body = make_body(vec![vdef_none(), vdef("b")]);
|
||||
let mut copy_map = HashMap::new();
|
||||
copy_map.insert(SsaValue(1), SsaValue(0));
|
||||
|
||||
let result = compute_base_aliases(&copy_map, &body);
|
||||
assert!(result.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_dotted_paths_ignored() {
|
||||
// v0="a.x", v1="b.x"; copy_map: v1→v0 ⇒ no aliases (dotted)
|
||||
let body = make_body(vec![vdef("a.x"), vdef("b.x")]);
|
||||
let mut copy_map = HashMap::new();
|
||||
copy_map.insert(SsaValue(1), SsaValue(0));
|
||||
|
||||
let result = compute_base_aliases(&copy_map, &body);
|
||||
assert!(result.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_alias_group_size_limit() {
|
||||
// Create 20 variables all aliased to v0
|
||||
let mut defs = vec![vdef("v0")];
|
||||
let mut copy_map = HashMap::new();
|
||||
for i in 1..20u32 {
|
||||
defs.push(vdef(&format!("v{}", i)));
|
||||
copy_map.insert(SsaValue(i), SsaValue(0));
|
||||
}
|
||||
let body = make_body(defs);
|
||||
|
||||
let result = compute_base_aliases(&copy_map, &body);
|
||||
// All should be aliases, but group is capped at MAX_ALIAS_GROUP_SIZE
|
||||
let aliases = result.aliases_of("v0").unwrap();
|
||||
assert_eq!(aliases.len(), MAX_ALIAS_GROUP_SIZE);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_empty_copy_map() {
|
||||
let body = make_body(vec![vdef("a"), vdef("b")]);
|
||||
let copy_map = HashMap::new();
|
||||
|
||||
let result = compute_base_aliases(&copy_map, &body);
|
||||
assert!(result.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_self_alias_ignored() {
|
||||
// v0="a"; copy_map: v0→v0 ⇒ no aliases (self)
|
||||
let body = make_body(vec![vdef("a")]);
|
||||
let mut copy_map = HashMap::new();
|
||||
copy_map.insert(SsaValue(0), SsaValue(0));
|
||||
|
||||
let result = compute_base_aliases(&copy_map, &body);
|
||||
assert!(result.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_are_aliases_same_name() {
|
||||
let result = BaseAliasResult::empty();
|
||||
// Same name is always an alias of itself
|
||||
assert!(result.are_aliases("x", "x"));
|
||||
}
|
||||
}
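The grouping step in `compute_base_aliases` above can be illustrated in isolation. The following is a minimal sketch, not the code from this commit: it works on plain name pairs instead of `SsaValue` entries, omits path compression and the group size cap, and the helper name `alias_groups` is hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Follow parent links until reaching a name that is its own parent
/// (or a name that was never registered).
fn find(parent: &HashMap<String, String>, name: &str) -> String {
    let mut cur = name.to_string();
    while let Some(p) = parent.get(&cur) {
        if *p == cur {
            break;
        }
        cur = p.clone();
    }
    cur
}

/// Merge the groups of `a` and `b`. The alphabetically smaller root wins,
/// so the result does not depend on the order in which pairs are visited.
fn union(parent: &mut HashMap<String, String>, a: &str, b: &str) {
    for n in [a, b] {
        parent.entry(n.to_string()).or_insert_with(|| n.to_string());
    }
    let ra = find(parent, a);
    let rb = find(parent, b);
    if ra < rb {
        parent.insert(rb, ra);
    } else if rb < ra {
        parent.insert(ra, rb);
    }
}

/// Hypothetical helper: build alias groups from `(dst, src)` copy pairs,
/// keyed by root name, with singleton groups dropped.
fn alias_groups(copies: &[(&str, &str)]) -> BTreeMap<String, Vec<String>> {
    let mut parent = HashMap::new();
    for (dst, src) in copies {
        if dst != src {
            union(&mut parent, dst, src);
        }
    }
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for name in parent.keys() {
        groups.entry(find(&parent, name)).or_default().push(name.clone());
    }
    groups.retain(|_, members| members.len() > 1);
    for members in groups.values_mut() {
        members.sort();
    }
    groups
}

fn main() {
    // `b = a; c = b` and, separately, `y = x`.
    let groups = alias_groups(&[("b", "a"), ("c", "b"), ("y", "x")]);
    println!("{:?}", groups);
}
```

Because the smaller root always wins, each group's root is its alphabetically smallest member. In this sketch that means sorting keeps the root first in its own member list.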
|
||||
754
src/ssa/const_prop.rs
Normal file
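This file introduces a constant lattice and evaluates phi nodes by meeting operands from executable predecessors. The sketch below is a simplified illustration under stated assumptions, not the code from this commit: the `Lat` type is a hypothetical stand-in for `ConstLattice` that tracks integers only, and edge executability is passed as a plain `bool` per operand rather than looked up in an edge set.

```rust
/// Hypothetical three-level lattice over integer constants.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Lat {
    /// No information yet (identity for `meet`).
    Top,
    /// Exactly one known value.
    Int(i64),
    /// More than one possible value.
    Varying,
}

/// Combine two lattice values: `Top` yields the other side, equal values
/// are kept, and anything else collapses to `Varying`.
fn meet(a: &Lat, b: &Lat) -> Lat {
    match (a, b) {
        (Lat::Top, x) | (x, Lat::Top) => x.clone(),
        (x, y) if x == y => x.clone(),
        _ => Lat::Varying,
    }
}

/// Meet the phi operands, skipping those whose incoming edge is not
/// executable. Each operand is `(edge_is_executable, value)`.
fn eval_phi(operands: &[(bool, Lat)]) -> Lat {
    operands
        .iter()
        .filter(|(executable, _)| *executable)
        .fold(Lat::Top, |acc, (_, v)| meet(&acc, v))
}

fn main() {
    // Both edges executable with different constants: not constant.
    let both = [(true, Lat::Int(1)), (true, Lat::Int(2))];
    // Second edge proven dead: the phi stays constant.
    let one = [(true, Lat::Int(1)), (false, Lat::Int(2))];
    println!("{:?}", eval_phi(&both));
    println!("{:?}", eval_phi(&one));
}
```

Skipping non-executable edges is what lets a phi stay constant when one of its predecessors has been proven dead.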
|
|
@ -0,0 +1,754 @@
|
|||
use std::collections::{HashMap, HashSet, VecDeque};
|
||||
|
||||
use serde::{Deserialize, Serialize};
|
||||
|
||||
use super::ir::*;
|
||||
|
||||
/// Lattice value for constant propagation.
|
||||
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub enum ConstLattice {
|
||||
/// Not yet analyzed (optimistic top).
|
||||
Top,
|
||||
/// Known string constant.
|
||||
Str(String),
|
||||
/// Known integer constant.
|
||||
Int(i64),
|
||||
/// Known boolean constant.
|
||||
Bool(bool),
|
||||
/// Null / nil / None.
|
||||
Null,
|
||||
/// Multiple possible values — not constant.
|
||||
Varying,
|
||||
}
|
||||
|
||||
impl ConstLattice {
|
||||
/// Meet operation: combine two lattice values.
|
||||
fn meet(&self, other: &Self) -> Self {
|
||||
match (self, other) {
|
||||
(ConstLattice::Top, x) | (x, ConstLattice::Top) => x.clone(),
|
||||
(ConstLattice::Varying, _) | (_, ConstLattice::Varying) => ConstLattice::Varying,
|
||||
(a, b) if a == b => a.clone(),
|
||||
_ => ConstLattice::Varying,
|
||||
}
|
||||
}
|
||||
|
||||
/// Parse a raw constant text into a typed lattice value.
|
||||
pub(crate) fn parse(text: &str) -> Self {
|
||||
let trimmed = text.trim();
|
||||
|
||||
// Boolean
|
||||
if trimmed == "true" || trimmed == "True" || trimmed == "TRUE" {
|
||||
return ConstLattice::Bool(true);
|
||||
}
|
||||
if trimmed == "false" || trimmed == "False" || trimmed == "FALSE" {
|
||||
return ConstLattice::Bool(false);
|
||||
}
|
||||
|
||||
// Null variants
|
||||
if trimmed == "null"
|
||||
|| trimmed == "nil"
|
||||
|| trimmed == "None"
|
||||
|| trimmed == "NULL"
|
||||
|| trimmed == "nullptr"
|
||||
{
|
||||
return ConstLattice::Null;
|
||||
}
|
||||
|
||||
// Integer (including negative)
|
||||
if let Ok(i) = trimmed.parse::<i64>() {
|
||||
return ConstLattice::Int(i);
|
||||
}
|
||||
|
||||
// String: strip surrounding quotes
|
||||
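// Require at least two characters so a lone quote character cannot
// produce an inverted slice range (which would panic) below.
if trimmed.len() >= 2
&& ((trimmed.starts_with('"') && trimmed.ends_with('"'))
|| (trimmed.starts_with('\'') && trimmed.ends_with('\'')))
{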
|
||||
let inner = &trimmed[1..trimmed.len() - 1];
|
||||
return ConstLattice::Str(inner.to_string());
|
||||
}
|
||||
|
||||
// Bare string (no quotes) — treat as string constant
|
||||
ConstLattice::Str(trimmed.to_string())
|
||||
}
|
||||
|
||||
/// Returns the boolean value if this is a known Bool.
|
||||
pub fn as_bool(&self) -> Option<bool> {
|
||||
match self {
|
||||
ConstLattice::Bool(b) => Some(*b),
|
||||
// Truthiness: null is false, 0 is false, empty string is false
|
||||
ConstLattice::Null => Some(false),
|
||||
ConstLattice::Int(0) => Some(false),
|
||||
ConstLattice::Str(s) if s.is_empty() => Some(false),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Result of constant propagation analysis.
|
||||
pub struct ConstPropResult {
|
||||
/// Per-SSA-value constant lattice.
|
||||
pub values: HashMap<SsaValue, ConstLattice>,
|
||||
/// Blocks that are statically unreachable.
|
||||
pub unreachable_blocks: HashSet<BlockId>,
|
||||
}
|
||||
|
||||
/// Run Sparse Conditional Constant Propagation on an SSA body.
|
||||
pub fn const_propagate(body: &SsaBody) -> ConstPropResult {
|
||||
let num_blocks = body.blocks.len();
|
||||
|
||||
// Per-value lattice: starts at Top
|
||||
let mut values: HashMap<SsaValue, ConstLattice> = HashMap::new();
|
||||
|
||||
// Executable flags per CFG edge (from_block, to_block)
|
||||
let mut executable_edges: HashSet<(BlockId, BlockId)> = HashSet::new();
|
||||
// Executable blocks
|
||||
let mut executable_blocks: HashSet<BlockId> = HashSet::new();
|
||||
|
||||
// Two worklists
|
||||
let mut cfg_worklist: VecDeque<BlockId> = VecDeque::new();
|
||||
let mut ssa_worklist: VecDeque<SsaValue> = VecDeque::new();
|
||||
|
||||
// Mark entry executable
|
||||
executable_blocks.insert(body.entry);
|
||||
cfg_worklist.push_back(body.entry);
|
||||
|
||||
// Build use-map: SsaValue → blocks containing an instruction that uses it,
// so we can propagate SSA value changes efficiently.
|
||||
let mut use_sites: HashMap<SsaValue, Vec<BlockId>> = HashMap::new();
|
||||
for block in &body.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
for used_val in inst_uses(inst) {
|
||||
use_sites.entry(used_val).or_default().push(block.id);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Initialize all values to Top
|
||||
for block in &body.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
values.insert(inst.value, ConstLattice::Top);
|
||||
}
|
||||
}
|
||||
|
||||
// Process until both worklists are empty
|
||||
loop {
|
||||
let mut changed = false;
|
||||
|
||||
// Process CFG worklist
|
||||
while let Some(block_id) = cfg_worklist.pop_front() {
|
||||
let block = body.block(block_id);
|
||||
|
||||
// Evaluate phis
|
||||
for phi in &block.phis {
|
||||
if let SsaOp::Phi(operands) = &phi.op {
|
||||
let old = values.get(&phi.value).cloned().unwrap_or(ConstLattice::Top);
|
||||
let new_val = eval_phi(operands, &values, &executable_edges, block_id);
|
||||
if new_val != old {
|
||||
values.insert(phi.value, new_val);
|
||||
ssa_worklist.push_back(phi.value);
|
||||
changed = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Evaluate body instructions
|
||||
for inst in &block.body {
|
||||
let old = values
|
||||
.get(&inst.value)
|
||||
.cloned()
|
||||
.unwrap_or(ConstLattice::Top);
|
||||
let new_val = eval_inst(inst, &values);
|
||||
if new_val != old {
|
||||
values.insert(inst.value, new_val);
|
||||
ssa_worklist.push_back(inst.value);
|
||||
changed = true;
|
||||
}
|
||||
}
|
||||
|
||||
// Process terminator: determine which successors are executable
|
||||
process_terminator(
|
||||
block,
|
||||
body,
|
||||
&values,
|
||||
&mut executable_edges,
|
||||
&mut executable_blocks,
|
||||
&mut cfg_worklist,
|
||||
);
|
||||
}
|
||||
|
||||
// Process SSA worklist
|
||||
while let Some(val) = ssa_worklist.pop_front() {
|
||||
if let Some(blocks) = use_sites.get(&val) {
|
||||
for &block_id in blocks {
|
||||
if !executable_blocks.contains(&block_id) {
|
||||
continue;
|
||||
}
|
||||
let block = body.block(block_id);
|
||||
|
||||
// Re-evaluate phis using this value
|
||||
for phi in &block.phis {
|
||||
if let SsaOp::Phi(operands) = &phi.op
|
||||
&& operands.iter().any(|(_, v)| *v == val)
|
||||
{
|
||||
let old = values.get(&phi.value).cloned().unwrap_or(ConstLattice::Top);
|
||||
let new_val = eval_phi(operands, &values, &executable_edges, block_id);
|
||||
if new_val != old {
|
||||
values.insert(phi.value, new_val);
|
||||
ssa_worklist.push_back(phi.value);
|
||||
changed = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Re-evaluate body instructions using this value
|
||||
for inst in &block.body {
|
||||
if inst_uses(inst).contains(&val) {
|
||||
let old = values
|
||||
.get(&inst.value)
|
||||
.cloned()
|
||||
.unwrap_or(ConstLattice::Top);
|
||||
let new_val = eval_inst(inst, &values);
|
||||
if new_val != old {
|
||||
values.insert(inst.value, new_val);
|
||||
ssa_worklist.push_back(inst.value);
|
||||
changed = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Re-evaluate terminator if condition changed
|
||||
process_terminator(
|
||||
block,
|
||||
body,
|
||||
&values,
|
||||
&mut executable_edges,
|
||||
&mut executable_blocks,
|
||||
&mut cfg_worklist,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if !changed {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// Compute unreachable blocks
|
||||
let unreachable_blocks: HashSet<BlockId> = (0..num_blocks)
|
||||
.map(|i| BlockId(i as u32))
|
||||
.filter(|bid| !executable_blocks.contains(bid))
|
||||
.collect();
|
||||
|
||||
ConstPropResult {
|
||||
values,
|
||||
unreachable_blocks,
|
||||
}
|
||||
}
|
||||
|
||||
/// Evaluate a phi: meet of operands from executable predecessors.
|
||||
fn eval_phi(
|
||||
operands: &[(BlockId, SsaValue)],
|
||||
values: &HashMap<SsaValue, ConstLattice>,
|
||||
executable_edges: &HashSet<(BlockId, BlockId)>,
|
||||
this_block: BlockId,
|
||||
) -> ConstLattice {
|
||||
let mut result = ConstLattice::Top;
|
||||
for (pred_block, val) in operands {
|
||||
if !executable_edges.contains(&(*pred_block, this_block)) {
|
||||
continue; // skip non-executable predecessors
|
||||
}
|
||||
let operand_val = values.get(val).cloned().unwrap_or(ConstLattice::Top);
|
||||
result = result.meet(&operand_val);
|
||||
}
|
||||
result
|
||||
}
|
||||
|
||||
/// Evaluate a single instruction.
|
||||
fn eval_inst(inst: &SsaInst, values: &HashMap<SsaValue, ConstLattice>) -> ConstLattice {
|
||||
match &inst.op {
|
||||
SsaOp::Const(Some(text)) => ConstLattice::parse(text),
|
||||
SsaOp::Const(None) => ConstLattice::Varying, // unknown constant
|
||||
SsaOp::Assign(uses) if uses.len() == 1 => {
|
||||
// Copy: propagate the source's value
|
||||
values.get(&uses[0]).cloned().unwrap_or(ConstLattice::Top)
|
||||
}
|
||||
SsaOp::Assign(_) => ConstLattice::Varying, // expression with multiple uses
|
||||
SsaOp::Call { .. }
|
||||
| SsaOp::Source
|
||||
| SsaOp::Param { .. }
|
||||
| SsaOp::SelfParam
|
||||
| SsaOp::CatchParam => ConstLattice::Varying,
|
||||
SsaOp::Phi(_) => ConstLattice::Varying, // phis in body shouldn't happen
|
||||
SsaOp::Nop => ConstLattice::Varying,
|
||||
// Undef contributes no knowledge: `Top` is the lattice identity
|
||||
// for meet, so a phi operand of Undef leaves the joined value
|
||||
// to the other incoming operands.
|
||||
SsaOp::Undef => ConstLattice::Top,
|
||||
}
|
||||
}
|
||||
|
||||
/// Collect SSA values used by an instruction (for use-map building).
|
||||
fn inst_uses(inst: &SsaInst) -> Vec<SsaValue> {
|
||||
match &inst.op {
|
||||
SsaOp::Phi(operands) => operands.iter().map(|(_, v)| *v).collect(),
|
||||
SsaOp::Assign(uses) => uses.to_vec(),
|
||||
SsaOp::Call { args, receiver, .. } => {
|
||||
let mut vals = Vec::new();
|
||||
if let Some(rv) = receiver {
|
||||
vals.push(*rv);
|
||||
}
|
||||
for arg in args {
|
||||
vals.extend(arg.iter());
|
||||
}
|
||||
vals
|
||||
}
|
||||
SsaOp::Source
|
||||
| SsaOp::Const(_)
|
||||
| SsaOp::Param { .. }
|
||||
| SsaOp::SelfParam
|
||||
| SsaOp::CatchParam
|
||||
| SsaOp::Nop
|
||||
| SsaOp::Undef => Vec::new(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Process a block's terminator to determine successor executability.
|
||||
fn process_terminator(
|
||||
block: &SsaBlock,
|
||||
body: &SsaBody,
|
||||
values: &HashMap<SsaValue, ConstLattice>,
|
||||
executable_edges: &mut HashSet<(BlockId, BlockId)>,
|
||||
executable_blocks: &mut HashSet<BlockId>,
|
||||
cfg_worklist: &mut VecDeque<BlockId>,
|
||||
) {
|
||||
match &block.terminator {
|
||||
Terminator::Goto(_) => {
|
||||
// `block.succs` is authoritative. For collapsed ≥3-way fanouts
|
||||
// (see src/ssa/lower.rs `three_successor_collapse`) the terminator
|
||||
// only records the first successor; marking just that one would
|
||||
// leave the others unreachable for SCCP. Iterate succs so every
|
||||
// CFG successor is marked executable.
|
||||
for &target in &block.succs {
|
||||
mark_edge_executable(
|
||||
block.id,
|
||||
target,
|
||||
executable_edges,
|
||||
executable_blocks,
|
||||
cfg_worklist,
|
||||
);
|
||||
}
|
||||
}
|
||||
Terminator::Branch {
|
||||
cond,
|
||||
true_blk,
|
||||
false_blk,
|
||||
condition: _,
|
||||
} => {
|
||||
// Try to resolve the condition to a known boolean
|
||||
let cond_val = body
|
||||
.cfg_node_map
|
||||
.get(cond)
|
||||
.and_then(|v| values.get(v))
|
||||
.and_then(|c| c.as_bool());
|
||||
|
||||
match cond_val {
|
||||
Some(true) => {
|
||||
mark_edge_executable(
|
||||
block.id,
|
||||
*true_blk,
|
||||
executable_edges,
|
||||
executable_blocks,
|
||||
cfg_worklist,
|
||||
);
|
||||
}
|
||||
Some(false) => {
|
||||
mark_edge_executable(
|
||||
block.id,
|
||||
*false_blk,
|
||||
executable_edges,
|
||||
executable_blocks,
|
||||
cfg_worklist,
|
||||
);
|
||||
}
|
||||
None => {
|
||||
// Unknown: both successors executable
|
||||
mark_edge_executable(
|
||||
block.id,
|
||||
*true_blk,
|
||||
executable_edges,
|
||||
executable_blocks,
|
||||
cfg_worklist,
|
||||
);
|
||||
mark_edge_executable(
|
||||
block.id,
|
||||
*false_blk,
|
||||
executable_edges,
|
||||
executable_blocks,
|
||||
cfg_worklist,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
Terminator::Switch {
|
||||
scrutinee,
|
||||
targets,
|
||||
default,
|
||||
case_values,
|
||||
} => {
|
||||
// Try to resolve scrutinee to a concrete integer literal; if
|
||||
// we can match it against one of the case literals (not
|
||||
// currently available on the SSA IR), mark just that target.
|
||||
// Until per-case literals are threaded through, fall back to
|
||||
// the sound "any successor executable" behavior, which mirrors
|
||||
// the pre-Switch cascade.
|
||||
let _ = (scrutinee, targets, default, case_values);
|
||||
for &target in &block.succs {
|
||||
mark_edge_executable(
|
||||
block.id,
|
||||
target,
|
||||
executable_edges,
|
||||
executable_blocks,
|
||||
cfg_worklist,
|
||||
);
|
||||
}
|
||||
}
|
||||
Terminator::Return(_) | Terminator::Unreachable => {
|
||||
// `block.succs` is authoritative; the terminator is advisory.
|
||||
// Finally/cleanup continuation edges live on `succs` even when
|
||||
// the structured terminator is `Return`/`Unreachable`. Mark them
|
||||
// executable so SCCP reaches downstream (e.g. finally) blocks.
|
||||
for &target in &block.succs {
|
||||
mark_edge_executable(
|
||||
block.id,
|
||||
target,
|
||||
executable_edges,
|
||||
executable_blocks,
|
||||
cfg_worklist,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn mark_edge_executable(
|
||||
from: BlockId,
|
||||
to: BlockId,
|
||||
executable_edges: &mut HashSet<(BlockId, BlockId)>,
|
||||
executable_blocks: &mut HashSet<BlockId>,
|
||||
cfg_worklist: &mut VecDeque<BlockId>,
|
||||
) {
|
||||
if executable_edges.insert((from, to)) {
// A newly executable edge always re-queues the target: either the
// block just became executable, or it already was and its phis
// must be re-evaluated against the new incoming edge.
executable_blocks.insert(to);
cfg_worklist.push_back(to);
}
|
||||
}
|
||||
|
||||
/// Apply constant propagation results: prune branches where condition is known constant.
|
||||
///
|
||||
/// Returns the number of branches pruned.
|
||||
pub fn apply_const_prop(body: &mut SsaBody, result: &ConstPropResult) -> usize {
|
||||
// Collect pruning decisions first to avoid borrow conflicts.
|
||||
// Each entry: (block_index, taken_block, untaken_block)
|
||||
let mut prune_ops: Vec<(usize, BlockId, BlockId)> = Vec::new();
|
||||
|
||||
for (block_idx, block) in body.blocks.iter().enumerate() {
|
||||
if let Terminator::Branch {
|
||||
cond,
|
||||
true_blk,
|
||||
false_blk,
|
||||
condition: _,
|
||||
} = &block.terminator
|
||||
{
|
||||
let cond_val = body
|
||||
.cfg_node_map
|
||||
.get(cond)
|
||||
.and_then(|v| result.values.get(v))
|
||||
.and_then(|c| c.as_bool());
|
||||
|
||||
match cond_val {
|
||||
Some(true) => {
|
||||
prune_ops.push((block_idx, *true_blk, *false_blk));
|
||||
}
|
||||
Some(false) => {
|
||||
prune_ops.push((block_idx, *false_blk, *true_blk));
|
||||
}
|
||||
None => {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let pruned = prune_ops.len();
|
||||
|
||||
// Apply pruning
|
||||
for (block_idx, taken, untaken) in prune_ops {
|
||||
let pred_id = body.blocks[block_idx].id;
|
||||
body.blocks[block_idx].terminator = Terminator::Goto(taken);
|
||||
|
||||
// Remove pred from untaken's preds
|
||||
let untaken_idx = untaken.0 as usize;
|
||||
if untaken_idx < body.blocks.len() {
|
||||
body.blocks[untaken_idx].preds.retain(|p| *p != pred_id);
|
||||
// Remove phi operands referencing this pred
|
||||
for phi in &mut body.blocks[untaken_idx].phis {
|
||||
if let SsaOp::Phi(operands) = &mut phi.op {
|
||||
operands.retain(|(bid, _)| *bid != pred_id);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Remove untaken from pred's succs
|
||||
body.blocks[block_idx].succs.retain(|s| *s != untaken);
|
||||
}
|
||||
|
||||
// Mark unreachable blocks
|
||||
for &bid in &result.unreachable_blocks {
|
||||
body.block_mut(bid).terminator = Terminator::Unreachable;
|
||||
}
|
||||
|
||||
pruned
|
||||
}
|
||||
|
||||
/// Collect module aliases from `require()` calls in the SSA body.
|
||||
///
|
||||
/// Detects patterns like `const http = require("http")` and propagates
|
||||
/// aliases through phi nodes (e.g., `const lib = cond ? https : http`).
|
||||
/// Returns a map from SSA value → set of possible module names.
|
||||
///
|
||||
/// Only tracks a fixed allowlist of security-relevant modules (see
/// `KNOWN_MODULES` below) to avoid false positives.
|
||||
pub fn collect_module_aliases(
|
||||
body: &SsaBody,
|
||||
const_values: &HashMap<SsaValue, ConstLattice>,
|
||||
) -> HashMap<SsaValue, smallvec::SmallVec<[String; 2]>> {
|
||||
use smallvec::SmallVec;
|
||||
|
||||
// Known modules whose methods are security-relevant for alias tracking.
|
||||
const KNOWN_MODULES: &[&str] = &["http", "https", "child_process", "fs", "net", "dgram"];
|
||||
|
||||
let mut aliases: HashMap<SsaValue, SmallVec<[String; 2]>> = HashMap::new();
|
||||
|
||||
// Pass 1: detect `require("module")` calls.
|
||||
for block in &body.blocks {
|
||||
for inst in &block.body {
|
||||
if let SsaOp::Call { callee, args, .. } = &inst.op
|
||||
&& (callee == "require" || callee.ends_with(".require"))
|
||||
{
|
||||
// Check if the first argument is a known module string constant.
|
||||
if let Some(first_arg) = args.first()
|
||||
&& let Some(&first_val) = first_arg.first()
|
||||
&& let Some(ConstLattice::Str(module_name)) = const_values.get(&first_val)
|
||||
&& KNOWN_MODULES.contains(&module_name.as_str())
|
||||
{
|
||||
aliases
|
||||
.entry(inst.value)
|
||||
.or_default()
|
||||
.push(module_name.clone());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if aliases.is_empty() {
|
||||
return aliases;
|
||||
}
|
||||
|
||||
// Pass 2: propagate through copies (single-operand Assign) and phi nodes.
|
||||
let mut changed = true;
|
||||
let mut iterations = 0;
|
||||
while changed && iterations < 10 {
|
||||
changed = false;
|
||||
iterations += 1;
|
||||
for block in &body.blocks {
|
||||
// Phi nodes
|
||||
for phi in &block.phis {
|
||||
if let SsaOp::Phi(operands) = &phi.op {
|
||||
let mut merged: SmallVec<[String; 2]> = SmallVec::new();
|
||||
for (_, operand_val) in operands {
|
||||
if let Some(operand_aliases) = aliases.get(operand_val) {
|
||||
for a in operand_aliases {
|
||||
if !merged.contains(a) {
|
||||
merged.push(a.clone());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
if !merged.is_empty() {
|
||||
let entry = aliases.entry(phi.value).or_default();
|
||||
for a in &merged {
|
||||
if !entry.contains(a) {
|
||||
entry.push(a.clone());
|
||||
changed = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
// Copy propagation through single-operand Assign
|
||||
for inst in &block.body {
|
||||
if let SsaOp::Assign(uses) = &inst.op
|
||||
&& uses.len() == 1
|
||||
&& let Some(src_aliases) = aliases.get(&uses[0]).cloned()
|
||||
{
|
||||
let entry = aliases.entry(inst.value).or_default();
|
||||
for a in &src_aliases {
|
||||
if !entry.contains(a) {
|
||||
entry.push(a.clone());
|
||||
changed = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
aliases
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use petgraph::graph::NodeIndex;
|
||||
use smallvec::SmallVec;
|
||||
|
||||
fn make_body(blocks: Vec<SsaBlock>, value_defs: Vec<ValueDef>) -> SsaBody {
|
||||
let cfg_node_map = value_defs
|
||||
.iter()
|
||||
.enumerate()
|
||||
.map(|(i, vd)| (vd.cfg_node, SsaValue(i as u32)))
|
||||
.collect();
|
||||
SsaBody {
|
||||
blocks,
|
||||
entry: BlockId(0),
|
||||
value_defs,
|
||||
cfg_node_map,
|
||||
exception_edges: Vec::new(),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn const_literal_parsed() {
|
||||
assert_eq!(ConstLattice::parse("42"), ConstLattice::Int(42));
|
||||
assert_eq!(ConstLattice::parse("-1"), ConstLattice::Int(-1));
|
||||
assert_eq!(ConstLattice::parse("true"), ConstLattice::Bool(true));
|
||||
assert_eq!(ConstLattice::parse("false"), ConstLattice::Bool(false));
|
||||
assert_eq!(ConstLattice::parse("null"), ConstLattice::Null);
|
||||
assert_eq!(ConstLattice::parse("nil"), ConstLattice::Null);
|
||||
assert_eq!(
|
||||
ConstLattice::parse("\"hello\""),
|
||||
ConstLattice::Str("hello".into())
|
||||
);
|
||||
assert_eq!(
|
||||
ConstLattice::parse("'world'"),
|
||||
ConstLattice::Str("world".into())
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn meet_lattice() {
|
||||
let a = ConstLattice::Int(42);
|
||||
let b = ConstLattice::Int(42);
|
||||
assert_eq!(a.meet(&b), ConstLattice::Int(42));
|
||||
|
||||
let c = ConstLattice::Int(99);
|
||||
assert_eq!(a.meet(&c), ConstLattice::Varying);
|
||||
|
||||
assert_eq!(ConstLattice::Top.meet(&a), ConstLattice::Int(42));
|
||||
assert_eq!(a.meet(&ConstLattice::Top), ConstLattice::Int(42));
|
||||
|
||||
assert_eq!(ConstLattice::Varying.meet(&a), ConstLattice::Varying);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn single_block_const() {
|
||||
// v0 = const("42")
|
||||
let n0 = NodeIndex::new(0);
|
||||
let block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("42".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("x".into()),
|
||||
span: (0, 2),
|
||||
}],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
};
|
||||
let body = make_body(
|
||||
vec![block],
|
||||
vec![ValueDef {
|
||||
var_name: Some("x".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
}],
|
||||
);
|
||||
|
||||
let result = const_propagate(&body);
|
||||
assert_eq!(
|
||||
result.values.get(&SsaValue(0)),
|
||||
Some(&ConstLattice::Int(42))
|
||||
);
|
||||
assert!(result.unreachable_blocks.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn copy_propagation_through_assign() {
|
||||
// v0 = const("true"), v1 = assign(v0)
|
||||
let n0 = NodeIndex::new(0);
|
||||
let n1 = NodeIndex::new(1);
|
||||
let block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("true".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("x".into()),
|
||||
span: (0, 4),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
|
||||
cfg_node: n1,
|
||||
var_name: Some("y".into()),
|
||||
span: (5, 9),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
};
|
||||
let body = make_body(
|
||||
vec![block],
|
||||
vec![
|
||||
ValueDef {
|
||||
var_name: Some("x".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("y".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
);
|
||||
|
||||
let result = const_propagate(&body);
|
||||
assert_eq!(
|
||||
result.values.get(&SsaValue(0)),
|
||||
Some(&ConstLattice::Bool(true))
|
||||
);
|
||||
assert_eq!(
|
||||
result.values.get(&SsaValue(1)),
|
||||
Some(&ConstLattice::Bool(true))
|
||||
);
|
||||
}
|
||||
}
|
||||
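The `meet_lattice` test above pins down how the meet operator is expected to behave. The real `ConstLattice` definition is not shown in this part of the diff, so the `Lat` enum and its `meet` method below are hypothetical stand-ins. As a minimal, self-contained sketch, a meet with the same observable properties could look like this:

```rust
/// Hypothetical stand-in for a constant-propagation lattice.
/// `Top` means "no information yet"; `Varying` means "not a single constant".
#[derive(Debug, Clone, PartialEq, Eq)]
enum Lat {
    Top,
    Int(i64),
    Bool(bool),
    Varying,
}

impl Lat {
    /// Combine the facts arriving from two control-flow paths.
    fn meet(&self, other: &Lat) -> Lat {
        match (self, other) {
            // Top is the identity element: the other side wins.
            (Lat::Top, x) | (x, Lat::Top) => x.clone(),
            // Varying absorbs everything.
            (Lat::Varying, _) | (_, Lat::Varying) => Lat::Varying,
            // Equal constants stay constant.
            (a, b) if a == b => a.clone(),
            // Conflicting constants collapse to Varying.
            _ => Lat::Varying,
        }
    }
}
```

Calling `Lat::Int(42).meet(&Lat::Int(99))` yields `Lat::Varying`, while `Lat::Top.meet(&Lat::Int(42))` keeps `Lat::Int(42)`, matching the assertions in the test.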
228
src/ssa/copy_prop.rs
Normal file
|
|
@ -0,0 +1,228 @@
|
|||
#![allow(clippy::collapsible_if)]
|
||||
|
||||
use std::collections::HashMap;
|
||||
|
||||
use super::ir::*;
|
||||
use crate::cfg::Cfg;
|
||||
|
||||
/// Run copy propagation on an SSA body.
///
/// Identifies single-operand `Assign` instructions whose CFG node carries no
/// labels (i.e., no semantic significance such as sanitizer/source/sink), is
/// not a numeric-length access, and has no `string_prefix`, then rewrites
/// uses of the destination value to use the source value directly.
///
/// Returns `(copies_eliminated, resolved_replacement_map)`. The replacement map
/// maps each eliminated destination `SsaValue` to its transitive root source
/// `SsaValue`; alias analysis uses it downstream to recover base-variable
/// aliasing relationships.
|
||||
pub fn copy_propagate(body: &mut SsaBody, cfg: &Cfg) -> (usize, HashMap<SsaValue, SsaValue>) {
|
||||
// 1. Identify copies: Assign with single operand and no labels on CFG node
|
||||
let mut replace_map: HashMap<SsaValue, SsaValue> = HashMap::new();
|
||||
|
||||
for block in &body.blocks {
|
||||
for inst in &block.body {
|
||||
if let SsaOp::Assign(uses) = &inst.op {
|
||||
if uses.len() == 1 {
|
||||
let src = uses[0];
|
||||
let info = &cfg[inst.cfg_node];
|
||||
// Skip if the node has labels — sanitizers, sources, sinks
|
||||
// have semantic meaning that must be preserved.
|
||||
if !info.taint.labels.is_empty() {
|
||||
continue;
|
||||
}
|
||||
// Skip numeric-length reads (`arr.length`, `map.size`, etc.):
|
||||
// the destination is Int-typed (a derived property of the
|
||||
// source) while the source is typically String/Object/
|
||||
// Unknown. Copy-propagating through this Assign would
|
||||
// erase the Int type fact and defeat HTML_ESCAPE / SQL /
|
||||
// FILE_IO / SHELL sink suppression.
|
||||
if info.is_numeric_length_access {
|
||||
continue;
|
||||
}
|
||||
// Skip Assigns whose CFG node carries a `string_prefix`
|
||||
// (template literals or `"lit" + var` RHS recognised by
|
||||
// `extract_template_prefix`). The abstract-interpretation
|
||||
// `transfer_abstract` consumes that prefix to seed a
|
||||
// StringFact on the Assign's SSA value, which downstream
|
||||
// SSRF suppression reads. Propagating past this Assign
|
||||
// erases the prefix-bearing SSA value: the Call's args get
|
||||
// rewritten to the bare upstream variable (no prefix), and
|
||||
// `is_call_abstract_safe` falls through to a tainted-flow
|
||||
// emission even on safe fixed-host URLs.
|
||||
if info.string_prefix.is_some() {
|
||||
continue;
|
||||
}
|
||||
replace_map.insert(inst.value, src);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if replace_map.is_empty() {
|
||||
return (0, HashMap::new());
|
||||
}
|
||||
|
||||
// 2. Build transitive replacement map: chase chains (SSA is acyclic)
|
||||
let mut resolved: HashMap<SsaValue, SsaValue> = HashMap::new();
|
||||
for &dst in replace_map.keys() {
|
||||
let root = resolve_root(dst, &replace_map);
|
||||
resolved.insert(dst, root);
|
||||
}
|
||||
|
||||
// 3. Rewrite all uses
|
||||
let mut count = 0;
|
||||
for block in &mut body.blocks {
|
||||
// Rewrite phi operands
|
||||
for phi in &mut block.phis {
|
||||
if let SsaOp::Phi(operands) = &mut phi.op {
|
||||
for (_bid, val) in operands.iter_mut() {
|
||||
if let Some(&root) = resolved.get(val) {
|
||||
*val = root;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Rewrite body instructions
|
||||
for inst in &mut block.body {
|
||||
match &mut inst.op {
|
||||
SsaOp::Assign(uses) => {
|
||||
for val in uses.iter_mut() {
|
||||
if let Some(&root) = resolved.get(val) {
|
||||
*val = root;
|
||||
}
|
||||
}
|
||||
}
|
||||
SsaOp::Call { args, receiver, .. } => {
|
||||
if let Some(rv) = receiver {
|
||||
if let Some(&root) = resolved.get(rv) {
|
||||
*rv = root;
|
||||
}
|
||||
}
|
||||
for arg in args.iter_mut() {
|
||||
for val in arg.iter_mut() {
|
||||
if let Some(&root) = resolved.get(val) {
|
||||
*val = root;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// 4. Convert copy instructions to Nop (DCE will clean up)
|
||||
for block in &mut body.blocks {
|
||||
for inst in &mut block.body {
|
||||
if resolved.contains_key(&inst.value) {
|
||||
inst.op = SsaOp::Nop;
|
||||
count += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
(count, resolved)
|
||||
}
|
||||
|
||||
/// Chase the replacement chain to find the root value.
|
||||
fn resolve_root(val: SsaValue, map: &HashMap<SsaValue, SsaValue>) -> SsaValue {
|
||||
let mut current = val;
|
||||
// Safety: SSA is acyclic, but cap iterations to be safe
|
||||
for _ in 0..1000 {
|
||||
match map.get(&current) {
|
||||
Some(&next) if next != current => current = next,
|
||||
_ => break,
|
||||
}
|
||||
}
|
||||
current
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::cfg::{NodeInfo, StmtKind};
|
||||
use petgraph::Graph;
|
||||
use smallvec::SmallVec;
|
||||
|
||||
fn make_cfg_node(kind: StmtKind) -> NodeInfo {
|
||||
NodeInfo {
|
||||
kind,
|
||||
..Default::default()
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn simple_copy_eliminated() {
|
||||
// v0 = const("42"), v1 = assign(v0), v2 = assign(v1)
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("42".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("x".into()),
|
||||
span: (0, 2),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
|
||||
cfg_node: n1,
|
||||
var_name: Some("y".into()),
|
||||
span: (3, 5),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(2),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(1), 1)),
|
||||
cfg_node: n2,
|
||||
var_name: Some("z".into()),
|
||||
span: (6, 8),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("x".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("y".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("z".into()),
|
||||
cfg_node: n2,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
|
||||
.into_iter()
|
||||
.collect(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
|
||||
let (eliminated, copy_map) = copy_propagate(&mut body, &cfg);
|
||||
assert_eq!(eliminated, 2);
|
||||
// Both v1 and v2 should map to v0 (the root)
|
||||
assert_eq!(copy_map.get(&SsaValue(1)), Some(&SsaValue(0)));
|
||||
assert_eq!(copy_map.get(&SsaValue(2)), Some(&SsaValue(0)));
|
||||
|
||||
// v1 and v2 should be Nop now
|
||||
assert!(matches!(body.blocks[0].body[1].op, SsaOp::Nop));
|
||||
assert!(matches!(body.blocks[0].body[2].op, SsaOp::Nop));
|
||||
}
|
||||
}
|
||||
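Step 2 of `copy_propagate` flattens copy chains so that every eliminated value maps straight to its root. The sketch below illustrates that idea in isolation, assuming plain `u32` ids in place of `SsaValue`; `resolve_all` is a hypothetical helper name, not part of the code above.

```rust
use std::collections::HashMap;

/// Follow `dst -> src` links until reaching a value that is not itself a copy.
/// The iteration cap mirrors the defensive bound used in the code above.
fn resolve_root(val: u32, map: &HashMap<u32, u32>) -> u32 {
    let mut current = val;
    for _ in 0..1000 {
        match map.get(&current) {
            Some(&next) if next != current => current = next,
            _ => break,
        }
    }
    current
}

/// Hypothetical helper: flatten every chain so each copy maps to its root.
fn resolve_all(map: &HashMap<u32, u32>) -> HashMap<u32, u32> {
    map.keys().map(|&dst| (dst, resolve_root(dst, map))).collect()
}
```

With the direct map `{1 -> 0, 2 -> 1}`, `resolve_all` returns `{1 -> 0, 2 -> 0}`, which is the shape the `simple_copy_eliminated` test checks for.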
449
src/ssa/dce.rs
Normal file
|
|
@ -0,0 +1,449 @@
|
|||
use std::collections::HashMap;
|
||||
|
||||
use super::ir::*;
|
||||
use crate::cfg::Cfg;
|
||||
use crate::labels::DataLabel;
|
||||
|
||||
/// Eliminate dead definitions from an SSA body.
///
/// A definition is dead if its `SsaValue` has zero uses across the entire body
/// (instructions and block terminators), except for instructions that must be
/// preserved:
/// - `Source` (taint origin, must survive for correctness)
/// - `Call` (may have side effects)
/// - `CatchParam` (exception binding)
/// - Instructions whose CFG node has Sink, Source, or Sanitizer labels
///   (taint analysis relies on them)
///
/// Returns the number of instructions removed.
|
||||
pub fn eliminate_dead_defs(body: &mut SsaBody, cfg: &Cfg) -> usize {
|
||||
let mut total_removed = 0;
|
||||
|
||||
// Iterate until no more removals (removing a def may make its operands dead)
|
||||
loop {
|
||||
let use_counts = build_use_counts(body);
|
||||
let mut removed_this_pass = 0;
|
||||
|
||||
for block in &mut body.blocks {
|
||||
// Remove dead body instructions
|
||||
let before = block.body.len();
|
||||
block.body.retain(|inst| !is_dead(inst, &use_counts, cfg));
|
||||
removed_this_pass += before - block.body.len();
|
||||
|
||||
// Remove dead phi instructions
|
||||
let before_phis = block.phis.len();
|
||||
block.phis.retain(|inst| !is_dead(inst, &use_counts, cfg));
|
||||
removed_this_pass += before_phis - block.phis.len();
|
||||
}
|
||||
|
||||
total_removed += removed_this_pass;
|
||||
if removed_this_pass == 0 {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
total_removed
|
||||
}
|
||||
|
||||
/// Build a map of SsaValue → number of uses across all instructions and
|
||||
/// block terminators.
|
||||
///
|
||||
/// Terminator uses must be counted: `Terminator::Return(rv)` references the
|
||||
/// returned value and `Terminator::Branch { condition, .. }` references the
|
||||
/// condition variable. Without counting these, a value used solely by a
|
||||
/// terminator (the canonical case for short helpers like
|
||||
/// `def f(s): return s`) is judged dead, and DCE strips every instruction
|
||||
/// in the body — leaving empty blocks whose terminators reference
|
||||
/// nonexistent SsaValues, breaking downstream analyses (per-return-path
|
||||
/// PathFact narrowing, inline-summary extraction, etc.).
|
||||
fn build_use_counts(body: &SsaBody) -> HashMap<SsaValue, usize> {
|
||||
let mut counts: HashMap<SsaValue, usize> = HashMap::new();
|
||||
|
||||
for block in &body.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
for v in inst_used_values(inst) {
|
||||
*counts.entry(v).or_insert(0) += 1;
|
||||
}
|
||||
}
|
||||
for v in terminator_used_values(&block.terminator) {
|
||||
*counts.entry(v).or_insert(0) += 1;
|
||||
}
|
||||
}
|
||||
|
||||
counts
|
||||
}
|
||||
|
||||
/// Get all SSA values used by a block terminator.
|
||||
fn terminator_used_values(term: &Terminator) -> Vec<SsaValue> {
|
||||
use crate::constraint::lower::{ConditionExpr, Operand};
|
||||
match term {
|
||||
Terminator::Return(Some(rv)) => vec![*rv],
|
||||
Terminator::Return(None) => Vec::new(),
|
||||
Terminator::Branch { condition, .. } => match condition.as_deref() {
|
||||
Some(ConditionExpr::BoolTest { var }) => vec![*var],
|
||||
Some(ConditionExpr::NullCheck { var, .. }) => vec![*var],
|
||||
Some(ConditionExpr::TypeCheck { var, .. }) => vec![*var],
|
||||
Some(ConditionExpr::Comparison { lhs, rhs, .. }) => {
|
||||
let mut out = Vec::new();
|
||||
if let Operand::Value(v) = lhs {
|
||||
out.push(*v);
|
||||
}
|
||||
if let Operand::Value(v) = rhs {
|
||||
out.push(*v);
|
||||
}
|
||||
out
|
||||
}
|
||||
Some(ConditionExpr::Unknown) | None => Vec::new(),
|
||||
},
|
||||
Terminator::Switch { scrutinee, .. } => vec![*scrutinee],
|
||||
Terminator::Goto(_) | Terminator::Unreachable => Vec::new(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Check if an instruction is dead and safe to remove.
|
||||
fn is_dead(inst: &SsaInst, use_counts: &HashMap<SsaValue, usize>, cfg: &Cfg) -> bool {
|
||||
let uses = use_counts.get(&inst.value).copied().unwrap_or(0);
|
||||
if uses > 0 {
|
||||
return false;
|
||||
}
|
||||
|
||||
// Never remove side-effectful or semantically required instructions
|
||||
match &inst.op {
|
||||
SsaOp::Source => return false,
|
||||
SsaOp::Call { .. } => return false,
|
||||
SsaOp::CatchParam => return false,
|
||||
_ => {}
|
||||
}
|
||||
|
||||
// Never remove instructions whose CFG node has Sink, Source, or Sanitizer labels
|
||||
if cfg.node_weight(inst.cfg_node).is_some_and(|info| {
|
||||
info.taint.labels.iter().any(|l| {
|
||||
matches!(
|
||||
l,
|
||||
DataLabel::Sink(_) | DataLabel::Source(_) | DataLabel::Sanitizer(_)
|
||||
)
|
||||
})
|
||||
}) {
|
||||
return false;
|
||||
}
|
||||
|
||||
true
|
||||
}
|
||||
|
||||
/// Get all SSA values used by an instruction.
|
||||
fn inst_used_values(inst: &SsaInst) -> Vec<SsaValue> {
|
||||
match &inst.op {
|
||||
SsaOp::Phi(operands) => operands.iter().map(|(_, v)| *v).collect(),
|
||||
SsaOp::Assign(uses) => uses.to_vec(),
|
||||
SsaOp::Call { args, receiver, .. } => {
|
||||
let mut vals = Vec::new();
|
||||
if let Some(rv) = receiver {
|
||||
vals.push(*rv);
|
||||
}
|
||||
for arg in args {
|
||||
vals.extend(arg.iter());
|
||||
}
|
||||
vals
|
||||
}
|
||||
SsaOp::Source
|
||||
| SsaOp::Const(_)
|
||||
| SsaOp::Param { .. }
|
||||
| SsaOp::SelfParam
|
||||
| SsaOp::CatchParam
|
||||
| SsaOp::Nop
|
||||
| SsaOp::Undef => Vec::new(),
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::cfg::{NodeInfo, StmtKind};
|
||||
use petgraph::Graph;
|
||||
use smallvec::SmallVec;
|
||||
|
||||
fn make_cfg_node(kind: StmtKind) -> NodeInfo {
|
||||
NodeInfo {
|
||||
kind,
|
||||
..Default::default()
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn dead_const_removed() {
|
||||
// v0 = const("42") — unused, should be removed
|
||||
// v1 = source() — must survive even if unused
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("42".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("x".into()),
|
||||
span: (0, 2),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Source,
|
||||
cfg_node: n1,
|
||||
var_name: Some("tainted".into()),
|
||||
span: (3, 10),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("x".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("tainted".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
assert_eq!(removed, 1);
|
||||
assert_eq!(body.blocks[0].body.len(), 1);
|
||||
// Source survives
|
||||
assert!(matches!(body.blocks[0].body[0].op, SsaOp::Source));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn dead_sanitizer_label_preserved() {
|
||||
// v0 has a Sanitizer label on its CFG node — must survive even if unused
|
||||
use crate::labels::{Cap, DataLabel};
|
||||
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(NodeInfo {
|
||||
taint: crate::cfg::TaintMeta {
|
||||
labels: smallvec::smallvec![DataLabel::Sanitizer(Cap::HTML_ESCAPE)],
|
||||
..Default::default()
|
||||
},
|
||||
..make_cfg_node(StmtKind::Seq)
|
||||
});
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Assign(SmallVec::new()),
|
||||
cfg_node: n0,
|
||||
var_name: Some("sanitized".into()),
|
||||
span: (0, 5),
|
||||
}],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![ValueDef {
|
||||
var_name: Some("sanitized".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
assert_eq!(
|
||||
removed, 0,
|
||||
"Sanitizer-labeled instruction must not be removed"
|
||||
);
|
||||
assert_eq!(body.blocks[0].body.len(), 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn dead_source_label_preserved() {
|
||||
// v0 has a Source label on its CFG node — must survive even if unused
|
||||
use crate::labels::{Cap, DataLabel};
|
||||
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(NodeInfo {
|
||||
taint: crate::cfg::TaintMeta {
|
||||
labels: smallvec::smallvec![DataLabel::Source(Cap::all())],
|
||||
..Default::default()
|
||||
},
|
||||
..make_cfg_node(StmtKind::Seq)
|
||||
});
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Assign(SmallVec::new()),
|
||||
cfg_node: n0,
|
||||
var_name: Some("src".into()),
|
||||
span: (0, 3),
|
||||
}],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![ValueDef {
|
||||
var_name: Some("src".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
assert_eq!(removed, 0, "Source-labeled instruction must not be removed");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn dead_sink_label_still_preserved() {
|
||||
// Regression: Sink-labeled dead instructions must still be kept
|
||||
use crate::labels::{Cap, DataLabel};
|
||||
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(NodeInfo {
|
||||
taint: crate::cfg::TaintMeta {
|
||||
labels: smallvec::smallvec![DataLabel::Sink(Cap::SQL_QUERY)],
|
||||
..Default::default()
|
||||
},
|
||||
..make_cfg_node(StmtKind::Seq)
|
||||
});
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Assign(SmallVec::new()),
|
||||
cfg_node: n0,
|
||||
var_name: Some("q".into()),
|
||||
span: (0, 2),
|
||||
}],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![ValueDef {
|
||||
var_name: Some("q".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
assert_eq!(removed, 0, "Sink-labeled instruction must not be removed");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn dead_unlabeled_assign_still_removed() {
|
||||
// Negative test: unlabeled dead assignments must still be eliminated
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Assign(SmallVec::new()),
|
||||
cfg_node: n0,
|
||||
var_name: Some("dead".into()),
|
||||
span: (0, 4),
|
||||
}],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![ValueDef {
|
||||
var_name: Some("dead".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
}],
|
||||
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
assert_eq!(removed, 1, "unlabeled dead assignment must be removed");
|
||||
assert!(body.blocks[0].body.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn used_def_preserved() {
|
||||
// v0 = const("42"), v1 = assign(v0): v1 is unused, so both are eventually removed
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
|
||||
|
||||
let mut body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(Some("42".into())),
|
||||
cfg_node: n0,
|
||||
var_name: Some("x".into()),
|
||||
span: (0, 2),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(1),
|
||||
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
|
||||
cfg_node: n1,
|
||||
var_name: Some("y".into()),
|
||||
span: (3, 5),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: SmallVec::new(),
|
||||
succs: SmallVec::new(),
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![
|
||||
ValueDef {
|
||||
var_name: Some("x".into()),
|
||||
cfg_node: n0,
|
||||
block: BlockId(0),
|
||||
},
|
||||
ValueDef {
|
||||
var_name: Some("y".into()),
|
||||
cfg_node: n1,
|
||||
block: BlockId(0),
|
||||
},
|
||||
],
|
||||
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
|
||||
let removed = eliminate_dead_defs(&mut body, &cfg);
|
||||
// v1 is dead (unused) and is removed on the first pass; v0, used only by v1,
// then becomes dead and is removed on the second pass.
|
||||
assert_eq!(removed, 2);
|
||||
assert_eq!(body.blocks[0].body.len(), 0);
|
||||
}
|
||||
}
|
||||
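The fixed-point loop in `eliminate_dead_defs` can be illustrated on a toy IR. This is a minimal sketch under stated assumptions: `Inst`, its `pinned` flag (standing in for calls, sources, and labeled nodes), and `eliminate_dead` are hypothetical names, and a single optional `returned` value stands in for terminator operands.

```rust
use std::collections::HashMap;

/// Toy instruction: defines `def` and reads `uses`. `pinned` marks
/// instructions that must survive even when their value is unused.
struct Inst {
    def: u32,
    uses: Vec<u32>,
    pinned: bool,
}

/// Remove unused, unpinned definitions until a fixed point is reached.
/// `returned` models a terminator operand, which counts as a use.
fn eliminate_dead(insts: &mut Vec<Inst>, returned: Option<u32>) -> usize {
    let mut total = 0;
    loop {
        // Recount uses on every pass: removing an instruction can
        // drop the last use of one of its operands.
        let mut counts: HashMap<u32, usize> = HashMap::new();
        for v in insts.iter().flat_map(|i| i.uses.iter()).chain(returned.iter()) {
            *counts.entry(*v).or_insert(0) += 1;
        }
        let before = insts.len();
        insts.retain(|i| i.pinned || counts.get(&i.def).copied().unwrap_or(0) > 0);
        let removed = before - insts.len();
        if removed == 0 {
            return total;
        }
        total += removed;
    }
}
```

Recounting on each pass is the simple choice here; a worklist that decrements counts incrementally would avoid the rescans, at the cost of more bookkeeping.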
147
src/ssa/display.rs
Normal file
|
|
@ -0,0 +1,147 @@
|
|||
use std::fmt;
|
||||
|
||||
use super::ir::*;
|
||||
|
||||
impl fmt::Display for SsaBody {
|
||||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
for block in &self.blocks {
|
||||
let entry_marker = if block.id == self.entry {
|
||||
" (entry)"
|
||||
} else {
|
||||
""
|
||||
};
|
||||
writeln!(f, "Block B{}{entry_marker}:", block.id.0)?;
|
||||
|
||||
// Predecessors
|
||||
if !block.preds.is_empty() {
|
||||
let preds: Vec<String> = block.preds.iter().map(|p| format!("B{}", p.0)).collect();
|
||||
writeln!(f, " ; preds: {}", preds.join(", "))?;
|
||||
}
|
||||
|
||||
// Phi instructions
|
||||
for inst in &block.phis {
|
||||
write!(f, " v{} = ", inst.value.0)?;
|
||||
if let SsaOp::Phi(ref operands) = inst.op {
|
||||
let ops: Vec<String> = operands
|
||||
.iter()
|
||||
.map(|(bid, val)| format!("B{}:v{}", bid.0, val.0))
|
||||
.collect();
|
||||
write!(f, "phi({})", ops.join(", "))?;
|
||||
}
|
||||
if let Some(ref name) = inst.var_name {
|
||||
write!(f, " # {name}")?;
|
||||
}
|
||||
writeln!(f)?;
|
||||
}
|
||||
|
||||
// Body instructions
|
||||
for inst in &block.body {
|
||||
write!(f, " v{} = ", inst.value.0)?;
|
||||
match &inst.op {
|
||||
SsaOp::Phi(_) => write!(f, "phi(???)")?, // shouldn't appear in body
|
||||
SsaOp::Assign(uses) => {
|
||||
let uses_str: Vec<String> =
|
||||
uses.iter().map(|v| format!("v{}", v.0)).collect();
|
||||
write!(f, "assign({})", uses_str.join(", "))?;
|
||||
}
|
||||
SsaOp::Call {
|
||||
callee,
|
||||
args,
|
||||
receiver,
|
||||
} => {
|
||||
if let Some(rv) = receiver {
|
||||
write!(f, "v{}.{callee}(", rv.0)?;
|
||||
} else {
|
||||
write!(f, "{callee}(")?;
|
||||
}
|
||||
let arg_strs: Vec<String> = args
|
||||
.iter()
|
||||
.map(|arg| {
|
||||
let vs: Vec<String> =
|
||||
arg.iter().map(|v| format!("v{}", v.0)).collect();
|
||||
vs.join("+")
|
||||
})
|
||||
.collect();
|
||||
write!(f, "{})", arg_strs.join(", "))?;
|
||||
}
|
||||
SsaOp::Source => write!(f, "source()")?,
|
||||
SsaOp::Const(val) => {
|
||||
if let Some(v) = val {
|
||||
write!(f, "const({v})")?;
|
||||
} else {
|
||||
write!(f, "const")?;
|
||||
}
|
||||
}
|
||||
SsaOp::Param { index } => write!(f, "param({index})")?,
|
||||
SsaOp::SelfParam => write!(f, "self_param()")?,
|
||||
SsaOp::CatchParam => write!(f, "catch_param()")?,
|
||||
SsaOp::Nop => write!(f, "nop")?,
|
||||
SsaOp::Undef => write!(f, "undef")?,
|
||||
}
|
||||
if let Some(ref name) = inst.var_name {
|
||||
write!(f, " # {name}")?;
|
||||
}
|
||||
// Span info
|
||||
if inst.span != (0, 0) {
|
||||
write!(f, " @ {}..{}", inst.span.0, inst.span.1)?;
|
||||
}
|
||||
writeln!(f)?;
|
||||
}
|
||||
|
||||
// Terminator
|
||||
match &block.terminator {
|
||||
Terminator::Goto(target) => writeln!(f, " goto → B{}", target.0)?,
|
||||
Terminator::Branch {
|
||||
true_blk,
|
||||
false_blk,
|
||||
..
|
||||
} => writeln!(
|
||||
f,
|
||||
" branch → B{} (true), B{} (false)",
|
||||
true_blk.0, false_blk.0
|
||||
)?,
|
||||
Terminator::Switch {
|
||||
scrutinee,
|
||||
targets,
|
||||
default,
|
||||
case_values,
|
||||
} => {
|
||||
write!(f, " switch v{} → [", scrutinee.0)?;
|
||||
for (i, t) in targets.iter().enumerate() {
|
||||
if i > 0 {
|
||||
write!(f, ", ")?;
|
||||
}
|
||||
match case_values.get(i).and_then(|cv| cv.as_ref()) {
|
||||
Some(lit) => write!(f, "{:?}=B{}", lit, t.0)?,
|
||||
None => write!(f, "B{}", t.0)?,
|
||||
}
|
||||
}
|
||||
writeln!(f, "] default B{}", default.0)?;
|
||||
}
|
||||
Terminator::Return(ret_val) => {
|
||||
if let Some(v) = ret_val {
|
||||
writeln!(f, " return v{}", v.0)?
|
||||
} else {
|
||||
writeln!(f, " return")?
|
||||
}
|
||||
}
|
||||
Terminator::Unreachable => writeln!(f, " unreachable")?,
|
||||
}
|
||||
|
||||
writeln!(f)?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
impl fmt::Display for SsaValue {
|
||||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
write!(f, "v{}", self.0)
|
||||
}
|
||||
}
|
||||
|
||||
impl fmt::Display for BlockId {
|
||||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
write!(f, "B{}", self.0)
|
||||
}
|
||||
}
|
||||
1350
src/ssa/heap.rs
Normal file
File diff suppressed because it is too large
915
src/ssa/invariants.rs
Normal file
|
|
@ -0,0 +1,915 @@
|
|||
//! Structural invariant checks for SSA bodies.
//!
//! In addition to the `Vec<String>` aggregation used by
//! [`check_structural_invariants`], targeted checks that SSA *lowering* may
//! want to query directly (e.g. to decide whether to panic in debug builds
//! or to warn and attach an engine note in release builds) return a
//! [`Result<(), InvariantError>`] for a more ergonomic API.
//!
//! These checks verify that [`SsaBody`] instances are well-formed: single-
//! assignment holds, pred/succ edges are mutually consistent, phi operands
//! reference actual predecessors, terminators agree with the successor
//! list, and every `SsaValue` is backed by a matching `ValueDef`.
//!
//! The module is intentionally separate from the lowering code so the same
//! invariants can be exercised from tests that do not have access to the
//! private scaffolding inside [`crate::ssa::lower`]. The aggregate check
//! returns a `Vec<String>` of violation messages rather than panicking, so
//! tests can collect violations across an entire corpus before failing.
|
||||
//!
|
||||
//! Invariants are split into two groups:
|
||||
//!
|
||||
//! **Group A — SSA integrity (must hold unconditionally):**
|
||||
//!
|
||||
//! 1. `BlockId` indexing — `blocks[i].id == BlockId(i)`
|
||||
//! 2. Entry block has no predecessors
|
||||
//! 3. Pred/succ symmetry — `B.succs.contains(S)` ⇔ `S.preds.contains(B)`
|
||||
//! 4. Phi placement — every phi appears in `block.phis` (never in body)
|
||||
//! 5. Phi operand arity — ≤ `block.preds.len()`
|
||||
//! 6. Phi operand sources — every `(pred_bid, _)` operand has
|
||||
//! `block.preds.contains(pred_bid)`
|
||||
//! 7. Unique SSA definitions — every `SsaValue` is defined at most once
|
||||
//! across all phi + body instructions
|
||||
//! 8. `value_defs` coverage — every defined `SsaValue.0` is a valid index
|
||||
//! into `value_defs`, and `value_defs[v.0].block` matches the block
|
||||
//! containing the defining instruction
|
||||
//! 9. `cfg_node_map` consistency — every `(node, SsaValue)` pair points
|
||||
//! to an instruction whose `cfg_node == node`
|
||||
//!
|
||||
//! **Group B — terminator and reachability (loose, reflecting lowering):**
|
||||
//!
|
||||
//! 10. Terminator/succs agreement *subset* form:
|
||||
//! * `Goto(t)` → `succs.contains(t)` — extras tolerated
|
||||
//! (3-successor collapse fallback)
|
||||
//! * `Branch{t, f, …}` → `succs` contains both `t` and `f`
|
||||
//! * `Return`/`Unreachable` → no constraint on `succs` (CFG may carry
|
||||
//! finally/cleanup continuation edges that downstream analysis
|
||||
//! propagates through)
|
||||
//! 11. Reachability from entry — tolerated exceptions:
|
||||
//! * blocks that appear as the `catch` side of an exception edge
|
||||
//!
|
||||
//! Group B is deliberately permissive: the SSA body's `succs` field is the
|
||||
//! authoritative successor set for analysis (taint, abstract interp,
|
||||
//! symbolic execution all enumerate `block.succs`), while the terminator
|
||||
//! is a structured summary that may simplify or drop CFG-level info.
|
||||
//! Regression value comes from catching *new* deviations from these
|
||||
//! already-understood patterns, not from enforcing a textbook SSA shape
|
||||
//! the lowering never promised.
|
||||
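Invariant 3 from the list above (pred/succ symmetry) is the easiest one to show in isolation. The following is a minimal sketch, assuming a hypothetical `Block` type that carries only edge lists indexed by position; it is not the module's own `check_pred_succ_symmetry`, whose body is not shown here.

```rust
/// Hypothetical minimal block: only the edge lists matter here.
struct Block {
    preds: Vec<usize>,
    succs: Vec<usize>,
}

/// Report every edge recorded on one side but missing on the other,
/// plus edge endpoints that do not index into `blocks`.
fn check_symmetry(blocks: &[Block]) -> Vec<String> {
    let mut errors = Vec::new();
    for (b, block) in blocks.iter().enumerate() {
        for &s in &block.succs {
            match blocks.get(s) {
                Some(succ) if succ.preds.contains(&b) => {}
                Some(_) => errors.push(format!("B{b} -> B{s}: B{s}.preds lacks B{b}")),
                None => errors.push(format!("B{b} -> B{s}: no such block")),
            }
        }
        for &p in &block.preds {
            match blocks.get(p) {
                Some(pred) if pred.succs.contains(&b) => {}
                Some(_) => errors.push(format!("B{p} -> B{b}: B{p}.succs lacks B{b}")),
                None => errors.push(format!("B{p} -> B{b}: no such block")),
            }
        }
    }
    errors
}
```

Returning messages instead of panicking follows the same design as the module: a caller can run the check over many bodies and report every violation at once.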
|
||||
use super::ir::*;
|
||||
|
||||
/// Errors returned by targeted invariant checks.
|
||||
///
|
||||
/// Wraps a list of human-readable violation messages — one per offending
|
||||
/// block — so callers can include every failure in a single panic /
|
||||
/// warning.
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub struct InvariantError {
|
||||
pub messages: Vec<String>,
|
||||
}
|
||||
|
||||
impl InvariantError {
|
||||
/// Join every message onto its own line.
|
||||
pub fn joined(&self) -> String {
|
||||
self.messages.join("\n")
|
||||
}
|
||||
}
|
||||
|
||||
impl std::fmt::Display for InvariantError {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
write!(f, "{}", self.joined())
|
||||
}
|
||||
}
|
||||
|
||||
impl std::error::Error for InvariantError {}
|
||||
|
||||
/// Aggregate invariant violations found in a single body. An empty
|
||||
/// vector means the body is structurally well-formed.
|
||||
pub fn check_structural_invariants(body: &SsaBody) -> Vec<String> {
|
||||
let mut errors = Vec::new();
|
||||
|
||||
check_block_ids(body, &mut errors);
|
||||
check_entry_has_no_preds(body, &mut errors);
|
||||
check_pred_succ_symmetry(body, &mut errors);
|
||||
check_terminator_succ_agreement(body, &mut errors);
|
||||
check_phi_placement_and_arity(body, &mut errors);
|
||||
check_phi_operand_sources(body, &mut errors);
|
||||
check_unique_definitions(body, &mut errors);
|
||||
check_value_def_coverage(body, &mut errors);
|
||||
check_cfg_node_map(body, &mut errors);
|
||||
check_reachability(body, &mut errors);
|
||||
if let Err(e) = check_catch_block_reachability(body) {
|
||||
errors.extend(e.messages);
|
||||
}
|
||||
|
||||
errors
|
||||
}
|
||||
|
||||
/// Every block carrying an [`SsaOp::CatchParam`] — an exception-handler
|
||||
/// entry — must be reachable from either the function entry (via normal
|
||||
/// flow) or from at least one entry in [`SsaBody::exception_edges`].
|
||||
///
|
||||
/// When this fails, the CFG builder has produced an orphan catch block
|
||||
/// that should have been wired up as an exception successor but was not —
|
||||
/// a real construction bug that otherwise manifests as silent false
|
||||
/// negatives in resource-cleanup / exception-flow findings.
|
||||
pub fn check_catch_block_reachability(body: &SsaBody) -> Result<(), InvariantError> {
|
||||
let n = body.blocks.len();
|
||||
if n == 0 {
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
// 1. Identify catch blocks: any block containing a CatchParam op in
|
||||
// either its phi or body lists.
|
||||
let catch_blocks: Vec<BlockId> = body
|
||||
.blocks
|
||||
.iter()
|
||||
.filter(|b| {
|
||||
b.phis
|
||||
.iter()
|
||||
.chain(b.body.iter())
|
||||
.any(|inst| matches!(inst.op, SsaOp::CatchParam))
|
||||
})
|
||||
.map(|b| b.id)
|
||||
.collect();
|
||||
|
||||
if catch_blocks.is_empty() {
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
// 2. Traverse from entry via normal succs (explicit-stack DFS).
|
||||
let mut reachable = vec![false; n];
|
||||
let entry_idx = body.entry.0 as usize;
|
||||
if entry_idx < n {
|
||||
reachable[entry_idx] = true;
|
||||
let mut stack: Vec<BlockId> = vec![body.entry];
|
||||
while let Some(b) = stack.pop() {
|
||||
for &s in &body.blocks[b.0 as usize].succs {
|
||||
let sidx = s.0 as usize;
|
||||
if sidx < n && !reachable[sidx] {
|
||||
reachable[sidx] = true;
|
||||
stack.push(s);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// 3. Collect exception-edge targets.
|
||||
let exception_targets: std::collections::HashSet<BlockId> = body
|
||||
.exception_edges
|
||||
.iter()
|
||||
.map(|(_, catch)| *catch)
|
||||
.collect();
|
||||
|
||||
// 4. Each catch block must be normal-reachable OR an exception target.
|
||||
let mut messages = Vec::new();
|
||||
for bid in catch_blocks {
|
||||
let idx = bid.0 as usize;
|
||||
let normal = idx < n && reachable[idx];
|
||||
let via_exception = exception_targets.contains(&bid);
|
||||
if !normal && !via_exception {
|
||||
messages.push(format!(
|
||||
"catch-block orphan: block {:?} carries CatchParam but is neither \
|
||||
reachable from entry {:?} nor a target of any exception edge",
|
||||
bid, body.entry
|
||||
));
|
||||
}
|
||||
}
|
||||
|
||||
if messages.is_empty() {
|
||||
Ok(())
|
||||
} else {
|
||||
Err(InvariantError { messages })
|
||||
}
|
||||
}
|
||||
|
||||
// ── Individual invariant checks ─────────────────────────────────────────
|
||||
|
||||
fn check_block_ids(body: &SsaBody, errors: &mut Vec<String>) {
|
||||
for (i, block) in body.blocks.iter().enumerate() {
|
||||
if block.id.0 as usize != i {
|
||||
errors.push(format!(
|
||||
"block at index {i} has mismatched id {:?}",
|
||||
block.id
|
||||
));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn check_entry_has_no_preds(body: &SsaBody, errors: &mut Vec<String>) {
|
||||
let entry_idx = body.entry.0 as usize;
|
||||
if entry_idx >= body.blocks.len() {
|
||||
errors.push(format!("entry {:?} is out of bounds", body.entry));
|
||||
return;
|
||||
}
|
||||
let entry = &body.blocks[entry_idx];
|
||||
if !entry.preds.is_empty() {
|
||||
errors.push(format!(
|
||||
"entry block {:?} has {} predecessor(s): {:?}",
|
||||
body.entry,
|
||||
entry.preds.len(),
|
||||
entry.preds
|
||||
));
|
||||
}
|
||||
}
|
||||
|
||||
fn check_pred_succ_symmetry(body: &SsaBody, errors: &mut Vec<String>) {
|
||||
for block in &body.blocks {
|
||||
for &succ in &block.succs {
|
||||
let sidx = succ.0 as usize;
|
||||
if sidx >= body.blocks.len() {
|
||||
errors.push(format!(
|
||||
"block {:?} has out-of-bounds succ {:?}",
|
||||
block.id, succ
|
||||
));
|
||||
continue;
|
||||
}
|
||||
if !body.blocks[sidx].preds.contains(&block.id) {
|
||||
errors.push(format!(
|
||||
"block {:?} lists succ {:?} but {:?} does not list {:?} as pred",
|
||||
block.id, succ, succ, block.id
|
||||
));
|
||||
}
|
||||
}
|
||||
for &pred in &block.preds {
|
||||
let pidx = pred.0 as usize;
|
||||
if pidx >= body.blocks.len() {
|
||||
errors.push(format!(
|
||||
"block {:?} has out-of-bounds pred {:?}",
|
||||
block.id, pred
|
||||
));
|
||||
continue;
|
||||
}
|
||||
if !body.blocks[pidx].succs.contains(&block.id) {
|
||||
errors.push(format!(
|
||||
"block {:?} lists pred {:?} but {:?} does not list {:?} as succ",
|
||||
block.id, pred, pred, block.id
|
||||
));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn check_terminator_succ_agreement(body: &SsaBody, errors: &mut Vec<String>) {
|
||||
// Group B — loose agreement. See module docs for rationale.
|
||||
for block in &body.blocks {
|
||||
match &block.terminator {
|
||||
Terminator::Goto(target) => {
|
||||
if !block.succs.iter().any(|s| s == target) {
|
||||
errors.push(format!(
|
||||
"block {:?} Goto({:?}) target not in succs {:?}",
|
||||
block.id, target, block.succs
|
||||
));
|
||||
}
|
||||
}
|
||||
Terminator::Branch {
|
||||
true_blk,
|
||||
false_blk,
|
||||
..
|
||||
} => {
|
||||
if !block.succs.iter().any(|s| s == true_blk) {
|
||||
errors.push(format!(
|
||||
"block {:?} Branch true target {:?} not in succs {:?}",
|
||||
block.id, true_blk, block.succs
|
||||
));
|
||||
}
|
||||
if !block.succs.iter().any(|s| s == false_blk) {
|
||||
errors.push(format!(
|
||||
"block {:?} Branch false target {:?} not in succs {:?}",
|
||||
block.id, false_blk, block.succs
|
||||
));
|
||||
}
|
||||
}
|
||||
Terminator::Switch {
|
||||
targets, default, ..
|
||||
} => {
|
||||
// Every Switch target and the default arm must be in succs.
|
||||
for t in targets {
|
||||
if !block.succs.iter().any(|s| s == t) {
|
||||
errors.push(format!(
|
||||
"block {:?} Switch target {:?} not in succs {:?}",
|
||||
block.id, t, block.succs
|
||||
));
|
||||
}
|
||||
}
|
||||
if !block.succs.iter().any(|s| s == default) {
|
||||
errors.push(format!(
|
||||
"block {:?} Switch default {:?} not in succs {:?}",
|
||||
block.id, default, block.succs
|
||||
));
|
||||
}
|
||||
}
|
||||
Terminator::Return(_) | Terminator::Unreachable => {
|
||||
// Loose by design — cleanup/finally continuation edges in
|
||||
// `succs` are expected. Downstream consumers (taint
|
||||
// `compute_succ_states`, SCCP `process_terminator`) treat
|
||||
// `succs` as authoritative and propagate across these edges,
|
||||
// so the terminator shape must not forbid them.
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn check_phi_placement_and_arity(body: &SsaBody, errors: &mut Vec<String>) {
|
||||
for block in &body.blocks {
|
||||
// Phis must not appear in body.
|
||||
for inst in &block.body {
|
||||
if matches!(inst.op, SsaOp::Phi(_)) {
|
||||
errors.push(format!(
|
||||
"block {:?} has a Phi op in body (should be in phis): value {:?}",
|
||||
block.id, inst.value
|
||||
));
|
||||
}
|
||||
}
|
||||
// Every entry in `phis` must be a Phi op.
|
||||
for inst in &block.phis {
|
||||
if !matches!(inst.op, SsaOp::Phi(_)) {
|
||||
errors.push(format!(
|
||||
"block {:?} has non-Phi op in phis slot: value {:?}",
|
||||
block.id, inst.value
|
||||
));
|
||||
}
|
||||
if let SsaOp::Phi(ref ops) = inst.op
|
||||
&& ops.len() > block.preds.len()
|
||||
{
|
||||
errors.push(format!(
|
||||
"block {:?} phi for {:?} has {} operand(s) > {} pred(s)",
|
||||
block.id,
|
||||
inst.value,
|
||||
ops.len(),
|
||||
block.preds.len()
|
||||
));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn check_phi_operand_sources(body: &SsaBody, errors: &mut Vec<String>) {
|
||||
for block in &body.blocks {
|
||||
for inst in &block.phis {
|
||||
if let SsaOp::Phi(ref ops) = inst.op {
|
||||
for &(pred_bid, operand_value) in ops.iter() {
|
||||
if !block.preds.contains(&pred_bid) {
|
||||
errors.push(format!(
|
||||
"block {:?} phi for {:?} references non-pred {:?} (preds: {:?})",
|
||||
block.id, inst.value, pred_bid, block.preds
|
||||
));
|
||||
}
|
||||
// Operand value must be a valid SSA index.
|
||||
if (operand_value.0 as usize) >= body.value_defs.len() {
|
||||
errors.push(format!(
|
||||
"block {:?} phi for {:?} has operand {:?} out of value_defs range",
|
||||
block.id, inst.value, operand_value
|
||||
));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn check_unique_definitions(body: &SsaBody, errors: &mut Vec<String>) {
|
||||
let mut seen: std::collections::HashMap<SsaValue, BlockId> =
|
||||
std::collections::HashMap::with_capacity(body.value_defs.len());
|
||||
for block in &body.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
if let Some(prev) = seen.insert(inst.value, block.id) {
|
||||
errors.push(format!(
|
||||
"SSA {:?} defined in both {:?} and {:?} — single-assignment violated",
|
||||
inst.value, prev, block.id
|
||||
));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn check_value_def_coverage(body: &SsaBody, errors: &mut Vec<String>) {
|
||||
for block in &body.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
let idx = inst.value.0 as usize;
|
||||
if idx >= body.value_defs.len() {
|
||||
errors.push(format!(
|
||||
"instruction defining {:?} in block {:?} has no entry in value_defs (len {})",
|
||||
inst.value,
|
||||
block.id,
|
||||
body.value_defs.len()
|
||||
));
|
||||
continue;
|
||||
}
|
||||
let def = &body.value_defs[idx];
|
||||
if def.block != block.id {
|
||||
errors.push(format!(
|
||||
"value_defs[{}] records block {:?} but instruction lives in block {:?}",
|
||||
idx, def.block, block.id
|
||||
));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn check_cfg_node_map(body: &SsaBody, errors: &mut Vec<String>) {
|
||||
for (&cfg_node, &sv) in body.cfg_node_map.iter() {
|
||||
let idx = sv.0 as usize;
|
||||
if idx >= body.value_defs.len() {
|
||||
errors.push(format!(
|
||||
"cfg_node_map points {:?} → {:?} which is out of value_defs range",
|
||||
cfg_node, sv
|
||||
));
|
||||
continue;
|
||||
}
|
||||
let def = &body.value_defs[idx];
|
||||
if def.cfg_node != cfg_node {
|
||||
errors.push(format!(
|
||||
"cfg_node_map inconsistency: map says {:?} → {:?}, but value_defs[{}].cfg_node = {:?}",
|
||||
cfg_node, sv, idx, def.cfg_node
|
||||
));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn check_reachability(body: &SsaBody, errors: &mut Vec<String>) {
|
||||
let n = body.blocks.len();
|
||||
if n == 0 {
|
||||
errors.push("body has zero blocks".into());
|
||||
return;
|
||||
}
|
||||
let entry_idx = body.entry.0 as usize;
|
||||
if entry_idx >= n {
|
||||
// already reported by check_entry_has_no_preds
|
||||
return;
|
||||
}
|
||||
|
||||
// Multi-root traversal (explicit-stack DFS): start from the entry *and* from every catch target
|
||||
// recorded in `exception_edges`. Exception-handler blocks are reached
|
||||
// via stripped exception edges, so from the SSA body's perspective they
|
||||
// look like roots — as does anything transitively reachable from them
|
||||
// (e.g. a `finally` block chained after a `catch`).
|
||||
let mut visited = vec![false; n];
|
||||
let mut stack: Vec<BlockId> = Vec::new();
|
||||
let seed = |bid: BlockId, visited: &mut [bool], stack: &mut Vec<BlockId>| {
|
||||
let idx = bid.0 as usize;
|
||||
if idx < visited.len() && !visited[idx] {
|
||||
visited[idx] = true;
|
||||
stack.push(bid);
|
||||
}
|
||||
};
|
||||
seed(body.entry, &mut visited, &mut stack);
|
||||
for (_src, catch_target) in &body.exception_edges {
|
||||
seed(*catch_target, &mut visited, &mut stack);
|
||||
}
|
||||
while let Some(bid) = stack.pop() {
|
||||
let block = &body.blocks[bid.0 as usize];
|
||||
for &s in &block.succs {
|
||||
let sidx = s.0 as usize;
|
||||
if sidx < n && !visited[sidx] {
|
||||
visited[sidx] = true;
|
||||
stack.push(s);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for (i, v) in visited.iter().enumerate() {
|
||||
if !*v {
|
||||
let block = &body.blocks[i];
|
||||
errors.push(format!(
|
||||
"block {:?} is unreachable from entry {:?} or any exception-handler root",
|
||||
block.id, body.entry
|
||||
));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── Optimization idempotence ─────────────────────────────────────────────
|
||||
|
||||
/// Compute a structural fingerprint of an [`SsaBody`] that is stable across
|
||||
/// equivalent lowerings / optimisations. Two bodies producing the same
|
||||
/// fingerprint have the same block structure, terminator shape, per-block
|
||||
/// phi/body instruction counts and op-kind sequences. SsaValue numbers are
|
||||
/// not part of the fingerprint, so renumbering between runs does not cause
|
||||
/// spurious diffs — only shape changes do.
|
||||
///
|
||||
/// Phis are emitted in their natural (insertion) order. Lowering now drives
|
||||
/// phi placement through a `BTreeSet`, so that order is deterministic
|
||||
/// (alphabetical by `var_name`) and any divergence between runs is a real
|
||||
/// regression rather than hasher noise.
|
||||
pub fn body_fingerprint(body: &SsaBody) -> String {
|
||||
use std::fmt::Write;
|
||||
let mut out = String::new();
|
||||
let _ = writeln!(out, "entry={:?}", body.entry);
|
||||
let _ = writeln!(out, "blocks={}", body.blocks.len());
|
||||
for block in &body.blocks {
|
||||
let _ = writeln!(
|
||||
out,
|
||||
" b{:?} preds={} succs={} phis={} body={} term={}",
|
||||
block.id,
|
||||
block.preds.len(),
|
||||
block.succs.len(),
|
||||
block.phis.len(),
|
||||
block.body.len(),
|
||||
terminator_kind(&block.terminator),
|
||||
);
|
||||
for inst in &block.phis {
|
||||
if let SsaOp::Phi(ref ops) = inst.op {
|
||||
let _ = writeln!(
|
||||
out,
|
||||
" phi var={} operands={}",
|
||||
inst.var_name.as_deref().unwrap_or(""),
|
||||
ops.len(),
|
||||
);
|
||||
}
|
||||
}
|
||||
for inst in &block.body {
|
||||
let _ = writeln!(out, " {}", op_kind(&inst.op));
|
||||
}
|
||||
}
|
||||
out
|
||||
}
|
||||
|
||||
fn terminator_kind(t: &Terminator) -> &'static str {
|
||||
match t {
|
||||
Terminator::Goto(_) => "Goto",
|
||||
Terminator::Branch { .. } => "Branch",
|
||||
Terminator::Switch { .. } => "Switch",
|
||||
Terminator::Return(_) => "Return",
|
||||
Terminator::Unreachable => "Unreachable",
|
||||
}
|
||||
}
|
||||
|
||||
fn op_kind(op: &SsaOp) -> &'static str {
|
||||
match op {
|
||||
SsaOp::Phi(_) => "Phi",
|
||||
SsaOp::Assign(_) => "Assign",
|
||||
SsaOp::Call { .. } => "Call",
|
||||
SsaOp::Source => "Source",
|
||||
SsaOp::Const(_) => "Const",
|
||||
SsaOp::Param { .. } => "Param",
|
||||
SsaOp::SelfParam => "SelfParam",
|
||||
SsaOp::CatchParam => "CatchParam",
|
||||
SsaOp::Nop => "Nop",
|
||||
SsaOp::Undef => "Undef",
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::cfg::{Cfg, EdgeKind, NodeInfo, StmtKind, TaintMeta};
|
||||
use crate::ssa::lower_to_ssa;
|
||||
use petgraph::Graph;
|
||||
use petgraph::graph::NodeIndex;
|
||||
|
||||
fn make_node(kind: StmtKind) -> NodeInfo {
|
||||
NodeInfo {
|
||||
kind,
|
||||
..Default::default()
|
||||
}
|
||||
}
|
||||
|
||||
fn def(var: &str) -> NodeInfo {
|
||||
NodeInfo {
|
||||
taint: TaintMeta {
|
||||
defines: Some(var.into()),
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
}
|
||||
}
|
||||
|
||||
fn use_var(var: &str) -> NodeInfo {
|
||||
NodeInfo {
|
||||
taint: TaintMeta {
|
||||
uses: vec![var.into()],
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
}
|
||||
}
|
||||
|
||||
fn assert_well_formed(body: &SsaBody) {
|
||||
let errs = check_structural_invariants(body);
|
||||
assert!(
|
||||
errs.is_empty(),
|
||||
"structural invariants failed:\n{}",
|
||||
errs.join("\n")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn linear_cfg_is_well_formed() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
let n1 = cfg.add_node(def("x"));
|
||||
let n2 = cfg.add_node(use_var("x"));
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
cfg.add_edge(entry, n1, EdgeKind::Seq);
|
||||
cfg.add_edge(n1, n2, EdgeKind::Seq);
|
||||
cfg.add_edge(n2, exit, EdgeKind::Seq);
|
||||
let body = lower_to_ssa(&cfg, entry, None, true).unwrap();
|
||||
assert_well_formed(&body);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn diamond_cfg_is_well_formed() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
let if_n = cfg.add_node(make_node(StmtKind::If));
|
||||
let t = cfg.add_node(def("x"));
|
||||
let f = cfg.add_node(def("x"));
|
||||
let join = cfg.add_node(use_var("x"));
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
cfg.add_edge(entry, if_n, EdgeKind::Seq);
|
||||
cfg.add_edge(if_n, t, EdgeKind::True);
|
||||
cfg.add_edge(if_n, f, EdgeKind::False);
|
||||
cfg.add_edge(t, join, EdgeKind::Seq);
|
||||
cfg.add_edge(f, join, EdgeKind::Seq);
|
||||
cfg.add_edge(join, exit, EdgeKind::Seq);
|
||||
let body = lower_to_ssa(&cfg, entry, None, true).unwrap();
|
||||
assert_well_formed(&body);
|
||||
|
||||
// Additionally: the join block must carry a phi whose operands come
|
||||
// from exactly its two predecessors.
|
||||
let phi_block = body
|
||||
.blocks
|
||||
.iter()
|
||||
.find(|b| !b.phis.is_empty())
|
||||
.expect("diamond should produce a phi");
|
||||
for phi in &phi_block.phis {
|
||||
if let SsaOp::Phi(ref ops) = phi.op {
|
||||
for (pred, _) in ops {
|
||||
assert!(
|
||||
phi_block.preds.iter().any(|p| p == pred),
|
||||
"phi operand {pred:?} is not a pred of {:?}",
|
||||
phi_block.id
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn loop_cfg_is_well_formed() {
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
let init = cfg.add_node(def("x"));
|
||||
let header = cfg.add_node(make_node(StmtKind::Loop));
|
||||
let body_n = cfg.add_node(def("x"));
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
cfg.add_edge(entry, init, EdgeKind::Seq);
|
||||
cfg.add_edge(init, header, EdgeKind::Seq);
|
||||
cfg.add_edge(header, body_n, EdgeKind::True);
|
||||
cfg.add_edge(body_n, header, EdgeKind::Back);
|
||||
cfg.add_edge(header, exit, EdgeKind::False);
|
||||
let body = lower_to_ssa(&cfg, entry, None, true).unwrap();
|
||||
assert_well_formed(&body);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn fingerprint_is_stable_on_double_lowering() {
|
||||
// Lowering twice on the same CFG must produce the same fingerprint.
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
let if_n = cfg.add_node(make_node(StmtKind::If));
|
||||
let t = cfg.add_node(def("x"));
|
||||
let f = cfg.add_node(def("x"));
|
||||
let join = cfg.add_node(use_var("x"));
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
cfg.add_edge(entry, if_n, EdgeKind::Seq);
|
||||
cfg.add_edge(if_n, t, EdgeKind::True);
|
||||
cfg.add_edge(if_n, f, EdgeKind::False);
|
||||
cfg.add_edge(t, join, EdgeKind::Seq);
|
||||
cfg.add_edge(f, join, EdgeKind::Seq);
|
||||
cfg.add_edge(join, exit, EdgeKind::Seq);
|
||||
let a = lower_to_ssa(&cfg, entry, None, true).unwrap();
|
||||
let b = lower_to_ssa(&cfg, entry, None, true).unwrap();
|
||||
assert_eq!(body_fingerprint(&a), body_fingerprint(&b));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn phis_are_emitted_in_alphabetical_order() {
|
||||
// Diamond CFG with multiple variables defined on both sides:
|
||||
// Entry → If → [True: a=, b=, c=] [False: a=, b=, c=] → Join → Exit
|
||||
// Join should carry phis for a, b, and c, emitted alphabetically
|
||||
// as a consequence of the BTreeSet-backed phi_placements.
|
||||
fn defs(vars: &[&str]) -> NodeInfo {
|
||||
// A single `NodeInfo` can record only one define, so only `vars[0]`
// is used here. Callers emit one node per variable and chain the
// nodes with `Seq` edges to build each branch.
|
||||
NodeInfo {
|
||||
taint: TaintMeta {
|
||||
defines: Some(vars[0].into()),
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
}
|
||||
}
|
||||
let mut cfg: Cfg = Graph::new();
|
||||
let entry = cfg.add_node(make_node(StmtKind::Entry));
|
||||
let if_n = cfg.add_node(make_node(StmtKind::If));
|
||||
|
||||
// True branch defines c, then a, then b (intentionally non-alphabetical
|
||||
// to prove the fingerprint order is driven by lowering, not source).
|
||||
let t_c = cfg.add_node(defs(&["c"]));
|
||||
let t_a = cfg.add_node(defs(&["a"]));
|
||||
let t_b = cfg.add_node(defs(&["b"]));
|
||||
|
||||
// False branch: same vars, different order to make sure neither side
|
||||
// accidentally sets the ordering downstream.
|
||||
let f_b = cfg.add_node(defs(&["b"]));
|
||||
let f_c = cfg.add_node(defs(&["c"]));
|
||||
let f_a = cfg.add_node(defs(&["a"]));
|
||||
|
||||
let join = cfg.add_node(NodeInfo {
|
||||
taint: TaintMeta {
|
||||
uses: vec!["a".into(), "b".into(), "c".into()],
|
||||
..Default::default()
|
||||
},
|
||||
..make_node(StmtKind::Seq)
|
||||
});
|
||||
let exit = cfg.add_node(make_node(StmtKind::Exit));
|
||||
|
||||
cfg.add_edge(entry, if_n, EdgeKind::Seq);
|
||||
cfg.add_edge(if_n, t_c, EdgeKind::True);
|
||||
cfg.add_edge(t_c, t_a, EdgeKind::Seq);
|
||||
cfg.add_edge(t_a, t_b, EdgeKind::Seq);
|
||||
cfg.add_edge(t_b, join, EdgeKind::Seq);
|
||||
cfg.add_edge(if_n, f_b, EdgeKind::False);
|
||||
cfg.add_edge(f_b, f_c, EdgeKind::Seq);
|
||||
cfg.add_edge(f_c, f_a, EdgeKind::Seq);
|
||||
cfg.add_edge(f_a, join, EdgeKind::Seq);
|
||||
cfg.add_edge(join, exit, EdgeKind::Seq);
|
||||
|
||||
let body = lower_to_ssa(&cfg, entry, None, true).unwrap();
|
||||
let join_block = body
|
||||
.blocks
|
||||
.iter()
|
||||
.find(|b| b.phis.len() >= 3)
|
||||
.expect("join block should carry phis for a, b, c");
|
||||
let names: Vec<&str> = join_block
|
||||
.phis
|
||||
.iter()
|
||||
.filter_map(|inst| inst.var_name.as_deref())
|
||||
.collect();
|
||||
assert_eq!(
|
||||
names,
|
||||
vec!["a", "b", "c"],
|
||||
"phis within a block must be emitted in alphabetical var_name order"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn broken_pred_succ_symmetry_is_detected() {
|
||||
// Hand-craft a body with inconsistent pred/succ lists.
|
||||
use smallvec::smallvec;
|
||||
let body = SsaBody {
|
||||
blocks: vec![
|
||||
SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![],
|
||||
terminator: Terminator::Goto(BlockId(1)),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![BlockId(1)],
|
||||
},
|
||||
SsaBlock {
|
||||
id: BlockId(1),
|
||||
phis: vec![],
|
||||
body: vec![],
|
||||
terminator: Terminator::Unreachable,
|
||||
preds: smallvec![], // Missing pred back to 0.
|
||||
succs: smallvec![],
|
||||
},
|
||||
],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![],
|
||||
cfg_node_map: Default::default(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
let errs = check_structural_invariants(&body);
|
||||
assert!(
|
||||
errs.iter().any(|e| e.contains("does not list")),
|
||||
"expected a symmetry violation, got: {:?}",
|
||||
errs
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn duplicate_ssa_def_is_detected() {
|
||||
use smallvec::smallvec;
|
||||
let dummy_cfg = NodeIndex::new(0);
|
||||
let body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Const(None),
|
||||
cfg_node: dummy_cfg,
|
||||
var_name: None,
|
||||
span: (0, 0),
|
||||
},
|
||||
SsaInst {
|
||||
value: SsaValue(0), // duplicate
|
||||
op: SsaOp::Const(None),
|
||||
cfg_node: dummy_cfg,
|
||||
var_name: None,
|
||||
span: (0, 0),
|
||||
},
|
||||
],
|
||||
terminator: Terminator::Unreachable,
|
||||
preds: smallvec![],
|
||||
succs: smallvec![],
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![ValueDef {
|
||||
var_name: None,
|
||||
cfg_node: dummy_cfg,
|
||||
block: BlockId(0),
|
||||
}],
|
||||
cfg_node_map: Default::default(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
let errs = check_structural_invariants(&body);
|
||||
assert!(
|
||||
errs.iter()
|
||||
.any(|e| e.contains("single-assignment violated")),
|
||||
"expected a duplicate-def violation, got: {:?}",
|
||||
errs
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn phi_operand_from_non_pred_is_detected() {
|
||||
use smallvec::smallvec;
|
||||
let dummy_cfg = NodeIndex::new(0);
|
||||
let body = SsaBody {
|
||||
blocks: vec![
|
||||
SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![],
|
||||
terminator: Terminator::Goto(BlockId(1)),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![BlockId(1)],
|
||||
},
|
||||
SsaBlock {
|
||||
id: BlockId(1),
|
||||
// Phi claims an operand from block 2 which isn't in preds.
|
||||
phis: vec![SsaInst {
|
||||
value: SsaValue(0),
|
||||
op: SsaOp::Phi(smallvec![(BlockId(2), SsaValue(0))]),
|
||||
cfg_node: dummy_cfg,
|
||||
var_name: Some("x".into()),
|
||||
span: (0, 0),
|
||||
}],
|
||||
body: vec![],
|
||||
terminator: Terminator::Unreachable,
|
||||
preds: smallvec![BlockId(0)],
|
||||
succs: smallvec![],
|
||||
},
|
||||
],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![ValueDef {
|
||||
var_name: Some("x".into()),
|
||||
cfg_node: dummy_cfg,
|
||||
block: BlockId(1),
|
||||
}],
|
||||
cfg_node_map: Default::default(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
let errs = check_structural_invariants(&body);
|
||||
assert!(
|
||||
errs.iter().any(|e| e.contains("references non-pred")),
|
||||
"expected a phi-operand-source violation, got: {:?}",
|
||||
errs
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn terminator_disagreeing_with_succs_is_detected() {
|
||||
use smallvec::smallvec;
|
||||
let body = SsaBody {
|
||||
blocks: vec![SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![],
|
||||
// Goto(1) but succs is empty.
|
||||
terminator: Terminator::Goto(BlockId(1)),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![],
|
||||
}],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![],
|
||||
cfg_node_map: Default::default(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
let errs = check_structural_invariants(&body);
|
||||
assert!(
|
||||
errs.iter().any(|e| e.contains("Goto")),
|
||||
"expected a terminator/succ disagreement, got: {:?}",
|
||||
errs
|
||||
);
|
||||
}
|
||||
}
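The "subset" form of terminator/succs agreement described in the module docs above can be sketched in isolation. This is a minimal illustration, not the crate's implementation: `MiniTerminator` and `terminator_violations` are hypothetical names, block ids are plain `usize` values, and `Switch` is omitted for brevity.

```rust
/// Hypothetical, simplified terminator; block ids are plain `usize`.
pub enum MiniTerminator {
    Goto(usize),
    Branch { true_blk: usize, false_blk: usize },
    Return,
    Unreachable,
}

/// Check the "subset" form of terminator/succs agreement: every target the
/// terminator names must appear in `succs`, but `succs` may hold extras.
pub fn terminator_violations(term: &MiniTerminator, succs: &[usize]) -> Vec<String> {
    let mut errors = Vec::new();
    match term {
        MiniTerminator::Goto(t) => {
            if !succs.contains(t) {
                errors.push(format!("Goto({t}) target not in succs {succs:?}"));
            }
        }
        MiniTerminator::Branch { true_blk, false_blk } => {
            for (label, blk) in [("true", true_blk), ("false", false_blk)] {
                if !succs.contains(blk) {
                    errors.push(format!(
                        "Branch {label} target {blk} not in succs {succs:?}"
                    ));
                }
            }
        }
        // Loose by design: extra cleanup edges in `succs` are allowed.
        MiniTerminator::Return | MiniTerminator::Unreachable => {}
    }
    errors
}
```

Under these assumptions, a `Goto(1)` with `succs = [1, 2]` passes because extras are tolerated, while a `Goto(1)` with empty `succs` yields one violation.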
|
||||
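The multi-root reachability idea used by `check_reachability` above (seed the traversal with the entry plus every catch target, then follow `succs`) can be sketched on a stripped-down graph. `MiniBody` and `unreachable_blocks` are hypothetical stand-ins introduced only for illustration; the real `SsaBody` carries far more state.

```rust
/// Hypothetical, simplified stand-in for an SSA body: blocks are plain
/// indices and `succs[i]` lists the successors of block `i`.
pub struct MiniBody {
    pub entry: usize,
    pub succs: Vec<Vec<usize>>,
    /// `(source, catch_target)` pairs, mirroring `exception_edges`.
    pub exception_edges: Vec<(usize, usize)>,
}

/// Return the indices of blocks reachable neither from the entry nor from
/// any catch target. Out-of-range ids are skipped rather than panicking.
pub fn unreachable_blocks(body: &MiniBody) -> Vec<usize> {
    let n = body.succs.len();
    let mut visited = vec![false; n];
    let mut stack = Vec::new();

    // Seed the worklist with the entry and every catch target.
    let roots = std::iter::once(body.entry)
        .chain(body.exception_edges.iter().map(|&(_, catch)| catch));
    for root in roots {
        if root < n && !visited[root] {
            visited[root] = true;
            stack.push(root);
        }
    }

    // Explicit-stack traversal over the successor lists.
    while let Some(b) = stack.pop() {
        for &s in &body.succs[b] {
            if s < n && !visited[s] {
                visited[s] = true;
                stack.push(s);
            }
        }
    }

    (0..n).filter(|&i| !visited[i]).collect()
}
```

In this sketch a handler block with no normal predecessor counts as reachable as soon as an exception edge names it, and so does anything chained after it. A block that nothing points to is still reported.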
213
src/ssa/ir.rs
Normal file
|
|
@@ -0,0 +1,213 @@
|
|||
use crate::constraint::domain::ConstValue;
|
||||
use crate::constraint::lower::ConditionExpr;
|
||||
use petgraph::graph::NodeIndex;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use smallvec::SmallVec;
|
||||
|
||||
/// Unique identifier for an SSA value (one per definition point).
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
|
||||
pub struct SsaValue(pub u32);
|
||||
|
||||
/// Basic block identifier.
|
||||
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
|
||||
pub struct BlockId(pub u32);
|
||||
|
||||
/// SSA instruction operation.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub enum SsaOp {
|
||||
/// Phi: merge values from predecessor blocks.
|
||||
Phi(SmallVec<[(BlockId, SsaValue); 2]>),
|
||||
/// Assignment: result depends on the listed SSA values.
|
||||
Assign(SmallVec<[SsaValue; 4]>),
|
||||
/// Function/method call.
|
||||
Call {
|
||||
callee: String,
|
||||
/// Per-argument SSA value uses.
|
||||
args: Vec<SmallVec<[SsaValue; 2]>>,
|
||||
/// Receiver SSA value (for method calls).
|
||||
receiver: Option<SsaValue>,
|
||||
},
|
||||
/// Taint source introduction.
|
||||
Source,
|
||||
/// Constant / literal value (no taint).
|
||||
/// The optional string carries the raw source text when captured during lowering.
|
||||
Const(Option<String>),
|
||||
/// Function parameter (positional). Index is the 0-based positional
|
||||
/// parameter index, *excluding* any implicit receiver (`self`/`this`).
|
||||
/// The receiver, when present, is represented by [`SsaOp::SelfParam`].
|
||||
Param { index: usize },
|
||||
/// Implicit method receiver (`self` in Rust/Python, `this` in
|
||||
/// JS/TS/Java/PHP). Emitted in block 0 of a function body whenever the
|
||||
/// body has a receiver (either an explicit `self` formal parameter or an
|
||||
/// implicit `this` reference). Having a dedicated IR node keeps
|
||||
/// receiver taint tracking entirely separate from positional-parameter
|
||||
/// taint, eliminating off-by-receiver arithmetic at call sites.
|
||||
SelfParam,
|
||||
/// Catch-clause exception binding.
|
||||
CatchParam,
|
||||
/// Non-defining node (e.g. If condition evaluation, Entry, Exit).
|
||||
Nop,
|
||||
/// Sentinel for "no reaching definition on this control-flow edge".
|
||||
///
|
||||
/// Emitted by SSA lowering as a synthesized instruction in the entry
|
||||
/// block and referenced from phi operands whose incoming edge does
|
||||
/// not carry a definition of the phi's variable — e.g. a try/catch
|
||||
/// rejoin where a variable is only defined on the normal path, or
|
||||
/// an early-return branch on a later-defined variable.
|
||||
///
|
||||
/// Having an explicit value lets phis satisfy the invariant that
|
||||
/// `phi.operands.len() == block.preds.len()` (one operand per
|
||||
/// predecessor). Downstream analyses treat Undef as a
|
||||
/// no-taint / unknown / bottom-of-the-lattice contribution: a phi
|
||||
/// operand of Undef carries no caps, no concrete value, and no
|
||||
/// abstract fact.
|
||||
Undef,
|
||||
}
|
||||
|
||||
/// A single SSA instruction.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub struct SsaInst {
|
||||
/// The SSA value defined by this instruction.
|
||||
pub value: SsaValue,
|
||||
/// The operation.
|
||||
pub op: SsaOp,
|
||||
/// The original CFG node this instruction was derived from.
|
||||
pub cfg_node: NodeIndex,
|
||||
/// Original variable name (for debugging and label lookups).
|
||||
pub var_name: Option<String>,
|
||||
/// Source byte span from the original file.
|
||||
pub span: (usize, usize),
|
||||
}
|
||||
|
||||
/// Basic block terminator.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub enum Terminator {
|
||||
Goto(BlockId),
|
||||
Branch {
|
||||
cond: NodeIndex,
|
||||
true_blk: BlockId,
|
||||
false_blk: BlockId,
|
||||
/// Structured condition lowered from CFG metadata during SSA construction.
|
||||
/// `None` when the condition could not be lowered (falls back to text-based
|
||||
/// lowering in taint transfer).
|
||||
condition: Option<Box<ConditionExpr>>,
|
||||
},
|
||||
/// Multi-way switch dispatch.
|
||||
///
|
||||
/// `targets` lists the per-case successor blocks (order matches the
|
||||
/// source-order of cases in the switch); `default` is the fallback
|
||||
/// branch taken when no case matches. Block `succs` remain the
|
||||
/// authoritative flow set — the terminator is a structured summary.
|
||||
///
|
||||
/// Emitted only for switch-like dispatch whose semantics are
|
||||
/// guaranteed-exclusive across cases (e.g. Go `switch`, Java
|
||||
/// arrow-switch, Rust `match`). Fall-through switches (C, C++, Java
|
||||
/// classic switch without `break`) continue to use the cascaded
|
||||
/// `Branch` lowering because the precision advantage only holds when
|
||||
/// cases are mutually exclusive.
|
||||
Switch {
|
||||
scrutinee: SsaValue,
|
||||
targets: SmallVec<[BlockId; 4]>,
|
||||
default: BlockId,
|
||||
/// Per-target case literals, aligned 1:1 with `targets`.
|
||||
///
|
||||
/// `Some(c)` records the constant value the scrutinee must equal for
|
||||
/// the corresponding target to be taken. `None` means the literal is
|
||||
/// unknown — emitted for synthetic ≥3-way CFG fanouts or for case
|
||||
/// patterns that aren't plain literals (OR-patterns, ranges, guards).
|
||||
///
|
||||
/// When omitted/empty (length zero), all targets behave as "unknown
|
||||
/// literal" — preserves backward compatibility with consumers that
|
||||
/// only inspect `targets`/`default`.
|
||||
#[serde(default)]
|
||||
case_values: SmallVec<[Option<ConstValue>; 4]>,
|
||||
},
|
||||
Return(Option<SsaValue>),
|
||||
Unreachable,
|
||||
}
|
||||
|
||||
/// A basic block in SSA form.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub struct SsaBlock {
|
||||
pub id: BlockId,
|
||||
/// Phi instructions (always at block start).
|
||||
pub phis: Vec<SsaInst>,
|
||||
/// Body instructions (after phis).
|
||||
pub body: Vec<SsaInst>,
|
||||
/// Block terminator.
|
||||
pub terminator: Terminator,
|
||||
/// Predecessor block IDs.
|
||||
pub preds: SmallVec<[BlockId; 2]>,
|
||||
/// Successor block IDs.
|
||||
pub succs: SmallVec<[BlockId; 2]>,
|
||||
}
|
||||
|
||||
/// Per-value definition metadata.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub struct ValueDef {
|
||||
/// Original variable name (if any).
|
||||
pub var_name: Option<String>,
|
||||
/// The CFG node where this value was defined.
|
||||
pub cfg_node: NodeIndex,
|
||||
/// The block containing the definition.
|
||||
pub block: BlockId,
|
||||
}
|
||||
|
||||
/// Complete SSA representation for a function/scope.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub struct SsaBody {
|
||||
/// All basic blocks, indexed by BlockId.
|
||||
pub blocks: Vec<SsaBlock>,
|
||||
/// Entry block.
|
||||
pub entry: BlockId,
|
||||
/// Per-SsaValue definition info, indexed by SsaValue.0.
|
||||
pub value_defs: Vec<ValueDef>,
|
||||
/// Map from original CFG NodeIndex to the primary SsaValue defined there.
|
||||
pub cfg_node_map: std::collections::HashMap<NodeIndex, SsaValue>,
|
||||
/// Exception edges: (source block, catch entry block).
|
||||
/// Recorded during lowering when exception edges are stripped from the CFG.
|
||||
/// Used by taint analysis to seed catch blocks with try-body taint state.
|
||||
pub exception_edges: Vec<(BlockId, BlockId)>,
|
||||
}
|
||||
|
||||
impl SsaBody {
|
||||
/// Get a block by its ID.
|
||||
pub fn block(&self, id: BlockId) -> &SsaBlock {
|
||||
&self.blocks[id.0 as usize]
|
||||
}
|
||||
|
||||
/// Get a mutable block by its ID.
|
||||
pub fn block_mut(&mut self, id: BlockId) -> &mut SsaBlock {
|
||||
&mut self.blocks[id.0 as usize]
|
||||
}
|
||||
|
||||
/// Total number of SSA values.
|
||||
pub fn num_values(&self) -> usize {
|
||||
self.value_defs.len()
|
||||
}
|
||||
|
||||
/// Look up definition info for an SSA value.
|
||||
pub fn def_of(&self, v: SsaValue) -> &ValueDef {
|
||||
&self.value_defs[v.0 as usize]
|
||||
}
|
||||
}
|
||||
|
||||
/// Errors that can occur during SSA lowering.
|
||||
#[derive(Debug, Clone)]
|
||||
pub enum SsaError {
|
||||
/// The CFG has no reachable nodes from the entry.
|
||||
EmptyCfg,
|
||||
/// Entry node not found in the CFG.
|
||||
InvalidEntry,
|
||||
}
|
||||
|
||||
impl std::fmt::Display for SsaError {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
match self {
|
||||
SsaError::EmptyCfg => write!(f, "CFG has no reachable nodes"),
|
||||
SsaError::InvalidEntry => write!(f, "entry node not found in CFG"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl std::error::Error for SsaError {}
|
||||
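The `Undef` doc comment above describes two things: every phi keeps one operand per predecessor, and downstream analyses treat `Undef` as contributing nothing. A minimal sketch of both, assuming simplified stand-in types (`Val`, `Operand`, `phi_operands` and `join_caps` are hypothetical names for illustration, not items from `src/ssa/ir.rs`):

```rust
/// Simplified stand-in for an SSA value id (hypothetical, not the crate's `SsaValue`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Val(pub u32);

/// Simplified phi operand: a real definition or the `Undef` sentinel.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Operand {
    Def(Val),
    Undef,
}

/// Build exactly one operand per predecessor. Edges that carry no
/// definition get `Undef`, so `operands.len() == preds.len()` always holds.
pub fn phi_operands(preds: &[u32], defs_on_edge: &[(u32, Val)]) -> Vec<(u32, Operand)> {
    preds
        .iter()
        .map(|&p| {
            let op = defs_on_edge
                .iter()
                .find(|(pred, _)| *pred == p)
                .map(|&(_, v)| Operand::Def(v))
                .unwrap_or(Operand::Undef);
            (p, op)
        })
        .collect()
}

/// Join capability bits over the phi operands as a bit-set union.
/// `Undef` is the bottom element: it leaves the accumulator unchanged.
pub fn join_caps(operands: &[(u32, Operand)], caps_of: impl Fn(Val) -> u8) -> u8 {
    operands.iter().fold(0u8, |acc, (_, op)| match op {
        Operand::Def(v) => acc | caps_of(*v),
        Operand::Undef => acc,
    })
}
```

With predecessors `[1, 2]` and a definition only on edge `1`, the phi still has two operands, and the join sees only the bits of the defined value.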
2864
src/ssa/lower.rs
Normal file
File diff suppressed because it is too large
90
src/ssa/mod.rs
Normal file
|
|
@ -0,0 +1,90 @@
|
|||
#[allow(dead_code)] // IR types — fields used by Display impl, tests, and downstream analyses
|
||||
pub mod alias;
|
||||
pub mod const_prop;
|
||||
pub mod copy_prop;
|
||||
pub mod dce;
|
||||
pub mod display;
|
||||
pub mod heap;
|
||||
pub mod invariants;
|
||||
#[allow(dead_code)]
|
||||
pub mod ir;
|
||||
pub mod lower;
|
||||
pub mod param_points_to;
|
||||
pub mod pointsto;
|
||||
pub mod static_map;
|
||||
pub mod type_facts;
|
||||
|
||||
#[allow(unused_imports)]
|
||||
pub use ir::*;
|
||||
pub use lower::lower_to_ssa;
|
||||
pub use lower::lower_to_ssa_scoped_nop;
|
||||
pub use lower::lower_to_ssa_with_params;
|
||||
|
||||
use crate::cfg::Cfg;
|
||||
use crate::symbol::Lang;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use std::collections::HashMap;
|
||||
|
||||
/// Result of SSA optimization passes.
|
||||
#[derive(Clone, Debug, Serialize, Deserialize)]
|
||||
pub struct OptimizeResult {
|
||||
/// Per-SSA-value constant lattice values.
|
||||
pub const_values: HashMap<SsaValue, const_prop::ConstLattice>,
|
||||
/// Type fact analysis results.
|
||||
pub type_facts: type_facts::TypeFactResult,
|
||||
/// Base-variable alias groups from copy propagation.
|
||||
pub alias_result: alias::BaseAliasResult,
|
||||
/// Points-to analysis: per-SSA-value abstract heap object sets.
|
||||
pub points_to: heap::PointsToResult,
|
||||
/// Module aliases from `require()` calls: SSA value → possible module names.
|
||||
/// Used to resolve dynamic dispatch like `lib.request()` where `lib = require("http")`.
|
||||
pub module_aliases: HashMap<SsaValue, smallvec::SmallVec<[String; 2]>>,
|
||||
/// Number of branches pruned by constant propagation.
|
||||
pub branches_pruned: usize,
|
||||
/// Number of copies eliminated.
|
||||
pub copies_eliminated: usize,
|
||||
/// Number of dead definitions removed.
|
||||
pub dead_defs_removed: usize,
|
||||
}
|
||||
|
||||
/// Run all SSA optimization passes on a body.
|
||||
///
|
||||
/// Pipeline: constant propagation (SCCP, with branch pruning) → copy propagation → base-alias analysis → DCE → type facts → points-to → module aliases (JS/TS only).
|
||||
pub fn optimize_ssa(body: &mut SsaBody, cfg: &Cfg, lang: Option<Lang>) -> OptimizeResult {
|
||||
// 1. Constant propagation (SCCP)
|
||||
let cp = const_prop::const_propagate(body);
|
||||
let branches_pruned = const_prop::apply_const_prop(body, &cp);
|
||||
|
||||
// 2. Copy propagation
|
||||
let (copies_eliminated, copy_map) = copy_prop::copy_propagate(body, cfg);
|
||||
|
||||
// 3. Alias analysis (uses copy_map before DCE removes dead defs)
|
||||
let alias_result = alias::compute_base_aliases(&copy_map, body);
|
||||
|
||||
// 4. Dead code elimination
|
||||
let dead_defs_removed = dce::eliminate_dead_defs(body, cfg);
|
||||
|
||||
// 5. Type fact analysis (uses const prop results + language for constructor inference)
|
||||
let type_facts = type_facts::analyze_types(body, cfg, &cp.values, lang);
|
||||
|
||||
// 6. Points-to analysis (uses allocation site detection + SSA def-use)
|
||||
let points_to = heap::analyze_points_to(body, cfg, lang);
|
||||
|
||||
// 7. Module alias analysis (require() tracking for JS/TS)
|
||||
let module_aliases = if matches!(lang, Some(Lang::JavaScript) | Some(Lang::TypeScript)) {
|
||||
const_prop::collect_module_aliases(body, &cp.values)
|
||||
} else {
|
||||
HashMap::new()
|
||||
};
|
||||
|
||||
OptimizeResult {
|
||||
const_values: cp.values,
|
||||
type_facts,
|
||||
alias_result,
|
||||
points_to,
|
||||
module_aliases,
|
||||
branches_pruned,
|
||||
copies_eliminated,
|
||||
dead_defs_removed,
|
||||
}
|
||||
}
|
||||
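The `module_aliases` field is documented as resolving calls like `lib.request()` where `lib = require("http")`. A minimal sketch of how a consumer might use such a table, assuming a map keyed by the receiver's variable name for brevity (the real map is keyed by `SsaValue`, and `qualify_callee` is a hypothetical helper, not a function in this crate):

```rust
use std::collections::HashMap;

/// Expand a method-style callee such as `lib.request` into module-qualified
/// candidates. Callees with no receiver, or whose receiver has no recorded
/// module alias, are returned unchanged.
pub fn qualify_callee(callee: &str, aliases: &HashMap<String, Vec<String>>) -> Vec<String> {
    // Split at the first dot: receiver on the left, method on the right.
    let Some((receiver, method)) = callee.split_once('.') else {
        return vec![callee.to_string()];
    };
    match aliases.get(receiver) {
        // One candidate per module the receiver may refer to.
        Some(modules) if !modules.is_empty() => modules
            .iter()
            .map(|m| format!("{m}.{method}"))
            .collect(),
        _ => vec![callee.to_string()],
    }
}
```

A receiver may map to several modules, so the result is a list of candidates rather than a single name.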
649
src/ssa/param_points_to.rs
Normal file
|
|
@ -0,0 +1,649 @@
|
|||
//! Parameter-granularity points-to analysis.
|
||||
//!
|
||||
//! Produces a [`PointsToSummary`] for a function body by walking the SSA
|
||||
//! once and recording two classes of aliasing:
|
||||
//!
|
||||
//! 1. **Param → Param field writes.** An `obj.field = val` where `obj`
|
||||
//! traces back to parameter `b` and `val` traces back to parameter `a`
|
||||
//! emits a `Param(a) → Param(b)` `MayAlias` edge. This captures the
|
||||
//! `mutating_helper` pattern — the callee mutates a shared heap cell
|
||||
//! through one parameter and the caller observes the mutation through
|
||||
//! its argument for that parameter.
|
||||
//!
|
||||
//! 2. **Param → Return aliases.** `Terminator::Return(v)` where `v`
|
||||
//! traces back to a parameter emits a `Param(i) → Return` edge. This
|
||||
//! captures the `returned_alias` pattern — the callee returns its
|
||||
//! argument unchanged and the caller treats the result as aliasing the
|
||||
//! input.
|
||||
//!
|
||||
//! Field-write detection uses the existing SSA lowering convention: a
|
||||
//! source-level `obj.x = val` is lowered to an `Assign` whose `var_name`
|
||||
//! is the dotted path `"obj.x"`, plus synthetic parent-path Assigns that
|
||||
//! propagate the write up to the base (`"obj"`). See
|
||||
//! [`crate::ssa::lower`]'s "Synthetic base update" block for the
|
||||
//! canonical source.
|
||||
//!
|
||||
//! The analysis is **flow-insensitive** and **bounded**: it does not
|
||||
//! reason about path feasibility, and it stops adding edges once the
|
||||
//! summary's [`MAX_ALIAS_EDGES`] cap is reached — the overflow flag is
|
||||
//! the conservative fallback that callers honour.
|
||||
|
||||
use std::collections::{HashMap, HashSet};
|
||||
|
||||
use smallvec::SmallVec;
|
||||
|
||||
use crate::summary::points_to::{AliasKind, AliasPosition, PointsToSummary};
|
||||
use crate::symbol::Lang;
|
||||
|
||||
use super::ir::{SsaBody, SsaOp, SsaValue, Terminator};
|
||||
|
||||
/// Map an SSA value back to its defining instruction's op.
|
||||
///
|
||||
/// Local to this module — the taint engine has its own `build_inst_map`
|
||||
/// that also carries receiver info we do not need, and duplicating it
|
||||
/// keeps this analysis independent of that private helper's shape.
|
||||
fn build_op_map(ssa: &SsaBody) -> HashMap<SsaValue, SsaOp> {
|
||||
let mut map = HashMap::with_capacity(ssa.num_values());
|
||||
for block in &ssa.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
map.insert(inst.value, inst.op.clone());
|
||||
}
|
||||
}
|
||||
map
|
||||
}
|
||||
|
||||
/// Sibling of [`build_op_map`] that captures the optional `var_name`
|
||||
/// recorded on each SSA instruction. Used alongside the op map so a
|
||||
/// [`ParamHit`] can surface the underlying variable name for
|
||||
/// formal-index resolution.
|
||||
fn build_var_name_map(ssa: &SsaBody) -> HashMap<SsaValue, Option<String>> {
|
||||
let mut map = HashMap::with_capacity(ssa.num_values());
|
||||
for block in &ssa.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
map.insert(inst.value, inst.var_name.clone());
|
||||
}
|
||||
}
|
||||
map
|
||||
}
|
||||
|
||||
/// Information about an SSA `Param { index }` node needed to resolve
|
||||
/// back to a caller-side positional index via formal-params lookup.
|
||||
#[derive(Clone, Debug)]
|
||||
struct ParamHit {
|
||||
/// The `SsaOp::Param` index as lowered.
|
||||
ssa_index: usize,
|
||||
/// The parameter's variable name (from [`SsaInst::var_name`]). Used
|
||||
/// to map back to the formal-declaration position — the caller's
|
||||
/// `args[i]` slot is keyed by declaration position, not by SSA
|
||||
/// index, and the two can disagree when a formal parameter is
|
||||
/// skipped from SSA lowering (e.g., pure-output params).
|
||||
var_name: Option<String>,
|
||||
}
|
||||
|
||||
/// Walk Assign/Phi chains to find a backing `Param { index }` SSA op.
|
||||
///
|
||||
/// Returns the `SsaOp::Param`'s index *and* its var_name so callers can
|
||||
/// resolve the formal-positional index via the name lookup table — the
|
||||
/// two indices can disagree when SSA lowering skips a formal parameter
|
||||
/// (never used as a read), shifting subsequent param indices down.
|
||||
fn trace_to_param_hit(
|
||||
v: SsaValue,
|
||||
op_map: &HashMap<SsaValue, SsaOp>,
|
||||
var_names: &HashMap<SsaValue, Option<String>>,
|
||||
visited: &mut HashSet<SsaValue>,
|
||||
) -> Option<ParamHit> {
|
||||
if !visited.insert(v) {
|
||||
return None;
|
||||
}
|
||||
match op_map.get(&v)? {
|
||||
SsaOp::Param { index } => Some(ParamHit {
|
||||
ssa_index: *index,
|
||||
var_name: var_names.get(&v).cloned().flatten(),
|
||||
}),
|
||||
SsaOp::Assign(uses) => {
|
||||
for u in uses {
|
||||
if let Some(hit) = trace_to_param_hit(*u, op_map, var_names, visited) {
|
||||
return Some(hit);
|
||||
}
|
||||
}
|
||||
None
|
||||
}
|
||||
SsaOp::Phi(operands) => {
|
||||
for (_, pv) in operands {
|
||||
if let Some(hit) = trace_to_param_hit(*pv, op_map, var_names, visited) {
|
||||
return Some(hit);
|
||||
}
|
||||
}
|
||||
None
|
||||
}
|
||||
// Call produces a fresh identity; Const / Source / CatchParam /
|
||||
// SelfParam / Nop are not param-derived.
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Resolve a [`ParamHit`] to a caller-side positional index using the
|
||||
/// formal-params name lookup. Falls back to the SSA `index` when no
|
||||
/// name-based match exists (e.g., extractor called without
|
||||
/// `formal_param_names`).
|
||||
fn param_hit_to_formal_index(hit: &ParamHit, params_by_name: &HashMap<String, usize>) -> usize {
|
||||
if let Some(name) = &hit.var_name
|
||||
&& let Some(&idx) = params_by_name.get(name)
|
||||
{
|
||||
return idx;
|
||||
}
|
||||
hit.ssa_index
|
||||
}
|
||||
|
||||
/// Parse the base of a dotted / indexed path into its root name.
|
||||
///
|
||||
/// * `"obj"` → `"obj"`
|
||||
/// * `"obj.field"` → `"obj"`
|
||||
/// * `"obj.field.sub"` → `"obj"`
|
||||
/// * `"obj[0]"` → `"obj"`
|
||||
/// * `"obj.list[2].name"` → `"obj"`
|
||||
///
|
||||
/// Used to decide whether a field-style Assign's LHS base names a
|
||||
/// parameter variable — we strip everything after the first separator
|
||||
/// and compare the remainder to the recorded param names.
|
||||
fn base_of_path(name: &str) -> &str {
|
||||
let dot = name.find('.');
|
||||
let bracket = name.find('[');
|
||||
let end = match (dot, bracket) {
|
||||
(Some(d), Some(b)) => d.min(b),
|
||||
(Some(d), None) => d,
|
||||
(None, Some(b)) => b,
|
||||
(None, None) => return name,
|
||||
};
|
||||
&name[..end]
|
||||
}
|
||||
|
||||
/// Local receiver check duplicated to avoid depending on private
|
||||
/// `lower::is_receiver_name`. Must stay in sync with that helper.
|
||||
fn is_receiver_name_local(name: &str) -> bool {
|
||||
matches!(name, "self" | "this")
|
||||
}
|
||||
|
||||
/// Walk Assign/Phi chains from a return value to decide whether the path
|
||||
/// ends at a fresh container allocation (literal or constructor call).
|
||||
///
|
||||
/// Returns `true` the first time a qualifying allocation is found.
|
||||
/// Parameter-terminated paths, `Call` ops that are not container
|
||||
/// constructors, and constants that are not container literals all
|
||||
/// return `false` — soundly under-approximating, since the caller will
|
||||
/// simply fall back to the existing `Param(i) → Return` / store-into-
|
||||
/// heap channels when the flag is absent.
|
||||
fn trace_to_fresh_alloc(
|
||||
v: SsaValue,
|
||||
op_map: &HashMap<SsaValue, SsaOp>,
|
||||
lang: Option<Lang>,
|
||||
visited: &mut HashSet<SsaValue>,
|
||||
) -> bool {
|
||||
if !visited.insert(v) {
|
||||
return false;
|
||||
}
|
||||
let Some(op) = op_map.get(&v) else {
|
||||
return false;
|
||||
};
|
||||
match op {
|
||||
SsaOp::Const(Some(text)) => crate::ssa::heap::is_container_literal_public(text),
|
||||
SsaOp::Call { callee, .. } => lang
|
||||
.map(|l| crate::ssa::heap::is_container_constructor(callee, l))
|
||||
.unwrap_or(false),
|
||||
SsaOp::Assign(uses) => uses
|
||||
.iter()
|
||||
.any(|u| trace_to_fresh_alloc(*u, op_map, lang, visited)),
|
||||
SsaOp::Phi(operands) => operands
|
||||
.iter()
|
||||
.any(|(_, pv)| trace_to_fresh_alloc(*pv, op_map, lang, visited)),
|
||||
_ => false,
|
||||
}
|
||||
}
|
||||
|
||||
/// Whether any `Terminator::Return(Some(v))` in the body traces back to a
|
||||
/// fresh container allocation. Invoked once per function; the visited
|
||||
/// set is fresh per return block so distinct returns do not poison each
|
||||
/// other's searches.
|
||||
fn returns_fresh_allocation(
|
||||
ssa: &SsaBody,
|
||||
op_map: &HashMap<SsaValue, SsaOp>,
|
||||
lang: Option<Lang>,
|
||||
) -> bool {
|
||||
for block in &ssa.blocks {
|
||||
let Terminator::Return(Some(v)) = block.terminator else {
|
||||
continue;
|
||||
};
|
||||
let mut visited = HashSet::new();
|
||||
if trace_to_fresh_alloc(v, op_map, lang, &mut visited) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
false
|
||||
}
|
||||
|
||||
/// Compute the parameter-granularity points-to summary for a function.
|
||||
///
|
||||
/// `param_info` carries one `(param_index, param_name, param_ssa_value)`
|
||||
/// tuple per formal parameter that was emitted as [`SsaOp::Param`] in the
|
||||
/// lowered body. The receiver is intentionally excluded — this table
|
||||
/// captures positional parameters only.
|
||||
///
|
||||
/// `formal_param_names`, when supplied, is the authoritative list of
|
||||
/// declared parameter names in declaration order. It matters for
|
||||
/// **pure-output parameters**: a param like `target` in
|
||||
/// `fn set(target, val): target.data = val` is never *used* in the body
|
||||
/// (only assigned into), so SSA lowering does not emit a `Param` node
|
||||
/// for it and `param_info` will not contain it. Falling back to
|
||||
/// `formal_param_names` lets the base-name lookup still find its index.
|
||||
///
|
||||
/// `formal_param_count` bounds the parameter indices written to the
|
||||
/// summary: scoped lowering synthesises `Param` ops for module-level
|
||||
/// captures at indices beyond the formal arity, and those must not leak
|
||||
/// into the summary (they would trip [`crate::summary::ssa_summary_fits_arity`]).
|
||||
pub fn analyse_param_points_to(
|
||||
ssa: &SsaBody,
|
||||
param_info: &[(usize, String, SsaValue)],
|
||||
formal_param_count: usize,
|
||||
formal_param_names: Option<&[String]>,
|
||||
lang: Option<Lang>,
|
||||
) -> PointsToSummary {
|
||||
let mut summary = PointsToSummary::empty();
|
||||
|
||||
let op_map = build_op_map(ssa);
|
||||
let var_names = build_var_name_map(ssa);
|
||||
|
||||
// ── 0. Fresh-container return detection ─────────────────────────────
|
||||
//
|
||||
// A return path traces back to either:
|
||||
// * `SsaOp::Const(text)` where `text` is a container literal
|
||||
// (`[]`, `{}`, `new Map()`, …), OR
|
||||
// * `SsaOp::Call { callee, … }` where `callee` matches a known
|
||||
// container constructor for `lang` (`ArrayList`, `dict`, …).
|
||||
//
|
||||
// When at least one return path matches, the callee produces a
|
||||
// caller-visible fresh heap identity on that path — callers
|
||||
// synthesise a `HeapObjectId` keyed on the call result so later
|
||||
// container operations have a stable heap cell. Traces that reach a
|
||||
// parameter are handled by the edge-based `Param(i) → Return` channel
|
||||
// below and do not contribute here; a mixed function emits both.
|
||||
//
|
||||
// Runs before the early-out on `formal_param_count == 0` so pure
|
||||
// factories (zero-param container constructors) still record the
|
||||
// fresh-alloc signal.
|
||||
if returns_fresh_allocation(ssa, &op_map, lang) {
|
||||
summary.returns_fresh_alloc = true;
|
||||
}
|
||||
|
||||
if formal_param_count == 0 {
|
||||
return summary;
|
||||
}
|
||||
// Build the name→positional-index map. Summary param indices are
|
||||
// *positional* — they match the call-site `args[i]` position, which
|
||||
// excludes the receiver (`self`/`this`). When `formal_param_names`
|
||||
// contains a leading receiver, skip it so the remaining names align
|
||||
// with the SSA `SsaOp::Param { index }` convention.
|
||||
let mut params_by_name: HashMap<String, usize> = HashMap::new();
|
||||
if let Some(names) = formal_param_names {
|
||||
let mut pos: usize = 0;
|
||||
for name in names {
|
||||
if is_receiver_name_local(name) {
|
||||
continue;
|
||||
}
|
||||
if pos >= formal_param_count {
|
||||
break;
|
||||
}
|
||||
params_by_name.insert(name.clone(), pos);
|
||||
pos += 1;
|
||||
}
|
||||
}
|
||||
// Overlay `param_info` ONLY when formal_param_names was absent.
|
||||
// When formal_param_names is supplied it is the authoritative
|
||||
// declaration-order mapping; SSA param indices can legitimately
|
||||
// diverge (a pure-output param is never emitted, shifting later
|
||||
// indices down), so trusting SSA here would mis-map the caller's
|
||||
// `args[i]` positional slot.
|
||||
if formal_param_names.is_none() {
|
||||
for (idx, name, _) in param_info {
|
||||
params_by_name.insert(name.clone(), *idx);
|
||||
}
|
||||
}
|
||||
|
||||
// ── 1. Field-store alias edges (Param(a) → Param(b)) ────────────────
|
||||
//
|
||||
// SSA lowering encodes `obj.field = val` as one or more Assigns whose
|
||||
// `var_name` is the dotted / indexed path. For every such Assign we
|
||||
// look up the root name, check it matches a parameter variable, and
|
||||
// trace each use back to a param for the `Param(a) → Param(b)` edge.
|
||||
for block in &ssa.blocks {
|
||||
for inst in block.body.iter() {
|
||||
let SsaOp::Assign(uses) = &inst.op else {
|
||||
continue;
|
||||
};
|
||||
let Some(name) = inst.var_name.as_ref() else {
|
||||
continue;
|
||||
};
|
||||
// Only field/index-style writes encode the base in var_name;
|
||||
// a plain `x = ...` doesn't imply aliasing with `x`'s param.
|
||||
if !name.contains('.') && !name.contains('[') {
|
||||
continue;
|
||||
}
|
||||
let base = base_of_path(name);
|
||||
let Some(&target_idx) = params_by_name.get(base) else {
|
||||
continue;
|
||||
};
|
||||
if target_idx >= formal_param_count {
|
||||
continue;
|
||||
}
|
||||
for u in uses {
|
||||
let mut visited = HashSet::new();
|
||||
let Some(hit) = trace_to_param_hit(*u, &op_map, &var_names, &mut visited) else {
|
||||
continue;
|
||||
};
|
||||
let src_idx = param_hit_to_formal_index(&hit, &params_by_name);
|
||||
if src_idx >= formal_param_count {
|
||||
continue;
|
||||
}
|
||||
if src_idx == target_idx {
|
||||
// Self-alias is uninformative — the caller's
|
||||
// arg-to-itself propagation is already covered by
|
||||
// `param_to_return`/`param_to_sink`.
|
||||
continue;
|
||||
}
|
||||
summary.insert(
|
||||
AliasPosition::Param(src_idx as u32),
|
||||
AliasPosition::Param(target_idx as u32),
|
||||
AliasKind::MayAlias,
|
||||
);
|
||||
if summary.overflow {
|
||||
return summary;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── 2. Return-alias edges (Param(i) → Return) ───────────────────────
|
||||
//
|
||||
// `Terminator::Return(v)` with `v` tracing back to a parameter means
|
||||
// the call site's result aliases the corresponding argument's heap
|
||||
// identity. Joining across all return blocks is a plain set union.
|
||||
let mut return_param_indices: SmallVec<[usize; 4]> = SmallVec::new();
|
||||
for block in &ssa.blocks {
|
||||
let Terminator::Return(Some(v)) = block.terminator else {
|
||||
continue;
|
||||
};
|
||||
let mut visited = HashSet::new();
|
||||
if let Some(hit) = trace_to_param_hit(v, &op_map, &var_names, &mut visited) {
|
||||
let idx = param_hit_to_formal_index(&hit, &params_by_name);
|
||||
if idx < formal_param_count && !return_param_indices.contains(&idx) {
|
||||
return_param_indices.push(idx);
|
||||
}
|
||||
}
|
||||
}
|
||||
for idx in return_param_indices {
|
||||
summary.insert(
|
||||
AliasPosition::Param(idx as u32),
|
||||
AliasPosition::Return,
|
||||
AliasKind::MayAlias,
|
||||
);
|
||||
if summary.overflow {
|
||||
return summary;
|
||||
}
|
||||
}
|
||||
|
||||
summary
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::ssa::ir::{BlockId, SsaBlock, SsaInst};
|
||||
use petgraph::graph::NodeIndex;
|
||||
use smallvec::smallvec;
|
||||
|
||||
fn mk_body(blocks: Vec<SsaBlock>, num_values: u32) -> SsaBody {
|
||||
use crate::ssa::ir::ValueDef;
|
||||
let value_defs = (0..num_values)
|
||||
.map(|_| ValueDef {
|
||||
var_name: None,
|
||||
cfg_node: NodeIndex::new(0),
|
||||
block: BlockId(0),
|
||||
})
|
||||
.collect();
|
||||
SsaBody {
|
||||
blocks,
|
||||
entry: BlockId(0),
|
||||
value_defs,
|
||||
cfg_node_map: HashMap::new(),
|
||||
exception_edges: vec![],
|
||||
}
|
||||
}
|
||||
|
||||
fn inst(v: u32, op: SsaOp, var_name: Option<&str>) -> SsaInst {
|
||||
SsaInst {
|
||||
value: SsaValue(v),
|
||||
op,
|
||||
cfg_node: NodeIndex::new(0),
|
||||
var_name: var_name.map(String::from),
|
||||
span: (0, 0),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn field_write_param_to_param_emits_edge() {
|
||||
// Simulate:
|
||||
// fn f(a, b):
|
||||
// b.data = a # Assign var_name="b.data" uses=[a_ssa]
|
||||
// synthetic: b = b.data # Assign var_name="b" uses=[assign0]
|
||||
// return
|
||||
let block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
inst(0, SsaOp::Param { index: 0 }, Some("a")),
|
||||
inst(1, SsaOp::Param { index: 1 }, Some("b")),
|
||||
inst(2, SsaOp::Assign(smallvec![SsaValue(0)]), Some("b.data")),
|
||||
inst(3, SsaOp::Assign(smallvec![SsaValue(2)]), Some("b")),
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![],
|
||||
};
|
||||
let body = mk_body(vec![block], 4);
|
||||
let pinfo = vec![
|
||||
(0usize, "a".to_string(), SsaValue(0)),
|
||||
(1usize, "b".to_string(), SsaValue(1)),
|
||||
];
|
||||
let s = analyse_param_points_to(&body, &pinfo, 2, None, None);
|
||||
assert!(!s.overflow, "unexpected overflow: {s:?}");
|
||||
assert!(
|
||||
s.edges.iter().any(|e| e.source == AliasPosition::Param(0)
|
||||
&& e.target == AliasPosition::Param(1)
|
||||
&& e.kind == AliasKind::MayAlias),
|
||||
"expected Param(0) → Param(1) edge, got {s:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn return_alias_emits_edge() {
|
||||
// fn f(a): return a
|
||||
let block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![inst(0, SsaOp::Param { index: 0 }, Some("a"))],
|
||||
terminator: Terminator::Return(Some(SsaValue(0))),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![],
|
||||
};
|
||||
let body = mk_body(vec![block], 1);
|
||||
let pinfo = vec![(0usize, "a".to_string(), SsaValue(0))];
|
||||
let s = analyse_param_points_to(&body, &pinfo, 1, None, None);
|
||||
assert!(!s.overflow);
|
||||
assert_eq!(s.edges.len(), 1);
|
||||
assert_eq!(s.edges[0].source, AliasPosition::Param(0));
|
||||
assert_eq!(s.edges[0].target, AliasPosition::Return);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn self_alias_is_dropped() {
|
||||
// fn f(b): b.data = b.x (reads b.x, writes b.data)
|
||||
// Both uses trace back to Param(0) and base is Param(0) →
|
||||
// self-alias is uninformative, no edge emitted.
|
||||
let block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
inst(0, SsaOp::Param { index: 0 }, Some("b")),
|
||||
inst(1, SsaOp::Assign(smallvec![SsaValue(0)]), Some("b.x")),
|
||||
inst(2, SsaOp::Assign(smallvec![SsaValue(1)]), Some("b.data")),
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![],
|
||||
};
|
||||
let body = mk_body(vec![block], 3);
|
||||
let pinfo = vec![(0usize, "b".to_string(), SsaValue(0))];
|
||||
let s = analyse_param_points_to(&body, &pinfo, 1, None, None);
|
||||
assert!(
|
||||
s.is_empty(),
|
||||
"self-alias edges should not be emitted: {s:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn out_of_range_param_rejected() {
|
||||
// Synthetic Param with index >= formal_param_count must not leak
|
||||
// into the summary (it would trip ssa_summary_fits_arity).
|
||||
let block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![
|
||||
inst(0, SsaOp::Param { index: 5 }, Some("capture")),
|
||||
inst(1, SsaOp::Param { index: 1 }, Some("b")),
|
||||
inst(2, SsaOp::Assign(smallvec![SsaValue(0)]), Some("b.data")),
|
||||
],
|
||||
terminator: Terminator::Return(None),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![],
|
||||
};
|
||||
let body = mk_body(vec![block], 3);
|
||||
let pinfo = vec![
|
||||
(5usize, "capture".to_string(), SsaValue(0)),
|
||||
(1usize, "b".to_string(), SsaValue(1)),
|
||||
];
|
||||
// formal_param_count = 2 — index 5 is out of range.
|
||||
let s = analyse_param_points_to(&body, &pinfo, 2, None, None);
|
||||
assert!(
|
||||
s.is_empty(),
|
||||
"synthetic captures past formal arity must not emit edges: {s:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn bounded_graph_overflows_at_cap() {
|
||||
// Return a Phi over MAX_ALIAS_EDGES+2 params, i.e. more candidate
|
||||
// param→return sources than the summary's edge cap allows.
|
||||
let n = (crate::summary::points_to::MAX_ALIAS_EDGES + 2) as u32;
|
||||
let mut insts = Vec::new();
|
||||
let mut phi_operands: SmallVec<[(BlockId, SsaValue); 2]> = SmallVec::new();
|
||||
for i in 0..n {
|
||||
insts.push(inst(
|
||||
i,
|
||||
SsaOp::Param { index: i as usize },
|
||||
Some(&format!("p{i}")),
|
||||
));
|
||||
phi_operands.push((BlockId(0), SsaValue(i)));
|
||||
}
|
||||
let phi_v = n;
|
||||
insts.push(inst(phi_v, SsaOp::Phi(phi_operands), Some("ret")));
|
||||
let block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: insts,
|
||||
terminator: Terminator::Return(Some(SsaValue(phi_v))),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![],
|
||||
};
|
||||
let body = mk_body(vec![block], n + 1);
|
||||
let pinfo: Vec<(usize, String, SsaValue)> = (0..n as usize)
|
||||
.map(|i| (i, format!("p{i}"), SsaValue(i as u32)))
|
||||
.collect();
|
||||
// Only the first traced param is emitted (`trace_to_param_hit`
|
||||
// short-circuits on the first match), so overflow is not expected; we
|
||||
// instead verify the bounded behaviour: a single edge.
|
||||
let s = analyse_param_points_to(&body, &pinfo, n as usize, None, None);
|
||||
assert!(!s.overflow);
|
||||
assert_eq!(s.edges.len(), 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn fresh_container_literal_return_sets_flag() {
|
||||
// fn makeBag() { return []; }
|
||||
// v0 = Const("[]")
|
||||
// terminator: Return(v0)
|
||||
let block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![inst(0, SsaOp::Const(Some("[]".to_string())), None)],
|
||||
terminator: Terminator::Return(Some(SsaValue(0))),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![],
|
||||
};
|
||||
let body = mk_body(vec![block], 1);
|
||||
let s = analyse_param_points_to(&body, &[], 0, None, Some(Lang::JavaScript));
|
||||
assert!(s.returns_fresh_alloc);
|
||||
assert!(s.edges.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn constructor_return_sets_flag() {
|
||||
// fn makeList() { return list(); }
|
||||
// v0 = Call("list", [])
|
||||
// terminator: Return(v0)
|
||||
let block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![inst(
|
||||
0,
|
||||
SsaOp::Call {
|
||||
callee: "list".to_string(),
|
||||
args: vec![],
|
||||
receiver: None,
|
||||
},
|
||||
None,
|
||||
)],
|
||||
terminator: Terminator::Return(Some(SsaValue(0))),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![],
|
||||
};
|
||||
let body = mk_body(vec![block], 1);
|
||||
let s = analyse_param_points_to(&body, &[], 0, None, Some(Lang::Python));
|
||||
assert!(s.returns_fresh_alloc);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn return_of_param_does_not_set_fresh_flag() {
|
||||
// fn identity(a) { return a; }
|
||||
let block = SsaBlock {
|
||||
id: BlockId(0),
|
||||
phis: vec![],
|
||||
body: vec![inst(0, SsaOp::Param { index: 0 }, Some("a"))],
|
||||
terminator: Terminator::Return(Some(SsaValue(0))),
|
||||
preds: smallvec![],
|
||||
succs: smallvec![],
|
||||
};
|
||||
let body = mk_body(vec![block], 1);
|
||||
let pinfo = vec![(0usize, "a".to_string(), SsaValue(0))];
|
||||
let s = analyse_param_points_to(&body, &pinfo, 1, None, Some(Lang::JavaScript));
|
||||
assert!(
|
||||
!s.returns_fresh_alloc,
|
||||
"param-only return must not set fresh-alloc flag"
|
||||
);
|
||||
// But the Param(0) → Return edge must still be emitted.
|
||||
assert!(
|
||||
s.edges
|
||||
.iter()
|
||||
.any(|e| e.source == AliasPosition::Param(0) && e.target == AliasPosition::Return),
|
||||
"expected Param(0) → Return edge, got {s:?}"
|
||||
);
|
||||
}
|
||||
}
|
||||
314
src/ssa/pointsto.rs
Normal file
|
|
@ -0,0 +1,314 @@
|
|||
//! Container operation classification for taint propagation.
//!
//! Recognises common container store/load patterns (push, pop, get, set, etc.)
//! across all supported languages so that taint flows correctly through
//! collection operations.
|
||||
|
||||
use crate::symbol::Lang;
|
||||
use smallvec::SmallVec;
|
||||
|
||||
// ── Container operation model ───────────────────────────────────────────
|
||||
|
||||
/// Describes how a container method moves taint.
|
||||
#[derive(Clone, Debug, PartialEq, Eq)]
|
||||
pub enum ContainerOp {
|
||||
/// Taint flows from the listed argument positions into the receiver
|
||||
/// container (e.g. `arr.push(val)` — val taint merges into arr).
|
||||
///
|
||||
/// `index_arg`: when `Some(pos)`, the argument at that logical position
|
||||
/// is the container index/key. If constant-propagation proves it a
|
||||
/// non-negative integer, the taint engine stores into `HeapSlot::Index(n)`
|
||||
/// instead of `HeapSlot::Elements`. `None` → always `Elements`.
|
||||
Store {
|
||||
value_args: SmallVec<[usize; 2]>,
|
||||
index_arg: Option<usize>,
|
||||
},
|
||||
/// Taint flows from the receiver container to the call's return value
|
||||
/// (e.g. `arr.pop()`, `items.join('')`).
|
||||
///
|
||||
/// `index_arg`: same semantics as `Store::index_arg` — when present and
|
||||
/// provably constant, loads from `HeapSlot::Index(n)`.
|
||||
Load { index_arg: Option<usize> },
|
||||
}
|
||||
|
||||
/// Convenience: store with a single value argument, no index tracking.
|
||||
#[inline]
|
||||
fn store(pos: usize) -> Option<ContainerOp> {
|
||||
let mut v = SmallVec::new();
|
||||
v.push(pos);
|
||||
Some(ContainerOp::Store {
|
||||
value_args: v,
|
||||
index_arg: None,
|
||||
})
|
||||
}
|
||||
|
||||
/// Convenience: store with index tracking. `val_pos` is the value arg,
|
||||
/// `idx_pos` is the index/key arg (resolved via const propagation).
|
||||
#[inline]
|
||||
fn store_indexed(val_pos: usize, idx_pos: usize) -> Option<ContainerOp> {
|
||||
let mut v = SmallVec::new();
|
||||
v.push(val_pos);
|
||||
Some(ContainerOp::Store {
|
||||
value_args: v,
|
||||
index_arg: Some(idx_pos),
|
||||
})
|
||||
}
|
||||
|
||||
/// Convenience: store with two value arguments, no index tracking.
|
||||
#[inline]
|
||||
fn store2(a: usize, b: usize) -> Option<ContainerOp> {
|
||||
let mut v = SmallVec::new();
|
||||
v.push(a);
|
||||
v.push(b);
|
||||
Some(ContainerOp::Store {
|
||||
value_args: v,
|
||||
index_arg: None,
|
||||
})
|
||||
}
|
||||
|
||||
/// Convenience: load without index tracking.
|
||||
#[inline]
|
||||
fn load() -> Option<ContainerOp> {
|
||||
Some(ContainerOp::Load { index_arg: None })
|
||||
}
|
||||
|
||||
/// Convenience: load with index tracking. `idx_pos` is the index/key arg.
|
||||
#[inline]
|
||||
fn load_indexed(idx_pos: usize) -> Option<ContainerOp> {
|
||||
Some(ContainerOp::Load {
|
||||
index_arg: Some(idx_pos),
|
||||
})
|
||||
}
|
||||
|
||||
// ── Classification ──────────────────────────────────────────────────────
|
||||
|
||||
/// Classify a callee as a container operation for the given language.
|
||||
///
|
||||
/// `callee` is the raw callee string from `NodeInfo.callee` (e.g.
|
||||
/// `"items.push"`, `"arr.pop"`). We extract the last segment after `.`
|
||||
/// for method matching. For Go builtins (e.g. `"append"`), the full name
|
||||
/// is used.
|
||||
///
|
||||
/// Returns `None` if the callee is not a recognised container operation.
|
||||
pub fn classify_container_op(callee: &str, lang: Lang) -> Option<ContainerOp> {
|
||||
// Extract method name: last segment after '.' (or full name if no dot).
|
||||
let method = callee.rsplit('.').next().unwrap_or(callee);
|
||||
|
||||
match lang {
|
||||
Lang::JavaScript | Lang::TypeScript => classify_js(method),
|
||||
Lang::Python => classify_python(method),
|
||||
Lang::Java => classify_java(method),
|
||||
Lang::Go => classify_go(method, callee),
|
||||
Lang::Ruby => classify_ruby(method),
|
||||
Lang::Php => classify_php(method),
|
||||
Lang::C | Lang::Cpp => classify_cpp(method),
|
||||
Lang::Rust => classify_rust(method),
|
||||
}
|
||||
}
|
||||
|
||||
// ── Per-language classifiers ────────────────────────────────────────────
|
||||
|
||||
fn classify_js(method: &str) -> Option<ContainerOp> {
|
||||
match method {
|
||||
// Array store
|
||||
"push" | "unshift" => store(0),
|
||||
// Map store: map.set(key, value), key at 0, value at 1
|
||||
"set" => store_indexed(1, 0),
|
||||
"add" => store(0), // set.add(value)
|
||||
// Array/Map load
|
||||
"pop" | "shift" => load(),
|
||||
"join" | "flat" | "concat" | "slice" | "toString" => load(),
|
||||
// map.get(key) — key at 0
|
||||
"get" => load_indexed(0),
|
||||
"values" | "keys" | "entries" => load(),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
fn classify_python(method: &str) -> Option<ContainerOp> {
|
||||
match method {
|
||||
// List store
|
||||
"append" | "extend" => store(0),
|
||||
"insert" => store_indexed(1, 0), // list.insert(index, value) — index at 0, value at 1
|
||||
// Set store
|
||||
"add" => store(0),
|
||||
// Dict store
|
||||
"update" => store(0),
|
||||
"setdefault" => store2(0, 1), // dict.setdefault(key, default)
|
||||
// List/Dict load
"pop" => load(),
"get" => load_indexed(0), // dict.get(key): key at 0 (lists have no `.get`)
|
||||
"items" | "values" | "keys" => load(),
|
||||
"join" => load(),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
fn classify_java(method: &str) -> Option<ContainerOp> {
|
||||
match method {
|
||||
// Collection store
|
||||
"add" | "addAll" | "putAll" | "offer" | "push" => store(0),
|
||||
// ArrayList.set(index, value) — index at 0, value at 1
|
||||
"set" => store_indexed(1, 0),
|
||||
// Map.put(key, value) — key at 0, value at 1
|
||||
"put" => store_indexed(1, 0),
|
||||
// Collection load: ArrayList.get(index) — index at 0
|
||||
"get" => load_indexed(0),
|
||||
"poll" | "peek" | "remove" | "pop" => load(),
|
||||
"stream" | "toArray" | "iterator" => load(),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
fn classify_go(method: &str, callee: &str) -> Option<ContainerOp> {
|
||||
// Go `append` is a builtin: `result = append(slice, val1, val2, ...)`
|
||||
// The callee is just "append" (no receiver dot-path).
|
||||
if callee == "append" || method == "append" {
|
||||
// arg 0 = existing slice, args 1+ = values to append.
|
||||
// Handled specially in try_container_propagation (Go append mode).
|
||||
return store(1);
|
||||
}
|
||||
// Map/slice operations in Go are via index expressions, not method calls,
|
||||
// so there are fewer method-based patterns.
|
||||
match method {
|
||||
"Add" | "Set" | "Store" | "Put" => store(0),
|
||||
"Get" | "Load" | "Pop" => load(),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
fn classify_ruby(method: &str) -> Option<ContainerOp> {
|
||||
match method {
|
||||
"push" | "append" | "unshift" | "store" | "<<" => store(0),
|
||||
"pop" | "shift" | "first" | "last" | "fetch" | "join" => load(),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
fn classify_php(method: &str) -> Option<ContainerOp> {
|
||||
match method {
|
||||
"array_push" => store(1), // array_push(&$arr, $val) — arr is arg 0, val is arg 1
|
||||
"array_pop" | "array_shift" | "current" | "next" | "reset" => load(),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
fn classify_cpp(method: &str) -> Option<ContainerOp> {
|
||||
match method {
|
||||
"push_back" | "emplace_back" | "insert" | "emplace" | "push" => store(0),
|
||||
"front" | "back" | "pop_back" | "pop_front" | "top" => load(),
|
||||
// vector.at(index) — index at 0
|
||||
"at" => load_indexed(0),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
fn classify_rust(method: &str) -> Option<ContainerOp> {
|
||||
match method {
|
||||
"push" | "insert" | "extend" => store(0),
|
||||
"pop" | "first" | "last" | "iter" | "remove" => load(),
|
||||
// vec.get(index) — index at 0
|
||||
"get" => load_indexed(0),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
// ── Tests ───────────────────────────────────────────────────────────────
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn js_push_is_store() {
|
||||
let op = classify_container_op("items.push", Lang::JavaScript);
|
||||
assert!(matches!(op, Some(ContainerOp::Store { .. })));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn js_pop_is_load() {
|
||||
let op = classify_container_op("arr.pop", Lang::JavaScript);
|
||||
assert!(matches!(op, Some(ContainerOp::Load { .. })));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn js_join_is_load() {
|
||||
let op = classify_container_op("items.join", Lang::JavaScript);
|
||||
assert!(matches!(op, Some(ContainerOp::Load { .. })));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn python_append_is_store() {
|
||||
let op = classify_container_op("commands.append", Lang::Python);
|
||||
assert!(matches!(op, Some(ContainerOp::Store { .. })));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn java_add_is_store() {
|
||||
let op = classify_container_op("list.add", Lang::Java);
|
||||
assert!(matches!(op, Some(ContainerOp::Store { .. })));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn go_append_is_store() {
|
||||
let op = classify_container_op("append", Lang::Go);
|
||||
assert!(matches!(op, Some(ContainerOp::Store { .. })));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unknown_method_is_none() {
|
||||
assert!(classify_container_op("obj.frobnicate", Lang::JavaScript).is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rust_push_is_store() {
|
||||
let op = classify_container_op("vec.push", Lang::Rust);
|
||||
assert!(matches!(op, Some(ContainerOp::Store { .. })));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn store_value_args_correct() {
|
||||
// JS set → value at arg 1, index at arg 0
|
||||
if let Some(ContainerOp::Store {
|
||||
value_args,
|
||||
index_arg,
|
||||
}) = classify_container_op("map.set", Lang::JavaScript)
|
||||
{
|
||||
assert_eq!(value_args.as_slice(), &[1]);
|
||||
assert_eq!(index_arg, Some(0));
|
||||
} else {
|
||||
panic!("expected Store");
|
||||
}
|
||||
// JS push → value at arg 0, no index
|
||||
if let Some(ContainerOp::Store {
|
||||
value_args,
|
||||
index_arg,
|
||||
}) = classify_container_op("arr.push", Lang::JavaScript)
|
||||
{
|
||||
assert_eq!(value_args.as_slice(), &[0]);
|
||||
assert_eq!(index_arg, None);
|
||||
} else {
|
||||
panic!("expected Store");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn load_index_arg_correct() {
|
||||
// JS get → index at arg 0
|
||||
if let Some(ContainerOp::Load { index_arg }) =
|
||||
classify_container_op("map.get", Lang::JavaScript)
|
||||
{
|
||||
assert_eq!(index_arg, Some(0));
|
||||
} else {
|
||||
panic!("expected Load");
|
||||
}
|
||||
// JS pop → no index
|
||||
if let Some(ContainerOp::Load { index_arg }) =
|
||||
classify_container_op("arr.pop", Lang::JavaScript)
|
||||
{
|
||||
assert_eq!(index_arg, None);
|
||||
} else {
|
||||
panic!("expected Load");
|
||||
}
|
||||
}
|
||||
}
|
||||
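The classification rule described in `src/ssa/pointsto.rs` can be illustrated with a small standalone sketch. It is not the crate's code: `Op`, `Slot`, `method_name`, `classify_js_sketch`, and `select_slot` are hypothetical stand-ins, `Vec` replaces `SmallVec`, and only a few JavaScript arms are reproduced. The slot rule follows the `ContainerOp::Store` doc comment (a proven non-negative integer constant selects an indexed slot, anything else the merged elements slot); how the real taint engine consumes that choice is not shown in this diff.

```rust
// Hypothetical, simplified stand-ins for `ContainerOp` and `HeapSlot`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Op {
    /// Taint flows from the listed argument positions into the receiver.
    Store { value_args: Vec<usize>, index_arg: Option<usize> },
    /// Taint flows from the receiver to the call's return value.
    Load { index_arg: Option<usize> },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Slot {
    /// One statically known element.
    Index(u64),
    /// All elements merged together.
    Elements,
}

/// Last segment after '.', or the whole callee when there is no dot
/// (the no-dot case covers builtins such as Go's `append`).
fn method_name(callee: &str) -> &str {
    callee.rsplit('.').next().unwrap_or(callee)
}

/// A JavaScript-only classifier mirroring a few arms of the real table.
fn classify_js_sketch(callee: &str) -> Option<Op> {
    match method_name(callee) {
        "push" | "unshift" => Some(Op::Store { value_args: vec![0], index_arg: None }),
        // map.set(key, value): key at 0, value at 1
        "set" => Some(Op::Store { value_args: vec![1], index_arg: Some(0) }),
        "pop" | "shift" => Some(Op::Load { index_arg: None }),
        // map.get(key): key at 0
        "get" => Some(Op::Load { index_arg: Some(0) }),
        _ => None,
    }
}

/// Choose the slot for an indexed operation. `index_const` is the value
/// constant propagation proved for the index argument, if any.
fn select_slot(index_const: Option<i64>) -> Slot {
    match index_const {
        Some(n) if n >= 0 => Slot::Index(n as u64),
        _ => Slot::Elements,
    }
}
```

For instance, `classify_js_sketch("cache.set")` yields a `Store` with the value at argument 1 and the key at argument 0, and `select_slot(Some(-1))` falls back to `Slot::Elements` because a negative constant is not a usable index.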
446
src/ssa/static_map.rs
Normal file
|
|
@ -0,0 +1,446 @@
|
|||
#![allow(clippy::collapsible_if, clippy::redundant_closure)]
|
||||
|
||||
//! Static hash-map lookup abstract analysis.
//!
//! Recognises the idiom
//! ```ignore
//! let mut table = HashMap::new();
//! table.insert(K1, V1);
//! table.insert(K2, V2);
//! let cmd = table.get(k).copied().unwrap_or("safe");
//! ```
//! where every insert's *value* slot is a syntactic string literal and the
//! final lookup is unwrapped with a literal fallback (`.unwrap_or(LIT)`). The
//! result `cmd` is then provably bounded to the finite set
//! `{V1, V2, …, "safe"}`, regardless of what `k` carries, tainted or not.
//! Downstream sink suppression consumes this finite set to clear
//! SHELL/FILE/SQL injection findings whose payload is proved to be
//! metacharacter-free.
//!
//! ## SSA shape assumption
//!
//! The taint CFG collapses each method chain into **one** SSA `Call`
//! instruction whose `callee` text is the entire chain's "function" expression
//! (e.g. `"table.get(key).copied().unwrap_or"` for
//! `table.get(key).copied().unwrap_or("safe")`) and whose `receiver` is the
//! root identifier's SSA value. We therefore do not need to walk SSA
//! `.copied()` / `.unwrap_or` instructions as separate hops: pattern-matching
//! on the callee text is the source of truth. String-literal arguments that
//! the callee text elides (e.g. the fallback `"safe"`) are read from the CFG
//! node's `arg_string_literals`, populated during CFG construction.
//!
//! Scope is deliberately narrow: only same-function static maps, only
//! literal-valued inserts, no escape beyond recognised mutate/read methods.
//! Any deviation (dynamic insert, callee not in the allow-list, map used as
//! a plain argument, map returned, map joined across a phi) invalidates the
//! candidate. A missed detection is safe: the value simply falls through to
//! the existing behaviour.
|
||||
|
||||
use std::collections::{HashMap, HashSet};
|
||||
|
||||
use super::const_prop::ConstLattice;
|
||||
use super::ir::*;
|
||||
use crate::cfg::Cfg;
|
||||
use crate::symbol::Lang;
|
||||
|
||||
/// Output of the static-map analysis: SSA values whose concrete string value
|
||||
/// is provably in a finite set, plus the set itself (sorted + deduped).
|
||||
#[derive(Clone, Debug, Default)]
|
||||
pub struct StaticMapResult {
|
||||
pub finite_string_values: HashMap<SsaValue, Vec<String>>,
|
||||
}
|
||||
|
||||
impl StaticMapResult {
|
||||
pub fn empty() -> Self {
|
||||
Self::default()
|
||||
}
|
||||
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.finite_string_values.is_empty()
|
||||
}
|
||||
}
|
||||
|
||||
/// Rust-specific constructors that produce an empty map value.
|
||||
fn is_rust_map_constructor(callee: &str) -> bool {
|
||||
let leaf_after_colon = callee.rsplit("::").next().unwrap_or(callee);
|
||||
if leaf_after_colon != "new" {
|
||||
return false;
|
||||
}
|
||||
let type_part = callee.rsplit("::").nth(1).unwrap_or("");
|
||||
matches!(type_part, "HashMap" | "BTreeMap")
|
||||
}
|
||||
|
||||
/// Classification of a Call whose receiver is a candidate map.
|
||||
#[derive(Clone, Debug, PartialEq, Eq)]
|
||||
enum MapUse {
|
||||
/// `{var}.insert(K, V)` — value contributes to the finite domain.
|
||||
Insert,
|
||||
/// `{var}.get(K)[.copied()|.cloned()|.as_deref()|.as_ref()]*.unwrap_or`
|
||||
/// — lookup result is bounded by the inserted values plus the fallback
|
||||
/// literal on the CFG node.
|
||||
StaticLookup,
|
||||
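/// Allow-listed method that leaks no reference into the map (`contains_key`,
/// `len`, `is_empty`). `clear` is also accepted here: it mutates the map but
/// can only shrink the set of stored values.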
|
||||
ReadOnly,
|
||||
/// Anything else — invalidates the map candidate.
|
||||
Escape,
|
||||
}
|
||||
|
||||
/// Classify the callee of a Call whose `receiver` SSA value points to a
|
||||
/// candidate map bound to `map_var`. Returns [`MapUse::Escape`] when the
|
||||
/// callee doesn't match any recognised pattern so the caller invalidates
|
||||
/// the map rather than trusting an unknown mutation.
|
||||
fn classify_map_use(callee: &str, map_var: &str) -> MapUse {
|
||||
// Fast-path: exact single-method calls on the receiver.
|
||||
let method = callee
|
||||
.strip_prefix(map_var)
|
||||
.and_then(|rest| rest.strip_prefix('.'));
|
||||
if let Some(method) = method {
|
||||
// Single identifier method with no trailing chain.
|
||||
match method {
|
||||
"insert" => return MapUse::Insert,
|
||||
"contains_key" | "len" | "is_empty" | "clear" => return MapUse::ReadOnly,
|
||||
_ => {}
|
||||
}
|
||||
// Chained lookup: must start with `get(…)` and end with `.unwrap_or`.
|
||||
if let Some(rest) = method.strip_prefix("get(") {
|
||||
if let Some(after_args) = scan_past_balanced_parens(rest) {
|
||||
if is_identity_chain_ending_in_unwrap_or(after_args) {
|
||||
return MapUse::StaticLookup;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
MapUse::Escape
|
||||
}
|
||||
|
||||
/// Given `s` just after an opening `(`, return the slice after the matching
|
||||
/// close `)`. Returns `None` when parens are unbalanced.
|
||||
fn scan_past_balanced_parens(s: &str) -> Option<&str> {
|
||||
let bytes = s.as_bytes();
|
||||
let mut depth: i32 = 1;
|
||||
let mut i = 0;
|
||||
while i < bytes.len() {
|
||||
match bytes[i] {
|
||||
b'(' => depth += 1,
|
||||
b')' => {
|
||||
depth -= 1;
|
||||
if depth == 0 {
|
||||
return Some(&s[i + 1..]);
|
||||
}
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
i += 1;
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
/// Return `true` when `s` is a sequence of zero or more identity chain
|
||||
/// methods (`.copied()`, `.cloned()`, `.as_deref()`, `.as_ref()`) followed
|
||||
/// by `.unwrap_or` (and nothing else). The trailing arg list of
|
||||
/// `.unwrap_or` is elided in the callee text — it appears in the CFG node's
|
||||
/// `arg_string_literals` instead.
|
||||
fn is_identity_chain_ending_in_unwrap_or(mut s: &str) -> bool {
|
||||
const IDENTS: &[&str] = &[".copied()", ".cloned()", ".as_deref()", ".as_ref()"];
|
||||
loop {
|
||||
if s == ".unwrap_or" {
|
||||
return true;
|
||||
}
|
||||
let mut advanced = false;
|
||||
for id in IDENTS {
|
||||
if let Some(rest) = s.strip_prefix(id) {
|
||||
s = rest;
|
||||
advanced = true;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if !advanced {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn resolve_alias(v: SsaValue, aliases: &HashMap<SsaValue, SsaValue>) -> SsaValue {
|
||||
let mut cur = v;
|
||||
for _ in 0..64 {
|
||||
match aliases.get(&cur) {
|
||||
Some(&next) if next != cur => cur = next,
|
||||
_ => break,
|
||||
}
|
||||
}
|
||||
cur
|
||||
}
|
||||
|
||||
/// Run the analysis. Bails out immediately for non-Rust bodies: the current
/// pattern set only models Rust's `std::collections::HashMap` and `BTreeMap`.
|
||||
pub fn analyze(
|
||||
body: &SsaBody,
|
||||
cfg: &Cfg,
|
||||
lang: Option<Lang>,
|
||||
_const_values: &HashMap<SsaValue, ConstLattice>,
|
||||
) -> StaticMapResult {
|
||||
if lang != Some(Lang::Rust) {
|
||||
return StaticMapResult::empty();
|
||||
}
|
||||
|
||||
// ── 1. Discover candidate map allocations + their bound var name ──────
|
||||
// The var_name is the identifier the CFG builder attaches to the define
|
||||
// site of the let-binding. Without a var_name we can't pattern-match
|
||||
// receiver uses in callee text, so such allocations are skipped.
|
||||
let mut candidates: HashMap<SsaValue, String> = HashMap::new();
|
||||
for block in &body.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
if let SsaOp::Call { callee, .. } = &inst.op {
|
||||
if is_rust_map_constructor(callee) {
|
||||
if let Some(name) = inst.var_name.as_deref() {
|
||||
if !name.is_empty() {
|
||||
candidates.insert(inst.value, name.to_string());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
if candidates.is_empty() {
|
||||
return StaticMapResult::empty();
|
||||
}
|
||||
|
||||
// ── 2. Build trivial alias chain: single-operand Assign `v = w` where w is
// a known (or aliased) candidate value. Keeps us robust to the wrapper
// copies that SSA lowering occasionally introduces.
|
||||
let mut aliases: HashMap<SsaValue, SsaValue> = HashMap::new();
|
||||
for block in &body.blocks {
|
||||
for inst in &block.body {
|
||||
if let SsaOp::Assign(uses) = &inst.op {
|
||||
if uses.len() == 1 {
|
||||
let src = resolve_alias(uses[0], &aliases);
|
||||
if candidates.contains_key(&src) {
|
||||
aliases.insert(inst.value, src);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
let canonicalise = |v: SsaValue| -> Option<SsaValue> {
|
||||
let c = resolve_alias(v, &aliases);
|
||||
if candidates.contains_key(&c) {
|
||||
Some(c)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
};
|
||||
|
||||
// ── 3. Walk every instruction, classifying references to any candidate.
|
||||
// Collect per-candidate inserted literal values and mark invalidating
|
||||
// escapes (phi operand, non-whitelisted method, plain argument use,
|
||||
// non-copy Assign, Return).
|
||||
let mut inserted: HashMap<SsaValue, HashSet<String>> = HashMap::new();
|
||||
let mut invalid: HashSet<SsaValue> = HashSet::new();
|
||||
// Each lookup site: (map, result SSA value, fallback literal).
|
||||
let mut lookups: Vec<(SsaValue, SsaValue, String)> = Vec::new();
|
||||
for c in candidates.keys() {
|
||||
inserted.insert(*c, HashSet::new());
|
||||
}
|
||||
|
||||
for block in &body.blocks {
|
||||
for inst in block.phis.iter().chain(block.body.iter()) {
|
||||
match &inst.op {
|
||||
SsaOp::Phi(operands) => {
|
||||
for (_, v) in operands {
|
||||
if let Some(canon) = canonicalise(*v) {
|
||||
invalid.insert(canon);
|
||||
}
|
||||
}
|
||||
}
|
||||
SsaOp::Call {
|
||||
callee,
|
||||
args,
|
||||
receiver,
|
||||
} => {
|
||||
if candidates.contains_key(&inst.value) && is_rust_map_constructor(callee) {
|
||||
continue;
|
||||
}
|
||||
if let Some(map) = receiver.and_then(|r| canonicalise(r)) {
|
||||
let map_var = candidates.get(&map).cloned().unwrap_or_default();
|
||||
match classify_map_use(callee, &map_var) {
|
||||
MapUse::Insert => {
|
||||
let node_info = &cfg[inst.cfg_node];
|
||||
let value_lit =
|
||||
node_info.call.arg_string_literals.get(1).cloned().flatten();
|
||||
match value_lit {
|
||||
Some(lit) => {
|
||||
inserted.entry(map).or_default().insert(lit);
|
||||
}
|
||||
None => {
|
||||
invalid.insert(map);
|
||||
}
|
||||
}
|
||||
}
|
||||
MapUse::StaticLookup => {
|
||||
let node_info = &cfg[inst.cfg_node];
|
||||
if let Some(Some(fallback)) =
|
||||
node_info.call.arg_string_literals.first().cloned()
|
||||
{
|
||||
lookups.push((map, inst.value, fallback));
|
||||
}
|
||||
// A non-literal fallback silently falls
|
||||
// through: the map stays valid, we just
|
||||
// don't emit a finite domain for this site.
|
||||
}
|
||||
MapUse::ReadOnly => {}
|
||||
MapUse::Escape => {
|
||||
invalid.insert(map);
|
||||
}
|
||||
}
|
||||
}
|
||||
for group in args {
|
||||
for &v in group {
|
||||
if let Some(canon) = canonicalise(v) {
|
||||
invalid.insert(canon);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
SsaOp::Assign(uses) if uses.len() != 1 => {
|
||||
for &u in uses {
|
||||
if let Some(canon) = canonicalise(u) {
|
||||
invalid.insert(canon);
|
||||
}
|
||||
}
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
if let Terminator::Return(Some(v)) = &block.terminator {
|
||||
if let Some(canon) = canonicalise(*v) {
|
||||
invalid.insert(canon);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── 4. Emit results for still-valid candidates with at least one insert.
|
||||
let mut result = StaticMapResult::default();
|
||||
for (map, lookup_val, fallback) in lookups {
|
||||
if invalid.contains(&map) {
|
||||
continue;
|
||||
}
|
||||
let lits = match inserted.get(&map) {
|
||||
Some(s) if !s.is_empty() => s,
|
||||
_ => continue,
|
||||
};
|
||||
let mut domain: Vec<String> = lits.iter().cloned().collect();
|
||||
domain.push(fallback);
|
||||
domain.sort();
|
||||
domain.dedup();
|
||||
result.finite_string_values.insert(lookup_val, domain);
|
||||
}
|
||||
result
|
||||
}
|
||||
|
||||
// ── Tests ───────────────────────────────────────────────────────────────
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn rust_map_constructor_matches() {
|
||||
assert!(is_rust_map_constructor("HashMap::new"));
|
||||
assert!(is_rust_map_constructor("std::collections::HashMap::new"));
|
||||
assert!(is_rust_map_constructor("BTreeMap::new"));
|
||||
assert!(!is_rust_map_constructor("HashMap::from"));
|
||||
assert!(!is_rust_map_constructor("HashMap::with_capacity"));
|
||||
assert!(!is_rust_map_constructor("Vec::new"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_insert_call() {
|
||||
assert_eq!(classify_map_use("table.insert", "table"), MapUse::Insert);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_read_only_call() {
|
||||
assert_eq!(
|
||||
classify_map_use("table.contains_key", "table"),
|
||||
MapUse::ReadOnly
|
||||
);
|
||||
assert_eq!(classify_map_use("table.len", "table"), MapUse::ReadOnly);
|
||||
// Iterator-returning methods (values/iter/keys) escape: they leak
|
||||
// references that can flow anywhere.
|
||||
assert_eq!(classify_map_use("table.values", "table"), MapUse::Escape);
|
||||
assert_eq!(classify_map_use("table.iter", "table"), MapUse::Escape);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_static_lookup_with_copied() {
|
||||
assert_eq!(
|
||||
classify_map_use("table.get(key.as_str()).copied().unwrap_or", "table"),
|
||||
MapUse::StaticLookup
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_static_lookup_without_identity_chain() {
|
||||
// `.unwrap_or` directly after `.get(...)` also qualifies — Rust
|
||||
// `HashMap::get` returns `Option<&V>`, so `.unwrap_or(&"safe")` is
|
||||
// syntactically valid and equally bounded.
|
||||
assert_eq!(
|
||||
classify_map_use("table.get(k).unwrap_or", "table"),
|
||||
MapUse::StaticLookup
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_static_lookup_mixed_identity_chain() {
|
||||
assert_eq!(
|
||||
classify_map_use("t.get(k).as_deref().cloned().unwrap_or", "t"),
|
||||
MapUse::StaticLookup
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_rejects_unknown_terminator() {
|
||||
// `.unwrap_or_else(|| …)` is not modelled — closure can return anything.
|
||||
assert_eq!(
|
||||
classify_map_use("t.get(k).copied().unwrap_or_else", "t"),
|
||||
MapUse::Escape
|
||||
);
|
||||
// A bare `.unwrap()` after `.get(k)` panics rather than bounding,
|
||||
// so we refuse to treat it as safe. The caller would need a proven
|
||||
// `.contains_key` guard; that is out of scope here.
|
||||
assert_eq!(classify_map_use("t.get(k).unwrap", "t"), MapUse::Escape);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_rejects_other_receiver() {
|
||||
// `other.insert` does not belong to `table` — receiver mismatch.
|
||||
assert_eq!(classify_map_use("other.insert", "table"), MapUse::Escape);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn scan_past_balanced_parens_basic() {
|
||||
assert_eq!(scan_past_balanced_parens("foo)").unwrap_or(""), "");
|
||||
assert_eq!(scan_past_balanced_parens("foo).bar").unwrap_or(""), ".bar");
|
||||
assert_eq!(
|
||||
scan_past_balanced_parens("foo(bar)baz).x").unwrap_or(""),
|
||||
".x"
|
||||
);
|
||||
assert!(scan_past_balanced_parens("no-close").is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn non_rust_lang_returns_empty() {
|
||||
use petgraph::Graph;
|
||||
let body = SsaBody {
|
||||
blocks: vec![],
|
||||
entry: BlockId(0),
|
||||
value_defs: vec![],
|
||||
cfg_node_map: std::collections::HashMap::new(),
|
||||
exception_edges: vec![],
|
||||
};
|
||||
let cfg: Cfg = Graph::new();
|
||||
let const_values = HashMap::new();
|
||||
let result = analyze(&body, &cfg, Some(Lang::Java), &const_values);
|
||||
assert!(result.is_empty());
|
||||
}
|
||||
}
|
||||
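To make the bounded-lookup claim in `src/ssa/static_map.rs` concrete, here is a minimal standalone sketch of the source-level idiom and of the finite domain computed for one lookup site. The names `lookup_command`, `finite_domain`, and `is_bounded`, along with the table contents, are illustrative assumptions and not part of the crate. `finite_domain` mirrors step 4 of `analyze` (inserted literals plus the fallback, sorted and deduplicated) without any of the SSA or CFG machinery.

```rust
use std::collections::HashMap;

/// The idiom the analysis recognises: literal-valued inserts followed by
/// a lookup that is unwrapped with a literal fallback.
fn lookup_command(k: &str) -> &'static str {
    let mut table: HashMap<&'static str, &'static str> = HashMap::new();
    table.insert("list", "ls");
    table.insert("today", "date");
    // Whatever `k` contains, the result is "ls", "date", or "safe".
    let cmd = table.get(k).copied().unwrap_or("safe");
    cmd
}

/// Hypothetical helper: the finite domain for one lookup site, built the
/// same way as step 4 of `analyze` (collect, push fallback, sort, dedup).
fn finite_domain(inserted: &[&str], fallback: &str) -> Vec<String> {
    let mut domain: Vec<String> = inserted.iter().map(|s| (*s).to_owned()).collect();
    domain.push(fallback.to_owned());
    domain.sort();
    domain.dedup();
    domain
}

/// Check that a lookup result stays inside the computed domain.
fn is_bounded(k: &str) -> bool {
    let domain = finite_domain(&["ls", "date"], "safe");
    domain.iter().any(|v| v.as_str() == lookup_command(k))
}
```

Calling `is_bounded` with an attacker-shaped key such as `"x; rm -rf /"` still returns `true`, because the key only selects among the stored literals and never reaches the result itself. That is the property the sink suppression relies on.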
1487
src/ssa/type_facts.rs
Normal file
File diff suppressed because it is too large