Release/0.5.0 (#35)

* feat: Introduce function-scoped variable interning for state analysis with new tests and fixtures

* feat: Add Phase 26 symbolic execution enhancements with bitwise operator support, abstract interpretation refinements, and new taint analysis tests

* feat: Refine state analysis to handle factory-pattern resource returns with mixed-path tests and leak detection enhancements

* feat: Add Phase 27 debug views with symbolic execution, abstract interpretation, SSA, and call graph viewers; integrate with debug layout and styles

* feat: Add Phase 31 type-qualified symbolic resolution with receiver-based callee disambiguation and testing

* feat: Extend symbolic execution with state iteration, enhanced debug views, and debounced input handling

* feat: Add Phase 13 resource and auth pattern extensions with new tests and fixtures

* feat: Introduce CFG debug graph renderer with compact mode, toolbar, and DAG layout integration

* feat: Add Phase 28 encoding and decoding transform modeling with structural symex enhancements and new taint analysis tests

* feat: Extend abstract interpretation with type facts and constant value tracking in debug views and server logic

* feat: Add linear path handling and witness extraction to symbolic execution with Phase 28 transform mismatch detection

* feat: Refine Go auth and sanitizer handling with enhanced rules, state updates, and benchmark improvements

* feat: Enable auth-state analysis by default and update relevant tests in benchmark config

* test: Update state_tests to reflect default enablement of auth-state analysis and add auth suppression test

* docs: update CHANGELOG.md

* feat: Introduce per-index taint tracking in `HeapState` with `HeapSlot`, overflow handling, and revised SSA transfers

* feat: Introduce C/C++ language labels and refine heap state tracking in SSA transfers

* feat: Implement per-index array slot tracking in symbolic heap with overflow collapse

* feat: Add implicit definition handling for uninitialized declarations in SSA value allocation

* feat: Refactor function parameters and constants for improved clarity and maintainability

* refactor: Reorder module imports and improve formatting for consistency

* refactor: Fix formatting errors

* refactor: Fix clippy warnings

* refactor: Fix fmt warnings (again)

* chore: Update dependencies and improve feature configuration

* Add comprehensive tests for undertested modules (#36) (COPILOT)

* Add comprehensive tests for undertested modules

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083

* Add comprehensive tests for ext, project, walk, and errors modules

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* chore: Update dependencies and improve feature configuration

* fix: formatting errors in new tests

* chore: Update license list in about.toml

* chore: made functions input inline

* chore: updated cfg graph to take up the full page

* chore: add Prettier configuration and update code formatting

* Add frontend test suite with Vitest (111 tests) (#37)

* Add Vitest test suite for frontend - 111 tests across utils, components, hooks, and graph utilities

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/7cf0dba2-ecff-4740-ba4d-92717e74a0b7

* ci: add frontend test step to CI workflow

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/5bc0ac9f-0a32-4d03-9cb7-7a15aea53fca

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* chore: simplify array initialization in test files for consistency

* ran typecheck

* feat: add AnalysisWorkspace component and integrate it into CfgViewerPage

* feat: update routing in AppLayout and improve empty state message in ExplorerPage

* feat: enhance scan progress tracking with additional metrics and stages

* feat: update license information and add license check script

* feat: implement cross-file symbolic execution with callee body persistence

* feat: replace dagre graphs with Graphology + ELK + Sigma for more advanced call stack and cfg rendering

* feat: ensure CFG function view is scoped to the selected function, preventing bleed into sibling functions

* feat: enhance resource tracking with proxy method summaries and improve finding extraction

* feat: add terminal function exit detection for accurate resource leak analysis

* feat: add warnings for loops and functions without bodies to improve error recovery

* feat: update lambda expression handling to ensure proper function classification and control flow

* feat: remove bounded formatting/string ops and add JSON.parse sanitizer for improved data handling

* feat: add inline return taint analysis and regression tests for improved security checks

* feat: add engine version management and migration handling for database schema updates

* feat: enhance first_call_ident to skip nested function bodies and add regression tests

* feat: enhance callee name resolution with two-segment normalization and disambiguation

* feat: add cross-file context flags and debug assertions for taint analysis

* feat: refactor taint analysis structure to unify context handling and improve clarity

* feat: enhance dead code elimination to preserve Sink, Source, and Sanitizer labels with new tests

* docs: updated CHANGELOG.md

* fmt: formatting fixes

* fix: fixed frontend formatting and lint warnings

* fix: optimized ci

* fix: optimized ci

* Add comprehensive multi-file test coverage to Nyx (#38)

* Initial checklist for multi-file test suite expansion

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* Add 12 new multi-file test fixtures with TP/TN/near-miss coverage

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* deleted root repo

* rebuilt to test for regressions

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Co-authored-by: elipeter <elicpeter@gmail.com>

* feat: enhance import alias resolution and taint tracking

* feat: implement security hardening with CSRF protection and path validation

* feat: add support for import alias bindings in Python, PHP, and Rust

* feat: enhance CFG analysis modes and improve code readability

* feat: add detection for parameterized SQL queries to enhance security

* feat: add safe internal redirect handling and enhance session destroy validation

* feat: implement security improvements by addressing vulnerabilities in execAsync, session management, and file downloads

* feat: enhance taint detection by adding support for inline source member expressions in call arguments

* feat: implement pre-emission of Source nodes for inline source member expressions in call arguments

* feat: add support for Throw statement in control flow and error handling

* feat: add debug and echo endpoints with potential information leakage

* feat: implement internal redirect suppression and enhance taint detection

* feat: implement module alias tracking for dynamic dispatch in JS/TS

* feat: add authorization analysis module with Express support

* feat: add authorization analysis module with Express support

* feat: add tests for admin guard requirements and clean checks in authorization analysis

* feat: integrate Koa and Fastify frameworks into authorization analysis

* feat: add Flask and Django support to authorization analysis module

* feat: add support for Rails and Sinatra frameworks in authorization analysis

* feat: add support for Axum, ActixWeb, and Rocket frameworks in authorization analysis

* feat: add support for ActixWeb, Axum, and Rocket frameworks in authorization analysis

* feat: add support for Rails and Sinatra in authorization analysis

* chore: add .DS_Store to .gitignore

* refactor: simplify conditional checks and improve readability in multiple files

* refactor: update usage of Option methods for improved clarity and consistency

* refactor: improve code readability by simplifying conditional checks and formatting

* refactor: improve code formatting and readability by simplifying conditional checks

* refactor: simplify conditional checks and improve readability in multiple files

* refactor: simplify conditional checks in axum.rs for improved readability

* feat: add CodeQL analysis configuration for enhanced security scanning

* test: add comprehensive tests for `src/output.rs` SARIF builder (#39)

* chore: start test coverage improvement work

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* test: add comprehensive tests for src/output.rs SARIF builder

Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423

Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>

* refactor: improve code formatting and readability in output.rs

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
Co-authored-by: elipeter <elicpeter@gmail.com>

* refactor: improve code formatting and readability in output.rs

* Potential fix for code scanning alert no. 210: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 211: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* refactor: enhance triage file path handling with improved error management and validation

* refactor: updated func summaries for richer detail

* refactor: update SSA summary extraction to use canonical FuncKey for distinct entries

* refactor: enhance callee metadata structure to support arity, receiver, and qualifier for better overload resolution

* refactor: add support for keyword arguments in function calls and enhance receiver extraction for method-style calls

* refactor: implement new Flask routes for safe and unsafe shell command execution

* refactor: separate receiver handling in SSA operations and enhance taint propagation

* refactor: improve arity handling by using arg_uses for positional argument count and enhance witness scoring for tainted arguments

* refactor: implement auth decorator extraction and classification for multiple languages

* refactor: enhance Rust module path resolution and use map handling for cross-file disambiguation

* refactor: introduce CalleeQuery struct for structured callee resolution and enhance resolver logic

* refactor: implement same-file identity collision handling for `runTask` to ensure correct resolver behavior

* refactor: standardize default struct initialization across multiple files

* feat: add scripts for formatting checks and auto-fixes with test summaries

* refactor: simplify character splitting and enhance namespace qualifier handling

* refactor: improve documentation clarity and enhance code readability in resolver logic

* refactor: replace default struct initialization with explicit field assignments for clarity

* feat: enhance anonymous function naming by deriving context-based bindings

* refactor: streamline match expressions for improved readability and performance

* refactor: streamline match expressions for improved readability and performance

* refactor: replace loop with while let for improved clarity and performance

* feat: add SSA constant propagation support to analysis context for improved accuracy

* feat: add SSA constant propagation support to analysis context for improved accuracy

* feat: implement shell metacharacter validation and bounded-length checks in Rust analysis

* feat: add static map analysis for command injection suppression and type safety

* refactor: simplify match statements and reduce line breaks for improved readability

* feat(summary): phase 1/5 SinkSite data model for primary sink-location attribution

Introduce SinkSite (file_rel, line, col, snippet, cap) carrying the
primary sink source-location through function summaries. Swap
SsaFuncSummary.param_to_sink and FuncSummary.param_to_sink from a coarse
Cap map to a deduped SmallVec<[SinkSite; 1]> per parameter, with a
backward-compatible cap_sites() helper and serde defaults so pre-phase-1
on-disk rows continue to deserialise cleanly.

Extraction: SinkSiteLocator bundles the tree/bytes/file_rel needed by
extract_ssa_func_summary; ParsedFile::extract_ssa_artifacts wires the
locator in for the persisted pass-1 path, while pass-2 intra-file
transient summaries fall back to cap-only sites (behavior unchanged).
Merge: GlobalSummaries::insert now unions sink sites with
(file_rel, line, col, cap) dedup via shared union_param_sink_sites
helper.
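
A minimal std-only sketch of the site model and the merge dedup described above. Field names follow this message. Several details are assumptions: the `u32` bitmask for `cap`, `Vec` in place of `SmallVec<[SinkSite; 1]>`, the `cap_only` constructor, and the `combined_caps` helper (a stand-in for the cap-level view that `cap_sites()` provides). `union_param_sink_sites` here only mirrors the named helper, not its real signature. The serde defaults are omitted.

```rust
/// Hypothetical mirror of SinkSite; `cap` is assumed to be a plain bitmask.
#[derive(Clone, Debug, PartialEq)]
pub struct SinkSite {
    pub file_rel: String,
    pub line: u32,
    pub col: u32,
    pub snippet: String,
    pub cap: u32,
}

impl SinkSite {
    /// A cap-only site has no location (line == 0), like the pass-2
    /// transient summaries that only know the capability.
    pub fn cap_only(cap: u32) -> Self {
        SinkSite { file_rel: String::new(), line: 0, col: 0, snippet: String::new(), cap }
    }
}

/// Union `incoming` into `existing`, deduplicating on (file_rel, line, col, cap).
/// The snippet is deliberately not part of the key.
pub fn union_param_sink_sites(existing: &mut Vec<SinkSite>, incoming: &[SinkSite]) {
    for site in incoming {
        let dup = existing.iter().any(|s| {
            s.file_rel == site.file_rel && s.line == site.line && s.col == site.col && s.cap == site.cap
        });
        if !dup {
            existing.push(site.clone());
        }
    }
}

/// Assumed cap-level view: the OR of all capability bits across sites.
pub fn combined_caps(sites: &[SinkSite]) -> u32 {
    sites.iter().fold(0, |acc, s| acc | s.cap)
}
```

Excluding the snippet from the key means two summaries that render the same location slightly differently still merge into one site.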

Database: JSON-serialised summary columns carry the new shape
automatically; no schema change needed.

Phase 2 will consume SinkSite in build_taint_diag() to overwrite the
caller-site Finding.line with the callee's sink line when resolved via
summary. Phase 1 keeps behavior unchanged: scanning
tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs still produces the
same (wrong) line 10 finding.

Adds round-trip tests covering SinkSite solo, SsaFuncSummary with sink
sites, legacy-JSON default handling for both summary types, and merge
dedup.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(taint): phase 2/5 thread SinkSite into SsaTaintEvent and Finding

Plumb Phase 1's SinkSite through the event pipeline into Findings,
no output change yet.  SsaTaintEvent gains `primary_sink_site:
Option<SinkSite>`; when the main or callback sink-emission path has
non-empty `param_to_sink_sites`, filter to sites whose
`(line != 0) && (cap ∩ sink_caps != ∅)` and emit one event per
distinct site — the multi-primary collapse keeps each downstream
Finding single-primary.
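
A sketch of the emission filter. It assumes `cap` and `sink_caps` are plain `u32` bitmasks, so `cap ∩ sink_caps != ∅` becomes `cap & sink_caps != 0`. `pick_primary_sites`, `emit_events` and `Event` are hypothetical names. The fallback to a single site-less event when nothing qualifies is also an assumption about the pre-existing path.

```rust
/// Hypothetical minimal site; line == 0 marks a cap-only site.
#[derive(Clone, Debug, PartialEq)]
pub struct SinkSite {
    pub file_rel: String,
    pub line: u32,
    pub col: u32,
    pub cap: u32,
}

/// Keep sites that have a real location and share at least one capability
/// bit with the sink, dropping exact repeats.
pub fn pick_primary_sites(sites: &[SinkSite], sink_caps: u32) -> Vec<SinkSite> {
    let mut out: Vec<SinkSite> = Vec::new();
    for s in sites {
        if s.line != 0 && (s.cap & sink_caps) != 0 && !out.contains(s) {
            out.push(s.clone());
        }
    }
    out
}

#[derive(Debug)]
pub struct Event {
    pub primary_sink_site: Option<SinkSite>,
}

/// One event per distinct qualifying site, so every downstream finding has
/// at most one primary location.
pub fn emit_events(sites: &[SinkSite], sink_caps: u32) -> Vec<Event> {
    let picked = pick_primary_sites(sites, sink_caps);
    if picked.is_empty() {
        // Assumed fallback: a single event without a primary site.
        return vec![Event { primary_sink_site: None }];
    }
    picked.into_iter().map(|s| Event { primary_sink_site: Some(s) }).collect()
}
```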

Resolution: ResolvedSummary and SinkInfo gain mirror
`param_to_sink_sites` fields, populated from `SsaFuncSummary.param_to_sink`
(SSA + callback paths) and `FuncSummary.param_to_sink` (global paths).
Label, local-summary, and interop resolution paths leave the field
empty — they only ever had cap-level info to begin with.

Finding: new `primary_location: Option<SinkLocation>` with
`file_rel/line/col`.  `ssa_events_to_findings` maps
`event.primary_sink_site` → `Finding.primary_location`, filtering
cap-only sites (`line == 0`) to `None` so the (0,0) sentinel never
leaks to formatters.  Dedup key extended with the primary location
so multi-site events aren't collapsed back together.
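
A sketch of the event-to-finding translation and the widened dedup key. `SinkLocation` mirrors the fields named above. `to_primary_location`, `DedupKey` (assumed here to be a caller-side sink node index plus the primary location) and `dedup` are hypothetical.

```rust
use std::collections::HashSet;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SinkLocation {
    pub file_rel: String,
    pub line: u32,
    pub col: u32,
}

/// Hypothetical minimal site; line == 0 is the cap-only sentinel.
pub struct SinkSite {
    pub file_rel: String,
    pub line: u32,
    pub col: u32,
}

/// Cap-only sites (line == 0) become None so formatters never see (0,0).
pub fn to_primary_location(site: Option<&SinkSite>) -> Option<SinkLocation> {
    match site {
        Some(s) if s.line != 0 => Some(SinkLocation {
            file_rel: s.file_rel.clone(),
            line: s.line,
            col: s.col,
        }),
        _ => None,
    }
}

/// Including the primary location keeps multi-site events as separate findings.
pub type DedupKey = (usize, Option<SinkLocation>);

pub fn dedup(keys: Vec<DedupKey>) -> Vec<DedupKey> {
    let mut seen = HashSet::new();
    keys.into_iter().filter(|k| seen.insert(k.clone())).collect()
}
```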

Invariants (debug_assert!):
* every SinkSite reaching emission has `line != 0 && cap ∩ sink_caps
  != ∅` — enforced by the pick_primary_sink_sites* filters;
* every populated Finding.primary_location has `line != 0` AND
  non-empty `file_rel` — the cap-only → None translation upstream
  guarantees this.

Deliberately independent of `uses_summary`: that flag tracks whether
the *taint chain* used a summary, whereas primary attribution
requires only that the *sink* itself was summary-resolved.  A local
source reaching a cross-file sink produces `uses_summary=false`
alongside a populated primary_location — documented on
Finding.primary_location, covered by
`cross_file_sink_finding_carries_primary_location`.

build_taint_diag, SARIF/JSON/explanation formatters, and the
benchmark scorer remain untouched: finding.line still comes from
`cfg_graph[finding.sink]`, so cmdi_indirect.rs still reports line 10
and the benchmark's rs-cmdi-003 row still shows FN in the LOC column.

Tests: `cross_file_sink_finding_carries_primary_location` (proves
plumbing via a synthetic FuncSummary carrying a SinkSite at 42:5) and
`cross_file_sink_cap_only_site_leaves_primary_location_none`
(regression guard against cap-only sites surfacing).  All 1566 lib
tests + integration tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(output): phase 3/5 consume primary sink location in diag + SARIF

When a finding's primary_location (populated in phase 2 from a callee
summary's SinkSite) names the dangerous instruction inside a callee
body, attribute the diagnostic line to that location instead of the
caller's call site. The call site is demoted to a Call step in
flow_steps, and a synthetic Sink step at the primary location is
appended so analysts still see the full trace.
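
A sketch of the re-attribution step. `StepKind`, `FlowStep` and `attribute_to_primary` are hypothetical; the real logic lives in build_taint_diag and likely differs. The usage in the test mirrors the py-cmdi-cross-004 fixture described in phase 4 (caller at app.py:7, sink at helper.py:5).

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum StepKind {
    Source,
    Call,
    Sink,
}

#[derive(Clone, Debug, PartialEq)]
pub struct FlowStep {
    pub kind: StepKind,
    pub path: String,
    pub line: u32,
}

/// Returns the (path, line) the diagnostic should report. When a primary
/// site exists, the caller-side Sink step is demoted to a Call step and a
/// synthetic Sink step at the primary location is appended.
pub fn attribute_to_primary(
    steps: &mut Vec<FlowStep>,
    primary: Option<(String, u32)>,
    caller: (String, u32),
) -> (String, u32) {
    let Some((path, line)) = primary else {
        // No summary-resolved site: keep caller attribution unchanged.
        return caller;
    };
    for step in steps.iter_mut() {
        if step.kind == StepKind::Sink && step.path == caller.0 && step.line == caller.1 {
            step.kind = StepKind::Call;
        }
    }
    steps.push(FlowStep { kind: StepKind::Sink, path: path.clone(), line });
    (path, line)
}
```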

Changes:
- Add scan_root parameter to build_taint_diag so file_rel can be
  resolved back to an absolute path via a shared resolve_file_rel
  helper. Empty file_rel (single-file scans where namespace == "")
  resolves to the file under analysis.
- Extend SinkLocation with snippet, carried from the upstream
  SinkSite so the formatter needs no second file read.
- Relax the ssa_events_to_findings debug_assert to allow empty
  file_rel, which is valid when scan root equals the file itself.
- SARIF: emit data-flow as codeFlows[0].threadFlows[0].locations[];
  locations[0] already reflects the primary sink position via the
  updated diag line/col.
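
A sketch of the file_rel resolution rule described in the first bullet. The `resolve_file_rel` signature, in particular the explicit `current_file` argument, is an assumption. This sketch does no `..` or escape checking.

```rust
use std::path::{Path, PathBuf};

/// Empty file_rel (single-file scan, namespace == "") means "the file under
/// analysis"; otherwise the relative path is joined onto scan_root.
pub fn resolve_file_rel(scan_root: &Path, file_rel: &str, current_file: &Path) -> PathBuf {
    if file_rel.is_empty() {
        current_file.to_path_buf()
    } else {
        scan_root.join(file_rel)
    }
}
```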

Acceptance: scan on tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs
now reports line 5 (Command::new) as the primary sink, with the call
site at line 10 visible in flow_steps.

Two expect.json fixtures updated (must_match line_range widened):
- javascript/taint/context_sensitive_call: 12-14 -> 7-14 (line 8 is
  the real sink inside run()).
- rust/cfg/closure_async: 10-10 -> 10-11 (line 11 is Command::new
  inside the closure).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(bench): phase 4/5 validate primary sink attribution across corpus

Extend the benchmark scorer and ground truth to lock in phase 3's
primary-location behavior, and add fixtures that exercise the new
capability end-to-end.

Scorer (tests/benchmark_test.rs):
- Add optional `expected_call_site_lines: Option<Vec<[usize; 2]>>` on
  Case. When present, score_location_level additionally requires at
  least one flow_step in the finding's evidence trace to fall within
  ±2 of the call-site range. When absent, the check is skipped —
  fully forward-compatible with existing fixtures.
- Retain ±2 tolerance on expected_sink_lines (compared against the
  now-primary Diag.line post-phase-3).
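
A sketch of the location-level check with the ±2 tolerance. `within` and `location_ok` are hypothetical names. Treating `expected_sink_lines` as a single `[usize; 2]` range is an assumption based on the `[9,9]` and `[5,6]` values quoted below. The test cases use the rs-cmdi-003 numbers: sink at line 5, call site at line 10.

```rust
/// True if `line` lies within [range[0] - tol, range[1] + tol],
/// written to avoid usize underflow.
pub fn within(line: usize, range: [usize; 2], tol: usize) -> bool {
    line + tol >= range[0] && line <= range[1] + tol
}

pub fn location_ok(
    diag_line: usize,
    flow_step_lines: &[usize],
    expected_sink_lines: [usize; 2],
    expected_call_site_lines: Option<&[[usize; 2]]>,
    tol: usize,
) -> bool {
    let sink_ok = within(diag_line, expected_sink_lines, tol);
    let call_ok = match expected_call_site_lines {
        // Field absent: the call-site check is skipped.
        None => true,
        // At least one flow step must land near one of the call-site ranges.
        Some(ranges) => flow_step_lines
            .iter()
            .any(|&l| ranges.iter().any(|r| within(l, *r, tol))),
    };
    sink_ok && call_ok
}
```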

Ground truth edits:
- rs-cmdi-cross-001: expected_sink_lines [8,8] -> [9,9]. Line 8 is the
  transform::wrap call site (a cross-file propagator, not a sink);
  line 9 is Command::new, the real sink. The ±2 tolerance happened to
  mask this stale attribution but it was semantically wrong — phase 4
  is the right time to correct it. Also adds expected_call_site_lines
  [8,8] so the new field is exercised on an existing cross-file case.
- rs-cmdi-003: adds expected_call_site_lines [10,10] (run_cmd call).
  This fixture's sink (Command::new inside run_cmd at line 5) was the
  motivating case for phases 1-3; adding the call-site assertion
  guards against regression to caller-line attribution.

New fixtures:
- rust/cmdi/cmdi_indirect_multisink.rs (rs-cmdi-009): helper run_both
  takes two tainted params and invokes two Command sinks on
  consecutive lines. Locks in that primary line lands inside the
  helper (lines 5-6), not at the caller (line 12). Notes document
  that SinkSite is currently one-per-callee so both findings today
  collapse onto the first sink; expected_sink_lines=[5,6] and
  expected_call_site_lines=[12,12] stay valid either way.
- python/cmdi/cross_indirect_sink/{app.py,helper.py}
  (py-cmdi-cross-004): the sink os.system lives in helper.py
  (cross-file); the caller in app.py reads an env source and calls
  run_cmd. Verifies phase 3's cross-file primary attribution:
  Diag.path = helper.py, Diag.line = 5, with app.py:7 recorded in
  flow_steps as a Call step.

Acceptance:
- `cargo test --test benchmark_test -- --ignored --nocapture` passes.
- rs-cmdi-003 is TP/TP/TP (the target flip FN->TP at LOC). All
  pre-existing TP/TP/TP fixtures remain TP/TP/TP; 2 new fixtures are
  TP/TP/TP.
- Aggregate rule-level: TP=158 FP=10 FN=1 TN=97, P=0.940 R=0.994
  F1=0.966 on the 266-case corpus (was TP=156 FP=10 FN=1 TN=97 on
  264 pre-phase-4, delta is the +2 new cases both resolving TP).
- Full `cargo test` green (1566 lib tests + all integration tests).
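
The aggregate figures follow from the rule-level counts. TN does not enter any of the three scores: P = 158/168 ≈ 0.940, R = 158/159 ≈ 0.994, F1 = 2·158/(2·158 + 10 + 1) = 316/327 ≈ 0.966. The helper below recomputes them; `prf1` is a hypothetical name, not part of the scorer.

```rust
/// Precision, recall and F1 from confusion counts (TN is not needed).
pub fn prf1(tp: u32, fp: u32, fn_: u32) -> (f64, f64, f64) {
    let p = tp as f64 / (tp + fp) as f64;
    let r = tp as f64 / (tp + fn_) as f64;
    let f1 = 2.0 * p * r / (p + r);
    (p, r, f1)
}
```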

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(taint): phase 5/5 lock Finding.primary_location contract via regression test

Add a regression test in src/taint/ssa_transfer.rs that wires up a synthetic
SsaFuncSummary with a SinkSite at other.rs:42:10 and drives the three
emission stages (pick_primary_sink_sites → emit_ssa_taint_events →
ssa_events_to_findings) against a minimal caller SSA body.  Asserts the
resulting Finding.primary_location is exactly that triple.

The existing integration tests in src/taint/tests.rs cover the coarse
FuncSummary path end-to-end through analyse_file.  This test locks in the
lower-level SSA-side plumbing so a future refactor that silently drops the
site between pick → emit → findings fails here rather than only at the
benchmark layer.

Also refreshes tests/benchmark/results/latest.json (timestamp only; rs-cmdi-003
remains TP/TP/TP and the aggregate P/R/F1 are unchanged from phase 4).

Closes the primary sink-location attribution feature (phases 1-5/5):
* Phase 1: SinkSite data model on summaries.
* Phase 2: SinkSite threaded into SsaTaintEvent and Finding.
* Phase 3: diag + SARIF consume primary_location.
* Phase 4: benchmark validates primary sink lines and expected_call_site_lines across the corpus.
* Phase 5: regression test locks the event→finding contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: clean up formatting and improve readability in multiple files

* refactor: simplify type definition for deduplication key in findings

* test(harness): add must_not_match expectation for FP regression guards

Extends ExpectedFinding with must_not_match field that asserts a
diagnostic must NOT fire — presence is a hard failure. Non-consuming
scan so it coexists with must_match entries on the same rule_id.
Adds forbidden_violations accumulator and updates summary line.
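
A sketch of the consuming vs non-consuming semantics. The `Expected` shape, exact-line matching (the real harness uses line ranges) and the `check` function are simplified, hypothetical stand-ins for ExpectedFinding and the harness loop. must_match entries each consume one finding; must_not_match entries only look, so both can target the same rule_id.

```rust
pub struct Expected {
    pub rule_id: &'static str,
    pub line: usize,
    pub must_not_match: bool,
}

/// Returns (missing must_match entries, forbidden_violations).
pub fn check(expectations: &[Expected], findings: &[(&str, usize)]) -> (usize, usize) {
    let mut consumed = vec![false; findings.len()];
    let mut missing = 0;
    let mut forbidden = 0;
    for e in expectations.iter().filter(|e| !e.must_not_match) {
        // Each must_match entry claims the first unconsumed matching finding.
        let hit = (0..findings.len()).find(|&i| !consumed[i] && findings[i] == (e.rule_id, e.line));
        match hit {
            Some(i) => consumed[i] = true,
            None => missing += 1,
        }
    }
    for e in expectations.iter().filter(|e| e.must_not_match) {
        // Non-consuming: any matching finding, consumed or not, is a hard failure.
        if findings.iter().any(|f| *f == (e.rule_id, e.line)) {
            forbidden += 1;
        }
    }
    (missing, forbidden)
}
```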

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(regression): update expectations to ensure must_not_match for various taint and resource leak rules

* feat: implement auto-seeding for JS/TS handler parameters to enhance taint tracking

* feat: update switch statement handling to improve control flow analysis

* feat: implement promisify alias handling for JS/TS to enhance taint tracking

* feat: enhance taint tracking by refining expectation handling and adding mode filtering

* feat: refine SQL handling in stream processing and enhance auto-seeding for handler parameters

* feat: update taint tracking rules to enforce full mode matching and improve flow analysis

* feat: enhance Ruby subshell handling to improve taint tracking and flow analysis

* feat: update xss_response expectations to refine taint flow analysis and enhance regression guarding

* feat: refine framework detection and update expectation handling for Echo and Sinatra

* feat: implement max_count for taint tracking expectations and deduplicate findings

* feat: add strict_unexpected handling for taint-unsanitised-flow in expectation files

* feat: enhance deduplication of taint-unsanitised-flow findings by collapsing based on line and severity

* feat: add strict_unexpected handling for taint-unsanitised-flow in multiple expectation files

* feat: add structural invariant checks for SSA bodies

* feat: ensure deterministic phi emission order using BTreeSet

* feat: enhance handling of terminators to ensure authoritative flow through successor edges

* feat: enhance Goto terminator handling to ensure all successors are marked executable

* feat: refactor code for improved readability and organization

* feat: simplify predicate checks and enhance readability in SSA handling

* feat: implement per-file parse timeout and enhance file size handling

* feat: migrate analysis engine toggles from environment variables to configuration file

* feat: remove unnecessary whitespace in hostile_input_tests.rs

* feat: remove unnecessary whitespace in hostile_input_tests.rs

* feat: update dependencies and enhance documentation on language maturity

* feat: enhance security headers and improve request body limits

* feat: implement sink capability bits for deduplication and enhance evidence tagging

* feat: implement dynamic activation handling for gated sinks and enhance validation logic

* feat: enhance configuration documentation and clarify inline analysis cache behavior

* feat: implement panic recovery during analysis to continue scans past errors

* feat: add expectations configuration for taint analysis and performance metrics

* feat: enhance error handling and logging during file reading and mutex locking

* feat: add cross-file body loading tests and plumbing for CF-1 phase

* feat: implement cross-file k=1 context-sensitive inline taint analysis with new tests and fixtures

* feat: implement indexed-scan parity in cross-file inline analysis with new dropdown and copy functionality

* feat: enhance classification span handling in CFG and AST for improved source attribution

* feat: add new Express routes for handling user input and telemetry data

* feat: implement ternary expression handling in CFG with diamond structure for JS/TS

* feat: implement Phase CF-3 abstract-domain transfer channels in summaries

* feat: add support for string-prefix transfer in cross-file calls and update tests

* docs: reduce RESULTS.md doc size

* feat: implement Phase CF-4 per-return-path summary decomposition with tests

* feat: update parameter handling in pass1 and refactor SsaFuncSummary initialization

* feat: implement Phase CF-5 for cross-file SCC joint fixed-point convergence with new flags and tests

* feat: implement Phase CF-6 with parameter-granularity points-to summaries and associated tests

* refactor: update comments and documentation for clarity and consistency

* style: format code for consistency and readability

* refactor: simplify verdict handling and improve edge checking logic

* refactor: optimize path and identifier collection by avoiding unnecessary cloning

* chore: update Cargo.toml for Rust version 1.85 and add ignored files; modify CHANGELOG and README for clarity on state analysis defaults

* refactor: update documentation and improve clarity in configuration files

* refactor: update documentation and improve clarity in configuration files

* feat: add JS/TS pass-2 convergence tests and expectations configuration

* feat: add Phase 5 regression tests for inline cache origin attribution and update related logic

* feat: implement Phase 7 deduplication and alternative path linking for taint findings

* feat: implement structural DFS index for anonymous functions and update naming conventions

* feat: add Phase 8 regression tests for container-element taint in JS and Python

* feat: add engine-depth profiles and explain-engine option for CLI

* feat: update expectations and add new README fixtures for multi-file scan regression

* feat: implement Phase 11 callback-alias and factory patterns with regression tests

* feat: implement Terminator::Switch for multi-way dispatch and add regression tests

* feat: add real-CVE benchmark fixtures for CVE-2023-48022, CVE-2019-14939, and CVE-2023-26159 with corresponding patched variants

* refactor: extract cfg and ssa_transfer to submodules

* refactor: cargo fmt

* refactor: remove unnecessary blank line in cfg_tests.rs

* refactor: remove unnecessary planning file

* chore: update Rust version to 1.88 and bump dependencies in Cargo files

* feat: enhance triage UI with new layout and controls, update README for clarity

* feat: enhance triage UI with new layout and controls, update README for clarity

* chore: remove outdated section from README for version 0.5.0

* docs: improve clarity and consistency in README content

* chore: add "GPL-3.0-or-later" to license options in about.toml

* chore: update license handling in about.toml and check-licenses.mjs

* style: format code for improved readability in TriagePage component

* style: format code for improved readability in TriagePage component

* chore: enhance license handling and improve body_id scoping in seed lookup

* feat: introduce owner and parent body IDs for enhanced seed scoping

* feat: implement direction-aware engine provenance with new CLI flag for strict CI gating

* feat: add Undef SSA operation for improved control-flow handling

* style: improve code formatting for consistency and readability in multiple files

* feat: add 16-function chain SCC across multiple files for enhanced analysis

* style: simplify code formatting for improved readability in multiple files

* fix: update CapHitReason default implementation and improve README clarity

* docs: enhance README with detailed explanations of taint analysis and limitations

* docs: refine README for clarity and consistency in taint analysis section

* style: improve code formatting for better readability in NewScanModal and scans

* fix: update cargo-about command to use --offline for deterministic license generation

* fix: update cargo-about command to use --offline for deterministic license generation

* ci: add step to prime cargo registry cache for deterministic license generation

* feat: add support for non-sink collections in authorization analysis

* feat: enhance authorization checks with row-level ownership equality and binding tracking

* feat: implement self-scoped user handling and enhance ownership checks

* refactor: simplify assertions and formatting in authorization analysis tests

* fix: normalize line endings in THIRDPARTY-LICENSES.html generation and update README with AI disclosure

* docs: update AI disclosure section for clarity and conciseness

* feat: add AI Contribution Policy and update contributing guidelines for AI assistance disclosure

* feat: enhance authorization analysis with SSA-derived variable type classification

* feat: implement auth_finding_to_diag function for enhanced security diagnostics

* feat: add args_value_refs to CallSite struct for enhanced argument tracking

* feat: add args_value_refs to CallSite struct for enhanced argument tracking

* feat: add direction-aware engine provenance with LossDirection classification and new CLI flag

* feat: simplify strip_cap_from_call_args call by removing unnecessary line breaks

* feat: enhance error message handling in cli_validation_tests for better Windows compatibility

* feat: optimize release profile settings in Cargo.toml and update CodeQL configuration

* feat: enhance release build process with SBOM generation and SLSA provenance

* feat: update actions/checkout and actions/setup-node to v6, enhance CLI options, and improve auth-check summaries

* feat: introduce PathFact handling for path safety checks and rejection logic

* feat: introduce PathFact handling for path safety checks and rejection logic

* feat: update benchmark data and enhance path sanitization logic with new safety checks

* feat: document AI assistance in frontend UI development and human review process

* feat: add return path facts for enhanced path safety checks and update documentation

* chore: update release date for version 0.5.0 in CHANGELOG.md

* chore: clean up ci.yml by removing outdated comments and clarifying steps

* feat: implement cross-language path sanitizers and validators for enhanced security

* feat: enhance SSA value usage tracking by including block terminators and improve path safety checks

* feat: enhance switch statement handling by adding per-case path constraints and support for exclusive cases

* refactor: simplify conditional formatting and improve code readability in executor and lower modules

* feat: add vulnerable examples for various languages demonstrating authentication and sanitization issues

* feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers

* feat: add transform classifiers for Java, Go, and Ruby with corresponding tests

* refactor: clarify comments on reassign-to-constant idiom and sink behavior in guards.rs

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Eli Peter 2026-04-25 17:59:11 -04:00 committed by GitHub
parent c4ce08b452
commit 41128177d2
GPG key ID: B5690EEEBB952194
2144 changed files with 201812 additions and 8927 deletions

src/ssa/alias.rs Normal file

@@ -0,0 +1,321 @@
use std::collections::HashMap;
use serde::{Deserialize, Serialize};
use smallvec::SmallVec;
use super::ir::*;
/// Maximum members per alias group to bound analysis cost.
const MAX_ALIAS_GROUP_SIZE: usize = 16;
/// Result of base-variable alias analysis.
///
/// Maps variable base names that are known to reference the same object.
/// Two names in the same group are must-aliases: a copy `b = a` (with no
/// semantic labels) means `b` and `a` reference the same value, so field
/// paths like `b.data` and `a.data` are interchangeable.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct BaseAliasResult {
/// base_name → canonical name. All aliases map to the same canonical.
canonical: HashMap<String, String>,
/// canonical_name → all member base names (including the canonical itself).
members: HashMap<String, SmallVec<[String; 4]>>,
}
impl BaseAliasResult {
/// An empty result (no aliases detected).
pub fn empty() -> Self {
Self {
canonical: HashMap::new(),
members: HashMap::new(),
}
}
/// True when no aliases were found.
pub fn is_empty(&self) -> bool {
self.members.is_empty()
}
/// Get all must-alias base names for `base` (including itself).
/// Returns `None` if the name has no known aliases.
pub fn aliases_of(&self, base: &str) -> Option<&[String]> {
let canon = self.canonical.get(base)?;
self.members.get(canon).map(|v| v.as_slice())
}
/// Check if two base names are must-aliases.
pub fn are_aliases(&self, a: &str, b: &str) -> bool {
if a == b {
return true;
}
match (self.canonical.get(a), self.canonical.get(b)) {
(Some(ca), Some(cb)) => ca == cb,
_ => false,
}
}
}
/// Compute base-variable alias groups from the copy propagation replacement map.
///
/// For each entry `(dst_val, src_val)` where copy prop replaced `dst` with
/// `src`, looks up the original variable names. If both are plain identifiers
/// (no dots — i.e. not field paths), they are registered as base aliases.
/// Transitive closure is computed so `b = a; c = b` yields group `{a, b, c}`.
pub fn compute_base_aliases(
copy_map: &HashMap<SsaValue, SsaValue>,
body: &SsaBody,
) -> BaseAliasResult {
if copy_map.is_empty() {
return BaseAliasResult::empty();
}
// Union-Find for transitive closure (string-keyed, small N).
let mut parent: HashMap<String, String> = HashMap::new();
fn find(parent: &mut HashMap<String, String>, x: &str) -> String {
if !parent.contains_key(x) {
return x.to_string();
}
let mut root = x.to_string();
// Chase to root (with iteration cap for safety).
for _ in 0..100 {
match parent.get(&root) {
Some(p) if p != &root => root = p.clone(),
_ => break,
}
}
// Path compression.
let mut cur = x.to_string();
for _ in 0..100 {
match parent.get(&cur) {
Some(p) if p != &root => {
let next = p.clone();
parent.insert(cur, root.clone());
cur = next;
}
_ => break,
}
}
root
}
fn union(parent: &mut HashMap<String, String>, a: &str, b: &str) {
let ra = find(parent, a);
let rb = find(parent, b);
if ra != rb {
// Deterministic root choice: the lexicographically smaller root wins,
// so every group's root is its smallest member.
if ra < rb {
parent.insert(rb, ra);
} else {
parent.insert(ra, rb);
}
}
}
// Collect alias pairs from the copy map.
for (&dst, &src) in copy_map {
let dst_idx = dst.0 as usize;
let src_idx = src.0 as usize;
if dst_idx >= body.value_defs.len() || src_idx >= body.value_defs.len() {
continue;
}
let dst_name = match &body.value_defs[dst_idx].var_name {
Some(n) => n.as_str(),
None => continue,
};
let src_name = match &body.value_defs[src_idx].var_name {
Some(n) => n.as_str(),
None => continue,
};
// Only alias plain idents — dotted paths (field accesses) are tracked
// independently in SSA and handled by field-aware suppression.
if dst_name.contains('.') || src_name.contains('.') {
continue;
}
// Skip self-aliases.
if dst_name == src_name {
continue;
}
// Ensure both exist in the parent map.
parent
.entry(dst_name.to_string())
.or_insert_with(|| dst_name.to_string());
parent
.entry(src_name.to_string())
.or_insert_with(|| src_name.to_string());
union(&mut parent, dst_name, src_name);
}
if parent.is_empty() {
return BaseAliasResult::empty();
}
// Build groups from union-find.
let mut groups: HashMap<String, SmallVec<[String; 4]>> = HashMap::new();
let all_names: Vec<String> = parent.keys().cloned().collect();
for name in &all_names {
let root = find(&mut parent, name);
groups.entry(root).or_default().push(name.clone());
}
// Remove singleton groups (no aliases) and enforce size limit.
groups.retain(|_, members| members.len() > 1);
for members in groups.values_mut() {
members.sort();
members.truncate(MAX_ALIAS_GROUP_SIZE);
}
// Build canonical map.
let mut canonical: HashMap<String, String> = HashMap::new();
for (root, members) in &groups {
for member in members {
canonical.insert(member.clone(), root.clone());
}
}
BaseAliasResult {
canonical,
members: groups,
}
}
#[cfg(test)]
mod tests {
use super::*;
use petgraph::graph::NodeIndex;
use std::collections::HashMap;
/// Helper: create a ValueDef with the given var_name.
fn vdef(name: &str) -> ValueDef {
ValueDef {
var_name: Some(name.to_string()),
cfg_node: NodeIndex::new(0),
block: BlockId(0),
}
}
fn vdef_none() -> ValueDef {
ValueDef {
var_name: None,
cfg_node: NodeIndex::new(0),
block: BlockId(0),
}
}
fn make_body(defs: Vec<ValueDef>) -> SsaBody {
SsaBody {
blocks: vec![],
entry: BlockId(0),
value_defs: defs,
cfg_node_map: HashMap::new(),
exception_edges: vec![],
}
}
#[test]
fn test_simple_alias_detection() {
// v0 = "a", v1 = "b"; copy_map: v1 → v0 ⇒ {a, b}
let body = make_body(vec![vdef("a"), vdef("b")]);
let mut copy_map = HashMap::new();
copy_map.insert(SsaValue(1), SsaValue(0));
let result = compute_base_aliases(&copy_map, &body);
assert!(!result.is_empty());
assert!(result.are_aliases("a", "b"));
assert!(result.are_aliases("b", "a"));
let aliases = result.aliases_of("a").unwrap();
assert_eq!(aliases.len(), 2);
assert!(aliases.contains(&"a".to_string()));
assert!(aliases.contains(&"b".to_string()));
}
#[test]
fn test_transitive_aliases() {
// v0="a", v1="b", v2="c"; copy_map: v1→v0, v2→v1 ⇒ {a, b, c}
let body = make_body(vec![vdef("a"), vdef("b"), vdef("c")]);
let mut copy_map = HashMap::new();
copy_map.insert(SsaValue(1), SsaValue(0));
copy_map.insert(SsaValue(2), SsaValue(1));
let result = compute_base_aliases(&copy_map, &body);
assert!(result.are_aliases("a", "b"));
assert!(result.are_aliases("b", "c"));
assert!(result.are_aliases("a", "c"));
let aliases = result.aliases_of("c").unwrap();
assert_eq!(aliases.len(), 3);
}
#[test]
fn test_no_alias_for_none_names() {
// v0=None, v1="b"; copy_map: v1→v0 ⇒ no aliases
let body = make_body(vec![vdef_none(), vdef("b")]);
let mut copy_map = HashMap::new();
copy_map.insert(SsaValue(1), SsaValue(0));
let result = compute_base_aliases(&copy_map, &body);
assert!(result.is_empty());
}
#[test]
fn test_dotted_paths_ignored() {
// v0="a.x", v1="b.x"; copy_map: v1→v0 ⇒ no aliases (dotted)
let body = make_body(vec![vdef("a.x"), vdef("b.x")]);
let mut copy_map = HashMap::new();
copy_map.insert(SsaValue(1), SsaValue(0));
let result = compute_base_aliases(&copy_map, &body);
assert!(result.is_empty());
}
#[test]
fn test_alias_group_size_limit() {
// Create 20 variables all aliased to v0
let mut defs = vec![vdef("v0")];
let mut copy_map = HashMap::new();
for i in 1..20u32 {
defs.push(vdef(&format!("v{}", i)));
copy_map.insert(SsaValue(i), SsaValue(0));
}
let body = make_body(defs);
let result = compute_base_aliases(&copy_map, &body);
// All should be aliases, but group is capped at MAX_ALIAS_GROUP_SIZE
let aliases = result.aliases_of("v0").unwrap();
assert_eq!(aliases.len(), MAX_ALIAS_GROUP_SIZE);
}
#[test]
fn test_empty_copy_map() {
let body = make_body(vec![vdef("a"), vdef("b")]);
let copy_map = HashMap::new();
let result = compute_base_aliases(&copy_map, &body);
assert!(result.is_empty());
}
#[test]
fn test_self_alias_ignored() {
// v0="a"; copy_map: v0→v0 ⇒ no aliases (self)
let body = make_body(vec![vdef("a")]);
let mut copy_map = HashMap::new();
copy_map.insert(SsaValue(0), SsaValue(0));
let result = compute_base_aliases(&copy_map, &body);
assert!(result.is_empty());
}
#[test]
fn test_are_aliases_same_name() {
let result = BaseAliasResult::empty();
// Same name is always an alias of itself
assert!(result.are_aliases("x", "x"));
}
}
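The grouping step in `compute_base_aliases` is a string-keyed union-find in which the lexicographically smaller root always wins, so each group's root is its smallest member. The sketch below isolates that idea under stated assumptions: copy pairs arrive as plain `(dst, src)` name tuples rather than SSA values, and `group_aliases` is a hypothetical helper, not part of the crate. It keeps the original's filters (dotted paths and self-copies are skipped) and returns a `BTreeMap` only so that iteration order is stable.

```rust
use std::collections::{BTreeMap, HashMap};

/// Find the root of `x` (a name that is its own parent, or unseen),
/// then compress the path so every node on it points at the root.
fn find(parent: &mut HashMap<String, String>, x: &str) -> String {
    let mut root = x.to_string();
    while let Some(p) = parent.get(&root) {
        if *p == root {
            break;
        }
        root = p.clone();
    }
    let mut cur = x.to_string();
    while let Some(next) = parent.get(&cur).cloned() {
        if next == cur || next == root {
            break;
        }
        parent.insert(cur, root.clone());
        cur = next;
    }
    root
}

/// Merge the sets of `a` and `b`; the smaller root becomes the parent.
fn union(parent: &mut HashMap<String, String>, a: &str, b: &str) {
    let ra = find(parent, a);
    let rb = find(parent, b);
    if ra < rb {
        parent.insert(rb, ra);
    } else if rb < ra {
        parent.insert(ra, rb);
    }
}

/// Hypothetical helper: group `(dst, src)` copy pairs into alias sets keyed
/// by root, dropping singletons and capping each sorted group at `max_size`.
fn group_aliases(pairs: &[(&str, &str)], max_size: usize) -> BTreeMap<String, Vec<String>> {
    let mut parent: HashMap<String, String> = HashMap::new();
    for &(dst, src) in pairs {
        // Field paths and self-copies are not base aliases.
        if dst.contains('.') || src.contains('.') || dst == src {
            continue;
        }
        parent.entry(dst.to_string()).or_insert_with(|| dst.to_string());
        parent.entry(src.to_string()).or_insert_with(|| src.to_string());
        union(&mut parent, dst, src);
    }
    let names: Vec<String> = parent.keys().cloned().collect();
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for name in names {
        let root = find(&mut parent, &name);
        groups.entry(root).or_default().push(name);
    }
    groups.retain(|_, members| members.len() > 1);
    for members in groups.values_mut() {
        // Sorting puts the root (the minimum) first, so truncation keeps it.
        members.sort();
        members.truncate(max_size);
    }
    groups
}
```

With the pairs `b = a`, `c = b`, and `y = x`, this yields `{a: [a, b, c], x: [x, y]}`, while a dotted pair such as `p.q`/`r` is ignored.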

src/ssa/const_prop.rs Normal file

@@ -0,0 +1,754 @@
use std::collections::{HashMap, HashSet, VecDeque};
use serde::{Deserialize, Serialize};
use super::ir::*;
/// Lattice value for constant propagation.
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub enum ConstLattice {
/// Not yet analyzed (optimistic top).
Top,
/// Known string constant.
Str(String),
/// Known integer constant.
Int(i64),
/// Known boolean constant.
Bool(bool),
/// Null / nil / None.
Null,
/// Multiple possible values — not constant.
Varying,
}
impl ConstLattice {
/// Meet operation: combine two lattice values.
fn meet(&self, other: &Self) -> Self {
match (self, other) {
(ConstLattice::Top, x) | (x, ConstLattice::Top) => x.clone(),
(ConstLattice::Varying, _) | (_, ConstLattice::Varying) => ConstLattice::Varying,
(a, b) if a == b => a.clone(),
_ => ConstLattice::Varying,
}
}
/// Parse a raw constant text into a typed lattice value.
pub(crate) fn parse(text: &str) -> Self {
let trimmed = text.trim();
// Boolean
if trimmed == "true" || trimmed == "True" || trimmed == "TRUE" {
return ConstLattice::Bool(true);
}
if trimmed == "false" || trimmed == "False" || trimmed == "FALSE" {
return ConstLattice::Bool(false);
}
// Null variants
if trimmed == "null"
|| trimmed == "nil"
|| trimmed == "None"
|| trimmed == "NULL"
|| trimmed == "nullptr"
{
return ConstLattice::Null;
}
// Integer (including negative)
if let Ok(i) = trimmed.parse::<i64>() {
return ConstLattice::Int(i);
}
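// String: strip surrounding quotes. The length check keeps a lone `"` or
// `'` from matching both ends and panicking on the `[1..0]` slice.
if trimmed.len() >= 2
&& ((trimmed.starts_with('"') && trimmed.ends_with('"'))
|| (trimmed.starts_with('\'') && trimmed.ends_with('\'')))
{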
let inner = &trimmed[1..trimmed.len() - 1];
return ConstLattice::Str(inner.to_string());
}
// Bare string (no quotes) — treat as string constant
ConstLattice::Str(trimmed.to_string())
}
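/// Returns `Some(b)` when the value's truthiness is statically known: a
/// `Bool`, or one of the falsy constants `Null`, `Int(0)`, and the empty
/// `Str`. Everything else (including non-zero integers and non-empty
/// strings) conservatively returns `None`.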
pub fn as_bool(&self) -> Option<bool> {
match self {
ConstLattice::Bool(b) => Some(*b),
// Truthiness: null is false, 0 is false, empty string is false
ConstLattice::Null => Some(false),
ConstLattice::Int(0) => Some(false),
ConstLattice::Str(s) if s.is_empty() => Some(false),
_ => None,
}
}
}
/// Result of constant propagation analysis.
pub struct ConstPropResult {
/// Per-SSA-value constant lattice.
pub values: HashMap<SsaValue, ConstLattice>,
/// Blocks that are statically unreachable.
pub unreachable_blocks: HashSet<BlockId>,
}
/// Run Sparse Conditional Constant Propagation on an SSA body.
pub fn const_propagate(body: &SsaBody) -> ConstPropResult {
let num_blocks = body.blocks.len();
// Per-value lattice: starts at Top
let mut values: HashMap<SsaValue, ConstLattice> = HashMap::new();
// Executable flags per CFG edge (from_block, to_block)
let mut executable_edges: HashSet<(BlockId, BlockId)> = HashSet::new();
// Executable blocks
let mut executable_blocks: HashSet<BlockId> = HashSet::new();
// Two worklists
let mut cfg_worklist: VecDeque<BlockId> = VecDeque::new();
let mut ssa_worklist: VecDeque<SsaValue> = VecDeque::new();
// Mark entry executable
executable_blocks.insert(body.entry);
cfg_worklist.push_back(body.entry);
// Build use-map: SsaValue → blocks containing a phi or body instruction
// that uses it, so SSA value changes can be propagated efficiently.
let mut use_sites: HashMap<SsaValue, Vec<BlockId>> = HashMap::new();
for block in &body.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
for used_val in inst_uses(inst) {
use_sites.entry(used_val).or_default().push(block.id);
}
}
}
// Initialize all values to Top
for block in &body.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
values.insert(inst.value, ConstLattice::Top);
}
}
// Process until both worklists are empty
loop {
let mut changed = false;
// Process CFG worklist
while let Some(block_id) = cfg_worklist.pop_front() {
let block = body.block(block_id);
// Evaluate phis
for phi in &block.phis {
if let SsaOp::Phi(operands) = &phi.op {
let old = values.get(&phi.value).cloned().unwrap_or(ConstLattice::Top);
let new_val = eval_phi(operands, &values, &executable_edges, block_id);
if new_val != old {
values.insert(phi.value, new_val);
ssa_worklist.push_back(phi.value);
changed = true;
}
}
}
// Evaluate body instructions
for inst in &block.body {
let old = values
.get(&inst.value)
.cloned()
.unwrap_or(ConstLattice::Top);
let new_val = eval_inst(inst, &values);
if new_val != old {
values.insert(inst.value, new_val);
ssa_worklist.push_back(inst.value);
changed = true;
}
}
// Process terminator: determine which successors are executable
process_terminator(
block,
body,
&values,
&mut executable_edges,
&mut executable_blocks,
&mut cfg_worklist,
);
}
// Process SSA worklist
while let Some(val) = ssa_worklist.pop_front() {
if let Some(blocks) = use_sites.get(&val) {
for &block_id in blocks {
if !executable_blocks.contains(&block_id) {
continue;
}
let block = body.block(block_id);
// Re-evaluate phis using this value
for phi in &block.phis {
if let SsaOp::Phi(operands) = &phi.op
&& operands.iter().any(|(_, v)| *v == val)
{
let old = values.get(&phi.value).cloned().unwrap_or(ConstLattice::Top);
let new_val = eval_phi(operands, &values, &executable_edges, block_id);
if new_val != old {
values.insert(phi.value, new_val);
ssa_worklist.push_back(phi.value);
changed = true;
}
}
}
// Re-evaluate body instructions using this value
for inst in &block.body {
if inst_uses(inst).contains(&val) {
let old = values
.get(&inst.value)
.cloned()
.unwrap_or(ConstLattice::Top);
let new_val = eval_inst(inst, &values);
if new_val != old {
values.insert(inst.value, new_val);
ssa_worklist.push_back(inst.value);
changed = true;
}
}
}
// Re-evaluate terminator if condition changed
process_terminator(
block,
body,
&values,
&mut executable_edges,
&mut executable_blocks,
&mut cfg_worklist,
);
}
}
}
if !changed {
break;
}
}
// Compute unreachable blocks
let unreachable_blocks: HashSet<BlockId> = (0..num_blocks)
.map(|i| BlockId(i as u32))
.filter(|bid| !executable_blocks.contains(bid))
.collect();
ConstPropResult {
values,
unreachable_blocks,
}
}
/// Evaluate a phi: meet of operands from executable predecessors.
fn eval_phi(
operands: &[(BlockId, SsaValue)],
values: &HashMap<SsaValue, ConstLattice>,
executable_edges: &HashSet<(BlockId, BlockId)>,
this_block: BlockId,
) -> ConstLattice {
let mut result = ConstLattice::Top;
for (pred_block, val) in operands {
if !executable_edges.contains(&(*pred_block, this_block)) {
continue; // skip non-executable predecessors
}
let operand_val = values.get(val).cloned().unwrap_or(ConstLattice::Top);
result = result.meet(&operand_val);
}
result
}
/// Evaluate a single instruction.
fn eval_inst(inst: &SsaInst, values: &HashMap<SsaValue, ConstLattice>) -> ConstLattice {
match &inst.op {
SsaOp::Const(Some(text)) => ConstLattice::parse(text),
SsaOp::Const(None) => ConstLattice::Varying, // unknown constant
SsaOp::Assign(uses) if uses.len() == 1 => {
// Copy: propagate the source's value
values.get(&uses[0]).cloned().unwrap_or(ConstLattice::Top)
}
SsaOp::Assign(_) => ConstLattice::Varying, // zero or several operands: not a plain copy
SsaOp::Call { .. }
| SsaOp::Source
| SsaOp::Param { .. }
| SsaOp::SelfParam
| SsaOp::CatchParam => ConstLattice::Varying,
SsaOp::Phi(_) => ConstLattice::Varying, // phis in body shouldn't happen
SsaOp::Nop => ConstLattice::Varying,
// Undef contributes no knowledge: `Top` is the lattice identity
// for meet, so a phi operand of Undef leaves the joined value
// to the other incoming operands.
SsaOp::Undef => ConstLattice::Top,
}
}
/// Collect SSA values used by an instruction (for use-map building).
fn inst_uses(inst: &SsaInst) -> Vec<SsaValue> {
match &inst.op {
SsaOp::Phi(operands) => operands.iter().map(|(_, v)| *v).collect(),
SsaOp::Assign(uses) => uses.to_vec(),
SsaOp::Call { args, receiver, .. } => {
let mut vals = Vec::new();
if let Some(rv) = receiver {
vals.push(*rv);
}
for arg in args {
vals.extend(arg.iter());
}
vals
}
SsaOp::Source
| SsaOp::Const(_)
| SsaOp::Param { .. }
| SsaOp::SelfParam
| SsaOp::CatchParam
| SsaOp::Nop
| SsaOp::Undef => Vec::new(),
}
}
/// Process a block's terminator to determine successor executability.
fn process_terminator(
block: &SsaBlock,
body: &SsaBody,
values: &HashMap<SsaValue, ConstLattice>,
executable_edges: &mut HashSet<(BlockId, BlockId)>,
executable_blocks: &mut HashSet<BlockId>,
cfg_worklist: &mut VecDeque<BlockId>,
) {
match &block.terminator {
Terminator::Goto(_) => {
// `block.succs` is authoritative. For collapsed ≥3-way fanouts
// (see src/ssa/lower.rs `three_successor_collapse`) the terminator
// only records the first successor; marking just that one would
// leave the others unreachable for SCCP. Iterate succs so every
// CFG successor is marked executable.
for &target in &block.succs {
mark_edge_executable(
block.id,
target,
executable_edges,
executable_blocks,
cfg_worklist,
);
}
}
Terminator::Branch {
cond,
true_blk,
false_blk,
condition: _,
} => {
// Try to resolve the condition to a known boolean
let cond_val = body
.cfg_node_map
.get(cond)
.and_then(|v| values.get(v))
.and_then(|c| c.as_bool());
match cond_val {
Some(true) => {
mark_edge_executable(
block.id,
*true_blk,
executable_edges,
executable_blocks,
cfg_worklist,
);
}
Some(false) => {
mark_edge_executable(
block.id,
*false_blk,
executable_edges,
executable_blocks,
cfg_worklist,
);
}
None => {
// Unknown: both successors executable
mark_edge_executable(
block.id,
*true_blk,
executable_edges,
executable_blocks,
cfg_worklist,
);
mark_edge_executable(
block.id,
*false_blk,
executable_edges,
executable_blocks,
cfg_worklist,
);
}
}
}
Terminator::Switch {
scrutinee,
targets,
default,
case_values,
} => {
// Folding the Switch would mean resolving the scrutinee to a
// concrete literal and marking only the matching case target.
// That is not done here yet, so fall back to the sound "every
// successor executable" behavior, which mirrors the pre-Switch
// cascade.
let _ = (scrutinee, targets, default, case_values);
for &target in &block.succs {
mark_edge_executable(
block.id,
target,
executable_edges,
executable_blocks,
cfg_worklist,
);
}
}
Terminator::Return(_) | Terminator::Unreachable => {
// `block.succs` is authoritative; the terminator is advisory.
// Finally/cleanup continuation edges live on `succs` even when
// the structured terminator is `Return`/`Unreachable`. Mark them
// executable so SCCP reaches downstream (e.g. finally) blocks.
for &target in &block.succs {
mark_edge_executable(
block.id,
target,
executable_edges,
executable_blocks,
cfg_worklist,
);
}
}
}
}
fn mark_edge_executable(
from: BlockId,
to: BlockId,
executable_edges: &mut HashSet<(BlockId, BlockId)>,
executable_blocks: &mut HashSet<BlockId>,
cfg_worklist: &mut VecDeque<BlockId>,
) {
if executable_edges.insert((from, to)) {
// A new incoming edge always requeues `to`: either it just became
// executable, or it already was and its phis must be re-evaluated.
executable_blocks.insert(to);
cfg_worklist.push_back(to);
}
}
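/// Apply constant propagation results: prune branches whose condition is a
/// known constant, and set the terminator of every block SCCP found
/// unreachable to `Terminator::Unreachable`.
///
/// Returns the number of branches pruned.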
pub fn apply_const_prop(body: &mut SsaBody, result: &ConstPropResult) -> usize {
// Collect pruning decisions first to avoid borrow conflicts.
// Each entry: (block_index, taken_block, untaken_block)
let mut prune_ops: Vec<(usize, BlockId, BlockId)> = Vec::new();
for (block_idx, block) in body.blocks.iter().enumerate() {
if let Terminator::Branch {
cond,
true_blk,
false_blk,
condition: _,
} = &block.terminator
{
let cond_val = body
.cfg_node_map
.get(cond)
.and_then(|v| result.values.get(v))
.and_then(|c| c.as_bool());
match cond_val {
Some(true) => {
prune_ops.push((block_idx, *true_blk, *false_blk));
}
Some(false) => {
prune_ops.push((block_idx, *false_blk, *true_blk));
}
None => {}
}
}
}
let pruned = prune_ops.len();
// Apply pruning
for (block_idx, taken, untaken) in prune_ops {
let pred_id = body.blocks[block_idx].id;
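body.blocks[block_idx].terminator = Terminator::Goto(taken);
// Both arms target the same block: the edge stays live, so there is
// nothing to unlink (removing it would orphan the taken successor).
if taken == untaken {
continue;
}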
// Remove pred from untaken's preds
let untaken_idx = untaken.0 as usize;
if untaken_idx < body.blocks.len() {
body.blocks[untaken_idx].preds.retain(|p| *p != pred_id);
// Remove phi operands referencing this pred
for phi in &mut body.blocks[untaken_idx].phis {
if let SsaOp::Phi(operands) = &mut phi.op {
operands.retain(|(bid, _)| *bid != pred_id);
}
}
}
// Remove untaken from pred's succs
body.blocks[block_idx].succs.retain(|s| *s != untaken);
}
// Mark unreachable blocks
for &bid in &result.unreachable_blocks {
body.block_mut(bid).terminator = Terminator::Unreachable;
}
pruned
}
/// Collect module aliases from `require()` calls in the SSA body.
///
/// Detects patterns like `const http = require("http")` and propagates
/// aliases through phi nodes (e.g., `const lib = cond ? https : http`).
/// Returns a map from SSA value → set of possible module names.
///
/// Only tracks a fixed allowlist of security-relevant Node.js core modules
/// (`KNOWN_MODULES`) to avoid false positives.
pub fn collect_module_aliases(
body: &SsaBody,
const_values: &HashMap<SsaValue, ConstLattice>,
) -> HashMap<SsaValue, smallvec::SmallVec<[String; 2]>> {
use smallvec::SmallVec;
// Known modules whose methods are security-relevant for alias tracking.
const KNOWN_MODULES: &[&str] = &["http", "https", "child_process", "fs", "net", "dgram"];
let mut aliases: HashMap<SsaValue, SmallVec<[String; 2]>> = HashMap::new();
// Pass 1: detect `require("module")` calls.
for block in &body.blocks {
for inst in &block.body {
if let SsaOp::Call { callee, args, .. } = &inst.op
&& (callee == "require" || callee.ends_with(".require"))
{
// Check if the first argument is a known module string constant.
if let Some(first_arg) = args.first()
&& let Some(&first_val) = first_arg.first()
&& let Some(ConstLattice::Str(module_name)) = const_values.get(&first_val)
&& KNOWN_MODULES.contains(&module_name.as_str())
{
aliases
.entry(inst.value)
.or_default()
.push(module_name.clone());
}
}
}
}
if aliases.is_empty() {
return aliases;
}
// Pass 2: propagate through copies (single-operand Assign) and phi nodes,
// for at most 10 rounds.
let mut changed = true;
let mut iterations = 0;
while changed && iterations < 10 {
changed = false;
iterations += 1;
for block in &body.blocks {
// Phi nodes
for phi in &block.phis {
if let SsaOp::Phi(operands) = &phi.op {
let mut merged: SmallVec<[String; 2]> = SmallVec::new();
for (_, operand_val) in operands {
if let Some(operand_aliases) = aliases.get(operand_val) {
for a in operand_aliases {
if !merged.contains(a) {
merged.push(a.clone());
}
}
}
}
if !merged.is_empty() {
let entry = aliases.entry(phi.value).or_default();
for a in &merged {
if !entry.contains(a) {
entry.push(a.clone());
changed = true;
}
}
}
}
}
// Copy propagation through single-operand Assign
for inst in &block.body {
if let SsaOp::Assign(uses) = &inst.op
&& uses.len() == 1
&& let Some(src_aliases) = aliases.get(&uses[0]).cloned()
{
let entry = aliases.entry(inst.value).or_default();
for a in &src_aliases {
if !entry.contains(a) {
entry.push(a.clone());
changed = true;
}
}
}
}
}
}
aliases
}
#[cfg(test)]
mod tests {
use super::*;
use petgraph::graph::NodeIndex;
use smallvec::SmallVec;
fn make_body(blocks: Vec<SsaBlock>, value_defs: Vec<ValueDef>) -> SsaBody {
let cfg_node_map = value_defs
.iter()
.enumerate()
.map(|(i, vd)| (vd.cfg_node, SsaValue(i as u32)))
.collect();
SsaBody {
blocks,
entry: BlockId(0),
value_defs,
cfg_node_map,
exception_edges: Vec::new(),
}
}
#[test]
fn const_literal_parsed() {
assert_eq!(ConstLattice::parse("42"), ConstLattice::Int(42));
assert_eq!(ConstLattice::parse("-1"), ConstLattice::Int(-1));
assert_eq!(ConstLattice::parse("true"), ConstLattice::Bool(true));
assert_eq!(ConstLattice::parse("false"), ConstLattice::Bool(false));
assert_eq!(ConstLattice::parse("null"), ConstLattice::Null);
assert_eq!(ConstLattice::parse("nil"), ConstLattice::Null);
assert_eq!(
ConstLattice::parse("\"hello\""),
ConstLattice::Str("hello".into())
);
assert_eq!(
ConstLattice::parse("'world'"),
ConstLattice::Str("world".into())
);
}
#[test]
fn meet_lattice() {
let a = ConstLattice::Int(42);
let b = ConstLattice::Int(42);
assert_eq!(a.meet(&b), ConstLattice::Int(42));
let c = ConstLattice::Int(99);
assert_eq!(a.meet(&c), ConstLattice::Varying);
assert_eq!(ConstLattice::Top.meet(&a), ConstLattice::Int(42));
assert_eq!(a.meet(&ConstLattice::Top), ConstLattice::Int(42));
assert_eq!(ConstLattice::Varying.meet(&a), ConstLattice::Varying);
}
#[test]
fn single_block_const() {
// v0 = const("42")
let n0 = NodeIndex::new(0);
let block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("42".into())),
cfg_node: n0,
var_name: Some("x".into()),
span: (0, 2),
}],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
};
let body = make_body(
vec![block],
vec![ValueDef {
var_name: Some("x".into()),
cfg_node: n0,
block: BlockId(0),
}],
);
let result = const_propagate(&body);
assert_eq!(
result.values.get(&SsaValue(0)),
Some(&ConstLattice::Int(42))
);
assert!(result.unreachable_blocks.is_empty());
}
#[test]
fn copy_propagation_through_assign() {
// v0 = const("true"), v1 = assign(v0)
let n0 = NodeIndex::new(0);
let n1 = NodeIndex::new(1);
let block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("true".into())),
cfg_node: n0,
var_name: Some("x".into()),
span: (0, 4),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
cfg_node: n1,
var_name: Some("y".into()),
span: (5, 9),
},
],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
};
let body = make_body(
vec![block],
vec![
ValueDef {
var_name: Some("x".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("y".into()),
cfg_node: n1,
block: BlockId(0),
},
],
);
let result = const_propagate(&body);
assert_eq!(
result.values.get(&SsaValue(0)),
Some(&ConstLattice::Bool(true))
);
assert_eq!(
result.values.get(&SsaValue(1)),
Some(&ConstLattice::Bool(true))
);
}
}
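Two rules carry most of the weight in `const_propagate`: a phi meets only the operands whose incoming edge is already executable, and a branch whose condition folds to a boolean marks just one successor. A minimal sketch of both, assuming a toy lattice `Lat` with only integer and boolean constants and plain `u32` block ids; `Lat`, `eval_phi`, and `live_successors` here are hypothetical stand-ins for `ConstLattice`, `BlockId`, and the real helpers. Unlike `as_bool`, it folds only `Bool` conditions; like the original, an undetermined condition (including `Top`) keeps both arms live.

```rust
use std::collections::HashSet;

/// Flat constant lattice: Top (not yet known) above constants above Varying.
#[derive(Clone, Debug, PartialEq)]
enum Lat {
    Top,
    Int(i64),
    Bool(bool),
    Varying,
}

impl Lat {
    /// Top is the identity, Varying absorbs, equal constants survive.
    fn meet(&self, other: &Lat) -> Lat {
        match (self, other) {
            (Lat::Top, x) | (x, Lat::Top) => x.clone(),
            (Lat::Varying, _) | (_, Lat::Varying) => Lat::Varying,
            (a, b) if a == b => a.clone(),
            _ => Lat::Varying,
        }
    }
}

/// Meet the `(pred_block, value)` operands whose edge into `this_block`
/// has been proven executable; all other operands are ignored.
fn eval_phi(operands: &[(u32, Lat)], executable: &HashSet<(u32, u32)>, this_block: u32) -> Lat {
    operands
        .iter()
        .filter(|(pred, _)| executable.contains(&(*pred, this_block)))
        .fold(Lat::Top, |acc, (_, v)| acc.meet(v))
}

/// Successors a two-way branch can reach given its condition's value.
fn live_successors(cond: &Lat, true_blk: u32, false_blk: u32) -> Vec<u32> {
    match cond {
        Lat::Bool(true) => vec![true_blk],
        Lat::Bool(false) => vec![false_blk],
        _ => vec![true_blk, false_blk],
    }
}
```

While only the edge from block 0 is executable, a phi of `Int(7)` (from block 0) and `Int(9)` (from block 1) stays `Int(7)`; once the edge from block 1 is marked, it drops to `Varying`. Values only ever descend through a finite-height lattice, which bounds how often the worklists can re-evaluate them.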

src/ssa/copy_prop.rs Normal file

@@ -0,0 +1,228 @@
#![allow(clippy::collapsible_if)]
use std::collections::HashMap;
use super::ir::*;
use crate::cfg::Cfg;
/// Run copy propagation on an SSA body.
///
/// Identifies `Assign([single_use])` instructions whose CFG node carries no
/// taint labels (i.e., no source/sanitizer/sink semantics), is not a
/// numeric-length read, and has no `string_prefix`, then rewrites all uses
/// of the destination value to use the source value directly.
///
/// Returns `(copies_eliminated, resolved_replacement_map)`. The replacement map
/// maps each eliminated destination SsaValue to its transitive root source
/// SsaValue, used downstream by alias analysis to recover base-variable
/// aliasing relationships.
pub fn copy_propagate(body: &mut SsaBody, cfg: &Cfg) -> (usize, HashMap<SsaValue, SsaValue>) {
// 1. Identify copies: Assign with single operand and no labels on CFG node
let mut replace_map: HashMap<SsaValue, SsaValue> = HashMap::new();
for block in &body.blocks {
for inst in &block.body {
if let SsaOp::Assign(uses) = &inst.op {
if uses.len() == 1 {
let src = uses[0];
let info = &cfg[inst.cfg_node];
// Skip if the node has labels — sanitizers, sources, sinks
// have semantic meaning that must be preserved.
if !info.taint.labels.is_empty() {
continue;
}
// Skip numeric-length reads (`arr.length`, `map.size`, etc.):
// the destination is Int-typed (a derived property of the
// source) while the source is typically String/Object/
// Unknown. Copy-propagating through this Assign would
// erase the Int type fact and defeat HTML_ESCAPE / SQL /
// FILE_IO / SHELL sink suppression.
if info.is_numeric_length_access {
continue;
}
// Skip Assigns whose CFG node carries a `string_prefix`
// (template literals or `"lit" + var` RHS recognised by
// `extract_template_prefix`). The abstract-interpretation
// `transfer_abstract` consumes that prefix to seed a
// StringFact on the Assign's SSA value, which downstream
// SSRF suppression reads. Propagating past this Assign
// erases the prefix-bearing SSA value: the Call's args get
// rewritten to the bare upstream variable (no prefix), and
// `is_call_abstract_safe` falls through to a tainted-flow
// emission even on safe fixed-host URLs.
if info.string_prefix.is_some() {
continue;
}
replace_map.insert(inst.value, src);
}
}
}
}
if replace_map.is_empty() {
return (0, HashMap::new());
}
// 2. Build transitive replacement map: chase chains (SSA is acyclic)
let mut resolved: HashMap<SsaValue, SsaValue> = HashMap::new();
for &dst in replace_map.keys() {
let root = resolve_root(dst, &replace_map);
resolved.insert(dst, root);
}
// 3. Rewrite uses in phi operands, Assign operands, and Call receivers/arguments
let mut count = 0;
for block in &mut body.blocks {
// Rewrite phi operands
for phi in &mut block.phis {
if let SsaOp::Phi(operands) = &mut phi.op {
for (_bid, val) in operands.iter_mut() {
if let Some(&root) = resolved.get(val) {
*val = root;
}
}
}
}
// Rewrite body instructions
for inst in &mut block.body {
match &mut inst.op {
SsaOp::Assign(uses) => {
for val in uses.iter_mut() {
if let Some(&root) = resolved.get(val) {
*val = root;
}
}
}
SsaOp::Call { args, receiver, .. } => {
if let Some(rv) = receiver {
if let Some(&root) = resolved.get(rv) {
*rv = root;
}
}
for arg in args.iter_mut() {
for val in arg.iter_mut() {
if let Some(&root) = resolved.get(val) {
*val = root;
}
}
}
}
_ => {}
}
}
}
// 4. Convert copy instructions to Nop (DCE will clean up)
for block in &mut body.blocks {
for inst in &mut block.body {
if resolved.contains_key(&inst.value) {
inst.op = SsaOp::Nop;
count += 1;
}
}
}
(count, resolved)
}
/// Chase the replacement chain to find the root value.
fn resolve_root(val: SsaValue, map: &HashMap<SsaValue, SsaValue>) -> SsaValue {
let mut current = val;
// Safety: SSA is acyclic, but cap iterations to be safe
for _ in 0..1000 {
match map.get(&current) {
Some(&next) if next != current => current = next,
_ => break,
}
}
current
}
#[cfg(test)]
mod tests {
use super::*;
use crate::cfg::{NodeInfo, StmtKind};
use petgraph::Graph;
use smallvec::SmallVec;
fn make_cfg_node(kind: StmtKind) -> NodeInfo {
NodeInfo {
kind,
..Default::default()
}
}
#[test]
fn simple_copy_eliminated() {
// v0 = const("42"), v1 = assign(v0), v2 = assign(v1)
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n2 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("42".into())),
cfg_node: n0,
var_name: Some("x".into()),
span: (0, 2),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
cfg_node: n1,
var_name: Some("y".into()),
span: (3, 5),
},
SsaInst {
value: SsaValue(2),
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(1), 1)),
cfg_node: n2,
var_name: Some("z".into()),
span: (6, 8),
},
],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("x".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("y".into()),
cfg_node: n1,
block: BlockId(0),
},
ValueDef {
var_name: Some("z".into()),
cfg_node: n2,
block: BlockId(0),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1)), (n2, SsaValue(2))]
.into_iter()
.collect(),
exception_edges: vec![],
};
let (eliminated, copy_map) = copy_propagate(&mut body, &cfg);
assert_eq!(eliminated, 2);
// Both v1 and v2 should map to v0 (the root)
assert_eq!(copy_map.get(&SsaValue(1)), Some(&SsaValue(0)));
assert_eq!(copy_map.get(&SsaValue(2)), Some(&SsaValue(0)));
// v1 and v2 should be Nop now
assert!(matches!(body.blocks[0].body[1].op, SsaOp::Nop));
assert!(matches!(body.blocks[0].body[2].op, SsaOp::Nop));
}
}
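The transitive-replacement step in `copy_propagate` (steps 2 and 3 above, with `resolve_root`) can be shown in isolation. What follows is a minimal sketch under simplifying assumptions. SSA values are plain `u32` ids. The copy map is a `HashMap<u32, u32>` from a copy's destination to its single source. The helpers `resolve_all` and `rewrite_uses` are hypothetical names for illustration, not part of the crate's `SsaValue`/`SsaBody` API.

```rust
use std::collections::HashMap;

/// Follow `dst -> src` copy edges until reaching a value that is not itself
/// a copy. SSA copy chains are acyclic; the step cap only guards against
/// malformed input.
fn resolve_root(val: u32, copies: &HashMap<u32, u32>) -> u32 {
    let mut current = val;
    for _ in 0..1000 {
        match copies.get(&current) {
            Some(&next) if next != current => current = next,
            _ => break,
        }
    }
    current
}

/// Hypothetical helper: map every copy destination directly to the root of
/// its chain, so rewriting a use needs only one lookup.
fn resolve_all(copies: &HashMap<u32, u32>) -> HashMap<u32, u32> {
    copies
        .keys()
        .map(|&dst| (dst, resolve_root(dst, copies)))
        .collect()
}

/// Hypothetical helper: rewrite operands through the resolved map. Values
/// that are not copies are left unchanged.
fn rewrite_uses(uses: &mut [u32], resolved: &HashMap<u32, u32>) {
    for v in uses.iter_mut() {
        if let Some(&root) = resolved.get(v) {
            *v = root;
        }
    }
}
```

Resolving every chain up front keeps the rewrite pass linear in the number of uses. This matches the `simple_copy_eliminated` test, where both `v1` and `v2` map straight to `v0`.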

src/ssa/dce.rs

@ -0,0 +1,449 @@
use std::collections::HashMap;
use super::ir::*;
use crate::cfg::Cfg;
use crate::labels::DataLabel;
/// Eliminate dead definitions from an SSA body.
///
/// A definition is dead if its SsaValue has zero uses across the entire body,
/// except for instructions that must be preserved:
/// - `Source` (taint origin, must survive for correctness)
/// - `Call` (may have side effects)
/// - `CatchParam` (exception binding)
/// - Instructions whose CFG node carries a `Sink`, `Source`, or `Sanitizer`
///   label (label-driven taint detection relies on them)
///
/// Returns the number of instructions removed.
pub fn eliminate_dead_defs(body: &mut SsaBody, cfg: &Cfg) -> usize {
let mut total_removed = 0;
// Iterate until no more removals (removing a def may make its operands dead)
loop {
let use_counts = build_use_counts(body);
let mut removed_this_pass = 0;
for block in &mut body.blocks {
// Remove dead body instructions
let before = block.body.len();
block.body.retain(|inst| !is_dead(inst, &use_counts, cfg));
removed_this_pass += before - block.body.len();
// Remove dead phi instructions
let before_phis = block.phis.len();
block.phis.retain(|inst| !is_dead(inst, &use_counts, cfg));
removed_this_pass += before_phis - block.phis.len();
}
total_removed += removed_this_pass;
if removed_this_pass == 0 {
break;
}
}
total_removed
}
/// Build a map of SsaValue → number of uses across all instructions and
/// block terminators.
///
/// Terminator uses must be counted: `Terminator::Return(rv)` references the
/// returned value and `Terminator::Branch { condition, .. }` references the
/// condition variable. Without counting these, a value used solely by a
/// terminator (the canonical case for short helpers like
/// `def f(s): return s`) is judged dead, and DCE strips every instruction
/// in the body — leaving empty blocks whose terminators reference
/// nonexistent SsaValues, breaking downstream analyses (per-return-path
/// PathFact narrowing, inline-summary extraction, etc.).
fn build_use_counts(body: &SsaBody) -> HashMap<SsaValue, usize> {
let mut counts: HashMap<SsaValue, usize> = HashMap::new();
for block in &body.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
for v in inst_used_values(inst) {
*counts.entry(v).or_insert(0) += 1;
}
}
for v in terminator_used_values(&block.terminator) {
*counts.entry(v).or_insert(0) += 1;
}
}
counts
}
/// Get all SSA values used by a block terminator.
fn terminator_used_values(term: &Terminator) -> Vec<SsaValue> {
use crate::constraint::lower::{ConditionExpr, Operand};
match term {
Terminator::Return(Some(rv)) => vec![*rv],
Terminator::Return(None) => Vec::new(),
Terminator::Branch { condition, .. } => match condition.as_deref() {
Some(ConditionExpr::BoolTest { var }) => vec![*var],
Some(ConditionExpr::NullCheck { var, .. }) => vec![*var],
Some(ConditionExpr::TypeCheck { var, .. }) => vec![*var],
Some(ConditionExpr::Comparison { lhs, rhs, .. }) => {
let mut out = Vec::new();
if let Operand::Value(v) = lhs {
out.push(*v);
}
if let Operand::Value(v) = rhs {
out.push(*v);
}
out
}
Some(ConditionExpr::Unknown) | None => Vec::new(),
},
Terminator::Switch { scrutinee, .. } => vec![*scrutinee],
Terminator::Goto(_) | Terminator::Unreachable => Vec::new(),
}
}
/// Check if an instruction is dead and safe to remove.
fn is_dead(inst: &SsaInst, use_counts: &HashMap<SsaValue, usize>, cfg: &Cfg) -> bool {
let uses = use_counts.get(&inst.value).copied().unwrap_or(0);
if uses > 0 {
return false;
}
// Never remove side-effectful or semantically required instructions
match &inst.op {
SsaOp::Source => return false,
SsaOp::Call { .. } => return false,
SsaOp::CatchParam => return false,
_ => {}
}
// Never remove instructions whose CFG node has Sink, Source, or Sanitizer labels
if cfg.node_weight(inst.cfg_node).is_some_and(|info| {
info.taint.labels.iter().any(|l| {
matches!(
l,
DataLabel::Sink(_) | DataLabel::Source(_) | DataLabel::Sanitizer(_)
)
})
}) {
return false;
}
true
}
/// Get all SSA values used by an instruction.
fn inst_used_values(inst: &SsaInst) -> Vec<SsaValue> {
match &inst.op {
SsaOp::Phi(operands) => operands.iter().map(|(_, v)| *v).collect(),
SsaOp::Assign(uses) => uses.to_vec(),
SsaOp::Call { args, receiver, .. } => {
let mut vals = Vec::new();
if let Some(rv) = receiver {
vals.push(*rv);
}
for arg in args {
vals.extend(arg.iter());
}
vals
}
SsaOp::Source
| SsaOp::Const(_)
| SsaOp::Param { .. }
| SsaOp::SelfParam
| SsaOp::CatchParam
| SsaOp::Nop
| SsaOp::Undef => Vec::new(),
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::cfg::{NodeInfo, StmtKind};
use petgraph::Graph;
use smallvec::SmallVec;
fn make_cfg_node(kind: StmtKind) -> NodeInfo {
NodeInfo {
kind,
..Default::default()
}
}
#[test]
fn dead_const_removed() {
// v0 = const("42") — unused, should be removed
// v1 = source() — must survive even if unused
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("42".into())),
cfg_node: n0,
var_name: Some("x".into()),
span: (0, 2),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Source,
cfg_node: n1,
var_name: Some("tainted".into()),
span: (3, 10),
},
],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("x".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("tainted".into()),
cfg_node: n1,
block: BlockId(0),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
};
let removed = eliminate_dead_defs(&mut body, &cfg);
assert_eq!(removed, 1);
assert_eq!(body.blocks[0].body.len(), 1);
// Source survives
assert!(matches!(body.blocks[0].body[0].op, SsaOp::Source));
}
#[test]
fn dead_sanitizer_label_preserved() {
// v0 has a Sanitizer label on its CFG node — must survive even if unused
use crate::labels::{Cap, DataLabel};
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(NodeInfo {
taint: crate::cfg::TaintMeta {
labels: smallvec::smallvec![DataLabel::Sanitizer(Cap::HTML_ESCAPE)],
..Default::default()
},
..make_cfg_node(StmtKind::Seq)
});
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![SsaInst {
value: SsaValue(0),
op: SsaOp::Assign(SmallVec::new()),
cfg_node: n0,
var_name: Some("sanitized".into()),
span: (0, 5),
}],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![ValueDef {
var_name: Some("sanitized".into()),
cfg_node: n0,
block: BlockId(0),
}],
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
exception_edges: vec![],
};
let removed = eliminate_dead_defs(&mut body, &cfg);
assert_eq!(
removed, 0,
"Sanitizer-labeled instruction must not be removed"
);
assert_eq!(body.blocks[0].body.len(), 1);
}
#[test]
fn dead_source_label_preserved() {
// v0 has a Source label on its CFG node — must survive even if unused
use crate::labels::{Cap, DataLabel};
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(NodeInfo {
taint: crate::cfg::TaintMeta {
labels: smallvec::smallvec![DataLabel::Source(Cap::all())],
..Default::default()
},
..make_cfg_node(StmtKind::Seq)
});
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![SsaInst {
value: SsaValue(0),
op: SsaOp::Assign(SmallVec::new()),
cfg_node: n0,
var_name: Some("src".into()),
span: (0, 3),
}],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![ValueDef {
var_name: Some("src".into()),
cfg_node: n0,
block: BlockId(0),
}],
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
exception_edges: vec![],
};
let removed = eliminate_dead_defs(&mut body, &cfg);
assert_eq!(removed, 0, "Source-labeled instruction must not be removed");
}
#[test]
fn dead_sink_label_still_preserved() {
// Regression: Sink-labeled dead instructions must still be kept
use crate::labels::{Cap, DataLabel};
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(NodeInfo {
taint: crate::cfg::TaintMeta {
labels: smallvec::smallvec![DataLabel::Sink(Cap::SQL_QUERY)],
..Default::default()
},
..make_cfg_node(StmtKind::Seq)
});
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![SsaInst {
value: SsaValue(0),
op: SsaOp::Assign(SmallVec::new()),
cfg_node: n0,
var_name: Some("q".into()),
span: (0, 2),
}],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![ValueDef {
var_name: Some("q".into()),
cfg_node: n0,
block: BlockId(0),
}],
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
exception_edges: vec![],
};
let removed = eliminate_dead_defs(&mut body, &cfg);
assert_eq!(removed, 0, "Sink-labeled instruction must not be removed");
}
#[test]
fn dead_unlabeled_assign_still_removed() {
// Negative test: unlabeled dead assignments must still be eliminated
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![SsaInst {
value: SsaValue(0),
op: SsaOp::Assign(SmallVec::new()),
cfg_node: n0,
var_name: Some("dead".into()),
span: (0, 4),
}],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![ValueDef {
var_name: Some("dead".into()),
cfg_node: n0,
block: BlockId(0),
}],
cfg_node_map: [(n0, SsaValue(0))].into_iter().collect(),
exception_edges: vec![],
};
let removed = eliminate_dead_defs(&mut body, &cfg);
assert_eq!(removed, 1, "unlabeled dead assignment must be removed");
assert!(body.blocks[0].body.is_empty());
}
#[test]
fn used_def_preserved() {
// v0 = const("42"), v1 = assign(v0): v1 is unused; once it is removed, v0
// loses its only use, so the fixpoint loop removes both
let mut cfg: Cfg = Graph::new();
let n0 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let n1 = cfg.add_node(make_cfg_node(StmtKind::Seq));
let mut body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(Some("42".into())),
cfg_node: n0,
var_name: Some("x".into()),
span: (0, 2),
},
SsaInst {
value: SsaValue(1),
op: SsaOp::Assign(SmallVec::from_elem(SsaValue(0), 1)),
cfg_node: n1,
var_name: Some("y".into()),
span: (3, 5),
},
],
terminator: Terminator::Return(None),
preds: SmallVec::new(),
succs: SmallVec::new(),
}],
entry: BlockId(0),
value_defs: vec![
ValueDef {
var_name: Some("x".into()),
cfg_node: n0,
block: BlockId(0),
},
ValueDef {
var_name: Some("y".into()),
cfg_node: n1,
block: BlockId(0),
},
],
cfg_node_map: [(n0, SsaValue(0)), (n1, SsaValue(1))].into_iter().collect(),
exception_edges: vec![],
};
let removed = eliminate_dead_defs(&mut body, &cfg);
// Pass 1 removes only v1 (v0 still has one use, in v1); pass 2 then
// removes v0, which became dead once v1 was gone.
assert_eq!(removed, 2);
assert_eq!(body.blocks[0].body.len(), 0);
}
}
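The preservation rules and the fixpoint loop in `eliminate_dead_defs` can be modeled in a few dozen lines. This is a hypothetical, simplified sketch, and `Op`, `Inst`, `use_counts`, and `eliminate_dead` are illustrative names rather than the crate's `SsaOp`/`SsaInst`. The `labeled` flag stands in for the CFG-label lookup, and phis and blocks are omitted. The sketch shows why terminator operands must be counted, and why one removal can expose another dead definition on the next pass.

```rust
use std::collections::HashMap;

// Hypothetical, simplified instruction kinds (not the crate's `SsaOp`).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Const,
    Source,
    Assign(Vec<u32>),
    Call(Vec<u32>),
}

#[derive(Debug, Clone, PartialEq)]
struct Inst {
    value: u32,
    op: Op,
    // Stand-in for "the CFG node carries a Source/Sanitizer/Sink label".
    labeled: bool,
}

fn used_values(op: &Op) -> &[u32] {
    match op {
        Op::Assign(uses) | Op::Call(uses) => uses.as_slice(),
        Op::Const | Op::Source => &[],
    }
}

/// Count uses from instructions *and* terminator operands (e.g. a returned
/// value), so a value used only by `return` is not judged dead.
fn use_counts(insts: &[Inst], term_uses: &[u32]) -> HashMap<u32, usize> {
    let mut counts = HashMap::new();
    let inst_uses = insts.iter().flat_map(|inst| used_values(&inst.op));
    for &v in inst_uses.chain(term_uses) {
        *counts.entry(v).or_insert(0) += 1;
    }
    counts
}

/// Dead = zero uses, not side-effectful or a taint origin, and not labeled.
fn is_dead(inst: &Inst, counts: &HashMap<u32, usize>) -> bool {
    counts.get(&inst.value).copied().unwrap_or(0) == 0
        && !matches!(inst.op, Op::Source | Op::Call(_))
        && !inst.labeled
}

/// Remove dead instructions until a fixpoint: deleting one definition can
/// make its operands dead on the next pass. Returns the number removed.
fn eliminate_dead(insts: &mut Vec<Inst>, term_uses: &[u32]) -> usize {
    let mut total = 0;
    loop {
        let counts = use_counts(insts, term_uses);
        let before = insts.len();
        insts.retain(|inst| !is_dead(inst, &counts));
        let removed = before - insts.len();
        total += removed;
        if removed == 0 {
            return total;
        }
    }
}
```

Recomputing use counts on each pass is simple but quadratic in the worst case. A worklist that decrements operand counts on removal would avoid this, at the cost of more bookkeeping.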

src/ssa/display.rs

@ -0,0 +1,147 @@
use std::fmt;
use super::ir::*;
impl fmt::Display for SsaBody {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
for block in &self.blocks {
let entry_marker = if block.id == self.entry {
" (entry)"
} else {
""
};
writeln!(f, "Block B{}{entry_marker}:", block.id.0)?;
// Predecessors
if !block.preds.is_empty() {
let preds: Vec<String> = block.preds.iter().map(|p| format!("B{}", p.0)).collect();
writeln!(f, " ; preds: {}", preds.join(", "))?;
}
// Phi instructions
for inst in &block.phis {
write!(f, " v{} = ", inst.value.0)?;
if let SsaOp::Phi(ref operands) = inst.op {
let ops: Vec<String> = operands
.iter()
.map(|(bid, val)| format!("B{}:v{}", bid.0, val.0))
.collect();
write!(f, "phi({})", ops.join(", "))?;
}
if let Some(ref name) = inst.var_name {
write!(f, " # {name}")?;
}
writeln!(f)?;
}
// Body instructions
for inst in &block.body {
write!(f, " v{} = ", inst.value.0)?;
match &inst.op {
SsaOp::Phi(_) => write!(f, "phi(???)")?, // shouldn't appear in body
SsaOp::Assign(uses) => {
let uses_str: Vec<String> =
uses.iter().map(|v| format!("v{}", v.0)).collect();
write!(f, "assign({})", uses_str.join(", "))?;
}
SsaOp::Call {
callee,
args,
receiver,
} => {
if let Some(rv) = receiver {
write!(f, "v{}.{callee}(", rv.0)?;
} else {
write!(f, "{callee}(")?;
}
let arg_strs: Vec<String> = args
.iter()
.map(|arg| {
let vs: Vec<String> =
arg.iter().map(|v| format!("v{}", v.0)).collect();
vs.join("+")
})
.collect();
write!(f, "{})", arg_strs.join(", "))?;
}
SsaOp::Source => write!(f, "source()")?,
SsaOp::Const(val) => {
if let Some(v) = val {
write!(f, "const({v})")?;
} else {
write!(f, "const")?;
}
}
SsaOp::Param { index } => write!(f, "param({index})")?,
SsaOp::SelfParam => write!(f, "self_param()")?,
SsaOp::CatchParam => write!(f, "catch_param()")?,
SsaOp::Nop => write!(f, "nop")?,
SsaOp::Undef => write!(f, "undef")?,
}
if let Some(ref name) = inst.var_name {
write!(f, " # {name}")?;
}
// Span info
if inst.span != (0, 0) {
write!(f, " @ {}..{}", inst.span.0, inst.span.1)?;
}
writeln!(f)?;
}
// Terminator
match &block.terminator {
Terminator::Goto(target) => writeln!(f, " goto → B{}", target.0)?,
Terminator::Branch {
true_blk,
false_blk,
..
} => writeln!(
f,
" branch → B{} (true), B{} (false)",
true_blk.0, false_blk.0
)?,
Terminator::Switch {
scrutinee,
targets,
default,
case_values,
} => {
write!(f, " switch v{} → [", scrutinee.0)?;
for (i, t) in targets.iter().enumerate() {
if i > 0 {
write!(f, ", ")?;
}
match case_values.get(i).and_then(|cv| cv.as_ref()) {
Some(lit) => write!(f, "{:?}=B{}", lit, t.0)?,
None => write!(f, "B{}", t.0)?,
}
}
writeln!(f, "] default B{}", default.0)?;
}
Terminator::Return(ret_val) => {
if let Some(v) = ret_val {
writeln!(f, " return v{}", v.0)?
} else {
writeln!(f, " return")?
}
}
Terminator::Unreachable => writeln!(f, " unreachable")?,
}
writeln!(f)?;
}
Ok(())
}
}
impl fmt::Display for SsaValue {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "v{}", self.0)
}
}
impl fmt::Display for BlockId {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "B{}", self.0)
}
}

src/ssa/heap.rs

File diff suppressed because it is too large.
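Two of the checks in `src/ssa/invariants.rs` reduce to simple graph walks: pred/succ symmetry (`check_pred_succ_symmetry`) and multi-root reachability (`check_reachability`). The following is a hypothetical sketch over plain index-based adjacency lists. `Block`, `asymmetric_edges`, and `unreachable_blocks` are illustrative names, not the crate's `SsaBlock` API. The `roots` slice plays the role of the entry block plus every exception-edge catch target.

```rust
/// Hypothetical block shape: preds/succs are indices into the block vector.
struct Block {
    preds: Vec<usize>,
    succs: Vec<usize>,
}

/// Report every edge listed on one side but not the other (or pointing
/// out of bounds).
fn asymmetric_edges(blocks: &[Block]) -> Vec<String> {
    let mut errs = Vec::new();
    for (b, block) in blocks.iter().enumerate() {
        for &s in &block.succs {
            if blocks.get(s).map_or(true, |sb| !sb.preds.contains(&b)) {
                errs.push(format!("B{b} -> B{s} missing from B{s}.preds"));
            }
        }
        for &p in &block.preds {
            if blocks.get(p).map_or(true, |pb| !pb.succs.contains(&b)) {
                errs.push(format!("B{p} -> B{b} missing from B{p}.succs"));
            }
        }
    }
    errs
}

/// Blocks not reachable from any root, found with a stack-based
/// depth-first traversal seeded from every root at once.
fn unreachable_blocks(blocks: &[Block], roots: &[usize]) -> Vec<usize> {
    let mut visited = vec![false; blocks.len()];
    let mut stack = Vec::new();
    for &r in roots {
        if r < blocks.len() && !visited[r] {
            visited[r] = true;
            stack.push(r);
        }
    }
    while let Some(b) = stack.pop() {
        for &s in &blocks[b].succs {
            if s < blocks.len() && !visited[s] {
                visited[s] = true;
                stack.push(s);
            }
        }
    }
    (0..blocks.len()).filter(|&i| !visited[i]).collect()
}
```

Seeding catch targets as extra roots is what lets a `finally` block chained after a `catch` count as reachable even though the exception edge itself is not in `succs`.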

src/ssa/invariants.rs

@ -0,0 +1,915 @@
//! Structural invariant checks for SSA bodies.
//!
//! These checks verify that [`SsaBody`] instances are well-formed:
//! single assignment holds, pred/succ edges are mutually consistent, phi
//! operands reference actual predecessors, terminators agree with the
//! successor list, and every `SsaValue` is backed by a matching `ValueDef`.
//!
//! The module is intentionally separate from the lowering code so the same
//! invariants can be exercised from tests that do not have access to the
//! private scaffolding inside [`crate::ssa::lower`]. The individual checks
//! append violation messages to a `Vec<String>` rather than panicking, so
//! tests can aggregate violations across an entire corpus before failing;
//! [`check_structural_invariants`] runs them all and returns the combined
//! list.
//!
//! Targeted checks that SSA *lowering* may want to query directly (for
//! example, to decide whether to panic in debug builds or to warn and attach
//! an engine note in release builds) instead return
//! `Result<(), InvariantError>`, which is more ergonomic at a single call
//! site. [`check_catch_block_reachability`] is one such check.
//!
//! Invariants are split into two groups:
//!
//! **Group A: SSA integrity (must hold unconditionally)**
//!
//! 1. `BlockId` indexing: `blocks[i].id == BlockId(i)`
//! 2. Entry block has no predecessors
//! 3. Pred/succ symmetry: `B.succs.contains(S)` ⇔ `S.preds.contains(B)`
//! 4. Phi placement: every phi appears in `block.phis` (never in body),
//!    and `block.phis` holds only phis
//! 5. Phi operand arity: each phi has at most `block.preds.len()` operands
//! 6. Phi operands: every `(pred_bid, value)` operand has
//!    `block.preds.contains(pred_bid)` and a `value` that indexes into
//!    `value_defs`
//! 7. Unique SSA definitions: every `SsaValue` is defined at most once
//!    across all phi + body instructions
//! 8. `value_defs` coverage: every defined `SsaValue.0` is a valid index
//!    into `value_defs`, and `value_defs[v.0].block` matches the block
//!    containing the defining instruction
//! 9. `cfg_node_map` consistency: every `(node, SsaValue)` pair points to a
//!    value whose `value_defs` entry records `cfg_node == node`
//!
//! **Group B: terminator and reachability (loose, reflecting lowering)**
//!
//! 10. Terminator/succs agreement, *subset* form (every terminator target
//!     must appear in `succs`):
//!     * `Goto(t)` → `succs.contains(t)`; extras tolerated
//!       (3-successor collapse fallback)
//!     * `Branch{t, f, …}` → `succs` contains both `t` and `f`
//!     * `Switch{targets, default, …}` → `succs` contains every target and
//!       the default
//!     * `Return`/`Unreachable` → no constraint on `succs` (CFG may carry
//!       finally/cleanup continuation edges that downstream analysis
//!       propagates through)
//! 11. Reachability: every block is reachable from the entry or from the
//!     `catch` target of some exception edge, directly or transitively
//!     (e.g. a `finally` block chained after a `catch`)
//! 12. Catch-block reachability: every block holding a `CatchParam` is
//!     reachable from the entry or is itself an exception-edge target
//! Group B is deliberately permissive: the SSA body's `succs` field is the
//! authoritative successor set for analysis (taint, abstract interp,
//! symbolic execution all enumerate `block.succs`), while the terminator
//! is a structured summary that may simplify or drop CFG-level info.
//! Regression value comes from catching *new* deviations from these
//! already-understood patterns, not from enforcing a textbook SSA shape
//! the lowering never promised.
use super::ir::*;
/// Errors returned by targeted invariant checks.
///
/// Wraps a list of human-readable violation messages — one per offending
/// block — so callers can include every failure in a single panic /
/// warning.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InvariantError {
pub messages: Vec<String>,
}
impl InvariantError {
/// Join every message onto its own line.
pub fn joined(&self) -> String {
self.messages.join("\n")
}
}
impl std::fmt::Display for InvariantError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", self.joined())
}
}
impl std::error::Error for InvariantError {}
/// Aggregate invariant violations found in a single body. An empty
/// vector means the body is structurally well-formed.
pub fn check_structural_invariants(body: &SsaBody) -> Vec<String> {
let mut errors = Vec::new();
check_block_ids(body, &mut errors);
check_entry_has_no_preds(body, &mut errors);
check_pred_succ_symmetry(body, &mut errors);
check_terminator_succ_agreement(body, &mut errors);
check_phi_placement_and_arity(body, &mut errors);
check_phi_operand_sources(body, &mut errors);
check_unique_definitions(body, &mut errors);
check_value_def_coverage(body, &mut errors);
check_cfg_node_map(body, &mut errors);
check_reachability(body, &mut errors);
if let Err(e) = check_catch_block_reachability(body) {
errors.extend(e.messages);
}
errors
}
/// Every block carrying an [`SsaOp::CatchParam`] (an exception-handler
/// entry) must be reachable from the function entry via normal flow, or be
/// the catch target of at least one entry in [`SsaBody::exception_edges`].
///
/// When this fails, the CFG builder has produced an orphan catch block
/// that should have been wired up as an exception successor but was not —
/// a real construction bug that otherwise manifests as silent false
/// negatives in resource-cleanup / exception-flow findings.
pub fn check_catch_block_reachability(body: &SsaBody) -> Result<(), InvariantError> {
let n = body.blocks.len();
if n == 0 {
return Ok(());
}
// 1. Identify catch blocks: any block containing a CatchParam op in
// either its phi or body lists.
let catch_blocks: Vec<BlockId> = body
.blocks
.iter()
.filter(|b| {
b.phis
.iter()
.chain(b.body.iter())
.any(|inst| matches!(inst.op, SsaOp::CatchParam))
})
.map(|b| b.id)
.collect();
if catch_blocks.is_empty() {
return Ok(());
}
// 2. Depth-first traversal from entry via normal succs.
let mut reachable = vec![false; n];
let entry_idx = body.entry.0 as usize;
if entry_idx < n {
reachable[entry_idx] = true;
let mut stack: Vec<BlockId> = vec![body.entry];
while let Some(b) = stack.pop() {
for &s in &body.blocks[b.0 as usize].succs {
let sidx = s.0 as usize;
if sidx < n && !reachable[sidx] {
reachable[sidx] = true;
stack.push(s);
}
}
}
}
// 3. Collect exception-edge targets.
let exception_targets: std::collections::HashSet<BlockId> = body
.exception_edges
.iter()
.map(|(_, catch)| *catch)
.collect();
// 4. Each catch block must be normal-reachable OR an exception target.
let mut messages = Vec::new();
for bid in catch_blocks {
let idx = bid.0 as usize;
let normal = idx < n && reachable[idx];
let via_exception = exception_targets.contains(&bid);
if !normal && !via_exception {
messages.push(format!(
"catch-block orphan: block {:?} carries CatchParam but is neither \
reachable from entry {:?} nor a target of any exception edge",
bid, body.entry
));
}
}
if messages.is_empty() {
Ok(())
} else {
Err(InvariantError { messages })
}
}
// ── Individual invariant checks ─────────────────────────────────────────
fn check_block_ids(body: &SsaBody, errors: &mut Vec<String>) {
for (i, block) in body.blocks.iter().enumerate() {
if block.id.0 as usize != i {
errors.push(format!(
"block at index {i} has mismatched id {:?}",
block.id
));
}
}
}
fn check_entry_has_no_preds(body: &SsaBody, errors: &mut Vec<String>) {
let entry_idx = body.entry.0 as usize;
if entry_idx >= body.blocks.len() {
errors.push(format!("entry {:?} is out of bounds", body.entry));
return;
}
let entry = &body.blocks[entry_idx];
if !entry.preds.is_empty() {
errors.push(format!(
"entry block {:?} has {} predecessor(s): {:?}",
body.entry,
entry.preds.len(),
entry.preds
));
}
}
fn check_pred_succ_symmetry(body: &SsaBody, errors: &mut Vec<String>) {
for block in &body.blocks {
for &succ in &block.succs {
let sidx = succ.0 as usize;
if sidx >= body.blocks.len() {
errors.push(format!(
"block {:?} has out-of-bounds succ {:?}",
block.id, succ
));
continue;
}
if !body.blocks[sidx].preds.contains(&block.id) {
errors.push(format!(
"block {:?} lists succ {:?} but {:?} does not list {:?} as pred",
block.id, succ, succ, block.id
));
}
}
for &pred in &block.preds {
let pidx = pred.0 as usize;
if pidx >= body.blocks.len() {
errors.push(format!(
"block {:?} has out-of-bounds pred {:?}",
block.id, pred
));
continue;
}
if !body.blocks[pidx].succs.contains(&block.id) {
errors.push(format!(
"block {:?} lists pred {:?} but {:?} does not list {:?} as succ",
block.id, pred, pred, block.id
));
}
}
}
}
fn check_terminator_succ_agreement(body: &SsaBody, errors: &mut Vec<String>) {
// Group B — loose agreement. See module docs for rationale.
for block in &body.blocks {
match &block.terminator {
Terminator::Goto(target) => {
if !block.succs.iter().any(|s| s == target) {
errors.push(format!(
"block {:?} Goto({:?}) target not in succs {:?}",
block.id, target, block.succs
));
}
}
Terminator::Branch {
true_blk,
false_blk,
..
} => {
if !block.succs.iter().any(|s| s == true_blk) {
errors.push(format!(
"block {:?} Branch true target {:?} not in succs {:?}",
block.id, true_blk, block.succs
));
}
if !block.succs.iter().any(|s| s == false_blk) {
errors.push(format!(
"block {:?} Branch false target {:?} not in succs {:?}",
block.id, false_blk, block.succs
));
}
}
Terminator::Switch {
targets, default, ..
} => {
// Every Switch target and the default arm must be in succs.
for t in targets {
if !block.succs.iter().any(|s| s == t) {
errors.push(format!(
"block {:?} Switch target {:?} not in succs {:?}",
block.id, t, block.succs
));
}
}
if !block.succs.iter().any(|s| s == default) {
errors.push(format!(
"block {:?} Switch default {:?} not in succs {:?}",
block.id, default, block.succs
));
}
}
Terminator::Return(_) | Terminator::Unreachable => {
// Loose by design — cleanup/finally continuation edges in
// `succs` are expected. Downstream consumers (taint
// `compute_succ_states`, SCCP `process_terminator`) treat
// `succs` as authoritative and propagate across these edges,
// so the terminator shape must not forbid them.
}
}
}
}
fn check_phi_placement_and_arity(body: &SsaBody, errors: &mut Vec<String>) {
for block in &body.blocks {
// Phis must not appear in body.
for inst in &block.body {
if matches!(inst.op, SsaOp::Phi(_)) {
errors.push(format!(
"block {:?} has a Phi op in body (should be in phis): value {:?}",
block.id, inst.value
));
}
}
// Every entry in `phis` must be a Phi op.
for inst in &block.phis {
if !matches!(inst.op, SsaOp::Phi(_)) {
errors.push(format!(
"block {:?} has non-Phi op in phis slot: value {:?}",
block.id, inst.value
));
}
if let SsaOp::Phi(ref ops) = inst.op
&& ops.len() > block.preds.len()
{
errors.push(format!(
"block {:?} phi for {:?} has {} operand(s) > {} pred(s)",
block.id,
inst.value,
ops.len(),
block.preds.len()
));
}
}
}
}
fn check_phi_operand_sources(body: &SsaBody, errors: &mut Vec<String>) {
for block in &body.blocks {
for inst in &block.phis {
if let SsaOp::Phi(ref ops) = inst.op {
for &(pred_bid, operand_value) in ops.iter() {
if !block.preds.contains(&pred_bid) {
errors.push(format!(
"block {:?} phi for {:?} references non-pred {:?} (preds: {:?})",
block.id, inst.value, pred_bid, block.preds
));
}
// Operand value must be a valid SSA index.
if (operand_value.0 as usize) >= body.value_defs.len() {
errors.push(format!(
"block {:?} phi for {:?} has operand {:?} out of value_defs range",
block.id, inst.value, operand_value
));
}
}
}
}
}
}
fn check_unique_definitions(body: &SsaBody, errors: &mut Vec<String>) {
let mut seen: std::collections::HashMap<SsaValue, BlockId> =
std::collections::HashMap::with_capacity(body.value_defs.len());
for block in &body.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
if let Some(prev) = seen.insert(inst.value, block.id) {
errors.push(format!(
"SSA {:?} defined in both {:?} and {:?} — single-assignment violated",
inst.value, prev, block.id
));
}
}
}
}
fn check_value_def_coverage(body: &SsaBody, errors: &mut Vec<String>) {
for block in &body.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
let idx = inst.value.0 as usize;
if idx >= body.value_defs.len() {
errors.push(format!(
"instruction defining {:?} in block {:?} has no entry in value_defs (len {})",
inst.value,
block.id,
body.value_defs.len()
));
continue;
}
let def = &body.value_defs[idx];
if def.block != block.id {
errors.push(format!(
"value_defs[{}] records block {:?} but instruction lives in block {:?}",
idx, def.block, block.id
));
}
}
}
}
fn check_cfg_node_map(body: &SsaBody, errors: &mut Vec<String>) {
for (&cfg_node, &sv) in body.cfg_node_map.iter() {
let idx = sv.0 as usize;
if idx >= body.value_defs.len() {
errors.push(format!(
"cfg_node_map points {:?} → {:?} which is out of value_defs range",
cfg_node, sv
));
continue;
}
let def = &body.value_defs[idx];
if def.cfg_node != cfg_node {
errors.push(format!(
"cfg_node_map inconsistency: map says {:?} → {:?}, but value_defs[{}].cfg_node = {:?}",
cfg_node, sv, idx, def.cfg_node
));
}
}
}
fn check_reachability(body: &SsaBody, errors: &mut Vec<String>) {
let n = body.blocks.len();
if n == 0 {
errors.push("body has zero blocks".into());
return;
}
let entry_idx = body.entry.0 as usize;
if entry_idx >= n {
// already reported by check_entry_has_no_preds
return;
}
// Multi-root DFS: start from the entry *and* from every catch target
// recorded in `exception_edges`. Exception-handler blocks are reached
// via stripped exception edges, so from the SSA body's perspective they
// look like roots — as does anything transitively reachable from them
// (e.g. a `finally` block chained after a `catch`).
let mut visited = vec![false; n];
let mut stack: Vec<BlockId> = Vec::new();
let seed = |bid: BlockId, visited: &mut [bool], stack: &mut Vec<BlockId>| {
let idx = bid.0 as usize;
if idx < visited.len() && !visited[idx] {
visited[idx] = true;
stack.push(bid);
}
};
seed(body.entry, &mut visited, &mut stack);
for (_src, catch_target) in &body.exception_edges {
seed(*catch_target, &mut visited, &mut stack);
}
while let Some(bid) = stack.pop() {
let block = &body.blocks[bid.0 as usize];
for &s in &block.succs {
let sidx = s.0 as usize;
if sidx < n && !visited[sidx] {
visited[sidx] = true;
stack.push(s);
}
}
}
for (i, v) in visited.iter().enumerate() {
if !*v {
let block = &body.blocks[i];
errors.push(format!(
"block {:?} is unreachable from entry {:?} or any exception-handler root",
block.id, body.entry
));
}
}
}
// ── Optimization idempotence ─────────────────────────────────────────────
/// Compute a structural fingerprint of an [`SsaBody`] that is stable across
/// equivalent lowerings and optimizations. The fingerprint records the entry
/// block, the block count, and for each block its id, pred/succ/phi/body
/// counts, terminator kind, per-phi variable name and operand count, and the
/// sequence of body op kinds. `SsaValue` numbers are not part of the
/// fingerprint, so renumbering between runs does not cause spurious diffs;
/// only shape changes do.
///
/// Phis are emitted in their natural (insertion) order. Lowering drives
/// phi placement through a `BTreeSet`, so that order is deterministic
/// (alphabetical by `var_name`) and any divergence between runs is a real
/// regression rather than hasher noise.
pub fn body_fingerprint(body: &SsaBody) -> String {
use std::fmt::Write;
let mut out = String::new();
let _ = writeln!(out, "entry={:?}", body.entry);
let _ = writeln!(out, "blocks={}", body.blocks.len());
for block in &body.blocks {
let _ = writeln!(
out,
" b{:?} preds={} succs={} phis={} body={} term={}",
block.id,
block.preds.len(),
block.succs.len(),
block.phis.len(),
block.body.len(),
terminator_kind(&block.terminator),
);
for inst in &block.phis {
if let SsaOp::Phi(ref ops) = inst.op {
let _ = writeln!(
out,
" phi var={} operands={}",
inst.var_name.as_deref().unwrap_or(""),
ops.len(),
);
}
}
for inst in &block.body {
let _ = writeln!(out, " {}", op_kind(&inst.op));
}
}
out
}
fn terminator_kind(t: &Terminator) -> &'static str {
match t {
Terminator::Goto(_) => "Goto",
Terminator::Branch { .. } => "Branch",
Terminator::Switch { .. } => "Switch",
Terminator::Return(_) => "Return",
Terminator::Unreachable => "Unreachable",
}
}
fn op_kind(op: &SsaOp) -> &'static str {
match op {
SsaOp::Phi(_) => "Phi",
SsaOp::Assign(_) => "Assign",
SsaOp::Call { .. } => "Call",
SsaOp::Source => "Source",
SsaOp::Const(_) => "Const",
SsaOp::Param { .. } => "Param",
SsaOp::SelfParam => "SelfParam",
SsaOp::CatchParam => "CatchParam",
SsaOp::Nop => "Nop",
SsaOp::Undef => "Undef",
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::cfg::{Cfg, EdgeKind, NodeInfo, StmtKind, TaintMeta};
use crate::ssa::lower_to_ssa;
use petgraph::Graph;
use petgraph::graph::NodeIndex;
fn make_node(kind: StmtKind) -> NodeInfo {
NodeInfo {
kind,
..Default::default()
}
}
fn def(var: &str) -> NodeInfo {
NodeInfo {
taint: TaintMeta {
defines: Some(var.into()),
..Default::default()
},
..make_node(StmtKind::Seq)
}
}
fn use_var(var: &str) -> NodeInfo {
NodeInfo {
taint: TaintMeta {
uses: vec![var.into()],
..Default::default()
},
..make_node(StmtKind::Seq)
}
}
fn assert_well_formed(body: &SsaBody) {
let errs = check_structural_invariants(body);
assert!(
errs.is_empty(),
"structural invariants failed:\n{}",
errs.join("\n")
);
}
#[test]
fn linear_cfg_is_well_formed() {
let mut cfg: Cfg = Graph::new();
let entry = cfg.add_node(make_node(StmtKind::Entry));
let n1 = cfg.add_node(def("x"));
let n2 = cfg.add_node(use_var("x"));
let exit = cfg.add_node(make_node(StmtKind::Exit));
cfg.add_edge(entry, n1, EdgeKind::Seq);
cfg.add_edge(n1, n2, EdgeKind::Seq);
cfg.add_edge(n2, exit, EdgeKind::Seq);
let body = lower_to_ssa(&cfg, entry, None, true).unwrap();
assert_well_formed(&body);
}
#[test]
fn diamond_cfg_is_well_formed() {
let mut cfg: Cfg = Graph::new();
let entry = cfg.add_node(make_node(StmtKind::Entry));
let if_n = cfg.add_node(make_node(StmtKind::If));
let t = cfg.add_node(def("x"));
let f = cfg.add_node(def("x"));
let join = cfg.add_node(use_var("x"));
let exit = cfg.add_node(make_node(StmtKind::Exit));
cfg.add_edge(entry, if_n, EdgeKind::Seq);
cfg.add_edge(if_n, t, EdgeKind::True);
cfg.add_edge(if_n, f, EdgeKind::False);
cfg.add_edge(t, join, EdgeKind::Seq);
cfg.add_edge(f, join, EdgeKind::Seq);
cfg.add_edge(join, exit, EdgeKind::Seq);
let body = lower_to_ssa(&cfg, entry, None, true).unwrap();
assert_well_formed(&body);
// Additionally: every phi in the join block must take its operands
// only from that block's predecessors.
let phi_block = body
.blocks
.iter()
.find(|b| !b.phis.is_empty())
.expect("diamond should produce a phi");
for phi in &phi_block.phis {
if let SsaOp::Phi(ref ops) = phi.op {
for (pred, _) in ops {
assert!(
phi_block.preds.iter().any(|p| p == pred),
"phi operand {pred:?} is not a pred of {:?}",
phi_block.id
);
}
}
}
}
#[test]
fn loop_cfg_is_well_formed() {
let mut cfg: Cfg = Graph::new();
let entry = cfg.add_node(make_node(StmtKind::Entry));
let init = cfg.add_node(def("x"));
let header = cfg.add_node(make_node(StmtKind::Loop));
let body_n = cfg.add_node(def("x"));
let exit = cfg.add_node(make_node(StmtKind::Exit));
cfg.add_edge(entry, init, EdgeKind::Seq);
cfg.add_edge(init, header, EdgeKind::Seq);
cfg.add_edge(header, body_n, EdgeKind::True);
cfg.add_edge(body_n, header, EdgeKind::Back);
cfg.add_edge(header, exit, EdgeKind::False);
let body = lower_to_ssa(&cfg, entry, None, true).unwrap();
assert_well_formed(&body);
}
#[test]
fn fingerprint_is_stable_on_double_lowering() {
// Lowering twice on the same CFG must produce the same fingerprint.
let mut cfg: Cfg = Graph::new();
let entry = cfg.add_node(make_node(StmtKind::Entry));
let if_n = cfg.add_node(make_node(StmtKind::If));
let t = cfg.add_node(def("x"));
let f = cfg.add_node(def("x"));
let join = cfg.add_node(use_var("x"));
let exit = cfg.add_node(make_node(StmtKind::Exit));
cfg.add_edge(entry, if_n, EdgeKind::Seq);
cfg.add_edge(if_n, t, EdgeKind::True);
cfg.add_edge(if_n, f, EdgeKind::False);
cfg.add_edge(t, join, EdgeKind::Seq);
cfg.add_edge(f, join, EdgeKind::Seq);
cfg.add_edge(join, exit, EdgeKind::Seq);
let a = lower_to_ssa(&cfg, entry, None, true).unwrap();
let b = lower_to_ssa(&cfg, entry, None, true).unwrap();
assert_eq!(body_fingerprint(&a), body_fingerprint(&b));
}
#[test]
fn phis_are_emitted_in_alphabetical_order() {
// Diamond CFG with multiple variables defined on both sides:
// Entry → If → [True: a=, b=, c=] [False: a=, b=, c=] → Join → Exit
// Join should carry phis for a, b, and c, emitted alphabetically
// as a consequence of the BTreeSet-backed phi_placements.
fn defs(vars: &[&str]) -> NodeInfo {
// A NodeInfo records at most one definition, so only `vars[0]`
// is used: each variable gets its own Seq node, and the callers
// below chain those nodes into a branch.
NodeInfo {
taint: TaintMeta {
defines: Some(vars[0].into()),
..Default::default()
},
..make_node(StmtKind::Seq)
}
}
let mut cfg: Cfg = Graph::new();
let entry = cfg.add_node(make_node(StmtKind::Entry));
let if_n = cfg.add_node(make_node(StmtKind::If));
// True branch defines c, then a, then b (intentionally non-alphabetical,
// to show that phi order is driven by lowering, not by definition order).
let t_c = cfg.add_node(defs(&["c"]));
let t_a = cfg.add_node(defs(&["a"]));
let t_b = cfg.add_node(defs(&["b"]));
// False branch: same vars, different order to make sure neither side
// accidentally sets the ordering downstream.
let f_b = cfg.add_node(defs(&["b"]));
let f_c = cfg.add_node(defs(&["c"]));
let f_a = cfg.add_node(defs(&["a"]));
let join = cfg.add_node(NodeInfo {
taint: TaintMeta {
uses: vec!["a".into(), "b".into(), "c".into()],
..Default::default()
},
..make_node(StmtKind::Seq)
});
let exit = cfg.add_node(make_node(StmtKind::Exit));
cfg.add_edge(entry, if_n, EdgeKind::Seq);
cfg.add_edge(if_n, t_c, EdgeKind::True);
cfg.add_edge(t_c, t_a, EdgeKind::Seq);
cfg.add_edge(t_a, t_b, EdgeKind::Seq);
cfg.add_edge(t_b, join, EdgeKind::Seq);
cfg.add_edge(if_n, f_b, EdgeKind::False);
cfg.add_edge(f_b, f_c, EdgeKind::Seq);
cfg.add_edge(f_c, f_a, EdgeKind::Seq);
cfg.add_edge(f_a, join, EdgeKind::Seq);
cfg.add_edge(join, exit, EdgeKind::Seq);
let body = lower_to_ssa(&cfg, entry, None, true).unwrap();
let join_block = body
.blocks
.iter()
.find(|b| b.phis.len() >= 3)
.expect("join block should carry phis for a, b, c");
let names: Vec<&str> = join_block
.phis
.iter()
.filter_map(|inst| inst.var_name.as_deref())
.collect();
assert_eq!(
names,
vec!["a", "b", "c"],
"phis within a block must be emitted in alphabetical var_name order"
);
}
#[test]
fn broken_pred_succ_symmetry_is_detected() {
// Hand-craft a body with inconsistent pred/succ lists.
use smallvec::smallvec;
let body = SsaBody {
blocks: vec![
SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![],
terminator: Terminator::Goto(BlockId(1)),
preds: smallvec![],
succs: smallvec![BlockId(1)],
},
SsaBlock {
id: BlockId(1),
phis: vec![],
body: vec![],
terminator: Terminator::Unreachable,
preds: smallvec![], // Missing pred back to 0.
succs: smallvec![],
},
],
entry: BlockId(0),
value_defs: vec![],
cfg_node_map: Default::default(),
exception_edges: vec![],
};
let errs = check_structural_invariants(&body);
assert!(
errs.iter().any(|e| e.contains("does not list")),
"expected a symmetry violation, got: {:?}",
errs
);
}
#[test]
fn duplicate_ssa_def_is_detected() {
use smallvec::smallvec;
let dummy_cfg = NodeIndex::new(0);
let body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
SsaInst {
value: SsaValue(0),
op: SsaOp::Const(None),
cfg_node: dummy_cfg,
var_name: None,
span: (0, 0),
},
SsaInst {
value: SsaValue(0), // duplicate
op: SsaOp::Const(None),
cfg_node: dummy_cfg,
var_name: None,
span: (0, 0),
},
],
terminator: Terminator::Unreachable,
preds: smallvec![],
succs: smallvec![],
}],
entry: BlockId(0),
value_defs: vec![ValueDef {
var_name: None,
cfg_node: dummy_cfg,
block: BlockId(0),
}],
cfg_node_map: Default::default(),
exception_edges: vec![],
};
let errs = check_structural_invariants(&body);
assert!(
errs.iter().any(|e| e.contains("single-assignment violated")),
"expected a duplicate-def violation, got: {:?}",
errs
);
}
#[test]
fn phi_operand_from_non_pred_is_detected() {
use smallvec::smallvec;
let dummy_cfg = NodeIndex::new(0);
let body = SsaBody {
blocks: vec![
SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![],
terminator: Terminator::Goto(BlockId(1)),
preds: smallvec![],
succs: smallvec![BlockId(1)],
},
SsaBlock {
id: BlockId(1),
// Phi claims an operand from block 2 which isn't in preds.
phis: vec![SsaInst {
value: SsaValue(0),
op: SsaOp::Phi(smallvec![(BlockId(2), SsaValue(0))]),
cfg_node: dummy_cfg,
var_name: Some("x".into()),
span: (0, 0),
}],
body: vec![],
terminator: Terminator::Unreachable,
preds: smallvec![BlockId(0)],
succs: smallvec![],
},
],
entry: BlockId(0),
value_defs: vec![ValueDef {
var_name: Some("x".into()),
cfg_node: dummy_cfg,
block: BlockId(1),
}],
cfg_node_map: Default::default(),
exception_edges: vec![],
};
let errs = check_structural_invariants(&body);
assert!(
errs.iter().any(|e| e.contains("references non-pred")),
"expected a phi-operand-source violation, got: {:?}",
errs
);
}
#[test]
fn terminator_disagreeing_with_succs_is_detected() {
use smallvec::smallvec;
let body = SsaBody {
blocks: vec![SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![],
// Goto(1) but succs is empty.
terminator: Terminator::Goto(BlockId(1)),
preds: smallvec![],
succs: smallvec![],
}],
entry: BlockId(0),
value_defs: vec![],
cfg_node_map: Default::default(),
exception_edges: vec![],
};
let errs = check_structural_invariants(&body);
assert!(
errs.iter().any(|e| e.contains("Goto")),
"expected a terminator/succ disagreement, got: {:?}",
errs
);
}
}
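The idea behind `body_fingerprint` can be seen in isolation with a minimal sketch. The `Op`, `Inst`, and `Block` types below are hypothetical, trimmed-down stand-ins for the crate's SSA types, and `fingerprint`/`sample` are illustrative helpers, not crate APIs. Because value numbers never reach the output, renumbering leaves the fingerprint unchanged, while changing an op kind alters it.

```rust
use std::fmt::Write;

// Hypothetical, trimmed-down stand-ins for the crate's SSA types.
#[derive(Clone, Copy)]
enum Op {
    Const,
    Assign,
    Call,
}

struct Inst {
    // SSA value number: deliberately left out of the fingerprint.
    #[allow(dead_code)]
    value: u32,
    op: Op,
}

struct Block {
    id: u32,
    preds: usize,
    succs: usize,
    body: Vec<Inst>,
}

fn op_kind(op: Op) -> &'static str {
    match op {
        Op::Const => "Const",
        Op::Assign => "Assign",
        Op::Call => "Call",
    }
}

/// Shape-only fingerprint: block ids, edge counts, body length, op kinds.
fn fingerprint(blocks: &[Block]) -> String {
    let mut out = String::new();
    // Writing into a String cannot fail, so the fmt::Result is ignored.
    let _ = writeln!(out, "blocks={}", blocks.len());
    for b in blocks {
        let _ = writeln!(
            out,
            " b{} preds={} succs={} body={}",
            b.id,
            b.preds,
            b.succs,
            b.body.len()
        );
        for inst in &b.body {
            let _ = writeln!(out, "  {}", op_kind(inst.op));
        }
    }
    out
}

/// One-block body whose two SSA values are numbered from `base`.
fn sample(base: u32, second: Op) -> Vec<Block> {
    vec![Block {
        id: 0,
        preds: 0,
        succs: 0,
        body: vec![
            Inst { value: base, op: Op::Const },
            Inst { value: base + 1, op: second },
        ],
    }]
}

fn main() {
    // Renumbering SSA values leaves the fingerprint unchanged...
    assert_eq!(
        fingerprint(&sample(0, Op::Assign)),
        fingerprint(&sample(40, Op::Assign))
    );
    // ...but changing an op kind changes it.
    assert_ne!(
        fingerprint(&sample(0, Op::Assign)),
        fingerprint(&sample(0, Op::Call))
    );
    print!("{}", fingerprint(&sample(0, Op::Assign)));
}
```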

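Only the tail of `check_structural_invariants` is visible in this excerpt, but the `broken_pred_succ_symmetry_is_detected` test describes one of its checks: every successor edge must be mirrored in the target's predecessor list, and vice versa. A standalone sketch of that check follows, assuming a hypothetical `Block` type whose id is its index in the slice; `check_edge_symmetry` and its error wording (chosen to match the test's `"does not list"` probe) are illustrative, not the crate's code.

```rust
// Hypothetical block shape: a block's id is its index in the slice.
struct Block {
    preds: Vec<usize>,
    succs: Vec<usize>,
}

/// Reports every CFG edge recorded on only one side of the pred/succ lists.
fn check_edge_symmetry(blocks: &[Block]) -> Vec<String> {
    let mut errs = Vec::new();
    for (id, b) in blocks.iter().enumerate() {
        // Each successor must list this block as a predecessor...
        for &s in &b.succs {
            match blocks.get(s) {
                None => errs.push(format!("b{id} has out-of-range succ b{s}")),
                Some(t) if !t.preds.contains(&id) => {
                    errs.push(format!("b{s} does not list b{id} as a pred"))
                }
                Some(_) => {}
            }
        }
        // ...and each predecessor must list this block as a successor.
        for &p in &b.preds {
            match blocks.get(p) {
                None => errs.push(format!("b{id} has out-of-range pred b{p}")),
                Some(t) if !t.succs.contains(&id) => {
                    errs.push(format!("b{p} does not list b{id} as a succ"))
                }
                Some(_) => {}
            }
        }
    }
    errs
}

fn main() {
    // Mirrors the hand-crafted test body: b0 -> b1, but b1 lists no preds.
    let broken = [
        Block { preds: vec![], succs: vec![1] },
        Block { preds: vec![], succs: vec![] },
    ];
    println!("{:?}", check_edge_symmetry(&broken));

    let fixed = [
        Block { preds: vec![], succs: vec![1] },
        Block { preds: vec![0], succs: vec![] },
    ];
    assert!(check_edge_symmetry(&fixed).is_empty());
}
```

Checking both directions catches a dangling predecessor as well as a dangling successor, so a single missing list entry is reported exactly once.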
src/ssa/ir.rs

@ -0,0 +1,213 @@
use crate::constraint::domain::ConstValue;
use crate::constraint::lower::ConditionExpr;
use petgraph::graph::NodeIndex;
use serde::{Deserialize, Serialize};
use smallvec::SmallVec;
/// Unique identifier for an SSA value (one per definition point).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
pub struct SsaValue(pub u32);
/// Basic block identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
pub struct BlockId(pub u32);
/// SSA instruction operation.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub enum SsaOp {
/// Phi: merge values from predecessor blocks.
Phi(SmallVec<[(BlockId, SsaValue); 2]>),
/// Assignment: result depends on the listed SSA values.
Assign(SmallVec<[SsaValue; 4]>),
/// Function/method call.
Call {
callee: String,
/// Per-argument SSA value uses.
args: Vec<SmallVec<[SsaValue; 2]>>,
/// Receiver SSA value (for method calls).
receiver: Option<SsaValue>,
},
/// Taint source introduction.
Source,
/// Constant / literal value (no taint).
/// The optional string carries the raw source text when captured during lowering.
Const(Option<String>),
/// Function parameter (positional). Index is the 0-based positional
/// parameter index, *excluding* any implicit receiver (`self`/`this`).
/// The receiver, when present, is represented by [`SsaOp::SelfParam`].
Param { index: usize },
/// Implicit method receiver (`self` in Rust/Python, `this` in
/// JS/TS/Java/PHP). Emitted in block 0 of a function body whenever the
/// body has a receiver (either an explicit `self` formal parameter or an
/// implicit `this` reference). Having a dedicated IR node keeps
/// receiver taint tracking entirely separate from positional-parameter
/// taint, eliminating off-by-receiver arithmetic at call sites.
SelfParam,
/// Catch-clause exception binding.
CatchParam,
/// Non-defining node (e.g. If condition evaluation, Entry, Exit).
Nop,
/// Sentinel for "no reaching definition on this control-flow edge".
///
/// Emitted by SSA lowering as a synthesized instruction in the entry
/// block and referenced from phi operands whose incoming edge does
/// not carry a definition of the phi's variable: for example, a
/// try/catch rejoin where the variable is only defined on the normal
/// path, or an exit reached by an early `return` taken before the
/// variable is defined.
///
/// Having an explicit value lets phis satisfy the invariant that
/// `phi.operands.len() == block.preds.len()` (one operand per
/// predecessor). Downstream analyses treat Undef as a
/// no-taint / unknown / bottom-of-the-lattice contribution: a phi
/// operand of Undef carries no caps, no concrete value, and no
/// abstract fact.
Undef,
}
/// A single SSA instruction.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct SsaInst {
/// The SSA value defined by this instruction.
pub value: SsaValue,
/// The operation.
pub op: SsaOp,
/// The original CFG node this instruction was derived from.
pub cfg_node: NodeIndex,
/// Original variable name (for debugging and label lookups).
pub var_name: Option<String>,
/// Source byte span from the original file.
pub span: (usize, usize),
}
/// Basic block terminator.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub enum Terminator {
Goto(BlockId),
Branch {
cond: NodeIndex,
true_blk: BlockId,
false_blk: BlockId,
/// Structured condition lowered from CFG metadata during SSA construction.
/// `None` when the condition could not be lowered (falls back to text-based
/// lowering in taint transfer).
condition: Option<Box<ConditionExpr>>,
},
/// Multi-way switch dispatch.
///
/// `targets` lists the per-case successor blocks in the source order of
/// the cases; `default` is the fallback branch taken when no case
/// matches. The block's `succs` remain the authoritative flow set; the
/// terminator is a structured summary of them.
///
/// Emitted only for switch-like dispatch whose cases are guaranteed to
/// be mutually exclusive (e.g. Go `switch`, Java arrow-switch, Rust
/// `match`). Fall-through switches (C, C++, and classic Java `switch`
/// without `break`) keep the cascaded `Branch` lowering, because the
/// precision advantage only holds when cases are mutually exclusive.
Switch {
scrutinee: SsaValue,
targets: SmallVec<[BlockId; 4]>,
default: BlockId,
/// Per-target case literals, aligned 1:1 with `targets`.
///
/// `Some(c)` records the constant value the scrutinee must equal for
/// the corresponding target to be taken. `None` means the literal is
/// unknown — emitted for synthetic ≥3-way CFG fanouts or for case
/// patterns that aren't plain literals (OR-patterns, ranges, guards).
///
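/// When `case_values` is empty (including when the field is absent from
/// serialized input, via `#[serde(default)]`), every target is treated
/// as having an unknown literal. This preserves backward compatibility
/// with consumers that only inspect `targets` and `default`.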
#[serde(default)]
case_values: SmallVec<[Option<ConstValue>; 4]>,
},
Return(Option<SsaValue>),
Unreachable,
}
/// A basic block in SSA form.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct SsaBlock {
pub id: BlockId,
/// Phi instructions (always at block start).
pub phis: Vec<SsaInst>,
/// Body instructions (after phis).
pub body: Vec<SsaInst>,
/// Block terminator.
pub terminator: Terminator,
/// Predecessor block IDs.
pub preds: SmallVec<[BlockId; 2]>,
/// Successor block IDs.
pub succs: SmallVec<[BlockId; 2]>,
}
/// Per-value definition metadata.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct ValueDef {
/// Original variable name (if any).
pub var_name: Option<String>,
/// The CFG node where this value was defined.
pub cfg_node: NodeIndex,
/// The block containing the definition.
pub block: BlockId,
}
/// Complete SSA representation for a function/scope.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct SsaBody {
/// All basic blocks, indexed by BlockId.
pub blocks: Vec<SsaBlock>,
/// Entry block.
pub entry: BlockId,
/// Per-SsaValue definition info, indexed by SsaValue.0.
pub value_defs: Vec<ValueDef>,
/// Map from original CFG NodeIndex to the primary SsaValue defined there.
pub cfg_node_map: std::collections::HashMap<NodeIndex, SsaValue>,
/// Exception edges: (source block, catch entry block).
/// Recorded during lowering when exception edges are stripped from the CFG.
/// Used by taint analysis to seed catch blocks with try-body taint state.
pub exception_edges: Vec<(BlockId, BlockId)>,
}
impl SsaBody {
/// Get a block by its ID.
pub fn block(&self, id: BlockId) -> &SsaBlock {
&self.blocks[id.0 as usize]
}
/// Get a mutable block by its ID.
pub fn block_mut(&mut self, id: BlockId) -> &mut SsaBlock {
&mut self.blocks[id.0 as usize]
}
/// Total number of SSA values.
pub fn num_values(&self) -> usize {
self.value_defs.len()
}
/// Look up definition info for an SSA value.
pub fn def_of(&self, v: SsaValue) -> &ValueDef {
&self.value_defs[v.0 as usize]
}
}
/// Errors that can occur during SSA lowering.
#[derive(Debug, Clone)]
pub enum SsaError {
/// The CFG has no reachable nodes from the entry.
EmptyCfg,
/// Entry node not found in the CFG.
InvalidEntry,
}
impl std::fmt::Display for SsaError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
SsaError::EmptyCfg => write!(f, "CFG has no reachable nodes"),
SsaError::InvalidEntry => write!(f, "entry node not found in CFG"),
}
}
}
impl std::error::Error for SsaError {}
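The `Undef` documentation above states the invariant `phi.operands.len() == block.preds.len()`. A minimal sketch of how a lowering step might satisfy it follows, assuming plain `u32` ids for blocks and values and a hypothetical `phi_operands` helper; the crate's actual lowering lives in `src/ssa/lower.rs`, whose diff is suppressed here.

```rust
use std::collections::HashMap;

type BlockId = u32;
type Value = u32;

/// Builds one `(pred, value)` phi operand per predecessor, in `preds` order.
/// `reaching` maps a predecessor to the definition of the phi's variable
/// live at its end; predecessors without one get `undef`, the value of the
/// single Undef instruction assumed to live in the entry block.
fn phi_operands(
    preds: &[BlockId],
    reaching: &HashMap<BlockId, Value>,
    undef: Value,
) -> Vec<(BlockId, Value)> {
    preds
        .iter()
        .map(|&p| (p, reaching.get(&p).copied().unwrap_or(undef)))
        .collect()
}

fn main() {
    // try/catch rejoin: `x` is defined (as v7) on the normal path b1 only;
    // the edge from the catch block b2 carries no definition.
    let preds = [1, 2];
    let reaching = HashMap::from([(1, 7)]);
    let undef = 0;
    let ops = phi_operands(&preds, &reaching, undef);
    // The invariant from the `Undef` docs: one operand per predecessor.
    assert_eq!(ops.len(), preds.len());
    println!("{ops:?}");
}
```

Because every edge gets exactly one operand, downstream passes can pair operands with predecessors without special-casing missing definitions; the `Undef` operand simply contributes no caps, no concrete value, and no abstract fact.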

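A sketch of how a consumer might use `case_values` when the scrutinee is a known constant. `Switch` here is a hypothetical simplification (integer literals only, `u32` block ids), and `resolve` is an illustrative helper rather than part of the crate. It treats an empty `case_values` as "unknown for every target", as the field documentation describes. As a conservative choice of this sketch, it also gives up as soon as it meets an unknown literal before a match.

```rust
type BlockId = u32;

/// Hypothetical, simplified `Switch` terminator with integer literals.
/// `case_values` is either aligned 1:1 with `targets` or empty.
struct Switch {
    targets: Vec<BlockId>,
    default: BlockId,
    case_values: Vec<Option<i64>>,
}

impl Switch {
    /// Literal for target `i`; an empty `case_values` yields `None` (unknown).
    fn case_value(&self, i: usize) -> Option<i64> {
        self.case_values.get(i).copied().flatten()
    }

    /// Successor taken when the scrutinee is the constant `k`, or `None`
    /// when an unknown literal makes the outcome undecidable. Cases are
    /// scanned in source order, and the scan stops at the first unknown
    /// literal that precedes a match.
    fn resolve(&self, k: i64) -> Option<BlockId> {
        for (i, &target) in self.targets.iter().enumerate() {
            match self.case_value(i) {
                Some(c) if c == k => return Some(target),
                Some(_) => {}
                None => return None,
            }
        }
        Some(self.default)
    }
}

fn main() {
    let sw = Switch { targets: vec![10, 11], default: 12, case_values: vec![Some(1), Some(2)] };
    println!("{:?} {:?}", sw.resolve(2), sw.resolve(3));

    // Legacy shape: no literals recorded, so nothing can be resolved.
    let legacy = Switch { targets: vec![10, 11], default: 12, case_values: vec![] };
    assert_eq!(legacy.resolve(1), None);
}
```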
src/ssa/lower.rs

File diff suppressed because it is too large.

src/ssa/mod.rs

@ -0,0 +1,90 @@
#[allow(dead_code)]
pub mod alias;
pub mod const_prop;
pub mod copy_prop;
pub mod dce;
pub mod display;
pub mod heap;
pub mod invariants;
#[allow(dead_code)] // IR types: fields used by Display impl, tests, and downstream analyses
pub mod ir;
pub mod lower;
pub mod param_points_to;
pub mod pointsto;
pub mod static_map;
pub mod type_facts;
#[allow(unused_imports)]
pub use ir::*;
pub use lower::lower_to_ssa;
pub use lower::lower_to_ssa_scoped_nop;
pub use lower::lower_to_ssa_with_params;
use crate::cfg::Cfg;
use crate::symbol::Lang;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
/// Result of SSA optimization passes.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct OptimizeResult {
/// Per-SSA-value constant lattice values.
pub const_values: HashMap<SsaValue, const_prop::ConstLattice>,
/// Type fact analysis results.
pub type_facts: type_facts::TypeFactResult,
/// Base-variable alias groups from copy propagation.
pub alias_result: alias::BaseAliasResult,
/// Points-to analysis: per-SSA-value abstract heap object sets.
pub points_to: heap::PointsToResult,
/// Module aliases from `require()` calls: SSA value → possible module names.
/// Used to resolve dynamic dispatch like `lib.request()` where `lib = require("http")`.
pub module_aliases: HashMap<SsaValue, smallvec::SmallVec<[String; 2]>>,
/// Number of branches pruned by constant propagation.
pub branches_pruned: usize,
/// Number of copies eliminated.
pub copies_eliminated: usize,
/// Number of dead definitions removed.
pub dead_defs_removed: usize,
}
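/// Run all SSA optimization passes on a body.
///
/// Pipeline: constant propagation (SCCP) → branch pruning → copy
/// propagation → base-alias analysis → DCE → type facts → points-to →
/// module-alias collection (JS/TS only). Alias analysis runs before DCE
/// because it consumes the copy map while dead definitions still exist.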
pub fn optimize_ssa(body: &mut SsaBody, cfg: &Cfg, lang: Option<Lang>) -> OptimizeResult {
// 1. Constant propagation (SCCP), then branch pruning from its results
let cp = const_prop::const_propagate(body);
let branches_pruned = const_prop::apply_const_prop(body, &cp);
// 2. Copy propagation
let (copies_eliminated, copy_map) = copy_prop::copy_propagate(body, cfg);
// 3. Alias analysis (uses copy_map before DCE removes dead defs)
let alias_result = alias::compute_base_aliases(&copy_map, body);
// 4. Dead code elimination
let dead_defs_removed = dce::eliminate_dead_defs(body, cfg);
// 5. Type fact analysis (uses const prop results + language for constructor inference)
let type_facts = type_facts::analyze_types(body, cfg, &cp.values, lang);
// 6. Points-to analysis (uses allocation site detection + SSA def-use)
let points_to = heap::analyze_points_to(body, cfg, lang);
// 7. Module alias analysis (require() tracking for JS/TS)
let module_aliases = if matches!(lang, Some(Lang::JavaScript) | Some(Lang::TypeScript)) {
const_prop::collect_module_aliases(body, &cp.values)
} else {
HashMap::new()
};
OptimizeResult {
const_values: cp.values,
type_facts,
alias_result,
points_to,
module_aliases,
branches_pruned,
copies_eliminated,
dead_defs_removed,
}
}
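`src/ssa/param_points_to.rs`, which follows, resolves SSA `Param { index }` hits to call-site positions by name, because lowering can skip a pure-output parameter and shift later SSA indices down. The sketch below reproduces that mapping with hypothetical free functions (`positional_indices`, `formal_index`) and a closure-based equivalent of `base_of_path`. It illustrates the rule and is not the module's code.

```rust
use std::collections::HashMap;

/// Root name of a dotted / indexed path: "obj.list[2].name" -> "obj".
fn base_of_path(name: &str) -> &str {
    match name.find(|c: char| c == '.' || c == '[') {
        Some(end) => &name[..end],
        None => name,
    }
}

/// Maps each declared formal to its call-site position, skipping receiver
/// names so positions line up with `args[i]`.
fn positional_indices(formals: &[&str], formal_param_count: usize) -> HashMap<String, usize> {
    let mut by_name = HashMap::new();
    let mut pos = 0;
    for &name in formals {
        if name == "self" || name == "this" {
            continue;
        }
        if pos >= formal_param_count {
            break;
        }
        by_name.insert(name.to_string(), pos);
        pos += 1;
    }
    by_name
}

/// Prefers the name-based position; falls back to the SSA `Param` index.
fn formal_index(ssa_index: usize, var_name: Option<&str>, by_name: &HashMap<String, usize>) -> usize {
    var_name
        .and_then(|n| by_name.get(n).copied())
        .unwrap_or(ssa_index)
}

fn main() {
    // fn set(this, target, val) { target.data = val }
    let by_name = positional_indices(&["this", "target", "val"], 2);
    // `target` is only written through, so suppose lowering emitted a
    // Param op for `val` alone, at SSA index 0.
    let src = formal_index(0, Some("val"), &by_name);
    let dst = by_name[base_of_path("target.data")];
    // Edge Param(src) -> Param(dst), i.e. Param(1) -> Param(0).
    println!("Param({src}) -> Param({dst})");
    assert_eq!((src, dst), (1, 0));
}
```

Trusting the SSA index alone here would yield `Param(0) → Param(0)`, a self-alias the analysis discards, so the real `val → target` edge would be lost.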

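The same file's `trace_to_param_hit` walks Assign/Phi chains with a visited set. A reduced sketch with a hypothetical four-variant `Op` enum (not the crate's `SsaOp`) shows why the set matters: without it, a loop-carried phi such as `v1 = phi(v0, v2)` with `v2 = v1` would recurse forever.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, trimmed-down op set keyed by SSA value number.
enum Op {
    Param(usize),
    Assign(Vec<u32>),
    Phi(Vec<u32>),
    Const,
}

/// Depth-first search for a backing `Param`, returning the first index
/// reached. `visited` cuts loop-carried cycles such as
/// `v1 = phi(v0, v2)`, `v2 = v1`.
fn trace_to_param(v: u32, ops: &HashMap<u32, Op>, visited: &mut HashSet<u32>) -> Option<usize> {
    if !visited.insert(v) {
        return None; // already seen on this search: stop instead of looping
    }
    match ops.get(&v)? {
        Op::Param(i) => Some(*i),
        Op::Assign(uses) | Op::Phi(uses) => {
            uses.iter().find_map(|&u| trace_to_param(u, ops, visited))
        }
        Op::Const => None,
    }
}

/// v0 = param 0; v1 = phi(v0, v2); v2 = v1; v3 = const; v4 = phi(v4, v3)
fn sample_ops() -> HashMap<u32, Op> {
    HashMap::from([
        (0, Op::Param(0)),
        (1, Op::Phi(vec![0, 2])),
        (2, Op::Assign(vec![1])),
        (3, Op::Const),
        (4, Op::Phi(vec![4, 3])),
    ])
}

fn main() {
    let ops = sample_ops();
    // The loop-carried value still resolves to parameter 0.
    println!("{:?}", trace_to_param(2, &ops, &mut HashSet::new()));
    // A self-referential phi over a constant terminates with no parameter.
    println!("{:?}", trace_to_param(4, &ops, &mut HashSet::new()));
}
```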
src/ssa/param_points_to.rs

@ -0,0 +1,649 @@
//! Parameter-granularity points-to analysis.
//!
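//! Produces a [`PointsToSummary`] for a function body from a few linear
//! passes over the SSA. Besides setting `returns_fresh_alloc` when a return
//! path ends at a fresh container allocation, it records two classes of
//! aliasing: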
//!
//! 1. **Param → Param field writes.** An `obj.field = val` whose base
//! `obj` names parameter `b` (matched by variable name) and whose `val`
//! traces back to parameter `a` emits a `Param(a) → Param(b)` `MayAlias`
//! edge. This captures the `mutating_helper` pattern: the callee mutates
//! a shared heap cell through one parameter, and the caller observes the
//! mutation through the argument it passed for that parameter.
//!
//! 2. **Param → Return aliases.** `Terminator::Return(v)` where `v`
//! traces back to a parameter emits a `Param(i) → Return` edge. This
//! captures the `returned_alias` pattern — the callee returns its
//! argument unchanged and the caller treats the result as aliasing the
//! input.
//!
//! Field-write detection uses the existing SSA lowering convention: a
//! source-level `obj.x = val` is lowered to an `Assign` whose `var_name`
//! is the dotted path `"obj.x"`, plus synthetic parent-path Assigns that
//! propagate the write up to the base (`"obj"`). See
//! [`crate::ssa::lower`]'s "Synthetic base update" block for the
//! canonical source.
//!
//! The analysis is **flow-insensitive** and **bounded**: it does not
//! reason about path feasibility, and it stops adding edges once the
//! summary's [`MAX_ALIAS_EDGES`] cap is reached — the overflow flag is
//! the conservative fallback that callers honour.
use std::collections::{HashMap, HashSet};
use smallvec::SmallVec;
use crate::summary::points_to::{AliasKind, AliasPosition, PointsToSummary};
use crate::symbol::Lang;
use super::ir::{SsaBody, SsaOp, SsaValue, Terminator};
/// Map an SSA value back to its defining instruction's op.
///
/// Local to this module — the taint engine has its own `build_inst_map`
/// that also carries receiver info we do not need, and duplicating it
/// keeps this analysis independent of that private helper's shape.
fn build_op_map(ssa: &SsaBody) -> HashMap<SsaValue, SsaOp> {
let mut map = HashMap::with_capacity(ssa.num_values());
for block in &ssa.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
map.insert(inst.value, inst.op.clone());
}
}
map
}
/// Sibling of [`build_op_map`] that captures the optional `var_name`
/// recorded on each SSA instruction. Used alongside the op map so a
/// [`ParamHit`] can surface the underlying variable name for
/// formal-index resolution.
fn build_var_name_map(ssa: &SsaBody) -> HashMap<SsaValue, Option<String>> {
let mut map = HashMap::with_capacity(ssa.num_values());
for block in &ssa.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
map.insert(inst.value, inst.var_name.clone());
}
}
map
}
/// Information about an SSA `Param { index }` node needed to resolve
/// back to a caller-side positional index via formal-params lookup.
#[derive(Clone, Debug)]
struct ParamHit {
/// The `SsaOp::Param` index as lowered.
ssa_index: usize,
/// The parameter's variable name (from [`super::ir::SsaInst::var_name`]). Used
/// to map back to the formal-declaration position — the caller's
/// `args[i]` slot is keyed by declaration position, not by SSA
/// index, and the two can disagree when a formal parameter is
/// skipped from SSA lowering (e.g., pure-output params).
var_name: Option<String>,
}
/// Walk Assign/Phi chains depth-first and return the first backing
/// `Param { index }` SSA op reached (operands are tried in order).
///
/// Returns the `SsaOp::Param`'s index *and* its var_name so callers can
/// resolve the formal-positional index via the name lookup table — the
/// two indices can disagree when SSA lowering skips a formal parameter
/// (never used as a read), shifting subsequent param indices down.
fn trace_to_param_hit(
v: SsaValue,
op_map: &HashMap<SsaValue, SsaOp>,
var_names: &HashMap<SsaValue, Option<String>>,
visited: &mut HashSet<SsaValue>,
) -> Option<ParamHit> {
if !visited.insert(v) {
return None;
}
match op_map.get(&v)? {
SsaOp::Param { index } => Some(ParamHit {
ssa_index: *index,
var_name: var_names.get(&v).cloned().flatten(),
}),
SsaOp::Assign(uses) => {
for u in uses {
if let Some(hit) = trace_to_param_hit(*u, op_map, var_names, visited) {
return Some(hit);
}
}
None
}
SsaOp::Phi(operands) => {
for (_, pv) in operands {
if let Some(hit) = trace_to_param_hit(*pv, op_map, var_names, visited) {
return Some(hit);
}
}
None
}
// Call produces a fresh identity; Const / Source / CatchParam /
// SelfParam / Nop / Undef are not param-derived.
_ => None,
}
}
/// Resolve a [`ParamHit`] to a caller-side positional index using the
/// formal-params name lookup. Falls back to the SSA `index` when no
/// name-based match exists (e.g., extractor called without
/// `formal_param_names`).
fn param_hit_to_formal_index(hit: &ParamHit, params_by_name: &HashMap<String, usize>) -> usize {
if let Some(name) = &hit.var_name
&& let Some(&idx) = params_by_name.get(name)
{
return idx;
}
hit.ssa_index
}
/// Parse the base of a dotted / indexed path into its root name.
///
/// * `"obj"` → `"obj"`
/// * `"obj.field"` → `"obj"`
/// * `"obj.field.sub"` → `"obj"`
/// * `"obj[0]"` → `"obj"`
/// * `"obj.list[2].name"` → `"obj"`
///
/// Used to decide whether a field-style Assign's LHS base names a
/// parameter variable: we strip the first separator and everything after
/// it, then compare the remainder to the recorded param names.
fn base_of_path(name: &str) -> &str {
let dot = name.find('.');
let bracket = name.find('[');
let end = match (dot, bracket) {
(Some(d), Some(b)) => d.min(b),
(Some(d), None) => d,
(None, Some(b)) => b,
(None, None) => return name,
};
&name[..end]
}
/// Local receiver check duplicated to avoid depending on private
/// `lower::is_receiver_name`. Must stay in sync with that helper.
fn is_receiver_name_local(name: &str) -> bool {
matches!(name, "self" | "this")
}
/// Walk Assign/Phi chains from a return value to decide whether the path
/// ends at a fresh container allocation (literal or constructor call).
///
/// Returns `true` the first time a qualifying allocation is found.
/// Parameter-terminated paths, `Call` ops that are not container
/// constructors, and constants that are not container literals all
/// return `false` — soundly under-approximating, since the caller will
/// simply fall back to the existing `Param(i) → Return` / store-into-
/// heap channels when the flag is absent.
fn trace_to_fresh_alloc(
v: SsaValue,
op_map: &HashMap<SsaValue, SsaOp>,
lang: Option<Lang>,
visited: &mut HashSet<SsaValue>,
) -> bool {
if !visited.insert(v) {
return false;
}
let Some(op) = op_map.get(&v) else {
return false;
};
match op {
SsaOp::Const(Some(text)) => crate::ssa::heap::is_container_literal_public(text),
SsaOp::Call { callee, .. } => lang
.map(|l| crate::ssa::heap::is_container_constructor(callee, l))
.unwrap_or(false),
SsaOp::Assign(uses) => uses
.iter()
.any(|u| trace_to_fresh_alloc(*u, op_map, lang, visited)),
SsaOp::Phi(operands) => operands
.iter()
.any(|(_, pv)| trace_to_fresh_alloc(*pv, op_map, lang, visited)),
_ => false,
}
}
/// Whether any `Terminator::Return(Some(v))` in the body traces back to a
/// fresh container allocation. Invoked once per function; the visited
/// set is fresh per return block so distinct returns do not poison each
/// other's searches.
fn returns_fresh_allocation(
ssa: &SsaBody,
op_map: &HashMap<SsaValue, SsaOp>,
lang: Option<Lang>,
) -> bool {
for block in &ssa.blocks {
let Terminator::Return(Some(v)) = block.terminator else {
continue;
};
let mut visited = HashSet::new();
if trace_to_fresh_alloc(v, op_map, lang, &mut visited) {
return true;
}
}
false
}
/// Compute the parameter-granularity points-to summary for a function.
///
/// `param_info` carries one `(param_index, param_name, param_ssa_value)`
/// tuple per formal parameter that was emitted as [`SsaOp::Param`] in the
/// lowered body. The receiver is intentionally excluded — this table
/// captures positional parameters only.
///
/// `formal_param_names`, when supplied, is the authoritative list of
/// declared parameter names in declaration order. It matters for
/// **pure-output parameters**: a param like `target` in
/// `fn set(target, val): target.data = val` is never *used* in the body
/// (only assigned into), so SSA lowering does not emit a `Param` node
/// for it and `param_info` will not contain it. Falling back to
/// `formal_param_names` lets the base-name lookup still find its index.
///
/// `formal_param_count` bounds the parameter indices written to the
/// summary: scoped lowering synthesises `Param` ops for module-level
/// captures at indices beyond the formal arity, and those must not leak
/// into the summary (they would trip [`crate::summary::ssa_summary_fits_arity`]).
pub fn analyse_param_points_to(
ssa: &SsaBody,
param_info: &[(usize, String, SsaValue)],
formal_param_count: usize,
formal_param_names: Option<&[String]>,
lang: Option<Lang>,
) -> PointsToSummary {
let mut summary = PointsToSummary::empty();
let op_map = build_op_map(ssa);
let var_names = build_var_name_map(ssa);
// ── 0. Fresh-container return detection ─────────────────────────────
//
// A return path traces back to either:
// * `SsaOp::Const(Some(text))` where `text` is a container literal
// (`[]`, `{}`, `new Map()`, …), OR
// * `SsaOp::Call { callee, … }` where `callee` matches a known
// container constructor for `lang` (`ArrayList`, `dict`, …).
//
// When at least one return path matches, the callee produces a
// caller-visible fresh heap identity on that path — callers
// synthesise a `HeapObjectId` keyed on the call result so later
// container operations have a stable heap cell. Traces that reach a
// parameter are handled by the edge-based `Param(i) → Return` channel
// below and do not contribute here; a mixed function emits both.
//
// Runs before the early-out on `formal_param_count == 0` so pure
// factories (zero-param container constructors) still record the
// fresh-alloc signal.
if returns_fresh_allocation(ssa, &op_map, lang) {
summary.returns_fresh_alloc = true;
}
if formal_param_count == 0 {
return summary;
}
// Build the name→positional-index map. Summary param indices are
// *positional* — they match the call-site `args[i]` position, which
// excludes the receiver (`self`/`this`). When `formal_param_names`
// contains a leading receiver, skip it so the remaining names align
// with the SSA `SsaOp::Param { index }` convention.
let mut params_by_name: HashMap<String, usize> = HashMap::new();
if let Some(names) = formal_param_names {
let mut pos: usize = 0;
for name in names {
if is_receiver_name_local(name) {
continue;
}
if pos >= formal_param_count {
break;
}
params_by_name.insert(name.clone(), pos);
pos += 1;
}
}
// Overlay `param_info` ONLY when formal_param_names was absent.
// When formal_param_names is supplied it is the authoritative
// declaration-order mapping; SSA param indices can legitimately
// diverge (a pure-output param is never emitted, shifting later
// indices down), so trusting SSA here would mis-map the caller's
// `args[i]` positional slot.
if formal_param_names.is_none() {
for (idx, name, _) in param_info {
params_by_name.insert(name.clone(), *idx);
}
}
// ── 1. Field-store alias edges (Param(a) → Param(b)) ────────────────
//
// SSA lowering encodes `obj.field = val` as one or more Assigns whose
// `var_name` is the dotted / indexed path. For every such Assign we
// look up the root name, check it matches a parameter variable, and
// trace each use back to a param for the `Param(a) → Param(b)` edge.
for block in &ssa.blocks {
for inst in block.body.iter() {
let SsaOp::Assign(uses) = &inst.op else {
continue;
};
let Some(name) = inst.var_name.as_ref() else {
continue;
};
// Only field/index-style writes encode the base in var_name;
// a plain `x = ...` doesn't imply aliasing with `x`'s param.
if !name.contains('.') && !name.contains('[') {
continue;
}
let base = base_of_path(name);
let Some(&target_idx) = params_by_name.get(base) else {
continue;
};
if target_idx >= formal_param_count {
continue;
}
for u in uses {
let mut visited = HashSet::new();
let Some(hit) = trace_to_param_hit(*u, &op_map, &var_names, &mut visited) else {
continue;
};
let src_idx = param_hit_to_formal_index(&hit, &params_by_name);
if src_idx >= formal_param_count {
continue;
}
if src_idx == target_idx {
// Self-alias is uninformative — the caller's
// arg-to-itself propagation is already covered by
// `param_to_return`/`param_to_sink`.
continue;
}
summary.insert(
AliasPosition::Param(src_idx as u32),
AliasPosition::Param(target_idx as u32),
AliasKind::MayAlias,
);
if summary.overflow {
return summary;
}
}
}
}
// ── 2. Return-alias edges (Param(i) → Return) ───────────────────────
//
// `Terminator::Return(v)` with `v` tracing back to a parameter means
// the call site's result aliases the corresponding argument's heap
// identity. Joining across all return blocks is a plain set union.
let mut return_param_indices: SmallVec<[usize; 4]> = SmallVec::new();
for block in &ssa.blocks {
let Terminator::Return(Some(v)) = block.terminator else {
continue;
};
let mut visited = HashSet::new();
if let Some(hit) = trace_to_param_hit(v, &op_map, &var_names, &mut visited) {
let idx = param_hit_to_formal_index(&hit, &params_by_name);
if idx < formal_param_count && !return_param_indices.contains(&idx) {
return_param_indices.push(idx);
}
}
}
for idx in return_param_indices {
summary.insert(
AliasPosition::Param(idx as u32),
AliasPosition::Return,
AliasKind::MayAlias,
);
if summary.overflow {
return summary;
}
}
summary
}
#[cfg(test)]
mod tests {
use super::*;
use crate::ssa::ir::{BlockId, SsaBlock, SsaInst};
use petgraph::graph::NodeIndex;
use smallvec::smallvec;
fn mk_body(blocks: Vec<SsaBlock>, num_values: u32) -> SsaBody {
use crate::ssa::ir::ValueDef;
let value_defs = (0..num_values)
.map(|_| ValueDef {
var_name: None,
cfg_node: NodeIndex::new(0),
block: BlockId(0),
})
.collect();
SsaBody {
blocks,
entry: BlockId(0),
value_defs,
cfg_node_map: HashMap::new(),
exception_edges: vec![],
}
}
fn inst(v: u32, op: SsaOp, var_name: Option<&str>) -> SsaInst {
SsaInst {
value: SsaValue(v),
op,
cfg_node: NodeIndex::new(0),
var_name: var_name.map(String::from),
span: (0, 0),
}
}
#[test]
fn field_write_param_to_param_emits_edge() {
// Simulate:
// fn f(a, b):
// b.data = a # Assign var_name="b.data" uses=[a_ssa]
// synthetic: b = b.data # Assign var_name="b" uses=[assign0]
// return
let block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
inst(0, SsaOp::Param { index: 0 }, Some("a")),
inst(1, SsaOp::Param { index: 1 }, Some("b")),
inst(2, SsaOp::Assign(smallvec![SsaValue(0)]), Some("b.data")),
inst(3, SsaOp::Assign(smallvec![SsaValue(2)]), Some("b")),
],
terminator: Terminator::Return(None),
preds: smallvec![],
succs: smallvec![],
};
let body = mk_body(vec![block], 4);
let pinfo = vec![
(0usize, "a".to_string(), SsaValue(0)),
(1usize, "b".to_string(), SsaValue(1)),
];
let s = analyse_param_points_to(&body, &pinfo, 2, None, None);
assert!(!s.overflow, "unexpected overflow: {s:?}");
assert!(
s.edges.iter().any(|e| e.source == AliasPosition::Param(0)
&& e.target == AliasPosition::Param(1)
&& e.kind == AliasKind::MayAlias),
"expected Param(0) → Param(1) edge, got {s:?}"
);
}
#[test]
fn return_alias_emits_edge() {
// fn f(a): return a
let block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![inst(0, SsaOp::Param { index: 0 }, Some("a"))],
terminator: Terminator::Return(Some(SsaValue(0))),
preds: smallvec![],
succs: smallvec![],
};
let body = mk_body(vec![block], 1);
let pinfo = vec![(0usize, "a".to_string(), SsaValue(0))];
let s = analyse_param_points_to(&body, &pinfo, 1, None, None);
assert!(!s.overflow);
assert_eq!(s.edges.len(), 1);
assert_eq!(s.edges[0].source, AliasPosition::Param(0));
assert_eq!(s.edges[0].target, AliasPosition::Return);
}
#[test]
fn self_alias_is_dropped() {
// fn f(b): b.x = b; b.data = b.x
// Every use traces back to Param(0) and every written base is `b`
// (Param(0)), so each candidate edge is a self-alias. Self-aliases are
// uninformative, so no edge may be emitted.
let block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
inst(0, SsaOp::Param { index: 0 }, Some("b")),
inst(1, SsaOp::Assign(smallvec![SsaValue(0)]), Some("b.x")),
inst(2, SsaOp::Assign(smallvec![SsaValue(1)]), Some("b.data")),
],
terminator: Terminator::Return(None),
preds: smallvec![],
succs: smallvec![],
};
let body = mk_body(vec![block], 3);
let pinfo = vec![(0usize, "b".to_string(), SsaValue(0))];
let s = analyse_param_points_to(&body, &pinfo, 1, None, None);
assert!(
s.is_empty(),
"self-alias edges should not be emitted: {s:?}"
);
}
#[test]
fn out_of_range_param_rejected() {
// Synthetic Param with index >= formal_param_count must not leak
// into the summary (it would trip ssa_summary_fits_arity).
let block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![
inst(0, SsaOp::Param { index: 5 }, Some("capture")),
inst(1, SsaOp::Param { index: 1 }, Some("b")),
inst(2, SsaOp::Assign(smallvec![SsaValue(0)]), Some("b.data")),
],
terminator: Terminator::Return(None),
preds: smallvec![],
succs: smallvec![],
};
let body = mk_body(vec![block], 3);
let pinfo = vec![
(5usize, "capture".to_string(), SsaValue(0)),
(1usize, "b".to_string(), SsaValue(1)),
];
// formal_param_count = 2 — index 5 is out of range.
let s = analyse_param_points_to(&body, &pinfo, 2, None, None);
assert!(
s.is_empty(),
"synthetic captures past formal arity must not emit edges: {s:?}"
);
}
#[test]
fn bounded_graph_overflows_at_cap() {
// Return a Phi whose operands are MAX_ALIAS_EDGES+2 distinct params. If
// every operand yielded its own Param(i) → Return edge this would exceed
// the cap; the assertions below pin down what actually happens.
let n = (crate::summary::points_to::MAX_ALIAS_EDGES + 2) as u32;
let mut insts = Vec::new();
let mut phi_operands: SmallVec<[(BlockId, SsaValue); 2]> = SmallVec::new();
for i in 0..n {
insts.push(inst(
i,
SsaOp::Param { index: i as usize },
Some(&format!("p{i}")),
));
phi_operands.push((BlockId(0), SsaValue(i)));
}
let phi_v = n;
insts.push(inst(phi_v, SsaOp::Phi(phi_operands), Some("ret")));
let block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: insts,
terminator: Terminator::Return(Some(SsaValue(phi_v))),
preds: smallvec![],
succs: smallvec![],
};
let body = mk_body(vec![block], n + 1);
let pinfo: Vec<(usize, String, SsaValue)> = (0..n as usize)
.map(|i| (i, format!("p{i}"), SsaValue(i as u32)))
.collect();
// Only the first traced param is emitted (trace_to_param_hit
// short-circuits on the first match), so overflow is not expected;
// instead we verify the bounded behaviour: a single edge.
let s = analyse_param_points_to(&body, &pinfo, n as usize, None, None);
assert!(!s.overflow);
assert_eq!(s.edges.len(), 1);
}
#[test]
fn fresh_container_literal_return_sets_flag() {
// fn makeBag() { return []; }
// v0 = Const("[]")
// terminator: Return(v0)
let block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![inst(0, SsaOp::Const(Some("[]".to_string())), None)],
terminator: Terminator::Return(Some(SsaValue(0))),
preds: smallvec![],
succs: smallvec![],
};
let body = mk_body(vec![block], 1);
let s = analyse_param_points_to(&body, &[], 0, None, Some(Lang::JavaScript));
assert!(s.returns_fresh_alloc);
assert!(s.edges.is_empty());
}
#[test]
fn constructor_return_sets_flag() {
// fn makeList() { return list(); }
// v0 = Call("list", [])
// terminator: Return(v0)
let block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![inst(
0,
SsaOp::Call {
callee: "list".to_string(),
args: vec![],
receiver: None,
},
None,
)],
terminator: Terminator::Return(Some(SsaValue(0))),
preds: smallvec![],
succs: smallvec![],
};
let body = mk_body(vec![block], 1);
let s = analyse_param_points_to(&body, &[], 0, None, Some(Lang::Python));
assert!(s.returns_fresh_alloc);
}
#[test]
fn return_of_param_does_not_set_fresh_flag() {
// fn identity(a) { return a; }
let block = SsaBlock {
id: BlockId(0),
phis: vec![],
body: vec![inst(0, SsaOp::Param { index: 0 }, Some("a"))],
terminator: Terminator::Return(Some(SsaValue(0))),
preds: smallvec![],
succs: smallvec![],
};
let body = mk_body(vec![block], 1);
let pinfo = vec![(0usize, "a".to_string(), SsaValue(0))];
let s = analyse_param_points_to(&body, &pinfo, 1, None, Some(Lang::JavaScript));
assert!(
!s.returns_fresh_alloc,
"param-only return must not set fresh-alloc flag"
);
// But the Param(0) → Return edge must still be emitted.
assert!(
s.edges
.iter()
.any(|e| e.source == AliasPosition::Param(0) && e.target == AliasPosition::Return),
"expected Param(0) → Return edge, got {s:?}"
);
}
}
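The receiver-skipping name map built at the top of `analyse_param_points_to` can be illustrated in isolation. This is a minimal sketch under stated assumptions: `is_receiver_name` is a hypothetical stand-in for `is_receiver_name_local` that only recognises `self` and `this`, and plain `&str` slices replace the real argument types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `is_receiver_name_local`.
fn is_receiver_name(name: &str) -> bool {
    matches!(name, "self" | "this")
}

/// Map parameter names to caller-visible positional indices (`args[i]`).
///
/// Declaration-order names are authoritative when present; otherwise the
/// SSA-reported `(index, name)` pairs are used as a fallback.
fn params_by_name(
    formal_names: Option<&[&str]>,
    ssa_params: &[(usize, &str)],
    formal_param_count: usize,
) -> HashMap<String, usize> {
    let mut map = HashMap::new();
    match formal_names {
        Some(names) => {
            // Drop receivers so the remaining names line up with args[i],
            // then stop once the formal arity is reached.
            let non_receivers = names.iter().filter(|&&n| !is_receiver_name(n));
            for (pos, name) in non_receivers.take(formal_param_count).enumerate() {
                map.insert(name.to_string(), pos);
            }
        }
        None => {
            for &(idx, name) in ssa_params {
                map.insert(name.to_string(), idx);
            }
        }
    }
    map
}
```

Preferring declaration-order names matches the comment in the original: SSA indices can shift when a pure-output parameter is never emitted, so they are only trusted when no names were supplied.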

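The guards around `summary.insert` (skip out-of-range and self-alias edges, return early once `overflow` is set) can be modelled with a small bounded edge set. This is a sketch only: `Pos`, `BoundedSummary`, `record_field_store` and the cap of 4 are hypothetical stand-ins for `AliasPosition`, the real summary type and `MAX_ALIAS_EDGES`. The real overflow policy is not shown in this diff; this version keeps the edges collected so far and refuses new ones.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Pos {
    Param(u32),
    Return,
}

/// Hypothetical cap standing in for `MAX_ALIAS_EDGES`.
const CAP: usize = 4;

#[derive(Debug, Default)]
struct BoundedSummary {
    edges: BTreeSet<(Pos, Pos)>, // deduplicated (source, target) pairs
    overflow: bool,
}

impl BoundedSummary {
    /// Insert an edge; a new edge beyond the cap flips `overflow` instead.
    fn insert(&mut self, src: Pos, dst: Pos) {
        if self.overflow {
            return;
        }
        if self.edges.len() >= CAP && !self.edges.contains(&(src, dst)) {
            self.overflow = true;
            return;
        }
        self.edges.insert((src, dst));
    }
}

/// Mirror of the step-1 guards. Returns `false` when the caller should
/// stop and return the (overflowed) summary early.
fn record_field_store(s: &mut BoundedSummary, src: usize, target: usize, arity: usize) -> bool {
    // Out-of-range indices and self-aliases are skipped, not recorded.
    if src >= arity || target >= arity || src == target {
        return true;
    }
    s.insert(Pos::Param(src as u32), Pos::Param(target as u32));
    !s.overflow
}
```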
src/ssa/pointsto.rs
//! Container operation classification for taint propagation.
//!
//! Recognises common container store/load patterns (push, pop, get, set, etc.)
//! across all supported languages so that taint flows correctly through
//! collection operations.
use crate::symbol::Lang;
use smallvec::SmallVec;
// ── Container operation model ───────────────────────────────────────────
/// Describes how a container method moves taint.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ContainerOp {
/// Taint flows from the listed argument positions into the receiver
/// container (e.g. `arr.push(val)` — val taint merges into arr).
///
/// `index_arg`: when `Some(pos)`, the argument at that logical position
/// is the container index/key. If constant-propagation proves it a
/// non-negative integer, the taint engine stores into `HeapSlot::Index(n)`
/// instead of `HeapSlot::Elements`. `None` → always `Elements`.
Store {
value_args: SmallVec<[usize; 2]>,
index_arg: Option<usize>,
},
/// Taint flows from the receiver container to the call's return value
/// (e.g. `arr.pop()`, `items.join('')`).
///
/// `index_arg`: same semantics as `Store::index_arg` — when present and
/// provably constant, loads from `HeapSlot::Index(n)`.
Load { index_arg: Option<usize> },
}
/// Convenience: store with a single value argument, no index tracking.
#[inline]
fn store(pos: usize) -> Option<ContainerOp> {
let mut v = SmallVec::new();
v.push(pos);
Some(ContainerOp::Store {
value_args: v,
index_arg: None,
})
}
/// Convenience: store with index tracking. `val_pos` is the value arg,
/// `idx_pos` is the index/key arg (resolved via const propagation).
#[inline]
fn store_indexed(val_pos: usize, idx_pos: usize) -> Option<ContainerOp> {
let mut v = SmallVec::new();
v.push(val_pos);
Some(ContainerOp::Store {
value_args: v,
index_arg: Some(idx_pos),
})
}
/// Convenience: store with two value arguments, no index tracking.
#[inline]
fn store2(a: usize, b: usize) -> Option<ContainerOp> {
let mut v = SmallVec::new();
v.push(a);
v.push(b);
Some(ContainerOp::Store {
value_args: v,
index_arg: None,
})
}
/// Convenience: load without index tracking.
#[inline]
fn load() -> Option<ContainerOp> {
Some(ContainerOp::Load { index_arg: None })
}
/// Convenience: load with index tracking. `idx_pos` is the index/key arg.
#[inline]
fn load_indexed(idx_pos: usize) -> Option<ContainerOp> {
Some(ContainerOp::Load {
index_arg: Some(idx_pos),
})
}
// ── Classification ──────────────────────────────────────────────────────
/// Classify a callee as a container operation for the given language.
///
/// `callee` is the raw callee string from `NodeInfo.callee` (e.g.
/// `"items.push"`, `"arr.pop"`). We extract the last segment after `.`
/// for method matching. For Go builtins (e.g. `"append"`), the full name
/// is used.
///
/// Returns `None` if the callee is not a recognised container operation.
pub fn classify_container_op(callee: &str, lang: Lang) -> Option<ContainerOp> {
// Extract method name: last segment after '.' (or full name if no dot).
let method = callee.rsplit('.').next().unwrap_or(callee);
match lang {
Lang::JavaScript | Lang::TypeScript => classify_js(method),
Lang::Python => classify_python(method),
Lang::Java => classify_java(method),
Lang::Go => classify_go(method, callee),
Lang::Ruby => classify_ruby(method),
Lang::Php => classify_php(method),
Lang::C | Lang::Cpp => classify_cpp(method),
Lang::Rust => classify_rust(method),
}
}
// ── Per-language classifiers ────────────────────────────────────────────
fn classify_js(method: &str) -> Option<ContainerOp> {
match method {
// Array store
"push" | "unshift" => store(0),
// Map/Set store: map.set(key, value) — key at 0, value at 1
"set" => store_indexed(1, 0),
"add" => store(0), // set.add(value)
// Array/Map load
"pop" | "shift" => load(),
"join" | "flat" | "concat" | "slice" | "toString" => load(),
// map.get(key) — key at 0
"get" => load_indexed(0),
"values" | "keys" | "entries" => load(),
_ => None,
}
}
fn classify_python(method: &str) -> Option<ContainerOp> {
match method {
// List store
"append" | "extend" => store(0),
"insert" => store_indexed(1, 0), // list.insert(index, value) — index at 0, value at 1
// Set store
"add" => store(0),
// Dict store
"update" => store(0),
"setdefault" => store2(0, 1), // dict.setdefault(key, default)
// List/Dict load
"pop" => load(),
"get" => load_indexed(0), // dict.get(key): key at 0
"items" | "values" | "keys" => load(),
"join" => load(),
_ => None,
}
}
fn classify_java(method: &str) -> Option<ContainerOp> {
match method {
// Collection store
"add" | "addAll" | "putAll" | "offer" | "push" => store(0),
// ArrayList.set(index, value) — index at 0, value at 1
"set" => store_indexed(1, 0),
// Map.put(key, value) — key at 0, value at 1
"put" => store_indexed(1, 0),
// Collection load: ArrayList.get(index) — index at 0
"get" => load_indexed(0),
"poll" | "peek" | "remove" | "pop" => load(),
"stream" | "toArray" | "iterator" => load(),
_ => None,
}
}
fn classify_go(method: &str, callee: &str) -> Option<ContainerOp> {
// Go `append` is a builtin: `result = append(slice, val1, val2, ...)`
// The callee is just "append" (no receiver dot-path).
if callee == "append" || method == "append" {
// arg 0 = existing slice, args 1+ = values to append.
// Handled specially in try_container_propagation (Go append mode).
return store(1);
}
// Map/slice operations in Go are via index expressions, not method calls,
// so there are fewer method-based patterns.
match method {
"Add" | "Set" | "Store" | "Put" => store(0),
"Get" | "Load" | "Pop" => load(),
_ => None,
}
}
fn classify_ruby(method: &str) -> Option<ContainerOp> {
match method {
"push" | "append" | "unshift" | "store" | "<<" => store(0),
"pop" | "shift" | "first" | "last" | "fetch" | "join" => load(),
_ => None,
}
}
fn classify_php(method: &str) -> Option<ContainerOp> {
match method {
"array_push" => store(1), // array_push(&$arr, $val) — arr is arg 0, val is arg 1
"array_pop" | "array_shift" | "current" | "next" | "reset" => load(),
_ => None,
}
}
fn classify_cpp(method: &str) -> Option<ContainerOp> {
match method {
"push_back" | "emplace_back" | "insert" | "emplace" | "push" => store(0),
"front" | "back" | "pop_back" | "pop_front" | "top" => load(),
// vector.at(index) — index at 0
"at" => load_indexed(0),
_ => None,
}
}
fn classify_rust(method: &str) -> Option<ContainerOp> {
match method {
"push" | "insert" | "extend" => store(0),
"pop" | "first" | "last" | "iter" | "remove" => load(),
// vec.get(index) — index at 0
"get" => load_indexed(0),
_ => None,
}
}
// ── Tests ───────────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn js_push_is_store() {
let op = classify_container_op("items.push", Lang::JavaScript);
assert!(matches!(op, Some(ContainerOp::Store { .. })));
}
#[test]
fn js_pop_is_load() {
let op = classify_container_op("arr.pop", Lang::JavaScript);
assert!(matches!(op, Some(ContainerOp::Load { .. })));
}
#[test]
fn js_join_is_load() {
let op = classify_container_op("items.join", Lang::JavaScript);
assert!(matches!(op, Some(ContainerOp::Load { .. })));
}
#[test]
fn python_append_is_store() {
let op = classify_container_op("commands.append", Lang::Python);
assert!(matches!(op, Some(ContainerOp::Store { .. })));
}
#[test]
fn java_add_is_store() {
let op = classify_container_op("list.add", Lang::Java);
assert!(matches!(op, Some(ContainerOp::Store { .. })));
}
#[test]
fn go_append_is_store() {
let op = classify_container_op("append", Lang::Go);
assert!(matches!(op, Some(ContainerOp::Store { .. })));
}
#[test]
fn unknown_method_is_none() {
assert!(classify_container_op("obj.frobnicate", Lang::JavaScript).is_none());
}
#[test]
fn rust_push_is_store() {
let op = classify_container_op("vec.push", Lang::Rust);
assert!(matches!(op, Some(ContainerOp::Store { .. })));
}
#[test]
fn store_value_args_correct() {
// JS set → value at arg 1, index at arg 0
if let Some(ContainerOp::Store {
value_args,
index_arg,
}) = classify_container_op("map.set", Lang::JavaScript)
{
assert_eq!(value_args.as_slice(), &[1]);
assert_eq!(index_arg, Some(0));
} else {
panic!("expected Store");
}
// JS push → value at arg 0, no index
if let Some(ContainerOp::Store {
value_args,
index_arg,
}) = classify_container_op("arr.push", Lang::JavaScript)
{
assert_eq!(value_args.as_slice(), &[0]);
assert_eq!(index_arg, None);
} else {
panic!("expected Store");
}
}
#[test]
fn load_index_arg_correct() {
// JS get → index at arg 0
if let Some(ContainerOp::Load { index_arg }) =
classify_container_op("map.get", Lang::JavaScript)
{
assert_eq!(index_arg, Some(0));
} else {
panic!("expected Load");
}
// JS pop → no index
if let Some(ContainerOp::Load { index_arg }) =
classify_container_op("arr.pop", Lang::JavaScript)
{
assert_eq!(index_arg, None);
} else {
panic!("expected Load");
}
}
}
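The `index_arg` documentation on `ContainerOp` says that a provably non-negative integer index selects `HeapSlot::Index(n)` and anything else falls back to `HeapSlot::Elements`. A minimal sketch of that decision, assuming constant propagation hands back an `Option<i64>` per argument; the mirror `HeapSlot` enum (including its `u64` payload) and `resolve_slot` are hypothetical, not the engine's own types.

```rust
/// Hypothetical mirror of the heap slot chosen for a container access.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HeapSlot {
    Index(u64),
    Elements,
}

/// Pick the slot for a Store/Load from its `index_arg` and the
/// constant-propagated integer value (if any) of each argument.
fn resolve_slot(index_arg: Option<usize>, const_ints: &[Option<i64>]) -> HeapSlot {
    match index_arg.and_then(|pos| const_ints.get(pos).copied().flatten()) {
        // Only a provably non-negative constant gets a precise slot.
        Some(n) if n >= 0 => HeapSlot::Index(n as u64),
        // No index arg, unknown value, or negative constant: summary slot.
        _ => HeapSlot::Elements,
    }
}
```

In this sketch a string key such as the one in `map.set(key, value)` never yields an integer constant, so it always lands in `Elements`.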

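For Go, `classify_go` reports `store(1)` and leaves the rest to a special append mode. By Go's own semantics, the slice returned by `append(slice, v1, v2, ...)` holds the old elements plus every appended value, so one plausible model of the result's taint is the union over all arguments. The sketch assumes taint labels are encoded as a bitmask; `Taint` and `go_append_result_taint` are hypothetical and not taken from `try_container_propagation`.

```rust
/// Taint labels as a bitmask (hypothetical encoding).
type Taint = u32;

/// Model `result = append(slice, v1, v2, ...)`: arg 0 is the existing
/// slice, args 1+ are appended values, and the result carries them all.
fn go_append_result_taint(arg_taints: &[Taint]) -> Taint {
    arg_taints.iter().fold(0, |acc, t| acc | t)
}
```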
src/ssa/static_map.rs
#![allow(clippy::collapsible_if, clippy::redundant_closure)]
//! Static hash-map lookup abstract analysis.
//!
//! Recognises the idiom
//! ```ignore
//! let mut table = HashMap::new();
//! table.insert(K1, V1);
//! table.insert(K2, V2);
//! let cmd = table.get(k).copied().unwrap_or("safe");
//! ```
//! where every insert's *value* slot is a syntactic string literal and the
//! final lookup is unwrapped with a literal fallback (`.unwrap_or(LIT)`).
//! The result `cmd` is then provably bounded to the finite set
//! `{V1, V2, …, "safe"}`, whatever `k` holds and whether or not it is
//! tainted. Downstream sink suppression consumes this finite set to clear
//! SHELL/FILE/SQL injection findings whose payload is proved to be
//! metacharacter-free.
//!
//! ## SSA shape assumption
//!
//! The taint CFG collapses each method chain into **one** SSA `Call`
//! instruction whose `callee` text is the entire chain's "function" expression
//! (e.g. `"table.get(key).copied().unwrap_or"` for `table.get(key).copied()
//! .unwrap_or("safe")`) and whose `receiver` is the root identifier's SSA
//! value. We therefore do not need to walk SSA `.copied()` / `.unwrap_or`
//! instructions as separate hops — pattern-matching on the callee text is
//! the source of truth. String-literal arguments that the callee text
//! elides (e.g. the fallback `"safe"`) are read from the CFG node's
//! `arg_string_literals`, populated during CFG construction.
//!
//! Scope is deliberately narrow: only same-function static maps, only
//! literal-valued inserts, no escape beyond recognised mutate/read methods.
//! Any deviation (dynamic insert, callee not in the allow-list, map used as
//! a plain argument, map returned, map joined across a phi) invalidates the
//! candidate. Missed detection is safe — it just falls through to existing
//! behaviour.
use std::collections::{HashMap, HashSet};
use super::const_prop::ConstLattice;
use super::ir::*;
use crate::cfg::Cfg;
use crate::symbol::Lang;
/// Output of the static-map analysis: SSA values whose concrete string value
/// is provably in a finite set, plus the set itself (sorted + deduped).
#[derive(Clone, Debug, Default)]
pub struct StaticMapResult {
pub finite_string_values: HashMap<SsaValue, Vec<String>>,
}
impl StaticMapResult {
pub fn empty() -> Self {
Self::default()
}
pub fn is_empty(&self) -> bool {
self.finite_string_values.is_empty()
}
}
/// Returns `true` when `callee` constructs an empty Rust map:
/// `HashMap::new` or `BTreeMap::new`, optionally path-qualified.
fn is_rust_map_constructor(callee: &str) -> bool {
let leaf_after_colon = callee.rsplit("::").next().unwrap_or(callee);
if leaf_after_colon != "new" {
return false;
}
let type_part = callee.rsplit("::").nth(1).unwrap_or("");
matches!(type_part, "HashMap" | "BTreeMap")
}
/// Classification of a Call whose receiver is a candidate map.
#[derive(Clone, Debug, PartialEq, Eq)]
enum MapUse {
/// `{var}.insert(K, V)` — value contributes to the finite domain.
Insert,
/// `{var}.get(K)[.copied()|.cloned()|.as_deref()|.as_ref()]*.unwrap_or`
/// — lookup result is bounded by the inserted values plus the fallback
/// literal on the CFG node.
StaticLookup,
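/// Whitelisted method that leaks no reference and adds no values
/// (`contains_key`, `len`, `is_empty`; `clear` is included because
/// emptying the map can only shrink the lookup domain).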
ReadOnly,
/// Anything else — invalidates the map candidate.
Escape,
}
/// Classify the callee of a Call whose `receiver` SSA value points to a
/// candidate map bound to `map_var`. Returns [`MapUse::Escape`] when the
/// callee doesn't match any recognised pattern so the caller invalidates
/// the map rather than trusting an unknown mutation.
fn classify_map_use(callee: &str, map_var: &str) -> MapUse {
// Fast-path: exact single-method calls on the receiver.
let method = callee
.strip_prefix(map_var)
.and_then(|rest| rest.strip_prefix('.'));
if let Some(method) = method {
// Single identifier method with no trailing chain.
match method {
"insert" => return MapUse::Insert,
"contains_key" | "len" | "is_empty" | "clear" => return MapUse::ReadOnly,
_ => {}
}
// Chained lookup: must start with `get(…)` and end with `.unwrap_or`.
if let Some(rest) = method.strip_prefix("get(") {
if let Some(after_args) = scan_past_balanced_parens(rest) {
if is_identity_chain_ending_in_unwrap_or(after_args) {
return MapUse::StaticLookup;
}
}
}
}
MapUse::Escape
}
/// Given `s` just after an opening `(`, return the slice after the matching
/// close `)`. Returns `None` when parens are unbalanced.
fn scan_past_balanced_parens(s: &str) -> Option<&str> {
let bytes = s.as_bytes();
let mut depth: i32 = 1;
let mut i = 0;
while i < bytes.len() {
match bytes[i] {
b'(' => depth += 1,
b')' => {
depth -= 1;
if depth == 0 {
return Some(&s[i + 1..]);
}
}
_ => {}
}
i += 1;
}
None
}
/// Return `true` when `s` is a sequence of zero or more identity chain
/// methods (`.copied()`, `.cloned()`, `.as_deref()`, `.as_ref()`) followed
/// by `.unwrap_or` (and nothing else). The trailing arg list of
/// `.unwrap_or` is elided in the callee text — it appears in the CFG node's
/// `arg_string_literals` instead.
fn is_identity_chain_ending_in_unwrap_or(mut s: &str) -> bool {
const IDENTS: &[&str] = &[".copied()", ".cloned()", ".as_deref()", ".as_ref()"];
loop {
if s == ".unwrap_or" {
return true;
}
let mut advanced = false;
for id in IDENTS {
if let Some(rest) = s.strip_prefix(id) {
s = rest;
advanced = true;
break;
}
}
if !advanced {
return false;
}
}
}
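/// Follow `aliases` from `v` to its root value. The 64-hop cap bounds the
/// walk in case the alias map ever contains a cycle.
fn resolve_alias(v: SsaValue, aliases: &HashMap<SsaValue, SsaValue>) -> SsaValue {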
let mut cur = v;
for _ in 0..64 {
match aliases.get(&cur) {
Some(&next) if next != cur => cur = next,
_ => break,
}
}
cur
}
/// Run the analysis. Bails out immediately for non-Rust bodies — the current
/// pattern set only models Rust `std::collections::HashMap`.
pub fn analyze(
body: &SsaBody,
cfg: &Cfg,
lang: Option<Lang>,
_const_values: &HashMap<SsaValue, ConstLattice>,
) -> StaticMapResult {
if lang != Some(Lang::Rust) {
return StaticMapResult::empty();
}
// ── 1. Discover candidate map allocations + their bound var name ──────
// The var_name is the identifier the CFG builder attaches to the define
// site of the let-binding. Without a var_name we can't pattern-match
// receiver uses in callee text, so such allocations are skipped.
let mut candidates: HashMap<SsaValue, String> = HashMap::new();
for block in &body.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
if let SsaOp::Call { callee, .. } = &inst.op {
if is_rust_map_constructor(callee) {
if let Some(name) = inst.var_name.as_deref() {
if !name.is_empty() {
candidates.insert(inst.value, name.to_string());
}
}
}
}
}
}
if candidates.is_empty() {
return StaticMapResult::empty();
}
// ── 2. Build trivial alias chain: single-operand Assign `v = w` where w is
// a known (or aliased) candidate value. This keeps the analysis robust to
// the wrapper copies that SSA lowering occasionally introduces.
let mut aliases: HashMap<SsaValue, SsaValue> = HashMap::new();
for block in &body.blocks {
for inst in &block.body {
if let SsaOp::Assign(uses) = &inst.op {
if uses.len() == 1 {
let src = resolve_alias(uses[0], &aliases);
if candidates.contains_key(&src) {
aliases.insert(inst.value, src);
}
}
}
}
}
let canonicalise = |v: SsaValue| -> Option<SsaValue> {
let c = resolve_alias(v, &aliases);
if candidates.contains_key(&c) {
Some(c)
} else {
None
}
};
// ── 3. Walk every instruction, classifying references to any candidate.
// Collect per-candidate inserted literal values and mark invalidating
// escapes (phi operand, non-whitelisted method, plain argument use,
// non-copy Assign, Return).
let mut inserted: HashMap<SsaValue, HashSet<String>> = HashMap::new();
let mut invalid: HashSet<SsaValue> = HashSet::new();
// Each lookup site: (map, result SSA value, fallback literal).
let mut lookups: Vec<(SsaValue, SsaValue, String)> = Vec::new();
for c in candidates.keys() {
inserted.insert(*c, HashSet::new());
}
for block in &body.blocks {
for inst in block.phis.iter().chain(block.body.iter()) {
match &inst.op {
SsaOp::Phi(operands) => {
for (_, v) in operands {
if let Some(canon) = canonicalise(*v) {
invalid.insert(canon);
}
}
}
SsaOp::Call {
callee,
args,
receiver,
} => {
if candidates.contains_key(&inst.value) && is_rust_map_constructor(callee) {
continue;
}
if let Some(map) = receiver.and_then(|r| canonicalise(r)) {
let map_var = candidates.get(&map).cloned().unwrap_or_default();
match classify_map_use(callee, &map_var) {
MapUse::Insert => {
let node_info = &cfg[inst.cfg_node];
let value_lit =
node_info.call.arg_string_literals.get(1).cloned().flatten();
match value_lit {
Some(lit) => {
inserted.entry(map).or_default().insert(lit);
}
None => {
invalid.insert(map);
}
}
}
MapUse::StaticLookup => {
let node_info = &cfg[inst.cfg_node];
if let Some(Some(fallback)) =
node_info.call.arg_string_literals.first().cloned()
{
lookups.push((map, inst.value, fallback));
}
// A non-literal fallback silently falls
// through: the map stays valid, we just
// don't emit a finite domain for this site.
}
MapUse::ReadOnly => {}
MapUse::Escape => {
invalid.insert(map);
}
}
}
for group in args {
for &v in group {
if let Some(canon) = canonicalise(v) {
invalid.insert(canon);
}
}
}
}
SsaOp::Assign(uses) if uses.len() != 1 => {
for &u in uses {
if let Some(canon) = canonicalise(u) {
invalid.insert(canon);
}
}
}
_ => {}
}
}
if let Terminator::Return(Some(v)) = &block.terminator {
if let Some(canon) = canonicalise(*v) {
invalid.insert(canon);
}
}
}
// ── 4. Emit results for still-valid candidates with at least one insert.
let mut result = StaticMapResult::default();
for (map, lookup_val, fallback) in lookups {
if invalid.contains(&map) {
continue;
}
let lits = match inserted.get(&map) {
Some(s) if !s.is_empty() => s,
_ => continue,
};
let mut domain: Vec<String> = lits.iter().cloned().collect();
domain.push(fallback);
domain.sort();
domain.dedup();
result.finite_string_values.insert(lookup_val, domain);
}
result
}
// ── Tests ───────────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn rust_map_constructor_matches() {
assert!(is_rust_map_constructor("HashMap::new"));
assert!(is_rust_map_constructor("std::collections::HashMap::new"));
assert!(is_rust_map_constructor("BTreeMap::new"));
assert!(!is_rust_map_constructor("HashMap::from"));
assert!(!is_rust_map_constructor("HashMap::with_capacity"));
assert!(!is_rust_map_constructor("Vec::new"));
}
#[test]
fn classify_insert_call() {
assert_eq!(classify_map_use("table.insert", "table"), MapUse::Insert);
}
#[test]
fn classify_read_only_call() {
assert_eq!(
classify_map_use("table.contains_key", "table"),
MapUse::ReadOnly
);
assert_eq!(classify_map_use("table.len", "table"), MapUse::ReadOnly);
// Iterator-returning methods (values/iter/keys) escape: they leak
// references that can flow anywhere.
assert_eq!(classify_map_use("table.values", "table"), MapUse::Escape);
assert_eq!(classify_map_use("table.iter", "table"), MapUse::Escape);
}
#[test]
fn classify_static_lookup_with_copied() {
assert_eq!(
classify_map_use("table.get(key.as_str()).copied().unwrap_or", "table"),
MapUse::StaticLookup
);
}
#[test]
fn classify_static_lookup_without_identity_chain() {
// `.unwrap_or` directly after `.get(...)` also qualifies — Rust
// `HashMap::get` returns `Option<&V>`, so `.unwrap_or(&"safe")` is
// syntactically valid and equally bounded.
assert_eq!(
classify_map_use("table.get(k).unwrap_or", "table"),
MapUse::StaticLookup
);
}
#[test]
fn classify_static_lookup_mixed_identity_chain() {
assert_eq!(
classify_map_use("t.get(k).as_deref().cloned().unwrap_or", "t"),
MapUse::StaticLookup
);
}
#[test]
fn classify_rejects_unknown_terminator() {
// `.unwrap_or_else(|| …)` is not modelled — closure can return anything.
assert_eq!(
classify_map_use("t.get(k).copied().unwrap_or_else", "t"),
MapUse::Escape
);
// A bare `.unwrap()` after `.get(k)` panics rather than bounding,
// so we refuse to treat it as safe. The caller would need a proven
// `.contains_key` guard; that is out of scope here.
assert_eq!(classify_map_use("t.get(k).unwrap", "t"), MapUse::Escape);
}
#[test]
fn classify_rejects_other_receiver() {
// `other.insert` does not belong to `table` — receiver mismatch.
assert_eq!(classify_map_use("other.insert", "table"), MapUse::Escape);
}
#[test]
fn scan_past_balanced_parens_basic() {
assert_eq!(scan_past_balanced_parens("foo)").unwrap_or(""), "");
assert_eq!(scan_past_balanced_parens("foo).bar").unwrap_or(""), ".bar");
assert_eq!(
scan_past_balanced_parens("foo(bar)baz).x").unwrap_or(""),
".x"
);
assert!(scan_past_balanced_parens("no-close").is_none());
}
#[test]
fn non_rust_lang_returns_empty() {
use petgraph::Graph;
let body = SsaBody {
blocks: vec![],
entry: BlockId(0),
value_defs: vec![],
cfg_node_map: std::collections::HashMap::new(),
exception_edges: vec![],
};
let cfg: Cfg = Graph::new();
let const_values = HashMap::new();
let result = analyze(&body, &cfg, Some(Lang::Java), &const_values);
assert!(result.is_empty());
}
}
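Step 4 of `analyze` and the module docs describe the consumer side: each lookup's domain is the inserted literals plus the fallback, sorted and deduplicated, and sink suppression only applies when every value is metacharacter-free. A sketch of both halves follows; `finite_domain` mirrors step 4, while `shell_safe` and its metacharacter list are hypothetical illustrations, not the project's actual sink check.

```rust
use std::collections::HashSet;

/// Domain for one lookup site: inserted literals plus the `.unwrap_or`
/// fallback, sorted and deduplicated. `None` when the map had no literal
/// inserts (mirrors the `continue` in step 4).
fn finite_domain(inserted: &HashSet<String>, fallback: &str) -> Option<Vec<String>> {
    if inserted.is_empty() {
        return None;
    }
    let mut domain: Vec<String> = inserted.iter().cloned().collect();
    domain.push(fallback.to_string());
    domain.sort();
    domain.dedup();
    Some(domain)
}

/// Hypothetical shell-sink check: safe only if no value contains one of
/// these metacharacters (illustrative, not exhaustive).
fn shell_safe(domain: &[String]) -> bool {
    const META: &[char] = &[';', '|', '&', '$', '`', '>', '<', '\n'];
    domain.iter().all(|v| !v.contains(META))
}
```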

src/ssa/type_facts.rs

File diff suppressed because it is too large.