mirror of
https://github.com/elicpeter/nyx.git
synced 2026-06-15 20:05:13 +02:00
* feat: Introduce function-scoped variable interning for state analysis with new tests and fixtures * feat: Add Phase 26 symbolic execution enhancements with bitwise operator support, abstract interpretation refinements, and new taint analysis tests * feat: Refine state analysis to handle factory-pattern resource returns with mixed-path tests and leak detection enhancements * feat: Add Phase 27 debug views with symbolic execution, abstract interpretation, SSA, and call graph viewers; integrate with debug layout and styles * feat: Add Phase 31 type-qualified symbolic resolution with receiver-based callee disambiguation and testing * feat: Extend symbolic execution with state iteration, enhanced debug views, and debounced input handling * feat: Add Phase 13 resource and auth pattern extensions with new tests and fixtures * feat: Introduce CFG debug graph renderer with compact mode, toolbar, and DAG layout integration * feat: Add Phase 28 encoding and decoding transform modeling with structural symex enhancements and new taint analysis tests * feat: Extend abstract interpretation with type facts and constant value tracking in debug views and server logic * feat: Add linear path handling and witness extraction to symbolic execution with Phase 28 transform mismatch detection * feat: Refine Go auth and sanitizer handling with enhanced rules, state updates, and benchmark improvements * feat: Enable auth-state analysis by default and update relevant tests in benchmark config * test: Update state_tests to reflect default enablement of auth-state analysis and add auth suppression test * docs: update CHANGELOG.md * feat: Introduce per-index taint tracking in `HeapState` with `HeapSlot`, overflow handling, and revised SSA transfers * feat: Introduce C/C++ language labels and refine heap state tracking in SSA transfers * feat: Implement per-index array slot tracking in symbolic heap with overflow collapse * feat: Add implicit definition handling for uninitialized declarations in SSA value allocation * feat: Refactor function parameters and constants for improved clarity and maintainability * refactor: Reorder module imports and improve formatting for consistency * refactor: Fix formatting erorrs * refactor: Fix clippy warnings * refactor: Fix fmt warnings (again) * chore: Update dependencies and improve feature configuration * Add comprehensive tests for undertested modules (#36) (COPILOT) * Add comprehensive tests for undertested modules Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083 * Add comprehensive tests for ext, project, walk, and errors modules Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083 --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * chore: Update dependencies and improve feature configuration * fix: formatting errors in new tests * chore: Update license list in about.toml * chore: made functions input inline * chore: updated cfg graph to take up the full page * chore: add Prettier configuration and update code formatting * Add frontend test suite with Vitest (111 tests) (#37) * Add Vitest test suite for frontend - 111 tests across utils, components, hooks, and graph utilities Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/7cf0dba2-ecff-4740-ba4d-92717e74a0b7 * ci: add frontend test step to CI workflow Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/5bc0ac9f-0a32-4d03-9cb7-7a15aea53fca --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * chore: simplify array initialization in test files for consistency * ran typecheck * feat: add AnalysisWorkspace component and integrate it into CfgViewerPage * feat: update routing in AppLayout and improve empty state message in ExplorerPage * feat: enhance scan progress tracking with additional metrics and stages * feat: update license information and add license check script * feat: implement cross-file symbolic execution with callee body persistence * feat: replace dagre graphs with Graphology + ELK + Sigma for more advanced call stack and cfg rendering * feat: ensure CFG function view is scoped to the selected function, preventing bleed into sibling functions * feat: enhance resource tracking with proxy method summaries and improve finding extraction * feat: add terminal function exit detection for accurate resource leak analysis * feat: add warnings for loops and functions without bodies to improve error recovery * feat: update lambda expression handling to ensure proper function classification and control flow * feat: remove bounded formatting/string ops and add JSON.parse sanitizer for improved data handling * feat: add inline return taint analysis and regression tests for improved security checks * feat: add engine version management and migration handling for database schema updates * feat: enhance first_call_ident to skip nested function bodies and add regression tests * feat: enhance callee name resolution with two-segment normalization and disambiguation * feat: add cross-file context flags and debug assertions for taint analysis * feat: refactor taint analysis structure to unify context handling and improve clarity * feat: enhance dead code elimination to preserve Sink, Source, and Sanitizer labels with new tests * docs: updated CHANGELOG.md * fmt: formatting fixes * fix: fixed frontend formatting and lint warnings * fix: optimized ci * fix: optimized ci * Add comprehensive multi-file test coverage to Nyx (#38) * Initial checklist for multi-file test suite expansion Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * Add 12 new multi-file test fixtures with TP/TN/near-miss coverage Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * deleted root repo * rebuilt to test for regressions --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Co-authored-by: elipeter <elicpeter@gmail.com> * feat: enhance import alias resolution and taint tracking * feat: implement security hardening with CSRF protection and path validation * feat: add support for import alias bindings in Python, PHP, and Rust * feat: enhance CFG analysis modes and improve code readability * feat: add detection for parameterized SQL queries to enhance security * feat: add safe internal redirect handling and enhance session destroy validation * feat: implement security improvements by addressing vulnerabilities in execAsync, session management, and file downloads * feat: enhance taint detection by adding support for inline source member expressions in call arguments * feat: implement pre-emission of Source nodes for inline source member expressions in call arguments * feat: add support for Throw statement in control flow and error handling * feat: add debug and echo endpoints with potential information leakage * feat: implement internal redirect suppression and enhance taint detection * feat: implement module alias tracking for dynamic dispatch in JS/TS * feat: add authorization analysis module with Express support * feat: add authorization analysis module with Express support * feat: add tests for admin guard requirements and clean checks in authorization analysis * feat: integrate Koa and Fastify frameworks into authorization analysis * feat: add Flask and Django support to authorization analysis module * feat: add support for Rails and Sinatra frameworks in authorization analysis * feat: add support for Axum, ActixWeb, and Rocket frameworks in authorization analysis * feat: add support for ActixWeb, Axum, and Rocket frameworks in authorization analysis * feat: add support for Rails and Sinatra in authorization analysis * chore: add .DS_Store to .gitignore * refactor: simplify conditional checks and improve readability in multiple files * refactor: update usage of Option methods for improved clarity and consistency * refactor: improve code readability by simplifying conditional checks and formatting * refactor: improve code formatting and readability by simplifying conditional checks * refactor: simplify conditional checks and improve readability in multiple files * refactor: simplify conditional checks in axum.rs for improved readability * feat: add CodeQL analysis configuration for enhanced security scanning * test: add comprehensive tests for `src/output.rs` SARIF builder (#39) * chore: start test coverage improvement work Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * test: add comprehensive tests for src/output.rs SARIF builder Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423 Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> * refactor: improve code formatting and readability in output.rs --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com> Co-authored-by: elipeter <elicpeter@gmail.com> * refactor: improve code formatting and readability in output.rs * Potential fix for code scanning alert no. 210: Uncontrolled data used in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * Potential fix for code scanning alert no. 211: Uncontrolled data used in path expression Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * refactor: enhance triage file path handling with improved error management and validation * refactor: updated func summaries for richer detail * refactor: update SSA summary extraction to use canonical FuncKey for distinct entries * refactor: enhance callee metadata structure to support arity, receiver, and qualifier for better overload resolution * refactor: add support for keyword arguments in function calls and enhance receiver extraction for method-style calls * refactor: implement new Flask routes for safe and unsafe shell command execution * refactor: separate receiver handling in SSA operations and enhance taint propagation * refactor: improve arity handling by using arg_uses for positional argument count and enhance witness scoring for tainted arguments * refactor: implement auth decorator extraction and classification for multiple languages * refactor: enhance Rust module path resolution and use map handling for cross-file disambiguation * refactor: introduce CalleeQuery struct for structured callee resolution and enhance resolver logic * refactor: implement same-file identity collision handling for `runTask` to ensure correct resolver behavior * refactor: standardize default struct initialization across multiple files * feat: add scripts for formatting checks and auto-fixes with test summaries * refactor: simplify character splitting and enhance namespace qualifier handling * refactor: improve documentation clarity and enhance code readability in resolver logic * refactor: replace default struct initialization with explicit field assignments for clarity * feat: enhance anonymous function naming by deriving context-based bindings * refactor: streamline match expressions for improved readability and performance * refactor: streamline match expressions for improved readability and performance * refactor: replace loop with while let for improved clarity and performance * feat: add SSA constant propagation support to analysis context for improved accuracy * feat: add SSA constant propagation support to analysis context for improved accuracy * feat: implement shell metacharacter validation and bounded-length checks in Rust analysis * feat: add static map analysis for command injection suppression and type safety * refactor: simplify match statements and reduce line breaks for improved readability * feat(summary): phase 1/5 SinkSite data model for primary sink-location attribution Introduce SinkSite (file_rel, line, col, snippet, cap) carrying the primary sink source-location through function summaries. Swap SsaFuncSummary.param_to_sink and FuncSummary.param_to_sink from a coarse Cap map to a deduped SmallVec<[SinkSite; 1]> per parameter, with a backward-compatible cap_sites() helper and serde defaults so pre-phase-1 on-disk rows continue to deserialise cleanly. Extraction: SinkSiteLocator bundles the tree/bytes/file_rel needed by extract_ssa_func_summary; ParsedFile::extract_ssa_artifacts wires the locator in for the persisted pass-1 path, while pass-2 intra-file transient summaries fall back to cap-only sites (behavior unchanged). Merge: GlobalSummaries::insert now unions sink sites with (file_rel, line, col, cap) dedup via shared union_param_sink_sites helper. Database: JSON-serialised summary columns carry the new shape automatically; no schema change needed. Phase 2 will consume SinkSite in build_taint_diag() to overwrite the caller-site Finding.line with the callee's sink line when resolved via summary. Phase 1 keeps behavior unchanged: scanning tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs still produces the same (wrong) line 10 finding. Adds round-trip tests covering SinkSite solo, SsaFuncSummary with sink sites, legacy-JSON default handling for both summary types, and merge dedup. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(taint): phase 2/5 thread SinkSite into SsaTaintEvent and Finding Plumb Phase 1's SinkSite through the event pipeline into Findings, no output change yet. SsaTaintEvent gains `primary_sink_site: Option<SinkSite>`; when the main or callback sink-emission path has non-empty `param_to_sink_sites`, filter to sites whose `(line != 0) && (cap ∩ sink_caps != ∅)` and emit one event per distinct site — the multi-primary collapse keeps each downstream Finding single-primary. Resolution: ResolvedSummary and SinkInfo gain mirror `param_to_sink_sites` fields, populated from `SsaFuncSummary.param_to_sink` (SSA + callback paths) and `FuncSummary.param_to_sink` (global paths). Label, local-summary, and interop resolution paths leave the field empty — they only ever had cap-level info to begin with. Finding: new `primary_location: Option<SinkLocation>` with `file_rel/line/col`. `ssa_events_to_findings` maps `event.primary_sink_site` → `Finding.primary_location`, filtering cap-only sites (`line == 0`) to `None` so the (0,0) sentinel never leaks to formatters. Dedup key extended with the primary location so multi-site events aren't collapsed back together. Invariants (debug_assert!): * every SinkSite reaching emission has `line != 0 && cap ∩ sink_caps != ∅` — enforced by the pick_primary_sink_sites* filters; * every populated Finding.primary_location has `line != 0` AND non-empty `file_rel` — the cap-only → None translation upstream guarantees this. Deliberately independent of `uses_summary`: that flag tracks whether the *taint chain* used a summary, whereas primary attribution requires only that the *sink* itself was summary-resolved. A local source reaching a cross-file sink produces `uses_summary=false` alongside a populated primary_location — documented on Finding.primary_location, covered by `cross_file_sink_finding_carries_primary_location`. build_taint_diag, SARIF/JSON/explanation formatters, and the benchmark scorer remain untouched: finding.line still comes from `cfg_graph[finding.sink]`, so cmdi_indirect.rs still reports line 10 and the benchmark's rs-cmdi-003 row still shows FN in the LOC column. Tests: `cross_file_sink_finding_carries_primary_location` (proves plumbing via a synthetic FuncSummary carrying a SinkSite at 42:5) and `cross_file_sink_cap_only_site_leaves_primary_location_none` (regression guard against cap-only sites surfacing). All 1566 lib tests + integration tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(output): phase 3/5 consume primary sink location in diag + SARIF When a finding's primary_location (populated in phase 2 from a callee summary's SinkSite) names the dangerous instruction inside a callee body, attribute the diagnostic line to that location instead of the caller's call site. The call site is demoted to a Call step in flow_steps, and a synthetic Sink step at the primary location is appended so analysts still see the full trace. Changes: - Add scan_root parameter to build_taint_diag so file_rel can be resolved back to an absolute path via a shared resolve_file_rel helper. Empty file_rel (single-file scans where namespace == "") resolves to the file under analysis. - Extend SinkLocation with snippet, carried from the upstream SinkSite so the formatter needs no second file read. - Relax the ssa_events_to_findings debug_assert to allow empty file_rel, which is valid when scan root equals the file itself. - SARIF: emit data-flow as codeFlows[0].threadFlows[0].locations[]; locations[0] already reflects the primary sink position via the updated diag line/col. Acceptance: scan on tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs now reports line 5 (Command::new) as the primary sink, with the call site at line 10 visible in flow_steps. Two expect.json fixtures updated (must_match line_range widened): - javascript/taint/context_sensitive_call: 12-14 -> 7-14 (line 8 is the real sink inside run()). - rust/cfg/closure_async: 10-10 -> 10-11 (line 11 is Command::new inside the closure). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(bench): phase 4/5 validate primary sink attribution across corpus Extend the benchmark scorer and ground truth to lock in phase 3's primary-location behavior, and add fixtures that exercise the new capability end-to-end. Scorer (tests/benchmark_test.rs): - Add optional `expected_call_site_lines: Option<Vec<[usize; 2]>>` on Case. When present, score_location_level additionally requires at least one flow_step in the finding's evidence trace to fall within ±2 of the call-site range. When absent, the check is skipped — fully forward-compatible with existing fixtures. - Retain ±2 tolerance on expected_sink_lines (compared against the now-primary Diag.line post-phase-3). Ground truth edits: - rs-cmdi-cross-001: expected_sink_lines [8,8] -> [9,9]. Line 8 is the transform::wrap call site (a cross-file propagator, not a sink); line 9 is Command::new, the real sink. The ±2 tolerance happened to mask this stale attribution but it was semantically wrong — phase 4 is the right time to correct it. Also adds expected_call_site_lines [8,8] so the new field is exercised on an existing cross-file case. - rs-cmdi-003: adds expected_call_site_lines [10,10] (run_cmd call). This fixture's sink (Command::new inside run_cmd at line 5) was the motivating case for phases 1-3; adding the call-site assertion guards against regression to caller-line attribution. New fixtures: - rust/cmdi/cmdi_indirect_multisink.rs (rs-cmdi-009): helper run_both takes two tainted params and invokes two Command sinks on consecutive lines. Locks in that primary line lands inside the helper (lines 5-6), not at the caller (line 12). Notes document that SinkSite is currently one-per-callee so both findings today collapse onto the first sink; expected_sink_lines=[5,6] and expected_call_site_lines=[12,12] stay valid either way. - python/cmdi/cross_indirect_sink/{app.py,helper.py} (py-cmdi-cross- 004): sink os.system lives in helper.py (cross-file), caller in app.py reads env source and calls run_cmd. Verifies phase 3's cross-file primary attribution: Diag.path = helper.py, Diag.line = 5, with app.py:7 recorded in flow_steps as a Call step. Acceptance: - `cargo test --test benchmark_test -- --ignored --nocapture` passes. - rs-cmdi-003 is TP/TP/TP (the target flip FN->TP at LOC). All pre-existing TP/TP/TP fixtures remain TP/TP/TP; 2 new fixtures are TP/TP/TP. - Aggregate rule-level: TP=158 FP=10 FN=1 TN=97, P=0.940 R=0.994 F1=0.966 on the 266-case corpus (was TP=156 FP=10 FN=1 TN=97 on 264 pre-phase-4, delta is the +2 new cases both resolving TP). - Full `cargo test` green (1566 lib tests + all integration tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(taint): phase 5/5 lock Finding.primary_location contract via regression test Add a regression test in src/taint/ssa_transfer.rs that wires up a synthetic SsaFuncSummary with a SinkSite at other.rs:42:10 and drives the three emission stages (pick_primary_sink_sites → emit_ssa_taint_events → ssa_events_to_findings) against a minimal caller SSA body. Asserts the resulting Finding.primary_location is exactly that triple. The existing integration tests in src/taint/tests.rs cover the coarse FuncSummary path end-to-end through analyse_file. This test locks in the lower-level SSA-side plumbing so a future refactor that silently drops the site between pick → emit → findings fails here rather than only at the benchmark layer. Also refreshes tests/benchmark/results/latest.json (timestamp only; rs-cmdi-003 remains TP/TP/TP and the aggregate P/R/F1 are unchanged from phase 4). Closes the primary sink-location attribution feature (phases 1-5/5): * Phase 1 — SinkSite data model on summaries. * Phase 2 — SinkSite threaded into SsaTaintEvent and Finding. * Phase 3 — diag + SARIF consume primary_location. * Phase 4 — benchmark validates primary_call_site_lines across corpus. * Phase 5 — regression test locks the event→finding contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: clean up formatting and improve readability in multiple files * refactor: simplify type definition for deduplication key in findings * test(harness): add must_not_match expectation for FP regression guards Extends ExpectedFinding with must_not_match field that asserts a diagnostic must NOT fire — presence is a hard failure. Non-consuming scan so it coexists with must_match entries on the same rule_id. Adds forbidden_violations accumulator and updates summary line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(regression): update expectations to ensure must_not_match for various taint and resource leak rules * feat: implement auto-seeding for JS/TS handler parameters to enhance taint tracking * feat: update switch statement handling to improve control flow analysis * feat: implement promisify alias handling for JS/TS to enhance taint tracking * feat: enhance taint tracking by refining expectation handling and adding mode filtering * feat: refine SQL handling in stream processing and enhance auto-seeding for handler parameters * feat: update taint tracking rules to enforce full mode matching and improve flow analysis * feat: enhance Ruby subshell handling to improve taint tracking and flow analysis * feat: update xss_response expectations to refine taint flow analysis and enhance regression guarding * feat: refine framework detection and update expectation handling for Echo and Sinatra * feat: implement max_count for taint tracking expectations and deduplicate findings * feat: add strict_unexpected handling for taint-unsanitised-flow in expectation files * feat: enhance deduplication of taint-unsanitised-flow findings by collapsing based on line and severity * feat: add strict_unexpected handling for taint-unsanitised-flow in multiple expectation files * feat: add structural invariant checks for SSA bodies * feat: ensure deterministic phi emission order using BTreeSet * feat: enhance handling of terminators to ensure authoritative flow through successor edges * feat: enhance Goto terminator handling to ensure all successors are marked executable * feat: refactor code for improved readability and organization * feat: simplify predicate checks and enhance readability in SSA handling * feat: implement per-file parse timeout and enhance file size handling * feat: migrate analysis engine toggles from environment variables to configuration file * feat: remove unnecessary whitespace in hostile_input_tests.rs * feat: remove unnecessary whitespace in hostile_input_tests.rs * feat: update dependencies and enhance documentation on language maturity * feat: enhance security headers and improve request body limits * feat: implement sink capability bits for deduplication and enhance evidence tagging * feat: implement dynamic activation handling for gated sinks and enhance validation logic * feat: enhance configuration documentation and clarify inline analysis cache behavior * feat: implement panic recovery during analysis to continue scans past errors * feat: add expectations configuration for taint analysis and performance metrics * feat: enhance error handling and logging during file reading and mutex locking * feat: add cross-file body loading tests and plumbing for CF-1 phase * feat: implement cross-file k=1 context-sensitive inline taint analysis with new tests and fixtures * feat: implement indexed-scan parity in cross-file inline analysis with new dropdown and copy functionality * feat: enhance classification span handling in CFG and AST for improved source attribution * feat: add new Express routes for handling user input and telemetry data * feat: implement ternary expression handling in CFG with diamond structure for JS/TS * feat: implement Phase CF-3 abstract-domain transfer channels in summaries * feat: add support for string-prefix transfer in cross-file calls and update tests * docs: reduce RESULTS.md doc size * feat: implement Phase CF-4 per-return-path summary decomposition with tests * feat: update parameter handling in pass1 and refactor SsaFuncSummary initialization * feat: implement Phase CF-5 for cross-file SCC joint fixed-point convergence with new flags and tests * feat: implement Phase CF-6 with parameter-granularity points-to summaries and associated tests * refactor: update comments and documentation for clarity and consistency * style: format code for consistency and readability * refactor: simplify verdict handling and improve edge checking logic * refactor: optimize path and identifier collection by avoiding unnecessary cloning * chore: update Cargo.toml for Rust version 1.85 and add ignored files; modify CHANGELOG and README for clarity on state analysis defaults * refactor: update documentation and improve clarity in configuration files * refactor: update documentation and improve clarity in configuration files * feat: add JS/TS pass-2 convergence tests and expectations configuration * feat: add Phase 5 regression tests for inline cache origin attribution and update related logic * feat: implement Phase 7 deduplication and alternative path linking for taint findings * feat: implement structural DFS index for anonymous functions and update naming conventions * feat: add Phase 8 regression tests for container-element taint in JS and Python * feat: add engine-depth profiles and explain-engine option for CLI * feat: update expectations and add new README fixtures for multi-file scan regression * feat: implement Phase 11 callback-alias and factory patterns with regression tests * feat: implement Terminator::Switch for multi-way dispatch and add regression tests * feat: add real-CVE benchmark fixtures for CVE-2023-48022, CVE-2019-14939, and CVE-2023-26159 with corresponding patched variants * refactor: extract cfg and ssa_transfer to submodules * refactor: cargo fmt * refactor: remove unnecessary blank line in cfg_tests.rs * refactor: remove unnecessary planning file * chore: update Rust version to 1.88 and bump dependencies in Cargo files * feat: enhance triage UI with new layout and controls, update README for clarity * feat: enhance triage UI with new layout and controls, update README for clarity * chore: remove outdated section from README for version 0.5.0 * docs: improve clarity and consistency in README content * chore: add "GPL-3.0-or-later" to license options in about.toml * chore: update license handling in about.toml and check-licenses.mjs * style: format code for improved readability in TriagePage component * style: format code for improved readability in TriagePage component * chore: enhance license handling and improve body_id scoping in seed lookup * feat: introduce owner and parent body IDs for enhanced seed scoping * feat: implement direction-aware engine provenance with new CLI flag for strict CI gating * feat: add Undef SSA operation for improved control-flow handling * style: improve code formatting for consistency and readability in multiple files * feat: add 16-function chain SCC across multiple files for enhanced analysis * style: simplify code formatting for improved readability in multiple files * fix: update CapHitReason default implementation and improve README clarity * docs: enhance README with detailed explanations of taint analysis and limitations * docs: refine README for clarity and consistency in taint analysis section * style: improve code formatting for better readability in NewScanModal and scans * fix: update cargo-about command to use --offline for deterministic license generation * fix: update cargo-about command to use --offline for deterministic license generation * ci: add step to prime cargo registry cache for deterministic license generation * feat: add support for non-sink collections in authorization analysis * feat: enhance authorization checks with row-level ownership equality and binding tracking * feat: implement self-scoped user handling and enhance ownership checks * refactor: simplify assertions and formatting in authorization analysis tests * fix: normalize line endings in THIRDPARTY-LICENSES.html generation and update README with AI disclosure * docs: update AI disclosure section for clarity and conciseness * feat: add AI Contribution Policy and update contributing guidelines for AI assistance disclosure * feat: enhance authorization analysis with SSA-derived variable type classification * feat: implement auth_finding_to_diag function for enhanced security diagnostics * feat: add args_value_refs to CallSite struct for enhanced argument tracking * feat: add args_value_refs to CallSite struct for enhanced argument tracking * feat: add direction-aware engine provenance with LossDirection classification and new CLI flag * feat: simplify strip_cap_from_call_args call by removing unnecessary line breaks * feat: enhance error message handling in cli_validation_tests for better Windows compatibility * feat: optimize release profile settings in Cargo.toml and update CodeQL configuration * feat: enhance release build process with SBOM generation and SLSA provenance * feat: update actions/checkout and actions/setup-node to v6, enhance CLI options, and improve auth-check summaries * feat: introduce PathFact handling for path safety checks and rejection logic * feat: introduce PathFact handling for path safety checks and rejection logic * feat: update benchmark data and enhance path sanitization logic with new safety checks * feat: document AI assistance in frontend UI development and human review process * feat: add return path facts for enhanced path safety checks and update documentation * chore: update release date for version 0.5.0 in CHANGELOG.md * chore: clean up ci.yml by removing outdated comments and clarifying steps * feat: implement cross-language path sanitizers and validators for enhanced security * feat: enhance SSA value usage tracking by including block terminators and improve path safety checks * feat: enhance switch statement handling by adding per-case path constraints and support for exclusive cases * refactor: simplify conditional formatting and improve code readability in executor and lower modules * feat: add vulnerable examples for various languages demonstrating authentication and sanitization issues * feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers * feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers * feat: add transform classifiers for Java, Go, and Ruby with corresponding tests * refactor: clarify comments on reassign-to-constant idiom and sink behavior in guards.rs --------- Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1555 lines
64 KiB
Rust
1555 lines
64 KiB
Rust
pub mod points_to;
|
||
pub mod ssa_summary;
|
||
|
||
use crate::labels::Cap;
|
||
use crate::summary::ssa_summary::SsaFuncSummary;
|
||
use crate::symbol::{FuncKey, FuncKind, Lang, normalize_namespace};
|
||
use serde::{Deserialize, Deserializer, Serialize};
|
||
use smallvec::SmallVec;
|
||
use std::collections::{BTreeMap, HashMap};
|
||
use std::hash::{Hash, Hasher};
|
||
|
||
// ── Sink site (primary sink-location attribution) ───────────────────────
|
||
|
||
/// A single dangerous-instruction site recorded inside a function's body.
|
||
///
|
||
/// `SinkSite` pairs a [`Cap`] (the bits this particular site consumes) with
|
||
/// the file-relative source location of the instruction that consumes them.
|
||
/// Carrying this alongside a summary's `param_to_sink` map lets cross-file
|
||
/// findings attribute the finding line to the actual dangerous call inside
|
||
/// the callee, rather than to the caller's call-site (which is all a
|
||
/// bare `(param_idx, Cap)` pair could support).
|
||
///
|
||
/// Primary sink-location attribution stores this data in the summary so
|
||
/// `build_taint_diag()` can consume it and overwrite the caller-site
|
||
/// `Finding.line` when the sink was resolved via summary.
|
||
///
|
||
/// Fields
|
||
/// ──────
|
||
/// * `file_rel` — the callee file's path relative to the workspace root
|
||
/// being scanned. Matches the `FuncKey::namespace` convention so the
|
||
/// site's origin is addressable without additional workspace context.
|
||
/// * `line` / `col` — 1-based source coordinates of the sink instruction.
|
||
/// `0` indicates the extractor could not resolve coordinates (e.g. a
|
||
/// pass-2 transient summary without tree access).
|
||
/// * `snippet` — the trimmed source line, capped at 120 characters, empty
|
||
/// when coordinates could not be resolved.
|
||
/// * `cap` — the [`Cap`] bits this specific site consumes. A parameter's
|
||
/// total sink caps is the union across every site associated with it.
|
||
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq)]
|
||
pub struct SinkSite {
|
||
#[serde(default, skip_serializing_if = "String::is_empty")]
|
||
pub file_rel: String,
|
||
#[serde(default, skip_serializing_if = "is_zero_u32")]
|
||
pub line: u32,
|
||
#[serde(default, skip_serializing_if = "is_zero_u32")]
|
||
pub col: u32,
|
||
#[serde(default, skip_serializing_if = "String::is_empty")]
|
||
pub snippet: String,
|
||
pub cap: Cap,
|
||
}
|
||
|
||
impl SinkSite {
|
||
/// Dedup key comparing the full identity of a site. Two sites with the
|
||
/// same `(file_rel, line, col, cap)` describe the same consumption of
|
||
/// the same bits at the same source location and should collapse when
|
||
/// summaries are merged.
|
||
pub(crate) fn dedup_key(&self) -> (&str, u32, u32, u16) {
|
||
(self.file_rel.as_str(), self.line, self.col, self.cap.bits())
|
||
}
|
||
|
||
/// Build a site that only carries a [`Cap`] — no resolved source
|
||
/// coordinates. Used by extraction paths that have no tree/bytes
|
||
/// context (e.g. pass-2 transient summaries), so downstream consumers
|
||
/// unioning caps across sites still see the correct bits even when
|
||
/// primary-location attribution is not available.
|
||
pub fn cap_only(cap: Cap) -> Self {
|
||
Self {
|
||
file_rel: String::new(),
|
||
line: 0,
|
||
col: 0,
|
||
snippet: String::new(),
|
||
cap,
|
||
}
|
||
}
|
||
}
|
||
|
||
/// Tree/bytes context for resolving a CFG span to a [`SinkSite`].
|
||
///
|
||
/// Summary extraction runs deep inside the taint engine, far from the
|
||
/// `ParsedFile` that owns the tree; `SinkSiteLocator` is the narrow
|
||
/// reference bundle the extractor needs to populate `SinkSite.line`,
|
||
/// `col`, and `snippet`. The struct is intentionally plain references
|
||
/// so construction is free and threading it as `Option<&Locator>` is
|
||
/// cheap.
|
||
pub struct SinkSiteLocator<'a> {
|
||
pub tree: &'a tree_sitter::Tree,
|
||
pub bytes: &'a [u8],
|
||
pub file_rel: &'a str,
|
||
}
|
||
|
||
impl<'a> SinkSiteLocator<'a> {
|
||
/// Resolve a `(start_byte, end_byte)` span to a [`SinkSite`] with the
|
||
/// given `cap`. Coordinates fall back to `(0, 0)` and the snippet to
|
||
/// empty when the byte offset is out of range (should not happen for
|
||
/// spans that came from the same tree).
|
||
pub fn site_for_span(&self, span: (usize, usize), cap: Cap) -> SinkSite {
|
||
let byte = span.0;
|
||
let point = self
|
||
.tree
|
||
.root_node()
|
||
.descendant_for_byte_range(byte, byte)
|
||
.map(|n| n.start_position())
|
||
.unwrap_or(tree_sitter::Point { row: 0, column: 0 });
|
||
let snippet = line_snippet(self.bytes, byte).unwrap_or_default();
|
||
SinkSite {
|
||
file_rel: self.file_rel.to_string(),
|
||
line: (point.row + 1) as u32,
|
||
col: (point.column + 1) as u32,
|
||
snippet,
|
||
cap,
|
||
}
|
||
}
|
||
}
|
||
|
||
/// Extract the source line containing `byte_offset`, trimmed and capped at
|
||
/// 120 chars. Returns `None` when the offset is out of range or the line
|
||
/// is entirely blank after trimming.
|
||
pub(crate) fn line_snippet(src: &[u8], byte_offset: usize) -> Option<String> {
|
||
if byte_offset >= src.len() {
|
||
return None;
|
||
}
|
||
let line_start = src[..byte_offset]
|
||
.iter()
|
||
.rposition(|&b| b == b'\n')
|
||
.map_or(0, |p| p + 1);
|
||
let line_end = src[byte_offset..]
|
||
.iter()
|
||
.position(|&b| b == b'\n')
|
||
.map_or(src.len(), |p| byte_offset + p);
|
||
let line = std::str::from_utf8(&src[line_start..line_end]).ok()?;
|
||
let trimmed = line.trim();
|
||
if trimmed.is_empty() {
|
||
return None;
|
||
}
|
||
if trimmed.len() > 120 {
|
||
Some(format!("{}...", &trimmed[..120]))
|
||
} else {
|
||
Some(trimmed.to_string())
|
||
}
|
||
}
|
||
|
||
/// Union two `SmallVec<[SinkSite; 1]>` lists with `(file_rel, line, col,
|
||
/// cap)` dedup. Preserves insertion order of `existing` then appends any
|
||
/// new sites from `incoming` not already present.
|
||
pub(crate) fn union_sink_sites(existing: &mut SmallVec<[SinkSite; 1]>, incoming: &[SinkSite]) {
|
||
for site in incoming {
|
||
let key = site.dedup_key();
|
||
if !existing.iter().any(|s| s.dedup_key() == key) {
|
||
existing.push(site.clone());
|
||
}
|
||
}
|
||
}
|
||
|
||
/// Union two `Vec<(usize, SmallVec<[SinkSite; 1]>)>` lists keyed by
|
||
/// parameter index. Each parameter keeps its own deduped site list.
|
||
pub(crate) fn union_param_sink_sites(
|
||
existing: &mut Vec<(usize, SmallVec<[SinkSite; 1]>)>,
|
||
incoming: &[(usize, SmallVec<[SinkSite; 1]>)],
|
||
) {
|
||
for (idx, sites) in incoming {
|
||
if let Some((_, ex)) = existing.iter_mut().find(|(i, _)| *i == *idx) {
|
||
union_sink_sites(ex, sites);
|
||
} else {
|
||
existing.push((*idx, sites.clone()));
|
||
}
|
||
}
|
||
}
|
||
|
||
/// Top bit of [`FuncKey::disambig`] reserved for synthetic discriminators
|
||
/// minted by [`GlobalSummaries`] when an identity collision is detected
|
||
/// between structurally incompatible summaries.
|
||
///
|
||
/// Real disambigs come from `tree_sitter::Node::start_byte` (see
|
||
/// `cfg.rs:fn_disambig`), which is a byte offset into the source file.
|
||
/// Source files in practice are far below 2 GiB, so bit 31 of a real
|
||
/// disambig is always zero — setting it marks a value as synthetic and
|
||
/// keeps it in a disjoint namespace from byte-offset disambigs.
|
||
const SYNTHETIC_DISAMBIG_BIT: u32 = 0x8000_0000;
|
||
|
||
// ── Callee site metadata ────────────────────────────────────────────────
|
||
|
||
/// Richer per-call-site metadata preserved in a function's summary.
|
||
///
|
||
/// Replaces the legacy `Vec<String>` callee list. Carries enough structure
|
||
/// to disambiguate same-name overloads and method calls at resolution time
|
||
/// without having to re-parse the raw callee string.
|
||
///
|
||
/// * `name` — the raw callee text as it appeared in source
|
||
/// (`"obj.method"`, `"env::var"`, `"helper"`). Preserved for diagnostics.
|
||
/// * `arity` — number of positional arguments at the call site. `None`
|
||
/// when splats / keyword-args / rest-params make the count unreliable.
|
||
/// * `receiver` — structured receiver identifier for method calls
|
||
/// (e.g. `"obj"` in `obj.method()`). Carries the root receiver for
|
||
/// chained calls; `None` for non-method or complex receivers.
|
||
/// * `qualifier` — the segment immediately before the leaf for non-method
|
||
/// qualified calls (e.g. `"env"` in `env::var`). Extracted once at CFG
|
||
/// time rather than re-parsed downstream.
|
||
/// * `ordinal` — the per-function call ordinal matching
|
||
/// `CallMeta.call_ordinal`, allowing cross-file consumers to address a
|
||
/// specific call site rather than just a callee name.
|
||
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq, Hash)]
|
||
pub struct CalleeSite {
|
||
pub name: String,
|
||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||
pub arity: Option<usize>,
|
||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||
pub receiver: Option<String>,
|
||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||
pub qualifier: Option<String>,
|
||
#[serde(default, skip_serializing_if = "is_zero_u32")]
|
||
pub ordinal: u32,
|
||
}
|
||
|
||
fn is_zero_u32(n: &u32) -> bool {
|
||
*n == 0
|
||
}
|
||
|
||
impl CalleeSite {
|
||
/// Construct a bare call-site reference from a name, with no other metadata.
|
||
pub fn bare(name: impl Into<String>) -> Self {
|
||
Self {
|
||
name: name.into(),
|
||
..Default::default()
|
||
}
|
||
}
|
||
}
|
||
|
||
impl From<String> for CalleeSite {
|
||
fn from(name: String) -> Self {
|
||
Self {
|
||
name,
|
||
..Default::default()
|
||
}
|
||
}
|
||
}
|
||
|
||
impl From<&str> for CalleeSite {
|
||
fn from(name: &str) -> Self {
|
||
Self {
|
||
name: name.to_string(),
|
||
..Default::default()
|
||
}
|
||
}
|
||
}
|
||
|
||
/// Deserialize a `Vec<CalleeSite>` while tolerating the legacy
|
||
/// on-disk form where callees were a plain array of strings.
|
||
///
|
||
/// Accepts:
|
||
/// * `[{"name": "foo", "arity": 1, ...}, ...]` ← current structured form
|
||
/// * `["foo", "bar", ...]` ← legacy string form
|
||
fn deserialize_callee_sites<'de, D>(de: D) -> Result<Vec<CalleeSite>, D::Error>
|
||
where
|
||
D: Deserializer<'de>,
|
||
{
|
||
#[derive(Deserialize)]
|
||
#[serde(untagged)]
|
||
enum Entry {
|
||
Structured(CalleeSite),
|
||
Bare(String),
|
||
}
|
||
|
||
let raw: Vec<Entry> = Vec::deserialize(de)?;
|
||
Ok(raw
|
||
.into_iter()
|
||
.map(|e| match e {
|
||
Entry::Structured(s) => s,
|
||
Entry::Bare(name) => CalleeSite::bare(name),
|
||
})
|
||
.collect())
|
||
}
|
||
|
||
/// Serialisable summary of a single function's taint behaviour.
|
||
///
|
||
/// One of these is produced per function during **pass 1** of a scan and
|
||
/// persisted to the `function_summaries` SQLite table. During **pass 2** the
|
||
/// full set of summaries across every file is loaded into memory so the taint
|
||
/// engine can resolve cross‑file calls.
|
||
///
|
||
/// Design notes
|
||
/// ────────────
|
||
/// * **All three cap fields are independent.** A function can simultaneously
|
||
/// act as a source (introduces fresh taint), a sanitizer (cleans certain
|
||
/// bits), and a sink (passes tainted data to a dangerous operation).
|
||
/// The old code picked a single `DataLabel` which lost information.
|
||
///
|
||
/// * **`propagating_params`** captures per‑argument pass‑through behaviour:
|
||
/// which parameter indices (0‑based) flow through to the return value.
|
||
/// This is essential for chains like `let y = transform(tainted_x); sink(y);`.
|
||
/// The legacy boolean `propagates_taint` is kept for deserialising old JSON.
|
||
///
|
||
/// * **`callees`** drive call‑graph construction in `callgraph.rs`, which
|
||
/// yields the topological order and SCC batches used between pass 1 and
|
||
/// pass 2 (see `scan::run_topo_batches` and `scc_file_batches_with_metadata`).
|
||
///
|
||
/// * **`tainted_sink_params`** marks which parameter *positions* flow to
|
||
/// internal sinks and is consumed by SSA callee resolution
|
||
/// (`ssa_transfer::mod.rs` `resolve_callee`) to build the per-parameter
|
||
/// `param_to_sink` list, so caller-side sink propagation fires on the
|
||
/// specific argument positions rather than the whole call.
|
||
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
|
||
pub struct FuncSummary {
|
||
/// Function name as it appears in the source (`my_func`, not the full path).
|
||
pub name: String,
|
||
|
||
/// Absolute path of the file that defines this function.
|
||
pub file_path: String,
|
||
|
||
/// Language slug (`"rust"`, `"javascript"`, …).
|
||
pub lang: String,
|
||
|
||
// ── Signature information ────────────────────────────────────────────
|
||
/// Total number of parameters (including `self`/`&self` for methods).
|
||
pub param_count: usize,
|
||
|
||
/// Parameter names in declaration order.
|
||
pub param_names: Vec<String>,
|
||
|
||
// ── Taint behaviour ──────────────────────────────────────────────────
|
||
// Stored as raw `u16` so serde doesn't need to know about `bitflags`.
|
||
/// Caps this function **introduces** — i.e. the return value carries
|
||
/// freshly‑tainted data even if no argument was tainted.
|
||
pub source_caps: u16,
|
||
|
||
/// Caps this function **cleans** — passing tainted data through this
|
||
/// function strips the corresponding bits.
|
||
pub sanitizer_caps: u16,
|
||
|
||
/// Caps this function **consumes unsafely** — calling it with tainted
|
||
/// arguments that still carry these bits is a finding.
|
||
pub sink_caps: u16,
|
||
|
||
/// Which parameter indices (0‑based) flow through to the return value.
|
||
#[serde(default)]
|
||
pub propagating_params: Vec<usize>,
|
||
|
||
/// Legacy field — kept only for deserialising old JSON from SQLite.
|
||
/// New code should use `propagating_params` instead.
|
||
#[serde(default, skip_serializing)]
|
||
pub propagates_taint: bool,
|
||
|
||
/// Indices of parameters that flow to internal sinks (0‑based).
|
||
pub tainted_sink_params: Vec<usize>,
|
||
|
||
/// Per-parameter [`SinkSite`] records — mirrors
|
||
/// [`SsaFuncSummary::param_to_sink`] so the coarse legacy summary also
|
||
/// carries primary sink-location attribution through the two-pass
|
||
/// architecture. Empty when the extractor lacked tree access.
|
||
#[serde(default, skip_serializing_if = "Vec::is_empty")]
|
||
pub param_to_sink: Vec<(usize, SmallVec<[SinkSite; 1]>)>,
|
||
|
||
/// Per-call-site metadata for every function/method/macro invoked
|
||
/// inside this body (`CalleeSite`). Carries arity, receiver,
|
||
/// qualifier, and call ordinal so downstream resolution does not have
|
||
/// to re-parse the raw callee string.
|
||
///
|
||
/// A custom deserializer tolerates legacy on-disk rows whose callees
|
||
/// field was a plain `Vec<String>`; those are lifted to
|
||
/// `CalleeSite { name, .. }` with no additional metadata.
|
||
#[serde(default, deserialize_with = "deserialize_callee_sites")]
|
||
pub callees: Vec<CalleeSite>,
|
||
|
||
// ── Identity discriminators ──────────────────────────────────────────
|
||
/// Enclosing container path (class / impl / module / outer function),
|
||
/// segments joined with `::`. Empty for free top-level functions.
|
||
#[serde(default)]
|
||
pub container: String,
|
||
|
||
/// Numeric discriminator for same-name siblings (closure byte offset,
|
||
/// nested-function occurrence index). `None` when no sibling collision.
|
||
#[serde(default)]
|
||
pub disambig: Option<u32>,
|
||
|
||
/// Structural role of this definition. Defaults to `Function` when
|
||
/// deserialising legacy JSON.
|
||
#[serde(default)]
|
||
pub kind: FuncKind,
|
||
|
||
// ── Rust-specific module-resolution metadata ────────────────────────
|
||
/// Crate-relative module path for this function's defining file
|
||
/// (e.g. `"auth::token"` for `src/auth/token.rs`). Only populated
|
||
/// when `lang == "rust"`. Used by the call graph to resolve
|
||
/// `use`-imported callees to their fully-qualified module.
|
||
///
|
||
/// `None` for non-Rust files and for Rust files outside a recognised
|
||
/// `src/` tree (tests, examples, build scripts).
|
||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||
pub module_path: Option<String>,
|
||
|
||
/// Per-file `use`-alias map for the defining Rust source.
|
||
///
|
||
/// Maps the local identifier introduced by a `use` declaration to its
|
||
/// fully qualified path (`"validate"` → `"crate::auth::token::validate"`).
|
||
/// Carried on every summary for the file even though it is per-file
|
||
/// information; the duplication keeps the persistence schema simple
|
||
/// and lets resolution operate purely off the caller's summary.
|
||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||
pub rust_use_map: Option<BTreeMap<String, String>>,
|
||
|
||
/// Fully qualified prefixes of any wildcard `use ...::*` imports in
|
||
/// the defining Rust source. Stored separately because they expand
|
||
/// the candidate space at resolution time rather than naming a single
|
||
/// alias.
|
||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||
pub rust_wildcards: Option<Vec<String>>,
|
||
}
|
||
|
||
// ── Cap conversion helpers ──────────────────────────────────────────────
|
||
|
||
impl FuncSummary {
|
||
#[inline]
|
||
pub fn source_caps(&self) -> Cap {
|
||
Cap::from_bits_truncate(self.source_caps)
|
||
}
|
||
|
||
#[inline]
|
||
pub fn sanitizer_caps(&self) -> Cap {
|
||
Cap::from_bits_truncate(self.sanitizer_caps)
|
||
}
|
||
|
||
#[inline]
|
||
pub fn sink_caps(&self) -> Cap {
|
||
Cap::from_bits_truncate(self.sink_caps)
|
||
}
|
||
|
||
/// Returns `true` when any parameter flows to the return value.
|
||
/// Also returns `true` for legacy summaries with `propagates_taint: true`
|
||
/// but empty `propagating_params` (backward compat).
|
||
pub fn propagates_any(&self) -> bool {
|
||
!self.propagating_params.is_empty() || self.propagates_taint
|
||
}
|
||
|
||
/// Build a [`FuncKey`] from this summary, normalizing the namespace
|
||
/// relative to `scan_root`.
|
||
pub fn func_key(&self, scan_root: Option<&str>) -> FuncKey {
|
||
FuncKey {
|
||
lang: Lang::from_slug(&self.lang).unwrap_or(Lang::Rust),
|
||
namespace: normalize_namespace(&self.file_path, scan_root),
|
||
container: self.container.clone(),
|
||
name: self.name.clone(),
|
||
arity: Some(self.param_count),
|
||
disambig: self.disambig,
|
||
kind: self.kind,
|
||
}
|
||
}
|
||
}
|
||
|
||
// ── Callee resolution ────────────────────────────────────────────────────
|
||
|
||
/// Result of resolving a bare callee name to a [`FuncKey`].
|
||
///
|
||
/// Three-valued: the call graph builder and taint engine need to distinguish
|
||
/// "no candidates at all" from "multiple candidates, can't pick one".
|
||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||
pub enum CalleeResolution {
|
||
/// Exactly one candidate matched.
|
||
Resolved(FuncKey),
|
||
/// No candidates found at all.
|
||
NotFound,
|
||
/// Multiple candidates — ambiguous, cannot pick one.
|
||
Ambiguous(Vec<FuncKey>),
|
||
}
|
||
|
||
/// Structured query describing a call site.
|
||
///
|
||
/// Carries every hint needed to pick the right callee *by qualified identity*
|
||
/// first and only fall back on bare-leaf lookup as a last resort. The old
|
||
/// entry points (`resolve_callee_key`, `resolve_callee_key_with_container`)
|
||
/// are now thin wrappers that build a `CalleeQuery` with partial information.
|
||
///
|
||
/// Hint categories, ordered from strongest to weakest:
|
||
///
|
||
/// * `receiver_type` — authoritative class/impl/module name (e.g. from
|
||
/// type inference or a `use ...` resolution). When set, the resolver
|
||
/// *requires* the callee's container to equal this name and refuses to
|
||
/// fall back to a leaf-name collision if the qualified lookup misses.
|
||
/// * `namespace_qualifier` — syntactic qualifier parsed from the callee
|
||
/// (e.g. `"env"` in `env::var`, `"http"` in `http.Get`). Treated as a
|
||
/// container hint but not authoritative: a miss falls through.
|
||
/// * `receiver_var` — syntactic receiver variable name (e.g. `"obj"` in
|
||
/// `obj.method()`). Soft hint, used only to tie-break ambiguity.
|
||
/// * `caller_container` — caller's own enclosing container, used to
|
||
/// resolve bare self-calls inside a class/impl body.
|
||
///
|
||
/// `arity` is a hard filter — when `Some`, every candidate whose arity
|
||
/// differs is excluded from consideration.
|
||
#[derive(Debug, Clone)]
|
||
pub struct CalleeQuery<'a> {
|
||
/// Leaf (unqualified) callee name, e.g. `"process"` for `OrderService::process`.
|
||
pub name: &'a str,
|
||
pub caller_lang: Lang,
|
||
/// Project-relative namespace (file path) of the caller. Used for
|
||
/// same-namespace disambiguation when qualified hints miss.
|
||
pub caller_namespace: &'a str,
|
||
/// The caller's own container (`FuncKey::container`), for resolving
|
||
/// bare `self`/intra-class calls without a receiver.
|
||
pub caller_container: Option<&'a str>,
|
||
/// Authoritative receiver class/impl name. Populated from type facts
|
||
/// (`TypeKind::label_prefix`) or from Rust use-map resolution.
|
||
pub receiver_type: Option<&'a str>,
|
||
/// Syntactic namespace qualifier (non-authoritative). For
|
||
/// `std::env::var` in Rust the caller passes `"env"`; for `http.Get`
|
||
/// in Go, `"http"`. Left `None` for purely bare calls.
|
||
pub namespace_qualifier: Option<&'a str>,
|
||
/// Syntactic receiver variable name. Used only as a tie-breaker — a
|
||
/// variable name is a weak proxy for a class name.
|
||
pub receiver_var: Option<&'a str>,
|
||
/// Positional-argument count at the call site. Hard filter when set.
|
||
pub arity: Option<usize>,
|
||
}
|
||
|
||
impl<'a> CalleeQuery<'a> {
|
||
/// Whether this query carries any qualified identity hint stronger than
|
||
/// a bare leaf name. Used by the resolver to decide whether an
|
||
/// unresolved qualified match should still fall through to leaf lookup
|
||
/// (no hints → fall through; authoritative hints → refuse to guess).
|
||
pub fn has_qualified_hint(&self) -> bool {
|
||
self.receiver_type.is_some()
|
||
|| self.namespace_qualifier.is_some()
|
||
|| self.caller_container.is_some_and(|s| !s.is_empty())
|
||
}
|
||
}
|
||
|
||
// ── Lookup map used by the taint engine ─────────────────────────────────
|
||
|
||
/// A merged view of all function summaries keyed by qualified [`FuncKey`].
|
||
///
|
||
/// Functions are partitioned by language + namespace + name + arity. Two
|
||
/// functions with the same bare name but different languages or namespaces
|
||
/// are stored separately — no implicit cross-language merging occurs.
|
||
///
|
||
/// A secondary index `(Lang, name)` supports fast lookup by language + name
|
||
/// for same-language resolution in the taint engine.
|
||
#[derive(Default)]
|
||
pub struct GlobalSummaries {
|
||
by_key: HashMap<FuncKey, FuncSummary>,
|
||
/// Bare leaf-name index — kept for compatibility with callers that only
|
||
/// see an unqualified call string. A single name may map to many keys
|
||
/// across containers / files / arities.
|
||
by_lang_name: HashMap<(Lang, String), Vec<FuncKey>>,
|
||
/// Container-qualified index: keyed on `"{container}::{name}"` (or just
|
||
/// `name` for free functions). Used to resolve calls when the call-site
|
||
/// can supply a receiver / container hint (e.g. `OrderService::process`).
|
||
by_lang_qualified: HashMap<(Lang, String), Vec<FuncKey>>,
|
||
/// Rust-only secondary index keyed on `(module_path, name)`.
|
||
///
|
||
/// Populated whenever a Rust [`FuncSummary`] is inserted with a
|
||
/// `module_path` set. Used by use-map driven resolution to look up
|
||
/// candidates by their crate-relative module rather than their
|
||
/// filesystem path. Same name / module / arity overloads land on the
|
||
/// same vector — arity narrowing happens at resolution time.
|
||
by_rust_module: HashMap<(String, String), Vec<FuncKey>>,
|
||
/// Precise SSA-derived per-parameter summaries, keyed by `FuncKey`.
|
||
/// These take precedence over `FuncSummary` during callee resolution.
|
||
ssa_by_key: HashMap<FuncKey, SsaFuncSummary>,
|
||
/// Cross-file callee bodies for interprocedural symbolic execution.
|
||
/// Keyed by `FuncKey` (same identity model as SSA summaries).
|
||
bodies_by_key: HashMap<FuncKey, crate::taint::ssa_transfer::CalleeSsaBody>,
|
||
/// Per-function auth-check summaries for cross-file helper lifting.
|
||
/// Keyed by `FuncKey` so a call-site resolver can go from a resolved
|
||
/// callee name to the helper's auth-check signature. Populated in
|
||
/// pass 1 and consumed by
|
||
/// [`crate::auth_analysis::run_auth_analysis`] during pass 2.
|
||
auth_by_key: HashMap<FuncKey, crate::auth_analysis::model::AuthCheckSummary>,
|
||
}
|
||
|
||
impl GlobalSummaries {
|
||
pub fn new() -> Self {
|
||
Self::default()
|
||
}
|
||
|
||
/// Walk a proposed insertion key, bumping the synthetic disambig
|
||
/// until either (a) the key is unoccupied, or (b) the entry found at
|
||
/// that key is compatible with the incoming summary (safe to merge).
|
||
///
|
||
/// Identity collisions are extraordinarily rare in practice (they
|
||
/// require two structurally distinct functions to land on the same
|
||
/// non-synthetic key, e.g. both with `disambig: None`). The loop
|
||
/// bound is defensive — if synthetic probing still collides after
|
||
/// 1024 attempts we fall through and let the caller merge, which
|
||
/// degrades gracefully to the old behaviour rather than looping
|
||
/// forever.
|
||
fn reconcile_func_summary_key(&self, mut key: FuncKey, summary: &FuncSummary) -> FuncKey {
|
||
let mut probe: u32 = 0;
|
||
loop {
|
||
match self.by_key.get(&key) {
|
||
Some(existing) if !summaries_compatible(existing, summary) => {
|
||
let synth = synthesize_disambig(summary).wrapping_add(probe);
|
||
key.disambig = Some(SYNTHETIC_DISAMBIG_BIT | (synth & !SYNTHETIC_DISAMBIG_BIT));
|
||
probe = probe.wrapping_add(1);
|
||
if probe >= 1024 {
|
||
tracing::warn!(
|
||
"summary identity collision probe gave up after 1024 attempts; \
|
||
falling back to union-merge for {}",
|
||
key
|
||
);
|
||
return key;
|
||
}
|
||
}
|
||
_ => return key,
|
||
}
|
||
}
|
||
}
|
||
|
||
/// SSA-summary variant of [`Self::reconcile_func_summary_key`].
|
||
///
|
||
/// Distinctness signals for SSA summaries are weaker than for
|
||
/// coarse `FuncSummary`s — the summary itself carries no explicit
|
||
/// `param_count`, only references to parameter indices. We combine:
|
||
///
|
||
/// * **Key arity fit** — any parameter index referenced by the new
|
||
/// summary that exceeds `key.arity` is a structural mismatch.
|
||
/// * **Existing-entry compare** — if an entry already lives at
|
||
/// this key and it disagrees on the set of referenced parameter
|
||
/// indices, the two cannot both describe the same function.
|
||
fn reconcile_ssa_summary_key(&self, mut key: FuncKey, summary: &SsaFuncSummary) -> FuncKey {
|
||
let mut probe: u32 = 0;
|
||
loop {
|
||
let conflict = match self.ssa_by_key.get(&key) {
|
||
Some(existing) => !ssa_summaries_compatible(existing, summary, key.arity),
|
||
None => !ssa_summary_fits_arity(summary, key.arity),
|
||
};
|
||
if !conflict {
|
||
return key;
|
||
}
|
||
let synth = synthesize_ssa_disambig(summary).wrapping_add(probe);
|
||
key.disambig = Some(SYNTHETIC_DISAMBIG_BIT | (synth & !SYNTHETIC_DISAMBIG_BIT));
|
||
probe = probe.wrapping_add(1);
|
||
if probe >= 1024 {
|
||
tracing::warn!(
|
||
"SSA summary identity collision probe gave up after 1024 attempts \
|
||
for {}",
|
||
key
|
||
);
|
||
return key;
|
||
}
|
||
}
|
||
}
|
||
|
||
/// Body variant of [`Self::reconcile_func_summary_key`].
|
||
///
|
||
/// `CalleeSsaBody` carries an explicit `param_count`, which must
|
||
/// agree with both `key.arity` and any co-located body's
|
||
/// `param_count`. A mismatch is a hard collision.
|
||
fn reconcile_body_key(
|
||
&self,
|
||
mut key: FuncKey,
|
||
body: &crate::taint::ssa_transfer::CalleeSsaBody,
|
||
) -> FuncKey {
|
||
let mut probe: u32 = 0;
|
||
loop {
|
||
let conflict = match self.bodies_by_key.get(&key) {
|
||
Some(existing) => existing.param_count != body.param_count,
|
||
None => match key.arity {
|
||
Some(a) => a != body.param_count,
|
||
None => false,
|
||
},
|
||
};
|
||
if !conflict {
|
||
return key;
|
||
}
|
||
let synth = (body.param_count as u32)
|
||
.wrapping_mul(0x9E37_79B9)
|
||
.wrapping_add(probe);
|
||
key.disambig = Some(SYNTHETIC_DISAMBIG_BIT | (synth & !SYNTHETIC_DISAMBIG_BIT));
|
||
probe = probe.wrapping_add(1);
|
||
if probe >= 1024 {
|
||
tracing::warn!(
|
||
"SSA body identity collision probe gave up after 1024 attempts for {}",
|
||
key
|
||
);
|
||
return key;
|
||
}
|
||
}
|
||
}
|
||
|
||
/// Insert or merge a summary. If an exact `FuncKey` match exists and
|
||
/// the two summaries describe the same function, merge conservatively
|
||
/// (OR caps/booleans, union params/callees).
|
||
///
|
||
/// `FuncKey` is structurally precise *when every producer populates
|
||
/// `disambig`*. Legacy on-disk JSON, interop configs, DB rows written
|
||
/// by older versions, and any code path that keeps `disambig: None`
|
||
/// can produce two keys that hash-equal even though they belong to
|
||
/// structurally distinct functions (e.g. different `param_count`,
|
||
/// `kind`, `container`, or `param_names`). Silently unioning those
|
||
/// would leak security-relevant caps across unrelated functions and
|
||
/// drop one of the two summaries entirely.
|
||
///
|
||
/// We therefore inspect the existing entry first. If the new summary
|
||
/// is not [`summaries_compatible`] with it, we mint a synthetic
|
||
/// disambig (top bit set to stay disjoint from byte-offset disambigs)
|
||
/// and retry the insert under the fresh key so *both* functions are
|
||
/// preserved.
|
||
pub fn insert(&mut self, key: FuncKey, summary: FuncSummary) {
|
||
let key = self.reconcile_func_summary_key(key, &summary);
|
||
let lang = key.lang;
|
||
let name = key.name.clone();
|
||
let qualified = key.qualified_name();
|
||
let rust_module = if lang == Lang::Rust {
|
||
summary.module_path.clone()
|
||
} else {
|
||
None
|
||
};
|
||
|
||
self.by_key
|
||
.entry(key.clone())
|
||
.and_modify(|existing| {
|
||
existing.source_caps |= summary.source_caps;
|
||
existing.sanitizer_caps |= summary.sanitizer_caps;
|
||
existing.sink_caps |= summary.sink_caps;
|
||
existing.propagates_taint |= summary.propagates_taint;
|
||
for &idx in &summary.propagating_params {
|
||
if !existing.propagating_params.contains(&idx) {
|
||
existing.propagating_params.push(idx);
|
||
}
|
||
}
|
||
for &idx in &summary.tainted_sink_params {
|
||
if !existing.tainted_sink_params.contains(&idx) {
|
||
existing.tainted_sink_params.push(idx);
|
||
}
|
||
}
|
||
union_param_sink_sites(&mut existing.param_to_sink, &summary.param_to_sink);
|
||
for c in &summary.callees {
|
||
if !existing.callees.iter().any(|e| {
|
||
e.name == c.name
|
||
&& e.arity == c.arity
|
||
&& e.receiver == c.receiver
|
||
&& e.qualifier == c.qualifier
|
||
&& e.ordinal == c.ordinal
|
||
}) {
|
||
existing.callees.push(c.clone());
|
||
}
|
||
}
|
||
})
|
||
.or_insert(summary);
|
||
|
||
let keys = self.by_lang_name.entry((lang, name)).or_default();
|
||
if !keys.contains(&key) {
|
||
keys.push(key.clone());
|
||
}
|
||
|
||
let q_keys = self.by_lang_qualified.entry((lang, qualified)).or_default();
|
||
if !q_keys.contains(&key) {
|
||
q_keys.push(key.clone());
|
||
}
|
||
|
||
if let Some(mp) = rust_module {
|
||
let mk = self
|
||
.by_rust_module
|
||
.entry((mp, key.name.clone()))
|
||
.or_default();
|
||
if !mk.contains(&key) {
|
||
mk.push(key);
|
||
}
|
||
}
|
||
}
|
||
|
||
/// Exact lookup by fully-qualified key.
|
||
pub fn get(&self, key: &FuncKey) -> Option<&FuncSummary> {
|
||
self.by_key.get(key)
|
||
}
|
||
|
||
/// Interop / external-edge lookup: tolerant of `disambig` being `None`.
|
||
///
|
||
/// Interop edges originate outside the source code (user-specified JSON,
|
||
/// language-bridge config) and cannot know a callee's internal byte-offset
|
||
/// disambiguator. When the query key has `disambig = None` we fall back to
|
||
/// scanning for a single match on `(lang, namespace, container, name,
|
||
/// arity, kind)`. If exactly one matches it is returned; otherwise we
|
||
/// return `None` to preserve determinism (ambiguity is treated as unknown).
|
||
pub fn get_for_interop(&self, key: &FuncKey) -> Option<&FuncSummary> {
|
||
if let Some(hit) = self.by_key.get(key) {
|
||
return Some(hit);
|
||
}
|
||
if key.disambig.is_some() {
|
||
return None;
|
||
}
|
||
let mut matches = self.by_key.iter().filter(|(k, _)| {
|
||
k.lang == key.lang
|
||
&& k.namespace == key.namespace
|
||
&& k.container == key.container
|
||
&& k.name == key.name
|
||
&& k.arity == key.arity
|
||
&& k.kind == key.kind
|
||
});
|
||
let first = matches.next()?;
|
||
if matches.next().is_some() {
|
||
None
|
||
} else {
|
||
Some(first.1)
|
||
}
|
||
}
|
||
|
||
/// All same-language matches for a bare function name.
|
||
pub fn lookup_same_lang(&self, lang: Lang, name: &str) -> Vec<(&FuncKey, &FuncSummary)> {
|
||
self.by_lang_name
|
||
.get(&(lang, name.to_string()))
|
||
.map(|keys| {
|
||
keys.iter()
|
||
.filter_map(|k| self.by_key.get(k).map(|v| (k, v)))
|
||
.collect()
|
||
})
|
||
.unwrap_or_default()
|
||
}
|
||
|
||
/// Rust-only lookup by `(module_path, name)`.
|
||
///
|
||
/// Returns every candidate that was inserted with a matching module
|
||
/// path. Arity filtering is applied by the caller so that the index
|
||
/// stays ambiguity-aware (two overloads legitimately share a module
|
||
/// path + name and only differ in arity).
|
||
pub fn lookup_rust_module(
|
||
&self,
|
||
module_path: &str,
|
||
name: &str,
|
||
) -> Vec<(&FuncKey, &FuncSummary)> {
|
||
self.by_rust_module
|
||
.get(&(module_path.to_string(), name.to_string()))
|
||
.map(|keys| {
|
||
keys.iter()
|
||
.filter_map(|k| self.by_key.get(k).map(|v| (k, v)))
|
||
.collect()
|
||
})
|
||
.unwrap_or_default()
|
||
}
|
||
|
||
/// Container-qualified lookup. `qualified` should be
|
||
/// `"Container::name"` (use [`FuncKey::qualified_name`]) or `"name"`.
|
||
pub fn lookup_qualified(&self, lang: Lang, qualified: &str) -> Vec<(&FuncKey, &FuncSummary)> {
|
||
self.by_lang_qualified
|
||
.get(&(lang, qualified.to_string()))
|
||
.map(|keys| {
|
||
keys.iter()
|
||
.filter_map(|k| self.by_key.get(k).map(|v| (k, v)))
|
||
.collect()
|
||
})
|
||
.unwrap_or_default()
|
||
}
|
||
|
||
/// Merge another `GlobalSummaries` into this one (for parallel fold/reduce).
|
||
pub fn merge(&mut self, other: GlobalSummaries) {
|
||
// `insert` rebuilds every secondary index (by_lang_name, by_lang_qualified,
|
||
// by_rust_module) from the summary itself, so we do not need to copy
|
||
// `other.by_rust_module` explicitly — draining `other.by_key` is enough.
|
||
for (key, summary) in other.by_key {
|
||
self.insert(key, summary);
|
||
}
|
||
// SSA summaries: last-writer-wins (exact-key replacement, no unioning)
|
||
for (key, ssa_sum) in other.ssa_by_key {
|
||
self.ssa_by_key.insert(key, ssa_sum);
|
||
}
|
||
// Cross-file bodies: last-writer-wins
|
||
for (key, body) in other.bodies_by_key {
|
||
self.bodies_by_key.insert(key, body);
|
||
}
|
||
// Auth summaries: last-writer-wins (exact-key replacement)
|
||
for (key, auth_sum) in other.auth_by_key {
|
||
self.auth_by_key.insert(key, auth_sum);
|
||
}
|
||
}
|
||
|
||
/// Insert an SSA summary.
|
||
///
|
||
/// Per-function refinement is expressed via last-writer-wins for
|
||
/// *compatible* summaries: re-analysing the same function body with
|
||
/// more precise seeds yields a strictly better summary, and the
|
||
/// caller genuinely wants the new one to replace the old.
|
||
///
|
||
/// When the existing entry is **incompatible** with the incoming
|
||
/// one — the key's `arity` disagrees with the new summary's referenced
|
||
/// parameter indices, or the two summaries would describe different
|
||
/// functions — we synthesize a disambig so both are kept. Silent
|
||
/// replacement in that case would drop one function's cross-file
|
||
/// taint signal entirely, which the caller cannot recover.
|
||
pub fn insert_ssa(&mut self, key: FuncKey, summary: SsaFuncSummary) {
|
||
let key = self.reconcile_ssa_summary_key(key, &summary);
|
||
self.ssa_by_key.insert(key, summary);
|
||
}
|
||
|
||
/// Exact lookup of an SSA summary by fully-qualified key.
|
||
pub fn get_ssa(&self, key: &FuncKey) -> Option<&SsaFuncSummary> {
|
||
self.ssa_by_key.get(key)
|
||
}
|
||
|
||
/// Insert an `AuthCheckSummary` for cross-file helper lifting.
|
||
///
|
||
/// Last-writer-wins: re-analysing a file produces a fresh summary
|
||
/// that fully replaces any earlier entry. No compatibility
|
||
/// reconciliation is needed because `AuthCheckSummary` carries no
|
||
/// identity-sensitive signal beyond the key itself.
|
||
pub fn insert_auth(
|
||
&mut self,
|
||
key: FuncKey,
|
||
summary: crate::auth_analysis::model::AuthCheckSummary,
|
||
) {
|
||
self.auth_by_key.insert(key, summary);
|
||
}
|
||
|
||
/// Exact lookup of an `AuthCheckSummary` by fully-qualified key.
|
||
pub fn get_auth(
|
||
&self,
|
||
key: &FuncKey,
|
||
) -> Option<&crate::auth_analysis::model::AuthCheckSummary> {
|
||
self.auth_by_key.get(key)
|
||
}
|
||
|
||
/// Direct access to the auth-summary map. `None` when empty so
|
||
/// callers can distinguish "no cross-file auth summaries loaded"
|
||
/// from "some were loaded but none matched the call site".
|
||
pub fn auth_by_key(
|
||
&self,
|
||
) -> Option<&HashMap<FuncKey, crate::auth_analysis::model::AuthCheckSummary>> {
|
||
if self.auth_by_key.is_empty() {
|
||
None
|
||
} else {
|
||
Some(&self.auth_by_key)
|
||
}
|
||
}
|
||
|
||
/// Count of cross-file auth summaries currently loaded.
|
||
pub fn auth_len(&self) -> usize {
|
||
self.auth_by_key.len()
|
||
}
|
||
|
||
/// Insert a cross-file callee body.
|
||
///
|
||
/// See [`insert_ssa`](Self::insert_ssa) for the identity-safety rule.
|
||
/// Bodies additionally carry `param_count`, giving a hard structural
|
||
/// signal: a collision between bodies with different `param_count`
|
||
/// cannot be the same function and is always rekeyed.
|
||
pub fn insert_body(&mut self, key: FuncKey, body: crate::taint::ssa_transfer::CalleeSsaBody) {
|
||
let key = self.reconcile_body_key(key, &body);
|
||
self.bodies_by_key.insert(key, body);
|
||
}
|
||
|
||
/// Exact lookup of a cross-file callee body by fully-qualified key.
|
||
pub fn get_body(&self, key: &FuncKey) -> Option<&crate::taint::ssa_transfer::CalleeSsaBody> {
|
||
self.bodies_by_key.get(key)
|
||
}
|
||
|
||
/// Direct access to the cross-file body map.
|
||
///
|
||
/// Returns `None` when no cross-file bodies were loaded (empty map).
|
||
/// The taint engine uses this to thread bodies through
|
||
/// [`crate::taint::ssa_transfer::SsaTaintTransfer::cross_file_bodies`]
|
||
/// and `resolve_callee` for context-sensitive cross-file inline
|
||
/// analysis.
|
||
pub fn bodies_by_key(
|
||
&self,
|
||
) -> Option<&HashMap<FuncKey, crate::taint::ssa_transfer::CalleeSsaBody>> {
|
||
if self.bodies_by_key.is_empty() {
|
||
None
|
||
} else {
|
||
Some(&self.bodies_by_key)
|
||
}
|
||
}
|
||
|
||
/// Count of cross-file bodies currently loaded. Exposed for
|
||
/// `tracing::debug!` observability — lets callers distinguish "no
|
||
/// bodies available" from "bodies available but inline didn't fire".
|
||
pub fn bodies_len(&self) -> usize {
|
||
self.bodies_by_key.len()
|
||
}
|
||
|
||
/// Resolve a bare callee name to a cross-file body.
|
||
///
|
||
/// Uses `resolve_callee_key()` for strict deterministic resolution,
|
||
/// then checks `bodies_by_key`. Returns `None` on `Ambiguous` or `NotFound`.
|
||
pub fn resolve_callee_body(
|
||
&self,
|
||
lang: Lang,
|
||
name: &str,
|
||
arity_hint: Option<usize>,
|
||
caller_namespace: &str,
|
||
) -> Option<&crate::taint::ssa_transfer::CalleeSsaBody> {
|
||
match self.resolve_callee_key(name, lang, caller_namespace, arity_hint) {
|
||
CalleeResolution::Resolved(key) => self.bodies_by_key.get(&key),
|
||
CalleeResolution::NotFound | CalleeResolution::Ambiguous(_) => None,
|
||
}
|
||
}
|
||
|
||
#[allow(dead_code)] // used by tests and future call-graph consumers
|
||
pub fn is_empty(&self) -> bool {
|
||
self.by_key.is_empty() && self.ssa_by_key.is_empty() && self.auth_by_key.is_empty()
|
||
}
|
||
|
||
/// Iterate over all (key, summary) pairs.
|
||
pub fn iter(&self) -> impl Iterator<Item = (&FuncKey, &FuncSummary)> {
|
||
self.by_key.iter()
|
||
}
|
||
|
||
/// Snapshot the convergence-relevant fields of every summary.
|
||
///
|
||
/// Returns `(source_caps, sanitizer_caps, sink_caps, propagating_params)`
|
||
/// per key. Used by the SCC fixed-point loop to detect when an iteration
|
||
/// has not changed any summary — i.e. convergence.
|
||
pub fn snapshot_caps(&self) -> HashMap<FuncKey, (u16, u16, u16, Vec<usize>)> {
|
||
self.by_key
|
||
.iter()
|
||
.map(|(k, s)| {
|
||
(
|
||
k.clone(),
|
||
(
|
||
s.source_caps,
|
||
s.sanitizer_caps,
|
||
s.sink_caps,
|
||
s.propagating_params.clone(),
|
||
),
|
||
)
|
||
})
|
||
.collect()
|
||
}
|
||
|
||
/// Snapshot the SSA summaries for convergence detection.
|
||
///
|
||
/// Used alongside [`snapshot_caps`] in the SCC fixed-point loop so that
|
||
/// SSA-only refinements (e.g. a `StripBits` transform appearing after a
|
||
/// cross-file sanitizer is resolved) are not invisible to convergence.
|
||
pub fn snapshot_ssa(&self) -> &HashMap<FuncKey, SsaFuncSummary> {
|
||
&self.ssa_by_key
|
||
}
|
||
|
||
/// Rust-only resolution that consults the caller's `use` map before
|
||
/// falling back to generic resolution.
|
||
///
|
||
/// The caller passes the callee's leaf name plus the (optional)
|
||
/// structured qualifier that `CalleeSite.qualifier` carries for Rust
|
||
/// call sites (e.g. `"crate::auth::token"` for `crate::auth::token::validate()`).
|
||
/// The `use` map and wildcard list come from the caller's own
|
||
/// [`FuncSummary`].
|
||
///
|
||
/// Resolution order:
|
||
///
|
||
/// 1. If the caller has a `use_map` and (qualifier, name) resolves to a
|
||
/// fully qualified path, strip the leading `crate::` and look up
|
||
/// `(module_path, name)` in the Rust module index. If arity filtering
|
||
/// leaves exactly one candidate → resolved.
|
||
/// 2. Otherwise, for each wildcard prefix in scope, try
|
||
/// `(wildcard_prefix, name)` in the module index. If across all
|
||
/// wildcards exactly one arity-filtered candidate appears → resolved.
|
||
/// 3. Otherwise fall through to [`resolve_callee_key_with_container`]
|
||
/// with no `container_hint` — meaning only the existing namespace /
|
||
/// arity disambiguation applies.
|
||
///
|
||
/// A `None` use_map (non-Rust file or no `use` declarations) makes this
|
||
/// equivalent to the generic path.
|
||
pub fn resolve_callee_key_rust(
|
||
&self,
|
||
callee: &str,
|
||
qualifier: Option<&str>,
|
||
arity_hint: Option<usize>,
|
||
caller_namespace: &str,
|
||
use_map: Option<&crate::rust_resolve::RustUseMap>,
|
||
) -> CalleeResolution {
|
||
use crate::rust_resolve::{resolve_with_use_map, split_module_and_name};
|
||
|
||
// 1) Try direct use-map resolution.
|
||
if let Some(um) = use_map
|
||
&& let Some(full) = resolve_with_use_map(um, qualifier, callee)
|
||
{
|
||
let (module_path, name) = split_module_and_name(&full);
|
||
if !module_path.is_empty() {
|
||
let candidates = self.lookup_rust_module(&module_path, &name);
|
||
let filtered: Vec<&FuncKey> = match arity_hint {
|
||
Some(a) => candidates
|
||
.iter()
|
||
.filter(|(k, _)| k.arity == Some(a))
|
||
.map(|(k, _)| *k)
|
||
.collect(),
|
||
None => candidates.iter().map(|(k, _)| *k).collect(),
|
||
};
|
||
if filtered.len() == 1 {
|
||
return CalleeResolution::Resolved(filtered[0].clone());
|
||
}
|
||
}
|
||
}
|
||
|
||
// 2) Try wildcards. Each wildcard expands `use prefix::*;` into an
|
||
// implicit `(prefix, name)` candidate set; we union across all
|
||
// wildcards and only resolve when exactly one matches under the
|
||
// arity filter.
|
||
if let Some(um) = use_map
|
||
&& !um.wildcards.is_empty()
|
||
{
|
||
let mut collected: Vec<FuncKey> = Vec::new();
|
||
for w in &um.wildcards {
|
||
let prefix = w.strip_prefix("crate::").unwrap_or(w);
|
||
if prefix.is_empty() {
|
||
continue;
|
||
}
|
||
for (k, _) in self.lookup_rust_module(prefix, callee) {
|
||
if let Some(a) = arity_hint
|
||
&& k.arity != Some(a)
|
||
{
|
||
continue;
|
||
}
|
||
if !collected.contains(k) {
|
||
collected.push(k.clone());
|
||
}
|
||
}
|
||
}
|
||
if collected.len() == 1 {
|
||
return CalleeResolution::Resolved(collected.remove(0));
|
||
}
|
||
}
|
||
|
||
// 3) Fall back to generic same-language resolution.
|
||
self.resolve_callee_key_with_container(
|
||
callee,
|
||
Lang::Rust,
|
||
caller_namespace,
|
||
None,
|
||
arity_hint,
|
||
)
|
||
}
|
||
|
||
/// Resolve a bare (already-normalized) callee name to a [`FuncKey`].
|
||
///
|
||
/// Thin wrapper around [`resolve_callee`] that constructs a minimal
|
||
/// [`CalleeQuery`] with no qualified hints. Kept for call sites that
|
||
/// only hold a string callee and an arity; prefer [`resolve_callee`]
|
||
/// whenever receiver / qualifier / container information is available.
|
||
pub fn resolve_callee_key(
|
||
&self,
|
||
callee: &str,
|
||
caller_lang: Lang,
|
||
caller_namespace: &str,
|
||
arity_hint: Option<usize>,
|
||
) -> CalleeResolution {
|
||
self.resolve_callee(&CalleeQuery {
|
||
name: callee,
|
||
caller_lang,
|
||
caller_namespace,
|
||
caller_container: None,
|
||
receiver_type: None,
|
||
namespace_qualifier: None,
|
||
receiver_var: None,
|
||
arity: arity_hint,
|
||
})
|
||
}
|
||
|
||
/// Resolve a callee name with an optional container hint.
|
||
///
|
||
/// Legacy entry point — kept so tests and older callers compile
|
||
/// unchanged. `container_hint` is interpreted as a syntactic
|
||
/// container qualifier (not an authoritative receiver type), so a
|
||
/// miss is allowed to fall through to leaf-name lookup. New
|
||
/// callers should route through [`resolve_callee`] and classify
|
||
/// their hint as `receiver_type` vs `namespace_qualifier` vs
|
||
/// `receiver_var` so the resolver can apply the correct policy.
|
||
pub fn resolve_callee_key_with_container(
|
||
&self,
|
||
callee: &str,
|
||
caller_lang: Lang,
|
||
caller_namespace: &str,
|
||
container_hint: Option<&str>,
|
||
arity_hint: Option<usize>,
|
||
) -> CalleeResolution {
|
||
self.resolve_callee(&CalleeQuery {
|
||
name: callee,
|
||
caller_lang,
|
||
caller_namespace,
|
||
caller_container: None,
|
||
receiver_type: None,
|
||
namespace_qualifier: container_hint,
|
||
receiver_var: None,
|
||
arity: arity_hint,
|
||
})
|
||
}
|
||
|
||
/// Resolve a callee with full structured hints.
|
||
///
|
||
/// **New resolution order** (qualified identity primary, leaf name
|
||
/// fallback):
|
||
///
|
||
/// 1. **Receiver-type qualified** — if `receiver_type` is set,
|
||
/// consult `by_lang_qualified[{receiver_type}::{name}]` with the
|
||
/// arity filter. Exactly-one → resolved; same-namespace
|
||
/// tie-breaker if multiple. *Receiver types are authoritative*:
|
||
/// a miss does not fall back to bare leaf lookup (that would be
|
||
/// a silent reinterpretation).
|
||
/// 2. **Namespace-qualifier qualified** — if `namespace_qualifier`
|
||
/// is set, try the qualified index with that container.
|
||
/// Non-authoritative: a miss falls through.
|
||
/// 3. **Caller-self-container** — when the caller lives inside a
|
||
/// container (method body), try the qualified index against the
|
||
/// caller's own container. Resolves bare `foo()` self-calls
|
||
/// inside a class without collapsing into an unrelated same-leaf
|
||
/// definition in another file.
|
||
/// 4. **Same-namespace unique leaf** — intra-file bare-leaf call:
|
||
/// if the caller's namespace contains exactly one arity-matched
|
||
/// candidate with this leaf, resolve to it.
|
||
/// 5. **Receiver-variable tie-break** — if the same-namespace
|
||
/// lookup misses but the raw call came with a receiver variable,
|
||
/// try `{receiver_var}::{name}` as a last qualified attempt.
|
||
///
|
||
/// 5.5. **Bare-call free-function preference** — for a truly bare
|
||
/// call (no receiver type, no namespace qualifier, no receiver
|
||
/// variable), if exactly one same-namespace arity-matched
|
||
/// candidate has an empty container, resolve to it. A class
|
||
/// method cannot be invoked with bare-call syntax from outside
|
||
/// its class, so this disambiguation is safe even when same-name
|
||
/// methods exist elsewhere in the file.
|
||
/// 6. **Leaf-name fallback** — arity-filtered same-language lookup.
|
||
/// Unique → resolved. Multiple + we had any qualified hint →
|
||
/// Ambiguous (refuse to guess when a qualifier exists but
|
||
/// missed). Multiple + no qualified hint → narrow by namespace,
|
||
/// then container.
|
||
pub fn resolve_callee(&self, q: &CalleeQuery<'_>) -> CalleeResolution {
|
||
// ── Helpers ─────────────────────────────────────────────────
|
||
let arity_matches = |k: &FuncKey| match q.arity {
|
||
Some(a) => k.arity == Some(a),
|
||
None => true,
|
||
};
|
||
|
||
// Look up `{container}::{name}` and return a single arity-matched
|
||
// candidate if one exists (using same-namespace to break ties).
|
||
let try_qualified = |container: &str| -> Option<FuncKey> {
|
||
if container.is_empty() {
|
||
return None;
|
||
}
|
||
let qual = format!("{container}::{}", q.name);
|
||
let candidates: Vec<&FuncKey> = self
|
||
.lookup_qualified(q.caller_lang, &qual)
|
||
.into_iter()
|
||
.map(|(k, _)| k)
|
||
.filter(|k| arity_matches(k))
|
||
.collect();
|
||
match candidates.len() {
|
||
0 => None,
|
||
1 => Some(candidates[0].clone()),
|
||
_ => {
|
||
let same_ns: Vec<&FuncKey> = candidates
|
||
.iter()
|
||
.copied()
|
||
.filter(|k| k.namespace == q.caller_namespace)
|
||
.collect();
|
||
if same_ns.len() == 1 {
|
||
Some(same_ns[0].clone())
|
||
} else {
|
||
None
|
||
}
|
||
}
|
||
}
|
||
};
|
||
|
||
// ── Step 1: receiver_type (authoritative) ───────────────────
|
||
if let Some(rt) = q.receiver_type {
|
||
if let Some(key) = try_qualified(rt) {
|
||
return CalleeResolution::Resolved(key);
|
||
}
|
||
// Authoritative miss: before returning, check whether any
|
||
// candidate exists at all for the leaf name. If there are
|
||
// some, report Ambiguous with the leaf candidates (so the
|
||
// caller knows we saw the name but refused to pick the
|
||
// wrong container). If there are none, return NotFound.
|
||
let bare: Vec<&FuncKey> = self
|
||
.lookup_same_lang(q.caller_lang, q.name)
|
||
.into_iter()
|
||
.map(|(k, _)| k)
|
||
.filter(|k| arity_matches(k))
|
||
.collect();
|
||
return if bare.is_empty() {
|
||
CalleeResolution::NotFound
|
||
} else {
|
||
CalleeResolution::Ambiguous(bare.into_iter().cloned().collect())
|
||
};
|
||
}
|
||
|
||
// ── Step 2: namespace_qualifier (non-authoritative) ─────────
|
||
if let Some(nq) = q.namespace_qualifier
|
||
&& let Some(key) = try_qualified(nq)
|
||
{
|
||
return CalleeResolution::Resolved(key);
|
||
}
|
||
|
||
// ── Step 3: caller self-container ───────────────────────────
|
||
if let Some(cc) = q.caller_container
|
||
&& let Some(key) = try_qualified(cc)
|
||
{
|
||
return CalleeResolution::Resolved(key);
|
||
}
|
||
|
||
// ── Step 4: same-namespace unique leaf ──────────────────────
|
||
let all_candidates: Vec<&FuncKey> = self
|
||
.lookup_same_lang(q.caller_lang, q.name)
|
||
.into_iter()
|
||
.map(|(k, _)| k)
|
||
.collect();
|
||
if all_candidates.is_empty() {
|
||
return CalleeResolution::NotFound;
|
||
}
|
||
|
||
let arity_filtered: Vec<&FuncKey> = all_candidates
|
||
.iter()
|
||
.copied()
|
||
.filter(|k| arity_matches(k))
|
||
.collect();
|
||
if arity_filtered.is_empty() {
|
||
return CalleeResolution::NotFound;
|
||
}
|
||
|
||
let same_ns: Vec<&FuncKey> = arity_filtered
|
||
.iter()
|
||
.copied()
|
||
.filter(|k| k.namespace == q.caller_namespace)
|
||
.collect();
|
||
if same_ns.len() == 1 {
|
||
return CalleeResolution::Resolved(same_ns[0].clone());
|
||
}
|
||
|
||
// ── Step 5: receiver_var tie-break (soft) ───────────────────
|
||
if let Some(rv) = q.receiver_var
|
||
&& let Some(key) = try_qualified(rv)
|
||
{
|
||
return CalleeResolution::Resolved(key);
|
||
}
|
||
|
||
// ── Step 5.5: bare-call free-function preference ────────────
|
||
// A call with no receiver, no namespace qualifier, and no
|
||
// authoritative receiver type is syntactically a free-function
|
||
// invocation: a class method cannot be invoked that way from
|
||
// outside its own class (intra-class self-calls were already
|
||
// resolved by step 3). When the same-namespace candidate set
|
||
// contains exactly one empty-container entry, it is the
|
||
// unambiguous target — returning Ambiguous here would be a
|
||
// silent false negative whenever a top-level helper happens to
|
||
// share a name with some method elsewhere in the file.
|
||
let syntactic_bare = q.receiver_type.is_none()
|
||
&& q.namespace_qualifier.is_none()
|
||
&& q.receiver_var.is_none();
|
||
if syntactic_bare {
|
||
let empty_container_same_ns: Vec<&FuncKey> = same_ns
|
||
.iter()
|
||
.copied()
|
||
.filter(|k| k.container.is_empty())
|
||
.collect();
|
||
if empty_container_same_ns.len() == 1 {
|
||
return CalleeResolution::Resolved(empty_container_same_ns[0].clone());
|
||
}
|
||
}
|
||
|
||
// ── Step 6: leaf fallback ───────────────────────────────────
|
||
if arity_filtered.len() == 1 {
|
||
return CalleeResolution::Resolved(arity_filtered[0].clone());
|
||
}
|
||
|
||
// Multiple arity-matched candidates remain. When a qualified
|
||
// hint was supplied but missed, refuse to guess — a silent
|
||
// leaf-name pick would defeat the point of qualified-first
|
||
// resolution. (`receiver_type` is handled in Step 1 and never
|
||
// reaches here; `namespace_qualifier` / `caller_container`
|
||
// missing their target flow through as a soft miss.)
|
||
if q.has_qualified_hint() {
|
||
return CalleeResolution::Ambiguous(arity_filtered.into_iter().cloned().collect());
|
||
}
|
||
|
||
// No qualified hints whatsoever — tolerate namespace narrowing.
|
||
match same_ns.len() {
|
||
1 => CalleeResolution::Resolved(same_ns[0].clone()),
|
||
0 => CalleeResolution::Ambiguous(arity_filtered.into_iter().cloned().collect()),
|
||
_ => CalleeResolution::Ambiguous(same_ns.into_iter().cloned().collect()),
|
||
}
|
||
}
|
||
}
|
||
|
||
impl std::fmt::Debug for GlobalSummaries {
|
||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||
f.debug_struct("GlobalSummaries")
|
||
.field("len", &self.by_key.len())
|
||
.field("ssa_len", &self.ssa_by_key.len())
|
||
.field("bodies_len", &self.bodies_by_key.len())
|
||
.field("auth_len", &self.auth_by_key.len())
|
||
.finish()
|
||
}
|
||
}
|
||
|
||
/// Return `true` iff two `FuncSummary`s can be safely union-merged at the
|
||
/// same `FuncKey`.
|
||
///
|
||
/// Only fields that a single function definition is guaranteed to agree on
|
||
/// are compared. Behaviour fields (`source_caps`, `propagating_params`,
|
||
/// `callees`, …) are deliberately ignored: merge is *allowed* to combine
|
||
/// those. The test is symmetric.
|
||
///
|
||
/// Comparison rules
|
||
/// ────────────────
|
||
/// * **`param_count` / `kind` / `container`** — unconditional agreement.
|
||
/// Any mismatch is a hard collision between distinct functions.
|
||
/// * **`file_path`** — agree when both sides are populated. A blank path
|
||
/// can come from synthetic summaries constructed in tests / interop
|
||
/// configs and should not force a split.
|
||
/// * **`param_names`** — agree when both sides are populated. Legacy
|
||
/// summaries may persist with empty names; treating empty as "unknown"
|
||
/// avoids gratuitous splits while still catching real divergence.
|
||
/// * **`module_path`** — Rust-only. Agreed when both sides are `Some`.
|
||
/// A missing module path on one side is legacy-compatible; two *distinct*
|
||
/// `Some` values mean the two summaries belong to different crates'
|
||
/// module trees.
|
||
pub(crate) fn summaries_compatible(a: &FuncSummary, b: &FuncSummary) -> bool {
|
||
if a.param_count != b.param_count {
|
||
return false;
|
||
}
|
||
if a.kind != b.kind {
|
||
return false;
|
||
}
|
||
if a.container != b.container {
|
||
return false;
|
||
}
|
||
if !a.file_path.is_empty() && !b.file_path.is_empty() && a.file_path != b.file_path {
|
||
return false;
|
||
}
|
||
if !a.param_names.is_empty() && !b.param_names.is_empty() && a.param_names != b.param_names {
|
||
return false;
|
||
}
|
||
match (&a.module_path, &b.module_path) {
|
||
(Some(l), Some(r)) if l != r => return false,
|
||
_ => {}
|
||
}
|
||
true
|
||
}
|
||
|
||
/// Derive a deterministic synthetic disambiguator from the
|
||
/// identity-relevant fields of a `FuncSummary`.
|
||
///
|
||
/// The top bit is **not** set here — the caller composes the final value
|
||
/// via `SYNTHETIC_DISAMBIG_BIT | (hash & !SYNTHETIC_DISAMBIG_BIT)` so that
|
||
/// (a) the caller can safely bump the low bits to probe for a free slot,
|
||
/// and (b) the synthetic namespace stays disjoint from byte-offset
|
||
/// disambigs produced by `cfg.rs`.
|
||
pub(crate) fn synthesize_disambig(summary: &FuncSummary) -> u32 {
|
||
let mut h = std::collections::hash_map::DefaultHasher::new();
|
||
summary.param_count.hash(&mut h);
|
||
summary.param_names.hash(&mut h);
|
||
summary.container.hash(&mut h);
|
||
summary.kind.hash(&mut h);
|
||
summary.file_path.hash(&mut h);
|
||
summary.source_caps.hash(&mut h);
|
||
summary.sanitizer_caps.hash(&mut h);
|
||
summary.sink_caps.hash(&mut h);
|
||
summary.module_path.hash(&mut h);
|
||
h.finish() as u32
|
||
}
|
||
|
||
/// Return `true` iff the new `SsaFuncSummary` is consistent with the
|
||
/// existing one at the same `FuncKey`.
|
||
///
|
||
/// `SsaFuncSummary` carries no explicit `param_count`; we approximate
|
||
/// it via the maximum parameter index referenced by either summary.
|
||
/// Two summaries are compatible when neither references a parameter
|
||
/// index the other cannot — an upward compatibility check, so a refined
|
||
/// summary that merely adds flows for previously-silent parameters is
|
||
/// still considered compatible.
|
||
fn ssa_summaries_compatible(
|
||
existing: &SsaFuncSummary,
|
||
new: &SsaFuncSummary,
|
||
key_arity: Option<usize>,
|
||
) -> bool {
|
||
if !ssa_summary_fits_arity(existing, key_arity) {
|
||
// Existing entry itself is inconsistent with the key; don't let
|
||
// that inconsistency mask a real collision with the new entry.
|
||
return false;
|
||
}
|
||
if !ssa_summary_fits_arity(new, key_arity) {
|
||
return false;
|
||
}
|
||
true
|
||
}
|
||
|
||
/// Every parameter index referenced by `summary` must fit inside
|
||
/// `key_arity` when it is known. `None` (unknown arity) accepts any
|
||
/// index.
|
||
fn ssa_summary_fits_arity(summary: &SsaFuncSummary, key_arity: Option<usize>) -> bool {
|
||
let arity = match key_arity {
|
||
Some(a) => a,
|
||
None => return true,
|
||
};
|
||
let refs = summary
|
||
.param_to_return
|
||
.iter()
|
||
.map(|(i, _)| *i)
|
||
.chain(summary.param_to_sink.iter().map(|(i, _)| *i))
|
||
.chain(summary.param_to_sink_param.iter().map(|(i, _, _)| *i))
|
||
.chain(summary.param_container_to_return.iter().copied())
|
||
.chain(
|
||
summary
|
||
.param_to_container_store
|
||
.iter()
|
||
.flat_map(|(a, b)| [*a, *b]),
|
||
)
|
||
.chain(summary.source_to_callback.iter().map(|(i, _)| *i))
|
||
.chain(summary.abstract_transfer.iter().map(|(i, _)| *i))
|
||
.chain(summary.param_return_paths.iter().map(|(i, _)| *i));
|
||
for i in refs {
|
||
if i >= arity {
|
||
return false;
|
||
}
|
||
}
|
||
// Every parameter referenced by a points-to edge must also fit the
|
||
// key's arity. An overflow-flagged summary is conservative by
|
||
// construction and can be kept as-is.
|
||
if let Some(max) = summary.points_to.max_param_index()
|
||
&& (max as usize) >= arity
|
||
{
|
||
return false;
|
||
}
|
||
true
|
||
}
|
||
|
||
/// Derive a deterministic synthetic disambiguator for an
|
||
/// `SsaFuncSummary`. Mirrors `synthesize_disambig` but restricted to
|
||
/// SSA-level structural signals.
|
||
fn synthesize_ssa_disambig(summary: &SsaFuncSummary) -> u32 {
|
||
let mut h = std::collections::hash_map::DefaultHasher::new();
|
||
summary.param_to_return.len().hash(&mut h);
|
||
summary.param_to_sink.len().hash(&mut h);
|
||
summary.source_caps.bits().hash(&mut h);
|
||
summary.param_to_sink_param.len().hash(&mut h);
|
||
summary.param_container_to_return.len().hash(&mut h);
|
||
summary.param_to_container_store.len().hash(&mut h);
|
||
summary.receiver_to_sink.bits().hash(&mut h);
|
||
summary.receiver_to_return.is_some().hash(&mut h);
|
||
summary.return_type.is_some().hash(&mut h);
|
||
summary.return_abstract.is_some().hash(&mut h);
|
||
summary.source_to_callback.len().hash(&mut h);
|
||
summary.abstract_transfer.len().hash(&mut h);
|
||
summary.param_return_paths.len().hash(&mut h);
|
||
summary.points_to.edges.len().hash(&mut h);
|
||
summary.points_to.overflow.hash(&mut h);
|
||
summary.points_to.returns_fresh_alloc.hash(&mut h);
|
||
h.finish() as u32
|
||
}
|
||
|
||
/// Merge a set of per‑file summaries into a single `GlobalSummaries` map.
|
||
///
|
||
/// Merging only happens for exact `FuncKey` matches (same lang + namespace +
|
||
/// name + arity). Functions with the same bare name but different languages
|
||
/// or namespaces are stored separately.
|
||
pub fn merge_summaries(
|
||
per_file: impl IntoIterator<Item = FuncSummary>,
|
||
scan_root: Option<&str>,
|
||
) -> GlobalSummaries {
|
||
let mut map = GlobalSummaries::new();
|
||
|
||
for fs in per_file {
|
||
let key = fs.func_key(scan_root);
|
||
map.insert(key, fs);
|
||
}
|
||
|
||
map
|
||
}
|
||
|
||
#[cfg(test)]
|
||
mod tests;
|