nyx/src/labels/rust.rs
Eli Peter 3c21efba75
Added experimental control flow analysis and syntax classification for rust lang (#22)
* Introduce control flow graph (CFG) support:

- Added `cfg.rs` with CFG generation and analysis utilities.
- Integrated `petgraph` library for graph-based computations.
- Updated `ast.rs` to utilize CFG for function analysis.
- Modified `Cargo.toml` and `Cargo.lock` to include new dependencies.
- Improved static analysis with taint tracking through CFG paths.

* feat: enhance control flow analysis with taint tracking and node labeling

* feat: improve control flow graph with enhanced node handling and new tests

* Remove unnecessary reference marker in `byte_offset_to_point` comment.

* Remove unnecessary reference marker in `byte_offset_to_point` comment.

* Refactor `ast.rs` for performance and clarity; enhance `cfg.rs` with recursive CFG generation and improved classification logic for AST analysis.

* Refactor CFG and taint tracking logic:

- Enhanced `cfg.rs` with inline helper function `text_of` for cleaner UTF-8 handling in AST nodes.
- Expanded `labels.rs` rules with detailed `Sources`, `Sanitizers`, and `Sinks` for improved classification.
- Refined `push_node` to handle method call expressions with object-function pairing.
- Simplified code handling in trivia skipping and debug-only logic.

* Enhance `cfg.rs` with `first_call_ident` helper and improve identifier extraction logic in `push_node`.

* Add targeted CFG taint-tracking tests to enhance analysis coverage.

* Enhance CFG generation with loop expression handling and improve taint tracking logic. Add new sanitization example in `examples/sanitize/example.rs`.

* Update README with installation instructions for Cargo and GitHub releases.

* Expand taint-tracking with precise `def-use` computation and enhance `labels.rs` for detailed classification. Extend `examples/sanitize` with realistic scenarios demonstrating new rules.

* Refactor `labels.rs`:

- Removed redundant `LabelRule` entries for cleaner rule definitions.
- Adjusted matching logic to prioritize suffix and prefix matches effectively.

* Refactor `labels.rs`:

- Removed redundant `LabelRule` entries for cleaner rule definitions.
- Adjusted matching logic to prioritize suffix and prefix matches effectively.

* Add test for taint tracking with multiple sources in `cfg.rs`.

* Add `function_summaries` table and implement summary upsert/load methods. Refactor to handle summary storage and retrieval efficiently, with placeholder clean/drop logic.

* refactor: split `labels.rs` into modular structure with language-specific files

* refactor: split `labels.rs` into modular structure with language-specific files

* refactor: clean up SQL table definitions in `database.rs` for better readability

* refactor: simplify CFG structure by removing lifetime parameters and enhancing taint metadata handling

* refactor: update TODO comments in `cfg.rs` to clarify future enhancements for cap labels and function details

* refactor: remove redundant header from README.md for improved clarity

* feat: add PHF-based syntax classifiers and Kind enum for efficient syntax mapping across languages

* feat: introduce analysis modes for enhanced scanner configuration and diagnostics

* feat: define Kind enum for syntax classification in control flow analysis

* feat: bump version to 0.2.0-alpha and update CHANGELOG for new features and fixes

* refactor: clean up imports and formatting in AST and CFG modules for improved readability

* refactor: simplify function signatures and improve code readability in CFG and module files

* fix: correct rayon_thread_stack_size comment to reflect actual value of 8 MiB

* refactor: update string formatting in clean and project modules for consistency

* refactor: fix indentation in clean.rs for improved readability

---------

Co-authored-by: elipeter <eli.peter@es.fcm.travel>
2025-06-28 17:36:14 +02:00

72 lines
2.5 KiB
Rust

use crate::labels::{Cap, DataLabel, Kind, LabelRule};
use phf::{Map, phf_map};
pub static RULES: &[LabelRule] = &[
// ─────────── Sources ───────────
LabelRule {
matchers: &["std::env::var", "env::var"],
label: DataLabel::Source(Cap::all()),
},
// ───────── Sanitizers ──────────
// `fn sanitize_*(&str) -> String`
LabelRule {
matchers: &["html_escape::encode_safe", "sanitize_", "sanitize_html"],
label: DataLabel::Sanitizer(Cap::HTML_ESCAPE),
},
LabelRule {
matchers: &["shell_escape::unix::escape"],
label: DataLabel::Sanitizer(Cap::SHELL_ESCAPE),
},
// ─────────── Sinks ─────────────
// All the key points where untrusted strings reach the OS shell.
LabelRule {
matchers: &[
"command::new",
"std::process::command::new",
"command::arg",
"command::args",
"command::status",
"command::output",
],
label: DataLabel::Sink(Cap::SHELL_ESCAPE),
},
];
pub static KINDS: Map<&'static str, Kind> = phf_map! {
// control-flow
"if_expression" => Kind::If,
"loop_expression" => Kind::InfiniteLoop,
"loop_statement" => Kind::LoopBody,
"while_statement" => Kind::While,
"for_statement" => Kind::For,
"return_statement" => Kind::Return,
"break_expression" => Kind::Break,
"break_statement" => Kind::Break,
"continue_expression" => Kind::Continue,
"continue_statement" => Kind::Continue,
// structure
"source_file" => Kind::SourceFile,
"block" => Kind::Block,
"function_item" => Kind::Function,
// data-flow
"call_expression" => Kind::CallFn,
"method_call_expression" => Kind::CallMethod,
"macro_invocation" => Kind::CallMacro,
"let_declaration" => Kind::CallWrapper,
"expression_statement" => Kind::CallWrapper,
"assignment_expression" => Kind::Assignment,
// trivia
"line_comment" => Kind::Trivia,
"block_comment" => Kind::Trivia,
";" => Kind::Trivia, "," => Kind::Trivia,
"(" => Kind::Trivia, ")" => Kind::Trivia,
"{" => Kind::Trivia, "}" => Kind::Trivia, "\n" => Kind::Trivia,
"use_declaration" => Kind::Trivia,
"attribute_item" => Kind::Trivia,
"mod_item" => Kind::Trivia,
"type_item" => Kind::Trivia,
};