nyx/examples/sanatize/example.rs

96 lines
4.2 KiB
Rust

Added experimental control flow analysis and syntax classification for rust lang (#22) * Introduce control flow graph (CFG) support: - Added `cfg.rs` with CFG generation and analysis utilities. - Integrated `petgraph` library for graph-based computations. - Updated `ast.rs` to utilize CFG for function analysis. - Modified `Cargo.toml` and `Cargo.lock` to include new dependencies. - Improved static analysis with taint tracking through CFG paths. * feat: enhance control flow analysis with taint tracking and node labeling * feat: improve control flow graph with enhanced node handling and new tests * Remove unnecessary reference marker in `byte_offset_to_point` comment. * Remove unnecessary reference marker in `byte_offset_to_point` comment. * Refactor `ast.rs` for performance and clarity; enhance `cfg.rs` with recursive CFG generation and improved classification logic for AST analysis. * Refactor CFG and taint tracking logic: - Enhanced `cfg.rs` with inline helper function `text_of` for cleaner UTF-8 handling in AST nodes. - Expanded `labels.rs` rules with detailed `Sources`, `Sanitizers`, and `Sinks` for improved classification. - Refined `push_node` to handle method call expressions with object-function pairing. - Simplified code handling in trivia skipping and debug-only logic. * Enhance `cfg.rs` with `first_call_ident` helper and improve identifier extraction logic in `push_node`. * Add targeted CFG taint-tracking tests to enhance analysis coverage. * Enhance CFG generation with loop expression handling and improve taint tracking logic. Add new sanitization example in `examples/sanitize/example.rs`. * Update README with installation instructions for Cargo and GitHub releases. * Expand taint-tracking with precise `def-use` computation and enhance `labels.rs` for detailed classification. Extend `examples/sanitize` with realistic scenarios demonstrating new rules. * Refactor `labels.rs`: - Removed redundant `LabelRule` entries for cleaner rule definitions. 
- Adjusted matching logic to prioritize suffix and prefix matches effectively. * Refactor `labels.rs`: - Removed redundant `LabelRule` entries for cleaner rule definitions. - Adjusted matching logic to prioritize suffix and prefix matches effectively. * Add test for taint tracking with multiple sources in `cfg.rs`. * Add `function_summaries` table and implement summary upsert/load methods. Refactor to handle summary storage and retrieval efficiently, with placeholder clean/drop logic. * refactor: split `labels.rs` into modular structure with language-specific files * refactor: split `labels.rs` into modular structure with language-specific files * refactor: clean up SQL table definitions in `database.rs` for better readability * refactor: simplify CFG structure by removing lifetime parameters and enhancing taint metadata handling * refactor: update TODO comments in `cfg.rs` to clarify future enhancements for cap labels and function details * refactor: remove redundant header from README.md for improved clarity * feat: add PHF-based syntax classifiers and Kind enum for efficient syntax mapping across languages * feat: introduce analysis modes for enhanced scanner configuration and diagnostics * feat: define Kind enum for syntax classification in control flow analysis * feat: bump version to 0.2.0-alpha and update CHANGELOG for new features and fixes * refactor: clean up imports and formatting in AST and CFG modules for improved readability * refactor: simplify function signatures and improve code readability in CFG and module files * fix: correct rayon_thread_stack_size comment to reflect actual value of 8 MiB * refactor: update string formatting in clean and project modules for consistency * refactor: fix indentation in clean.rs for improved readability --------- Co-authored-by: elipeter <eli.peter@es.fcm.travel>
2025-06-28 17:36:14 +02:00
//! example.rs: realistic taint-tracking playground
//! `cargo add html-escape shell-escape` before compiling.
use std::{env, process::Command, fs};
#[derive(Default)]
struct UserCtx {
    query: String,     // potentially tainted
    sanitized: String, // should remain clean
}
// ---------- helper wrappers so we get nice Source / Sink labels ----------
fn source_env(var: &str) -> String {
    env::var(var).unwrap_or_default() // Source(env-var)
}

fn source_file(path: &str) -> String {
    fs::read_to_string(path).unwrap_or_default() // Source(file-io)
}

fn sink_shell(arg: &str) {
    Command::new("sh").arg(arg).status().unwrap(); // Sink(process-spawn)
}

fn sink_html(out: &str) {
    println!("{out}"); // Sink(html-out)
}
fn sanitize_html(s: &str) -> String {
    // `encode_safe` returns a `Cow<str>`, so convert it to an owned `String`.
    html_escape::encode_safe(s).into_owned() // Sanitizer(html-escape)
}
fn sanitize_shell(s: &str) -> String {
    shell_escape::unix::escape(s.into()).into_owned() // Sanitizer(shell-escape)
}
// ---------- 1. Main demo function ----------
fn main() {
    // FLOW A ────────────────────────────────────────────────────────────────
    // env → sanitized → safe shell
    let raw = source_env("USER_CMD");
    let clean = sanitize_shell(&raw);
    sink_shell(&clean); // EXPECT: SAFE
    // FLOW B ────────────────────────────────────────────────────────────────
    // env → if-else, only one branch escapes
    let arg = source_env("ANOTHER");
    if arg.len() > 5 {
        sink_shell(&arg); // EXPECT: UNSAFE (branch tainted)
    } else {
        let escaped = sanitize_shell(&arg);
        sink_shell(&escaped); // safe
    }
    // FLOW C ────────────────────────────────────────────────────────────────
    // file → while loop → taint cleared by HTML sanitizer
    let mut data = source_file("/tmp/input.txt");
    while data.len() < 32 {
        data.push('x');
    }
    let html_ok = sanitize_html(&data);
    sink_html(&html_ok); // safe
    // FLOW D ────────────────────────────────────────────────────────────────
    // file → struct field → match → unsanitized HTML
    let mut ctx = UserCtx::default();
    ctx.query = source_file("/tmp/q.txt");
    // overwrite the clean field; `ctx.sanitized` is *not* tainted
    ctx.sanitized = sanitize_html("constant");
    match ctx {
        // bind only `query`; `ctx.sanitized` is left in place for the fallback arm
        UserCtx { query, .. } if query.contains("DROP") => {
            sink_html(&query); // EXPECT: UNSAFE
        }
        _ => {
            sink_html(&ctx.sanitized); // safe
        }
    }
    // FLOW E ────────────────────────────────────────────────────────────────
    // source → function call → reassignment clears taint
    let mut name = source_env("USER"); // tainted
    greet(&name);                      // just prints
    name = "anonymous".into();         // kills taint
    greet(&name);                      // safe
    // FLOW F ────────────────────────────────────────────────────────────────
    // Multiple sanitizers, only the *right* one matters
    let cmd = source_env("MIXED");
    let partly = sanitize_html(&cmd); // wrong sanitizer
    sink_shell(&partly);              // EXPECT: UNSAFE
}
/// helper (non-sink) function
fn greet(who: &str) {
    println!("Hello, {who}");
}