nyx/README.md
Eli Peter 3c21efba75
Added experimental control flow analysis and syntax classification for rust lang (#22)
* Introduce control flow graph (CFG) support:

- Added `cfg.rs` with CFG generation and analysis utilities.
- Integrated `petgraph` library for graph-based computations.
- Updated `ast.rs` to utilize CFG for function analysis.
- Modified `Cargo.toml` and `Cargo.lock` to include new dependencies.
- Improved static analysis with taint tracking through CFG paths.

* feat: enhance control flow analysis with taint tracking and node labeling

* feat: improve control flow graph with enhanced node handling and new tests

* Remove unnecessary reference marker in `byte_offset_to_point` comment.

* Refactor `ast.rs` for performance and clarity; enhance `cfg.rs` with recursive CFG generation and improved classification logic for AST analysis.

* Refactor CFG and taint tracking logic:

- Enhanced `cfg.rs` with inline helper function `text_of` for cleaner UTF-8 handling in AST nodes.
- Expanded `labels.rs` rules with detailed `Sources`, `Sanitizers`, and `Sinks` for improved classification.
- Refined `push_node` to handle method call expressions with object-function pairing.
- Simplified code handling in trivia skipping and debug-only logic.

* Enhance `cfg.rs` with `first_call_ident` helper and improve identifier extraction logic in `push_node`.

* Add targeted CFG taint-tracking tests to enhance analysis coverage.

* Enhance CFG generation with loop expression handling and improve taint tracking logic. Add new sanitization example in `examples/sanitize/example.rs`.

* Update README with installation instructions for Cargo and GitHub releases.

* Expand taint-tracking with precise `def-use` computation and enhance `labels.rs` for detailed classification. Extend `examples/sanitize` with realistic scenarios demonstrating new rules.

* Refactor `labels.rs`:

- Removed redundant `LabelRule` entries for cleaner rule definitions.
- Adjusted matching logic to prioritize suffix and prefix matches effectively.

* Add test for taint tracking with multiple sources in `cfg.rs`.

* Add `function_summaries` table and implement summary upsert/load methods. Refactor to handle summary storage and retrieval efficiently, with placeholder clean/drop logic.

* refactor: split `labels.rs` into modular structure with language-specific files

* refactor: clean up SQL table definitions in `database.rs` for better readability

* refactor: simplify CFG structure by removing lifetime parameters and enhancing taint metadata handling

* refactor: update TODO comments in `cfg.rs` to clarify future enhancements for cap labels and function details

* refactor: remove redundant header from README.md for improved clarity

* feat: add PHF-based syntax classifiers and Kind enum for efficient syntax mapping across languages

* feat: introduce analysis modes for enhanced scanner configuration and diagnostics

* feat: define Kind enum for syntax classification in control flow analysis

* feat: bump version to 0.2.0-alpha and update CHANGELOG for new features and fixes

* refactor: clean up imports and formatting in AST and CFG modules for improved readability

* refactor: simplify function signatures and improve code readability in CFG and module files

* fix: correct rayon_thread_stack_size comment to reflect actual value of 8 MiB

* refactor: update string formatting in clean and project modules for consistency

* refactor: fix indentation in clean.rs for improved readability

---------

Co-authored-by: elipeter <eli.peter@es.fcm.travel>
2025-06-28 17:36:14 +02:00

<div align="center">
<img src="assets/logo.png" alt="nyx logo" width="300"/>
**Fast, cross-language CLI vulnerability scanner.**
[![crates.io](https://img.shields.io/crates/v/nyx-scanner.svg)](https://crates.io/crates/nyx-scanner)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Rust 1.85+](https://img.shields.io/badge/rust-1.85%2B-orange)](https://www.rust-lang.org)
[![CI](https://img.shields.io/github/actions/workflow/status/ecpeter23/nyx/ci.yml?branch=master)](https://github.com/ecpeter23/nyx/actions)
</div>
---
## What is Nyx?
**Nyx** is a lightweight, lightning-fast, Rust-native command-line tool that detects potentially dangerous code patterns across several programming languages. It combines the accuracy of [`tree-sitter`](https://tree-sitter.github.io/) parsing with a curated rule set and an optional SQLite-backed index to deliver fast, repeatable scans on projects of any size.
> **Project status: Alpha**
> Nyx is under active development. The public interface, rule set, and output formats may change without notice while we stabilise the core. The new CFG + taint engine is experimental and Rust-only for now; please report any crashes or false positives. Pin exact versions in production environments.
---
## Key Capabilities
| Capability | Description |
|------------------------------|-------------------------------------------------------------------------------------------|
| Multi-language support | Rust, C, C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript |
| AST-level pattern matching | Language-specific queries written against precise parse trees |
| Incremental indexing | SQLite database stores file hashes and previous findings to skip unchanged files |
| Parallel execution | File walking and rule execution run concurrently; defaults scale with available CPU cores |
| Configurable scan parameters | Exclude directories, set maximum file size, tune worker threads, limit output, and more |
| Multiple output formats | Human-readable console view (default) and machine-readable JSON / CSV / SARIF (roadmap) |
---
## Why choose Nyx?
| Advantage | What it means for you |
|--------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Pure-Rust, single binary** | No JVM, Python, or server to install; drop the `nyx` executable into your `$PATH` and go. |
| **Massively parallel** | Uses Rayon and a thread-pool walker; scales to all CPU cores. Example: scanning the entire **rust-lang/rust** codebase (~53,000 files) on an M2 MacBook Pro takes **≈ 1 s**. |
| **Index-aware** | An optional SQLite index stores file hashes and findings; subsequent scans touch *only* changed files, slashing CI times. |
| **Offline & privacy-friendly** | Requires no login, cloud account, or telemetry. Perfect for air-gapped environments and strict compliance policies. |
| **Tree-sitter precision** | Parses real language grammars, not regexes, giving far fewer false positives than line-based scanners. |
| **Extensible** | Add new patterns with concise `tree-sitter` queries; no SaaS lock-in. |
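The fan-out behind the "Massively parallel" row can be illustrated with a small sketch. The table says Nyx uses Rayon; to stay dependency-free, this example swaps in `std::thread::scope` from the standard library, so it is not Nyx's actual code. `count_matches` and `scan_parallel` are hypothetical names, and the "rule" is a plain substring count rather than a `tree-sitter` query.

```rust
use std::thread;

/// Hypothetical stand-in for a rule: counts occurrences of `needle` in one file's text.
fn count_matches(source: &str, needle: &str) -> usize {
    source.matches(needle).count()
}

/// Splits `files` into roughly equal chunks and scans each chunk on its own thread.
/// Returns the per-file match counts in the original input order.
fn scan_parallel(files: &[String], needle: &str, workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division so that every file lands in exactly one chunk.
    let chunk_size = ((files.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn all workers first so the chunks are processed concurrently.
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|src| count_matches(src, needle))
                        .collect::<Vec<usize>>()
                })
            })
            .collect();

        // Joining in spawn order preserves the input order of the results.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Scoped threads let the workers borrow the file contents without cloning them. A work-stealing pool such as Rayon's balances uneven file sizes better than the fixed chunks used here.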
---
## Installation
### Install crate
```bash
$ cargo install nyx-scanner
```
### Install GitHub release
1. Navigate to the [Releases](https://github.com/ecpeter23/nyx/releases) page of the repository.
2. Download the appropriate binary for your system:
   - `nyx-x86_64-unknown-linux-gnu.zip` for Linux
   - `nyx-x86_64-pc-windows-msvc.zip` for Windows
   - `nyx-x86_64-apple-darwin.zip` (Intel) or `nyx-aarch64-apple-darwin.zip` (Apple Silicon) for macOS
3. Unzip the file and move the executable to a directory in your system PATH:
```bash
# Example for Unix systems
unzip nyx-x86_64-unknown-linux-gnu.zip
chmod +x nyx
sudo mv nyx /usr/local/bin/
```
```bash
# Example for Windows in PowerShell
Expand-Archive -Path nyx-x86_64-pc-windows-msvc.zip -DestinationPath .
Move-Item -Path .\nyx.exe -Destination "C:\Program Files\Nyx\" # Add to PATH manually if needed
```
4. Verify the installation:
```bash
nyx --version
```
### Build from source
```bash
$ git clone https://github.com/ecpeter23/nyx.git
$ cd nyx
$ cargo build --release
# Optional: install the binary into ~/.cargo/bin (which should be on your PATH)
$ cargo install --path .
```
Nyx targets **stable Rust 1.85 or later**.
---
## Quick Start
```bash
# Scan the current directory (creates/uses an index automatically)
$ nyx scan
# Scan a specific path and emit JSON
$ nyx scan ./server --format json
# Perform an ad-hoc scan without touching the index
$ nyx scan --no-index
# Restrict results to high-severity findings
$ nyx scan --high-only
```
### Index Management
```bash
# Create or rebuild an index
$ nyx index build [PATH] [--force]
# Display index metadata (size, modified date, etc.)
$ nyx index status [PATH]
# List all indexed projects (add -v for detailed view)
$ nyx list [-v]
# Remove a single project or purge all indexes
$ nyx clean <PROJECT_NAME>
$ nyx clean --all
```
---
## Configuration Overview
Nyx merges a default configuration file (`nyx.conf`) with user overrides (`nyx.local`). Both live in the platform-specific configuration directory shown below.
| Platform | Directory |
|---------------|-----------------------------------|
| Linux / macOS | `~/.config/nyx/` |
| Windows | `%APPDATA%\ecpeter23\nyx\config\` |
Minimal example (`nyx.local`):
```toml
[scanner]
min_severity = "Medium"
follow_symlinks = true
excluded_extensions = ["mp3", "mp4"]
[output]
default_format = "json"
max_results = 200
[performance]
worker_threads = 8 # 0 = auto-detect
batch_size = 200
channel_multiplier = 2
```
A fully documented `nyx.conf` is generated automatically on first run.
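How the defaults and the user overrides combine can be sketched as follows. This is a minimal illustration, not Nyx's implementation: it assumes a key-by-key merge in which a value set in `nyx.local` wins and an absent key falls back to `nyx.conf`. The `Performance` and `PerformanceOverrides` types and both functions are hypothetical; only the three key names and the "0 = auto-detect" convention come from the example above.

```rust
/// Effective settings; field names mirror the `[performance]` keys shown above.
#[derive(Debug, Clone, PartialEq)]
struct Performance {
    worker_threads: usize, // 0 = auto-detect
    batch_size: usize,
    channel_multiplier: usize,
}

/// User overrides: every key is optional, as in a partial `nyx.local`.
#[derive(Debug, Default)]
struct PerformanceOverrides {
    worker_threads: Option<usize>,
    batch_size: Option<usize>,
    channel_multiplier: Option<usize>,
}

/// Layers the overrides on top of the defaults (assumed semantics):
/// a key set by the user wins, an absent key keeps the default value.
fn merge(defaults: &Performance, user: &PerformanceOverrides) -> Performance {
    Performance {
        worker_threads: user.worker_threads.unwrap_or(defaults.worker_threads),
        batch_size: user.batch_size.unwrap_or(defaults.batch_size),
        channel_multiplier: user
            .channel_multiplier
            .unwrap_or(defaults.channel_multiplier),
    }
}

/// Resolves `worker_threads = 0` to the number of available cores.
fn effective_workers(cfg: &Performance) -> usize {
    if cfg.worker_threads == 0 {
        std::thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1)
    } else {
        cfg.worker_threads
    }
}
```

Modelling overrides as `Option` fields keeps "not set" distinct from "set to the default", which matters when the defaults change between releases.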
---
## Architecture in Brief
1. **File enumeration:** A highly parallel walker applies ignore rules, size limits, and user exclusions.
2. **Parsing:** Supported files are parsed into ASTs via the appropriate `tree-sitter` grammar.
3. **Rule execution:** Each language ships with a dedicated rule set expressed as `tree-sitter` queries. Matches are classified into three severity levels (`High`, `Medium`, `Low`).
4. **Indexing (optional):** File digests and findings are stored in SQLite. Later scans skip files whose content and modification time are unchanged.
5. **Reporting:** Results are grouped by file and emitted to the console or serialized in the requested format.
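Steps 3 and 4 can be sketched together: classify matches by severity, cache them per file, and skip the rule pass when a file is unchanged. This is only an illustration under stated assumptions. A `HashMap` stands in for the SQLite index, `DefaultHasher` stands in for whatever digest Nyx really uses, and `run_rules`, its two rule names, and all type names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Variant order gives `Low < Medium < High` through the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
}

#[derive(Debug, Clone, PartialEq)]
struct Finding {
    rule: &'static str,
    severity: Severity,
}

struct IndexEntry {
    digest: u64,
    mtime: u64,
    findings: Vec<Finding>,
}

/// In-memory stand-in for the on-disk index, keyed by file path.
#[derive(Default)]
struct Index {
    entries: HashMap<String, IndexEntry>,
}

/// Non-cryptographic placeholder digest of the file content.
fn digest(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical rule pass: substring checks instead of tree-sitter queries.
fn run_rules(content: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    if content.contains("unsafe") {
        findings.push(Finding { rule: "unsafe-block", severity: Severity::Medium });
    }
    if content.contains("Command::new") {
        findings.push(Finding { rule: "process-spawn", severity: Severity::High });
    }
    findings
}

impl Index {
    /// Returns the findings for `path` and whether the rules actually ran.
    fn scan(&mut self, path: &str, content: &str, mtime: u64) -> (Vec<Finding>, bool) {
        let current = digest(content);
        if let Some(entry) = self.entries.get(path) {
            if entry.digest == current && entry.mtime == mtime {
                // Unchanged file: reuse the cached findings.
                return (entry.findings.clone(), false);
            }
        }
        let findings = run_rules(content);
        let entry = IndexEntry { digest: current, mtime, findings: findings.clone() };
        self.entries.insert(path.to_string(), entry);
        (findings, true)
    }
}

/// Keeps only the findings at or above `min`.
fn at_least(findings: &[Finding], min: Severity) -> Vec<Finding> {
    findings.iter().filter(|f| f.severity >= min).cloned().collect()
}
```

Scanning the same path twice with identical content and modification time returns the cached findings without re-running the rules; changing either value triggers a fresh pass and replaces the cached entry.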
---
## Roadmap
| Area | Planned Improvements |
|-----------------------|-------------------------------------------------------------------------------------------------------|
| More language support | Plans to create rule sets for over 100 languages for maximum coverage |
| Control-flow analysis | Inter-procedural function summaries. Cap label propagation & bit-flag checks. Loop/branch sensitivity |
| Taint tracking | Intra- and inter-procedural tracing of untrusted data from sources to sinks |
| Output formats | Full SARIF 2.1.0, JUnit XML, HTML report generator |
| Rule updates | Remote rule feed with signature verification |
| Performance & UX | Incremental CFG cache, progress-bar UX, smart file-watch rescan |
Community feedback will help shape priorities; please open an issue to discuss proposed changes.
---
## Experimental Features & Feedback
The new Rust intra-procedural CFG + taint engine is now enabled.
Expect rough edges: slightly slower scans, occasional false positives, limited language coverage.
Please open an issue for every crash, panic, or suspicious result; attach a minimal code snippet and mention the Nyx version.
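For readers new to the idea behind a CFG + taint engine, the sketch below shows a generic forward "may be tainted" analysis over a tiny control-flow graph. It is a textbook-style illustration, not Nyx's engine: the `Op` variants, the `Cfg` struct, and `tainted_sinks` are all hypothetical, and real engines track far more than variable names.

```rust
use std::collections::{BTreeSet, VecDeque};

/// What a CFG node does to data, in source / sanitizer / sink terms.
#[derive(Debug, Clone)]
enum Op {
    /// `var` receives untrusted input.
    Source { var: &'static str },
    /// `dst = f(src)`: `dst` becomes tainted exactly when `src` is.
    Assign { dst: &'static str, src: &'static str },
    /// `var` is cleaned in place.
    Sanitize { var: &'static str },
    /// `var` reaches a dangerous call.
    Sink { var: &'static str },
}

struct Cfg {
    ops: Vec<Op>,
    /// `succs[i]` lists the successor nodes of node `i`.
    succs: Vec<Vec<usize>>,
}

/// Returns the sink nodes whose argument may be tainted on at least one path
/// starting at `entry`.
fn tainted_sinks(cfg: &Cfg, entry: usize) -> BTreeSet<usize> {
    let n = cfg.ops.len();
    // taint_in[i] = variables that may be tainted on entry to node i.
    let mut taint_in: Vec<BTreeSet<&'static str>> = vec![BTreeSet::new(); n];
    let mut visited = vec![false; n];
    let mut hits = BTreeSet::new();
    let mut work = VecDeque::from([entry]);

    while let Some(i) = work.pop_front() {
        visited[i] = true;
        // Apply the node's effect to a copy of its incoming taint set.
        let mut out = taint_in[i].clone();
        match &cfg.ops[i] {
            Op::Source { var } => {
                out.insert(*var);
            }
            Op::Assign { dst, src } => {
                if out.contains(src) {
                    out.insert(*dst);
                } else {
                    out.remove(dst);
                }
            }
            Op::Sanitize { var } => {
                out.remove(var);
            }
            Op::Sink { var } => {
                if out.contains(var) {
                    hits.insert(i);
                }
            }
        }
        // Merge into each successor (set union) and revisit it if it changed.
        for &s in &cfg.succs[i] {
            let before = taint_in[s].len();
            taint_in[s].extend(out.iter().copied());
            if taint_in[s].len() != before || !visited[s] {
                work.push_back(s);
            }
        }
    }
    hits
}
```

Because branches are merged by set union, a sink is reported when any path reaches it with tainted data, even if another path sanitizes the value first. This is one reason such an analysis can produce the occasional false positive mentioned above.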
---
## Contributing
Pull requests are welcome. To contribute:
1. Fork the repository and create a feature branch.
2. Adhere to `rustfmt` and ensure `cargo clippy --all -- -D warnings` passes.
3. Add unit and/or integration tests where applicable (`cargo test` should remain green).
4. Submit a concise, well-documented pull request.
See `CONTRIBUTING.md` for full guidelines.
---
## License
Nyx is licensed under the **GNU General Public License v3.0 (GPL-3.0)**.
This ensures that all modified versions of the scanner remain free and open-source, protecting the integrity and transparency of security tools.
See [LICENSE](./LICENSE) for full details.