Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.3.0] - 2026-02-25
Added
- Configurable analysis rules -- users can define custom sources, sanitizers, and sinks per language via TOML config (`nyx.local`) or the new `nyx config` CLI. Config rules take priority over built-in rules, so project-specific sanitizers like `escapeHtml()` are recognized without code changes.
- `nyx config` CLI subcommand with four actions:
  - `show` -- print effective merged configuration as TOML
  - `path` -- print config directory path
  - `add-rule --lang <LANG> --matcher <NAME> --kind <KIND> --cap <CAP>` -- append a label rule to `nyx.local`
  - `add-terminator --lang <LANG> --name <NAME>` -- append a terminator function to `nyx.local`
- `--include-nonprod` CLI flag -- by default, findings in non-production paths (tests, vendor, benchmarks, examples, fixtures, build scripts, `*.min.js`) are now downgraded by one severity tier (High→Medium, Medium→Low). Pass `--include-nonprod` to restore original severity. Controlled by the `scanner.include_nonprod` config key.
- `SourceKind` enum in the taint engine -- taint findings now carry a `source_kind` field (`UserInput`, `EnvironmentConfig`, `FileSystem`, `Database`, `Unknown`) inferred from the source callee name and capabilities. Severity is based on source kind rather than hardcoded to High: filesystem and database sources produce Medium, user input and environment sources produce High.
- Configurable terminators -- functions like `process.exit()` can be declared as terminators per language; the CFG treats them as dead ends, preventing false positives on code after termination calls.
- Event handler callback suppression -- functions passed as arguments to configured event handler calls (e.g. `addEventListener`) are no longer flagged as unreachable code.
- Exec-path guard rules -- calls to `which`, `resolve_binary`, `find_program`, `lookup_path`, and `shutil.which` are recognized as guards for `SHELL_ESCAPE` sinks. If such a guard dominates a shell-exec sink, the `cfg-unguarded-sink` finding is suppressed.
- One-hop constant binding trace -- the constant-arg sink suppression now traces one hop through the CFG. If a sink's variable was defined by a node with no uses and no Source label, it is treated as constant. Fixes false positives on patterns like `cmd = "git"; subprocess.run([cmd, "status"])`.
- Evidence-based severity in cfg-only mode -- when taint analysis is not active (no global summaries and no taint findings), structural `cfg-unguarded-sink` findings without source-derived evidence are downgraded from Medium to Low.
- FileResponse ownership transfer -- file handles passed to consuming sinks (`FileResponse`, `StreamingHttpResponse`, `send_file`, `make_response`) are no longer flagged as resource leaks.
- Lock-not-released refinement -- mutex findings now require an explicit `.acquire()` or `.lock()` call on the acquired variable. Constructor-only patterns like `lock = threading.Lock()` without acquire no longer produce `cfg-lock-not-released`.
- Python `connect`/`cursor` exclusions -- `signal.connect`, `event.connect`, and `.register` are excluded from the Python db-connection acquire pattern, preventing false `cfg-resource-leak` findings on Django signal handlers and event registrations.
- `location.href` sink rules for JavaScript -- `location.href`, `window.location.href`, and `document.location.href` assignments are classified as `Sink(URL_ENCODE)`.
- `throw_statement` as terminator in JavaScript -- `throw` now terminates the current block in the CFG (mapped to `Kind::Return`), preventing false `cfg-error-fallthrough` findings after throw statements.
- `Cap::FMT_STRING` capability bit -- new bitflag (`0b0100_0000`) for format-string vulnerabilities, distinct from HTML injection. Sources using `Cap::all()` automatically match.
- Python taint sources -- `open`, `argparse.parse_args`, `urllib.request.urlopen`, `requests.get`, `requests.post` added as `Cap::all()` sources for broader attack-surface coverage.
- SARIF 2.1.0 output format (`-f sarif`) -- produces spec-compliant Static Analysis Results Interchange Format JSON on stdout. Includes tool metadata, deduplicated rule definitions with descriptions, severity-to-level mapping (High→`error`, Medium→`warning`, Low→`note`), and physical locations with relative paths. Suitable for GitHub Code Scanning, Azure DevOps, and other SARIF-consuming CI tools.
- Progress bars via `indicatif` -- file discovery, Pass 1, and Pass 2 each display a progress bar on stderr with file counts and ETA. Bars are automatically hidden when the output format is `json`/`sarif` or quiet mode is enabled. Index building also shows progress.
- Quiet mode (`output.quiet = true`) -- suppresses all status messages (config notes, "Checking...", "Finished in...") on stderr. Useful for CI pipelines and scripted invocations.
- Resource leak detection for Python, Ruby, PHP, JavaScript, and TypeScript -- new acquire/release pairs: Python (`open`/`.close`, `socket`/`.close`, `connect`/`.close`, `threading.Lock`/`.release`), Ruby (`File.open`/`.close`, `TCPSocket.new`/`.close`, `.lock`/`.unlock`), PHP (`fopen`/`fclose`, `mysqli_connect`/`mysqli_close`, `curl_init`/`curl_close`), JS/TS (`fs.open`/`fs.close`, `createReadStream`/`.close`).
- Walker config wired up -- `performance.max_depth`, `scanner.one_file_system`, `scanner.require_git_to_read_vcsignore`, and `scanner.excluded_files` are now enforced during directory walking (previously parsed but ignored).
- `database.vacuum_on_startup` -- when enabled, runs SQLite VACUUM before indexed scans to reclaim space.
- 31 new unit tests covering config round-trip, rule merging, classify extension, href classification, throw termination, terminator detection, config sanitizer suppression, Python/C++ precision, unreachable+unguarded dedup, resource leak detection, one-hop constant binding, exec-path guards, cfg-only severity downgrade, FileResponse ownership, lock constructor suppression, signal.connect exclusion, nonprod path detection, and severity downgrade.
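
To make the interaction between source-kind severity and the non-production downgrade concrete, here is a minimal sketch. The variant names and the severity mapping come from the entries above; the function names, the `Severity` enum shape, and the path heuristic are assumptions for illustration and are not Nyx's actual API.

```rust
/// Hypothetical mirror of the `SourceKind` variants named in this changelog.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    UserInput,
    EnvironmentConfig,
    FileSystem,
    Database,
    Unknown,
}

/// Hypothetical severity type with the three tiers the changelog mentions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Low,
    Medium,
    High,
}

/// Base severity per the mapping stated in the changelog.
pub fn base_severity(kind: SourceKind) -> Severity {
    match kind {
        SourceKind::UserInput | SourceKind::EnvironmentConfig | SourceKind::Unknown => {
            Severity::High
        }
        SourceKind::FileSystem | SourceKind::Database => Severity::Medium,
    }
}

/// One-tier downgrade. Assumption: Low stays Low, since the changelog
/// only lists High→Medium and Medium→Low.
pub fn downgrade(sev: Severity) -> Severity {
    match sev {
        Severity::High => Severity::Medium,
        Severity::Medium | Severity::Low => Severity::Low,
    }
}

/// Illustrative path heuristic only; the real detection rules are not
/// specified in the changelog.
pub fn is_nonprod_path(path: &str) -> bool {
    const MARKERS: [&str; 5] = ["tests", "vendor", "benches", "examples", "fixtures"];
    path.ends_with(".min.js") || path.split('/').any(|seg| MARKERS.contains(&seg))
}

/// Combine both rules: derive from source kind, then downgrade unless
/// the caller opted in to full severity for non-production paths.
pub fn effective_severity(kind: SourceKind, path: &str, include_nonprod: bool) -> Severity {
    let base = base_severity(kind);
    if !include_nonprod && is_nonprod_path(path) {
        downgrade(base)
    } else {
        base
    }
}
```

Under these assumptions, a `UserInput` finding in `tests/it.rs` would report Medium by default and High when `include_nonprod` is set.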
Changed
- `taint::Finding` struct -- added `source_kind: SourceKind` field. Code that constructs `Finding` directly must include this field.
- `AnalysisContext` struct -- added `taint_active: bool` and `analysis_rules` fields. Code that constructs `AnalysisContext` directly must include these fields.
- `ScannerConfig` struct -- added `include_nonprod: bool` field (default `false`). Deserialization is unaffected due to `#[serde(default)]`.
- `proto_pollution` AST pattern severity -- downgraded from High to Low. The AST-only pattern is a structural indicator; the taint engine separately produces High findings when attacker-controlled data flows to `__proto__`.
- `location_href_assignment` AST pattern -- constrained to require a known browser global object (`window`, `location`, `document`, `self`, `top`, `parent`, `frames`). Prevents `el.href = val` from matching; only `window.location.href = val` and similar patterns trigger the finding.
- Taint finding severity -- no longer hardcoded to High. Severity is now derived from `SourceKind`: UserInput/EnvironmentConfig/Unknown → High, FileSystem/Database → Medium.
- C/C++ sink reclassification -- `printf`/`fprintf` moved from `Sink(HTML_ESCAPE)` to `Sink(FMT_STRING)`. `std::cout`, `std::cerr`, `std::clog` removed from sinks entirely (output/logging, not injection vectors). `sprintf`/`strcpy`/`strcat` remain `Sink(HTML_ESCAPE)`.
- `classify()` now accepts an optional `extra: Option<&[RuntimeLabelRule]>` parameter; config-defined rules are checked first (higher priority) before built-in static rules.
- `build_cfg()`, `build_sub()`, and `push_node()` accept optional `LangAnalysisRules` for config-driven label classification, terminator detection, and event handler awareness.
- `find_guard_nodes()` and `is_guard_call()` now recognize config-defined sanitizers as guards with matching capability bits.
- `merge_configs()` union-merges analysis rules, terminators, and event handlers per language key with dedup.
- Assignment LHS classification now tries the full member expression text (e.g. `location.href`) before falling back to property-only (e.g. `innerHTML`), fixing false positives on `a.href` assignments.
- `handle_command()` now receives `config_dir` to support the `config` subcommand.
- Fused single-pass analysis -- AST-only mode now runs a single fused pass (`analyse_file_fused`) that parses each file and builds the CFG once, producing both function summaries and diagnostics. Previously every file was parsed twice (once for summary extraction, once for analysis). Taint mode uses the fused pass for Pass 1, eliminating redundant CFG construction during summary extraction.
- O(N²) → O(N) function-level dataflow sweep in CFG builder -- the lightweight dataflow sweep and return-node wiring in `build_sub` for `Kind::Function` now iterate only over nodes created within the current function scope (tracked via a snapshot of the node count) instead of scanning the entire graph. Eliminates quadratic scaling in files with many functions.
- Parallel summary merging -- `scan_filesystem` now uses rayon `fold`/`reduce` to build per-thread `GlobalSummaries` maps in parallel, then merges them in a binary reduce tree. Eliminates the serial `merge_summaries` bottleneck. Added `GlobalSummaries::merge()`.
- Redundant file I/O eliminated in indexed path -- files are now read once and hashed once per scan. Added `Indexer::should_scan_with_hash()` and `Indexer::upsert_file_with_hash()` to accept pre-computed hashes. Pass 2 uses `run_rules_on_bytes` with already-read bytes instead of re-reading from disk. Previously files could be read up to 4 times and hashed up to 3 times per indexed scan.
- SQLite mutex mode relaxed -- switched from `SQLITE_OPEN_FULL_MUTEX` (global serialization) to `SQLITE_OPEN_NO_MUTEX`. The r2d2 connection pool guarantees one-connection-per-thread safety; combined with WAL mode this allows concurrent readers without a global lock.
- Parallel JSON deserialization in `load_all_summaries` -- for large result sets (>256 summaries), JSON deserialization is now parallelized with rayon.
- Zero-allocation taint hashing -- `taint_hash()` replaced sorted-`Vec` + blake3 with an order-independent XOR-of-FNV scheme. Eliminates a heap allocation and sort per BFS edge in the taint engine.
- In-place taint transfer -- `apply_taint()` now mutates the taint map in place instead of cloning and returning a new `HashMap` per node visit. The BFS loop caches hash values and uses `std::mem::take` for the last successor to avoid unnecessary clones.
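
The order-independent XOR-of-FNV idea mentioned above can be sketched as follows. This is an illustration of the general technique using the standard 64-bit FNV-1a constants, not the code behind `taint_hash()`; the entry shape (a variable name plus a capability byte) and the function names are assumptions.

```rust
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a over `bytes`, continuing from `seed`.
fn fnv1a(bytes: &[u8], seed: u64) -> u64 {
    let mut h = seed;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hash a single (variable, capability-bits) entry. The entry shape is
/// a hypothetical stand-in for whatever the taint map actually stores.
fn entry_hash(var: &str, caps: u8) -> u64 {
    let h = fnv1a(var.as_bytes(), FNV_OFFSET);
    fnv1a(&[caps], h)
}

/// Combine per-entry hashes with XOR. XOR is commutative and
/// associative, so the result does not depend on iteration order and
/// no sorted intermediate buffer is needed.
pub fn order_independent_hash<'a, I>(entries: I) -> u64
where
    I: IntoIterator<Item = (&'a str, u8)>,
{
    entries
        .into_iter()
        .fold(0u64, |acc, (var, caps)| acc ^ entry_hash(var, caps))
}
```

One caveat of plain XOR combining is that two identical entries cancel each other out, so the scheme suits maps with unique keys rather than multisets.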
Fixed
- False positives on one-hop constant bindings -- `cmd = "git"; Command::new(cmd)` no longer triggers `cfg-unguarded-sink` because the variable is traced back to a constant definition.
- False positives from exec-path guards -- `resolve_binary(&bin); Command::new(bin)` is now recognized as guarded.
- False `cfg-resource-leak` on Django signal handlers -- `signal.connect(handler)` no longer matches the Python db-connection acquire pattern.
- False `cfg-lock-not-released` on Lock constructors -- `threading.Lock()` without `.acquire()` no longer produces a finding.
- False `cfg-resource-leak` on FileResponse -- `f = open(...); return FileResponse(f)` is recognized as ownership transfer.
- Inflated severity in cfg-only mode -- structural findings without taint evidence now correctly produce Low severity instead of Medium.
- `el.href = val` false positive in AST patterns -- the `location_href_assignment` pattern now requires a known browser global, eliminating matches on DOM element `.href` assignments.
- Structured output modes (`-f json`, `-f sarif`) now produce zero stderr noise -- config notes, "Checking …", and "Finished in …" messages are fully suppressed (not just redirected to stderr) so that `nyx scan -f json | jq` and CI SARIF upload work without extraneous output. Human-readable console format continues to show status messages.
- Console output column alignment -- severity tags are now bracketed and padded to a fixed display width (`[HIGH]`, `[MEDIUM]`, `[LOW]`) so that rule IDs align consistently regardless of severity. ANSI color codes are applied after width calculation, not before.
- `.href` false positives -- `el.href = "/about"` no longer triggers `location_href_assignment` or sink classification; only `location.href` (and `window.location.href`, `document.location.href`) match.
- Constant-arg sink false positives -- sinks whose arguments are all constants (no variable uses beyond the callee name) with no taint confirmation are now suppressed. Fixes false positives on patterns like `subprocess.run(["make","clean"])` and `printf("hello\n")`.
- Unreachable + unguarded dedup -- when both `cfg-unreachable-sink` and `cfg-unguarded-sink` fire on the same span, the unguarded finding is suppressed (unreachable is more specific).
- `std::cout` false positives -- `std::cout` is no longer classified as a sink, eliminating spurious findings on every C++ iostream print.
- Break/continue scope correctness -- `break` and `continue` inside loops now correctly wire to their enclosing loop header/exit. Previously, `break` in a `while`/`for` body created a dead-end node that left post-loop code unreachable, producing false `cfg-unreachable-*` findings. The If handler's no-else case also now correctly flows the false branch to subsequent code when the then-branch terminates (return/break/continue). True/False edge labels are applied to branch entry nodes rather than exit nodes, fixing `cfg-error-fallthrough` false positives on `if (err) { return; }` patterns.
- Preprocessor dangling-else CFG recovery -- `#ifdef`/`#endif` blocks that split an `if`/`else` across preprocessor boundaries no longer orphan subsequent code. The CFG block handler now recovers the frontier after preprocessor nodes, preventing false unreachable-code findings on code following `#ifdef ... #endif` blocks.
- Wrapper resource function recognition -- `curlx_fopen`, `curlx_fdopen`, `fdopen`, and `curlx_fclose` are now recognized as acquire/release functions for C file handles, eliminating false `cfg-resource-leak` findings on codebases (e.g. curl) that use wrapper functions around standard I/O.
- `freopen` false positive -- `freopen()` (and `curlx_freopen`) no longer triggers `cfg-resource-leak` findings. Previously `freopen` matched the `fopen` acquire pattern via `ends_with`; a new `exclude_acquire` field on `ResourcePair` filters out these false matches for both the file handle and file descriptor resource pairs.
- Struct field ownership transfer -- resource leak detection now recognizes ownership transfer via struct field assignment (`s->stream = fp`, `obj.field = ptr`). When an acquired resource is stored into a struct field downstream, the finding is suppressed since the receiving struct assumes lifetime responsibility.
- Linked-list/global insertion -- resource leak detection now recognizes linked-list insertion patterns (`p->next = list; list = p`) and global variable assignment as ownership transfers, eliminating false `cfg-resource-leak` findings on common C allocation-and-insert idioms.
- Removed incorrect `value_enum` attribute from CLI `--format` argument.
- Benchmark compilation error: `classify()` calls in `benches/scan_bench.rs` were missing the third `extra` parameter.
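
The column-alignment fix above depends on measuring the tag before adding color, because ANSI escape bytes would otherwise count toward the padded width. A minimal sketch of that ordering follows; the function name, the width constant, and the way the color code is passed in are hypothetical.

```rust
/// Width of the widest bracketed tag, "[MEDIUM]" (8 characters).
const TAG_WIDTH: usize = 8;

/// Build a severity tag padded to a fixed display width.
/// `ansi_color` is an optional SGR color code such as "31" (red).
pub fn severity_tag(label: &str, ansi_color: Option<&str>) -> String {
    let tag = format!("[{}]", label);
    // Padding is computed from the uncolored text, so escape sequences
    // never contribute to the measured width.
    let pad = " ".repeat(TAG_WIDTH.saturating_sub(tag.chars().count()));
    match ansi_color {
        Some(code) => format!("\x1b[{}m{}\x1b[0m{}", code, tag, pad),
        None => format!("{}{}", tag, pad),
    }
}
```

With this ordering, the colored and uncolored forms of a tag occupy the same number of visible columns, so whatever follows the tag starts at the same position.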
[0.2.0] - 2026-02-24
Added
- Cross-file taint analysis -- two-pass architecture: Pass 1 extracts `FuncSummary` per function (source/sanitizer/sink capabilities, taint propagation, callees), Pass 2 runs BFS taint propagation with cross-file callee resolution.
- CFG analysis engine with five detectors: unguarded sinks (`cfg-unguarded-sink`), auth gaps in web handlers (`cfg-auth-gap`), unreachable security code (`cfg-unreachable-*`), error fallthrough (`cfg-error-fallthrough`), and resource leaks (`cfg-resource-leak`).
- Cross-language interop -- taint flows across language boundaries via explicit `InteropEdge` structs without false-positive name collisions.
- Function summaries persisted to SQLite (`function_summaries` table) with arity, parameter names, capability bitflags, and callee lists.
- Multi-language CFG + taint support -- all 10 languages (Rust, C, C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript) now have `KINDS` maps, `RULES`, and `PARAM_CONFIG` for full CFG construction and taint analysis.
- Resource leak detection for C/C++ (malloc/free, fopen/fclose), Go (os.Open/Close, Lock/Unlock), Rust (alloc/dealloc), and Java (streams, connections).
- Finding scoring system -- numeric scores based on severity, proximity to entry point, path complexity, taint confirmation, and confidence multiplier.
- Analysis modes -- `Full` (default), `Ast` (`--ast-only`), and `Taint` (`--cfg-only`) selectable via CLI flags or `scanner.mode` config.
- `GlobalSummaries` with conservative merge: union caps, OR booleans, union param/callee lists on name collisions across files.
- Performance optimizations -- `_from_bytes` variants to read-once/hash-once, lock-free rayon parallelism, SQLite WAL + 8 MB cache + 256 MB mmap.
- Tracing instrumentation -- `tracing` spans on all pipeline phases (walk, pass1, merge, pass2, per-file ops, db_init).
- Benchmark suite -- criterion benchmarks in `benches/scan_bench.rs` with fixtures.
- 107 unit tests covering taint propagation, cross-file resolution, cross-language interop, CFG analysis, and summaries.
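
The conservative merge described above (union caps, OR booleans, union lists on name collisions) might look roughly like this. The struct layout and field names are assumptions chosen for illustration, not the real `FuncSummary` or `GlobalSummaries` definitions.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified per-function summary.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Summary {
    pub source_caps: u8,        // capability bitflags
    pub sink_caps: u8,          // capability bitflags
    pub propagates_taint: bool, // example boolean property
    pub callees: BTreeSet<String>,
}

impl Summary {
    /// Conservative merge: never drop a capability, flag, or callee.
    pub fn merge_from(&mut self, other: Summary) {
        self.source_caps |= other.source_caps; // union caps
        self.sink_caps |= other.sink_caps;
        self.propagates_taint |= other.propagates_taint; // OR booleans
        self.callees.extend(other.callees); // union lists
    }
}

/// Hypothetical stand-in for a name-keyed summary map.
#[derive(Debug, Default)]
pub struct Summaries {
    by_name: HashMap<String, Summary>,
}

impl Summaries {
    /// Insert a summary, merging on name collision. Merging into a
    /// default (empty) summary is the same as a plain insert.
    pub fn insert(&mut self, name: &str, summary: Summary) {
        self.by_name
            .entry(name.to_string())
            .or_default()
            .merge_from(summary);
    }

    /// Merge two maps; usable as the combining step of a reduce.
    pub fn merge(mut self, other: Summaries) -> Summaries {
        for (name, summary) in other.by_name {
            self.insert(&name, summary);
        }
        self
    }

    pub fn get(&self, name: &str) -> Option<&Summary> {
        self.by_name.get(name)
    }
}
```

Because every step only adds information, the merged result is independent of the order in which files are processed, at the cost of possibly over-approximating what a same-named function does.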
Changed
- Bumped all dependencies to latest compatible versions.
- `Cap` bitflags expanded: `ENV_VAR`, `HTML_ESCAPE`, `SHELL_ESCAPE`, `URL_ENCODE`, `JSON_PARSE`, `FILE_IO`.
- `classify()` in labels uses zero-allocation byte-level case-insensitive comparisons.
- Indexed scans now always re-analyze all files in Pass 2 when taint is enabled (conservative: global summaries may have changed even if a file didn't).
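
As an illustration of what zero-allocation, byte-level, case-insensitive comparison can mean in Rust, the sketch below uses the standard library's `eq_ignore_ascii_case` on byte slices instead of building lowercased `String`s. The helper names are hypothetical and this is not the body of `classify()`.

```rust
/// Case-insensitive (ASCII) equality without allocating a lowercased copy.
pub fn matches_ignore_case(callee: &str, rule: &str) -> bool {
    callee.as_bytes().eq_ignore_ascii_case(rule.as_bytes())
}

/// Case-insensitive (ASCII) suffix check on raw bytes.
pub fn ends_with_ignore_case(callee: &str, suffix: &str) -> bool {
    let (c, s) = (callee.as_bytes(), suffix.as_bytes());
    // Compare only the tail of `callee` that has the suffix's length.
    c.len() >= s.len() && c[c.len() - s.len()..].eq_ignore_ascii_case(s)
}
```

The comparison folds ASCII letters only, which is normally sufficient for identifiers in source code.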
Fixed
- Clippy `ptr_arg` lint in perf tests (`&PathBuf` -> `&Path`).
[0.2.0-alpha] - 2025-06-28
Added
- Experimental intra‑procedural CFG + taint analysis for Rust. Nyx now builds a control‑flow graph, applies data‑flow rules, and flags unsanitised Source → Sink paths (e.g. env::var → Command::new).
- O(1) node‑kind lookup via per‑language PHF tables for zero‑cost dispatch.
- Six unit tests covering conditionals, loops, sanitizers, and multiple sources.
- Debug channel target=cfg (use RUST_LOG=nyx::cfg=debug) to inspect generated graphs.
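
For readers unfamiliar with the Source → Sink terminology, the sketch below shows the shape of the `env::var` → `Command::new` example next to a variant that routes the value through an allow-list check. The `TOOL` variable and the `allow_listed` helper are hypothetical, and whether Nyx treats such a helper as a sanitizer depends on its rules.

```rust
use std::env;
use std::process::Command;

/// Unsanitised path: the environment value (Source) reaches
/// `Command::new` (Sink) without any check in between.
pub fn unsanitised() -> Option<Command> {
    let tool = env::var("TOOL").ok()?; // Source
    Some(Command::new(tool)) // Sink
}

/// Hypothetical sanitizer: only known program names pass through.
pub fn allow_listed(tool: &str) -> Option<&'static str> {
    ["git", "make"].iter().copied().find(|t| *t == tool)
}

/// Sanitised path: the value is replaced by an allow-listed constant
/// before it reaches the sink.
pub fn sanitised() -> Option<Command> {
    let tool = env::var("TOOL").ok()?; // Source
    let safe = allow_listed(&tool)?; // check between Source and Sink
    Some(Command::new(safe)) // Sink
}
```

Neither function spawns a process; each only constructs the `Command`, which is enough to show the data flow.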
Fixed
- Fixed a release-pipeline bug where the Windows job tried to call `zip`, which PowerShell does not provide.
[0.1.1-alpha] - 2025-06-25
Fixed
- Fixed a bug where the `scan --no-index` command would not respect the `max_results` config setting (#1)
Added
- Integration tests covering indexing and scanning pipelines (#3, #4, #5, #8)
[0.1.0-alpha] - 2025-06-25
Added
- Initial alpha release of Nyx CLI tool
- Multi-language AST pattern scanning via `tree-sitter` for Rust, C/C++, Java, Go, PHP, Python, Ruby, TypeScript, JavaScript
- `scan` command: filesystem walker, pattern execution, console output
- `index` command: build, rebuild, and status reporting of SQLite-backed index
- `list` command: list indexed projects with optional verbosity
- `clean` command: remove one or all project indexes
- Configuration system with `nyx.conf` (generated) and `nyx.local` (user overrides)
- Default severity levels: High, Medium, Low
- Unit tests for core modules (config, ext, project utils)