Release/0.5.0 (#35)

* feat: Introduce function-scoped variable interning for state analysis with new tests and fixtures
* feat: Add Phase 26 symbolic execution enhancements with bitwise operator support, abstract interpretation refinements, and new taint analysis tests
* feat: Refine state analysis to handle factory-pattern resource returns with mixed-path tests and leak detection enhancements
* feat: Add Phase 27 debug views with symbolic execution, abstract interpretation, SSA, and call graph viewers; integrate with debug layout and styles
* feat: Add Phase 31 type-qualified symbolic resolution with receiver-based callee disambiguation and testing
* feat: Extend symbolic execution with state iteration, enhanced debug views, and debounced input handling
* feat: Add Phase 13 resource and auth pattern extensions with new tests and fixtures
* feat: Introduce CFG debug graph renderer with compact mode, toolbar, and DAG layout integration
* feat: Add Phase 28 encoding and decoding transform modeling with structural symex enhancements and new taint analysis tests
* feat: Extend abstract interpretation with type facts and constant value tracking in debug views and server logic
* feat: Add linear path handling and witness extraction to symbolic execution with Phase 28 transform mismatch detection
* feat: Refine Go auth and sanitizer handling with enhanced rules, state updates, and benchmark improvements
* feat: Enable auth-state analysis by default and update relevant tests in benchmark config
* test: Update state_tests to reflect default enablement of auth-state analysis and add auth suppression test
* docs: update CHANGELOG.md
* feat: Introduce per-index taint tracking in `HeapState` with `HeapSlot`, overflow handling, and revised SSA transfers
* feat: Introduce C/C++ language labels and refine heap state tracking in SSA transfers
* feat: Implement per-index array slot tracking in symbolic heap with overflow collapse
* feat: Add implicit definition handling for uninitialized declarations in SSA value allocation
* feat: Refactor function parameters and constants for improved clarity and maintainability
* refactor: Reorder module imports and improve formatting for consistency
* refactor: Fix formatting erorrs
* refactor: Fix clippy warnings
* refactor: Fix fmt warnings (again)
* chore: Update dependencies and improve feature configuration
* Add comprehensive tests for undertested modules (#36) (COPILOT)
* Add comprehensive tests for undertested modules
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
  Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083
* Add comprehensive tests for ext, project, walk, and errors modules
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
  Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/f3fc877e-f386-49ba-9793-fc93d3805083
  ---------
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
* chore: Update dependencies and improve feature configuration
* fix: formatting errors in new tests
* chore: Update license list in about.toml
* chore: made functions input inline
* chore: updated cfg graph to take up the full page
* chore: add Prettier configuration and update code formatting
* Add frontend test suite with Vitest (111 tests) (#37)
* Add Vitest test suite for frontend - 111 tests across utils, components, hooks, and graph utilities
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
  Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/7cf0dba2-ecff-4740-ba4d-92717e74a0b7
* ci: add frontend test step to CI workflow
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
  Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/5bc0ac9f-0a32-4d03-9cb7-7a15aea53fca
  ---------
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
* chore: simplify array initialization in test files for consistency
* ran typecheck
* feat: add AnalysisWorkspace component and integrate it into CfgViewerPage
* feat: update routing in AppLayout and improve empty state message in ExplorerPage
* feat: enhance scan progress tracking with additional metrics and stages
* feat: update license information and add license check script
* feat: implement cross-file symbolic execution with callee body persistence
* feat: replace dagre graphs with Graphology + ELK + Sigma for more advanced call stack and cfg rendering
* feat: ensure CFG function view is scoped to the selected function, preventing bleed into sibling functions
* feat: enhance resource tracking with proxy method summaries and improve finding extraction
* feat: add terminal function exit detection for accurate resource leak analysis
* feat: add warnings for loops and functions without bodies to improve error recovery
* feat: update lambda expression handling to ensure proper function classification and control flow
* feat: remove bounded formatting/string ops and add JSON.parse sanitizer for improved data handling
* feat: add inline return taint analysis and regression tests for improved security checks
* feat: add engine version management and migration handling for database schema updates
* feat: enhance first_call_ident to skip nested function bodies and add regression tests
* feat: enhance callee name resolution with two-segment normalization and disambiguation
* feat: add cross-file context flags and debug assertions for taint analysis
* feat: refactor taint analysis structure to unify context handling and improve clarity
* feat: enhance dead code elimination to preserve Sink, Source, and Sanitizer labels with new tests
* docs: updated CHANGELOG.md
* fmt: formatting fixes
* fix: fixed frontend formatting and lint warnings
* fix: optimized ci
* fix: optimized ci
* Add comprehensive multi-file test coverage to Nyx (#38)
* Initial checklist for multi-file test suite expansion
  Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
* Add 12 new multi-file test fixtures with TP/TN/near-miss coverage
  Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/e550cb88-9767-4442-94d4-101bf5bb0e23
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
* deleted root repo
* rebuilt to test for regressions
  ---------
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
  Co-authored-by: elipeter <elicpeter@gmail.com>
* feat: enhance import alias resolution and taint tracking
* feat: implement security hardening with CSRF protection and path validation
* feat: add support for import alias bindings in Python, PHP, and Rust
* feat: enhance CFG analysis modes and improve code readability
* feat: add detection for parameterized SQL queries to enhance security
* feat: add safe internal redirect handling and enhance session destroy validation
* feat: implement security improvements by addressing vulnerabilities in execAsync, session management, and file downloads
* feat: enhance taint detection by adding support for inline source member expressions in call arguments
* feat: implement pre-emission of Source nodes for inline source member expressions in call arguments
* feat: add support for Throw statement in control flow and error handling
* feat: add debug and echo endpoints with potential information leakage
* feat: implement internal redirect suppression and enhance taint detection
* feat: implement module alias tracking for dynamic dispatch in JS/TS
* feat: add authorization analysis module with Express support
* feat: add authorization analysis module with Express support
* feat: add tests for admin guard requirements and clean checks in authorization analysis
* feat: integrate Koa and Fastify frameworks into authorization analysis
* feat: add Flask and Django support to authorization analysis module
* feat: add support for Rails and Sinatra frameworks in authorization analysis
* feat: add support for Axum, ActixWeb, and Rocket frameworks in authorization analysis
* feat: add support for ActixWeb, Axum, and Rocket frameworks in authorization analysis
* feat: add support for Rails and Sinatra in authorization analysis
* chore: add .DS_Store to .gitignore
* refactor: simplify conditional checks and improve readability in multiple files
* refactor: update usage of Option methods for improved clarity and consistency
* refactor: improve code readability by simplifying conditional checks and formatting
* refactor: improve code formatting and readability by simplifying conditional checks
* refactor: simplify conditional checks and improve readability in multiple files
* refactor: simplify conditional checks in axum.rs for improved readability
* feat: add CodeQL analysis configuration for enhanced security scanning
* test: add comprehensive tests for `src/output.rs` SARIF builder (#39)
* chore: start test coverage improvement work
  Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
* test: add comprehensive tests for src/output.rs SARIF builder
  Agent-Logs-Url: https://github.com/elicpeter/nyx/sessions/cd7ff398-134e-4728-a5e7-0353a0744423
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
* refactor: improve code formatting and readability in output.rs
  ---------
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: elicpeter <54954007+elicpeter@users.noreply.github.com>
  Co-authored-by: elipeter <elicpeter@gmail.com>
* refactor: improve code formatting and readability in output.rs
* Potential fix for code scanning alert no. 210: Uncontrolled data used in path expression
  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Potential fix for code scanning alert no. 211: Uncontrolled data used in path expression
  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* refactor: enhance triage file path handling with improved error management and validation
* refactor: updated func summaries for richer detail
* refactor: update SSA summary extraction to use canonical FuncKey for distinct entries
* refactor: enhance callee metadata structure to support arity, receiver, and qualifier for better overload resolution
* refactor: add support for keyword arguments in function calls and enhance receiver extraction for method-style calls
* refactor: implement new Flask routes for safe and unsafe shell command execution
* refactor: separate receiver handling in SSA operations and enhance taint propagation
* refactor: improve arity handling by using arg_uses for positional argument count and enhance witness scoring for tainted arguments
* refactor: implement auth decorator extraction and classification for multiple languages
* refactor: enhance Rust module path resolution and use map handling for cross-file disambiguation
* refactor: introduce CalleeQuery struct for structured callee resolution and enhance resolver logic
* refactor: implement same-file identity collision handling for `runTask` to ensure correct resolver behavior
* refactor: standardize default struct initialization across multiple files
* feat: add scripts for formatting checks and auto-fixes with test summaries
* refactor: simplify character splitting and enhance namespace qualifier handling
* refactor: improve documentation clarity and enhance code readability in resolver logic
* refactor: replace default struct initialization with explicit field assignments for clarity
* feat: enhance anonymous function naming by deriving context-based bindings
* refactor: streamline match expressions for improved readability and performance
* refactor: streamline match expressions for improved readability and performance
* refactor: replace loop with while let for improved clarity and performance
* feat: add SSA constant propagation support to analysis context for improved accuracy
* feat: add SSA constant propagation support to analysis context for improved accuracy
* feat: implement shell metacharacter validation and bounded-length checks in Rust analysis
* feat: add static map analysis for command injection suppression and type safety
* refactor: simplify match statements and reduce line breaks for improved readability
* feat(summary): phase 1/5 SinkSite data model for primary sink-location attribution
  Introduce SinkSite (file_rel, line, col, snippet, cap) carrying the primary sink source-location through function summaries. Swap SsaFuncSummary.param_to_sink and FuncSummary.param_to_sink from a coarse Cap map to a deduped SmallVec<[SinkSite; 1]> per parameter, with a backward-compatible cap_sites() helper and serde defaults so pre-phase-1 on-disk rows continue to deserialise cleanly.
  Extraction: SinkSiteLocator bundles the tree/bytes/file_rel needed by extract_ssa_func_summary; ParsedFile::extract_ssa_artifacts wires the locator in for the persisted pass-1 path, while pass-2 intra-file transient summaries fall back to cap-only sites (behavior unchanged).
  Merge: GlobalSummaries::insert now unions sink sites with (file_rel, line, col, cap) dedup via shared union_param_sink_sites helper.
  Database: JSON-serialised summary columns carry the new shape automatically; no schema change needed.
  Phase 2 will consume SinkSite in build_taint_diag() to overwrite the caller-site Finding.line with the callee's sink line when resolved via summary.
  Phase 1 keeps behavior unchanged: scanning tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs still produces the same (wrong) line 10 finding.
  Adds round-trip tests covering SinkSite solo, SsaFuncSummary with sink sites, legacy-JSON default handling for both summary types, and merge dedup.
  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(taint): phase 2/5 thread SinkSite into SsaTaintEvent and Finding
  Plumb Phase 1's SinkSite through the event pipeline into Findings, no output change yet.
  SsaTaintEvent gains `primary_sink_site: Option<SinkSite>`; when the main or callback sink-emission path has non-empty `param_to_sink_sites`, filter to sites whose `(line != 0) && (cap ∩ sink_caps != ∅)` and emit one event per distinct site — the multi-primary collapse keeps each downstream Finding single-primary.
  Resolution: ResolvedSummary and SinkInfo gain mirror `param_to_sink_sites` fields, populated from `SsaFuncSummary.param_to_sink` (SSA + callback paths) and `FuncSummary.param_to_sink` (global paths). Label, local-summary, and interop resolution paths leave the field empty — they only ever had cap-level info to begin with.
  Finding: new `primary_location: Option<SinkLocation>` with `file_rel/line/col`. `ssa_events_to_findings` maps `event.primary_sink_site` → `Finding.primary_location`, filtering cap-only sites (`line == 0`) to `None` so the (0,0) sentinel never leaks to formatters. Dedup key extended with the primary location so multi-site events aren't collapsed back together.
  Invariants (debug_assert!):
  * every SinkSite reaching emission has `line != 0 && cap ∩ sink_caps != ∅` — enforced by the pick_primary_sink_sites* filters;
  * every populated Finding.primary_location has `line != 0` AND non-empty `file_rel` — the cap-only → None translation upstream guarantees this.
  Deliberately independent of `uses_summary`: that flag tracks whether the *taint chain* used a summary, whereas primary attribution requires only that the *sink* itself was summary-resolved.
  A local source reaching a cross-file sink produces `uses_summary=false` alongside a populated primary_location — documented on Finding.primary_location, covered by `cross_file_sink_finding_carries_primary_location`.
  build_taint_diag, SARIF/JSON/explanation formatters, and the benchmark scorer remain untouched: finding.line still comes from `cfg_graph[finding.sink]`, so cmdi_indirect.rs still reports line 10 and the benchmark's rs-cmdi-003 row still shows FN in the LOC column.
  Tests: `cross_file_sink_finding_carries_primary_location` (proves plumbing via a synthetic FuncSummary carrying a SinkSite at 42:5) and `cross_file_sink_cap_only_site_leaves_primary_location_none` (regression guard against cap-only sites surfacing).
  All 1566 lib tests + integration tests pass.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(output): phase 3/5 consume primary sink location in diag + SARIF
  When a finding's primary_location (populated in phase 2 from a callee summary's SinkSite) names the dangerous instruction inside a callee body, attribute the diagnostic line to that location instead of the caller's call site. The call site is demoted to a Call step in flow_steps, and a synthetic Sink step at the primary location is appended so analysts still see the full trace.
  Changes:
  - Add scan_root parameter to build_taint_diag so file_rel can be resolved back to an absolute path via a shared resolve_file_rel helper. Empty file_rel (single-file scans where namespace == "") resolves to the file under analysis.
  - Extend SinkLocation with snippet, carried from the upstream SinkSite so the formatter needs no second file read.
  - Relax the ssa_events_to_findings debug_assert to allow empty file_rel, which is valid when scan root equals the file itself.
  - SARIF: emit data-flow as codeFlows[0].threadFlows[0].locations[]; locations[0] already reflects the primary sink position via the updated diag line/col.
  Acceptance: scan on tests/benchmark/corpus/rust/cmdi/cmdi_indirect.rs now reports line 5 (Command::new) as the primary sink, with the call site at line 10 visible in flow_steps.
  Two expect.json fixtures updated (must_match line_range widened):
  - javascript/taint/context_sensitive_call: 12-14 -> 7-14 (line 8 is the real sink inside run()).
  - rust/cfg/closure_async: 10-10 -> 10-11 (line 11 is Command::new inside the closure).
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(bench): phase 4/5 validate primary sink attribution across corpus
  Extend the benchmark scorer and ground truth to lock in phase 3's primary-location behavior, and add fixtures that exercise the new capability end-to-end.
  Scorer (tests/benchmark_test.rs):
  - Add optional `expected_call_site_lines: Option<Vec<[usize; 2]>>` on Case. When present, score_location_level additionally requires at least one flow_step in the finding's evidence trace to fall within ±2 of the call-site range. When absent, the check is skipped — fully forward-compatible with existing fixtures.
  - Retain ±2 tolerance on expected_sink_lines (compared against the now-primary Diag.line post-phase-3).
  Ground truth edits:
  - rs-cmdi-cross-001: expected_sink_lines [8,8] -> [9,9]. Line 8 is the transform::wrap call site (a cross-file propagator, not a sink); line 9 is Command::new, the real sink. The ±2 tolerance happened to mask this stale attribution but it was semantically wrong — phase 4 is the right time to correct it. Also adds expected_call_site_lines [8,8] so the new field is exercised on an existing cross-file case.
  - rs-cmdi-003: adds expected_call_site_lines [10,10] (run_cmd call). This fixture's sink (Command::new inside run_cmd at line 5) was the motivating case for phases 1-3; adding the call-site assertion guards against regression to caller-line attribution.
  New fixtures:
  - rust/cmdi/cmdi_indirect_multisink.rs (rs-cmdi-009): helper run_both takes two tainted params and invokes two Command sinks on consecutive lines. Locks in that primary line lands inside the helper (lines 5-6), not at the caller (line 12). Notes document that SinkSite is currently one-per-callee so both findings today collapse onto the first sink; expected_sink_lines=[5,6] and expected_call_site_lines=[12,12] stay valid either way.
  - python/cmdi/cross_indirect_sink/{app.py,helper.py} (py-cmdi-cross-004): sink os.system lives in helper.py (cross-file), caller in app.py reads env source and calls run_cmd. Verifies phase 3's cross-file primary attribution: Diag.path = helper.py, Diag.line = 5, with app.py:7 recorded in flow_steps as a Call step.
  Acceptance:
  - `cargo test --test benchmark_test -- --ignored --nocapture` passes.
  - rs-cmdi-003 is TP/TP/TP (the target flip FN->TP at LOC). All pre-existing TP/TP/TP fixtures remain TP/TP/TP; 2 new fixtures are TP/TP/TP.
  - Aggregate rule-level: TP=158 FP=10 FN=1 TN=97, P=0.940 R=0.994 F1=0.966 on the 266-case corpus (was TP=156 FP=10 FN=1 TN=97 on 264 pre-phase-4, delta is the +2 new cases both resolving TP).
  - Full `cargo test` green (1566 lib tests + all integration tests).
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(taint): phase 5/5 lock Finding.primary_location contract via regression test
  Add a regression test in src/taint/ssa_transfer.rs that wires up a synthetic SsaFuncSummary with a SinkSite at other.rs:42:10 and drives the three emission stages (pick_primary_sink_sites → emit_ssa_taint_events → ssa_events_to_findings) against a minimal caller SSA body. Asserts the resulting Finding.primary_location is exactly that triple.
  The existing integration tests in src/taint/tests.rs cover the coarse FuncSummary path end-to-end through analyse_file.
  This test locks in the lower-level SSA-side plumbing so a future refactor that silently drops the site between pick → emit → findings fails here rather than only at the benchmark layer.
  Also refreshes tests/benchmark/results/latest.json (timestamp only; rs-cmdi-003 remains TP/TP/TP and the aggregate P/R/F1 are unchanged from phase 4).
  Closes the primary sink-location attribution feature (phases 1-5/5):
  * Phase 1 — SinkSite data model on summaries.
  * Phase 2 — SinkSite threaded into SsaTaintEvent and Finding.
  * Phase 3 — diag + SARIF consume primary_location.
  * Phase 4 — benchmark validates primary_call_site_lines across corpus.
  * Phase 5 — regression test locks the event→finding contract.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: clean up formatting and improve readability in multiple files
* refactor: simplify type definition for deduplication key in findings
* test(harness): add must_not_match expectation for FP regression guards
  Extends ExpectedFinding with must_not_match field that asserts a diagnostic must NOT fire — presence is a hard failure. Non-consuming scan so it coexists with must_match entries on the same rule_id. Adds forbidden_violations accumulator and updates summary line.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(regression): update expectations to ensure must_not_match for various taint and resource leak rules
* feat: implement auto-seeding for JS/TS handler parameters to enhance taint tracking
* feat: update switch statement handling to improve control flow analysis
* feat: implement promisify alias handling for JS/TS to enhance taint tracking
* feat: enhance taint tracking by refining expectation handling and adding mode filtering
* feat: refine SQL handling in stream processing and enhance auto-seeding for handler parameters
* feat: update taint tracking rules to enforce full mode matching and improve flow analysis
* feat: enhance Ruby subshell handling to improve taint tracking and flow analysis
* feat: update xss_response expectations to refine taint flow analysis and enhance regression guarding
* feat: refine framework detection and update expectation handling for Echo and Sinatra
* feat: implement max_count for taint tracking expectations and deduplicate findings
* feat: add strict_unexpected handling for taint-unsanitised-flow in expectation files
* feat: enhance deduplication of taint-unsanitised-flow findings by collapsing based on line and severity
* feat: add strict_unexpected handling for taint-unsanitised-flow in multiple expectation files
* feat: add structural invariant checks for SSA bodies
* feat: ensure deterministic phi emission order using BTreeSet
* feat: enhance handling of terminators to ensure authoritative flow through successor edges
* feat: enhance Goto terminator handling to ensure all successors are marked executable
* feat: refactor code for improved readability and organization
* feat: simplify predicate checks and enhance readability in SSA handling
* feat: implement per-file parse timeout and enhance file size handling
* feat: migrate analysis engine toggles from environment variables to configuration file
* feat: remove unnecessary whitespace in hostile_input_tests.rs
* feat: remove unnecessary whitespace in hostile_input_tests.rs
* feat: update dependencies and enhance documentation on language maturity
* feat: enhance security headers and improve request body limits
* feat: implement sink capability bits for deduplication and enhance evidence tagging
* feat: implement dynamic activation handling for gated sinks and enhance validation logic
* feat: enhance configuration documentation and clarify inline analysis cache behavior
* feat: implement panic recovery during analysis to continue scans past errors
* feat: add expectations configuration for taint analysis and performance metrics
* feat: enhance error handling and logging during file reading and mutex locking
* feat: add cross-file body loading tests and plumbing for CF-1 phase
* feat: implement cross-file k=1 context-sensitive inline taint analysis with new tests and fixtures
* feat: implement indexed-scan parity in cross-file inline analysis with new dropdown and copy functionality
* feat: enhance classification span handling in CFG and AST for improved source attribution
* feat: add new Express routes for handling user input and telemetry data
* feat: implement ternary expression handling in CFG with diamond structure for JS/TS
* feat: implement Phase CF-3 abstract-domain transfer channels in summaries
* feat: add support for string-prefix transfer in cross-file calls and update tests
* docs: reduce RESULTS.md doc size
* feat: implement Phase CF-4 per-return-path summary decomposition with tests
* feat: update parameter handling in pass1 and refactor SsaFuncSummary initialization
* feat: implement Phase CF-5 for cross-file SCC joint fixed-point convergence with new flags and tests
* feat: implement Phase CF-6 with parameter-granularity points-to summaries and associated tests
* refactor: update comments and documentation for clarity and consistency
* style: format code for consistency and readability
* refactor: simplify verdict handling and improve edge checking logic
* refactor: optimize path and identifier collection by avoiding unnecessary cloning
* chore: update Cargo.toml for Rust version 1.85 and add ignored files; modify CHANGELOG and README for clarity on state analysis defaults
* refactor: update documentation and improve clarity in configuration files
* refactor: update documentation and improve clarity in configuration files
* feat: add JS/TS pass-2 convergence tests and expectations configuration
* feat: add Phase 5 regression tests for inline cache origin attribution and update related logic
* feat: implement Phase 7 deduplication and alternative path linking for taint findings
* feat: implement structural DFS index for anonymous functions and update naming conventions
* feat: add Phase 8 regression tests for container-element taint in JS and Python
* feat: add engine-depth profiles and explain-engine option for CLI
* feat: update expectations and add new README fixtures for multi-file scan regression
* feat: implement Phase 11 callback-alias and factory patterns with regression tests
* feat: implement Terminator::Switch for multi-way dispatch and add regression tests
* feat: add real-CVE benchmark fixtures for CVE-2023-48022, CVE-2019-14939, and CVE-2023-26159 with corresponding patched variants
* refactor: extract cfg and ssa_transfer to submodules
* refactor: cargo fmt
* refactor: remove unnecessary blank line in cfg_tests.rs
* refactor: remove unnecessary planning file
* chore: update Rust version to 1.88 and bump dependencies in Cargo files
* feat: enhance triage UI with new layout and controls, update README for clarity
* feat: enhance triage UI with new layout and controls, update README for clarity
* chore: remove outdated section from README for version 0.5.0
* docs: improve clarity and consistency in README content
* chore: add "GPL-3.0-or-later" to license options in about.toml
* chore: update license handling in about.toml and check-licenses.mjs
* style: format code for improved readability in TriagePage component
* style: format code for improved readability in TriagePage component
* chore: enhance license handling and improve body_id scoping in seed lookup
* feat: introduce owner and parent body IDs for enhanced seed scoping
* feat: implement direction-aware engine provenance with new CLI flag for strict CI gating
* feat: add Undef SSA operation for improved control-flow handling
* style: improve code formatting for consistency and readability in multiple files
* feat: add 16-function chain SCC across multiple files for enhanced analysis
* style: simplify code formatting for improved readability in multiple files
* fix: update CapHitReason default implementation and improve README clarity
* docs: enhance README with detailed explanations of taint analysis and limitations
* docs: refine README for clarity and consistency in taint analysis section
* style: improve code formatting for better readability in NewScanModal and scans
* fix: update cargo-about command to use --offline for deterministic license generation
* fix: update cargo-about command to use --offline for deterministic license generation
* ci: add step to prime cargo registry cache for deterministic license generation
* feat: add support for non-sink collections in authorization analysis
* feat: enhance authorization checks with row-level ownership equality and binding tracking
* feat: implement self-scoped user handling and enhance ownership checks
* refactor: simplify assertions and formatting in authorization analysis tests
* fix: normalize line endings in THIRDPARTY-LICENSES.html generation and update README with AI disclosure
* docs: update AI disclosure section for clarity and conciseness
* feat: add AI Contribution Policy and update contributing guidelines for AI assistance disclosure
* feat: enhance authorization analysis with SSA-derived variable type classification
* feat: implement auth_finding_to_diag function for enhanced security diagnostics
* feat: add args_value_refs to CallSite struct for enhanced argument tracking
* feat: add args_value_refs to CallSite struct for enhanced argument tracking
* feat: add direction-aware engine provenance with LossDirection classification and new CLI flag
* feat: simplify strip_cap_from_call_args call by removing unnecessary line breaks
* feat: enhance error message handling in cli_validation_tests for better Windows compatibility
* feat: optimize release profile settings in Cargo.toml and update CodeQL configuration
* feat: enhance release build process with SBOM generation and SLSA provenance
* feat: update actions/checkout and actions/setup-node to v6, enhance CLI options, and improve auth-check summaries
* feat: introduce PathFact handling for path safety checks and rejection logic
* feat: introduce PathFact handling for path safety checks and rejection logic
* feat: update benchmark data and enhance path sanitization logic with new safety checks
* feat: document AI assistance in frontend UI development and human review process
* feat: add return path facts for enhanced path safety checks and update documentation
* chore: update release date for version 0.5.0 in CHANGELOG.md
* chore: clean up ci.yml by removing outdated comments and clarifying steps
* feat: implement cross-language path sanitizers and validators for enhanced security
* feat: enhance SSA value usage tracking by including block terminators and improve path safety checks
* feat: enhance switch statement handling by adding per-case path constraints and support for exclusive cases
* refactor: simplify conditional formatting and improve code readability in executor and lower modules
* feat: add vulnerable examples for various languages demonstrating authentication and sanitization issues
* feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers
* feat: enhance actor context recognition for self-actor identifiers and add support for global non-sink receivers
* feat: add transform classifiers for Java, Go, and Ruby with corresponding tests
* refactor: clarify comments on reassign-to-constant idiom and sink behavior in guards.rs
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-07-03 20:41:00 +02:00 · 2026-04-25 17:59:11 -04:00 · 2026-04-25 17:59:11 -04:00 · 41128177d2
commit 41128177d2
parent c4ce08b452
2144 changed files with 201812 additions and 8927 deletions
--- a/docs/detectors/cfg.md
+++ b/docs/detectors/cfg.md
@@ -1,161 +1,130 @@
-# CFG Structural Analysis
+# CFG structural analysis

-## Summary
+Nyx builds an intra-procedural control-flow graph per function and checks structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error paths terminate before reaching dangerous code.

-Nyx builds an intra-procedural control-flow graph (CFG) for each function and analyzes structural properties: whether sinks are guarded by sanitizers or validators, whether web handlers check authentication, whether resources are released on all exit paths, and whether error-handling code terminates properly.
-
-These detectors use **dominator analysis** — they check whether a guard node dominates (must execute before) a sink node on the CFG.
+These detectors use dominator analysis. A guard dominates a sink when the guard must execute before the sink on every path from entry.

 ## Rule IDs

-| Rule ID | Severity | Description |
-|---------|----------|-------------|
-| `cfg-unguarded-sink` | High/Medium | Sink reachable without a dominating guard or sanitizer |
-| `cfg-auth-gap` | High | Web handler reaches privileged sink without auth check |
-| `cfg-unreachable-sink` | Medium | Dangerous function in unreachable code |
-| `cfg-unreachable-sanitizer` | Low | Sanitizer in unreachable code |
-| `cfg-unreachable-source` | Low | Source in unreachable code |
-| `cfg-error-fallthrough` | High/Medium | Error check doesn't terminate; dangerous code follows |
-| `cfg-resource-leak` | Medium | Resource acquired but not released on all exit paths |
-| `cfg-lock-not-released` | Medium | Lock acquired but not released on all exit paths |
+| Rule ID | Severity |
+|---|---|
+| `cfg-unguarded-sink` | High/Medium |
+| `cfg-auth-gap` | High |
+| `cfg-unreachable-sink` | Medium |
+| `cfg-unreachable-sanitizer` | Low |
+| `cfg-unreachable-source` | Low |
+| `cfg-error-fallthrough` | High/Medium |
+| `cfg-resource-leak` | Medium |
+| `cfg-lock-not-released` | Medium |

-## What It Detects
+## What it detects

-### Unguarded sinks (`cfg-unguarded-sink`)
-A sink call (e.g. `system()`, `eval()`, `Command::new()`) is reachable from the function entry without passing through a guard or sanitizer that matches the sink's capability.
+**`cfg-unguarded-sink`**: A sink call (`system`, `eval`, `Command::new`, `db.execute`, etc.) is reachable from function entry without passing through any guard or sanitizer that matches the sink's capability.

-### Auth gaps (`cfg-auth-gap`)
-A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`) reaches a privileged sink (shell execution, file I/O) without a prior call to an authentication function (`is_authenticated`, `require_auth`, `check_permission`, etc.).
+**`cfg-auth-gap`**: A function identified as a web handler (by parameter naming conventions like `req`, `res`, `ctx`, `request`, language-dependent) reaches a privileged sink (shell execution, file I/O) without a preceding authentication call.

-### Unreachable security code (`cfg-unreachable-*`)
-Sinks, sanitizers, or sources in dead code branches. This often indicates a refactoring error where security-critical code was accidentally made unreachable.
+**`cfg-unreachable-*`**: Sinks, sanitizers, or sources in dead code. Usually signals a refactoring error that silently disabled security-relevant logic.

-### Error fallthrough (`cfg-error-fallthrough`)
-An error check (null check, error return check) does not terminate the function or loop back. Execution continues to a dangerous operation on the error path.
+**`cfg-error-fallthrough`**: An error-handling branch (null check, error-return check) does not terminate. Execution falls through to a dangerous operation on the error path.

-### Resource leaks (`cfg-resource-leak`, `cfg-lock-not-released`)
-A resource acquisition call (e.g. `File::open`, `fopen`, `socket`, `Lock`) is not matched by a release call (e.g. `close`, `fclose`, `unlock`) on all exit paths from the function.
+**`cfg-resource-leak`, `cfg-lock-not-released`**: A resource acquisition (`File::open`, `fopen`, `socket`, `Lock`) is not matched by a release on every exit path from the function.

-## What It Cannot Detect
+## What it can't detect

- **Inter-procedural guards**: If authentication is checked in a middleware function that calls this handler, the CFG detector cannot see it. It only analyzes one function at a time.
- **Dynamic dispatch**: Virtual method calls, function pointers, and closures are opaque to the CFG.
- **Complex guard patterns**: Only recognized guard function names are checked. Custom validation logic (e.g. `if password == expected`) is not recognized as a guard.
- **Correct sanitization**: The detector checks that *some* guard dominates the sink, not that the guard is *correct*. A guard that always passes would suppress the finding.
- **Cross-function resource flows**: If a file handle is opened in one function and closed in another, the detector will report a leak in the first function.
+- **Inter-procedural guards.** Middleware-level auth, helper functions that internally call auth, and cleanup performed in a caller are invisible.
+- **Dynamic dispatch.** Virtual calls, function pointers, and closures resolve to no specific callee.
+- **Correctness of guards.** The detector checks that *a* guard dominates the sink, not that the guard is correct. A recognised guard that always passes still suppresses the finding.
+- **Custom validation logic.** Only recognised guard names are checked. `if password == expected` is not a recognised guard.
+- **Cross-function resource flows.** If a file handle opens in one function and closes in another, the opener gets flagged as a leak. This is the largest source of FPs on factory-pattern code.

-## Common False Positives
+## Common false positives

-| Scenario | Why it fires | Mitigation |
-|----------|-------------|------------|
-| Framework-level auth middleware | Handler doesn't call auth directly | Document as expected; suppress with severity filter |
-| Resource closed via RAII/defer | Implicit cleanup not visible to CFG | Currently not detected; known limitation |
-| Custom guard function name | Function not in the recognized guard list | Add the function name as a sanitizer in config |
-| Test handlers | Intentionally skip auth in tests | Default non-prod downgrade reduces severity; or exclude test dirs |
+| Scenario | Why | Mitigation |
+|---|---|---|
+| Framework middleware auth | Handler doesn't call auth directly | Expected; suppress with severity filter or exclude handlers |
+| RAII / defer cleanup | Implicit release not visible to CFG (partially handled for Rust Drop and Go defer) | Known limitation |
+| Custom guard name | Function not in the recognised guard list | Add it as a sanitizer rule in config |
+| Test handlers | Intentional lack of auth | Default non-prod downgrade reduces severity; or exclude test dirs |

-## Common False Negatives
+## Common false negatives

-| Scenario | Why it's missed |
-|----------|----------------|
-| Auth in called function | Cross-function guards not tracked |
-| Guard via type system | Type-level guarantees (e.g. Rust's `AuthenticatedUser` wrapper) not analyzed |
-| Resource closed in finally/defer | Some cleanup patterns not recognized |
+| Scenario | Why |
+|---|---|
+| Auth in a called helper | Cross-function guards not tracked |
+| Type-system guards | Rust `AuthenticatedUser<T>` wrappers, typestate patterns not analysed |
+| Cleanup in `finally`/`ensure`/`defer` in callers | Cross-function cleanup not tracked |

-## Confidence Signals
+## Tuning

-| Signal | Meaning |
-|--------|---------|
-| **Evidence lists guard nodes** | Shows which guards were checked and found missing |
-| **Sink has high capability** | Shell execution or file I/O sinks are higher risk |
-| **Handler detection matched** | Web handler identification is based on conventional parameter names |
+### Recognised guard names

-## Tuning and Noise Controls
+Nyx accepts these patterns as dominating guards:

-### Add custom guards/sanitizers
+| Pattern | Applies to |
+|---|---|
+| `validate*`, `sanitize*` | All sinks |
+| `check_*`, `verify_*`, `assert_*` | All sinks |
+| `shell_escape` | Shell sinks |
+| `html_escape` | HTML/XSS sinks |
+| `url_encode` | URL sinks |
+| `which` | Shell execution (binary lookup) |
+
+### Recognised auth names
+
+| Pattern | Language |
+|---|---|
+| `is_authenticated`, `require_auth`, `check_permission`, `authorize`, `authenticate`, `require_login`, `check_auth`, `verify_token`, `validate_token` | Cross-language |
+| `middleware.auth`, `auth.required` | Go |
+| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |
+
+For Rust auth checks (`require_*`, ownership equality, row-level checks), see [auth.md](../auth.md).
+
+### Custom guards

 ```toml
 [[analysis.languages.python.rules]]
 matchers = ["validate_request", "check_csrf"]
 kind = "sanitizer"
-cap = "all"
+cap  = "all"
 ```

-### Add auth rules
-
-Auth checks are recognized by function name. If your codebase uses non-standard names:
+### Custom auth functions

 ```toml
 [[analysis.languages.javascript.rules]]
 matchers = ["ensureLoggedIn", "requirePermission"]
 kind = "sanitizer"
-cap = "all"
-```
-
-### Filter results
-
-```bash
-# Skip low-severity unreachable findings
-nyx scan . --severity ">=MEDIUM"
-```
-
-### Disable CFG analysis
-
-```bash
-nyx scan . --mode ast   # AST patterns only
+cap  = "all"
 ```

 ## Examples

-### Unguarded sink
+Unguarded sink:

 ```go
 func handler(w http.ResponseWriter, r *http.Request) {
    cmd := r.URL.Query().Get("cmd")
-    exec.Command("sh", "-c", cmd).Run()  // cfg-unguarded-sink: no guard dominates
+    exec.Command("sh", "-c", cmd).Run()  // cfg-unguarded-sink
 }
 ```

-### Auth gap
+Auth gap:

 ```javascript
 app.get('/admin/delete', (req, res) => {
-    // No is_authenticated() call
-    db.execute("DELETE FROM users WHERE id = " + req.params.id);
-    // cfg-auth-gap: web handler reaches privileged sink without auth
+    // No auth call
+    db.execute("DELETE FROM users WHERE id = " + req.params.id);  // cfg-auth-gap
 });
 ```

-### Resource leak
+Resource leak:

 ```c
 void process() {
-    FILE *f = fopen("data.txt", "r");  // acquire
+    FILE *f = fopen("data.txt", "r");
    if (error) {
-        return;  // cfg-resource-leak: f not closed on this path
+        return;           // cfg-resource-leak: f not closed on this path
    }
    fclose(f);
 }
 ```
-
-## Guard Rules
-
-Nyx recognizes these function name patterns as guards:
-
-| Pattern | Applies to |
-|---------|-----------|
-| `validate*`, `sanitize*` | All sinks |
-| `check_*`, `verify_*`, `assert_*` | All sinks |
-| `shell_escape` | Shell execution sinks |
-| `html_escape` | HTML/XSS sinks |
-| `url_encode` | URL sinks |
-| `which` | Shell execution (binary lookup) |
-
-### Auth rules
-
-| Pattern | Category |
-|---------|----------|
-| `is_authenticated`, `require_auth`, `check_permission` | Common |
-| `authorize`, `authenticate`, `require_login` | Common |
-| `check_auth`, `verify_token`, `validate_token` | Common |
-| `middleware.auth`, `auth.required` | Go |
-| `isAuthenticated`, `checkPermission`, `hasAuthority`, `hasRole` | Java |
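Note on the cfg.md hunk above: the new text defines dominance as "the guard must execute before the sink on every path from entry". The sketch below illustrates that definition with the textbook iterative dominator algorithm on a hand-written graph. It is not Nyx's implementation; the dictionary-based CFG encoding and the names `dominators`, `is_guarded`, `GUARDED`, and `ONE_BRANCH` are assumptions made for illustration only.

```python
from collections import defaultdict


def dominators(cfg, entry):
    """Map each node to the set of nodes that dominate it (iterative fixpoint)."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = defaultdict(set)
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].add(node)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def is_guarded(cfg, entry, guard, sink):
    """True when `guard` lies on every path from `entry` to `sink`."""
    return guard in dominators(cfg, entry)[sink]


# Hypothetical CFGs: node names are labels, edges are successor lists.
GUARDED = {"entry": ["validate"], "validate": ["sink"], "sink": []}
ONE_BRANCH = {
    "entry": ["check", "sink"],  # one edge skips the guard entirely
    "check": ["validate"],
    "validate": ["sink"],
    "sink": [],
}
```

In `GUARDED` the only path to the sink passes through `validate`, so the guard dominates the sink. In `ONE_BRANCH` the direct `entry` to `sink` edge bypasses the guard, which is the shape a rule like `cfg-unguarded-sink` is described as reporting.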
--- a/docs/detectors/patterns.md
+++ b/docs/detectors/patterns.md
@@ -1,111 +1,84 @@
-# AST Pattern Matching
+# AST patterns

-## Summary
+AST patterns are tree-sitter queries that match dangerous structural shapes in source. No dataflow, no CFG. A match means the construct is present; it's not proof the construct is exploitable.

-AST patterns are tree-sitter queries that match specific structural code constructs. They are the simplest and fastest detector family — no dataflow, no CFG, just structural presence. A match means the dangerous construct exists in the code; it does not prove the code is exploitable.
-
-AST patterns run in all analysis modes, including `--mode ast` (where they are the only active detector).
+Patterns run in every analysis mode. In `--mode ast` they're the only active detector.

 ## Rule IDs

-Pattern rule IDs follow the format `<lang>.<category>.<specific>`:
-
 ```
-rs.memory.transmute
-js.code_exec.eval
-py.deser.pickle_loads
-c.memory.gets
-java.sqli.execute_concat
+<lang>.<category>.<name>
 ```

-See the [Rule Reference](../rules/index.md) for a complete listing per language.
+Examples: `js.code_exec.eval`, `py.deser.pickle_loads`, `c.memory.gets`, `java.sqli.execute_concat`.

-## Pattern Tiers
+Full list: [rules.md](../rules.md).

-| Tier | Meaning | Examples |
-|------|---------|---------|
-| **A** | Structural presence alone is high-signal | `gets()`, `eval()`, `pickle.loads()`, `mem::transmute` |
-| **B** | Query includes a heuristic guard | SQL `execute` with concatenated arg, `printf(var)` with non-literal format |
+## Tiers

-Tier B patterns use additional tree-sitter predicates to reduce false positives. For example, `java.sqli.execute_concat` only fires when `executeQuery()` receives a `binary_expression` (string concatenation) as its argument, not when it receives a literal or parameter placeholder.
+| Tier | Meaning |
+|---|---|
+| **A** | Structural presence alone is high-signal. `gets`, `eval`, `pickle.loads`, `mem::transmute` |
+| **B** | Pattern includes a tree-sitter heuristic guard. Example: `java.sqli.execute_concat` only fires when `executeQuery` receives a `binary_expression` (string concatenation), not a literal or a parameterized statement |

-## What It Detects
+## Categories

-### By category
+| Category | Examples |
+|---|---|
+| CommandExec | `system`, `os.system`, `Runtime.exec`, backticks |
+| CodeExec | `eval`, `Function`, PHP `assert("string")`, `class_eval`, `instance_eval` |
+| Deserialization | `pickle.loads`, `yaml.load`, `Marshal.load`, `readObject`, `unserialize` |
+| SqlInjection | `executeQuery`/`Query`/`execute` with concatenated argument (Tier B) |
+| PathTraversal | PHP `include $var` |
+| Xss | `document.write`, `outerHTML`, `insertAdjacentHTML`, `getWriter().print` |
+| Crypto | `md5`, `sha1`, `Math.random`, `java.util.Random` for security use |
+| Secrets | hardcoded API keys (Go, JS, TS) |
+| InsecureTransport | `InsecureSkipVerify`, `fetch("http://...")` |
+| Reflection | `Class.forName`, `Method.invoke`, `send`, `constantize` |
+| MemorySafety | `transmute`, `unsafe`, `gets`, `strcpy`, `sprintf` |
+| Prototype | `__proto__` assignment, `Object.prototype.*` |
+| Config | CORS dynamic origin, `rejectUnauthorized: false`, insecure session settings |
+| CodeQuality | `unwrap`, `panic!`, `as any` |

-| Category | What it matches | Example languages |
-|----------|----------------|-------------------|
-| **CommandExec** | Shell command execution functions | C (`system`), Python (`os.system`), Ruby (backticks) |
-| **CodeExec** | Dynamic code evaluation | JS (`eval`, `new Function()`), Python (`exec`), PHP (`eval`) |
-| **Deserialization** | Unsafe object deserialization | Java (`readObject`), Python (`pickle.loads`), Ruby (`Marshal.load`) |
-| **SqlInjection** | SQL with string concatenation | Java, Go, Python, PHP (Tier B heuristic) |
-| **PathTraversal** | File inclusion with variable path | PHP (`include $var`) |
-| **Xss** | XSS sink functions | JS (`document.write`, `outerHTML`), Java (`getWriter().print`) |
-| **Crypto** | Weak cryptographic algorithms | All languages (`md5`, `sha1`, `Math.random()`) |
-| **Secrets** | Hardcoded credentials | Go (variable name matching) |
-| **InsecureTransport** | Unencrypted communication | Go (`InsecureSkipVerify`), JS (`fetch("http://")`) |
-| **Reflection** | Dynamic class/method dispatch | Java (`Class.forName`, `Method.invoke`), Ruby (`send`, `constantize`) |
-| **MemorySafety** | Memory safety violations | Rust (`transmute`, `unsafe`), C (`gets`, `strcpy`, `sprintf`) |
-| **Prototype** | Prototype pollution | JS/TS (`__proto__` assignment) |
-| **CodeQuality** | Panic/abort/type-safety issues | Rust (`unwrap`, `panic!`), TS (`as any`) |
+## What patterns can't tell you

-## What It Cannot Detect
+- **Dataflow.** `eval("1+1")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`. The taint detector is the one that distinguishes them.
+- **Reachability.** A pattern in dead code matches identically.
+- **Semantics.** `strcpy(dst, src)` always matches, regardless of buffer sizes.
+- **Indirect calls.** `let e = eval; e(input)` doesn't match `eval`.
+- **Aliased imports.** `from os import system as s; s(cmd)` won't match `system`.
+- **Macro expansions.** Tree-sitter parses the macro call site, not the expansion.

- **Dataflow**: Patterns don't track whether the dangerous function receives tainted input. `eval("hello")` (safe) and `eval(userInput)` (dangerous) both match `js.code_exec.eval`.
- **Context**: Patterns don't understand whether the code is reachable, guarded, or inside a test.
- **Semantics**: `strcpy(dst, src)` always matches — it cannot determine buffer sizes.
- **Indirect calls**: Function pointers, dynamic dispatch, and aliased references are invisible.
+## Common false positives

-## Common False Positives
+| Scenario | Why | Mitigation |
+|---|---|---|
+| `eval("hardcoded literal")` | Pattern matches structure | Run `--mode cfg` to drop AST patterns and rely on taint |
+| `unsafe` block with sound justification | Every `unsafe` matches `rs.quality.unsafe_block` | Filter `>=MEDIUM` (it's Medium) or accept the noise |
+| `.unwrap()` in tests | Acceptable in test code | Default non-prod severity downgrade reduces it |
+| `md5` for non-cryptographic checksums | Pattern can't see intent | Suppress with `--severity ">=MEDIUM"` or per-line `nyx:ignore` |
+| SQL concat with trusted data (Tier B) | Heuristic can't verify the source | Taint is more precise; or convert to a parameterized query |

-| Scenario | Why it fires | Mitigation |
-|----------|-------------|------------|
-| `eval()` with a hardcoded string literal | Pattern matches structural presence | Taint analysis won't flag this — use `--mode cfg` for fewer false positives |
-| `unsafe` block in Rust with sound justification | All unsafe blocks match | Filter with `--severity ">=MEDIUM"` (unsafe_block is Medium) |
-| `.unwrap()` in test code | Acceptable in tests | Default non-prod downgrade reduces severity |
-| `md5()` used for checksums (not security) | Pattern doesn't know usage intent | Filter Low severity or add to exclusions |
-| SQL concatenation with trusted data | Tier B heuristic can't verify data source | Taint analysis is more precise here |
+## Confidence levels

-## Common False Negatives
+Every AST pattern carries an explicit confidence:

-| Scenario | Why it's missed |
-|----------|----------------|
-| `eval` called via alias (`let e = eval; e(input)`) | Pattern matches the identifier `eval`, not the resolved function |
-| Dangerous function in a macro expansion | Tree-sitter parses the macro call, not the expansion |
-| SQL injection via ORM query builder | No pattern for ORM-specific query building |
-| Imported function under different name | `from os import system as s; s(cmd)` — pattern looks for `system` |
+| Confidence | Use |
+|---|---|
+| High | Inherently dangerous construct with no safe usage. `gets`, `pickle.loads`, `eval` with no guard |
+| Medium | Likely issue, context may change the call. SQL concatenation (Tier B), `unsafe` blocks, `exec` |
+| Low | Heuristic. Often appears in safe code. Weak crypto for checksums, `unwrap` outside tests, `Math.random` |

-## Confidence Signals
+`--min-confidence medium` (or `output.min_confidence = "medium"`) drops Low-confidence matches.

-| Signal | Meaning |
-|--------|---------|
-| **Tier A** | High confidence — the function itself is dangerous |
-| **Tier B** | Moderate confidence — heuristic guard reduces false positives |
-| **High severity** | Critical vulnerability class (command exec, deserialization) |
-| **Low severity** | Informational (weak crypto, code quality) |
-| **Non-prod path** | Finding in test/vendor code — downgraded by default |
-
-## Tuning and Noise Controls
-
-### Severity filtering
+## Tuning

 ```bash
-# Skip code-quality and weak-crypto findings
-nyx scan . --severity ">=MEDIUM"
-
-# Only critical findings
-nyx scan . --severity HIGH
+nyx scan . --severity ">=MEDIUM"        # drop Low-severity patterns
+nyx scan . --severity HIGH              # banned APIs and code-exec only
+nyx scan . --mode cfg                   # drop AST patterns; keep taint + state + cfg
 ```

-### Use taint for precision
-
-```bash
-# Taint-only mode: only report findings with confirmed dataflow
-nyx scan . --mode cfg
-```
-
-### Exclude directories
-
 ```toml
 [scanner]
 excluded_directories = ["node_modules", "vendor", "generated"]
@@ -113,37 +86,29 @@ excluded_directories = ["node_modules", "vendor", "generated"]

 ## Examples

-### Tier A — structural presence
+Tier A, structural presence:

-**C: Banned function**
 ```c
 char buf[64];
-gets(buf);  // c.memory.gets — always dangerous, no safe usage
+gets(buf);                              // c.memory.gets
 ```

-**Python: Unsafe deserialization**
 ```python
 import pickle
-data = pickle.loads(user_input)  # py.deser.pickle_loads
+data = pickle.loads(user_input)         # py.deser.pickle_loads
 ```

-### Tier B — heuristic-guarded
+Tier B, heuristic guard:

-**Java: SQL concatenation**
 ```java
 // Fires: concatenated argument
-stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);
-// java.sqli.execute_concat
+stmt.executeQuery("SELECT * FROM users WHERE id=" + userId);  // java.sqli.execute_concat

-// Does NOT fire: parameterized query
+// Does not fire: parameterized
 stmt.executeQuery(preparedSql);
 ```

-**C: Format string**
 ```c
-// Fires: variable as first argument
-printf(user_input);  // c.memory.printf_no_fmt
-
-// Does NOT fire: literal format string
-printf("%s", user_input);
+printf(user_input);                     // c.memory.printf_no_fmt: fires (variable as fmt)
+printf("%s", user_input);               // does not fire (literal fmt)
 ```
--- a/docs/detectors/state.md
+++ b/docs/detectors/state.md
@@ -1,26 +1,22 @@
-# State Model Analysis
+# State model analysis

-## Summary
+Tracks resource lifecycle and authentication state through a function. Detects use-after-close, double-close, leaks, and unauthenticated access to privileged operations.

-Nyx's state model analysis tracks **resource lifecycle** and **authentication state** through a function using monotone dataflow over bounded lattices. It detects use-after-close bugs, double-close bugs, resource leaks, and unauthenticated access to privileged operations.
-
-State analysis is **opt-in** — enable it with `scanner.enable_state_analysis = true` in config. It requires `mode = "full"` or `mode = "cfg"`.
+State analysis is on by default. Disable with `scanner.enable_state_analysis = false`. It runs in `--mode full` and `--mode taint`; AST-only mode skips it.

 ## Rule IDs

-| Rule ID | Severity | Description |
-|---------|----------|-------------|
-| `state-use-after-close` | High | Variable used after being closed/released |
-| `state-double-close` | Medium | Resource closed twice |
-| `state-resource-leak` | Medium | Resource opened but never closed (definite) |
-| `state-resource-leak-possible` | Low | Resource may not be closed on all paths |
-| `state-unauthed-access` | High | Privileged operation reached without authentication |
+| Rule ID | Severity |
+|---|---|
+| `state-use-after-close` | High |
+| `state-double-close` | Medium |
+| `state-resource-leak` | Medium |
+| `state-resource-leak-possible` | Low |
+| `state-unauthed-access` | High |

-## What It Detects
+## What it detects

-### Use-after-close (`state-use-after-close`)
-
-A resource transitions to the CLOSED state (via `close()`, `fclose()`, `disconnect()`, etc.), then a use operation (`read`, `write`, `send`, `recv`, `query`, etc.) is performed on it.
+**`state-use-after-close`**: Resource transitions to CLOSED (via `close`, `fclose`, `disconnect`, …), then a use operation happens on it.

 ```c
 FILE *f = fopen("data.txt", "r");
@@ -28,147 +24,108 @@ fclose(f);
 fread(buf, 1, 100, f);  // state-use-after-close
 ```

-### Double-close (`state-double-close`)
+**`state-double-close`**: Resource closed twice. Crashes or undefined behaviour on most runtimes.

-A resource is closed twice. This can cause crashes or undefined behavior.
+**`state-resource-leak`**: Resource opened but never closed on any path through the function. Definite leak.

-```python
-f = open("data.txt")
-f.close()
-f.close()  # state-double-close
-```
+**`state-resource-leak-possible`**: Resource closed on some paths but not others. Lower confidence; often an early-return error path.

-### Resource leak (`state-resource-leak`)
+**`state-unauthed-access`**: A function recognised as a web handler reaches a privileged sink without an auth call on the path.

-A resource is opened but never closed on any path through the function. This is a definite leak.
+A function counts as a web handler if its name starts with `handle_`, `route_`, or `api_` (sufficient on its own), or starts with `serve_`/`process_` and the file uses web-shaped parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, language-dependent). `main` is excluded.

-```java
-FileInputStream fis = new FileInputStream("data.txt");
-process(fis);
-// function exits without fis.close() — state-resource-leak
-```
+## Managed-resource suppression

-### Possible resource leak (`state-resource-leak-possible`)
+Several language-specific cleanup patterns suppress leak findings:

-A resource is closed on some paths but not others.
+| Pattern | Languages | Effect |
+|---|---|---|
+| RAII / Drop | Rust | All leak findings suppressed except `alloc`/`dealloc` |
+| Smart pointers | C++ | `make_unique`/`make_shared` treated as managed; raw `new`/`malloc` still tracked |
+| `defer` | Go | `defer f.Close()` suppresses leak at exit |
+| `with` context manager | Python | `with open(f) as f:` suppresses leak for the bound name |
+| try-with-resources | Java | TWR-bound resources suppressed |

-```go
-f, err := os.Open("data.txt")
-if err != nil {
-    return  // f not closed here
-}
-f.Close()  // closed here
-// state-resource-leak-possible on the error path
-```
+## What it can't detect

-### Unauthenticated access (`state-unauthed-access`)
+- **Cross-function resource ownership.** Open in one function, close in another, leak gets reported in the opener. The most common FP source for leak detection.
+- **Factory / builder functions** that return a resource for the caller to manage.
+- **Variable shadowing across scopes.** Same name in inner and outer scope shares one symbol; an inner close masks an outer leak.
+- **Resources stored in collections.** Handles in arrays / maps / channels and cleaned up via iteration are not tracked.
+- **Dynamic dispatch.** Close called via trait object or interface may not be recognised.
+- **Type-state authentication.** `AuthenticatedRequest<T>` and similar Rust patterns are not recognised as auth.

-A function identified as a web handler reaches a privileged sink (shell execution, file I/O) without any authentication check on the path.
+## Common false positives

-A function is identified as a web handler if:
-1. Its name starts with `handle_`, `route_`, or `api_` (strong match — sufficient on its own), OR
-2. Its name starts with `serve_` or `process_` AND any function in the file has web-like parameter names (`request`, `req`, `ctx`, `res`, `response`, `w`, `writer`, etc., varying by language).
+| Scenario | Why | Mitigation |
+|---|---|---|
+| Factory returns a resource | Caller owns it | Known limitation |
+| Framework-managed handles | Connection pool, request scope | Exclude framework code or downgrade |
+| Variable name shadowing | Same name reused | Known limitation |

-The function name `main` is explicitly excluded.
+## Per-language detection

-```javascript
-app.post('/admin/exec', (req, res) => {
-    // No auth check
-    exec(req.body.command);  // state-unauthed-access
-});
-```
+| Language | Leak | Double-close | Use-after-close | Notes |
+|---|---|---|---|---|
+| C | yes | yes | yes | `fopen`/`fclose`, `malloc`/`free`, `pthread_mutex_*` |
+| C++ | yes | yes | yes | C pairs plus `new`/`delete`; smart pointers suppressed |
+| Python | yes | yes | yes | `with` suppressed; `open`, `socket`, `connect` |
+| Go | yes | yes | yes | `defer` suppressed; `os.Open` / `.Close` |
+| Rust | unsafe only | n/a | n/a | RAII suppresses everything except `alloc`/`dealloc` |
+| JavaScript | yes | yes | partial | `fs.openSync`/`closeSync` |
+| TypeScript | yes | yes | partial | Same as JS |
+| PHP | yes | yes | partial | `fopen`/`fclose`, `curl_init`/`curl_close`, `mysqli_*` |
+| Ruby | partial | partial | partial | `File.open`/`close`, `TCPSocket` |
+| Java | limited | limited | limited | Constructor-callee matching is incomplete |

-## What It Cannot Detect
-
- **Cross-function resource management**: Resources opened in one function and closed in another are not tracked. This is the most common source of false positives for leak detection.
- **RAII / defer / try-with-resources**: Implicit cleanup via language-level constructs (Rust's `Drop`, Go's `defer`, Java's try-with-resources, Python's `with`) is not recognized. These patterns will produce false-positive leak findings.
- **Dynamic dispatch**: If `close()` is called through a trait object or interface, it may not be recognized.
- **Authentication via type system**: Rust's type-state pattern (e.g. `AuthenticatedRequest<T>`) is not recognized as an auth check.
- **Complex authorization logic**: Only recognized function name patterns are checked.
-
-## Common False Positives
-
-| Scenario | Why it fires | Mitigation |
-|----------|-------------|------------|
-| RAII / Drop / defer cleanup | Implicit cleanup not visible | Known limitation; filter by severity |
-| Resource returned to caller | Ownership transferred, not leaked | Known limitation |
-| Framework-managed resources | Web framework manages connection lifecycle | Exclude framework-generated handlers |
-| Try-with-resources (Java) | Language construct not parsed | Known limitation |
-| Context manager (Python `with`) | Block construct not tracked | Known limitation |
-
-## Common False Negatives
-
-| Scenario | Why it's missed |
-|----------|----------------|
-| Resource closed in helper function | Cross-function tracking not implemented |
-| Auth in middleware | Auth check happens before handler is called |
-| Double-close via aliased reference | Alias analysis not performed |
-
-## Confidence Signals
-
-| Signal | Meaning |
-|--------|---------|
-| **Definite leak (state-resource-leak)** | Resource is never closed on any path — high confidence |
-| **Use-after-close** | Read/write operation after explicit close — high confidence |
-| **Web handler detected** | Entry point matched by parameter naming convention |
-| **Possible leak (state-resource-leak-possible)** | Resource closed on some but not all paths — lower confidence |
-
-## Tuning and Noise Controls
-
-### Enable state analysis
-
-```toml
-[scanner]
-enable_state_analysis = true
-```
-
-### Severity filtering
+## Tuning

 ```bash
-# Skip possible-leak findings (Low severity)
-nyx scan . --severity ">=MEDIUM"
+nyx scan . --severity ">=MEDIUM"   # Skip "possible" leaks (Low)
 ```

-### Exclude test files
-
 ```toml
 [scanner]
-excluded_directories = ["tests", "test", "spec"]
+enable_state_analysis = true        # default
+excluded_directories  = ["tests", "test", "spec"]
 ```

-## Resource Pairs
+## Recognized pairs

-The state engine recognizes these acquire/release pairs per language:
+The state engine ships these acquire/release pairs. Custom pairs are not yet configurable; file an issue if you need one.

-### C/C++
-| Acquire | Release | Resource |
-|---------|---------|----------|
-| `fopen` | `fclose` | File handle |
-| `open` | `close` | File descriptor |
-| `socket` | `close` | Socket |
-| `malloc`, `calloc`, `realloc` | `free` | Heap memory |
-| `pthread_mutex_lock` | `pthread_mutex_unlock` | Mutex |
+**C / C++**

-### Rust
-| Acquire | Release | Resource |
-|---------|---------|----------|
-| `File::open`, `File::create` | `drop`, `close` | File handle |
-| `TcpStream::connect` | `shutdown` | TCP connection |
-| `lock`, `read`, `write` (on Mutex/RwLock) | `drop` | Lock guard |
+| Acquire | Release |
+|---|---|
+| `fopen` | `fclose` |
+| `open` | `close` |
+| `socket` | `close` |
+| `malloc`, `calloc`, `realloc` | `free` |
+| `pthread_mutex_lock` | `pthread_mutex_unlock` |
+| `new`, `new[]` *(C++)* | `delete`, `delete[]` |

-### Java
-| Acquire | Release | Resource |
-|---------|---------|----------|
-| `new FileInputStream` | `close` | File stream |
-| `getConnection` | `close` | DB connection |
-| `new Socket` | `close` | Socket |
+**Rust**

-### Go, Python, JavaScript, Ruby, PHP
-Similar patterns with language-specific function names.
+| Acquire | Release |
+|---|---|
+| `File::open`, `File::create` | `drop`, `close` |
+| `TcpStream::connect` | `shutdown` |
+| `lock`, `read`, `write` (Mutex/RwLock) | `drop` |

-## Use Patterns (Trigger use-after-close)
+**Java**

-The following operations on a closed resource trigger `state-use-after-close`:
+| Acquire | Release |
+|---|---|
+| `new FileInputStream` (and friends) | `close` |
+| `getConnection` | `close` |
+| `new Socket` | `close` |
+
+Go, Python, JavaScript, Ruby, and PHP have language-idiomatic equivalents.
+
+## Use-after-close triggers
+
+These operations on a closed resource fire `state-use-after-close`:

 ```
 read, write, send, recv, fread, fwrite, fgets, fputs, fprintf, fscanf,
@@ -177,28 +134,3 @@ ungetc, query, execute, fetch, sendto, recvfrom, ioctl, fcntl,
 strcpy, strncpy, strcat, strncat, memcpy, memmove, memset, memcmp,
 strcmp, strncmp, strlen, sprintf, snprintf
 ```
-
-## Technical Details
-
-### Resource Lifecycle Lattice
-
-```
-UNINIT → OPEN → CLOSED
-              → MOVED
-```
-
-States are tracked as bitflags, allowing the lattice to represent uncertainty (e.g. OPEN|CLOSED means the resource is open on some paths and closed on others).
-
-### Leak Detection Scope
-
-Resource leaks are checked at the file-level exit node and the **synthesized** function exit node (a single Return node that all early returns feed into). Early-return nodes are **not** checked individually — only the merged state at the function's synthesized exit is inspected. This prevents duplicate findings where an early-return path reports a definite leak while the merged exit correctly reports a possible leak.
-
-This per-function exit inspection ensures that a variable leaked inside one function is not masked by a same-named variable that is properly closed in a subsequent function.
-
-### Auth Level Lattice
-
-```
-Unauthed < Authed < Admin
-```
-
-Join semantics: take the minimum (conservative). If any path is unauthenticated, the result is unauthenticated.
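The two lattices described in the removed "Technical Details" text can be sketched as follows. This is a minimal illustration, not Nyx's implementation: the type names, bit values, and the `classify_exit` helper are hypothetical, and the only behaviour taken from the text is that resource states are bitflags merged across paths and that auth levels join by taking the minimum.

```rust
// Hypothetical sketch; names and bit values are assumptions, not Nyx internals.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ResState(u8);

#[allow(dead_code)]
impl ResState {
    const UNINIT: ResState = ResState(0b0001);
    const OPEN: ResState = ResState(0b0010);
    const CLOSED: ResState = ResState(0b0100);
    const MOVED: ResState = ResState(0b1000);

    /// Join at a control-flow merge: keep every state seen on any incoming path.
    fn join(self, other: ResState) -> ResState {
        ResState(self.0 | other.0)
    }

    /// True if any bit of `flag` is set in this state.
    fn contains(self, flag: ResState) -> bool {
        self.0 & flag.0 != 0
    }
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Clean,
    Possible,
    Definite,
}

/// Classify the merged state at a function's synthesized exit node.
fn classify_exit(state: ResState) -> Verdict {
    if state == ResState::OPEN {
        Verdict::Definite // open on every path
    } else if state.contains(ResState::OPEN) {
        Verdict::Possible // open on some paths only
    } else {
        Verdict::Clean
    }
}

/// Variant order gives Unauthed < Authed < Admin through the derived `Ord`.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum AuthLevel {
    Unauthed,
    Authed,
    Admin,
}

/// Conservative join: the weakest level on any path wins.
fn join_auth(a: AuthLevel, b: AuthLevel) -> AuthLevel {
    a.min(b)
}
```

Under these assumptions, `ResState::OPEN.join(ResState::CLOSED)` classifies as `Verdict::Possible`, and joining `Admin` with `Unauthed` yields `Unauthed`. Inspecting only the merged exit state is what avoids reporting both a definite and a possible leak for the same resource.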
--- a/docs/detectors/taint.md
+++ b/docs/detectors/taint.md
@@ -1,10 +1,8 @@
-# Taint Analysis
+# Taint analysis

-## Summary
+Nyx tracks untrusted data from **sources** (where it enters the program) through assignments and function calls to **sinks** (where it's used dangerously). If the flow reaches a sink without passing a matching **sanitizer**, a finding fires.

-Nyx's taint analysis tracks the flow of untrusted data from **sources** (where data enters the program) through **assignments and function calls** to **sinks** (where dangerous operations happen). If the data reaches a sink without passing through a **sanitizer** with matching capabilities, a finding is emitted.
-
-The engine uses a monotone forward dataflow analysis over a finite lattice with guaranteed termination. Analysis is **intra-procedural with cross-file function summaries** — it does not follow calls into other functions but uses pre-computed summaries of their behavior.
+The engine is a monotone forward dataflow over a finite lattice with guaranteed termination. It's flow-sensitive inside a function, and interprocedural across files via persisted per-function summaries.

 ## Rule ID

@@ -12,191 +10,135 @@ The engine uses a monotone forward dataflow analysis over a finite lattice with
 taint-unsanitised-flow (source <line>:<col>)
 ```

-One rule ID covers all taint findings. The parenthetical identifies the specific source location.
+One rule ID, parameterized by the source location. Suppressions can target either the base ID or the full string.

-## What It Detects
+## What it detects

-- Environment variables flowing to shell execution (`env::var` → `Command::new`)
-- User input flowing to code evaluation (`req.body` → `eval()`)
-- File contents flowing to SQL queries (`fs::read_to_string` → `db.execute()`)
-- Request parameters flowing to HTML output (`req.query` → `innerHTML`)
-- Any source-to-sink flow where the sink's required capability is not stripped by a sanitizer
+- User input flowing to shell execution: `req.body.cmd` → `child_process.exec`
+- User input flowing to code evaluation: `req.query.code` → `eval`
+- User input flowing to SQL: `request.args.get('id')` → `cursor.execute(f"... {id}")`
+- Environment variables flowing to shell: `env::var("CMD")` → `Command::new("sh").arg("-c")`
+- Request parameters flowing to HTML: `req.query.name` → `innerHTML`
+- File contents flowing to privileged sinks: `fs::read_to_string` → `db.execute`
+- Any other source-to-sink flow where the sink's required capability is not stripped along the way

-## What It Cannot Detect
+## What it can't detect

-- **Inter-procedural flows without summaries**: If a function isn't summarized (e.g. from a third-party library without source), the taint engine cannot track data through it. It conservatively treats unknown callees as neither propagating nor sanitizing.
-- **Flows through data structures**: Taint is tracked per-variable, not per-field. `obj.field = tainted; sink(obj.other_field)` may produce a false positive because taint attaches to `obj` as a whole.
-- **Aliasing**: `let y = &x; sink(*y)` — the engine tracks `y` as a fresh variable, not an alias of `x`. This can cause false negatives.
-- **Complex control flow**: The analysis is flow-sensitive (respects control flow within a function) but does not track taint through arbitrary loops with complex exit conditions.
-- **Implicit flows**: Taint only follows explicit data flow, not information flow through branching (e.g. `if (secret) { x = 1 } else { x = 0 }` does not taint `x`).
+- **Library calls without summaries.** If a callee has no summary (no source, binary-only dependency), Nyx treats it as neither propagating nor sanitizing. This is conservative for sanitization but lossy for propagation.
+- **Taint through struct fields and containers.** Taint attaches to whole variables. `obj.field = tainted; sink(obj.other_field)` can produce a false positive because `obj` itself is tainted.
+- **Aliasing.** `let y = &x; sink(*y)` tracks `y` separately from `x`. This can cause false negatives.
+- **Implicit flows.** Taint follows explicit data flow, not information leaked through branching. `if (secret) x = 1 else x = 0` does not taint `x`.
+- **Globals and statics across functions.** Not tracked across function boundaries.

-## Common False Positives
+## Common false positives

-| Scenario | Why it happens | Mitigation |
-|----------|---------------|------------|
-| Custom sanitizer not recognized | Nyx only knows built-in and configured sanitizers | Add a custom sanitizer rule in config |
-| Taint through struct fields | Variable-level (not field-level) tracking | No current mitigation; field sensitivity is planned |
-| Dead code paths | The engine is path-insensitive within a function (it considers all paths) | Contradiction pruning catches some cases; path-validated findings score lower |
-| Library wrappers | A wrapper around a dangerous function may re-introduce taint that was sanitized by the wrapper | Summarize the wrapper function or add it as a sanitizer |
+| Scenario | Why | Mitigation |
+|---|---|---|
+| Custom sanitizer not recognized | Only built-in + configured sanitizers match | Add a custom sanitizer rule in config |
+| Taint through struct fields | Variable-level tracking, not field-level | No fix yet; field-sensitivity is planned |
+| Dead branches | Path-insensitive within a function | Constraint solving catches trivially infeasible combos; path-validated findings are scored lower |
+| Library wrapper re-introduces taint | Wrapper opaque, or summary marks it as propagating | Summarize the wrapper explicitly or add it as a sanitizer |

-## Common False Negatives
+## Common false negatives

-| Scenario | Why it's missed |
-|----------|----------------|
-| Third-party library calls | No summary available; callee treated as opaque |
-| Taint through global/static variables | Not tracked across function boundaries |
-| Taint through closures/callbacks in some languages | Closure capture analysis is limited (JS/TS/Ruby/Go anonymous functions ARE analyzed) |
-| Flows spanning more than two files | Summary approximation loses precision at depth |
+| Scenario | Why |
+|---|---|
+| Third-party library on the path | No summary available, so the callee is treated as opaque |
+| Globals / statics across function boundaries | Not tracked |
+| Some closure captures | Closure analysis is limited. JS/TS/Ruby/Go anonymous functions passed as callbacks *are* analyzed as separate scopes |
+| Very deep cross-file chains | Summary approximation loses precision at depth |

-## Confidence Signals
+## Confidence signals

-These signals in the output indicate higher-confidence findings:
+Higher confidence:
+- Source + Sink both present in evidence with specific call locations.
+- `source_kind: user_input` (direct attacker control).
+- `path_validated: false`.
+- No dominating guard on the path.
+- Symbolic execution produced a witness string (the rendered sink value appears in JSON/SARIF under `evidence.symbolic.witness`).

-| Signal | What it means |
-|--------|--------------|
-| **Evidence: Source + Sink** | Both endpoints identified with specific function names and locations |
-| **Source kind = user input** | Source is directly controllable by an attacker (req.body, argv, etc.) |
-| **path_validated = false** | No validation guard on the path — higher exploitability |
-| **No guard_kind** | No dominating predicate check (null check, error check, etc.) |
-| **High rank_score** | Multiple confidence signals combined |
+Lower confidence:
+- Path-validated taint (`path_validated: true`).
+- Source is a database read or internal file (such data is often validated at insertion time).
+- Engine note `ForwardBailed` / `PathWidened`. Use `--require-converged` to drop these in strict gates.

-Lower-confidence:
+## Tuning

-| Signal | What it means |
-|--------|--------------|
-| **path_validated = true** | A validation predicate guards the path — may not be exploitable |
-| **guard_kind = "ValidationCall"** | An explicit validation function was called before the sink |
-| **Source kind = database** | Data from DB — may already be validated at insertion time |
-
-## Tuning and Noise Controls
-
-### Add custom sanitizers
-
-If your codebase has a custom sanitizer that Nyx doesn't recognize:
+### Custom sanitizer

 ```toml
 # nyx.local
 [[analysis.languages.javascript.rules]]
 matchers = ["escapeHtml", "sanitizeInput"]
-kind = "sanitizer"
-cap = "html_escape"
+kind     = "sanitizer"
+cap      = "html_escape"
 ```

-Or via CLI:
-```bash
-nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape
-```
+Or: `nyx config add-rule --lang javascript --matcher escapeHtml --kind sanitizer --cap html_escape`.

-### Filter by severity
+### Filter by severity or confidence

 ```bash
-nyx scan . --severity HIGH          # Only high-severity taint findings
-nyx scan . --severity ">=MEDIUM"    # Skip low-severity
+nyx scan . --severity HIGH
+nyx scan . --min-confidence medium
 ```

-### Skip non-production code
-
-By default, findings in `tests/`, `vendor/`, `build/` paths are downgraded one severity tier. To exclude them entirely, add to config:
-
-```toml
-[scanner]
-excluded_directories = ["tests", "vendor", "build", "examples"]
-```
-
-### Disable taint (AST-only mode)
+### Skip dataflow entirely

 ```bash
 nyx scan . --mode ast
 ```

+AST-only mode gives you structural pattern matches without taint.
+
+In the browser UI, taint findings render as a numbered flow walk so you can see each hop the engine took:
+
+<p align="center"><img src="../../assets/screenshots/docs/serve-finding-detail.png" alt="Nyx finding detail: HIGH taint-unsanitised-flow with numbered source → call → sink steps and How to fix guidance" width="900"/></p>
+
 ## Example

-**Vulnerable code** (Rust):
+Rust:
+
 ```rust
 use std::env;
 use std::process::Command;

 fn main() {
-    let cmd = env::var("USER_CMD").unwrap();          // line 5: source
-    Command::new("sh").arg("-c").arg(&cmd).output();   // line 6: sink
+    let cmd = env::var("USER_CMD").unwrap();           // source
+    Command::new("sh").arg("-c").arg(&cmd).output();   // sink
 }
 ```

-**Finding**:
+Finding:
+
 ```
-[HIGH]   taint-unsanitised-flow (source 5:15)  src/main.rs:6:5
-         Source: env::var("USER_CMD") at 5:15
-         Sink: Command::new("sh").arg("-c")
-         Score: 76
+[HIGH] taint-unsanitised-flow (source 5:15)  src/main.rs:6:5
+       Unsanitised user input flows from env::var → Command::new
+       Source: env::var (5:15)
+       Sink:   Command::new
 ```

-**Safe alternative**:
-```rust
-use std::env;
-use std::process::Command;
+Safe rewrite: drop the shell and run the value as the program directly (`Command::new(&cmd).output()`), so no shell parsing is involved, or validate it against an allowlist before passing it to the shell.

-fn main() {
-    let cmd = env::var("USER_CMD").unwrap();
-    // Use the value as a direct argument, not a shell command
-    Command::new(&cmd).output();
-    // Or validate against an allowlist
-}
-```
+## Capabilities

-## Technical Details
+Sources, sanitizers, and sinks are linked by named capabilities. A sanitizer only clears taint for the cap it declares. A sink only fires when the remaining taint still carries its required cap.

-### Capability System
+| Capability | Typical source | Typical sanitizer | Typical sink |
+|---|---|---|---|
+| `env_var` | `env::var`, `getenv`, `process.env` | | |
+| `html_escape` | | `html.escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
+| `shell_escape` | | `shlex.quote`, `shell_escape::escape` | `system`, `Command::new`, `eval` |
+| `url_encode` | | `encodeURIComponent` | `location.href`, HTTP client URL arg |
+| `json_parse` | | `JSON.parse` | |
+| `file_io` | | `os.path.realpath`, `filepath.Clean` | `open`, `fs::read_to_string`, `send_file` |
+| `fmt_string` | | | `printf(var)` |
+| `sql_query` | | parameterized query binders | `cursor.execute`, `db.query` with concatenation |
+| `deserialize` | | | `pickle.loads`, `yaml.load`, `Marshal.load` |
+| `ssrf` | | URL-prefix locks | `requests.get`, `fetch`, `HttpClient.send` |
+| `code_exec` | | | `eval`, `exec`, `Function` |
+| `crypto` | | | weak-algorithm constructors |
+| `unauthorized_id` | request-bound scoped IDs (Rust auth analysis) | ownership check | row-level write |
+| `all` | Sources typically use `all` so they match any sink | | |

-Taint uses a bitflag capability system to match sources with appropriate sanitizers and sinks:
-
-| Capability | Bit | Sources | Sanitizers | Sinks |
-|-----------|-----|---------|------------|-------|
-| `ENV_VAR` | 0x01 | `env::var`, `getenv` | — | — |
-| `HTML_ESCAPE` | 0x02 | — | `html_escape`, `DOMPurify.sanitize` | `innerHTML`, `document.write` |
-| `SHELL_ESCAPE` | 0x04 | — | `shell_escape` | `Command::new`, `system()`, `eval()` |
-| `URL_ENCODE` | 0x08 | — | `encodeURIComponent` | `location.href` |
-| `JSON_PARSE` | 0x10 | — | `JSON.parse` | — |
-| `FILE_IO` | 0x20 | — | `filepath.Clean`, `basename`, `os.path.realpath` | `fopen`, `open`, `send_file`, `fs::read_to_string` |
-| `FMT_STRING` | 0x40 | — | — | `printf(var)` |
-
-Sources typically use `Cap::all()` to match any sink. A sanitizer strips specific capability bits. A finding fires when a tainted variable reaches a sink and the taint still has the matching capability bit set.
-
-### Nested Function Analysis
-
-The CFG builder recursively discovers function expressions nested inside call arguments:
-
-- **JavaScript/TypeScript**: `function_expression`, `arrow_function` inside call arguments (e.g., Express route handlers)
-- **Ruby**: `do_block` and `block` nodes (e.g., Sinatra `get '/path' do...end`)
-- **Go**: `func_literal` (anonymous function literals)
-
-Each nested function is walked as a separate scope and receives a unique identifier (`<anon@{byte_offset}>`) to prevent collisions when multiple anonymous functions exist in the same file.
-
-### Chained Call Classification
-
-Method chains like `r.URL.Query().Get("host")` are normalized by stripping internal `()` segments between `.` separators. The classifier matches against both the original text and the normalized form, enabling rules like `r.URL` to match within `r.URL.Query.Get`.
-
-### Nested Call Fallback
-
-When the outermost call in an expression doesn't classify as a source/sink, the engine tries all nested inner calls. This handles patterns like `str(eval(expr))` where `str` is not a sink but the inner `eval` is.
-
-### Rust `if let` / `while let` Pattern Bindings
-
-The CFG builder recognizes Rust `let_condition` nodes inside `if` and `while` expressions. The value expression is classified for source/sink labels, and the pattern binding is extracted as a variable definition:
-
-```rust
-if let Ok(cmd) = env::var("CMD") {
-    // cmd is tainted — env::var is a source, cmd is the binding
-    Command::new("sh").arg("-c").arg(&cmd).output();  // taint-unsanitised-flow
-}
-```
-
-This also works for `while let` patterns.
-
-### JS/TS Two-Level Solve
-
-For JavaScript and TypeScript, taint analysis uses a two-level approach:
-
-1. **Level 1**: Solve top-level code (module scope)
-2. **Level 2**: Solve each function seeded with the converged top-level state
-
-This prevents false positives from cross-function taint leakage while preserving global-to-function flows.
+Sources typically use `cap = "all"` so they match every sink. Sinks declare the specific cap they need. Sanitizers only clear the cap they name.
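The matching rule above can be sketched with plain bit masks. This is a hypothetical illustration, not Nyx's code: the function names are invented, and the bit values are borrowed from the earlier version of the capability table (`HTML_ESCAPE` = 0x02, `SHELL_ESCAPE` = 0x04, `FILE_IO` = 0x20), which may no longer match the engine.

```rust
// Hypothetical sketch of capability matching; names are assumptions.
type Cap = u16;

#[allow(dead_code)]
const HTML_ESCAPE: Cap = 0x02;
#[allow(dead_code)]
const SHELL_ESCAPE: Cap = 0x04;
#[allow(dead_code)]
const FILE_IO: Cap = 0x20;
/// A source declared with `cap = "all"` carries every capability bit.
#[allow(dead_code)]
const ALL: Cap = Cap::MAX;

/// A sanitizer clears only the capability it declares.
fn sanitize(taint: Cap, sanitizer_cap: Cap) -> Cap {
    taint & !sanitizer_cap
}

/// A sink fires when the remaining taint still carries its required capability.
fn sink_fires(taint: Cap, sink_cap: Cap) -> bool {
    taint & sink_cap != 0
}
```

With these definitions, a value tainted with `ALL` and passed through an HTML sanitizer no longer fires an `HTML_ESCAPE` sink but still fires a `SHELL_ESCAPE` sink, which is why an HTML escaper does not silence a command-injection finding.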