Flakestorm — Automated Robustness Testing for AI Agents. Stop guessing if your agent really works. FlakeStorm generates adversarial mutations and exposes failures your manual tests and evals miss. https://flakestorm.com
Find a file
2026-04-07 23:40:41 +08:00
.github/ISSUE_TEMPLATE Update README.md and CONTRIBUTING.md to enhance project visibility and support for new contributors. Added PyPI version and download badges, build status, and latest release information to README.md. Introduced a section in CONTRIBUTING.md for finding good first issues, providing guidance for beginners on how to contribute effectively. 2026-01-13 21:39:50 +08:00
docs Update documentation and configuration for Flakestorm V2, enhancing clarity on CI processes, report generation, and reproducibility features. Added details on the new --output option for saving reports, clarified the use of --min-score, and improved descriptions of the seed configuration for deterministic runs. Updated README and usage guides to reflect these changes and ensure comprehensive understanding of the CI pipeline and report outputs. 2026-03-12 20:05:51 +08:00
examples Update documentation and configuration for Flakestorm V2, enhancing clarity on CI processes, report generation, and reproducibility features. Added details on the new --output option for saving reports, clarified the use of --min-score, and improved descriptions of the seed configuration for deterministic runs. Updated README and usage guides to reflect these changes and ensure comprehensive understanding of the CI pipeline and report outputs. 2026-03-12 20:05:51 +08:00
rust Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files. 2026-03-06 23:33:21 +08:00
src/flakestorm Update documentation and configuration for Flakestorm V2, enhancing clarity on CI processes, report generation, and reproducibility features. Added details on the new --output option for saving reports, clarified the use of --min-score, and improved descriptions of the seed configuration for deterministic runs. Updated README and usage guides to reflect these changes and ensure comprehensive understanding of the CI pipeline and report outputs. 2026-03-12 20:05:51 +08:00
tests Enhance documentation and replay functionality in Flakestorm. Updated README to clarify V2 Spec and added references to LangSmith sources in configuration guide. Improved replay regression capabilities by allowing imports from LangSmith projects and runs, with filtering options. Added new classes for LangSmith project and run sources in the configuration. Updated replay loader to support project imports and refined session resolution logic. 2026-03-07 02:04:55 +08:00
.gitignore Update documentation and configuration for Flakestorm V2, enhancing clarity on CI processes, report generation, and reproducibility features. Added details on the new --output option for saving reports, clarified the use of --min-score, and improved descriptions of the seed configuration for deterministic runs. Updated README and usage guides to reflect these changes and ensure comprehensive understanding of the CI pipeline and report outputs. 2026-03-12 20:05:51 +08:00
.pre-commit-config.yaml Refactor Entropix to FlakeStorm 2025-12-29 11:15:18 +08:00
BUILD_FIX.md Add pre-flight validation, flexible response handling, and improved error detection - Add pre-flight check to validate agent with first golden prompt before mutations - Improve response extraction to handle various agent response formats automatically - Add support for non-JSON responses (plain text, HTML) - Enhance error detection for HTTP 200 responses with error fields - Add comprehensive auto-detection for common response field names - Improve JSON parsing error handling with graceful fallbacks - Add example YAML config for GenerateSearchQueries agent - Update documentation with build and installation fixes 2026-01-02 15:21:20 +08:00
Cargo.toml Refactor Entropix to FlakeStorm 2025-12-29 11:15:18 +08:00
CHAOS_ENGINE.md Create CHAOS_ENGINE.md 2026-04-07 23:37:54 +08:00
FIX_INSTALL.md Add pre-flight validation, flexible response handling, and improved error detection - Add pre-flight check to validate agent with first golden prompt before mutations - Improve response extraction to handle various agent response formats automatically - Add support for non-JSON responses (plain text, HTML) - Enhance error detection for HTTP 200 responses with error fields - Add comprehensive auto-detection for common response field names - Improve JSON parsing error handling with graceful fallbacks - Add example YAML config for GenerateSearchQueries agent - Update documentation with build and installation fixes 2026-01-02 15:21:20 +08:00
flakestorm-20260102-233336.html Add files via upload 2026-01-12 19:42:41 +08:00
flakestorm.yaml Revise README.md to enhance clarity and user experience by updating the features section, streamlining the quick start guide, and introducing a new section on future improvements for zero-setup usage. The changes aim to provide a more intuitive overview of Flakestorm's capabilities and installation process. 2026-01-04 23:28:43 +08:00
flakestorm.yaml.example Enhance mutation capabilities by adding three new types: encoding_attacks, context_manipulation, and length_extremes. Update configuration and documentation to reflect the addition of these types, including their weights and descriptions. Revise README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, and other relevant documents to provide comprehensive coverage of the new mutation strategies and their applications. Ensure all tests are updated to validate the new mutation types. 2026-01-01 17:28:05 +08:00
flakestorm_demo.gif Update model configuration and enhance documentation for improved user guidance - Change default model to "gemma3:1b" in flakestorm-generate-search-queries.yaml and increase mutation count from 3 to 20 - Revise README.md to include demo visuals and model recommendations based on system RAM - Expand USAGE_GUIDE.md with detailed model selection criteria and installation instructions - Enhance HTML report generation to include actionable recommendations for failed mutations and executive summary insights. 2026-01-02 20:01:12 +08:00
flakestorm_report1.png Add new AI agent for generating search queries using Google Gemini - Introduce keywords_extractor_agent with robust error handling and response parsing - Include multiple fallback strategies for query generation - Update README.md and documentation to reflect new agent capabilities and setup instructions - Remove outdated broken_agent example and associated files. 2026-01-02 21:52:56 +08:00
flakestorm_report2.png Update flakestorm_report2.png to reflect recent changes in report generation and visualization enhancements. 2026-01-02 21:56:56 +08:00
flakestorm_report3.png Add new AI agent for generating search queries using Google Gemini - Introduce keywords_extractor_agent with robust error handling and response parsing - Include multiple fallback strategies for query generation - Update README.md and documentation to reflect new agent capabilities and setup instructions - Remove outdated broken_agent example and associated files. 2026-01-02 21:52:56 +08:00
flakestorm_report4.png Add new AI agent for generating search queries using Google Gemini - Introduce keywords_extractor_agent with robust error handling and response parsing - Include multiple fallback strategies for query generation - Update README.md and documentation to reflect new agent capabilities and setup instructions - Remove outdated broken_agent example and associated files. 2026-01-02 21:52:56 +08:00
flakestorm_report5.png Add new AI agent for generating search queries using Google Gemini - Introduce keywords_extractor_agent with robust error handling and response parsing - Include multiple fallback strategies for query generation - Update README.md and documentation to reflect new agent capabilities and setup instructions - Remove outdated broken_agent example and associated files. 2026-01-02 21:52:56 +08:00
LICENSE Refactor Entropix to FlakeStorm 2025-12-29 11:15:18 +08:00
pyproject.toml Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files. 2026-03-06 23:33:21 +08:00
README.md Update README.md 2026-04-07 23:40:41 +08:00
RELEASE_NOTES.md Update to 24 mutation types: Add release notes, update development status to Production/Stable, and expand test coverage for all mutation types 2026-01-15 15:04:10 +08:00
ROADMAP.md Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files. 2026-03-06 23:33:21 +08:00
test_wheel_contents.sh Add pre-flight validation, flexible response handling, and improved error detection - Add pre-flight check to validate agent with first golden prompt before mutations - Improve response extraction to handle various agent response formats automatically - Add support for non-JSON responses (plain text, HTML) - Enhance error detection for HTTP 200 responses with error fields - Add comprehensive auto-detection for common response field names - Improve JSON parsing error handling with graceful fallbacks - Add example YAML config for GenerateSearchQueries agent - Update documentation with build and installation fixes 2026-01-02 15:21:20 +08:00

Flakestorm: The Reliability Layer for Agentic Engineering 🤖

Flakestorm is a suite of infrastructure and observability tools designed to solve the Trust Gap in autonomous software development. As we move from human-written to agent-generated code, we provide the safety rails, cost-controls, and verification protocols required for production-grade AI.


🛠 The Flakestorm Stack

Our ecosystem addresses the four primary failure modes of AI agents:

🧪 Flakestorm Chaos (This Repo)

The Auditor (Resilience) Chaos Engineering for AI Agents. We deliberately inject failures, tool-latency, and adversarial inputs to verify that your agents degrade gracefully and adhere to behavioral contracts under fire.

  • Core Tech: Failure Injection, Agentic Unit Testing, Red Teaming.

🧹 Session-Sift

The Optimizer (Context & Memory) A semantic "Garbage Collector" for LLM sessions. It prunes context rot, resolved errors, and terminal noise to slash token costs by up to 60% while preventing semantic drift in long-running chats.

  • Core Tech: MCP Server, Heuristic Pruning, Token FinOps.

⚖️ VibeDiff

The Notary (Semantic Intent) A high-performance Rust auditor that verifies if agentic code changes actually match the developer's stated intent. It bridges the gap between "The Git Diff" and "The Vibe."

  • Core Tech: Rust, Tree-sitter AST Analysis, Semantic Audit.

🛡️ Veraxiv

The Shield (Verification & Attestation) The final gate for autonomous systems. Veraxiv provides a high-integrity verification layer and tamper-proof attestation for machine-generated actions and outputs.

  • Core Tech: Verification Protocol, Compliance Audit, Attestation.

🔄 The Reliable Agent Loop

We believe the future of engineering isn't just "Better Models," but Better Infrastructure. 1. Sift: Clean the input memory to maximize model intelligence. 2. Stress: Test the agent's logic through deliberate chaos (Flakestorm). 3. Audit: Verify the output code against the human's intent (VibeDiff). 4. Attest: Sign off on the final action with a verifiable audit trail (Veraxiv).