flakestorm/README.md

# Flakestorm

<p align="center">
  <strong>The Agent Reliability Engine</strong><br>
  <em>Chaos Engineering for Production AI Agents</em>
</p>

<p align="center">
  <a href="https://github.com/flakestorm/flakestorm/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/license-Apache--2.0-blue.svg" alt="License">
  </a>
  <a href="https://github.com/flakestorm/flakestorm">
    <img src="https://img.shields.io/github/stars/flakestorm/flakestorm?style=social" alt="GitHub Stars">
  </a>
  <a href="https://pypi.org/project/flakestorm/">
    <img src="https://img.shields.io/pypi/v/flakestorm.svg" alt="PyPI version">
  </a>
  <a href="https://pypi.org/project/flakestorm/">
    <img src="https://img.shields.io/pypi/dm/flakestorm.svg" alt="PyPI downloads">
  </a>

  <a href="https://github.com/flakestorm/flakestorm/releases">
    <img src="https://img.shields.io/github/v/release/flakestorm/flakestorm" alt="Latest Release">
  </a>
</p>


---

## The Problem

Production AI agents are **distributed systems**: they depend on LLM APIs, tools, context windows, and multi-step orchestration. Each of these can fail. Today’s tools don’t answer the questions that matter:

- **What happens when the agent’s tools fail?** — A search API returns 503. A database times out. Does the agent degrade gracefully, hallucinate, or fabricate data?
- **Does the agent always follow its rules?** — Must it always cite sources? Never return PII? Are those guarantees maintained when the environment is degraded?
- **Did we fix the production incident?** — After a failure in prod, how do we prove the fix and prevent regression?

Observability tools tell you *after* something broke. Eval libraries focus on output quality, not resilience. **No tool systematically breaks the agent’s environment to test whether it survives.** Flakestorm fills that gap.

## The Solution: Chaos Engineering for AI Agents

**Flakestorm** is a **chaos engineering platform** for production AI agents. Like Chaos Monkey for infrastructure, Flakestorm deliberately injects failures into the tools, APIs, and LLMs your agent depends on — then verifies that the agent still obeys its behavioral contract and recovers gracefully.

> **Other tools test if your agent gives good answers. Flakestorm tests if your agent survives production.**

### Three Pillars

| Pillar | What it does | Question answered |
|--------|----------------|--------------------|
| **Environment Chaos** | Inject faults into tools and LLMs (timeouts, errors, rate limits, malformed responses) | *Does the agent handle bad environments?* |
| **Behavioral Contracts** | Define invariants (rules the agent must always follow) and verify them across a matrix of chaos scenarios | *Does the agent obey its rules when the world breaks?* |
| **Replay Regression** | Import real production failure sessions and replay them as deterministic tests | *Did we fix this incident?* |

On top of that, Flakestorm still runs **adversarial prompt mutations** (22+ mutation types; max 50 per run in OSS) so you can test bad inputs and bad environments together.

**Scores at a glance**

| What you run | Score you get |
|--------------|----------------|
| `flakestorm run` | **Robustness score** (0–1): how well the agent handled adversarial prompts. |
| `flakestorm run --chaos --chaos-only` | **Chaos resilience** (same 0–1 metric): how well the agent handled a broken environment (no mutations, only chaos). |
| `flakestorm contract run` | **Resilience score** (0–100%): contract × chaos matrix, severity-weighted. |
| `flakestorm replay run …` | Per-session pass/fail; aggregate **replay regression** score when run via `flakestorm ci`. |
| `flakestorm ci` | **Overall (weighted)** score combining mutation robustness, chaos resilience, contract compliance, and replay regression — one number for CI gates. |

**Commands by scope**

| Scope | Command | What runs |
|-------|---------|-----------|
| **V1 only / mutation only** | `flakestorm run` | Just adversarial mutations → agent → invariants. No chaos, no contract matrix, no replay. Use a v1.0 config or omit `--chaos` so you get only the classic robustness score. |
| **Mutation + chaos** | `flakestorm run --chaos` | Mutations run against a fault-injected agent (tool/LLM chaos). |
| **Chaos only** | `flakestorm run --chaos --chaos-only` | No mutations; golden prompts only, with chaos. Single chaos resilience score. |
| **Contract only** | `flakestorm contract run` | Contract × chaos matrix; resilience score. |
| **Replay only** | `flakestorm replay run path/to/replay.yaml -c flakestorm.yaml` | One or more replay sessions. |
| **ALL (full CI)** | `flakestorm ci` | Mutation run + contract (if configured) + chaos-only run (if chaos configured) + all replay sessions (if configured); then **overall** weighted score. |

**Context attacks** are part of environment chaos: adversarial content is applied to **tool responses or to the input before invoke**, not to the user prompt itself. The chaos interceptor applies **memory_poisoning** to the user input before each invoke; LLM faults (timeout, truncated, empty, garbage, rate_limit, response_drift) are applied in the interceptor (timeout before the call, others after the response). Types: **indirect_injection** (tool returns valid-looking content with hidden instructions), **memory_poisoning** (payload into input before invoke; strategy `prepend` | `append` | `replace`), **system_prompt_leak_probe** (contract assertion using probe prompts). Config: list of attack configs or dict (e.g. `memory_poisoning: { payload: "...", strategy: "append" }`). Scenarios in the contract chaos matrix can each define `context_attacks`. See [Context Attacks](docs/CONTEXT_ATTACKS.md).

## Production-First by Design

Flakestorm is designed for teams already running AI agents in production. Most production agents use cloud LLM APIs (OpenAI, Gemini, Claude, Perplexity, etc.) and face real traffic, real users, and real abuse patterns.

**Why local LLMs exist in the open source version:**
- Fast experimentation and proofs-of-concept
- CI-friendly testing without external dependencies
- Transparent, extensible chaos engine

**Why production chaos should mirror production reality:**
Production agents run on cloud infrastructure, process real user inputs, and scale dynamically. Chaos testing should reflect this reality—testing against the same infrastructure, scale, and patterns your agents face in production.

The cloud version removes operational friction: no local model setup, no environment configuration, scalable mutation runs, shared dashboards, and team collaboration. Open source proves the value; cloud delivers production-grade chaos engineering.

## Who Flakestorm Is For

- **Teams shipping AI agents to production** — Catch failures before users do
- **Engineers running agents behind APIs** — Test against real-world abuse patterns
- **Teams already paying for LLM APIs** — Reduce regressions and production incidents
- **CI/CD pipelines (Cloud only)** — Automated reliability gates, scheduled runs, and native pipeline integrations; OSS is for local and scripted runs

Flakestorm is built for production-grade agents handling real traffic. While it works great for exploration and hobby projects, it's designed to catch the failures that matter when agents are deployed at scale.


#
## Demo

### flakestorm in Action

![flakestorm Demo](flakestorm_demo.gif)

*Watch Flakestorm run chaos and mutation tests against your agent in real-time*

### Test Report

![flakestorm Test Report 1](flakestorm_report1.png)

![flakestorm Test Report 2](flakestorm_report2.png)

![flakestorm Test Report 3](flakestorm_report3.png)

![flakestorm Test Report 4](flakestorm_report4.png)

![flakestorm Test Report 5](flakestorm_report5.png)

*Interactive HTML reports with detailed failure analysis and recommendations*

## How Flakestorm Works

Flakestorm supports several modes; you can use one or combine them:

- **Chaos only** — Golden prompts → agent with fault-injected tools/LLM → invariants. *Does the agent handle bad environments?*
- **Contract** — Golden prompts → agent under each chaos scenario → verify named invariants across a matrix. *Does the agent obey its rules under every failure mode?*
- **Replay** — Recorded production input + recorded tool responses → agent → contract. *Did we fix this incident?*
- **Mutation (optional)** — Golden prompts → adversarial mutations (24 types, max 50/run) → agent (optionally under chaos) → invariants. *Does the agent handle bad inputs (and optionally bad environments)?*

You define **golden prompts**, **invariants** (or a full **contract** with severity and chaos matrix), and optionally **chaos** (tool/LLM faults) and **replay** sessions. Flakestorm runs the chosen mode(s), checks responses against your rules, and produces a **robustness score** (mutation or chaos-only runs) or **resilience score** (contract run), plus HTML report. Use `flakestorm run`, `flakestorm contract run`, `flakestorm replay run`, or `flakestorm ci` for the combined overall score (OSS: run from CLI or your own scripts; **native CI/CD integrations** — scheduled runs, pipeline plugins — are **Cloud only**).

For the full **V1 vs V2 flow** (mutation-only vs four pillars, contract matrix isolation, resilience score formula), see the [Usage Guide](docs/USAGE_GUIDE.md#how-it-works).

> **Note**: Mutation generation uses a local LLM (Ollama) or cloud APIs (OpenAI, Claude, Gemini). API keys via environment variables only. See [LLM Providers](docs/LLM_PROVIDERS.md).

## Features

### Chaos engineering pillars

- **Environment Chaos** — Inject faults into tools and LLMs (timeouts, errors, rate limits, malformed responses, built-in profiles). **Context attacks**: indirect_injection, memory_poisoning (input before invoke; strategy: prepend/append/replace), system_prompt_leak_probe; config as list or dict. [→ Environment Chaos](docs/ENVIRONMENT_CHAOS.md)
- **Behavioral Contracts** — Named invariants × chaos matrix; severity-weighted resilience score. Optional **reset** per cell: `agent.reset_endpoint` (HTTP) or `agent.reset_function` (e.g. `myagent:reset_state`). **system_prompt_leak_probe**: use `probes` (list of prompts) on an invariant to run probe prompts and verify response (e.g. excludes_pattern). **behavior_unchanged**: baseline `auto` or manual. Stateful agents: warn if no reset and responses differ. [→ Behavioral Contracts](docs/BEHAVIORAL_CONTRACTS.md)
- **Replay Regression** — Import production failures (manual or LangSmith), replay deterministically, verify against contracts. Sessions can reference a `file` or inline id/input; sources support LangSmith project/run with optional auto_import. [→ Replay Regression](docs/REPLAY_REGRESSION.md)

### Supporting capabilities

- **Adversarial mutations** — 24 mutation types (prompt-level and system/network-level); max 50 mutations per run in OSS. [→ Test Scenarios](docs/TEST_SCENARIOS.md) for mutation, chaos, contract, and replay examples.
- **Invariants & assertions** — Deterministic checks, semantic similarity, safety (PII, refusal); configurable per contract.
- **Robustness score** — For mutation runs: a single weighted score (0–1) of how well the agent handled adversarial prompts. Reported in HTML/JSON and CLI (`results.statistics.robustness_score`).
- **Unified resilience score** — For full CI: weighted combination of **mutation robustness**, chaos resilience, contract compliance, and replay regression; weights (mutation, chaos, contract, replay) configurable in YAML and must sum to 1.0.
- **Context attacks** — indirect_injection (into tool/context), memory_poisoning (into input before invoke; strategy: prepend/append/replace), system_prompt_leak_probe (contract assertion with probe prompts). Config: list or dict. [→ Context Attacks](docs/CONTEXT_ATTACKS.md)
- **LLM providers** — Ollama, OpenAI, Anthropic, Google (Gemini); API keys via env only. [→ LLM Providers](docs/LLM_PROVIDERS.md)
- **Reports** — Interactive HTML and JSON; contract matrix and replay reports.

**Try it:** [Working example](examples/v2_research_agent/README.md) with chaos, contracts, and replay from the CLI.

## Open Source vs Cloud

**Open Source (Always Free):**
- Core chaos engine with all 24 mutation types (max 50 per run; no artificial feature gating)
- Local execution for fast experimentation
- Run from CLI or your own scripts (no native CI/CD; that’s Cloud only)
- Full transparency and extensibility
- Perfect for proofs-of-concept and development workflows

**Cloud (In Progress / Waitlist):**
- Zero-setup chaos testing (no Ollama, no local models)
- **CI/CD** — native pipeline integrations, scheduled runs, reliability gates
- Scalable runs (thousands of mutations)
- Shared dashboards & reports
- Team collaboration
- Production-grade reliability workflows

**Our Philosophy:** We do not cripple the OSS version. Cloud exists to remove operational pain, not to lock features. Open source proves the value; cloud delivers production-grade chaos engineering at scale.

# Try Flakestorm in ~60 Seconds

This is the fastest way to try Flakestorm locally. Production teams typically use the cloud version (waitlist). Here's the local quickstart:

1. **Install flakestorm** (if you have Python 3.10+):
   ```bash
   pip install flakestorm
   ```

2. **Initialize a test configuration**:
   ```bash
   flakestorm init
   ```

3. **Point it at your agent** (edit `flakestorm.yaml`):
   ```yaml
   agent:
     endpoint: "http://localhost:8000/invoke"  # Your agent's endpoint
     type: "http"
   ```

4. **Run your first test**:
   ```bash
   flakestorm run
   ```
   With a [v2 config](examples/v2_research_agent/README.md) you can also run `flakestorm run --chaos`, `flakestorm contract run`, `flakestorm replay run`, or `flakestorm ci` to exercise all pillars.

That's it! You get a **robustness score** (for mutation runs) or a **resilience score** (when using chaos/contract/replay), plus a report showing how your agent handles chaos and adversarial inputs.

> **Note**: For full local execution (including mutation generation), you'll need Ollama installed. See the [Usage Guide](docs/USAGE_GUIDE.md) for complete setup instructions.


## Roadmap

See [Roadmap](ROADMAP.md) for the full plan. Highlights:

- **V3 — Multi-agent chaos** — Chaos engineering for systems of multiple agents: fault injection across agent-to-agent and tool boundaries, contract verification for multi-agent workflows, and replay of multi-agent production incidents.
- **Pattern engine** — 110+ prompt-injection and 52+ PII detection patterns; Rust-backed, sub-50ms.
- **Cloud** — Scalable runs, team dashboards, scheduled chaos, CI integrations.
- **Enterprise** — On-premise, audit logging, compliance certifications.

## Documentation

### Getting Started
- [📖 Usage Guide](docs/USAGE_GUIDE.md) - Complete end-to-end guide (includes local setup)
- [⚙️ Configuration Guide](docs/CONFIGURATION_GUIDE.md) - All configuration options
- [🔌 Connection Guide](docs/CONNECTION_GUIDE.md) - How to connect FlakeStorm to your agent
- [🧪 Test Scenarios](docs/TEST_SCENARIOS.md) - Real-world examples for mutation, chaos, contract, and replay (V2)
- [📂 Example: chaos, contracts & replay](examples/v2_research_agent/README.md) - Working agent and config you can run
- [🔗 Integrations Guide](docs/INTEGRATIONS_GUIDE.md) - HuggingFace models & semantic similarity
- [🤖 LLM Providers](docs/LLM_PROVIDERS.md) - OpenAI, Claude, Gemini (env-only API keys)
- [🌪️ Environment Chaos](docs/ENVIRONMENT_CHAOS.md) - Tool/LLM fault injection
- [📜 Behavioral Contracts](docs/BEHAVIORAL_CONTRACTS.md) - Contract × chaos matrix
- [🔄 Replay Regression](docs/REPLAY_REGRESSION.md) - Import and replay production failures
- [🛡️ Context Attacks](docs/CONTEXT_ATTACKS.md) - Indirect injection, memory poisoning
- [📐 V2 Spec](docs/V2_SPEC.md) - Score formula, reset, Python tools

### For Developers
- [🏗️ Architecture & Modules](docs/MODULES.md) - How the code works
- [❓ Developer FAQ](docs/DEVELOPER_FAQ.md) - Q&A about design decisions
- [🤝 Contributing](docs/CONTRIBUTING.md) - How to contribute

### Troubleshooting
- [🔧 Fix Installation Issues](FIX_INSTALL.md) - Resolve `ModuleNotFoundError: No module named 'flakestorm.reports'`
- [🔨 Fix Build Issues](BUILD_FIX.md) - Resolve `pip install .` vs `pip install -e .` problems

### Support
- [🐛 Issue Templates](https://github.com/flakestorm/flakestorm/tree/main/.github/ISSUE_TEMPLATE) - Use our issue templates to report bugs, request features, or ask questions

### Reference
- [📋 API Specification](docs/API_SPECIFICATION.md) - API reference
- [🧪 Testing Guide](docs/TESTING_GUIDE.md) - How to run and write tests

## Cloud Version (Early Access)

For teams running production AI agents, the cloud version removes operational friction: zero-setup chaos testing without local model configuration, scalable mutation runs that mirror production traffic, shared dashboards for team collaboration, and continuous chaos runs integrated into your reliability workflows.

The cloud version is currently in early access. [Join the waitlist](https://flakestorm.com) to get access as we roll it out.

## License

Apache 2.0 - See [LICENSE](LICENSE) for details.

---

<p align="center">
  <strong>Tested with Flakestorm</strong><br>
  <img src="https://img.shields.io/badge/tested%20with-flakestorm-brightgreen" alt="Tested with Flakestorm">
</p>

---

<p align="center">
  ❤️ <a href="https://github.com/sponsors/flakestorm">Sponsor Flakestorm on GitHub</a>
</p>
-												Update README.md to correct spelling of "FlakeStorm" to "Flakestorm" for consistency throughout the documentation.

											
										
										
											2025-12-30 16:36:24 +08:00
+								# Flakestorm
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
 								<p align="center">
 								  <strong>The Agent Reliability Engine</strong><br>
-												Enhance README.md to clarify the purpose and functionality of Flakestorm for production AI agents. Update descriptions to emphasize chaos testing, adversarial input handling, and CI/CD integration. Add sections on target users and production deployment patterns, ensuring comprehensive guidance for teams shipping AI agents.

											
										
										
											2026-01-05 16:53:23 +08:00
+								  <em>Chaos Engineering for Production AI Agents</em>
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
+								</p>
 								<p align="center">
-												Refactor Entropix to FlakeStorm

- Rename all instances of Entropix to FlakeStorm
- Rename package from entropix to flakestorm
- Update all class names (EntropixConfig -> FlakeStormConfig, EntropixRunner -> FlakeStormRunner)
- Update Rust module from entropix_rust to flakestorm_rust
- Update README: remove cloud comparison, update links to flakestorm.com
- Update .gitignore to allow docs files referenced in README
- Add origin remote for VS Code compatibility
- Fix missing imports and type references
- All imports and references updated throughout codebase

											
										
										
											2025-12-29 11:15:18 +08:00
+								  <a href="https://github.com/flakestorm/flakestorm/blob/main/LICENSE">
-												Remove PUSH_INSTRUCTIONS.md as it is no longer needed. Update README.md to reflect the new license and add GitHub stars badge.

											
										
										
											2025-12-29 13:10:55 +08:00
+								    <img src="https://img.shields.io/badge/license-Apache--2.0-blue.svg" alt="License">
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
+								  </a>
-												Remove PUSH_INSTRUCTIONS.md as it is no longer needed. Update README.md to reflect the new license and add GitHub stars badge.

											
										
										
											2025-12-29 13:10:55 +08:00
+								  <a href="https://github.com/flakestorm/flakestorm">
 								    <img src="https://img.shields.io/github/stars/flakestorm/flakestorm?style=social" alt="GitHub Stars">
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
+								  </a>
-												Update README.md and CONTRIBUTING.md to enhance project visibility and support for new contributors. Added PyPI version and download badges, build status, and latest release information to README.md. Introduced a section in CONTRIBUTING.md for finding good first issues, providing guidance for beginners on how to contribute effectively.

											
										
										
											2026-01-13 21:39:50 +08:00
+								  <a href="https://pypi.org/project/flakestorm/">
 								    <img src="https://img.shields.io/pypi/v/flakestorm.svg" alt="PyPI version">
 								  </a>
 								  <a href="https://pypi.org/project/flakestorm/">
 								    <img src="https://img.shields.io/pypi/dm/flakestorm.svg" alt="PyPI downloads">
 								  </a>
-												Update README and usage guide to reflect changes in mutation types and clarify V1/V2 flow. Increased mutation types from 22 to 24 and added details on the new V2 features, including environment chaos and behavioral contracts. Enhanced documentation for clarity on scoring mechanisms and command usage.

											
										
										
											2026-03-09 12:45:42 +08:00
-												Update README.md and CONTRIBUTING.md to enhance project visibility and support for new contributors. Added PyPI version and download badges, build status, and latest release information to README.md. Introduced a section in CONTRIBUTING.md for finding good first issues, providing guidance for beginners on how to contribute effectively.

											
										
										
											2026-01-13 21:39:50 +08:00
+								  <a href="https://github.com/flakestorm/flakestorm/releases">
 								    <img src="https://img.shields.io/github/v/release/flakestorm/flakestorm" alt="Latest Release">
 								  </a>
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
+								</p>
-												Update README.md
											
										
										
											2026-01-16 23:31:39 +08:00
-												Update README.md
											
										
										
											2026-01-16 23:30:38 +08:00
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
+								---
 								## The Problem
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								Production AI agents are **distributed systems**: they depend on LLM APIs, tools, context windows, and multi-step orchestration. Each of these can fail. Today’s tools don’t answer the questions that matter:
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								- **What happens when the agent’s tools fail?** — A search API returns 503. A database times out. Does the agent degrade gracefully, hallucinate, or fabricate data?
 								- **Does the agent always follow its rules?** — Must it always cite sources? Never return PII? Are those guarantees maintained when the environment is degraded?
 								- **Did we fix the production incident?** — After a failure in prod, how do we prove the fix and prevent regression?
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								Observability tools tell you *after* something broke. Eval libraries focus on output quality, not resilience. **No tool systematically breaks the agent’s environment to test whether it survives.** Flakestorm fills that gap.
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								## The Solution: Chaos Engineering for AI Agents
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								**Flakestorm** is a **chaos engineering platform** for production AI agents. Like Chaos Monkey for infrastructure, Flakestorm deliberately injects failures into the tools, APIs, and LLMs your agent depends on — then verifies that the agent still obeys its behavioral contract and recovers gracefully.
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								> **Other tools test if your agent gives good answers. Flakestorm tests if your agent survives production.**
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								### Three Pillars
 								| Pillar | What it does | Question answered |
 								|--------|----------------|--------------------|
 								| **Environment Chaos** | Inject faults into tools and LLMs (timeouts, errors, rate limits, malformed responses) | *Does the agent handle bad environments?* |
 								| **Behavioral Contracts** | Define invariants (rules the agent must always follow) and verify them across a matrix of chaos scenarios | *Does the agent obey its rules when the world breaks?* |
 								| **Replay Regression** | Import real production failure sessions and replay them as deterministic tests | *Did we fix this incident?* |
-												Implement memory poisoning context attacks and enhance README documentation. Added functions for applying memory poisoning to user input and normalizing context attack configurations. Updated ChaosInterceptor to apply memory poisoning before invoking the wrapped adapter. Enhanced README to clarify CI/CD pipeline features and the distinction between OSS and Cloud capabilities.

											
										
										
											2026-03-08 14:07:34 +08:00
+								On top of that, Flakestorm still runs **adversarial prompt mutations** (22+ mutation types; max 50 per run in OSS) so you can test bad inputs and bad environments together.
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
 								**Scores at a glance**
 								| What you run | Score you get |
 								|--------------|----------------|
 								| `flakestorm run` | **Robustness score** (0–1): how well the agent handled adversarial prompts. |
 								| `flakestorm run --chaos --chaos-only` | **Chaos resilience** (same 0–1 metric): how well the agent handled a broken environment (no mutations, only chaos). |
 								| `flakestorm contract run` | **Resilience score** (0–100%): contract × chaos matrix, severity-weighted. |
 								| `flakestorm replay run …` | Per-session pass/fail; aggregate **replay regression** score when run via `flakestorm ci`. |
 								| `flakestorm ci` | **Overall (weighted)** score combining mutation robustness, chaos resilience, contract compliance, and replay regression — one number for CI gates. |
 								**Commands by scope**
 								| Scope | Command | What runs |
 								|-------|---------|-----------|
 								| **V1 only / mutation only** | `flakestorm run` | Just adversarial mutations → agent → invariants. No chaos, no contract matrix, no replay. Use a v1.0 config or omit `--chaos` so you get only the classic robustness score. |
 								| **Mutation + chaos** | `flakestorm run --chaos` | Mutations run against a fault-injected agent (tool/LLM chaos). |
 								| **Chaos only** | `flakestorm run --chaos --chaos-only` | No mutations; golden prompts only, with chaos. Single chaos resilience score. |
 								| **Contract only** | `flakestorm contract run` | Contract × chaos matrix; resilience score. |
 								| **Replay only** | `flakestorm replay run path/to/replay.yaml -c flakestorm.yaml` | One or more replay sessions. |
 								| **ALL (full CI)** | `flakestorm ci` | Mutation run + contract (if configured) + chaos-only run (if chaos configured) + all replay sessions (if configured); then **overall** weighted score. |
-												Implement memory poisoning context attacks and enhance README documentation. Added functions for applying memory poisoning to user input and normalizing context attack configurations. Updated ChaosInterceptor to apply memory poisoning before invoking the wrapped adapter. Enhanced README to clarify CI/CD pipeline features and the distinction between OSS and Cloud capabilities.

											
										
										
											2026-03-08 14:07:34 +08:00
+								**Context attacks** are part of environment chaos: adversarial content is applied to **tool responses or to the input before invoke**, not to the user prompt itself. The chaos interceptor applies **memory_poisoning** to the user input before each invoke; LLM faults (timeout, truncated, empty, garbage, rate_limit, response_drift) are applied in the interceptor (timeout before the call, others after the response). Types: **indirect_injection** (tool returns valid-looking content with hidden instructions), **memory_poisoning** (payload into input before invoke; strategy `prepend` | `append` | `replace`), **system_prompt_leak_probe** (contract assertion using probe prompts). Config: list of attack configs or dict (e.g. `memory_poisoning: { payload: "...", strategy: "append" }`). Scenarios in the contract chaos matrix can each define `context_attacks`. See [Context Attacks](docs/CONTEXT_ATTACKS.md).
-												Implement Open Source edition limits and feature restrictions

- Add 5 mutation types (paraphrase, noise, tone_shift, prompt_injection, custom)
- Cap mutations at 50 per test run
- Force sequential execution only
- Disable GitHub Actions integration (Cloud feature)
- Add upgrade prompts throughout CLI
- Update README with feature comparison
- Add limits.py module for centralized limit management
- Add cloud and limits CLI commands
- Update all documentation with Cloud upgrade messaging

											
										
										
											2025-12-29 00:11:02 +08:00
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
+								## Production-First by Design
 								Flakestorm is designed for teams already running AI agents in production. Most production agents use cloud LLM APIs (OpenAI, Gemini, Claude, Perplexity, etc.) and face real traffic, real users, and real abuse patterns.
 								**Why local LLMs exist in the open source version:**
 								- Fast experimentation and proofs-of-concept
 								- CI-friendly testing without external dependencies
 								- Transparent, extensible chaos engine
 								**Why production chaos should mirror production reality:**
 								Production agents run on cloud infrastructure, process real user inputs, and scale dynamically. Chaos testing should reflect this reality—testing against the same infrastructure, scale, and patterns your agents face in production.
 								The cloud version removes operational friction: no local model setup, no environment configuration, scalable mutation runs, shared dashboards, and team collaboration. Open source proves the value; cloud delivers production-grade chaos engineering.
-												Enhance README.md to clarify the purpose and functionality of Flakestorm for production AI agents. Update descriptions to emphasize chaos testing, adversarial input handling, and CI/CD integration. Add sections on target users and production deployment patterns, ensuring comprehensive guidance for teams shipping AI agents.

											
										
										
											2026-01-05 16:53:23 +08:00
+								## Who Flakestorm Is For
 								- **Teams shipping AI agents to production** — Catch failures before users do
 								- **Engineers running agents behind APIs** — Test against real-world abuse patterns
 								- **Teams already paying for LLM APIs** — Reduce regressions and production incidents
-												Implement memory poisoning context attacks and enhance README documentation. Added functions for applying memory poisoning to user input and normalizing context attack configurations. Updated ChaosInterceptor to apply memory poisoning before invoking the wrapped adapter. Enhanced README to clarify CI/CD pipeline features and the distinction between OSS and Cloud capabilities.

											
										
										
											2026-03-08 14:07:34 +08:00
+								- **CI/CD pipelines (Cloud only)** — Automated reliability gates, scheduled runs, and native pipeline integrations; OSS is for local and scripted runs
-												Enhance README.md to clarify the purpose and functionality of Flakestorm for production AI agents. Update descriptions to emphasize chaos testing, adversarial input handling, and CI/CD integration. Add sections on target users and production deployment patterns, ensuring comprehensive guidance for teams shipping AI agents.

											
										
										
											2026-01-05 16:53:23 +08:00
 								Flakestorm is built for production-grade agents handling real traffic. While it works great for exploration and hobby projects, it's designed to catch the failures that matter when agents are deployed at scale.
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Revise README.md to improve clarity and organization of Flakestorm's features and usage instructions. Consolidate installation steps, enhance the explanation of mutation types, and introduce a streamlined quick start guide for new users. Emphasize future enhancements aimed at simplifying setup and improving user experience.

											
										
										
											2026-01-05 17:05:54 +08:00
 								#
-												Update model configuration and enhance documentation for improved user guidance - Change default model to "gemma3:1b" in flakestorm-generate-search-queries.yaml and increase mutation count from 3 to 20 - Revise README.md to include demo visuals and model recommendations based on system RAM - Expand USAGE_GUIDE.md with detailed model selection criteria and installation instructions - Enhance HTML report generation to include actionable recommendations for failed mutations and executive summary insights.

											
										
										
											2026-01-02 20:01:12 +08:00
+								## Demo
 								### flakestorm in Action
 								![flakestorm Demo](flakestorm_demo.gif)
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								*Watch Flakestorm run chaos and mutation tests against your agent in real-time*
-												Update model configuration and enhance documentation for improved user guidance - Change default model to "gemma3:1b" in flakestorm-generate-search-queries.yaml and increase mutation count from 3 to 20 - Revise README.md to include demo visuals and model recommendations based on system RAM - Expand USAGE_GUIDE.md with detailed model selection criteria and installation instructions - Enhance HTML report generation to include actionable recommendations for failed mutations and executive summary insights.

											
										
										
											2026-01-02 20:01:12 +08:00
 								### Test Report
-												Add new AI agent for generating search queries using Google Gemini - Introduce `keywords_extractor_agent` with robust error handling and response parsing - Include multiple fallback strategies for query generation - Update README.md and documentation to reflect new agent capabilities and setup instructions - Remove outdated `broken_agent` example and associated files.

											
										
										
											2026-01-02 21:52:56 +08:00
+								![flakestorm Test Report 1](flakestorm_report1.png)
 								![flakestorm Test Report 2](flakestorm_report2.png)
 								![flakestorm Test Report 3](flakestorm_report3.png)
 								![flakestorm Test Report 4](flakestorm_report4.png)
 								![flakestorm Test Report 5](flakestorm_report5.png)
-												Update model configuration and enhance documentation for improved user guidance - Change default model to "gemma3:1b" in flakestorm-generate-search-queries.yaml and increase mutation count from 3 to 20 - Revise README.md to include demo visuals and model recommendations based on system RAM - Expand USAGE_GUIDE.md with detailed model selection criteria and installation instructions - Enhance HTML report generation to include actionable recommendations for failed mutations and executive summary insights.

											
										
										
											2026-01-02 20:01:12 +08:00
 								*Interactive HTML reports with detailed failure analysis and recommendations*
-												Revise README.md to improve clarity and organization of Flakestorm's features and usage instructions. Consolidate installation steps, enhance the explanation of mutation types, and introduce a streamlined quick start guide for new users. Emphasize future enhancements aimed at simplifying setup and improving user experience.

											
										
										
											2026-01-05 17:05:54 +08:00
+								## How Flakestorm Works
-												Enhance installation and troubleshooting instructions in README.md, CONTRIBUTING.md, and USAGE_GUIDE.md. Clarified requirements for Python 3.10+, added detailed steps for creating virtual environments, and provided solutions for common issues related to Ollama and flakestorm installations. Updated license information in README.md.

											
										
										
											2025-12-30 22:33:47 +08:00
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								Flakestorm supports several modes; you can use one or combine them:
-												Enhance installation and troubleshooting instructions in README.md, CONTRIBUTING.md, and USAGE_GUIDE.md. Clarified requirements for Python 3.10+, added detailed steps for creating virtual environments, and provided solutions for common issues related to Ollama and flakestorm installations. Updated license information in README.md.

											
										
										
											2025-12-30 22:33:47 +08:00
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								- **Chaos only** — Golden prompts → agent with fault-injected tools/LLM → invariants. *Does the agent handle bad environments?*
 								- **Contract** — Golden prompts → agent under each chaos scenario → verify named invariants across a matrix. *Does the agent obey its rules under every failure mode?*
 								- **Replay** — Recorded production input + recorded tool responses → agent → contract. *Did we fix this incident?*
-												Update README and usage guide to reflect changes in mutation types and clarify V1/V2 flow. Increased mutation types from 22 to 24 and added details on the new V2 features, including environment chaos and behavioral contracts. Enhanced documentation for clarity on scoring mechanisms and command usage.

											
										
										
											2026-03-09 12:45:42 +08:00
+								- **Mutation (optional)** — Golden prompts → adversarial mutations (24 types, max 50/run) → agent (optionally under chaos) → invariants. *Does the agent handle bad inputs (and optionally bad environments)?*
-												Enhance installation and troubleshooting instructions in README.md, CONTRIBUTING.md, and USAGE_GUIDE.md. Clarified requirements for Python 3.10+, added detailed steps for creating virtual environments, and provided solutions for common issues related to Ollama and flakestorm installations. Updated license information in README.md.

											
										
										
											2025-12-30 22:33:47 +08:00
-												Implement memory poisoning context attacks and enhance README documentation. Added functions for applying memory poisoning to user input and normalizing context attack configurations. Updated ChaosInterceptor to apply memory poisoning before invoking the wrapped adapter. Enhanced README to clarify CI/CD pipeline features and the distinction between OSS and Cloud capabilities.

											
										
										
											2026-03-08 14:07:34 +08:00
+								You define **golden prompts**, **invariants** (or a full **contract** with severity and chaos matrix), and optionally **chaos** (tool/LLM faults) and **replay** sessions. Flakestorm runs the chosen mode(s), checks responses against your rules, and produces a **robustness score** (mutation or chaos-only runs) or **resilience score** (contract run), plus HTML report. Use `flakestorm run`, `flakestorm contract run`, `flakestorm replay run`, or `flakestorm ci` for the combined overall score (OSS: run from CLI or your own scripts; **native CI/CD integrations** — scheduled runs, pipeline plugins — are **Cloud only**).
-												Enhance installation and troubleshooting instructions in README.md, CONTRIBUTING.md, and USAGE_GUIDE.md. Clarified requirements for Python 3.10+, added detailed steps for creating virtual environments, and provided solutions for common issues related to Ollama and flakestorm installations. Updated license information in README.md.

											
										
										
											2025-12-30 22:33:47 +08:00
-												Update README and usage guide to reflect changes in mutation types and clarify V1/V2 flow. Increased mutation types from 22 to 24 and added details on the new V2 features, including environment chaos and behavioral contracts. Enhanced documentation for clarity on scoring mechanisms and command usage.

											
										
										
											2026-03-09 12:45:42 +08:00
+								For the full **V1 vs V2 flow** (mutation-only vs four pillars, contract matrix isolation, resilience score formula), see the [Usage Guide](docs/USAGE_GUIDE.md#how-it-works).
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								> **Note**: Mutation generation uses a local LLM (Ollama) or cloud APIs (OpenAI, Claude, Gemini). API keys via environment variables only. See [LLM Providers](docs/LLM_PROVIDERS.md).
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
-												Revise README.md to improve clarity and organization of Flakestorm's features and usage instructions. Consolidate installation steps, enhance the explanation of mutation types, and introduce a streamlined quick start guide for new users. Emphasize future enhancements aimed at simplifying setup and improving user experience.

											
										
										
											2026-01-05 17:05:54 +08:00
+								## Features
-												Enhance installation and troubleshooting instructions in README.md, CONTRIBUTING.md, and USAGE_GUIDE.md. Clarified requirements for Python 3.10+, added detailed steps for creating virtual environments, and provided solutions for common issues related to Ollama and flakestorm installations. Updated license information in README.md.

											
										
										
											2025-12-30 22:33:47 +08:00
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								### Chaos engineering pillars
-												Implement memory poisoning context attacks and enhance README documentation. Added functions for applying memory poisoning to user input and normalizing context attack configurations. Updated ChaosInterceptor to apply memory poisoning before invoking the wrapped adapter. Enhanced README to clarify CI/CD pipeline features and the distinction between OSS and Cloud capabilities.

											
										
										
											2026-03-08 14:07:34 +08:00
+								- **Environment Chaos** — Inject faults into tools and LLMs (timeouts, errors, rate limits, malformed responses, built-in profiles). **Context attacks**: indirect_injection, memory_poisoning (input before invoke; strategy: prepend/append/replace), system_prompt_leak_probe; config as list or dict. [→ Environment Chaos](docs/ENVIRONMENT_CHAOS.md)
 								- **Behavioral Contracts** — Named invariants × chaos matrix; severity-weighted resilience score. Optional **reset** per cell: `agent.reset_endpoint` (HTTP) or `agent.reset_function` (e.g. `myagent:reset_state`). **system_prompt_leak_probe**: use `probes` (list of prompts) on an invariant to run probe prompts and verify response (e.g. excludes_pattern). **behavior_unchanged**: baseline `auto` or manual. Stateful agents: warn if no reset and responses differ. [→ Behavioral Contracts](docs/BEHAVIORAL_CONTRACTS.md)
 								- **Replay Regression** — Import production failures (manual or LangSmith), replay deterministically, verify against contracts. Sessions can reference a `file` or inline id/input; sources support LangSmith project/run with optional auto_import. [→ Replay Regression](docs/REPLAY_REGRESSION.md)
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
 								### Supporting capabilities
-												Remove the implementation checklist document and update README and TEST_SCENARIOS to reflect the latest V2 features, including detailed descriptions of environment chaos, behavioral contracts, and replay regression scenarios. Adjusted links and clarified configuration options for better usability.

											
										
										
											2026-03-09 13:01:08 +08:00
+								- **Adversarial mutations** — 24 mutation types (prompt-level and system/network-level); max 50 mutations per run in OSS. [→ Test Scenarios](docs/TEST_SCENARIOS.md) for mutation, chaos, contract, and replay examples.
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								- **Invariants & assertions** — Deterministic checks, semantic similarity, safety (PII, refusal); configurable per contract.
 								- **Robustness score** — For mutation runs: a single weighted score (0–1) of how well the agent handled adversarial prompts. Reported in HTML/JSON and CLI (`results.statistics.robustness_score`).
-												Implement memory poisoning context attacks and enhance README documentation. Added functions for applying memory poisoning to user input and normalizing context attack configurations. Updated ChaosInterceptor to apply memory poisoning before invoking the wrapped adapter. Enhanced README to clarify CI/CD pipeline features and the distinction between OSS and Cloud capabilities.

											
										
										
											2026-03-08 14:07:34 +08:00
+								- **Unified resilience score** — For full CI: weighted combination of **mutation robustness**, chaos resilience, contract compliance, and replay regression; weights (mutation, chaos, contract, replay) configurable in YAML and must sum to 1.0.
 								- **Context attacks** — indirect_injection (into tool/context), memory_poisoning (into input before invoke; strategy: prepend/append/replace), system_prompt_leak_probe (contract assertion with probe prompts). Config: list or dict. [→ Context Attacks](docs/CONTEXT_ATTACKS.md)
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								- **LLM providers** — Ollama, OpenAI, Anthropic, Google (Gemini); API keys via env only. [→ LLM Providers](docs/LLM_PROVIDERS.md)
 								- **Reports** — Interactive HTML and JSON; contract matrix and replay reports.
 								**Try it:** [Working example](examples/v2_research_agent/README.md) with chaos, contracts, and replay from the CLI.
-												Enhance installation and troubleshooting instructions in README.md, CONTRIBUTING.md, and USAGE_GUIDE.md. Clarified requirements for Python 3.10+, added detailed steps for creating virtual environments, and provided solutions for common issues related to Ollama and flakestorm installations. Updated license information in README.md.

											
										
										
											2025-12-30 22:33:47 +08:00
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
+								## Open Source vs Cloud
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
+								**Open Source (Always Free):**
-												Update README and usage guide to reflect changes in mutation types and clarify V1/V2 flow. Increased mutation types from 22 to 24 and added details on the new V2 features, including environment chaos and behavioral contracts. Enhanced documentation for clarity on scoring mechanisms and command usage.

											
										
										
											2026-03-09 12:45:42 +08:00
+								- Core chaos engine with all 24 mutation types (max 50 per run; no artificial feature gating)
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
+								- Local execution for fast experimentation
-												Implement memory poisoning context attacks and enhance README documentation. Added functions for applying memory poisoning to user input and normalizing context attack configurations. Updated ChaosInterceptor to apply memory poisoning before invoking the wrapped adapter. Enhanced README to clarify CI/CD pipeline features and the distinction between OSS and Cloud capabilities.

											
										
										
											2026-03-08 14:07:34 +08:00
+								- Run from CLI or your own scripts (no native CI/CD; that’s Cloud only)
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
+								- Full transparency and extensibility
 								- Perfect for proofs-of-concept and development workflows
-												Update version to 0.8.0 in pyproject.toml, enhance README.md and USAGE_GUIDE.md with optional Rust extension installation instructions for improved performance, and remove outdated keywords_extractor_agent documentation.

											
										
										
											2026-01-02 22:32:18 +08:00
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
+								**Cloud (In Progress / Waitlist):**
 								- Zero-setup chaos testing (no Ollama, no local models)
-												Implement memory poisoning context attacks and enhance README documentation. Added functions for applying memory poisoning to user input and normalizing context attack configurations. Updated ChaosInterceptor to apply memory poisoning before invoking the wrapped adapter. Enhanced README to clarify CI/CD pipeline features and the distinction between OSS and Cloud capabilities.

											
										
										
											2026-03-08 14:07:34 +08:00
+								- **CI/CD** — native pipeline integrations, scheduled runs, reliability gates
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
+								- Scalable runs (thousands of mutations)
 								- Shared dashboards & reports
 								- Team collaboration
 								- Production-grade reliability workflows
-												Enhance installation and troubleshooting instructions in README.md, CONTRIBUTING.md, and USAGE_GUIDE.md. Clarified requirements for Python 3.10+, added detailed steps for creating virtual environments, and provided solutions for common issues related to Ollama and flakestorm installations. Updated license information in README.md.

											
										
										
											2025-12-30 22:33:47 +08:00
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
+								**Our Philosophy:** We do not cripple the OSS version. Cloud exists to remove operational pain, not to lock features. Open source proves the value; cloud delivers production-grade chaos engineering at scale.
-												Enhance installation instructions across documentation to emphasize the use of virtual environments for Python. Added details for creating and activating virtual environments in README.md, CONTRIBUTING.md, TEST_SCENARIOS.md, TESTING_GUIDE.md, and USAGE_GUIDE.md. Included pipx installation instructions for CLI use in USAGE_GUIDE.md.

											
										
										
											2025-12-30 18:02:36 +08:00
-												Revise README.md to improve clarity and organization of Flakestorm's features and usage instructions. Consolidate installation steps, enhance the explanation of mutation types, and introduce a streamlined quick start guide for new users. Emphasize future enhancements aimed at simplifying setup and improving user experience.

											
										
										
											2026-01-05 17:05:54 +08:00
+								# Try Flakestorm in ~60 Seconds
-												Enhance installation instructions across documentation to emphasize the use of virtual environments for Python. Added details for creating and activating virtual environments in README.md, CONTRIBUTING.md, TEST_SCENARIOS.md, TESTING_GUIDE.md, and USAGE_GUIDE.md. Included pipx installation instructions for CLI use in USAGE_GUIDE.md.

											
										
										
											2025-12-30 18:02:36 +08:00
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
+								This is the fastest way to try Flakestorm locally. Production teams typically use the cloud version (waitlist). Here's the local quickstart:
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Revise README.md to improve clarity and organization of Flakestorm's features and usage instructions. Consolidate installation steps, enhance the explanation of mutation types, and introduce a streamlined quick start guide for new users. Emphasize future enhancements aimed at simplifying setup and improving user experience.

											
										
										
											2026-01-05 17:05:54 +08:00
+. **Install flakestorm** (if you have Python 3.10+):
 								   ```bash
 								   pip install flakestorm
 								   ```
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Revise README.md to improve clarity and organization of Flakestorm's features and usage instructions. Consolidate installation steps, enhance the explanation of mutation types, and introduce a streamlined quick start guide for new users. Emphasize future enhancements aimed at simplifying setup and improving user experience.

											
										
										
											2026-01-05 17:05:54 +08:00
+. **Initialize a test configuration**:
 								   ```bash
 								   flakestorm init
 								   ```
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Refactor README.md to streamline installation instructions and enhance clarity on local testing setup. Consolidate sections on Ollama installation and usage, while removing outdated content to improve user experience and guidance for teams using Flakestorm.

											
										
										
											2026-01-05 16:58:45 +08:00
+. **Point it at your agent** (edit `flakestorm.yaml`):
 								   ```yaml
 								   agent:
 								     endpoint: "http://localhost:8000/invoke"  # Your agent's endpoint
 								     type: "http"
 								   ```
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Refactor README.md to streamline installation instructions and enhance clarity on local testing setup. Consolidate sections on Ollama installation and usage, while removing outdated content to improve user experience and guidance for teams using Flakestorm.

											
										
										
											2026-01-05 16:58:45 +08:00
+. **Run your first test**:
 								   ```bash
 								   flakestorm run
 								   ```
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								   With a [v2 config](examples/v2_research_agent/README.md) you can also run `flakestorm run --chaos`, `flakestorm contract run`, `flakestorm replay run`, or `flakestorm ci` to exercise all pillars.
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								That's it! You get a **robustness score** (for mutation runs) or a **resilience score** (when using chaos/contract/replay), plus a report showing how your agent handles chaos and adversarial inputs.
-												Revise README.md to improve clarity and organization of Flakestorm's features and usage instructions. Consolidate installation steps, enhance the explanation of mutation types, and introduce a streamlined quick start guide for new users. Emphasize future enhancements aimed at simplifying setup and improving user experience.

											
										
										
											2026-01-05 17:05:54 +08:00
 								> **Note**: For full local execution (including mutation generation), you'll need Ollama installed. See the [Usage Guide](docs/USAGE_GUIDE.md) for complete setup instructions.
-												Enhance README.md to clarify the purpose and functionality of Flakestorm for production AI agents. Update descriptions to emphasize chaos testing, adversarial input handling, and CI/CD integration. Add sections on target users and production deployment patterns, ensuring comprehensive guidance for teams shipping AI agents.

											
										
										
											2026-01-05 16:53:23 +08:00
-												Update README.md and CONTRIBUTING.md to enhance project visibility and support for new contributors. Added PyPI version and download badges, build status, and latest release information to README.md. Introduced a section in CONTRIBUTING.md for finding good first issues, providing guidance for beginners on how to contribute effectively.

											
										
										
											2026-01-13 21:39:50 +08:00
+								## Roadmap
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								See [Roadmap](ROADMAP.md) for the full plan. Highlights:
 								- **V3 — Multi-agent chaos** — Chaos engineering for systems of multiple agents: fault injection across agent-to-agent and tool boundaries, contract verification for multi-agent workflows, and replay of multi-agent production incidents.
 								- **Pattern engine** — 110+ prompt-injection and 52+ PII detection patterns; Rust-backed, sub-50ms.
 								- **Cloud** — Scalable runs, team dashboards, scheduled chaos, CI integrations.
 								- **Enterprise** — On-premise, audit logging, compliance certifications.
-												Update README.md and CONTRIBUTING.md to enhance project visibility and support for new contributors. Added PyPI version and download badges, build status, and latest release information to README.md. Introduced a section in CONTRIBUTING.md for finding good first issues, providing guidance for beginners on how to contribute effectively.

											
										
										
											2026-01-13 21:39:50 +08:00
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
+								## Documentation
-												Implement Open Source edition limits and feature restrictions

- Add 5 mutation types (paraphrase, noise, tone_shift, prompt_injection, custom)
- Cap mutations at 50 per test run
- Force sequential execution only
- Disable GitHub Actions integration (Cloud feature)
- Add upgrade prompts throughout CLI
- Update README with feature comparison
- Add limits.py module for centralized limit management
- Add cloud and limits CLI commands
- Update all documentation with Cloud upgrade messaging

											
										
										
											2025-12-29 00:11:02 +08:00
+								### Getting Started
-												Enhance README.md to clarify the purpose and functionality of Flakestorm for production AI agents. Update descriptions to emphasize chaos testing, adversarial input handling, and CI/CD integration. Add sections on target users and production deployment patterns, ensuring comprehensive guidance for teams shipping AI agents.

											
										
										
											2026-01-05 16:53:23 +08:00
+								- [📖 Usage Guide](docs/USAGE_GUIDE.md) - Complete end-to-end guide (includes local setup)
-												Implement Open Source edition limits and feature restrictions

- Add 5 mutation types (paraphrase, noise, tone_shift, prompt_injection, custom)
- Cap mutations at 50 per test run
- Force sequential execution only
- Disable GitHub Actions integration (Cloud feature)
- Add upgrade prompts throughout CLI
- Update README with feature comparison
- Add limits.py module for centralized limit management
- Add cloud and limits CLI commands
- Update all documentation with Cloud upgrade messaging

											
										
										
											2025-12-29 00:11:02 +08:00
+								- [⚙️ Configuration Guide](docs/CONFIGURATION_GUIDE.md) - All configuration options
-												Implement flexible HTTP agent adapter with request templates and connection guides - Add request_template, response_path, method, query_params, and parse_structured_input to AgentConfig - Implement structured input parser for key-value extraction from golden prompts - Implement template engine with variable substitution for {prompt} and {field_name} - Implement response extractor supporting JSONPath and dot notation - Update HTTPAgentAdapter to support all HTTP methods (GET, POST, PUT, PATCH, DELETE) - Add comprehensive connection guide explaining localhost vs public endpoints - Update documentation with examples for TypeScript/JavaScript developers - Add tests for all new features

											
										
										
											2025-12-31 23:04:47 +08:00
+								- [🔌 Connection Guide](docs/CONNECTION_GUIDE.md) - How to connect FlakeStorm to your agent
-												Remove the implementation checklist document and update README and TEST_SCENARIOS to reflect the latest V2 features, including detailed descriptions of environment chaos, behavioral contracts, and replay regression scenarios. Adjusted links and clarified configuration options for better usability.

											
										
										
											2026-03-09 13:01:08 +08:00
+								- [🧪 Test Scenarios](docs/TEST_SCENARIOS.md) - Real-world examples for mutation, chaos, contract, and replay (V2)
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								- [📂 Example: chaos, contracts & replay](examples/v2_research_agent/README.md) - Working agent and config you can run
-												Add Integrations Guide to README.md and outline Phase 7 roadmap in IMPLEMENTATION_CHECKLIST.md

											
										
										
											2026-01-01 17:29:41 +08:00
+								- [🔗 Integrations Guide](docs/INTEGRATIONS_GUIDE.md) - HuggingFace models & semantic similarity
-												Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

											
										
										
											2026-03-06 23:33:21 +08:00
+								- [🤖 LLM Providers](docs/LLM_PROVIDERS.md) - OpenAI, Claude, Gemini (env-only API keys)
 								- [🌪️ Environment Chaos](docs/ENVIRONMENT_CHAOS.md) - Tool/LLM fault injection
 								- [📜 Behavioral Contracts](docs/BEHAVIORAL_CONTRACTS.md) - Contract × chaos matrix
 								- [🔄 Replay Regression](docs/REPLAY_REGRESSION.md) - Import and replay production failures
 								- [🛡️ Context Attacks](docs/CONTEXT_ATTACKS.md) - Indirect injection, memory poisoning
-												Enhance documentation and replay functionality in Flakestorm. Updated README to clarify V2 Spec and added references to LangSmith sources in configuration guide. Improved replay regression capabilities by allowing imports from LangSmith projects and runs, with filtering options. Added new classes for LangSmith project and run sources in the configuration. Updated replay loader to support project imports and refined session resolution logic.

											
										
										
											2026-03-07 02:04:55 +08:00
+								- [📐 V2 Spec](docs/V2_SPEC.md) - Score formula, reset, Python tools
-												Implement Open Source edition limits and feature restrictions

- Add 5 mutation types (paraphrase, noise, tone_shift, prompt_injection, custom)
- Cap mutations at 50 per test run
- Force sequential execution only
- Disable GitHub Actions integration (Cloud feature)
- Add upgrade prompts throughout CLI
- Update README with feature comparison
- Add limits.py module for centralized limit management
- Add cloud and limits CLI commands
- Update all documentation with Cloud upgrade messaging

											
										
										
											2025-12-29 00:11:02 +08:00
 								### For Developers
 								- [🏗️ Architecture & Modules](docs/MODULES.md) - How the code works
 								- [❓ Developer FAQ](docs/DEVELOPER_FAQ.md) - Q&A about design decisions
 								- [🤝 Contributing](docs/CONTRIBUTING.md) - How to contribute
-												Add troubleshooting section to README.md with links for fixing installation and build issues. This update aims to assist users in resolving common problems encountered during setup.

											
										
										
											2026-01-04 23:56:13 +08:00
+								### Troubleshooting
 								- [🔧 Fix Installation Issues](FIX_INSTALL.md) - Resolve `ModuleNotFoundError: No module named 'flakestorm.reports'`
 								- [🔨 Fix Build Issues](BUILD_FIX.md) - Resolve `pip install .` vs `pip install -e .` problems
-												Update README.md to reflect the addition of 24 mutation types, enhancing clarity on core and advanced prompt-level attacks as well as system/network-level attacks. Introduce a new support section with issue templates for better user engagement.

											
										
										
											2026-01-15 13:41:01 +08:00
+								### Support
 								- [🐛 Issue Templates](https://github.com/flakestorm/flakestorm/tree/main/.github/ISSUE_TEMPLATE) - Use our issue templates to report bugs, request features, or ask questions
-												Implement Open Source edition limits and feature restrictions

- Add 5 mutation types (paraphrase, noise, tone_shift, prompt_injection, custom)
- Cap mutations at 50 per test run
- Force sequential execution only
- Disable GitHub Actions integration (Cloud feature)
- Add upgrade prompts throughout CLI
- Update README with feature comparison
- Add limits.py module for centralized limit management
- Add cloud and limits CLI commands
- Update all documentation with Cloud upgrade messaging

											
										
										
											2025-12-29 00:11:02 +08:00
+								### Reference
 								- [📋 API Specification](docs/API_SPECIFICATION.md) - API reference
 								- [🧪 Testing Guide](docs/TESTING_GUIDE.md) - How to run and write tests
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
-												Enhance documentation to reflect the addition of 22+ mutation types in Flakestorm, including advanced prompt-level and system/network-level attacks. Update README.md, API_SPECIFICATION.md, CONFIGURATION_GUIDE.md, USAGE_GUIDE.md, and related files to improve clarity on mutation strategies, testing scenarios, and configuration options. Emphasize the importance of comprehensive testing for production AI agents and provide detailed descriptions for each mutation type.

											
										
										
											2026-01-05 22:21:27 +08:00
+								## Cloud Version (Early Access)
 								For teams running production AI agents, the cloud version removes operational friction: zero-setup chaos testing without local model configuration, scalable mutation runs that mirror production traffic, shared dashboards for team collaboration, and continuous chaos runs integrated into your reliability workflows.
 								The cloud version is currently in early access. [Join the waitlist](https://flakestorm.com) to get access as we roll it out.
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
+								## License
-												Enhance installation and troubleshooting instructions in README.md, CONTRIBUTING.md, and USAGE_GUIDE.md. Clarified requirements for Python 3.10+, added detailed steps for creating virtual environments, and provided solutions for common issues related to Ollama and flakestorm installations. Updated license information in README.md.

											
										
										
											2025-12-30 22:33:47 +08:00
+								Apache 2.0 - See [LICENSE](LICENSE) for details.
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
 								---
 								<p align="center">
-												Update README.md to correct spelling of "FlakeStorm" to "Flakestorm" for consistency throughout the documentation.

											
										
										
											2025-12-30 16:36:24 +08:00
+								  <strong>Tested with Flakestorm</strong><br>
 								  <img src="https://img.shields.io/badge/tested%20with-flakestorm-brightgreen" alt="Tested with Flakestorm">
-												Add initial project structure and configuration files

- Created .gitignore to exclude unnecessary files and directories.
- Added Cargo.toml for Rust workspace configuration.
- Introduced example configuration file entropix.yaml.example for user customization.
- Included LICENSE file with Apache 2.0 license details.
- Created pyproject.toml for Python project metadata and dependencies.
- Added README.md with project overview and usage instructions.
- Implemented a broken agent example to demonstrate testing capabilities.
- Established Rust module structure with Cargo.toml and source files.
- Set up initial tests for assertions and configuration validation.

											
										
										
											2025-12-28 21:55:01 +08:00
+								</p>
-												Update README.md

Add sponsor embed
											
										
										
											2026-01-16 10:59:17 +08:00
-												Refactor sponsorship section in README.md for improved presentation and alignment

											
										
										
											2026-01-16 23:32:59 +08:00
+								---
-												Update README.md
											
										
										
											2026-01-16 23:28:52 +08:00
-												Refactor sponsorship section in README.md for improved presentation and alignment

											
										
										
											2026-01-16 23:32:59 +08:00
+								<p align="center">
-												Correct capitalization of "Flakestorm" in sponsorship section of README.md

											
										
										
											2026-01-16 23:35:14 +08:00
+								  ❤️ <a href="https://github.com/sponsors/flakestorm">Sponsor Flakestorm on GitHub</a>
-												Refactor sponsorship section in README.md for improved presentation and alignment

											
										
										
											2026-01-16 23:32:59 +08:00
+								</p>