# flakestorm API Specification

## Python SDK

### Quick Start

```python
import asyncio
from flakestorm import FlakeStormRunner, load_config

async def main():
    config = load_config("flakestorm.yaml")
    runner = FlakeStormRunner(config)
    results = await runner.run()
    print(f"Robustness Score: {results.statistics.robustness_score:.1%}")

asyncio.run(main())
```

---

## Core Classes

### FlakeStormConfig

Configuration container for all flakestorm settings.

```python
from flakestorm import FlakeStormConfig, load_config

# Load from file
config = load_config("flakestorm.yaml")

# Access properties
config.agent.endpoint  # str
config.model.name      # str
config.golden_prompts  # list[str]
config.invariants      # list[InvariantConfig]

# Serialize
yaml_str = config.to_yaml()

# Parse from string
config = FlakeStormConfig.from_yaml(yaml_content)
```

#### Properties

| Property | Type | Description |
|----------|------|-------------|
| `version` | `str` | Config version (`1.0` \| `2.0`) |
| `agent` | `AgentConfig` | Agent connection settings (includes V2 `reset_endpoint`, `reset_function`) |
| `model` | `ModelConfig` | LLM settings (V2: `api_key` env-only) |
| `mutations` | `MutationConfig` | Mutation generation (OSS: max 50 per run; 22+ types) |
| `golden_prompts` | `list[str]` | Test prompts |
| `invariants` | `list[InvariantConfig]` | Assertion rules |
| `output` | `OutputConfig` | Report settings |
| `advanced` | `AdvancedConfig` | Advanced options |
| **V2** `chaos` | `ChaosConfig \| None` | Tool/LLM faults and context_attacks (list or dict) |
| **V2** `contract` | `ContractConfig \| None` | Behavioral contract and chaos_matrix (scenarios may include context_attacks) |
| **V2** `chaos_matrix` | `list[ChaosScenarioConfig] \| None` | Top-level chaos scenarios when not using contract.chaos_matrix |
| **V2** `replays` | `ReplayConfig \| None` | Replay sessions (file or inline) and LangSmith sources |
| **V2** `scoring` | `ScoringConfig \| None` | Weights for mutation, chaos, contract, replay (must sum to 1.0) |
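
The `scoring` row requires the four weights to sum to 1.0. The check can be sketched as follows, assuming the weights arrive as a plain dict keyed by `mutation`, `chaos`, `contract`, and `replay`. The `validate_weights` helper and its tolerance are illustrative and not part of the flakestorm API; the library's own validation may differ.

```python
import math

# Hypothetical helper: key names follow the table above, everything else is assumed.
EXPECTED_KEYS = {"mutation", "chaos", "contract", "replay"}

def validate_weights(weights: dict[str, float], tol: float = 1e-9) -> None:
    """Raise ValueError unless all four weights are present and sum to 1.0."""
    missing = EXPECTED_KEYS - weights.keys()
    if missing:
        raise ValueError(f"missing weights: {sorted(missing)}")
    # fsum avoids accumulating floating-point error across the terms.
    total = math.fsum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"weights sum to {total}, expected 1.0")

validate_weights({"mutation": 0.4, "chaos": 0.2, "contract": 0.3, "replay": 0.1})
```

Comparing with a tolerance rather than `==` keeps values such as `0.1 + 0.2 + 0.3 + 0.4` from being rejected because of floating-point rounding.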

---

### FlakeStormRunner

Main test runner class.

```python
from flakestorm import FlakeStormRunner

runner = FlakeStormRunner(
    config="flakestorm.yaml",  # or FlakeStormConfig object
    agent=None,              # optional: pre-configured adapter
    console=None,            # optional: Rich console
    show_progress=True,      # show progress bars
)

# Run tests
results = await runner.run()

# Verify setup only
is_valid = await runner.verify_setup()

# Get config summary
summary = runner.get_config_summary()
```

#### Methods

| Method | Returns | Description |
|--------|---------|-------------|
| `run()` | `TestResults` | Execute full test suite |
| `verify_setup()` | `bool` | Check configuration validity |
| `get_config_summary()` | `str` | Human-readable config summary |

---

### Agent Adapters

#### AgentProtocol

Interface for custom agent implementations.

```python
from typing import Protocol

class AgentProtocol(Protocol):
    async def invoke(self, input: str) -> str:
        """Execute agent and return response."""
        ...
```
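
Because `AgentProtocol` is a `typing.Protocol`, a custom agent only needs a matching `invoke` method; it does not have to inherit from anything. The sketch below redeclares the protocol locally so that it runs without flakestorm installed, and `EchoAgent` is a hypothetical example class.

```python
import asyncio
from typing import Protocol, runtime_checkable

# Local stand-in that mirrors the protocol shape shown above.
@runtime_checkable
class AgentProtocol(Protocol):
    async def invoke(self, input: str) -> str: ...

class EchoAgent:
    """Hypothetical agent: satisfies the protocol structurally, with no base class."""

    async def invoke(self, input: str) -> str:
        return f"echo: {input}"

agent = EchoAgent()
# isinstance() against a runtime-checkable protocol only checks that `invoke` exists,
# not that its signature or return type match.
assert isinstance(agent, AgentProtocol)
reply = asyncio.run(agent.invoke("ping"))
```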

#### HTTPAgentAdapter

Adapter for HTTP-based agents.

```python
from flakestorm import HTTPAgentAdapter

adapter = HTTPAgentAdapter(
    endpoint="http://localhost:8000/invoke",
    timeout=30000,  # ms
    headers={"Authorization": "Bearer token"},
    retries=2,
)

response = await adapter.invoke("Hello")
# Returns AgentResponse with .output, .latency_ms, .error
```
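
The `timeout` and `retries` parameters interact, so a sketch of one possible reading may help. It assumes `retries=2` means one initial attempt plus up to two more after a timeout; the adapter's actual retry policy is not documented here. `call_with_retries` and `flaky_agent` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Awaitable, Callable

async def call_with_retries(
    call: Callable[[], Awaitable[str]],
    timeout_ms: int = 30000,
    retries: int = 2,
) -> str:
    """Attempt `call` once, then up to `retries` more times if it times out."""
    for attempt in range(retries + 1):
        try:
            # asyncio.wait_for expects seconds, so convert from milliseconds.
            return await asyncio.wait_for(call(), timeout=timeout_ms / 1000)
        except asyncio.TimeoutError:
            if attempt == retries:
                raise  # out of attempts: surface the timeout to the caller
    raise AssertionError("unreachable")

state = {"attempts": 0}

async def flaky_agent() -> str:
    """Hypothetical agent call: too slow on the first attempt, fast afterwards."""
    state["attempts"] += 1
    if state["attempts"] == 1:
        await asyncio.sleep(0.5)  # exceeds the 50 ms timeout used below
    return "ok"

result = asyncio.run(call_with_retries(flaky_agent, timeout_ms=50, retries=2))
```

Here the first attempt is cancelled after 50 ms and the second one succeeds, so the caller never sees the timeout.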

#### PythonAgentAdapter

Adapter for Python callable agents.

```python
from flakestorm import PythonAgentAdapter

async def my_agent(input: str) -> str:
    return f"Response to: {input}"

adapter = PythonAgentAdapter(my_agent)
response = await adapter.invoke("Test")
```

#### create_agent_adapter

Factory function for creating adapters from config.

```python
from flakestorm import create_agent_adapter

adapter = create_agent_adapter(config.agent)
```

---

### Mutation Engine

#### MutationType

```python
from flakestorm import MutationType

# Original 8 types
MutationType.PARAPHRASE            # Semantic rewrites
MutationType.NOISE                 # Typos and errors
MutationType.TONE_SHIFT            # Aggressive tone
MutationType.PROMPT_INJECTION      # Basic adversarial attacks
MutationType.ENCODING_ATTACKS      # Encoded inputs (Base64, Unicode, URL)
MutationType.CONTEXT_MANIPULATION  # Context manipulation
MutationType.LENGTH_EXTREMES       # Edge cases (empty/long inputs)
MutationType.CUSTOM                # User-defined templates

# Advanced prompt-level attacks (7 new types)
MutationType.MULTI_TURN_ATTACK     # Context persistence and conversation state
MutationType.ADVANCED_JAILBREAK    # Advanced prompt injection (DAN, role-playing)
MutationType.SEMANTIC_SIMILARITY_ATTACK  # Adversarial examples
MutationType.FORMAT_POISONING      # Structured data injection (JSON, XML)
MutationType.LANGUAGE_MIXING       # Multilingual, code-switching, emoji
MutationType.TOKEN_MANIPULATION    # Tokenizer edge cases, special tokens
MutationType.TEMPORAL_ATTACK       # Time-sensitive context, impossible dates

# System/Network-level attacks (9 new types)
MutationType.HTTP_HEADER_INJECTION # HTTP header manipulation
MutationType.PAYLOAD_SIZE_ATTACK   # Extremely large payloads, DoS
MutationType.CONTENT_TYPE_CONFUSION # MIME type manipulation
MutationType.QUERY_PARAMETER_POISONING # Malicious query parameters
MutationType.REQUEST_METHOD_ATTACK  # HTTP method confusion
MutationType.PROTOCOL_LEVEL_ATTACK  # Protocol-level exploits
MutationType.RESOURCE_EXHAUSTION    # CPU/memory exhaustion, DoS
MutationType.CONCURRENT_REQUEST_PATTERN # Race conditions, concurrent state
MutationType.TIMEOUT_MANIPULATION   # Timeout handling, slow requests

# Properties
MutationType.PARAPHRASE.display_name    # "Paraphrase"
MutationType.PARAPHRASE.default_weight  # 1.0
MutationType.PARAPHRASE.description     # "Rewrite using..."
```

**Mutation Types Overview (22+ types):**

#### Prompt-Level Attacks

| Type | Description | Default Weight | When to Use |
|------|-------------|----------------|-------------|
| `PARAPHRASE` | Semantically equivalent rewrites | 1.0 | Test semantic understanding |
| `NOISE` | Typos and spelling errors | 0.8 | Test input robustness |
| `TONE_SHIFT` | Aggressive/impatient phrasing | 0.9 | Test emotional resilience |
| `PROMPT_INJECTION` | Basic adversarial attack attempts | 1.5 | Test security |
| `ENCODING_ATTACKS` | Base64, Unicode, URL encoding | 1.3 | Test parser robustness and security |
| `CONTEXT_MANIPULATION` | Adding/removing/reordering context | 1.1 | Test context extraction |
| `LENGTH_EXTREMES` | Empty, minimal, or very long inputs | 1.2 | Test boundary conditions |
| `MULTI_TURN_ATTACK` | Context persistence and conversation state | 1.4 | Test conversational agents |
| `ADVANCED_JAILBREAK` | Advanced prompt injection (DAN, role-playing) | 2.0 | Test advanced security |
| `SEMANTIC_SIMILARITY_ATTACK` | Adversarial examples - similar but different | 1.3 | Test robustness |
| `FORMAT_POISONING` | Structured data injection (JSON, XML, markdown) | 1.6 | Test structured data parsing |
| `LANGUAGE_MIXING` | Multilingual, code-switching, emoji | 1.2 | Test internationalization |
| `TOKEN_MANIPULATION` | Tokenizer edge cases, special tokens | 1.5 | Test LLM tokenization |
| `TEMPORAL_ATTACK` | Time-sensitive context, impossible dates | 1.1 | Test temporal reasoning |
| `CUSTOM` | User-defined mutation templates | 1.0 | Test domain-specific scenarios |

#### System/Network-Level Attacks

| Type | Description | Default Weight | When to Use |
|------|-------------|----------------|-------------|
| `HTTP_HEADER_INJECTION` | HTTP header manipulation attacks | 1.7 | Test HTTP API security |
| `PAYLOAD_SIZE_ATTACK` | Extremely large payloads, DoS | 1.4 | Test resource limits |
| `CONTENT_TYPE_CONFUSION` | MIME type manipulation | 1.5 | Test HTTP parsers |
| `QUERY_PARAMETER_POISONING` | Malicious query parameters | 1.6 | Test GET-based APIs |
| `REQUEST_METHOD_ATTACK` | HTTP method confusion | 1.3 | Test REST APIs |
| `PROTOCOL_LEVEL_ATTACK` | Protocol-level exploits (request smuggling) | 1.8 | Test protocol handling |
| `RESOURCE_EXHAUSTION` | CPU/memory exhaustion, DoS | 1.5 | Test production resilience |
| `CONCURRENT_REQUEST_PATTERN` | Race conditions, concurrent state | 1.4 | Test high-traffic agents |
| `TIMEOUT_MANIPULATION` | Timeout handling, slow requests | 1.3 | Test timeout resilience |

**Mutation Strategy:**

Choose mutation types based on your testing goals:
- **Comprehensive**: Use all 22+ types for complete coverage
- **Security-focused**: Emphasize `PROMPT_INJECTION`, `ADVANCED_JAILBREAK`, `PROTOCOL_LEVEL_ATTACK`, `HTTP_HEADER_INJECTION`
- **UX-focused**: Emphasize `NOISE`, `TONE_SHIFT`, `CONTEXT_MANIPULATION`, `LANGUAGE_MIXING`
- **Infrastructure-focused**: Emphasize all system/network-level types
- **Edge case testing**: Emphasize `LENGTH_EXTREMES`, `ENCODING_ATTACKS`, `TOKEN_MANIPULATION`, `RESOURCE_EXHAUSTION`
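
The strategies above can be written down as plain data. The sketch below copies type names and default weights from the tables in this section and orders each strategy's types by weight. The `STRATEGIES` mapping and the `types_for` helper are illustrative only; flakestorm itself may not expose anything like them.

```python
# Default weights copied from the tables above (only the types used below).
DEFAULT_WEIGHTS = {
    "PROMPT_INJECTION": 1.5,
    "ADVANCED_JAILBREAK": 2.0,
    "PROTOCOL_LEVEL_ATTACK": 1.8,
    "HTTP_HEADER_INJECTION": 1.7,
    "NOISE": 0.8,
    "TONE_SHIFT": 0.9,
    "CONTEXT_MANIPULATION": 1.1,
    "LANGUAGE_MIXING": 1.2,
}

# Hypothetical strategy names mapped to the types each bullet emphasizes.
STRATEGIES = {
    "security": [
        "PROMPT_INJECTION",
        "ADVANCED_JAILBREAK",
        "PROTOCOL_LEVEL_ATTACK",
        "HTTP_HEADER_INJECTION",
    ],
    "ux": ["NOISE", "TONE_SHIFT", "CONTEXT_MANIPULATION", "LANGUAGE_MIXING"],
}

def types_for(strategy: str) -> list[str]:
    """Return the strategy's mutation type names, highest default weight first."""
    return sorted(STRATEGIES[strategy], key=DEFAULT_WEIGHTS.__getitem__, reverse=True)

security_types = types_for("security")
ux_types = types_for("ux")
```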

#### Mutation

```python
from flakestorm import Mutation, MutationType

mutation = Mutation(
    original="Book a flight",
    mutated="I need to fly",
    type=MutationType.PARAPHRASE,
    weight=1.0,
)

# Properties
mutation.id             # Unique hash
mutation.is_valid()     # Validity check
mutation.to_dict()      # Serialize
mutation.character_diff # Character count difference
```

#### MutationEngine

```python
from flakestorm import MutationEngine

engine = MutationEngine(config.model)

# Verify Ollama connection
is_connected = await engine.verify_connection()

# Generate mutations
mutations = await engine.generate_mutations(
    seed_prompt="Book a flight",
    types=[MutationType.PARAPHRASE, MutationType.NOISE],
    count=10,
)

# Batch generation
results = await engine.generate_batch(
    prompts=["Prompt 1", "Prompt 2"],
    types=[MutationType.PARAPHRASE],
    count_per_prompt=5,
)
```

---

### Invariant Verification

#### InvariantVerifier

```python
from flakestorm import InvariantVerifier

verifier = InvariantVerifier(config.invariants)

# Verify a response
result = verifier.verify(
    response="Agent output text",
    latency_ms=150.0,
)

# Result properties
result.all_passed      # bool
result.passed_count    # int
result.failed_count    # int
result.checks          # list[CheckResult]
result.get_failed_checks()
result.get_passed_checks()
```

#### Built-in Checkers

```python
from flakestorm.assertions import (
    ContainsChecker,
    LatencyChecker,
    ValidJsonChecker,
    RegexChecker,
    SimilarityChecker,
    ExcludesPIIChecker,
    RefusalChecker,
)
```

#### Custom Checker

```python
from flakestorm.assertions.deterministic import BaseChecker, CheckResult

class MyChecker(BaseChecker):
    def check(self, response: str, latency_ms: float) -> CheckResult:
        passed = "expected" in response
        return CheckResult(
            type=self.type,
            passed=passed,
            details="Custom check result",
        )
```
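
To show how individual check results might roll up into the `all_passed` and failed-check views listed under `InvariantVerifier`, here is a self-contained sketch. It uses a local `CheckResult` stand-in with the three fields shown above so that it runs without flakestorm; the real classes may carry more fields, and `contains_check` and `latency_check` are hypothetical helpers.

```python
from dataclasses import dataclass

# Local stand-in with the fields used in the custom checker above.
@dataclass
class CheckResult:
    type: str
    passed: bool
    details: str = ""

def contains_check(response: str, needle: str) -> CheckResult:
    """Pass when `needle` occurs anywhere in the response."""
    found = needle in response
    return CheckResult("contains", found, f"{needle!r} {'found' if found else 'missing'}")

def latency_check(latency_ms: float, max_ms: float) -> CheckResult:
    """Pass when the measured latency is within the limit."""
    return CheckResult("latency", latency_ms <= max_ms, f"{latency_ms:.0f} ms (limit {max_ms:.0f} ms)")

checks = [
    contains_check("Your flight is booked", "booked"),
    latency_check(latency_ms=150.0, max_ms=100.0),
]
failed = [c for c in checks if not c.passed]
all_passed = not failed
```

In this run the content check passes and the latency check fails, so `all_passed` is `False` and `failed` holds the single latency result.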

---

### Test Results

#### TestResults

```python
results = await runner.run()

# Statistics
results.statistics.robustness_score   # 0.0-1.0
results.statistics.total_mutations    # int
results.statistics.passed_mutations   # int
results.statistics.failed_mutations   # int
results.statistics.avg_latency_ms     # float
results.statistics.p95_latency_ms     # float
results.statistics.by_type            # list[TypeStatistics]

# Timing
results.started_at    # datetime
results.completed_at  # datetime
results.duration      # seconds

# Mutations
results.mutations            # list[MutationResult]
results.passed_mutations     # list[MutationResult]
results.failed_mutations     # list[MutationResult]
results.get_by_type("noise") # Filter by type
results.get_by_prompt("...")  # Filter by prompt

# Serialization
results.to_dict()  # Full JSON-serializable dict

# V2: Resilience and contract/replay (when config has contract/replays)
results.resilience_scores   # dict: mutation_robustness, chaos_resilience, contract_compliance, replay_regression
results.contract_compliance # ContractRunResult | None (when contract run was executed)
# Replay results are reported via `flakestorm replay run --output`; see Report Generation below.
```

#### MutationResult

```python
for result in results.mutations:
    result.original_prompt   # str
    result.mutation          # Mutation object
    result.response          # str
    result.latency_ms        # float
    result.passed            # bool
    result.checks            # list[CheckResult]
    result.error             # str | None
    result.failed_checks     # list[CheckResult]
```

---

### Report Generation

#### HTMLReportGenerator

```python
from flakestorm.reports import HTMLReportGenerator

generator = HTMLReportGenerator(results)

# Generate HTML string
html = generator.generate()

# Save to file
path = generator.save()  # Auto-generated path
path = generator.save("custom/path/report.html")
```

#### JSONReportGenerator

```python
from flakestorm.reports import JSONReportGenerator

generator = JSONReportGenerator(results)

# Full report
json_str = generator.generate(pretty=True)

# Summary only (for CI)
summary = generator.generate_summary()

# Save
path = generator.save()
path = generator.save(summary_only=True)
```

#### TerminalReporter

```python
from flakestorm.reports import TerminalReporter
from rich.console import Console

reporter = TerminalReporter(results, Console())

reporter.print_summary()
reporter.print_type_breakdown()
reporter.print_failures(limit=10)
reporter.print_full_report()
```

**V2 reports:** Contract runs (`flakestorm contract run --output report.html`) and replay runs (`flakestorm replay run --output report.html`) produce HTML reports that include **suggested actions** for failed cells or sessions (e.g. add `reset_endpoint`, tighten invariants, fix tool behavior). See [Behavioral Contracts](BEHAVIORAL_CONTRACTS.md) and [Replay Regression](REPLAY_REGRESSION.md).

---

## CLI Commands

### `flakestorm init [PATH]`

Initialize a new configuration file.

```bash
flakestorm init                    # Creates flakestorm.yaml
flakestorm init config/test.yaml   # Custom path
flakestorm init --force            # Overwrite existing
```

### `flakestorm run`

Run reliability tests (mutation run; optionally with chaos).

```bash
flakestorm run                              # Default config (mutation only)
flakestorm run --config custom.yaml         # Custom config
flakestorm run --chaos                       # Apply chaos (tool/LLM faults, context_attacks) during mutation run
flakestorm run --chaos-only                  # Chaos-only run (no mutations); requires chaos config
flakestorm run --chaos-profile api_outage    # Use a built-in chaos profile
flakestorm run --output json                 # JSON output
flakestorm run --output terminal             # Terminal only
flakestorm run --min-score 0.9 --ci          # CI mode
flakestorm run --verify-only                 # Just verify setup
flakestorm run --quiet                       # Minimal output
```

### `flakestorm verify`

Verify configuration and connections.

```bash
flakestorm verify
flakestorm verify --config custom.yaml
```

### `flakestorm report PATH`

View or convert existing reports.

```bash
flakestorm report results.json              # View in terminal
flakestorm report results.json --output html # Convert to HTML
```

### `flakestorm score`

Output only the robustness score (for CI scripts).

```bash
SCORE=$(flakestorm score)
if (( $(echo "$SCORE >= 0.9" | bc -l) )); then
    echo "Passed"
else
    echo "Failed"
    exit 1
fi
```
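
The same gate can be expressed in Python when `bc` is not available on the CI image. This is a sketch that assumes `flakestorm score` prints a bare number, as the shell snippet above implies; the `gate` function is a hypothetical helper, and the captured output would be passed to it as a string.

```python
def gate(score_text: str, min_score: float = 0.9) -> int:
    """Return a process exit code: 0 if the score meets the threshold, else 1."""
    try:
        score = float(score_text.strip())
    except ValueError:
        return 1  # unparseable output is treated as a failure
    return 0 if score >= min_score else 1

exit_code = gate("0.93\n")
```

Treating unparseable output as a failure keeps a crashed or misconfigured run from passing the gate silently.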

### V2: `flakestorm contract run` / `validate` / `score`

Run behavioral contract tests (invariants × chaos matrix).

```bash
flakestorm contract run                      # Run contract matrix; progress and score in terminal
flakestorm contract run --output report.html  # Save HTML report with suggested actions for failed cells
flakestorm contract validate                 # Validate contract config only
flakestorm contract score                    # Output contract resilience score only
```

### V2: `flakestorm replay run` / `export`

Replay regression: run saved sessions and verify against a contract.

```bash
flakestorm replay run                        # Replay sessions from config (file or inline)
flakestorm replay run path/to/session.yaml   # Replay a single session file
flakestorm replay run path/to/replays/       # Replay all sessions in directory
flakestorm replay run --output report.html   # Save HTML report with suggested actions for failed sessions
flakestorm replay export --from-report FILE  # Export from an existing report
```

### V2: `flakestorm ci`

Run the full CI pipeline: the mutation run, followed by the contract run, the chaos-only run, and the replay run when each is configured. The overall weighted score is then computed from `scoring.weights`.

```bash
flakestorm ci
flakestorm ci --config custom.yaml
```
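
A weighted overall score of this kind can be sketched as a weighted sum. The example assumes component scores and weights share the same keys and that every component was run; how flakestorm handles components that are not configured is not covered here. `overall_score` is a hypothetical helper, not the library's implementation.

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of component scores; each score is expected in [0.0, 1.0]."""
    return sum(weights[name] * scores[name] for name in weights)

# Illustrative values only.
weights = {"mutation": 0.4, "chaos": 0.2, "contract": 0.3, "replay": 0.1}
scores = {"mutation": 0.9, "chaos": 0.8, "contract": 1.0, "replay": 0.5}
total = overall_score(scores, weights)
```

With weights that sum to 1.0, the result stays in the same 0.0 to 1.0 range as the component scores.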

---

## Environment Variables

| Variable | Description |
|----------|-------------|
| `OLLAMA_HOST` | Override Ollama server URL |
| Custom headers | Expanded in config via `${VAR}` syntax |

**V2 API keys (env-only):** Model API keys must not appear as literals in the config file. Set them as environment variables and reference them from the config (e.g. `api_key: "${OPENAI_API_KEY}"`). Supported: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, etc. See [LLM Providers](LLM_PROVIDERS.md).
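
The `${VAR}` substitution can be illustrated with a small standalone sketch. It assumes that an unset variable is an error; flakestorm's actual behavior for unset variables is not documented here. `expand_env` and `DEMO_TOKEN` are hypothetical names.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable identifier.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env(value: str, env=os.environ) -> str:
    """Replace each ${VAR} with its value from `env`; raise KeyError if unset."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return _VAR.sub(repl, value)

headers = {"Authorization": "Bearer ${DEMO_TOKEN}"}
# A dict is passed in place of os.environ so the example does not depend on the shell.
expanded = {k: expand_env(v, env={"DEMO_TOKEN": "abc123"}) for k, v in headers.items()}
```

Failing loudly on an unset variable avoids sending a literal `${...}` placeholder to the agent or model provider.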

---

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Error (config, connection, etc.) |
| 1 | CI mode: Score below threshold |
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
+								# flakestorm API Specification
 								## Python SDK
 								### Quick Start
 								```python
 								import asyncio
-												- Updated class names and import statements to align with the new naming convention.
- Adjusted test commands and report references to use FlakeStorm terminology.
- Ensured consistency in configuration and runner references throughout the documentation.

											
										
										
											2025-12-30 16:13:29 +08:00
+								from flakestorm import FlakeStormRunner, load_config
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
 								async def main():
 								    config = load_config("flakestorm.yaml")
-												- Updated class names and import statements to align with the new naming convention.
- Adjusted test commands and report references to use FlakeStorm terminology.
- Ensured consistency in configuration and runner references throughout the documentation.

											
										
										
											2025-12-30 16:13:29 +08:00
+								    runner = FlakeStormRunner(config)
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
+								    results = await runner.run()
 								    print(f"Robustness Score: {results.statistics.robustness_score:.1%}")
 								asyncio.run(main())
 								```
 								---
 								## Core Classes
-												- Updated class names and import statements to align with the new naming convention.
- Adjusted test commands and report references to use FlakeStorm terminology.
- Ensured consistency in configuration and runner references throughout the documentation.

											
										
										
											2025-12-30 16:13:29 +08:00
+								### FlakeStormConfig
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
 								Configuration container for all flakestorm settings.
 								```python
-												- Updated class names and import statements to align with the new naming convention.
- Adjusted test commands and report references to use FlakeStorm terminology.
- Ensured consistency in configuration and runner references throughout the documentation.

											
										
										
											2025-12-30 16:13:29 +08:00
+								from flakestorm import FlakeStormConfig, load_config
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
 								# Load from file
 								config = load_config("flakestorm.yaml")
 								# Access properties
 								config.agent.endpoint  # str
 								config.model.name      # str
 								config.golden_prompts  # list[str]
 								config.invariants      # list[InvariantConfig]
 								# Serialize
 								yaml_str = config.to_yaml()
 								# Parse from string
-												- Updated class names and import statements to align with the new naming convention.
- Adjusted test commands and report references to use FlakeStorm terminology.
- Ensured consistency in configuration and runner references throughout the documentation.

											
										
										
											2025-12-30 16:13:29 +08:00
+								config = FlakeStormConfig.from_yaml(yaml_content)
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
+								```
 								#### Properties
 								| Property | Type | Description |
 								|----------|------|-------------|
-												Enhance documentation for Flakestorm V2 features, including detailed updates on behavioral contracts, context attacks, and scoring mechanisms. Added new configuration options for state isolation in agents, clarified context attack types, and improved the contract report generation with suggested actions for failures. Updated various guides to reflect the latest changes in chaos engineering capabilities and replay regression functionalities.

											
										
										
											2026-03-08 20:29:48 +08:00
+								| `version` | `str` | Config version (`1.0` \| `2.0`) |
 								| `agent` | `AgentConfig` | Agent connection settings (includes V2 `reset_endpoint`, `reset_function`) |
 								| `model` | `ModelConfig` | LLM settings (V2: `api_key` env-only) |
 								| `mutations` | `MutationConfig` | Mutation generation (max 50/run OSS, 22+ types) |
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
+								| `golden_prompts` | `list[str]` | Test prompts |
 								| `invariants` | `list[InvariantConfig]` | Assertion rules |
 								| `output` | `OutputConfig` | Report settings |
 								| `advanced` | `AdvancedConfig` | Advanced options |
-												Enhance documentation for Flakestorm V2 features, including detailed updates on behavioral contracts, context attacks, and scoring mechanisms. Added new configuration options for state isolation in agents, clarified context attack types, and improved the contract report generation with suggested actions for failures. Updated various guides to reflect the latest changes in chaos engineering capabilities and replay regression functionalities.

											
										
										
											2026-03-08 20:29:48 +08:00
+								| **V2** `chaos` | `ChaosConfig \| None` | Tool/LLM faults and context_attacks (list or dict) |
 								| **V2** `contract` | `ContractConfig \| None` | Behavioral contract and chaos_matrix (scenarios may include context_attacks) |
 								| **V2** `chaos_matrix` | `list[ChaosScenarioConfig] \| None` | Top-level chaos scenarios when not using contract.chaos_matrix |
 								| **V2** `replays` | `ReplayConfig \| None` | Replay sessions (file or inline) and LangSmith sources |
 								| **V2** `scoring` | `ScoringConfig \| None` | Weights for mutation, chaos, contract, replay (must sum to 1.0) |
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
 								---
-												- Updated class names and import statements to align with the new naming convention.
- Adjusted test commands and report references to use FlakeStorm terminology.
- Ensured consistency in configuration and runner references throughout the documentation.

											
										
										
											2025-12-30 16:13:29 +08:00
+								### FlakeStormRunner
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
 								Main test runner class.
 								```python
-												- Updated class names and import statements to align with the new naming convention.
- Adjusted test commands and report references to use FlakeStorm terminology.
- Ensured consistency in configuration and runner references throughout the documentation.

											
										
										
											2025-12-30 16:13:29 +08:00
+								from flakestorm import FlakeStormRunner
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
-												- Updated class names and import statements to align with the new naming convention.
- Adjusted test commands and report references to use FlakeStorm terminology.
- Ensured consistency in configuration and runner references throughout the documentation.

											
										
										
											2025-12-30 16:13:29 +08:00
+								runner = FlakeStormRunner(
 								    config="flakestorm.yaml",  # or FlakeStormConfig object
-												Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

											
										
										
											2025-12-29 11:32:50 +08:00
+								    agent=None,              # optional: pre-configured adapter
 								    console=None,            # optional: Rich console
 								    show_progress=True,      # show progress bars
 								)
 								# Run tests
 								results = await runner.run()
 								# Verify setup only
 								is_valid = await runner.verify_setup()
 								# Get config summary
 								summary = runner.get_config_summary()
 								```
 								#### Methods
 								| Method | Returns | Description |
 								|--------|---------|-------------|
 								| `run()` | `TestResults` | Execute full test suite |
 								| `verify_setup()` | `bool` | Check configuration validity |
 								| `get_config_summary()` | `str` | Human-readable config summary |
 								---
 								### Agent Adapters
 								#### AgentProtocol
 								Interface for custom agent implementations.
 								```python
 								from typing import Protocol
 								class AgentProtocol(Protocol):
 								    async def invoke(self, input: str) -> str:
 								        """Execute agent and return response."""
 								        ...
 								```
#### HTTPAgentAdapter

Adapter for HTTP-based agents.

```python
from flakestorm import HTTPAgentAdapter

adapter = HTTPAgentAdapter(
    endpoint="http://localhost:8000/invoke",
    timeout=30000,  # ms
    headers={"Authorization": "Bearer token"},
    retries=2,
)

response = await adapter.invoke("Hello")
# Returns AgentResponse with .output, .latency_ms, .error
```

#### PythonAgentAdapter

Adapter for Python callable agents.

```python
from flakestorm import PythonAgentAdapter

async def my_agent(input: str) -> str:
    return f"Response to: {input}"

adapter = PythonAgentAdapter(my_agent)
response = await adapter.invoke("Test")
```

#### create_agent_adapter

Factory function for creating adapters from config.

```python
from flakestorm import create_agent_adapter

adapter = create_agent_adapter(config.agent)
```

---

### Mutation Engine

#### MutationType

```python
from flakestorm import MutationType
# Original 8 types
MutationType.PARAPHRASE            # Semantic rewrites
MutationType.NOISE                 # Typos and errors
MutationType.TONE_SHIFT            # Aggressive tone
MutationType.PROMPT_INJECTION      # Basic adversarial attacks
MutationType.ENCODING_ATTACKS      # Encoded inputs (Base64, Unicode, URL)
MutationType.CONTEXT_MANIPULATION  # Context manipulation
MutationType.LENGTH_EXTREMES       # Edge cases (empty/long inputs)
MutationType.CUSTOM                # User-defined templates
# Advanced prompt-level attacks (7 new types)
MutationType.MULTI_TURN_ATTACK           # Context persistence and conversation state
MutationType.ADVANCED_JAILBREAK          # Advanced prompt injection (DAN, role-playing)
MutationType.SEMANTIC_SIMILARITY_ATTACK  # Adversarial examples
MutationType.FORMAT_POISONING            # Structured data injection (JSON, XML)
MutationType.LANGUAGE_MIXING             # Multilingual, code-switching, emoji
MutationType.TOKEN_MANIPULATION          # Tokenizer edge cases, special tokens
MutationType.TEMPORAL_ATTACK             # Time-sensitive context, impossible dates

# System/Network-level attacks (8+ new types)
MutationType.HTTP_HEADER_INJECTION       # HTTP header manipulation
MutationType.PAYLOAD_SIZE_ATTACK         # Extremely large payloads, DoS
MutationType.CONTENT_TYPE_CONFUSION      # MIME type manipulation
MutationType.QUERY_PARAMETER_POISONING   # Malicious query parameters
MutationType.REQUEST_METHOD_ATTACK       # HTTP method confusion
MutationType.PROTOCOL_LEVEL_ATTACK       # Protocol-level exploits
MutationType.RESOURCE_EXHAUSTION         # CPU/memory exhaustion, DoS
MutationType.CONCURRENT_REQUEST_PATTERN  # Race conditions, concurrent state
MutationType.TIMEOUT_MANIPULATION        # Timeout handling, slow requests
# Properties
MutationType.PARAPHRASE.display_name    # "Paraphrase"
MutationType.PARAPHRASE.default_weight  # 1.0
MutationType.PARAPHRASE.description     # "Rewrite using..."
```
**Mutation Types Overview (22+ types):**

#### Prompt-Level Attacks
| Type | Description | Default Weight | When to Use |
|------|-------------|----------------|-------------|
| `PARAPHRASE` | Semantically equivalent rewrites | 1.0 | Test semantic understanding |
| `NOISE` | Typos and spelling errors | 0.8 | Test input robustness |
| `TONE_SHIFT` | Aggressive/impatient phrasing | 0.9 | Test emotional resilience |
| `PROMPT_INJECTION` | Basic adversarial attack attempts | 1.5 | Test security |
| `ENCODING_ATTACKS` | Base64, Unicode, URL encoding | 1.3 | Test parser robustness and security |
| `CONTEXT_MANIPULATION` | Adding/removing/reordering context | 1.1 | Test context extraction |
| `LENGTH_EXTREMES` | Empty, minimal, or very long inputs | 1.2 | Test boundary conditions |
| `MULTI_TURN_ATTACK` | Context persistence and conversation state | 1.4 | Test conversational agents |
| `ADVANCED_JAILBREAK` | Advanced prompt injection (DAN, role-playing) | 2.0 | Test advanced security |
| `SEMANTIC_SIMILARITY_ATTACK` | Adversarial examples (similar but different) | 1.3 | Test robustness |
| `FORMAT_POISONING` | Structured data injection (JSON, XML, markdown) | 1.6 | Test structured data parsing |
| `LANGUAGE_MIXING` | Multilingual, code-switching, emoji | 1.2 | Test internationalization |
| `TOKEN_MANIPULATION` | Tokenizer edge cases, special tokens | 1.5 | Test LLM tokenization |
| `TEMPORAL_ATTACK` | Time-sensitive context, impossible dates | 1.1 | Test temporal reasoning |
| `CUSTOM` | User-defined mutation templates | 1.0 | Test domain-specific scenarios |
#### System/Network-Level Attacks

| Type | Description | Default Weight | When to Use |
|------|-------------|----------------|-------------|
| `HTTP_HEADER_INJECTION` | HTTP header manipulation attacks | 1.7 | Test HTTP API security |
| `PAYLOAD_SIZE_ATTACK` | Extremely large payloads, DoS | 1.4 | Test resource limits |
| `CONTENT_TYPE_CONFUSION` | MIME type manipulation | 1.5 | Test HTTP parsers |
| `QUERY_PARAMETER_POISONING` | Malicious query parameters | 1.6 | Test GET-based APIs |
| `REQUEST_METHOD_ATTACK` | HTTP method confusion | 1.3 | Test REST APIs |
| `PROTOCOL_LEVEL_ATTACK` | Protocol-level exploits (request smuggling) | 1.8 | Test protocol handling |
| `RESOURCE_EXHAUSTION` | CPU/memory exhaustion, DoS | 1.5 | Test production resilience |
| `CONCURRENT_REQUEST_PATTERN` | Race conditions, concurrent state | 1.4 | Test high-traffic agents |
| `TIMEOUT_MANIPULATION` | Timeout handling, slow requests | 1.3 | Test timeout resilience |
**Mutation Strategy:**

Choose mutation types based on your testing goals:
- **Comprehensive**: Use all 22+ types for complete coverage
- **Security-focused**: Emphasize `PROMPT_INJECTION`, `ADVANCED_JAILBREAK`, `PROTOCOL_LEVEL_ATTACK`, `HTTP_HEADER_INJECTION`
- **UX-focused**: Emphasize `NOISE`, `TONE_SHIFT`, `CONTEXT_MANIPULATION`, `LANGUAGE_MIXING`
- **Infrastructure-focused**: Emphasize all system/network-level types
- **Edge case testing**: Emphasize `LENGTH_EXTREMES`, `ENCODING_ATTACKS`, `TOKEN_MANIPULATION`, `RESOURCE_EXHAUSTION`
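
The strategies above can be written down as plain data. The following sketch is not part of the flakestorm API: the profile names, `PROFILES`, and `types_for` are hypothetical, and the weights are simply copied from the two tables. It lists each profile's types with the highest default weight first.

```python
# Default weights copied from the tables above.
PROMPT_LEVEL = {
    "PARAPHRASE": 1.0, "NOISE": 0.8, "TONE_SHIFT": 0.9, "PROMPT_INJECTION": 1.5,
    "ENCODING_ATTACKS": 1.3, "CONTEXT_MANIPULATION": 1.1, "LENGTH_EXTREMES": 1.2,
    "MULTI_TURN_ATTACK": 1.4, "ADVANCED_JAILBREAK": 2.0,
    "SEMANTIC_SIMILARITY_ATTACK": 1.3, "FORMAT_POISONING": 1.6,
    "LANGUAGE_MIXING": 1.2, "TOKEN_MANIPULATION": 1.5, "TEMPORAL_ATTACK": 1.1,
    "CUSTOM": 1.0,
}
SYSTEM_LEVEL = {
    "HTTP_HEADER_INJECTION": 1.7, "PAYLOAD_SIZE_ATTACK": 1.4,
    "CONTENT_TYPE_CONFUSION": 1.5, "QUERY_PARAMETER_POISONING": 1.6,
    "REQUEST_METHOD_ATTACK": 1.3, "PROTOCOL_LEVEL_ATTACK": 1.8,
    "RESOURCE_EXHAUSTION": 1.5, "CONCURRENT_REQUEST_PATTERN": 1.4,
    "TIMEOUT_MANIPULATION": 1.3,
}
ALL_TYPES = {**PROMPT_LEVEL, **SYSTEM_LEVEL}

# Hypothetical profile names; the type lists mirror the bullets above.
PROFILES = {
    "comprehensive": list(ALL_TYPES),
    "security": ["PROMPT_INJECTION", "ADVANCED_JAILBREAK",
                 "PROTOCOL_LEVEL_ATTACK", "HTTP_HEADER_INJECTION"],
    "ux": ["NOISE", "TONE_SHIFT", "CONTEXT_MANIPULATION", "LANGUAGE_MIXING"],
    "infrastructure": list(SYSTEM_LEVEL),
    "edge_case": ["LENGTH_EXTREMES", "ENCODING_ATTACKS",
                  "TOKEN_MANIPULATION", "RESOURCE_EXHAUSTION"],
}


def types_for(profile: str) -> list[str]:
    """Return the profile's type names, highest default weight first."""
    return sorted(PROFILES[profile], key=lambda name: ALL_TYPES[name], reverse=True)


print(types_for("security"))
```

The resulting names correspond to `MutationType` members; how they are passed to a run depends on your configuration.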
#### Mutation

```python
from flakestorm import Mutation, MutationType

mutation = Mutation(
    original="Book a flight",
    mutated="I need to fly",
    type=MutationType.PARAPHRASE,
    weight=1.0,
)

# Properties
mutation.id             # Unique hash
mutation.is_valid()     # Validity check
mutation.to_dict()      # Serialize
mutation.character_diff # Character count difference
```
#### MutationEngine

```python
from flakestorm import MutationEngine, MutationType

engine = MutationEngine(config.model)

# Verify Ollama connection
is_connected = await engine.verify_connection()

# Generate mutations
mutations = await engine.generate_mutations(
    seed_prompt="Book a flight",
    types=[MutationType.PARAPHRASE, MutationType.NOISE],
    count=10,
)

# Batch generation
results = await engine.generate_batch(
    prompts=["Prompt 1", "Prompt 2"],
    types=[MutationType.PARAPHRASE],
    count_per_prompt=5,
)
```

---
### Invariant Verification

#### InvariantVerifier

```python
from flakestorm import InvariantVerifier

verifier = InvariantVerifier(config.invariants)

# Verify a response
result = verifier.verify(
    response="Agent output text",
    latency_ms=150.0,
)

# Result properties
result.all_passed      # bool
result.passed_count    # int
result.failed_count    # int
result.checks          # list[CheckResult]
result.get_failed_checks()
result.get_passed_checks()
```

#### Built-in Checkers

```python
from flakestorm.assertions import (
    ContainsChecker,
    LatencyChecker,
    ValidJsonChecker,
    RegexChecker,
    SimilarityChecker,
    ExcludesPIIChecker,
    RefusalChecker,
)
```

#### Custom Checker

```python
from flakestorm.assertions.deterministic import BaseChecker, CheckResult

class MyChecker(BaseChecker):
    def check(self, response: str, latency_ms: float) -> CheckResult:
        passed = "expected" in response
        return CheckResult(
            type=self.type,
            passed=passed,
            details="Custom check result",
        )
```
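
To show the same pattern end to end, the sketch below uses local stand-ins for `BaseChecker` and `CheckResult` so that it runs without flakestorm installed. The stand-ins only mirror the fields used in the example above; the real classes may differ (for instance, in constructor arguments). `NoEmailChecker` is a hypothetical checker that fails when a response appears to contain an email address.

```python
import re
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Stand-in for flakestorm's CheckResult (fields taken from the example above)."""
    type: str
    passed: bool
    details: str


class BaseChecker:
    """Stand-in base class; the real one may accept configuration."""
    type = "custom"

    def check(self, response: str, latency_ms: float) -> CheckResult:
        raise NotImplementedError


class NoEmailChecker(BaseChecker):
    """Hypothetical checker: fails if the response seems to contain an email."""
    type = "no_email"
    _EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

    def check(self, response: str, latency_ms: float) -> CheckResult:
        match = self._EMAIL.search(response)
        details = "no email found" if match is None else f"found {match.group(0)!r}"
        return CheckResult(type=self.type, passed=match is None, details=details)


checker = NoEmailChecker()
ok = checker.check("Your flight is booked.", latency_ms=120.0)
bad = checker.check("Contact me at jane@example.com", latency_ms=95.0)
print(ok.passed, bad.passed, bad.details)
```

In a real project, replace the two stand-in classes with the imports from `flakestorm.assertions.deterministic` shown above.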
---

### Test Results

#### TestResults

```python
results = await runner.run()

# Statistics
results.statistics.robustness_score   # 0.0-1.0
results.statistics.total_mutations    # int
results.statistics.passed_mutations   # int
results.statistics.failed_mutations   # int
results.statistics.avg_latency_ms     # float
results.statistics.p95_latency_ms     # float
results.statistics.by_type            # list[TypeStatistics]

# Timing
results.started_at    # datetime
results.completed_at  # datetime
results.duration      # seconds

# Mutations
results.mutations            # list[MutationResult]
results.passed_mutations     # list[MutationResult]
results.failed_mutations     # list[MutationResult]
results.get_by_type("noise") # Filter by type
results.get_by_prompt("...")  # Filter by prompt

# Serialization
results.to_dict()  # Full JSON-serializable dict
# V2: Resilience and contract/replay (when config has contract/replays)
results.resilience_scores   # dict: mutation_robustness, chaos_resilience, contract_compliance, replay_regression
results.contract_compliance # ContractRunResult | None (when contract run was executed)
# Replay results are reported via flakestorm replay run --output; see Reports below.
```

#### MutationResult

```python
for result in results.mutations:
    result.original_prompt   # str
    result.mutation          # Mutation object
    result.response          # str
    result.latency_ms        # float
    result.passed            # bool
    result.checks            # list[CheckResult]
    result.error             # str | None
    result.failed_checks     # list[CheckResult]
```

---
### Report Generation

#### HTMLReportGenerator

```python
from flakestorm.reports import HTMLReportGenerator

generator = HTMLReportGenerator(results)

# Generate HTML string
html = generator.generate()

# Save to file
path = generator.save()  # Auto-generated path
path = generator.save("custom/path/report.html")
```

#### JSONReportGenerator

```python
from flakestorm.reports import JSONReportGenerator

generator = JSONReportGenerator(results)

# Full report
json_str = generator.generate(pretty=True)

# Summary only (for CI)
summary = generator.generate_summary()

# Save
path = generator.save()
path = generator.save(summary_only=True)
```

#### TerminalReporter

```python
from flakestorm.reports import TerminalReporter
from rich.console import Console

reporter = TerminalReporter(results, Console())
reporter.print_summary()
reporter.print_type_breakdown()
reporter.print_failures(limit=10)
reporter.print_full_report()
```
**V2 reports:** Contract runs (`flakestorm contract run --output report.html`) and replay runs (`flakestorm replay run --output report.html`) produce HTML reports that include **suggested actions** for failed cells or sessions (e.g. add `reset_endpoint`, tighten invariants, fix tool behavior). See [Behavioral Contracts](BEHAVIORAL_CONTRACTS.md) and [Replay Regression](REPLAY_REGRESSION.md).
---

## CLI Commands

### `flakestorm init [PATH]`

Initialize a new configuration file.

```bash
flakestorm init                    # Creates flakestorm.yaml
flakestorm init config/test.yaml   # Custom path
flakestorm init --force            # Overwrite existing
```

### `flakestorm run`
Run reliability tests (mutation run; optionally with chaos).
```bash
flakestorm run                              # Default config (mutation only)
flakestorm run --config custom.yaml         # Custom config
flakestorm run --chaos                      # Apply chaos (tool/LLM faults, context_attacks) during mutation run
flakestorm run --chaos-only                 # Chaos-only run (no mutations); requires chaos config
flakestorm run --chaos-profile api_outage   # Use a built-in chaos profile
flakestorm run --output json                # JSON output
flakestorm run --output terminal            # Terminal only
flakestorm run --min-score 0.9 --ci         # CI mode
flakestorm run --verify-only                # Just verify setup
flakestorm run --quiet                      # Minimal output
```

### `flakestorm verify`

Verify configuration and connections.

```bash
flakestorm verify
flakestorm verify --config custom.yaml
```

### `flakestorm report PATH`

View or convert existing reports.

```bash
flakestorm report results.json               # View in terminal
flakestorm report results.json --output html # Convert to HTML
```

### `flakestorm score`

Output only the robustness score (for CI scripts).

```bash
SCORE=$(flakestorm score)
if (( $(echo "$SCORE >= 0.9" | bc -l) )); then
    echo "Passed"
else
    echo "Failed"
    exit 1
fi
```
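
Where `bc` is not available on the CI image, the same gate can be sketched in Python. This assumes, as the shell example implies, that `flakestorm score` prints only a decimal score to stdout; `passes` and `main` are hypothetical helper names.

```python
import subprocess


def passes(score_text: str, threshold: float = 0.9) -> bool:
    """Parse the score printed by the CLI and compare it to the threshold."""
    return float(score_text.strip()) >= threshold


def main() -> int:
    # Assumption: stdout contains a bare decimal such as "0.93".
    completed = subprocess.run(
        ["flakestorm", "score"], capture_output=True, text=True, check=True
    )
    ok = passes(completed.stdout)
    print("Passed" if ok else "Failed")
    return 0 if ok else 1
```

A CI script would end with `raise SystemExit(main())` so that a failing score produces a non-zero exit code.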

### V2: `flakestorm contract run` / `validate` / `score`

Run behavioral contract tests (invariants × chaos matrix).

```bash
flakestorm contract run                      # Run contract matrix; progress and score in terminal
flakestorm contract run --output report.html  # Save HTML report with suggested actions for failed cells
flakestorm contract validate                 # Validate contract config only
flakestorm contract score                    # Output contract resilience score only
```

### V2: `flakestorm replay run` / `export`

Replay regression: run saved sessions and verify them against a contract.

```bash
flakestorm replay run                        # Replay sessions from config (file or inline)
flakestorm replay run path/to/session.yaml   # Replay a single session file
flakestorm replay run path/to/replays/       # Replay all sessions in directory
flakestorm replay run --output report.html   # Save HTML report with suggested actions for failed sessions
flakestorm replay export --from-report FILE  # Export from an existing report
```

### V2: `flakestorm ci`

Run the full CI pipeline: the mutation run, the contract run (if configured), a chaos-only run (if chaos is configured), and replay (if configured). The overall weighted score is then computed from `scoring.weights`.

```bash
flakestorm ci
flakestorm ci --config custom.yaml
```
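
The weighted score described above can be illustrated with a short sketch. It assumes the overall score is a plain weighted sum of the per-stage scores and that every weighted stage produced a score. How flakestorm treats stages that are not configured is not specified here, and `overall_score` is a hypothetical helper, not an SDK function.

```python
import math


def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-stage scores using weights that must sum to 1.0."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(scores)
    if missing:
        # Assumption: a weighted stage without a score is treated as an error.
        raise ValueError(f"no score for weighted stages: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Example values are illustrative only.
weights = {"mutation": 0.4, "chaos": 0.2, "contract": 0.3, "replay": 0.1}
scores = {"mutation": 0.9, "chaos": 0.8, "contract": 1.0, "replay": 0.5}
total = overall_score(scores, weights)
```

With these illustrative inputs the stages contribute 0.36, 0.16, 0.30, and 0.05, for a total of about 0.87.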

---

## Environment Variables

| Variable | Description |
|----------|-------------|
| `OLLAMA_HOST` | Override Ollama server URL |
| Custom headers | Expanded in config via `${VAR}` syntax |

**V2: API keys (env-only).** Model API keys must not appear as literal values in the config. Set them as environment variables and reference them from the config (e.g. `api_key: "${OPENAI_API_KEY}"`). Supported: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, etc. See [LLM Providers](LLM_PROVIDERS.md).

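
To make the `${VAR}` syntax concrete, the sketch below shows one way such references could be expanded from the environment. It is an illustration only, not flakestorm's implementation: `expand_env` is a hypothetical helper, and raising an error on an unset variable is an assumption about the desired behavior.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace each ${NAME} in value with the matching environment variable."""
    env = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly instead of substituting an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR_PATTERN.sub(replace, value)
```

For example, `expand_env("Bearer ${TOKEN}", {"TOKEN": "abc"})` returns `"Bearer abc"`, while a string without any `${...}` reference is returned unchanged.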

---

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Error (config, connection, etc.) |
| 1 | CI mode: Score below threshold |