diff --git a/README.md b/README.md
index 31bc9b7..f8e299d 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@
   <a href="https://pypi.org/project/flakestorm/">
     <img src="https://img.shields.io/pypi/dm/flakestorm.svg" alt="PyPI downloads">
   </a>
-  
+
   <a href="https://github.com/flakestorm/flakestorm/releases">
     <img src="https://img.shields.io/github/v/release/flakestorm/flakestorm" alt="Latest Release">
   </a>
@@ -134,10 +134,12 @@ Flakestorm supports several modes; you can use one or combine them:
 - **Chaos only** — Golden prompts → agent with fault-injected tools/LLM → invariants. *Does the agent handle bad environments?*
 - **Contract** — Golden prompts → agent under each chaos scenario → verify named invariants across a matrix. *Does the agent obey its rules under every failure mode?*
 - **Replay** — Recorded production input + recorded tool responses → agent → contract. *Did we fix this incident?*
-- **Mutation (optional)** — Golden prompts → adversarial mutations (22+ types, max 50/run) → agent (optionally under chaos) → invariants. *Does the agent handle bad inputs (and optionally bad environments)?*
+- **Mutation (optional)** — Golden prompts → adversarial mutations (24 types, max 50/run) → agent (optionally under chaos) → invariants. *Does the agent handle bad inputs (and optionally bad environments)?*
 
 You define **golden prompts**, **invariants** (or a full **contract** with severity and chaos matrix), and optionally **chaos** (tool/LLM faults) and **replay** sessions. Flakestorm runs the chosen mode(s), checks responses against your rules, and produces a **robustness score** (mutation or chaos-only runs) or **resilience score** (contract run), plus HTML report. Use `flakestorm run`, `flakestorm contract run`, `flakestorm replay run`, or `flakestorm ci` for the combined overall score (OSS: run from CLI or your own scripts; **native CI/CD integrations** — scheduled runs, pipeline plugins — are **Cloud only**).
 
+For the full **V1 vs V2 flow** (mutation-only vs four pillars, contract matrix isolation, resilience score formula), see the [Usage Guide](docs/USAGE_GUIDE.md#how-it-works).
+
 > **Note**: Mutation generation uses a local LLM (Ollama) or cloud APIs (OpenAI, Claude, Gemini). API keys via environment variables only. See [LLM Providers](docs/LLM_PROVIDERS.md).
 
 ## Features
@@ -150,7 +152,7 @@ You define **golden prompts**, **invariants** (or a full **contract** with sever
 
 ### Supporting capabilities
 
-- **Adversarial mutations** — 22+ mutation types (prompt-level and system/network-level); max 50 mutations per run in OSS. [→ Test Scenarios](docs/TEST_SCENARIOS.md)
+- **Adversarial mutations** — 24 mutation types (prompt-level and system/network-level); max 50 mutations per run in OSS. [→ Test Scenarios](docs/TEST_SCENARIOS.md)
 - **Invariants & assertions** — Deterministic checks, semantic similarity, safety (PII, refusal); configurable per contract.
 - **Robustness score** — For mutation runs: a single weighted score (0–1) of how well the agent handled adversarial prompts. Reported in HTML/JSON and CLI (`results.statistics.robustness_score`).
 - **Unified resilience score** — For full CI: weighted combination of **mutation robustness**, chaos resilience, contract compliance, and replay regression; weights (mutation, chaos, contract, replay) configurable in YAML and must sum to 1.0.
@@ -163,7 +165,7 @@ You define **golden prompts**, **invariants** (or a full **contract** with sever
 ## Open Source vs Cloud
 
 **Open Source (Always Free):**
-- Core chaos engine with all 22+ mutation types (max 50 per run; no artificial feature gating)
+- Core chaos engine with all 24 mutation types (max 50 per run; no artificial feature gating)
 - Local execution for fast experimentation
 - Run from CLI or your own scripts (no native CI/CD; that’s Cloud only)
 - Full transparency and extensibility
@@ -276,4 +278,3 @@ Apache 2.0 - See [LICENSE](LICENSE) for details.
 <p align="center">
   ❤️ <a href="https://github.com/sponsors/flakestorm">Sponsor Flakestorm on GitHub</a>
 </p>
- 
\ No newline at end of file
diff --git a/docs/USAGE_GUIDE.md b/docs/USAGE_GUIDE.md
index 9dcba8a..6a73911 100644
--- a/docs/USAGE_GUIDE.md
+++ b/docs/USAGE_GUIDE.md
@@ -25,7 +25,10 @@ This comprehensive guide walks you through using flakestorm to test your AI agen
 
 ### What is flakestorm?
 
-flakestorm is an **adversarial testing framework** and **chaos engineering platform** for AI agents. It applies chaos engineering principles to systematically test how your AI agents behave under unexpected, malformed, or adversarial inputs. With **V2** (`version: "2.0"` in config) you get environment chaos (tool/LLM faults, context attacks), behavioral contracts (invariants × chaos matrix), and replay regression; **22+ mutation types** and **max 50 mutations per run** in OSS. API keys for cloud LLM providers must be set via environment variables only (e.g. `api_key: "${OPENAI_API_KEY}"`). See [Configuration Guide](CONFIGURATION_GUIDE.md) and [V2 Spec](V2_SPEC.md).
+flakestorm is an **adversarial testing framework** and **chaos engineering platform** for AI agents. It applies chaos engineering principles to systematically test how your AI agents behave under unexpected, malformed, or adversarial inputs.
+
+- **V1** (`version: "1.0"` or omitted): Mutation-only mode — golden prompts → mutation engine → agent → invariants → **robustness score**. Ideal for quick adversarial input testing.
+- **V2** (`version: "2.0"` in config): Full chaos platform with **Environment Chaos** (tool/LLM faults, context attacks), **Behavioral Contracts** (invariants × chaos matrix with per-cell isolation), and **Replay Regression** (replay production incidents). In OSS you get **24 mutation types** (**max 50 mutations per run**), plus the commands `flakestorm run --chaos`, `flakestorm contract run`, and `flakestorm replay run`, and `flakestorm ci` for a unified **resilience score**. API keys for cloud LLM providers must be set via environment variables only (e.g. `api_key: "${OPENAI_API_KEY}"`). See [Configuration Guide](CONFIGURATION_GUIDE.md), [V2 Spec](V2_SPEC.md), and [V2 Audit](V2_AUDIT.md).
 
 ### Why Use flakestorm?
 
@@ -39,47 +42,44 @@ flakestorm is an **adversarial testing framework** and **chaos engineering platf
 
 ### How It Works
 
+Flakestorm supports **V1 (mutation-only)** and **V2 (full chaos platform)**. The flow depends on your config version and which commands you run.
+
+#### V1 / Mutation-only flow
+
+With a V1 config (or a V2 config run without `--chaos`), you get the classic adversarial flow:
+
 ```
 ┌─────────────────────────────────────────────────────────────────┐
-│                         flakestorm FLOW                           │
+│                flakestorm V1: MUTATION-ONLY FLOW                │
 ├─────────────────────────────────────────────────────────────────┤
-│                                                                 │
-│   1. GOLDEN PROMPTS          2. MUTATION ENGINE                 │
-│   ┌─────────────────┐        ┌─────────────────┐               │
-│   │ "Book a flight  │  ───►  │ Local LLM       │               │
-│   │  from NYC to LA"│        │ (Qwen/Ollama)   │               │
-│   └─────────────────┘        └────────┬────────┘               │
-│                                       │                         │
-│                                       ▼                         │
-│                              ┌─────────────────┐               │
-│                              │ Mutated Prompts │               │
-│                              │ • Typos         │               │
-│                              │ • Paraphrases   │               │
-│                              │ • Injections    │               │
-│                              └────────┬────────┘               │
-│                                       │                         │
-│   3. YOUR AGENT                       ▼                         │
-│   ┌─────────────────┐        ┌─────────────────┐               │
-│   │ AI Agent        │  ◄───  │ Test Runner     │               │
-│   │ (HTTP/Python)   │        │ (Async)         │               │
-│   └────────┬────────┘        └─────────────────┘               │
-│            │                                                    │
-│            ▼                                                    │
-│   4. VERIFICATION            5. REPORTING                       │
-│   ┌─────────────────┐        ┌─────────────────┐               │
-│   │ Invariant       │  ───►  │ HTML/JSON/CLI   │               │
-│   │ Assertions      │        │ Reports         │               │
-│   └─────────────────┘        └─────────────────┘               │
-│                                       │                         │
-│                                       ▼                         │
-│                              ┌─────────────────┐               │
-│                              │ Robustness      │               │
-│                              │ Score: 0.85     │               │
-│                              └─────────────────┘               │
-│                                                                 │
+│  1. GOLDEN PROMPTS  →  2. MUTATION ENGINE (Local LLM)           │
+│     "Book a flight"    → Mutated prompts (typos, paraphrases,   │
+│                          injections, encoding, etc.; 24 types)  │
+│                                        ↓                        │
+│  3. YOUR AGENT  ←  Test Runner sends each mutated prompt        │
+│     (HTTP/Python)       ↓                                       │
+│  4. INVARIANT ASSERTIONS  →  5. REPORTING                       │
+│     (contains, latency, similarity, safety) → Robustness Score  │
 └─────────────────────────────────────────────────────────────────┘
 ```
 
+**Commands:** `flakestorm run` (no `--chaos`) → **Robustness score** (0–1).
+
+#### V2 flow — Four pillars
+
+With **`version: "2.0"`** in your config, Flakestorm adds environment chaos, behavioral contracts, and replay regression. See [V2 Spec](V2_SPEC.md) and [V2 Audit](V2_AUDIT.md).
+
+| Pillar | What runs | Score / output |
+|--------|-----------|----------------|
+| **Mutation run** | Golden prompts → 24 mutation types → agent → invariants | **Robustness score** (0–1). Use `flakestorm run`, or `flakestorm run --chaos` to add environment chaos. |
+| **Environment chaos** | Fault injection into tools and LLM (timeouts, errors, rate limits, malformed responses, context attacks) | **Chaos resilience** (0–1). Use `flakestorm run --chaos` (with mutations) or `flakestorm run --chaos --chaos-only` (no mutations). |
+| **Behavioral contracts** | Contracts (invariants × severity) × chaos matrix scenarios; each cell is an independent run (optional reset per cell). | **Resilience score** (0–100%). Use `flakestorm contract run`. Per-contract formula: weighted by severity (critical×3, high×2, medium×1); **auto-FAIL** if any critical invariant fails. |
+| **Replay regression** | Replay saved sessions (e.g. production incidents) and verify against a contract. | Per-session pass/fail; **replay regression** score when run via CI. Use `flakestorm replay run [path]`. |
+
+**Unified CI:** `flakestorm ci` runs the mutation run, the contract run (if configured), a chaos-only run (if chaos is configured), and all replay sessions, then computes an **overall resilience score** from `scoring.weights` (default: mutation 0.20, chaos 0.35, contract 0.35, replay 0.10). Weights must sum to 1.0.
+
+**Contract matrix isolation (V2):** Each (invariant × scenario) cell is independent. Configure `agent.reset_endpoint` (HTTP) or `agent.reset_function` (Python) to clear agent state between cells. If neither is set and the agent is stateful, Flakestorm emits a warning. See [V2 Spec — Contract matrix isolation](V2_SPEC.md#contract-matrix-isolation).
+
 ---
 
 ## Installation
@@ -819,7 +819,7 @@ golden_prompts:
 
 ### Mutation Types
 
-flakestorm generates adversarial variations of your golden prompts across 22+ mutation types organized into categories:
+flakestorm generates adversarial variations of your golden prompts across 24 mutation types organized into categories:
 
 #### Prompt-Level Attacks
 
@@ -925,6 +925,21 @@ Score = (Weighted Passed Tests) / (Total Weighted Tests)
 - **0.7-0.8**: Fair - Needs work
 - **<0.7**: Poor - Significant reliability issues
 
+#### V2 Resilience Score (contract + overall)
+
+When using **V2** (`version: "2.0"`) with behavioral contracts and/or `flakestorm ci`, two additional scores apply. See [V2 Spec](V2_SPEC.md#resilience-score-formula).
+
+**Per-contract score** (for `flakestorm contract run`):
+
+```
+score = (Σ(passed_critical×3) + Σ(passed_high×2) + Σ(passed_medium×1))
+      / (Σ(total_critical×3) + Σ(total_high×2) + Σ(total_medium×1)) × 100
+```
+
+- **Automatic FAIL:** If any **critical** severity invariant fails in any scenario, the overall result is FAIL regardless of the numeric score.
+
+**Overall score** (for `flakestorm ci`): Configurable via **`scoring.weights`**. Weights must **sum to 1.0**. Default: mutation 0.20, chaos 0.35, contract 0.35, replay 0.10. The CI run combines mutation robustness, chaos resilience, contract compliance, and replay regression into one weighted overall resilience score.
+
 ---
 
 ## Understanding Mutation Types
@@ -1106,7 +1121,7 @@ flakestorm provides 22+ mutation types organized into **Prompt-Level Attacks** a
 ### Choosing Mutation Types
 
 **Comprehensive Testing (Recommended):**
-Use all 22+ types for complete coverage:
+Use all 24 types for complete coverage:
 ```yaml
 types:
   # Original 8 types
@@ -1206,7 +1221,7 @@ The 22+ mutation types work together to provide comprehensive robustness testing
 - **Infrastructure**: HTTP Header Injection, Payload Size Attack, Content-Type Confusion, Query Parameter Poisoning, Request Method Attack, Protocol-Level Attack, Resource Exhaustion, Concurrent Request Pattern, Timeout Manipulation
 - **Temporal/Context**: Temporal Attack, Multi-Turn Attack
 
-For comprehensive testing, use all 22+ types. For focused testing:
+For comprehensive testing, use all 24 types. For focused testing:
 - **Security-focused**: Emphasize Prompt Injection, Advanced Jailbreak, Protocol-Level Attack, HTTP Header Injection
 - **UX-focused**: Emphasize Noise, Tone Shift, Context Manipulation, Language Mixing
 - **Infrastructure-focused**: Emphasize all system/network-level types