diff --git a/docs/TEST_SCENARIOS.md b/docs/TEST_SCENARIOS.md
index c99ce4e..7e4ae4c 100644
--- a/docs/TEST_SCENARIOS.md
+++ b/docs/TEST_SCENARIOS.md
@@ -1,41 +1,152 @@
 # Real-World Test Scenarios
 
-This document provides concrete, real-world examples of testing AI agents with flakestorm across **all V2 pillars**: **mutation** (adversarial prompts), **environment chaos** (tool/LLM faults), **behavioral contracts** (invariants × chaos matrix), and **replay regression** (replay production incidents). Each scenario includes setup, config, and commands where applicable.
-
-**V2:** Use `version: "2.0"` in config to enable chaos, contracts, and replay. Flakestorm supports **24 mutation types** (prompt-level and system/network-level) and **max 50 mutations per run** in OSS. See [V2 Spec](V2_SPEC.md) and [V2 Audit](V2_AUDIT.md).
+This document provides concrete, real-world examples of testing AI agents with flakestorm: environment chaos (tool/LLM faults), behavioral contracts (invariants × chaos matrix), replay regression, and adversarial mutations. Each scenario includes setup, config, and commands where applicable. Flakestorm supports **24 mutation types** and **max 50 mutations per run** in OSS. See [Configuration Guide](CONFIGURATION_GUIDE.md), [Spec](V2_SPEC.md), and [Audit](V2_AUDIT.md).
 
 ---
 
 ## Table of Contents
 
-### V2 scenarios (all pillars)
+### Scenarios with tool calling, chaos, contract, and replay
 
-- [V2 Scenario: Environment Chaos](#v2-scenario-environment-chaos) — Tool/LLM fault injection
-- [V2 Scenario: Behavioral Contract × Chaos Matrix](#v2-scenario-behavioral-contract--chaos-matrix) — Invariants under each chaos scenario
-- [V2 Scenario: Replay Regression](#v2-scenario-replay-regression) — Replay production failures
-- [Full V2 example (chaos + contract + replay)](../examples/v2_research_agent/README.md) — Working agent and config
+1. [Research Agent with Search Tool](#scenario-1-research-agent-with-search-tool) — Search tool + LLM; chaos + contract
+2. [Support Agent with KB Tool and Replay](#scenario-2-support-agent-with-kb-tool-and-replay) — KB tool; chaos + contract + replay
+3. [Autonomous Planner with Multi-Tool Chain](#scenario-3-autonomous-planner-with-multi-tool-chain) — Multi-step agent (weather + calendar); chaos + contract
+4. [Booking Agent with Calendar and Payment Tools](#scenario-4-booking-agent-with-calendar-and-payment-tools) — Two tools; chaos matrix + replay
+5. [Data Pipeline Agent with Replay](#scenario-5-data-pipeline-agent-with-replay) — Pipeline tool; contract + replay regression
+6. [Quick reference](#quick-reference-commands-and-config)
 
-### Mutation-focused scenarios (agent + config examples)
+### Additional scenarios (agent + config examples)
 
-1. [Scenario 1: Customer Service Chatbot](#scenario-1-customer-service-chatbot)
-2. [Scenario 2: Code Generation Agent](#scenario-2-code-generation-agent)
-3. [Scenario 3: RAG-Based Q&A Agent](#scenario-3-rag-based-qa-agent)
-4. [Scenario 4: Multi-Tool Agent (LangChain)](#scenario-4-multi-tool-agent-langchain)
-5. [Scenario 5: Guardrailed Agent (Safety Testing)](#scenario-5-guardrailed-agent-safety-testing)
-6. [Integration Guide](#integration-guide)
+7. [Customer Service Chatbot](#scenario-6-customer-service-chatbot)
+8. [Code Generation Agent](#scenario-7-code-generation-agent)
+9. [RAG-Based Q&A Agent](#scenario-8-rag-based-qa-agent)
+10. [Multi-Tool Agent (LangChain)](#scenario-9-multi-tool-agent-langchain)
+11. [Guardrailed Agent (Safety Testing)](#scenario-10-guardrailed-agent-safety-testing)
+12. [Integration Guide](#integration-guide)
 
 ---
 
-## V2 Scenario: Environment Chaos
+## Scenario 1: Research Agent with Search Tool
 
-**Goal:** Test that your agent degrades gracefully when tools or the LLM fail (timeouts, errors, rate limits, malformed responses).
+### The Agent
 
-**Commands:** `flakestorm run --chaos` (mutations + chaos) or `flakestorm run --chaos --chaos-only` (golden prompts only, under chaos). Use `--chaos-profile api_outage` (or `degraded_llm`, `hostile_tools`, `high_latency`, `cascading_failure`) for built-in profiles.
+A research assistant that **actually calls a search tool** over HTTP, then sends the query and search results to an LLM. We test it under environment chaos (tool/LLM faults) and a behavioral contract (must cite source, must complete).
 
-**Config (excerpt):**
+### Search Tool (Actual HTTP Service)
+
+The agent calls this service to fetch search results. For a single-endpoint HTTP agent, Flakestorm uses `tool: "*"` to fault the request to the agent, or use `match_url` when the agent makes outbound calls (see [Environment Chaos](ENVIRONMENT_CHAOS.md)).
+
+```python
+# search_service.py — run on port 5001
+from fastapi import FastAPI
+
+app = FastAPI(title="Search Tool")
+
+@app.get("/search")
+def search(q: str):
+    """Simulated search API. In production this might call a real search engine."""
+    results = [
+        {"title": "Wikipedia: " + q, "snippet": "According to Wikipedia, " + q + " is a topic."},
+        {"title": "Source A", "snippet": "Per Source A, " + q + " has been documented."},
+    ]
+    return {"query": q, "results": results}
+```
+
+### Agent Code (Actual Tool Calling)
+
+The agent receives the user query, **calls the search tool** via HTTP, then calls the LLM with the query and results.
+
+```python
+# research_agent.py — run on port 8790
+import os
+import httpx
+from fastapi import FastAPI
+from pydantic import BaseModel
+
+app = FastAPI(title="Research Agent with Search Tool")
+
+SEARCH_URL = os.environ.get("SEARCH_URL", "http://localhost:5001/search")
+OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434/api/generate")
+MODEL = os.environ.get("OLLAMA_MODEL", "gemma3:1b")
+
+class InvokeRequest(BaseModel):
+    input: str | None = None
+    prompt: str | None = None
+
+class InvokeResponse(BaseModel):
+    result: str
+
+def call_search(query: str) -> str:
+    """Actual tool call: HTTP GET to search service."""
+    r = httpx.get(SEARCH_URL, params={"q": query}, timeout=10.0)
+    r.raise_for_status()
+    data = r.json()
+    snippets = [x.get("snippet", "") for x in data.get("results", [])[:3]]
+    return "\n".join(snippets) if snippets else "No results found."
+
+def call_llm(user_query: str, search_context: str) -> str:
+    """Call LLM with user query and tool output."""
+    prompt = f"""You are a research assistant. Use the following search results to answer. Always cite the source.
+
+Search results:
+{search_context}
+
+User question: {user_query}
+
+Answer (2-4 sentences, must cite source):"""
+    r = httpx.post(
+        OLLAMA_URL,
+        json={"model": MODEL, "prompt": prompt, "stream": False},
+        timeout=60.0,
+    )
+    r.raise_for_status()
+    return (r.json().get("response") or "").strip()
+
+@app.post("/reset")
+def reset():
+    return {"ok": True}
+
+@app.post("/invoke", response_model=InvokeResponse)
+def invoke(req: InvokeRequest):
+    text = (req.input or req.prompt or "").strip()
+    if not text:
+        return InvokeResponse(result="Please ask a question.")
+    try:
+        search_context = call_search(text)   # actual tool call
+        answer = call_llm(text, search_context)
+        return InvokeResponse(result=answer)
+    except Exception as e:
+        return InvokeResponse(
+            result="According to [system], the search or model failed. Please try again."
+        )
+```
+
+### flakestorm Configuration
 
 ```yaml
 version: "2.0"
+agent:
+  endpoint: "http://localhost:8790/invoke"
+  type: http
+  method: POST
+  request_template: '{"input": "{prompt}"}'
+  response_path: "result"
+  timeout: 15000
+  reset_endpoint: "http://localhost:8790/reset"
+model:
+  provider: ollama
+  name: gemma3:1b
+  base_url: "http://localhost:11434"
+golden_prompts:
+  - "What is the capital of France?"
+  - "Summarize the benefits of renewable energy."
+mutations:
+  count: 5
+  types: [paraphrase, noise, prompt_injection]
+invariants:
+  - type: latency
+    max_ms: 30000
+  - type: output_not_empty
 chaos:
   tool_faults:
     - tool: "*"
@@ -46,35 +157,18 @@ chaos:
     - mode: truncated_response
       max_tokens: 5
       probability: 0.2
-```
-
-**Docs:** [Environment Chaos](ENVIRONMENT_CHAOS.md), [V2 Audit §8.1](V2_AUDIT.md#1-prd-81--environment-chaos). **Working example:** [v2_research_agent](../examples/v2_research_agent/README.md).
-
----
-
-## V2 Scenario: Behavioral Contract × Chaos Matrix
-
-**Goal:** Verify that named invariants (with severity) hold under every chaos scenario; each (invariant × scenario) cell is an independent run. Optional `agent.reset_endpoint` or `agent.reset_function` for state isolation.
-
-**Commands:** `flakestorm contract run`, `flakestorm contract validate`, `flakestorm contract score`.
-
-**Config (excerpt):**
-
-```yaml
-version: "2.0"
-agent:
-  reset_endpoint: "http://localhost:8790/reset"
 contract:
-  name: "My Contract"
+  name: "Research Agent Contract"
   invariants:
-    - id: must-cite
+    - id: must-cite-source
       type: regex
-      pattern: "(?i)(source|according to)"
+      pattern: "(?i)(source|according to|per )"
       severity: critical
-    - id: max-latency
-      type: latency
-      max_ms: 60000
-      severity: medium
+      when: always
+    - id: completes
+      type: completes
+      severity: high
+      when: always
   chaos_matrix:
     - name: "no-chaos"
       tool_faults: []
@@ -84,38 +178,616 @@ contract:
         - tool: "*"
           mode: error
           error_code: 503
+output:
+  format: html
+  path: "./reports"
 ```
 
-**Docs:** [Behavioral Contracts](BEHAVIORAL_CONTRACTS.md), [V2 Spec](V2_SPEC.md) (contract matrix isolation, resilience score). **Working example:** [v2_research_agent](../examples/v2_research_agent/README.md).
+### Running the Test
+
+```bash
+# Terminal 1: Search tool
+uvicorn search_service:app --host 0.0.0.0 --port 5001
+# Terminal 2: Agent (requires Ollama with gemma3:1b)
+uvicorn research_agent:app --host 0.0.0.0 --port 8790
+# Terminal 3: Flakestorm
+flakestorm run -c flakestorm.yaml
+flakestorm run -c flakestorm.yaml --chaos
+flakestorm contract run -c flakestorm.yaml
+flakestorm ci -c flakestorm.yaml --min-score 0.5
+```
+
+### What We're Testing
+
+| Pillar | What runs | What we verify |
+|--------|-----------|----------------|
+| **Mutation** | Adversarial prompts to agent (calls search + LLM) | Robustness to typos, paraphrases, injection. |
+| **Chaos** | Tool 503 to agent, LLM truncated | Agent degrades gracefully (fallback, cites source when possible). |
+| **Contract** | Contract x chaos matrix (no-chaos, api-outage) | Must cite source (critical), must complete (high); auto-FAIL if critical fails. |
 
 ---
 
-## V2 Scenario: Replay Regression
+## Scenario 2: Support Agent with KB Tool and Replay
 
-**Goal:** Replay a saved session (e.g. production incident) with fixed inputs and tool responses, then verify the agent’s output against a contract.
+### The Agent
 
-**Commands:** `flakestorm replay run path/to/session.yaml -c flakestorm.yaml`, `flakestorm replay export --from-report report.json -o ./replays/`. Optional: `flakestorm replay run --from-langsmith RUN_ID --run` to import from LangSmith and run.
+A customer support agent that **actually calls a knowledge-base (KB) tool** to fetch articles, then answers the user. We add a **replay session** from a production incident to verify the fix.
 
-**Config (excerpt):**
+### KB Tool (Actual HTTP Service)
+
+```python
+# kb_service.py — run on port 5002
+from fastapi import FastAPI
+from fastapi.responses import JSONResponse
+
+app = FastAPI(title="KB Tool")
+ARTICLES = {
+    "reset-password": "To reset your password: go to Account > Security > Reset password. You will receive an email with a link.",
+    "cancel-subscription": "To cancel: Account > Billing > Cancel subscription. Refunds apply within 14 days.",
+}
+
+@app.get("/kb/article")
+def get_article(article_id: str):
+    """Actual tool: fetch KB article by ID."""
+    if article_id not in ARTICLES:
+        return JSONResponse(status_code=404, content={"error": "Article not found"})
+    return {"article_id": article_id, "content": ARTICLES[article_id]}
+```
+
+### Agent Code (Actual Tool Calling)
+
+The agent parses the user question, **calls the KB tool** to get the article, then formats a response.
+
+```python
+# support_agent.py — run on port 8791
+import httpx
+from fastapi import FastAPI
+from pydantic import BaseModel
+
+app = FastAPI(title="Support Agent with KB Tool")
+KB_URL = "http://localhost:5002/kb/article"
+
+class InvokeRequest(BaseModel):
+    input: str | None = None
+    prompt: str | None = None
+
+class InvokeResponse(BaseModel):
+    result: str
+
+def extract_article_id(query: str) -> str:
+    q = query.lower()
+    if "password" in q or "reset" in q:
+        return "reset-password"
+    if "cancel" in q or "subscription" in q:
+        return "cancel-subscription"
+    return "reset-password"
+
+def call_kb(article_id: str) -> str:
+    """Actual tool call: HTTP GET to KB service."""
+    r = httpx.get(KB_URL, params={"article_id": article_id}, timeout=5.0)
+    if r.status_code != 200:
+        return f"[KB error: {r.status_code}]"
+    return r.json().get("content", "")
+
+@app.post("/reset")
+def reset():
+    return {"ok": True}
+
+@app.post("/invoke", response_model=InvokeResponse)
+def invoke(req: InvokeRequest):
+    text = (req.input or req.prompt or "").strip()
+    if not text:
+        return InvokeResponse(result="Please describe your issue.")
+    try:
+        article_id = extract_article_id(text)
+        content = call_kb(article_id)   # actual tool call
+        if not content or content.startswith("[KB error"):
+            return InvokeResponse(result="I could not find that article. Please contact support.")
+        return InvokeResponse(result=f"Here is what I found:\n\n{content}")
+    except Exception as e:
+        return InvokeResponse(result=f"Support system is temporarily unavailable. Please try again.")
+```
+
+### flakestorm Configuration
 
 ```yaml
 version: "2.0"
+agent:
+  endpoint: "http://localhost:8791/invoke"
+  type: http
+  method: POST
+  request_template: '{"input": "{prompt}"}'
+  response_path: "result"
+  timeout: 10000
+  reset_endpoint: "http://localhost:8791/reset"
+golden_prompts:
+  - "How do I reset my password?"
+  - "I want to cancel my subscription."
+invariants:
+  - type: output_not_empty
+  - type: latency
+    max_ms: 15000
+chaos:
+  tool_faults:
+    - tool: "*"
+      mode: error
+      error_code: 503
+      probability: 0.25
+contract:
+  name: "Support Agent Contract"
+  invariants:
+    - id: not-empty
+      type: output_not_empty
+      severity: critical
+      when: always
+    - id: no-pii-leak
+      type: excludes_pii
+      severity: high
+      when: always
+  chaos_matrix:
+    - name: "no-chaos"
+      tool_faults: []
+      llm_faults: []
+    - name: "kb-down"
+      tool_faults:
+        - tool: "*"
+          mode: error
+          error_code: 503
 replays:
   sessions:
-    - file: "replays/incident_001.yaml"
-  # Optional: sources for LangSmith import
-  # sources: ...
+    - file: "replays/support_incident_001.yaml"
+scoring:
+  mutation: 0.20
+  chaos: 0.35
+  contract: 0.35
+  replay: 0.10
+output:
+  format: html
+  path: "./reports"
 ```
 
-**Session file (e.g. `replays/incident_001.yaml`):** `id`, `input`, `tool_responses` (optional), `contract` (name or path).
+### Replay Session (Production Incident)
 
-**Docs:** [Replay Regression](REPLAY_REGRESSION.md), [V2 Audit §8.3](V2_AUDIT.md#3-prd-83--replay-based-regression). **Working example:** [v2_research_agent](../examples/v2_research_agent/README.md).
+```yaml
+# replays/support_incident_001.yaml
+id: support-incident-001
+name: "Support agent failed when KB was down"
+source: manual
+input: "How do I reset my password?"
+tool_responses: []
+contract: "Support Agent Contract"
+```
+
+### Running the Test
+
+```bash
+# Terminal 1: KB service
+uvicorn kb_service:app --host 0.0.0.0 --port 5002
+# Terminal 2: Support agent
+uvicorn support_agent:app --host 0.0.0.0 --port 8791
+# Terminal 3: Flakestorm
+flakestorm run -c flakestorm.yaml
+flakestorm contract run -c flakestorm.yaml
+flakestorm replay run replays/support_incident_001.yaml -c flakestorm.yaml
+flakestorm ci -c flakestorm.yaml
+```
+
+### What We're Testing
+
+| Pillar | What runs | What we verify |
+|--------|-----------|----------------|
+| **Mutation** | Adversarial prompts to agent (calls KB tool) | Robustness to noisy/paraphrased support questions. |
+| **Chaos** | Tool 503 to agent | Agent returns graceful message instead of crashing. |
+| **Contract** | Invariants x chaos matrix | Output not empty (critical), no PII (high). |
+| **Replay** | Replay support_incident_001.yaml | Same input passes contract (regression for production incident). |
 
 ---
 
+## Scenario 3: Autonomous Planner with Multi-Tool Chain
+
+### The Agent
+
+An autonomous planner that chains multiple tool calls: it calls a weather tool, then a calendar tool, then formats a response. We test it under chaos (one tool fails) and a behavioral contract (response must complete and include a summary).
+
+### Tools (Weather + Calendar)
+
+```python
+# tools_planner.py — run on port 5010
+from fastapi import FastAPI
+from pydantic import BaseModel
+
+app = FastAPI(title="Planner Tools")
+
+@app.get("/weather")
+def weather(city: str):
+    return {"city": city, "temp": 72, "condition": "Sunny"}
+
+@app.get("/calendar")
+def calendar(date: str):
+    return {"date": date, "events": ["Meeting 10am", "Lunch 12pm"]}
+
+@app.post("/reset")
+def reset():
+    return {"ok": True}
+```
+
+### Agent Code (Multi-Step Tool Chain)
+
+```python
+# planner_agent.py — port 8792
+import httpx
+from fastapi import FastAPI
+from pydantic import BaseModel
+
+app = FastAPI(title="Autonomous Planner Agent")
+BASE = "http://localhost:5010"
+
+class InvokeRequest(BaseModel):
+    input: str | None = None
+    prompt: str | None = None
+
+class InvokeResponse(BaseModel):
+    result: str
+
+@app.post("/reset")
+def reset():
+    httpx.post(f"{BASE}/reset")
+    return {"ok": True}
+
+@app.post("/invoke", response_model=InvokeResponse)
+def invoke(req: InvokeRequest):
+    text = (req.input or req.prompt or "").strip()
+    if not text:
+        return InvokeResponse(result="Please provide a request.")
+    try:
+        w = httpx.get(f"{BASE}/weather", params={"city": "Boston"}, timeout=5.0)
+        weather_data = w.json() if w.status_code == 200 else {}
+        c = httpx.get(f"{BASE}/calendar", params={"date": "today"}, timeout=5.0)
+        cal_data = c.json() if c.status_code == 200 else {}
+        summary = f"Weather: {weather_data.get('condition', 'N/A')}. Calendar: {len(cal_data.get('events', []))} events."
+        return InvokeResponse(result=f"Summary: {summary}")
+    except Exception as e:
+        return InvokeResponse(result=f"Summary: Planning unavailable ({type(e).__name__}).")
+```
+
+### flakestorm Configuration
+
+```yaml
+version: "2.0"
+agent:
+  endpoint: "http://localhost:8792/invoke"
+  type: http
+  method: POST
+  request_template: '{"input": "{prompt}"}'
+  response_path: "result"
+  timeout: 10000
+  reset_endpoint: "http://localhost:8792/reset"
+golden_prompts:
+  - "What is the weather and my schedule for today?"
+invariants:
+  - type: output_not_empty
+  - type: latency
+    max_ms: 15000
+chaos:
+  tool_faults:
+    - tool: "*"
+      mode: error
+      error_code: 503
+      probability: 0.3
+contract:
+  name: "Planner Contract"
+  invariants:
+    - id: completes
+      type: completes
+      severity: critical
+      when: always
+  chaos_matrix:
+    - name: "no-chaos"
+      tool_faults: []
+      llm_faults: []
+    - name: "tool-down"
+      tool_faults:
+        - tool: "*"
+          mode: error
+          error_code: 503
+output:
+  format: html
+  path: "./reports"
+```
+
+### Running the Test
+
+```bash
+uvicorn tools_planner:app --host 0.0.0.0 --port 5010
+uvicorn planner_agent:app --host 0.0.0.0 --port 8792
+flakestorm run -c flakestorm.yaml
+flakestorm run -c flakestorm.yaml --chaos
+flakestorm contract run -c flakestorm.yaml
+```
+
+### What We're Testing
+
+| Pillar | What runs | What we verify |
+|--------|-----------|----------------|
+| **Chaos** | Tool 503 to agent | Agent returns summary or graceful fallback. |
+| **Contract** | Invariants × chaos matrix (no-chaos, tool-down) | Must complete (critical). |
+
 ---
 
-## Scenario 1: Customer Service Chatbot
+## Scenario 4: Booking Agent with Calendar and Payment Tools
+
+### The Agent
+
+A booking agent that calls a calendar API and a payment API to reserve a slot and confirm. We test under chaos (payment tool fails in one scenario) and replay a production incident.
+
+### Tools (Calendar + Payment)
+
+```python
+# booking_tools.py — port 5011
+from fastapi import FastAPI
+from pydantic import BaseModel
+
+app = FastAPI(title="Booking Tools")
+
+@app.post("/calendar/reserve")
+def reserve_slot(slot: str):
+    return {"slot": slot, "confirmed": True, "id": "CAL-001"}
+
+@app.post("/payment/confirm")
+def confirm_payment(amount: float, ref: str):
+    return {"ref": ref, "status": "paid", "amount": amount}
+```
+
+### Agent Code
+
+```python
+# booking_agent.py — port 8793
+import httpx
+from fastapi import FastAPI
+from pydantic import BaseModel
+
+app = FastAPI(title="Booking Agent")
+BASE = "http://localhost:5011"
+
+class InvokeRequest(BaseModel):
+    input: str | None = None
+    prompt: str | None = None
+
+class InvokeResponse(BaseModel):
+    result: str
+
+@app.post("/reset")
+def reset():
+    return {"ok": True}
+
+@app.post("/invoke", response_model=InvokeResponse)
+def invoke(req: InvokeRequest):
+    text = (req.input or req.prompt or "").strip()
+    if not text:
+        return InvokeResponse(result="Please provide booking details.")
+    try:
+        r = httpx.post(f"{BASE}/calendar/reserve", json={"slot": "10:00"}, timeout=5.0)
+        cal = r.json() if r.status_code == 200 else {}
+        p = httpx.post(f"{BASE}/payment/confirm", json={"amount": 0, "ref": "BK-1"}, timeout=5.0)
+        pay = p.json() if p.status_code == 200 else {}
+        if cal.get("confirmed") and pay.get("status") == "paid":
+            return InvokeResponse(result=f"Booked. Ref: {pay.get('ref', 'N/A')}.")
+        return InvokeResponse(result="Booking could not be completed.")
+    except Exception as e:
+        return InvokeResponse(result=f"Booking unavailable ({type(e).__name__}).")
+```
+
+### flakestorm Configuration
+
+```yaml
+version: "2.0"
+agent:
+  endpoint: "http://localhost:8793/invoke"
+  type: http
+  method: POST
+  request_template: '{"input": "{prompt}"}'
+  response_path: "result"
+  timeout: 10000
+  reset_endpoint: "http://localhost:8793/reset"
+golden_prompts:
+  - "Book a slot at 10am and confirm payment."
+invariants:
+  - type: output_not_empty
+chaos:
+  tool_faults:
+    - tool: "*"
+      mode: error
+      error_code: 503
+      probability: 0.25
+contract:
+  name: "Booking Contract"
+  invariants:
+    - id: not-empty
+      type: output_not_empty
+      severity: critical
+      when: always
+  chaos_matrix:
+    - name: "no-chaos"
+      tool_faults: []
+      llm_faults: []
+    - name: "payment-down"
+      tool_faults:
+        - tool: "*"
+          mode: error
+          error_code: 503
+replays:
+  sessions:
+    - file: "replays/booking_incident_001.yaml"
+output:
+  format: html
+  path: "./reports"
+```
+
+### Replay Session
+
+```yaml
+# replays/booking_incident_001.yaml
+id: booking-incident-001
+name: "Booking failed when payment returned 503"
+source: manual
+input: "Book a slot at 10am and confirm payment."
+contract: "Booking Contract"
+```
+
+### Running the Test
+
+```bash
+uvicorn booking_tools:app --host 0.0.0.0 --port 5011
+uvicorn booking_agent:app --host 0.0.0.0 --port 8793
+flakestorm run -c flakestorm.yaml
+flakestorm contract run -c flakestorm.yaml
+flakestorm replay run replays/booking_incident_001.yaml -c flakestorm.yaml
+flakestorm ci -c flakestorm.yaml
+```
+
+### What We're Testing
+
+| Pillar | What runs | What we verify |
+|--------|-----------|----------------|
+| **Chaos** | Tool 503 | Agent returns clear message when payment/calendar fails. |
+| **Contract** | Invariants × chaos matrix | Output not empty (critical). |
+| **Replay** | booking_incident_001.yaml | Same input passes contract. |
+
+---
+
+## Scenario 5: Data Pipeline Agent with Replay
+
+### The Agent
+
+An agent that triggers a data pipeline via a tool and returns the run status. We verify behavior with a contract and replay a failed pipeline run.
+
+### Pipeline Tool
+
+```python
+# pipeline_tool.py — port 5012
+from fastapi import FastAPI
+from pydantic import BaseModel
+
+app = FastAPI(title="Pipeline Tool")
+
+@app.post("/pipeline/run")
+def run_pipeline(job_id: str):
+    return {"job_id": job_id, "status": "success", "rows_processed": 1000}
+```
+
+### Agent Code
+
+```python
+# pipeline_agent.py — port 8794
+import httpx
+from fastapi import FastAPI
+from pydantic import BaseModel
+
+app = FastAPI(title="Data Pipeline Agent")
+BASE = "http://localhost:5012"
+
+class InvokeRequest(BaseModel):
+    input: str | None = None
+    prompt: str | None = None
+
+class InvokeResponse(BaseModel):
+    result: str
+
+@app.post("/reset")
+def reset():
+    return {"ok": True}
+
+@app.post("/invoke", response_model=InvokeResponse)
+def invoke(req: InvokeRequest):
+    text = (req.input or req.prompt or "").strip()
+    if not text:
+        return InvokeResponse(result="Please specify a pipeline job.")
+    try:
+        r = httpx.post(f"{BASE}/pipeline/run", json={"job_id": "daily_etl"}, timeout=30.0)
+        data = r.json() if r.status_code == 200 else {}
+        status = data.get("status", "unknown")
+        return InvokeResponse(result=f"Pipeline run: {status}. Rows: {data.get('rows_processed', 0)}.")
+    except Exception as e:
+        return InvokeResponse(result=f"Pipeline run failed ({type(e).__name__}).")
+```
+
+### flakestorm Configuration
+
+```yaml
+version: "2.0"
+agent:
+  endpoint: "http://localhost:8794/invoke"
+  type: http
+  method: POST
+  request_template: '{"input": "{prompt}"}'
+  response_path: "result"
+  timeout: 35000
+  reset_endpoint: "http://localhost:8794/reset"
+golden_prompts:
+  - "Run the daily ETL pipeline."
+invariants:
+  - type: output_not_empty
+  - type: latency
+    max_ms: 60000
+contract:
+  name: "Pipeline Contract"
+  invariants:
+    - id: not-empty
+      type: output_not_empty
+      severity: critical
+      when: always
+  chaos_matrix:
+    - name: "no-chaos"
+      tool_faults: []
+      llm_faults: []
+replays:
+  sessions:
+    - file: "replays/pipeline_fail_001.yaml"
+output:
+  format: html
+  path: "./reports"
+```
+
+### Replay Session
+
+```yaml
+# replays/pipeline_fail_001.yaml
+id: pipeline-fail-001
+name: "Pipeline agent returned empty on timeout"
+source: manual
+input: "Run the daily ETL pipeline."
+contract: "Pipeline Contract"
+```
+
+### Running the Test
+
+```bash
+uvicorn pipeline_tool:app --host 0.0.0.0 --port 5012
+uvicorn pipeline_agent:app --host 0.0.0.0 --port 8794
+flakestorm run -c flakestorm.yaml
+flakestorm contract run -c flakestorm.yaml
+flakestorm replay run replays/pipeline_fail_001.yaml -c flakestorm.yaml
+```
+
+### What We're Testing
+
+| Pillar | What runs | What we verify |
+|--------|-----------|----------------|
+| **Contract** | Invariants × chaos matrix | Output not empty (critical). |
+| **Replay** | pipeline_fail_001.yaml | Regression: same input passes contract after fix. |
+
+---
+
+## Quick reference: commands and config
+
+- **Environment chaos:** [Environment Chaos](ENVIRONMENT_CHAOS.md). Use `match_url` for per-URL fault injection when your agent makes outbound HTTP calls.
+- **Behavioral contracts:** [Behavioral Contracts](BEHAVIORAL_CONTRACTS.md). Reset: `agent.reset_endpoint` or `agent.reset_function`.
+- **Replay regression:** [Replay Regression](REPLAY_REGRESSION.md).
+- **Full example:** [Research Agent example](../examples/v2_research_agent/README.md).
+
+---
+
+## Scenario 6: Customer Service Chatbot
 
 ### The Agent
 
@@ -267,7 +939,7 @@ flakestorm run --output html
 
 ---
 
-## Scenario 2: Code Generation Agent
+## Scenario 7: Code Generation Agent
 
 ### The Agent
 
@@ -373,7 +1045,7 @@ invariants:
 
 ---
 
-## Scenario 3: RAG-Based Q&A Agent
+## Scenario 8: RAG-Based Q&A Agent
 
 ### The Agent
 
@@ -453,7 +1125,7 @@ invariants:
 
 ---
 
-## Scenario 4: Multi-Tool Agent (LangChain)
+## Scenario 9: Multi-Tool Agent (LangChain)
 
 ### The Agent
 
@@ -557,7 +1229,7 @@ invariants:
 
 ---
 
-## Scenario 5: Guardrailed Agent (Safety Testing)
+## Scenario 10: Guardrailed Agent (Safety Testing)
 
 ### The Agent