Mirror of https://github.com/flakestorm/flakestorm.git, synced 2026-04-25 00:36:54 +02:00.

Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files.

Parent `59cca61f3c`, commit `9c3450a75d`.

63 changed files with 4147 additions and 134 deletions.
**`examples/v2_research_agent/README.md`** (new file, 76 lines, `@@ -0,0 +1,76 @@`)
# V2 Research Assistant — Flakestorm v2 Example

A **working** HTTP agent and v2.0 config that demonstrates all three V2 pillars: **Environment Chaos**, **Behavioral Contracts**, and **Replay-Based Regression**.

## Prerequisites

- Python 3.10+
- Ollama running (used only for mutation generation), for example `ollama run gemma3:1b`. Another model works if you change `model.name` in `flakestorm.yaml` to match.
- Agent server dependencies: `pip install -r examples/v2_research_agent/requirements.txt` (FastAPI, Uvicorn, Pydantic)

## 1. Start the agent

From the project root (skip the `cd` if you are already in this directory):

```bash
cd examples/v2_research_agent
uvicorn agent:app --host 0.0.0.0 --port 8790
```

Or run `python agent.py`, which listens on port 8790 by default (override it with the `PORT` environment variable).

Verify: `curl -X POST http://localhost:8790/invoke -H "Content-Type: application/json" -d "{\"input\": \"Hello\"}"`. The reply is JSON with `result` and `source` fields.
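For scripted checks, the same request can be sent from Python. This is a minimal sketch using only the standard library (`urllib.request`, `json`); the helper names `build_invoke_request`, `invoke` and `reset` are hypothetical and are not part of Flakestorm or `agent.py`. It assumes the agent from step 1 is listening on `http://localhost:8790`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8790"  # default port of agent.py (assumed unchanged)


def build_invoke_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /invoke request as the curl check above."""
    body = json.dumps({"input": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(prompt: str, base_url: str = BASE_URL, timeout: float = 15.0) -> dict:
    """Send the request and decode the JSON reply (result, source, latency_ms)."""
    with urllib.request.urlopen(build_invoke_request(prompt, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def reset(base_url: str = BASE_URL, timeout: float = 15.0) -> dict:
    """POST /reset, which clears the agent's call counter."""
    req = urllib.request.Request(f"{base_url}/reset", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the agent running, `reset()` followed by `invoke("Hello")["result"]` should return the agent's cited answer. Keeping the request builder separate makes it easy to inspect the payload without a server.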

## 2. Run Flakestorm v2 commands

From the **project root** (so `flakestorm` and config paths resolve):

```bash
# Mutation testing only (v1 style)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml

# With chaos (tool/LLM faults)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos

# Chaos only (no mutations, golden prompts under chaos)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-only

# Built-in chaos profile
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-profile api_outage

# Behavioral contract × chaos matrix
flakestorm contract run -c examples/v2_research_agent/flakestorm.yaml

# Contract score only (CI gate)
flakestorm contract score -c examples/v2_research_agent/flakestorm.yaml

# Replay regression (one session)
flakestorm replay run examples/v2_research_agent/replays/incident_001.yaml -c examples/v2_research_agent/flakestorm.yaml

# Export failures from a report as replay files
flakestorm replay export --from-report reports/report.json -o examples/v2_research_agent/replays/

# Full CI run (mutation + contract + chaos + replay, overall weighted score)
flakestorm ci -c examples/v2_research_agent/flakestorm.yaml --min-score 0.5
```

## 3. What this example demonstrates

| Feature | Config / usage |
|---------|----------------|
| **Chaos** | `chaos.tool_faults` (HTTP 503, probability 0.3), `chaos.llm_faults` (truncated responses, probability 0.2); `--chaos`, `--chaos-profile` |
| **Contract** | `contract` with invariants (`always-cite-source`, `completes`, `max-latency`) and `chaos_matrix` (`no-chaos`, `api-outage`) |
| **Replay** | `replays.sessions` with `file: replays/incident_001.yaml`; the contract is resolved by name ("Research Agent Contract") |
| **Scoring** | `scoring` weights (mutation 20%, chaos 35%, contract 35%, replay 10%), used by `flakestorm ci` |
| **Reset** | `agent.reset_endpoint: http://localhost:8790/reset`, so each contract-matrix cell starts from a clean state |

## 4. Config layout (v2.0)

- `version: "2.0"`
- `agent` + `reset_endpoint`
- `chaos` (`tool_faults`, `llm_faults`)
- `contract` (`invariants`, `chaos_matrix`)
- `replays.sessions` (file reference)
- `scoring` (weights)

The agent is stateless except for a call counter; `/reset` clears it so contract cells stay isolated.
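To see why the reset step matters, here is a minimal in-process sketch. `FakeAgent` stands in for `agent.py` (a call counter plus a reset) and `run_matrix` is a hypothetical loop over contract-matrix cells; neither is Flakestorm's actual implementation, and chaos injection is left out so that only the state handling is visible.

```python
from dataclasses import dataclass


@dataclass
class FakeAgent:
    """In-process stand-in for agent.py: a call counter that /reset clears."""
    calls: int = 0

    def invoke(self, prompt: str) -> str:
        self.calls += 1
        return f"According to [source: demo_knowledge_base], call {self.calls}: {prompt}"

    def reset(self) -> None:
        self.calls = 0


def run_matrix(agent: FakeAgent, prompts: list[str], cells: list[str],
               reset_between_cells: bool = True) -> dict[str, list[int]]:
    """Run every prompt in every matrix cell and record the call counter after each prompt."""
    observed: dict[str, list[int]] = {}
    for cell in cells:
        if reset_between_cells:
            agent.reset()  # the role POST /reset plays for the real agent
        counts = []
        for prompt in prompts:
            agent.invoke(prompt)
            counts.append(agent.calls)
        observed[cell] = counts
    return observed
```

`run_matrix(FakeAgent(), ["a", "b"], ["no-chaos", "api-outage"])` gives `{"no-chaos": [1, 2], "api-outage": [1, 2]}`. With `reset_between_cells=False` the second cell sees `[3, 4]`, meaning state leaked in from the previous cell.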
**`examples/v2_research_agent/agent.py`** (new file, 72 lines, `@@ -0,0 +1,72 @@`)

```python
"""
V2 Research Assistant Agent — Working example for Flakestorm v2.

A minimal HTTP agent that simulates a research assistant: it responds to queries
and always cites a source (so behavioral contracts can be verified). Supports
/reset for contract matrix isolation. Used to demonstrate:
- flakestorm run (mutation testing)
- flakestorm run --chaos / --chaos-profile (environment chaos)
- flakestorm contract run (behavioral contract × chaos matrix)
- flakestorm replay run (replay regression)
- flakestorm ci (unified run with overall score)
"""

import os

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="V2 Research Assistant Agent")

# Source cited in every answer (checked by the always-cite-source invariant)
SOURCE = "demo_knowledge_base"

# In-memory state (cleared by /reset for contract isolation)
_state = {"calls": 0}


class InvokeRequest(BaseModel):
    """Request body: the query may arrive as `input`, `prompt`, or `query`."""

    input: str | None = None
    prompt: str | None = None
    query: str | None = None


class InvokeResponse(BaseModel):
    """Response with result and optional metadata."""

    result: str
    source: str = SOURCE
    latency_ms: float | None = None


@app.post("/reset")
def reset():
    """Reset agent state. Called by Flakestorm before each contract matrix cell."""
    _state["calls"] = 0
    return {"ok": True}


@app.post("/invoke", response_model=InvokeResponse)
def invoke(req: InvokeRequest):
    """Handle a single user query. Always cites a source (contract invariant)."""
    _state["calls"] += 1
    # Accept whichever field the caller used; flakestorm.yaml sends {"input": ...}
    text = req.input or req.prompt or req.query or ""
    if not text.strip():
        return InvokeResponse(
            result="I didn't receive a question. Please ask something.",
            source="none",
        )
    # Simulate a research response that cites a source (contract: always-cite-source)
    response = (
        f"According to [source: {SOURCE}], "
        f"here is what I found for your query: \"{text[:100]}\". "
        "Data may be incomplete when tools are degraded."
    )
    return InvokeResponse(result=response, source=SOURCE)


@app.get("/health")
def health():
    return {"status": "ok"}


if __name__ == "__main__":
    import uvicorn

    port = int(os.environ.get("PORT", "8790"))
    uvicorn.run(app, host="0.0.0.0", port=port)
```
**`examples/v2_research_agent/flakestorm.yaml`** (new file, 129 lines, `@@ -0,0 +1,129 @@`)
```yaml
# Flakestorm v2.0 — Research Assistant Example
# Demonstrates: mutation testing, chaos, behavioral contract, replay, ci

version: "2.0"

# -----------------------------------------------------------------------------
# Agent (HTTP). Start with: python agent.py (or uvicorn agent:app --port 8790)
# -----------------------------------------------------------------------------
agent:
  endpoint: "http://localhost:8790/invoke"
  type: "http"
  method: "POST"
  request_template: '{"input": "{prompt}"}'
  response_path: "result"
  timeout: 15000
  reset_endpoint: "http://localhost:8790/reset"
```
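The `request_template` and `response_path` settings describe how a prompt becomes an HTTP body and how the answer is read back. The sketch below is an illustration under assumptions, not Flakestorm's code: it JSON-escapes the prompt before substituting `{prompt}` (mutations such as `noise` or `prompt_injection` can produce quotes that would otherwise break the JSON; whether Flakestorm escapes the value is not stated here), and it treats `response_path` as a dotted path, although this config only uses the top-level key `result`. The function names are hypothetical.

```python
import json
from typing import Any


def render_request(template: str, prompt: str) -> dict[str, Any]:
    """Fill {prompt} in a JSON request template and parse the result."""
    # json.dumps escapes quotes, backslashes and control characters; [1:-1] drops the
    # surrounding quotes because the template already wraps {prompt} in quotes.
    escaped = json.dumps(prompt)[1:-1]
    return json.loads(template.replace("{prompt}", escaped))


def extract_response(body: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "result" or "data.answer" (assumed syntax) into a JSON body."""
    value: Any = body
    for key in path.split("."):
        value = value[key]
    return value
```

For example, `render_request('{"input": "{prompt}"}', 'Say "hi"')` returns `{"input": 'Say "hi"'}`, and `extract_response({"result": "ok"}, "result")` returns `"ok"`.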
```yaml
# -----------------------------------------------------------------------------
# Model (for mutation generation only)
# -----------------------------------------------------------------------------
model:
  provider: "ollama"
  name: "gemma3:1b"
  base_url: "http://localhost:11434"

# -----------------------------------------------------------------------------
# Mutations
# -----------------------------------------------------------------------------
mutations:
  count: 5
  types:
    - paraphrase
    - noise
    - tone_shift
    - prompt_injection

# -----------------------------------------------------------------------------
# Golden prompts
# -----------------------------------------------------------------------------
golden_prompts:
  - "What is the capital of France?"
  - "Summarize the benefits of renewable energy."

# -----------------------------------------------------------------------------
# Invariants (run invariants)
# -----------------------------------------------------------------------------
invariants:
  - type: latency
    max_ms: 30000
  - type: contains
    value: "source"
  - type: output_not_empty
```
```yaml
# -----------------------------------------------------------------------------
# V2: Environment Chaos (tool/LLM faults)
# For an HTTP agent, tool_faults with tool "*" apply to the single request sent to the endpoint.
# -----------------------------------------------------------------------------
chaos:
  tool_faults:
    - tool: "*"
      mode: error
      error_code: 503
      message: "Service Unavailable"
      probability: 0.3
  llm_faults:
    - mode: truncated_response
      max_tokens: 5
      probability: 0.2
```
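The `probability` fields suggest that each fault fires independently on each call. Below is a minimal sketch of that reading, with hypothetical names (`ToolFault`, `LlmTruncation`, `maybe_fail_tool`, `maybe_truncate`) and a seeded `random.Random` for reproducibility. It approximates `max_tokens` by whitespace-separated words, which is an assumption since real tokenizers count differently; it is not Flakestorm's injector.

```python
import random
from dataclasses import dataclass


@dataclass
class ToolFault:
    """Mirrors one chaos.tool_faults entry (mode: error)."""
    tool: str = "*"
    error_code: int = 503
    message: str = "Service Unavailable"
    probability: float = 0.3


@dataclass
class LlmTruncation:
    """Mirrors one chaos.llm_faults entry (mode: truncated_response)."""
    max_tokens: int = 5
    probability: float = 0.2


class InjectedHTTPError(Exception):
    """Raised in place of the real tool/HTTP call when a fault fires."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"{code} {message}")
        self.code = code


def maybe_fail_tool(fault: ToolFault, tool_name: str, rng: random.Random) -> None:
    """Raise the configured error for matching tools with the configured probability."""
    # "*" matches every tool; for an HTTP agent that is the single request to the endpoint.
    if fault.tool in ("*", tool_name) and rng.random() < fault.probability:
        raise InjectedHTTPError(fault.error_code, fault.message)


def maybe_truncate(text: str, fault: LlmTruncation, rng: random.Random) -> str:
    """With the configured probability, keep only the first max_tokens words."""
    if rng.random() < fault.probability:
        return " ".join(text.split()[: fault.max_tokens])
    return text
```

Over many calls, roughly 30% of requests to any tool raise the injected 503, and reusing the same seed reproduces the same fault sequence.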
```yaml
# -----------------------------------------------------------------------------
# V2: Behavioral Contract + Chaos Matrix
# -----------------------------------------------------------------------------
contract:
  name: "Research Agent Contract"
  description: "Must cite source and complete under chaos"
  invariants:
    - id: always-cite-source
      type: regex
      pattern: "(?i)(source|according to)"
      severity: critical
      when: always
      description: "Must cite a source"
    - id: completes
      type: completes
      severity: high
      when: always
      description: "Must return a response"
    - id: max-latency
      type: latency
      max_ms: 60000
      severity: medium
      when: always
  chaos_matrix:
    - name: "no-chaos"
      tool_faults: []
      llm_faults: []
    - name: "api-outage"
      tool_faults:
        - tool: "*"
          mode: error
          error_code: 503
          message: "Service Unavailable"
```
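Each contract invariant can be read as a predicate over one response. This sketch evaluates the three invariants above and reports failures with their severity. The `completes` semantics (a non-empty response) and the inclusive latency bound are assumptions, `when: always` is not modelled, and `Invariant`, `holds` and `failed_invariants` are hypothetical names rather than Flakestorm's API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invariant:
    """One entry of contract.invariants (the `when` field is not modelled)."""
    id: str
    type: str  # "regex", "completes" or "latency"
    severity: str
    pattern: Optional[str] = None
    max_ms: Optional[float] = None


RESEARCH_AGENT_CONTRACT = [
    Invariant("always-cite-source", "regex", "critical", pattern="(?i)(source|according to)"),
    Invariant("completes", "completes", "high"),
    Invariant("max-latency", "latency", "medium", max_ms=60000),
]


def holds(inv: Invariant, output: Optional[str], latency_ms: float) -> bool:
    """Evaluate one invariant; output is None when the agent returned nothing."""
    if inv.type == "regex":
        return output is not None and re.search(inv.pattern, output) is not None
    if inv.type == "completes":
        return output is not None and output.strip() != ""
    if inv.type == "latency":
        return latency_ms <= inv.max_ms
    raise ValueError(f"unknown invariant type: {inv.type}")


def failed_invariants(output: Optional[str], latency_ms: float,
                      contract: list[Invariant] = RESEARCH_AGENT_CONTRACT) -> list[tuple[str, str]]:
    """Return (id, severity) for every invariant the response violates."""
    return [(inv.id, inv.severity) for inv in contract if not holds(inv, output, latency_ms)]
```

A reply such as `"Paris."` fails only `always-cite-source` (critical), the kind of failure recorded in `replays/incident_001.yaml`.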
```yaml
# -----------------------------------------------------------------------------
# V2: Replay regression (sessions can reference file or be inline)
# -----------------------------------------------------------------------------
replays:
  sessions:
    - file: "replays/incident_001.yaml"
```
```yaml
# -----------------------------------------------------------------------------
# V2: Scoring weights (overall = mutation*0.2 + chaos*0.35 + contract*0.35 + replay*0.1)
# -----------------------------------------------------------------------------
scoring:
  mutation: 0.20
  chaos: 0.35
  contract: 0.35
  replay: 0.10
```
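The scoring comment gives the overall score as a weighted sum of the four pillar scores. Here is a minimal sketch of that formula with a `--min-score` style gate. Requiring the weights to sum to 1.0 is this sketch's own choice, and whether `flakestorm ci` compares with `>=` or `>`, or how it treats a pillar that was not run, are assumptions not stated in this example; `overall_score` and `ci_passes` are hypothetical names.

```python
WEIGHTS = {"mutation": 0.20, "chaos": 0.35, "contract": 0.35, "replay": 0.10}


def overall_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-pillar scores in [0, 1]."""
    # This sketch insists on normalized weights instead of rescaling them.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("scoring weights must sum to 1.0")
    return sum(weight * scores[pillar] for pillar, weight in weights.items())


def ci_passes(scores: dict[str, float], min_score: float = 0.5) -> bool:
    """Gate in the spirit of `flakestorm ci --min-score 0.5` (>= is an assumption)."""
    return overall_score(scores) >= min_score
```

With mutation 0.8, chaos 0.6, contract 1.0 and replay 0.0, the overall score is 0.16 + 0.21 + 0.35 + 0.0 = 0.72, which clears `--min-score 0.5`.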
```yaml
# -----------------------------------------------------------------------------
# Output
# -----------------------------------------------------------------------------
output:
  format: "html"
  path: "./reports"

advanced:
  concurrency: 5
  retries: 2
```
**`examples/v2_research_agent/replays/incident_001.yaml`** (new file, 9 lines, `@@ -0,0 +1,9 @@`)
```yaml
# Replay session: production incident to regress
# Run from examples/v2_research_agent/: flakestorm replay run replays/incident_001.yaml -c flakestorm.yaml
id: incident-001
name: "Research agent incident - missing source"
source: manual
input: "What is the capital of France?"
tool_responses: []
expected_failure: "Agent returned response without citing source"
contract: "Research Agent Contract"
```
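A replay session pairs a recorded input with the contract it must satisfy. The sketch loads the fields above into a hypothetical `ReplaySession` dataclass (the dict stands in for parsed YAML, so no YAML library is needed) and re-runs the input against a contract looked up by name. `tool_responses` is carried but unused here, and treating "contract now satisfied" as "regression fixed" is an assumption about Flakestorm's semantics.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Parsed form of replays/incident_001.yaml (what a YAML loader would return).
INCIDENT_001 = {
    "id": "incident-001",
    "name": "Research agent incident - missing source",
    "source": "manual",
    "input": "What is the capital of France?",
    "tool_responses": [],
    "expected_failure": "Agent returned response without citing source",
    "contract": "Research Agent Contract",
}


@dataclass
class ReplaySession:
    id: str
    name: str
    input: str
    contract: str
    source: str = "manual"
    tool_responses: list[Any] = field(default_factory=list)
    expected_failure: str = ""


def replay(session: ReplaySession,
           contracts: dict[str, Callable[[str], bool]],
           agent: Callable[[str], str]) -> bool:
    """Re-send the recorded input; True if the output now satisfies the named contract."""
    check = contracts[session.contract]  # resolve the contract by its name
    return check(agent(session.input))
```

Usage: build the session with `ReplaySession(**INCIDENT_001)` and pass a mapping such as `{"Research Agent Contract": lambda out: "source" in out.lower()}` plus any callable that queries the agent.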
**`examples/v2_research_agent/requirements.txt`** (new file, 4 lines, `@@ -0,0 +1,4 @@`)
```text
# V2 Research Agent — run the example HTTP agent
fastapi>=0.100.0
uvicorn>=0.22.0
pydantic>=2.0
```