# V2 Research Assistant — Flakestorm v2 Example

A **working** HTTP agent and v2.0 config that demonstrates all three V2 pillars: **Environment Chaos**, **Behavioral Contracts**, and **Replay-Based Regression**.

## Prerequisites

- Python 3.10+
- Ollama running for mutation generation, e.g. `ollama run gemma3:1b` (or any other model)
- Optional: `pip install fastapi uvicorn` (for the agent server)

## 1. Start the agent

From the project root (or from this directory, skipping the `cd`):

```bash
cd examples/v2_research_agent
uvicorn agent:app --host 0.0.0.0 --port 8790
```

Alternatively, run `python agent.py` from this directory; it uses port 8790 by default.

Verify that the agent responds: `curl -X POST http://localhost:8790/invoke -H "Content-Type: application/json" -d "{\"input\": \"Hello\"}"`
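
For scripted or CI runs it can help to wait until the agent answers before starting Flakestorm. The sketch below is hypothetical: `wait_for_agent` is not part of Flakestorm or this example, it simply retries the verify request above, and its retry count, delay, and `--max-time` values are arbitrary assumptions.

```bash
# Hypothetical helper (not part of Flakestorm or this example): poll the
# agent's /invoke endpoint with the same payload as the verify command
# until it answers, so a script does not start Flakestorm too early.
wait_for_agent() {
  local url="${1:-http://localhost:8790/invoke}"  # endpoint from this README
  local attempts="${2:-30}"                       # assumed default
  local delay="${3:-1}"                           # seconds between tries (assumed)
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors (>= 400) as failures; -s: silent;
    # --max-time: give up on a hung request; -o /dev/null: discard the body.
    if curl -fs --max-time 5 -o /dev/null -X POST "$url" \
         -H "Content-Type: application/json" \
         -d '{"input": "Hello"}'; then
      echo "agent ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "agent not reachable at ${url} after ${attempts} attempt(s)" >&2
  return 1
}
```

For example, after starting the agent in the background (`python agent.py &`), run `wait_for_agent || exit 1` before the first `flakestorm` command.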

## 2. Run Flakestorm v2 commands

From the **project root** (so that `flakestorm` and the config paths resolve):

```bash
# Mutation testing only (v1 style)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml

# With chaos (tool/LLM faults)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos

# Chaos only (no mutations; golden prompts run under chaos)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-only

# Built-in chaos profile
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-profile api_outage

# Behavioral contract × chaos matrix
flakestorm contract run -c examples/v2_research_agent/flakestorm.yaml

# Contract score only (CI gate)
flakestorm contract score -c examples/v2_research_agent/flakestorm.yaml

# Replay regression (one session)
flakestorm replay run examples/v2_research_agent/replays/incident_001.yaml -c examples/v2_research_agent/flakestorm.yaml

# Export failures from a report as replay files
flakestorm replay export --from-report reports/report.json -o examples/v2_research_agent/replays/

# Full CI run (mutation + contract + chaos + replay, overall weighted score)
flakestorm ci -c examples/v2_research_agent/flakestorm.yaml --min-score 0.5
```

## 3. What this example demonstrates

| Feature | Config / usage |
|---------|----------------|
| **Chaos** | `chaos.tool_faults` (503 errors injected with a set probability), `chaos.llm_faults` (truncated responses); CLI flags `--chaos`, `--chaos-profile` |
| **Contract** | `contract` with invariants (always-cite-source, completes, max-latency) and a `chaos_matrix` (no-chaos, api-outage) |
| **Replay** | `replays.sessions` with `file: replays/incident_001.yaml`; the contract is resolved by name ("Research Agent Contract") |
| **Scoring** | `scoring` weights (mutation 20%, chaos 35%, contract 35%, replay 10%), used by `flakestorm ci` for the overall score |
| **Reset** | `agent.reset_endpoint: http://localhost:8790/reset`, which isolates each cell of the contract matrix |
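
The scoring weights sum to 100%. Assuming the overall score is a plain weighted sum of per-pillar scores between 0 and 1 (Flakestorm's actual aggregation may differ, for example when a pillar is skipped), the arithmetic behind `flakestorm ci --min-score 0.5` can be sketched as follows; `weighted_score`, `meets_min_score`, and the sample scores are hypothetical.

```bash
# Sketch only: ASSUMES the overall score is a plain weighted sum of the four
# pillar scores (each 0.0-1.0) using this example's scoring weights.
# weighted_score and meets_min_score are hypothetical, not Flakestorm commands.
weighted_score() {
  # Args: mutation chaos contract replay
  awk -v m="$1" -v c="$2" -v k="$3" -v r="$4" \
    'BEGIN { printf "%.4f\n", 0.20 * m + 0.35 * c + 0.35 * k + 0.10 * r }'
}

# Compare a score with a threshold such as --min-score 0.5
# (awk exits 0 when score >= threshold, 1 otherwise).
meets_min_score() {
  awk -v s="$1" -v t="$2" 'BEGIN { exit !(s >= t) }'
}

# Hypothetical pillar scores, for illustration only.
score=$(weighted_score 0.80 0.40 0.60 1.00)   # 0.16 + 0.14 + 0.21 + 0.10
if meets_min_score "$score" 0.5; then
  echo "overall ${score}: PASS"
else
  echo "overall ${score}: FAIL"
fi
```

With these sample inputs the total is 0.61, which clears a 0.5 threshold. Under this assumption, a drop in the chaos or contract score moves the total 3.5 times as much as the same drop in the replay score.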

## 4. Config layout (v2.0)

- `version: "2.0"`
- `agent` (including `reset_endpoint`)
- `chaos` (`tool_faults`, `llm_faults`)
- `contract` (`invariants`, `chaos_matrix`)
- `replays.sessions` (file references)
- `scoring` (weights)

The agent is stateless except for a call counter; `/reset` clears that counter so each contract matrix cell starts from the same state.
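
The config layout above can be checked mechanically before a run. This hypothetical helper (not a Flakestorm command) only greps for the top-level keys listed in the layout and for `version: "2.0"`; it assumes simple block-style YAML and does not validate nested fields, so Flakestorm's own config loading remains the real check.

```bash
# Hypothetical pre-flight check (not part of Flakestorm): confirm that a
# config declares every top-level section from the v2.0 layout.
# ASSUMPTION: block-style YAML whose top-level keys start at column 0;
# nested fields (tool_faults, invariants, weights, ...) are not validated.
check_v2_sections() {
  local config="$1" status=0 key
  for key in version agent chaos contract replays scoring; do
    if ! grep -q "^${key}:" "$config"; then
      echo "missing top-level section: ${key}" >&2
      status=1
    fi
  done
  # version must be 2.0, written bare or in double quotes.
  if ! grep -Eq '^version:[[:space:]]*"?2\.0"?[[:space:]]*$' "$config"; then
    echo "expected version: \"2.0\" in ${config}" >&2
    status=1
  fi
  return "$status"
}
```

Usage: `check_v2_sections examples/v2_research_agent/flakestorm.yaml && echo "layout OK"`.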