mirror of
https://github.com/flakestorm/flakestorm.git
synced 2026-04-25 00:36:54 +02:00
# V2 Research Assistant — Flakestorm v2 Example

A **working** HTTP agent and v2.0 config that together demonstrate all three V2 pillars: **Environment Chaos**, **Behavioral Contracts**, and **Replay-Based Regression**.

## Prerequisites

- Python 3.10+
- **Ollama** running with a model (e.g. `ollama pull gemma3:1b` then `ollama run gemma3:1b`). The agent calls Ollama to generate real LLM responses; Flakestorm uses the same Ollama instance for mutation generation.
- Dependencies: `pip install -r requirements.txt` (fastapi, uvicorn, pydantic, httpx)

## 1. Start the agent

From the project root (skip the `cd` if you are already in this directory):

```bash
cd examples/v2_research_agent
uvicorn agent:app --host 0.0.0.0 --port 8790
```

Or: `python agent.py` (uses port 8790 by default).

Verify: `curl -X POST http://localhost:8790/invoke -H "Content-Type: application/json" -d "{\"input\": \"Hello\"}"`

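
The same smoke test can be sketched in Python with only the standard library. This assumes the agent is listening on `localhost:8790` and accepts a JSON body with an `input` field, as in the `curl` command above. The response format is not specified in this README, so the sketch returns the raw body. `build_invoke_request` and `invoke` are illustrative names, not part of Flakestorm or the agent.

```python
import json
import urllib.request

# Port taken from the uvicorn command above; adjust if you changed it.
AGENT_URL = "http://localhost:8790/invoke"


def build_invoke_request(text: str, url: str = AGENT_URL) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(text: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body (format not assumed)."""
    with urllib.request.urlopen(build_invoke_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the agent running, `print(invoke("Hello"))` should show whatever the `/invoke` endpoint returns. A connection error usually means the agent is not up or is bound to a different port.
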
## 2. Run Flakestorm v2 commands

From the **project root** (so `flakestorm` and config paths resolve):

```bash
# Mutation testing only (v1 style)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml

# With chaos (tool/LLM faults)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos

# Chaos only (no mutations, golden prompts under chaos)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-only

# Built-in chaos profile
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-profile api_outage

# Behavioral contract × chaos matrix
flakestorm contract run -c examples/v2_research_agent/flakestorm.yaml

# Contract score only (CI gate)
flakestorm contract score -c examples/v2_research_agent/flakestorm.yaml

# Replay regression (one session)
flakestorm replay run examples/v2_research_agent/replays/incident_001.yaml -c examples/v2_research_agent/flakestorm.yaml

# Export failures from a report as replay files
flakestorm replay export --from-report reports/report.json -o examples/v2_research_agent/replays/

# Full CI run (mutation + contract + chaos + replay, overall weighted score)
flakestorm ci -c examples/v2_research_agent/flakestorm.yaml --min-score 0.5
```

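
If the CI gate is driven from a Python script rather than a shell step, the last command above could be wrapped roughly as follows. This is a sketch under assumptions: `ci_command` and `run_ci_gate` are hypothetical helper names, and treating a non-zero exit code as "gate failed" is an assumption about the CLI rather than something this README states. Only flags that appear in the commands above are used.

```python
import subprocess

CONFIG = "examples/v2_research_agent/flakestorm.yaml"


def ci_command(config: str = CONFIG, min_score: float = 0.5) -> list[str]:
    """Argument list for the `flakestorm ci` invocation shown above."""
    return ["flakestorm", "ci", "-c", config, "--min-score", str(min_score)]


def run_ci_gate(config: str = CONFIG, min_score: float = 0.5) -> int:
    """Run the CI command from the project root and return its exit code.

    Assumption: a non-zero exit code means the gate did not pass.
    """
    completed = subprocess.run(ci_command(config, min_score), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting issues with the config path.
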
## 3. What this example demonstrates

| Feature | Config / usage |
|--------|-----------------|
| **Chaos** | `chaos.tool_faults` (503 with probability), `chaos.llm_faults` (truncated); `--chaos`, `--chaos-profile` |
| **Contract** | `contract` with invariants (always-cite-source, completes, max-latency) and `chaos_matrix` (no-chaos, api-outage) |
| **Replay** | `replays.sessions` with `file: replays/incident_001.yaml`; contract resolved by name "Research Agent Contract" |
| **Scoring** | `scoring` weights (mutation 20%, chaos 35%, contract 35%, replay 10%); used in `flakestorm ci` |
| **Reset** | `agent.reset_endpoint: http://localhost:8790/reset` for contract matrix isolation |

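
To make the **Scoring** row concrete, here is one plausible reading of how the weights combine into the overall score checked by `--min-score`. It assumes each pillar produces a score in the range 0 to 1 and that the overall score is a plain weighted average; the exact formula Flakestorm uses is not given in this README, and `overall_score` is a hypothetical name.

```python
# Weights from the table above (mutation 20%, chaos 35%, contract 35%, replay 10%).
WEIGHTS = {"mutation": 0.20, "chaos": 0.35, "contract": 0.35, "replay": 0.10}


def overall_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-pillar scores, each expected in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Dividing by the total keeps the result in [0, 1] even if weights do not sum to 1.
    return sum(weights[name] * scores[name] for name in weights) / total_weight
```

For made-up scores of 0.9 (mutation), 0.6 (chaos), 0.7 (contract), and 1.0 (replay), this gives 0.18 + 0.21 + 0.245 + 0.10, or about 0.735, which would clear a `--min-score` of 0.5 under these assumptions.
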
## 4. Config layout (v2.0)

- `version: "2.0"`
- `agent` + `reset_endpoint`
- `chaos` (tool_faults, llm_faults)
- `contract` (invariants, chaos_matrix)
- `replays.sessions` (file reference)
- `scoring` (weights)

The agent calls **Ollama** (same model as in `flakestorm.yaml`: `gemma3:1b` by default). Set `OLLAMA_BASE_URL` or `OLLAMA_MODEL` if your Ollama runs elsewhere or uses a different model. The agent is stateless except for a call counter; `/reset` clears it so contract cells stay isolated.

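
The environment overrides and the reset behaviour described above can be illustrated with a small standalone sketch. It is not the code in `agent.py`: `ollama_settings` and `CallCounter` are hypothetical names, and the fallback base URL is assumed to be Ollama's standard local address (`http://localhost:11434`).

```python
import os

# Assumption: fall back to Ollama's standard local endpoint and the model named above.
DEFAULT_BASE_URL = "http://localhost:11434"
DEFAULT_MODEL = "gemma3:1b"


def ollama_settings(env=None) -> tuple[str, str]:
    """Resolve (base_url, model), preferring OLLAMA_BASE_URL / OLLAMA_MODEL if set."""
    env = os.environ if env is None else env
    return (
        env.get("OLLAMA_BASE_URL", DEFAULT_BASE_URL),
        env.get("OLLAMA_MODEL", DEFAULT_MODEL),
    )


class CallCounter:
    """The only state the agent is described as keeping between requests."""

    def __init__(self) -> None:
        self.calls = 0

    def record(self) -> int:
        """Count one invocation and return the running total."""
        self.calls += 1
        return self.calls

    def reset(self) -> None:
        """What a /reset handler would do: return to a clean state."""
        self.calls = 0
```

Resetting the counter between contract matrix cells means each cell starts from the same state, so a result in one chaos scenario cannot depend on calls made in an earlier one.
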