mirror of
https://github.com/flakestorm/flakestorm.git
synced 2026-04-25 00:36:54 +02:00
3.2 KiB
3.2 KiB
V2 Research Assistant — Flakestorm v2 Example
A working HTTP agent and v2.0 config that demonstrates all three V2 pillars: Environment Chaos, Behavioral Contracts, and Replay-Based Regression.
Prerequisites
- Python 3.10+
- Ollama running with a model (e.g.
ollama pull gemma3:1bthenollama run gemma3:1b). The agent calls Ollama to generate real LLM responses; Flakestorm uses the same Ollama for mutation generation. - Dependencies:
pip install -r requirements.txt(fastapi, uvicorn, pydantic, httpx)
1. Start the agent
From the project root or this directory:
cd examples/v2_research_agent
uvicorn agent:app --host 0.0.0.0 --port 8790
Or: python agent.py (uses port 8790 by default).
Verify: curl -X POST http://localhost:8790/invoke -H "Content-Type: application/json" -d "{\"input\": \"Hello\"}"
2. Run Flakestorm v2 commands
From the project root (so flakestorm and config paths resolve):
# Mutation testing only (v1 style)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml
# With chaos (tool/LLM faults)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos
# Chaos only (no mutations, golden prompts under chaos)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-only
# Built-in chaos profile
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-profile api_outage
# Behavioral contract × chaos matrix
flakestorm contract run -c examples/v2_research_agent/flakestorm.yaml
# Contract score only (CI gate)
flakestorm contract score -c examples/v2_research_agent/flakestorm.yaml
# Replay regression (one session)
flakestorm replay run examples/v2_research_agent/replays/incident_001.yaml -c examples/v2_research_agent/flakestorm.yaml
# Export failures from a report as replay files
flakestorm replay export --from-report reports/report.json -o examples/v2_research_agent/replays/
# Full CI run (mutation + contract + chaos + replay, overall weighted score)
flakestorm ci -c examples/v2_research_agent/flakestorm.yaml --min-score 0.5
3. What this example demonstrates
| Feature | Config / usage |
|---|---|
| Chaos | chaos.tool_faults (503 with probability), chaos.llm_faults (truncated); --chaos, --chaos-profile |
| Contract | contract with invariants (always-cite-source, completes, max-latency) and chaos_matrix (no-chaos, api-outage) |
| Replay | replays.sessions with file: replays/incident_001.yaml; contract resolved by name "Research Agent Contract" |
| Scoring | scoring weights (mutation 20%, chaos 35%, contract 35%, replay 10%); used in flakestorm ci |
| Reset | agent.reset_endpoint: http://localhost:8790/reset for contract matrix isolation |
4. Config layout (v2.0)
version: "2.0"agent+reset_endpointchaos(tool_faults, llm_faults)contract(invariants, chaos_matrix)replays.sessions(file reference)scoring(weights)
The agent calls Ollama (same model as in flakestorm.yaml: gemma3:1b by default). Set OLLAMA_BASE_URL or OLLAMA_MODEL if your Ollama runs elsewhere or uses a different model. The agent is stateless except for a call counter; /reset clears it so contract cells stay isolated.