V2 Research Assistant — Flakestorm v2 Example

A working HTTP agent and v2.0 config that demonstrates all three V2 pillars: Environment Chaos, Behavioral Contracts, and Replay-Based Regression.

Prerequisites

Python 3.10+
Ollama running with a model (e.g. ollama pull gemma3:1b then ollama run gemma3:1b). The agent calls Ollama to generate real LLM responses; Flakestorm uses the same Ollama for mutation generation.
Dependencies: pip install -r requirements.txt (fastapi, uvicorn, pydantic, httpx)

1. Start the agent

From the project root or this directory:

cd examples/v2_research_agent
uvicorn agent:app --host 0.0.0.0 --port 8790

Or: python agent.py (uses port 8790 by default).

Verify: curl -X POST http://localhost:8790/invoke -H "Content-Type: application/json" -d "{\"input\": \"Hello\"}"

2. Run Flakestorm v2 commands

From the project root (so flakestorm and config paths resolve):

# Mutation testing only (v1 style)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml

# With chaos (tool/LLM faults)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos

# Chaos only (no mutations, golden prompts under chaos)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-only

# Built-in chaos profile
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-profile api_outage

# Behavioral contract × chaos matrix
flakestorm contract run -c examples/v2_research_agent/flakestorm.yaml

# Contract score only (CI gate)
flakestorm contract score -c examples/v2_research_agent/flakestorm.yaml

# Replay regression (one session)
flakestorm replay run examples/v2_research_agent/replays/incident_001.yaml -c examples/v2_research_agent/flakestorm.yaml

# Export failures from a report as replay files
flakestorm replay export --from-report reports/report.json -o examples/v2_research_agent/replays/

# Full CI run (mutation + contract + chaos + replay, overall weighted score)
flakestorm ci -c examples/v2_research_agent/flakestorm.yaml --min-score 0.5

3. What this example demonstrates

Feature	Config / usage
Chaos	`chaos.tool_faults` (503 with probability), `chaos.llm_faults` (truncated); `--chaos`, `--chaos-profile`
Contract	`contract` with invariants (always-cite-source, completes, max-latency) and `chaos_matrix` (no-chaos, api-outage)
Replay	`replays.sessions` with `file: replays/incident_001.yaml`; contract resolved by name "Research Agent Contract"
Scoring	`scoring` weights (mutation 20%, chaos 35%, contract 35%, replay 10%); used in `flakestorm ci`
Reset	`agent.reset_endpoint: http://localhost:8790/reset` for contract matrix isolation

4. Config layout (v2.0)

version: "2.0"
agent + reset_endpoint
chaos (tool_faults, llm_faults)
contract (invariants, chaos_matrix)
replays.sessions (file reference)
scoring (weights)

The agent calls Ollama (same model as in flakestorm.yaml: gemma3:1b by default). Set OLLAMA_BASE_URL or OLLAMA_MODEL if your Ollama runs elsewhere or uses a different model. The agent is stateless except for a call counter; /reset clears it so contract cells stay isolated.

3.2 KiB Raw Blame History Unescape Escape