flakestorm/examples/v2_research_agent
2026-03-06 23:33:21 +08:00
..
replays Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files. 2026-03-06 23:33:21 +08:00
agent.py Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files. 2026-03-06 23:33:21 +08:00
flakestorm.yaml Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files. 2026-03-06 23:33:21 +08:00
README.md Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files. 2026-03-06 23:33:21 +08:00
requirements.txt Update version to 2.0.0 and enhance chaos engineering features in Flakestorm. Added support for environment chaos, behavioral contracts, and replay regression. Expanded documentation and improved scoring mechanisms. Updated .gitignore to include new documentation files. 2026-03-06 23:33:21 +08:00

V2 Research Assistant — Flakestorm v2 Example

A working HTTP agent and v2.0 config that demonstrates all three V2 pillars: Environment Chaos, Behavioral Contracts, and Replay-Based Regression.

Prerequisites

  • Python 3.10+
  • Ollama running (for mutation generation): ollama run gemma3:1b or any model
  • Optional: pip install fastapi uvicorn (agent server)

1. Start the agent

From the project root or this directory:

cd examples/v2_research_agent
uvicorn agent:app --host 0.0.0.0 --port 8790

Or: python agent.py (uses port 8790 by default).

Verify: curl -X POST http://localhost:8790/invoke -H "Content-Type: application/json" -d "{\"input\": \"Hello\"}"

2. Run Flakestorm v2 commands

From the project root (so flakestorm and config paths resolve):

# Mutation testing only (v1 style)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml

# With chaos (tool/LLM faults)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos

# Chaos only (no mutations, golden prompts under chaos)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-only

# Built-in chaos profile
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-profile api_outage

# Behavioral contract × chaos matrix
flakestorm contract run -c examples/v2_research_agent/flakestorm.yaml

# Contract score only (CI gate)
flakestorm contract score -c examples/v2_research_agent/flakestorm.yaml

# Replay regression (one session)
flakestorm replay run examples/v2_research_agent/replays/incident_001.yaml -c examples/v2_research_agent/flakestorm.yaml

# Export failures from a report as replay files
flakestorm replay export --from-report reports/report.json -o examples/v2_research_agent/replays/

# Full CI run (mutation + contract + chaos + replay, overall weighted score)
flakestorm ci -c examples/v2_research_agent/flakestorm.yaml --min-score 0.5

3. What this example demonstrates

Feature Config / usage
Chaos chaos.tool_faults (503 with probability), chaos.llm_faults (truncated); --chaos, --chaos-profile
Contract contract with invariants (always-cite-source, completes, max-latency) and chaos_matrix (no-chaos, api-outage)
Replay replays.sessions with file: replays/incident_001.yaml; contract resolved by name "Research Agent Contract"
Scoring scoring weights (mutation 20%, chaos 35%, contract 35%, replay 10%); used in flakestorm ci
Reset agent.reset_endpoint: http://localhost:8790/reset for contract matrix isolation

4. Config layout (v2.0)

  • version: "2.0"
  • agent + reset_endpoint
  • chaos (tool_faults, llm_faults)
  • contract (invariants, chaos_matrix)
  • replays.sessions (file reference)
  • scoring (weights)

The agent is stateless except for a call counter; /reset clears it so contract cells stay isolated.