mirror of https://github.com/flakestorm/flakestorm.git synced 2026-04-25 00:36:54 +02:00

Francisco M Humarang Jr. f4d45d4053 Update documentation and configuration for Flakestorm V2, enhancing clarity on CI processes, report generation, and reproducibility features. Added details on the new `--output` option for saving reports, clarified the use of `--min-score`, and improved descriptions of the `seed` configuration for deterministic runs. Updated README and usage guides to reflect these changes and ensure comprehensive understanding of the CI pipeline and report outputs.		2026-03-12 20:05:51 +08:00


V2 Research Assistant — Flakestorm v2 Example

A working HTTP agent and v2.0 config that demonstrates all three V2 pillars: Environment Chaos, Behavioral Contracts, and Replay-Based Regression.

Prerequisites

Python 3.10+
Ollama running with a model (e.g. ollama pull gemma3:1b, then ollama run gemma3:1b). The agent calls Ollama to generate real LLM responses; Flakestorm uses the same Ollama instance for mutation generation.
Dependencies: pip install -r requirements.txt (fastapi, uvicorn, pydantic, httpx)

1. Start the agent

From the project root or this directory:

cd examples/v2_research_agent
uvicorn agent:app --host 0.0.0.0 --port 8790

Or: python agent.py (uses port 8790 by default).

Verify: curl -X POST http://localhost:8790/invoke -H "Content-Type: application/json" -d "{\"input\": \"Hello\"}"
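
The same check can be scripted. The sketch below uses only the Python standard library and mirrors the curl command (same URL, header, and JSON body). The helper names `build_invoke_request` and `check_agent` are made up for illustration, and because the response schema is not described here, the body is returned as raw text instead of being parsed into fields.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
AGENT_URL = "http://localhost:8790/invoke"


def build_invoke_request(text: str, url: str = AGENT_URL) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_agent(text: str = "Hello", timeout: float = 30.0) -> str:
    """Send one request to the running agent and return the raw response body."""
    with urllib.request.urlopen(build_invoke_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the agent running, `print(check_agent())` should print whatever the agent returns for the input "Hello".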

2. Run Flakestorm v2 commands

From the project root (so flakestorm and config paths resolve):

# Mutation testing only (v1 style)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml

# With chaos (tool/LLM faults)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos

# Chaos only (no mutations, golden prompts under chaos)
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-only

# Built-in chaos profile
flakestorm run -c examples/v2_research_agent/flakestorm.yaml --chaos-profile api_outage

# Behavioral contract × chaos matrix
flakestorm contract run -c examples/v2_research_agent/flakestorm.yaml

# Contract score only (CI gate)
flakestorm contract score -c examples/v2_research_agent/flakestorm.yaml

# Replay regression (one session)
flakestorm replay run examples/v2_research_agent/replays/incident_001.yaml -c examples/v2_research_agent/flakestorm.yaml

# Export failures from a report as replay files
flakestorm replay export --from-report reports/report.json -o examples/v2_research_agent/replays/

# Full CI run (mutation + contract + chaos + replay, overall weighted score)
flakestorm ci -c examples/v2_research_agent/flakestorm.yaml --min-score 0.5

# CI with reports: summary + detailed phase reports (mutation, contract, chaos, replay)
flakestorm ci -c examples/v2_research_agent/flakestorm.yaml -o ./reports --min-score 0.5
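
To make the "overall weighted score" and `--min-score` gate concrete, here is a minimal sketch of the arithmetic, assuming the overall score is a plain weighted average of per-phase scores in the range 0 to 1. The weights are the ones this example configures (mutation 20%, chaos 35%, contract 35%, replay 10%). The function names, the sample phase scores, and the `>=` comparison are assumptions for illustration, not Flakestorm's actual implementation.

```python
# Weights configured for this example's scoring section.
WEIGHTS = {"mutation": 0.20, "chaos": 0.35, "contract": 0.35, "replay": 0.10}


def overall_score(phase_scores: dict[str, float],
                  weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-phase scores (assumed to lie in [0, 1])."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * phase_scores[name] for name in weights)
    return weighted_sum / total_weight


def passes_gate(score: float, min_score: float = 0.5) -> bool:
    """Assumed gate semantics: pass when the score reaches the threshold."""
    return score >= min_score


# Hypothetical phase results, for illustration only.
example = {"mutation": 0.8, "chaos": 0.4, "contract": 0.5, "replay": 1.0}
score = overall_score(example)
print(f"overall={score:.3f} pass={passes_gate(score)}")
```

Because chaos and contract carry 70% of the weight between them, a weak result in either phase pulls the overall score down much more than a weak mutation or replay result.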

3. What this example demonstrates

| Feature | Config / usage |
|---|---|
| Chaos | `chaos.tool_faults` (HTTP 503 injected with a configured probability), `chaos.llm_faults` (truncated responses); `--chaos`, `--chaos-profile` |
| Contract | `contract` with invariants (always-cite-source, completes, max-latency) and `chaos_matrix` (no-chaos, api-outage) |
| Replay | `replays.sessions` with `file: replays/incident_001.yaml`; contract resolved by name "Research Agent Contract" |
| Scoring | `scoring` weights (mutation 20%, chaos 35%, contract 35%, replay 10%); used in `flakestorm ci` |
| Reset | `agent.reset_endpoint: http://localhost:8790/reset` for contract matrix isolation |
| Reproducibility | Set `advanced.seed` (e.g. `42`) for deterministic chaos and mutation generation; same config → same scores. |
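
The reproducibility row rests on a general property of seeded random number generators: the same seed yields the same sequence of fault decisions. The sketch below illustrates that idea for probabilistic fault injection. It is not Flakestorm's code; `plan_tool_faults` and its parameters are hypothetical names chosen for this example.

```python
import random


def plan_tool_faults(seed: int, calls: int, probability: float) -> list[bool]:
    """Decide, for each tool call, whether to inject a fault such as an HTTP 503."""
    rng = random.Random(seed)  # private generator, independent of global random state
    return [rng.random() < probability for _ in range(calls)]


# Two runs with the same seed produce the same fault plan.
first = plan_tool_faults(seed=42, calls=10, probability=0.3)
second = plan_tool_faults(seed=42, calls=10, probability=0.3)
print(first == second)
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the sequence stable even if other code also draws random numbers.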

4. Config layout (v2.0)

version: "2.0"
agent + reset_endpoint
chaos (tool_faults, llm_faults)
contract (invariants, chaos_matrix)
replays.sessions (file reference)
scoring (weights)
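
As a rough aid for checking a config against this layout, the sketch below tests a parsed config (a plain Python dict, for example the result of loading the YAML file) for the top-level sections listed above. Whether Flakestorm requires every section is not stated here, so treat this as a checklist and not as the tool's validation logic. `EXPECTED_SECTIONS`, `missing_sections`, and the placeholder values in `example_config` are assumptions for illustration.

```python
# Top-level sections named in the layout above.
EXPECTED_SECTIONS = ("version", "agent", "chaos", "contract", "replays", "scoring")


def missing_sections(config: dict) -> list[str]:
    """Return the expected top-level sections that are absent from the config."""
    return [name for name in EXPECTED_SECTIONS if name not in config]


# Placeholder structure only: the nested value shapes are assumptions,
# and the scoring section is left empty because its key names are not shown here.
example_config = {
    "version": "2.0",
    "agent": {"reset_endpoint": "http://localhost:8790/reset"},
    "chaos": {"tool_faults": [], "llm_faults": []},
    "contract": {"invariants": [], "chaos_matrix": []},
    "replays": {"sessions": [{"file": "replays/incident_001.yaml"}]},
    "scoring": {},
}

print(missing_sections(example_config))
```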

The agent calls Ollama (same model as in flakestorm.yaml: gemma3:1b by default). Set OLLAMA_BASE_URL or OLLAMA_MODEL if your Ollama runs elsewhere or uses a different model. The agent is stateless except for a call counter; /reset clears it so contract cells stay isolated.
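
The two behaviours described above (environment overrides and a resettable call counter) can be sketched as follows. This is not the code in agent.py: `ollama_settings` and `CallCounter` are hypothetical names, and the fallback URL `http://localhost:11434` is Ollama's usual default port, assumed here and not confirmed by this README.

```python
import os
from collections.abc import Mapping


def ollama_settings(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Read the Ollama base URL and model name, falling back to assumed defaults."""
    base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434")  # assumed default
    model = env.get("OLLAMA_MODEL", "gemma3:1b")  # default model named in this README
    return base_url, model


class CallCounter:
    """The only state the agent is described as keeping between requests."""

    def __init__(self) -> None:
        self.calls = 0

    def record_call(self) -> int:
        """Count one handled request and return the running total."""
        self.calls += 1
        return self.calls

    def reset(self) -> None:
        """What a /reset handler would do: clear the counter."""
        self.calls = 0


counter = CallCounter()
counter.record_call()
counter.record_call()
counter.reset()
print(ollama_settings({}), counter.calls)
```

Resetting between contract matrix cells means each cell starts from a call count of zero, so a result in one cell cannot depend on how many requests an earlier cell made.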
