Update documentation and configuration for Flakestorm V2 to clarify CI processes, report generation, and reproducibility features. Add details on the new --output option for saving reports, clarify the use of --min-score, and improve the description of the seed configuration for deterministic runs. Update the README and usage guides to reflect these changes and to document the CI pipeline and report outputs.

2026-04-25 00:36:54 +02:00 · 2026-03-12 20:05:51 +08:00 · 2026-03-12 20:05:51 +08:00 · f4d45d4053
commit f4d45d4053
parent 4a13425f8a
14 changed files with 356 additions and 49 deletions
--- a/docs/API_SPECIFICATION.md
+++ b/docs/API_SPECIFICATION.md
@@ -538,13 +538,24 @@ flakestorm replay export --from-report FILE  # Export from an existing report

 ### V2: `flakestorm ci`

-Run full CI pipeline: mutation run, contract run (if configured), chaos-only (if chaos configured), replay (if configured); then compute overall weighted score from `scoring.weights`.
+Run full CI pipeline: mutation run, contract run (if configured), chaos-only (if chaos configured), replay (if configured); then compute overall weighted score from `scoring.weights`. Writes a **CI summary report** (e.g. `flakestorm-ci-report.html`) with per-phase scores and **"View detailed report"** links to phase-specific reports (mutation, contract, chaos, replay). Contract phase PASS/FAIL in the summary matches the contract detailed report (FAIL if any critical invariant fails).

 ```bash
 flakestorm ci
 flakestorm ci --config custom.yaml
+flakestorm ci --min-score 0.5              # Fail if overall score below 0.5
+flakestorm ci --output ./reports            # Save summary + detailed reports to directory
+flakestorm ci --output report.html          # Save summary report to file
+flakestorm ci --quiet                       # Minimal output, no progress bars
 ```

+| Option | Description |
+|--------|-------------|
+| `--config`, `-c` | Config file path (default: `flakestorm.yaml`) |
+| `--min-score` | Minimum overall (weighted) score to pass (default: 0.0) |
+| `--output`, `-o` | Path to save reports: directory (creates `flakestorm-ci-report.html` + phase reports) or HTML file path |
+| `--quiet`, `-q` | Minimal output, no progress bars |
+
 ---

 ## Environment Variables
--- a/docs/CONFIGURATION_GUIDE.md
+++ b/docs/CONFIGURATION_GUIDE.md
@@ -960,7 +960,7 @@ advanced:
 |--------|------|---------|-------------|
 | `concurrency` | integer | `10` | Max concurrent agent requests (1-100) |
 | `retries` | integer | `2` | Retry failed requests (0-5) |
-| `seed` | integer | null | Random seed for reproducibility |
+| `seed` | integer | null | **Reproducible runs:** when set, Python's random is seeded (chaos behavior fixed) and the mutation-generation LLM uses temperature=0 so the same config yields the same results run-to-run. Omit for exploratory, varying runs. |

 ---

--- a/docs/DEVELOPER_FAQ.md
+++ b/docs/DEVELOPER_FAQ.md
@@ -107,7 +107,7 @@ This separation allows:

 ### Q: What does `flakestorm ci` run?

-**A:** It runs, in order: (1) mutation run (with chaos if configured), (2) contract run if `contract` + `chaos_matrix` are configured, (3) chaos-only run if chaos is configured, (4) replay run if `replays` is configured. Then it computes an **overall weighted score** from `scoring.weights` (mutation, chaos, contract, replay); weights must sum to 1.0. Default weights: mutation 0.20, chaos 0.35, contract 0.35, replay 0.10.
+**A:** It runs, in order: (1) mutation run (with chaos if configured), (2) contract run if `contract` + `chaos_matrix` are configured, (3) chaos-only run if chaos is configured, (4) replay run if `replays` is configured. Then it computes an **overall weighted score** from `scoring.weights` (mutation, chaos, contract, replay); weights must sum to 1.0. Default weights: mutation 0.20, chaos 0.35, contract 0.35, replay 0.10. It also writes a **CI summary report** (e.g. `flakestorm-ci-report.html`) with per-phase scores and links to **detailed reports** (mutation, contract, chaos, replay). Contract phase PASS/FAIL in the summary matches the contract detailed report (FAIL if any critical invariant fails). Use `--output` to control where reports are saved and `--min-score` for the overall pass threshold.

 ---

--- a/docs/USAGE_GUIDE.md
+++ b/docs/USAGE_GUIDE.md
@@ -76,7 +76,7 @@ With **`version: "2.0"`** in your config, Flakestorm adds environment chaos, beh
 | **Behavioral contracts** | Contracts (invariants × severity) × chaos matrix scenarios; each cell is an independent run (optional reset per cell). | **Resilience score** (0–100%). Use `flakestorm contract run`. Per-contract formula: weighted by severity (critical×3, high×2, medium×1); **auto-FAIL** if any critical fails. |
 | **Replay regression** | Replay saved sessions (e.g. production incidents) and verify against a contract. | Per-session pass/fail; **replay regression** score when run via CI. Use `flakestorm replay run [path]`. |

-**Unified CI:** `flakestorm ci` runs mutation run, contract run (if configured), chaos-only run (if chaos configured), and all replay sessions; then computes an **overall resilience score** from `scoring.weights` (default: mutation 0.20, chaos 0.35, contract 0.35, replay 0.10). Weights must sum to 1.0.
+**Unified CI:** `flakestorm ci` runs mutation run, contract run (if configured), chaos-only run (if chaos configured), and all replay sessions; then computes an **overall resilience score** from `scoring.weights` (default: mutation 0.20, chaos 0.35, contract 0.35, replay 0.10). Weights must sum to 1.0. It writes a **CI summary report** (e.g. `flakestorm-ci-report.html`) with per-phase scores and links to **detailed reports** (mutation, contract, chaos, replay). Contract PASS/FAIL in the summary matches the contract detailed report (FAIL if any critical invariant fails). Use `--output DIR` or `--output report.html` and `--min-score N`.

 **Reports:** Use `flakestorm contract run --output report.html` and `flakestorm replay run --output report.html` to save HTML reports; both include **suggested actions** for failed cells or sessions (e.g. add reset_endpoint, tighten invariants). Replay accepts a single session file or a directory: `flakestorm replay run path/to/session.yaml` or `flakestorm replay run path/to/replays/`.

@@ -1858,6 +1858,22 @@ advanced:
  retries: 3      # Retry failed requests 3 times
 ```

+### Reproducible Runs
+
+By default, mutation generation (LLM) and chaos (e.g. fault triggers, payload choice) can vary between runs, so scores may differ. For **deterministic, reproducible runs** (e.g. CI or regression checks), set a **random seed** in config:
+
+```yaml
+advanced:
+  seed: 42   # Same config → same mutations and chaos → same scores
+```
+
+When `advanced.seed` is set:
+
+- **Python random** is seeded at run start, so chaos behavior (which faults trigger, which payloads) is fixed.
+- The **mutation-generation LLM** uses temperature=0, so the same golden prompts produce the same mutations each run.
+
+Use a fixed seed when you need comparable run-to-run results; omit it for exploratory testing where variation is acceptable.
+
 ### Golden Prompt Guide

 A comprehensive guide to creating effective golden prompts for your agent.
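
Note on the "Reproducible Runs" hunk above: the claim that seeding Python's random module fixes chaos behavior can be illustrated with a minimal sketch. This is not Flakestorm's implementation; the function name `simulate_chaos_choices` and the fault names are hypothetical, chosen only to show that the same seed yields the same sequence of random choices.

```python
import random


def simulate_chaos_choices(seed, n=5):
    """Hypothetical stand-in for chaos decisions (not Flakestorm code).

    When a seed is given, the global random generator is seeded first,
    so the sequence of chosen faults is the same on every call.
    """
    if seed is not None:
        random.seed(seed)
    # Illustrative fault names only; real fault types may differ.
    faults = ["timeout", "http_500", "malformed_payload"]
    return [random.choice(faults) for _ in range(n)]


run_a = simulate_chaos_choices(seed=42)
run_b = simulate_chaos_choices(seed=42)
print(run_a == run_b)  # Same seed, same sequence of choices
```

Passing `seed=None` skips the seeding step, which mirrors omitting `advanced.seed` for exploratory runs where variation is acceptable.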