Refactor documentation and remove CI/CD integration references

- Updated README.md to clarify local testing instructions and added error handling for low scores.
- Removed CI/CD configuration details from CONFIGURATION_GUIDE.md and other documentation files.
- Cleaned up MODULES.md by deleting references to the now-removed github_actions.py.
- Streamlined TEST_SCENARIOS.md and USAGE_GUIDE.md by eliminating CI/CD related sections.
- Adjusted CLI command help text in main.py for clarity on minimum score checks.
2026-04-25 00:36:54 +02:00 · 2025-12-30 16:03:42 +08:00 · 2025-12-30 16:03:42 +08:00 · 0c986e268a
commit 0c986e268a
parent 8d752e9746
7 changed files with 13 additions and 282 deletions
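
Note on the `--min-score` / `--ci` semantics touched by this commit: the sketch below is one possible reading of the updated README and help text, not the actual code in main.py. The helper name `exit_code` is hypothetical, and two points are assumptions: that `--min-score` without `--ci` only reports and never fails the process, and that a failing run exits with code 1 (the value listed in the removed "CI/CD Mode" section of USAGE_GUIDE.md).

```python
from __future__ import annotations


def exit_code(score: float, min_score: float | None, ci: bool) -> int:
    """Map a finished run to a process exit code (hypothetical helper)."""
    # No threshold supplied: there is nothing to enforce.
    if min_score is None:
        return 0
    # Assumption: without --ci the threshold is informational only.
    if not ci:
        return 0
    # Assumption: 1 signals "score below threshold".
    return 0 if score >= min_score else 1
```

A caller would typically pass the result to `sys.exit`, for example `sys.exit(exit_code(score, 0.9, ci=True))`, so that shell scripts can branch on the outcome.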
--- a/README.md
+++ b/README.md
@@ -192,12 +192,15 @@ agent:
  module: "my_agent:chain"
 ```

-## CI/CD Integration
+## Local Testing

-For local testing:
+For local testing and validation:
 ```bash
-# Run before committing (manual)
+# Run with minimum score check
 flakestorm run --min-score 0.9
+
+# Exit with error code if score is too low
+flakestorm run --min-score 0.9 --ci
 ```

 ## Robustness Score
--- a/docs/CONFIGURATION_GUIDE.md
+++ b/docs/CONFIGURATION_GUIDE.md
@@ -436,38 +436,6 @@ advanced:

 ---

-## CI/CD Configuration
-
-For GitHub Actions:
-
-```yaml
-# .github/workflows/reliability.yml
-name: Agent Reliability
-
-on: [push, pull_request]
-
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Setup Ollama
-        run: |
-          curl -fsSL https://ollama.ai/install.sh | sh
-          ollama serve &
-          sleep 5
-          ollama pull qwen3:8b
-
-      - name: Install flakestorm
-        run: pip install flakestorm
-
-      - name: Run Tests
-        run: flakestorm run --min-score 0.9 --ci
-```
-
----
-
 ## Troubleshooting

 ### Common Issues
--- a/docs/IMPLEMENTATION_CHECKLIST.md
+++ b/docs/IMPLEMENTATION_CHECKLIST.md
@@ -14,7 +14,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 - [x] Create flakestorm.yaml.example template
 - [x] Set up project structure (src/flakestorm/*)
 - [x] Configure pre-commit hooks (black, ruff, mypy)
-- [ ] Set up GitHub Actions for CI/CD

 #### Configuration System
 - [x] Define Pydantic models for configuration
@@ -122,12 +121,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 - [x] Implement similarity calculation
 - [x] Add lazy model loading

-#### GitHub Actions Integration
-- [x] Create action.yml template
-- [x] Create workflow example
-- [x] Document CI/CD integration
-- [ ] Publish to GitHub Marketplace
-
 ---

 ### Testing & Quality
@@ -155,114 +148,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia

 ---

-## Cloud Version (Commercial)
-
-### Cloud Phase 1: Infrastructure (Week 9-10)
-
-#### Cloud Setup
-- [ ] Set up AWS/GCP project
-- [ ] Configure VPC and networking
-- [ ] Set up PostgreSQL database
-- [ ] Configure Redis for queue/cache
-- [ ] Set up S3/GCS for storage
-- [ ] Configure Docker/Kubernetes
-
-#### Database Schema
-- [ ] Create users table
-- [ ] Create test_configs table
-- [ ] Create test_runs table
-- [ ] Create subscriptions table
-- [ ] Set up migrations (Alembic)
-
-#### Authentication
-- [ ] Integrate Auth0/Clerk
-- [ ] Implement JWT validation
-- [ ] Create user management endpoints
-- [ ] Add RBAC for team tier
-
----
-
-### Cloud Phase 2: Backend (Week 10-12)
-
-#### FastAPI Application
-- [ ] Set up FastAPI project structure
-- [ ] Implement auth middleware
-- [ ] Create test management endpoints
-- [ ] Create config management endpoints
-- [ ] Create report endpoints
-- [ ] Implement async job queue (Celery)
-
-#### Gemini Integration
-- [ ] Create GeminiMutationService
-- [ ] Implement mutation generation
-- [ ] Add fallback to GPU models
-- [ ] Rate limiting and retry logic
-
-#### Tier Limits
-- [ ] Implement free tier limits (5 lifetime runs)
-- [ ] Implement Pro tier limits (200/month)
-- [ ] Implement Team tier limits (1000/month)
-- [ ] Create usage tracking
-
----
-
-### Cloud Phase 3: Frontend (Week 12-14)
-
-#### Next.js Setup
-- [ ] Initialize Next.js project
-- [ ] Configure Tailwind CSS
-- [ ] Set up authentication flow
-- [ ] Create layout components
-
-#### Dashboard Pages
-- [ ] Dashboard home (overview)
-- [ ] Tests list and creation
-- [ ] Reports viewer
-- [ ] Billing management
-- [ ] Team management (Team tier)
-- [ ] Settings page
-
-#### Marketing Pages
-- [ ] Landing page
-- [ ] Pricing page
-- [ ] Documentation
-- [ ] Blog (optional)
-
----
-
-### Cloud Phase 4: Billing (Week 14-15)
-
-#### Stripe Integration
-- [ ] Set up Stripe products/prices
-- [ ] Implement subscription creation
-- [ ] Handle subscription updates
-- [ ] Implement webhook handlers
-- [ ] Create invoice history
-
-#### Email Notifications
-- [ ] Set up SendGrid/Mailgun
-- [ ] Test failure alerts
-- [ ] Subscription notifications
-- [ ] Welcome emails
-
----
-
-### Cloud Phase 5: Testing & Launch (Week 15-16)
-
-#### Testing
-- [ ] E2E tests with Cypress/Playwright
-- [ ] Load testing
-- [ ] Security audit
-- [ ] Performance optimization
-
-#### Deployment
-- [ ] Set up CI/CD pipeline
-- [ ] Configure production environment
-- [ ] Set up monitoring (Sentry, etc.)
-- [ ] Launch to production
-
----
-
 ## Progress Summary

 | Phase | Status | Completion |
@@ -273,10 +158,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 | CLI Phase 4: CLI & Reporting | ✅ Complete | 100% |
 | CLI Phase 5: V2 Features | ✅ Complete | 90% |
 | Documentation | ✅ Complete | 100% |
-| Cloud Phase 1: Infrastructure | ⏳ Pending | 0% |
-| Cloud Phase 2: Backend | ⏳ Pending | 0% |
-| Cloud Phase 3: Frontend | ⏳ Pending | 0% |
-| Cloud Phase 4: Billing | ⏳ Pending | 0% |

 ---

@@ -285,5 +166,4 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 1. **Rust Build**: Compile and integrate Rust performance module
 2. **Integration Tests**: Add full integration test suite
 3. **PyPI Release**: Prepare and publish to PyPI
-4. **Cloud Infrastructure**: Begin AWS/GCP setup
-5. **Community Launch**: Publish to Hacker News and Reddit
+4. **Community Launch**: Publish to Hacker News and Reddit
--- a/docs/MODULES.md
+++ b/docs/MODULES.md
@@ -61,8 +61,7 @@ flakestorm/
 │   └── main.py             # Typer CLI commands
 └── integrations/            # External integrations
    ├── huggingface.py      # HuggingFace model support
-    ├── embeddings.py       # Local embeddings
-    └── github_actions.py   # CI/CD integration
+    └── embeddings.py       # Local embeddings
 ```

 ---
--- a/docs/TEST_SCENARIOS.md
+++ b/docs/TEST_SCENARIOS.md
@@ -680,15 +680,7 @@ open reports/entropix_report_*.html
 # ...

 # Re-run tests
-flakestorm run --ci --min-score 0.9
-```
-
-### Step 6: Add to CI/CD
-
-```yaml
-# .github/workflows/test.yml
-- name: Run flakestorm
-  run: flakestorm run --ci --min-score 0.85
+flakestorm run --min-score 0.9
 ```

 ---
--- a/docs/USAGE_GUIDE.md
+++ b/docs/USAGE_GUIDE.md
@@ -16,9 +16,8 @@ This comprehensive guide walks you through using flakestorm to test your AI agen
 6. [Running Tests](#running-tests)
 7. [Understanding Results](#understanding-results)
 8. [Integration Patterns](#integration-patterns)
-9. [CI/CD Integration](#cicd-integration)
-10. [Advanced Usage](#advanced-usage)
-11. [Troubleshooting](#troubleshooting)
+9. [Advanced Usage](#advanced-usage)
+10. [Troubleshooting](#troubleshooting)

 ---

@@ -450,19 +449,6 @@ flakestorm run --quiet
 flakestorm run --verbose
 ```

-### CI/CD Mode
-
-```bash
-# Fail if score < 0.8
-flakestorm run --ci --min-score 0.8
-
-# Exit codes:
-# 0 = Score meets threshold
-# 1 = Score below threshold
-# 2 = Configuration error
-# 3 = Runtime error
-```
-
 ### Individual Commands

 ```bash
@@ -630,103 +616,6 @@ agent = AgentExecutor(...)

 ---

-## CI/CD Integration
-
-### GitHub Actions
-
-Create `.github/workflows/flakestorm.yml`:
-
-```yaml
-name: Agent Reliability Tests
-
-on:
-  push:
-    branches: [main]
-  pull_request:
-    branches: [main]
-
-jobs:
-  reliability-test:
-    runs-on: ubuntu-latest
-
-    services:
-      ollama:
-        image: ollama/ollama
-        ports:
-          - 11434:11434
-
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - name: Install dependencies
-        run: |
-          pip install flakestorm
-          pip install -r requirements.txt
-
-      - name: Pull Ollama model
-        run: |
-          curl -X POST http://localhost:11434/api/pull \
-            -d '{"name": "qwen2.5-coder:7b"}'
-
-      - name: Start agent
-        run: |
-          python -m my_agent &
-          sleep 5  # Wait for startup
-
-      - name: Run flakestorm tests
-        run: |
-          flakestorm run --ci --min-score 0.8 --output json
-
-      - name: Upload report
-        uses: actions/upload-artifact@v4
-        if: always()
-        with:
-          name: flakestorm-report
-          path: reports/
-```
-
-### GitLab CI
-
-```yaml
-flakestorm-test:
-  image: python:3.11
-  services:
-    - name: ollama/ollama
-      alias: ollama
-  variables:
-    OLLAMA_HOST: "http://ollama:11434"
-  script:
-    - pip install flakestorm
-    - flakestorm run --ci --min-score 0.8
-  artifacts:
-    paths:
-      - reports/
-    when: always
-```
-
-### Pre-commit Hook
-
-Add to `.pre-commit-config.yaml`:
-
-```yaml
-repos:
-  - repo: local
-    hooks:
-      - id: flakestorm
-        name: flakestorm Agent Tests
-        entry: flakestorm run --ci --min-score 0.8
-        language: system
-        pass_filenames: false
-        always_run: true
-```
-
----
-
 ## Advanced Usage

 ### Custom Mutation Templates
--- a/src/flakestorm/cli/main.py
+++ b/src/flakestorm/cli/main.py
@@ -118,12 +118,12 @@ def run(
    min_score: float | None = typer.Option(
        None,
        "--min-score",
-        help="Minimum score to pass (for CI/CD)",
+        help="Minimum score to pass",
    ),
    ci: bool = typer.Option(
        False,
        "--ci",
-        help="CI mode: exit with error if below min-score",
+        help="Exit with error code if score is below min-score",
    ),
    verify_only: bool = typer.Option(
        False,