diff --git a/README.md b/README.md
index 223471e..92aa790 100644
--- a/README.md
+++ b/README.md
@@ -192,12 +192,15 @@ agent:
   module: "my_agent:chain"
 ```
-## CI/CD Integration
+## Local Testing
-For local testing:
+For local testing and validation:
 ```bash
-# Run before committing (manual)
+# Run with minimum score check
 flakestorm run --min-score 0.9
+
+# Exit with error code if score is too low
+flakestorm run --min-score 0.9 --ci
 ```
 ## Robustness Score
diff --git a/docs/CONFIGURATION_GUIDE.md b/docs/CONFIGURATION_GUIDE.md
index 59c7872..ae73b7e 100644
--- a/docs/CONFIGURATION_GUIDE.md
+++ b/docs/CONFIGURATION_GUIDE.md
@@ -436,38 +436,6 @@ advanced:
 ---
-## CI/CD Configuration
-
-For GitHub Actions:
-
-```yaml
-# .github/workflows/reliability.yml
-name: Agent Reliability
-
-on: [push, pull_request]
-
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Setup Ollama
-        run: |
-          curl -fsSL https://ollama.ai/install.sh | sh
-          ollama serve &
-          sleep 5
-          ollama pull qwen3:8b
-
-      - name: Install flakestorm
-        run: pip install flakestorm
-
-      - name: Run Tests
-        run: flakestorm run --min-score 0.9 --ci
-```
-
----
-
 ## Troubleshooting
 ### Common Issues
diff --git a/docs/IMPLEMENTATION_CHECKLIST.md b/docs/IMPLEMENTATION_CHECKLIST.md
index b5bbba4..cc0dcc4 100644
--- a/docs/IMPLEMENTATION_CHECKLIST.md
+++ b/docs/IMPLEMENTATION_CHECKLIST.md
@@ -14,7 +14,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 - [x] Create flakestorm.yaml.example template
 - [x] Set up project structure (src/flakestorm/*)
 - [x] Configure pre-commit hooks (black, ruff, mypy)
-- [ ] Set up GitHub Actions for CI/CD
 #### Configuration System
 - [x] Define Pydantic models for configuration
@@ -122,12 +121,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 - [x] Implement similarity calculation
 - [x] Add lazy model loading
-#### GitHub Actions Integration
-- [x] Create action.yml template
-- [x] Create workflow example
-- [x] Document CI/CD integration
-- [ ] Publish to GitHub Marketplace
-
 ---
 ### Testing & Quality
@@ -155,114 +148,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 ---
-## Cloud Version (Commercial)
-
-### Cloud Phase 1: Infrastructure (Week 9-10)
-
-#### Cloud Setup
-- [ ] Set up AWS/GCP project
-- [ ] Configure VPC and networking
-- [ ] Set up PostgreSQL database
-- [ ] Configure Redis for queue/cache
-- [ ] Set up S3/GCS for storage
-- [ ] Configure Docker/Kubernetes
-
-#### Database Schema
-- [ ] Create users table
-- [ ] Create test_configs table
-- [ ] Create test_runs table
-- [ ] Create subscriptions table
-- [ ] Set up migrations (Alembic)
-
-#### Authentication
-- [ ] Integrate Auth0/Clerk
-- [ ] Implement JWT validation
-- [ ] Create user management endpoints
-- [ ] Add RBAC for team tier
-
----
-
-### Cloud Phase 2: Backend (Week 10-12)
-
-#### FastAPI Application
-- [ ] Set up FastAPI project structure
-- [ ] Implement auth middleware
-- [ ] Create test management endpoints
-- [ ] Create config management endpoints
-- [ ] Create report endpoints
-- [ ] Implement async job queue (Celery)
-
-#### Gemini Integration
-- [ ] Create GeminiMutationService
-- [ ] Implement mutation generation
-- [ ] Add fallback to GPU models
-- [ ] Rate limiting and retry logic
-
-#### Tier Limits
-- [ ] Implement free tier limits (5 lifetime runs)
-- [ ] Implement Pro tier limits (200/month)
-- [ ] Implement Team tier limits (1000/month)
-- [ ] Create usage tracking
-
----
-
-### Cloud Phase 3: Frontend (Week 12-14)
-
-#### Next.js Setup
-- [ ] Initialize Next.js project
-- [ ] Configure Tailwind CSS
-- [ ] Set up authentication flow
-- [ ] Create layout components
-
-#### Dashboard Pages
-- [ ] Dashboard home (overview)
-- [ ] Tests list and creation
-- [ ] Reports viewer
-- [ ] Billing management
-- [ ] Team management (Team tier)
-- [ ] Settings page
-
-#### Marketing Pages
-- [ ] Landing page
-- [ ] Pricing page
-- [ ] Documentation
-- [ ] Blog (optional)
-
----
-
-### Cloud Phase 4: Billing (Week 14-15)
-
-#### Stripe Integration
-- [ ] Set up Stripe products/prices
-- [ ] Implement subscription creation
-- [ ] Handle subscription updates
-- [ ] Implement webhook handlers
-- [ ] Create invoice history
-
-#### Email Notifications
-- [ ] Set up SendGrid/Mailgun
-- [ ] Test failure alerts
-- [ ] Subscription notifications
-- [ ] Welcome emails
-
----
-
-### Cloud Phase 5: Testing & Launch (Week 15-16)
-
-#### Testing
-- [ ] E2E tests with Cypress/Playwright
-- [ ] Load testing
-- [ ] Security audit
-- [ ] Performance optimization
-
-#### Deployment
-- [ ] Set up CI/CD pipeline
-- [ ] Configure production environment
-- [ ] Set up monitoring (Sentry, etc.)
-- [ ] Launch to production
-
----
-
 ## Progress Summary
 | Phase | Status | Completion |
@@ -273,10 +158,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 | CLI Phase 4: CLI & Reporting | ✅ Complete | 100% |
 | CLI Phase 5: V2 Features | ✅ Complete | 90% |
 | Documentation | ✅ Complete | 100% |
-| Cloud Phase 1: Infrastructure | ⏳ Pending | 0% |
-| Cloud Phase 2: Backend | ⏳ Pending | 0% |
-| Cloud Phase 3: Frontend | ⏳ Pending | 0% |
-| Cloud Phase 4: Billing | ⏳ Pending | 0% |
 ---
@@ -285,5 +166,4 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 1. **Rust Build**: Compile and integrate Rust performance module
 2. **Integration Tests**: Add full integration test suite
 3. **PyPI Release**: Prepare and publish to PyPI
-4. **Cloud Infrastructure**: Begin AWS/GCP setup
-5. **Community Launch**: Publish to Hacker News and Reddit
+4. **Community Launch**: Publish to Hacker News and Reddit
diff --git a/docs/MODULES.md b/docs/MODULES.md
index 5027806..e0223cd 100644
--- a/docs/MODULES.md
+++ b/docs/MODULES.md
@@ -61,8 +61,7 @@ flakestorm/
 │   └── main.py           # Typer CLI commands
 └── integrations/         # External integrations
     ├── huggingface.py    # HuggingFace model support
-    ├── embeddings.py     # Local embeddings
-    └── github_actions.py # CI/CD integration
+    └── embeddings.py     # Local embeddings
 ```
 ---
diff --git a/docs/TEST_SCENARIOS.md b/docs/TEST_SCENARIOS.md
index 512ca6d..0bf678a 100644
--- a/docs/TEST_SCENARIOS.md
+++ b/docs/TEST_SCENARIOS.md
@@ -680,15 +680,7 @@ open reports/entropix_report_*.html
 # ...
 # Re-run tests
-flakestorm run --ci --min-score 0.9
-```
-
-### Step 6: Add to CI/CD
-
-```yaml
-# .github/workflows/test.yml
-- name: Run flakestorm
-  run: flakestorm run --ci --min-score 0.85
+flakestorm run --min-score 0.9
 ```
 ---
diff --git a/docs/USAGE_GUIDE.md b/docs/USAGE_GUIDE.md
index 38e17ad..d0c464f 100644
--- a/docs/USAGE_GUIDE.md
+++ b/docs/USAGE_GUIDE.md
@@ -16,9 +16,8 @@ This comprehensive guide walks you through using flakestorm to test your AI agen
 6. [Running Tests](#running-tests)
 7. [Understanding Results](#understanding-results)
 8. [Integration Patterns](#integration-patterns)
-9. [CI/CD Integration](#cicd-integration)
-10. [Advanced Usage](#advanced-usage)
-11. [Troubleshooting](#troubleshooting)
+9. [Advanced Usage](#advanced-usage)
+10. [Troubleshooting](#troubleshooting)
 ---
@@ -450,19 +449,6 @@ flakestorm run --quiet
 flakestorm run --verbose
 ```
-### CI/CD Mode
-
-```bash
-# Fail if score < 0.8
-flakestorm run --ci --min-score 0.8
-
-# Exit codes:
-# 0 = Score meets threshold
-# 1 = Score below threshold
-# 2 = Configuration error
-# 3 = Runtime error
-```
-
 ### Individual Commands
 ```bash
@@ -630,103 +616,6 @@ agent = AgentExecutor(...)
 ---
-## CI/CD Integration
-
-### GitHub Actions
-
-Create `.github/workflows/flakestorm.yml`:
-
-```yaml
-name: Agent Reliability Tests
-
-on:
-  push:
-    branches: [main]
-  pull_request:
-    branches: [main]
-
-jobs:
-  reliability-test:
-    runs-on: ubuntu-latest
-
-    services:
-      ollama:
-        image: ollama/ollama
-        ports:
-          - 11434:11434
-
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - name: Install dependencies
-        run: |
-          pip install flakestorm
-          pip install -r requirements.txt
-
-      - name: Pull Ollama model
-        run: |
-          curl -X POST http://localhost:11434/api/pull \
-            -d '{"name": "qwen2.5-coder:7b"}'
-
-      - name: Start agent
-        run: |
-          python -m my_agent &
-          sleep 5 # Wait for startup
-
-      - name: Run flakestorm tests
-        run: |
-          flakestorm run --ci --min-score 0.8 --output json
-
-      - name: Upload report
-        uses: actions/upload-artifact@v4
-        if: always()
-        with:
-          name: flakestorm-report
-          path: reports/
-```
-
-### GitLab CI
-
-```yaml
-flakestorm-test:
-  image: python:3.11
-  services:
-    - name: ollama/ollama
-      alias: ollama
-  variables:
-    OLLAMA_HOST: "http://ollama:11434"
-  script:
-    - pip install flakestorm
-    - flakestorm run --ci --min-score 0.8
-  artifacts:
-    paths:
-      - reports/
-    when: always
-```
-
-### Pre-commit Hook
-
-Add to `.pre-commit-config.yaml`:
-
-```yaml
-repos:
-  - repo: local
-    hooks:
-      - id: flakestorm
-        name: flakestorm Agent Tests
-        entry: flakestorm run --ci --min-score 0.8
-        language: system
-        pass_filenames: false
-        always_run: true
-```
-
----
-
 ## Advanced Usage
 ### Custom Mutation Templates
diff --git a/src/flakestorm/cli/main.py b/src/flakestorm/cli/main.py
index 3a9689f..3a8d92e 100644
--- a/src/flakestorm/cli/main.py
+++ b/src/flakestorm/cli/main.py
@@ -118,12 +118,12 @@ def run(
     min_score: float | None = typer.Option(
         None,
         "--min-score",
-        help="Minimum score to pass (for CI/CD)",
+        help="Minimum score to pass",
     ),
     ci: bool = typer.Option(
         False,
         "--ci",
-        help="CI mode: exit with error if below min-score",
+        help="Exit with error code if score is below min-score",
     ),
     verify_only: bool = typer.Option(
         False,