mirror of
https://github.com/flakestorm/flakestorm.git
synced 2026-04-25 00:36:54 +02:00
Refactor documentation and remove CI/CD integration references
- Updated README.md to clarify local testing instructions and added error handling for low scores.
- Removed CI/CD configuration details from CONFIGURATION_GUIDE.md and other documentation files.
- Cleaned up MODULES.md by deleting references to the now-removed github_actions.py.
- Streamlined TEST_SCENARIOS.md and USAGE_GUIDE.md by eliminating CI/CD related sections.
- Adjusted CLI command help text in main.py for clarity on minimum score checks.
parent 8d752e9746
commit 0c986e268a
7 changed files with 13 additions and 282 deletions

@@ -192,12 +192,15 @@ agent:
   module: "my_agent:chain"
 ```

-## CI/CD Integration
+## Local Testing

-For local testing:
+For local testing and validation:

 ```bash
-# Run before committing (manual)
+# Run with minimum score check
 flakestorm run --min-score 0.9
+
+# Exit with error code if score is too low
+flakestorm run --min-score 0.9 --ci
 ```

 ## Robustness Score

@@ -436,38 +436,6 @@ advanced:

 ---

-## CI/CD Configuration
-
-For GitHub Actions:
-
-```yaml
-# .github/workflows/reliability.yml
-name: Agent Reliability
-
-on: [push, pull_request]
-
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Setup Ollama
-        run: |
-          curl -fsSL https://ollama.ai/install.sh | sh
-          ollama serve &
-          sleep 5
-          ollama pull qwen3:8b
-
-      - name: Install flakestorm
-        run: pip install flakestorm
-
-      - name: Run Tests
-        run: flakestorm run --min-score 0.9 --ci
-```
-
----
-
 ## Troubleshooting

 ### Common Issues

@@ -14,7 +14,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 - [x] Create flakestorm.yaml.example template
 - [x] Set up project structure (src/flakestorm/*)
 - [x] Configure pre-commit hooks (black, ruff, mypy)
-- [ ] Set up GitHub Actions for CI/CD

 #### Configuration System
 - [x] Define Pydantic models for configuration
@@ -122,12 +121,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 - [x] Implement similarity calculation
 - [x] Add lazy model loading

-#### GitHub Actions Integration
-- [x] Create action.yml template
-- [x] Create workflow example
-- [x] Document CI/CD integration
-- [ ] Publish to GitHub Marketplace
-
 ---

 ### Testing & Quality
@@ -155,114 +148,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia

 ---

-## Cloud Version (Commercial)
-
-### Cloud Phase 1: Infrastructure (Week 9-10)
-
-#### Cloud Setup
-- [ ] Set up AWS/GCP project
-- [ ] Configure VPC and networking
-- [ ] Set up PostgreSQL database
-- [ ] Configure Redis for queue/cache
-- [ ] Set up S3/GCS for storage
-- [ ] Configure Docker/Kubernetes
-
-#### Database Schema
-- [ ] Create users table
-- [ ] Create test_configs table
-- [ ] Create test_runs table
-- [ ] Create subscriptions table
-- [ ] Set up migrations (Alembic)
-
-#### Authentication
-- [ ] Integrate Auth0/Clerk
-- [ ] Implement JWT validation
-- [ ] Create user management endpoints
-- [ ] Add RBAC for team tier
-
----
-
-### Cloud Phase 2: Backend (Week 10-12)
-
-#### FastAPI Application
-- [ ] Set up FastAPI project structure
-- [ ] Implement auth middleware
-- [ ] Create test management endpoints
-- [ ] Create config management endpoints
-- [ ] Create report endpoints
-- [ ] Implement async job queue (Celery)
-
-#### Gemini Integration
-- [ ] Create GeminiMutationService
-- [ ] Implement mutation generation
-- [ ] Add fallback to GPU models
-- [ ] Rate limiting and retry logic
-
-#### Tier Limits
-- [ ] Implement free tier limits (5 lifetime runs)
-- [ ] Implement Pro tier limits (200/month)
-- [ ] Implement Team tier limits (1000/month)
-- [ ] Create usage tracking
-
----
-
-### Cloud Phase 3: Frontend (Week 12-14)
-
-#### Next.js Setup
-- [ ] Initialize Next.js project
-- [ ] Configure Tailwind CSS
-- [ ] Set up authentication flow
-- [ ] Create layout components
-
-#### Dashboard Pages
-- [ ] Dashboard home (overview)
-- [ ] Tests list and creation
-- [ ] Reports viewer
-- [ ] Billing management
-- [ ] Team management (Team tier)
-- [ ] Settings page
-
-#### Marketing Pages
-- [ ] Landing page
-- [ ] Pricing page
-- [ ] Documentation
-- [ ] Blog (optional)
-
----
-
-### Cloud Phase 4: Billing (Week 14-15)
-
-#### Stripe Integration
-- [ ] Set up Stripe products/prices
-- [ ] Implement subscription creation
-- [ ] Handle subscription updates
-- [ ] Implement webhook handlers
-- [ ] Create invoice history
-
-#### Email Notifications
-- [ ] Set up SendGrid/Mailgun
-- [ ] Test failure alerts
-- [ ] Subscription notifications
-- [ ] Welcome emails
-
----
-
-### Cloud Phase 5: Testing & Launch (Week 15-16)
-
-#### Testing
-- [ ] E2E tests with Cypress/Playwright
-- [ ] Load testing
-- [ ] Security audit
-- [ ] Performance optimization
-
-#### Deployment
-- [ ] Set up CI/CD pipeline
-- [ ] Configure production environment
-- [ ] Set up monitoring (Sentry, etc.)
-- [ ] Launch to production
-
----
-
 ## Progress Summary

 | Phase | Status | Completion |
@@ -273,10 +158,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 | CLI Phase 4: CLI & Reporting | ✅ Complete | 100% |
 | CLI Phase 5: V2 Features | ✅ Complete | 90% |
 | Documentation | ✅ Complete | 100% |
-| Cloud Phase 1: Infrastructure | ⏳ Pending | 0% |
-| Cloud Phase 2: Backend | ⏳ Pending | 0% |
-| Cloud Phase 3: Frontend | ⏳ Pending | 0% |
-| Cloud Phase 4: Billing | ⏳ Pending | 0% |

 ---

@@ -285,5 +166,4 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
 1. **Rust Build**: Compile and integrate Rust performance module
 2. **Integration Tests**: Add full integration test suite
 3. **PyPI Release**: Prepare and publish to PyPI
-4. **Cloud Infrastructure**: Begin AWS/GCP setup
-5. **Community Launch**: Publish to Hacker News and Reddit
+4. **Community Launch**: Publish to Hacker News and Reddit

@@ -61,8 +61,7 @@ flakestorm/
 │   └── main.py # Typer CLI commands
 └── integrations/ # External integrations
     ├── huggingface.py # HuggingFace model support
-    ├── embeddings.py # Local embeddings
-    └── github_actions.py # CI/CD integration
+    └── embeddings.py # Local embeddings
 ```

 ---

@@ -680,15 +680,7 @@ open reports/entropix_report_*.html
 # ...

 # Re-run tests
-flakestorm run --ci --min-score 0.9
-```
-
-### Step 6: Add to CI/CD
-
-```yaml
-# .github/workflows/test.yml
-- name: Run flakestorm
-  run: flakestorm run --ci --min-score 0.85
+flakestorm run --min-score 0.9
 ```

 ---

@@ -16,9 +16,8 @@ This comprehensive guide walks you through using flakestorm to test your AI agen
 6. [Running Tests](#running-tests)
 7. [Understanding Results](#understanding-results)
 8. [Integration Patterns](#integration-patterns)
-9. [CI/CD Integration](#cicd-integration)
-10. [Advanced Usage](#advanced-usage)
-11. [Troubleshooting](#troubleshooting)
+9. [Advanced Usage](#advanced-usage)
+10. [Troubleshooting](#troubleshooting)

 ---

@@ -450,19 +449,6 @@ flakestorm run --quiet
 flakestorm run --verbose
 ```

-### CI/CD Mode
-
-```bash
-# Fail if score < 0.8
-flakestorm run --ci --min-score 0.8
-
-# Exit codes:
-# 0 = Score meets threshold
-# 1 = Score below threshold
-# 2 = Configuration error
-# 3 = Runtime error
-```
-
 ### Individual Commands

 ```bash
@@ -630,103 +616,6 @@ agent = AgentExecutor(...)

 ---

-## CI/CD Integration
-
-### GitHub Actions
-
-Create `.github/workflows/flakestorm.yml`:
-
-```yaml
-name: Agent Reliability Tests
-
-on:
-  push:
-    branches: [main]
-  pull_request:
-    branches: [main]
-
-jobs:
-  reliability-test:
-    runs-on: ubuntu-latest
-
-    services:
-      ollama:
-        image: ollama/ollama
-        ports:
-          - 11434:11434
-
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - name: Install dependencies
-        run: |
-          pip install flakestorm
-          pip install -r requirements.txt
-
-      - name: Pull Ollama model
-        run: |
-          curl -X POST http://localhost:11434/api/pull \
-            -d '{"name": "qwen2.5-coder:7b"}'
-
-      - name: Start agent
-        run: |
-          python -m my_agent &
-          sleep 5 # Wait for startup
-
-      - name: Run flakestorm tests
-        run: |
-          flakestorm run --ci --min-score 0.8 --output json
-
-      - name: Upload report
-        uses: actions/upload-artifact@v4
-        if: always()
-        with:
-          name: flakestorm-report
-          path: reports/
-```
-
-### GitLab CI
-
-```yaml
-flakestorm-test:
-  image: python:3.11
-  services:
-    - name: ollama/ollama
-      alias: ollama
-  variables:
-    OLLAMA_HOST: "http://ollama:11434"
-  script:
-    - pip install flakestorm
-    - flakestorm run --ci --min-score 0.8
-  artifacts:
-    paths:
-      - reports/
-    when: always
-```
-
-### Pre-commit Hook
-
-Add to `.pre-commit-config.yaml`:
-
-```yaml
-repos:
-  - repo: local
-    hooks:
-      - id: flakestorm
-        name: flakestorm Agent Tests
-        entry: flakestorm run --ci --min-score 0.8
-        language: system
-        pass_filenames: false
-        always_run: true
-```
-
----
-
 ## Advanced Usage

 ### Custom Mutation Templates

@@ -118,12 +118,12 @@ def run(
     min_score: float | None = typer.Option(
         None,
         "--min-score",
-        help="Minimum score to pass (for CI/CD)",
+        help="Minimum score to pass",
     ),
     ci: bool = typer.Option(
         False,
         "--ci",
-        help="CI mode: exit with error if below min-score",
+        help="Exit with error code if score is below min-score",
     ),
     verify_only: bool = typer.Option(
         False,
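
Note on the help-text change in main.py: the reworded options suggest that `--min-score` sets the threshold while `--ci` decides whether a low score becomes a failing exit status. The sketch below is one possible reading of that behavior, not flakestorm's actual code. `resolve_exit_code` is a hypothetical helper, and the exit status 1 for a failed check is an assumption (the only status documented for "score below threshold" appears in docs this commit removes).

```python
from __future__ import annotations


def resolve_exit_code(score: float, min_score: float | None, ci: bool) -> int:
    """Hypothetical helper: map a robustness score to a process exit status."""
    # Without --min-score there is no threshold to enforce.
    if min_score is None:
        return 0
    # Assumption: only --ci turns a below-threshold score into a failure.
    if ci and score < min_score:
        return 1
    return 0


if __name__ == "__main__":
    # Below threshold with --ci set, then the same score without --ci.
    print(resolve_exit_code(0.85, 0.9, ci=True))
    print(resolve_exit_code(0.85, 0.9, ci=False))
```

Under this reading, `flakestorm run --min-score 0.9` alone would report the score without failing the shell command, which is consistent with the README hunk listing the `--ci` variant separately as the one that exits with an error code.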