Refactor documentation and remove CI/CD integration references

- Updated README.md to clarify local testing instructions and documented the --ci flag, which exits with an error code when the score is below --min-score.
- Removed CI/CD configuration details from CONFIGURATION_GUIDE.md and other documentation files.
- Cleaned up MODULES.md by deleting references to the now-removed github_actions.py.
- Streamlined TEST_SCENARIOS.md and USAGE_GUIDE.md by eliminating CI/CD-related sections.
- Adjusted CLI command help text in main.py for clarity on minimum score checks.
Entropix 2025-12-30 16:03:42 +08:00
parent 8d752e9746
commit 0c986e268a
7 changed files with 13 additions and 282 deletions

@@ -192,12 +192,15 @@ agent:
module: "my_agent:chain"
```
## CI/CD Integration
## Local Testing
For local testing:
For local testing and validation:
```bash
# Run before committing (manual)
# Run with minimum score check
flakestorm run --min-score 0.9
# Exit with error code if score is too low
flakestorm run --min-score 0.9 --ci
```
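As a rough illustration of the behavior the two commands above describe, the sketch below maps a score and the two options to an exit code. The helper name `exit_code` is hypothetical and not part of flakestorm. The value 1 for a low score follows the exit-code list that this commit removes from the usage guide, and treating a run without `--ci` as always exiting 0 is an assumption.

```python
from __future__ import annotations


def exit_code(score: float, min_score: float | None, ci: bool) -> int:
    """Hypothetical helper: pick an exit code for a finished run."""
    # No threshold supplied, so there is nothing to compare against.
    if min_score is None:
        return 0
    # Assumption: only --ci turns a low score into a non-zero exit code.
    if ci and score < min_score:
        return 1
    return 0
```

Under these assumptions, a run that scores 0.85 against `--min-score 0.9 --ci` would exit non-zero, while the same run without `--ci` would exit 0.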
## Robustness Score

@@ -436,38 +436,6 @@ advanced:
---
## CI/CD Configuration
For GitHub Actions:
```yaml
# .github/workflows/reliability.yml
name: Agent Reliability
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Ollama
run: |
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve &
sleep 5
ollama pull qwen3:8b
- name: Install flakestorm
run: pip install flakestorm
- name: Run Tests
run: flakestorm run --min-score 0.9 --ci
```
---
## Troubleshooting
### Common Issues

@@ -14,7 +14,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
- [x] Create flakestorm.yaml.example template
- [x] Set up project structure (src/flakestorm/*)
- [x] Configure pre-commit hooks (black, ruff, mypy)
- [ ] Set up GitHub Actions for CI/CD
#### Configuration System
- [x] Define Pydantic models for configuration
@@ -122,12 +121,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
- [x] Implement similarity calculation
- [x] Add lazy model loading
#### GitHub Actions Integration
- [x] Create action.yml template
- [x] Create workflow example
- [x] Document CI/CD integration
- [ ] Publish to GitHub Marketplace
---
### Testing & Quality
@@ -155,114 +148,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
---
## Cloud Version (Commercial)
### Cloud Phase 1: Infrastructure (Week 9-10)
#### Cloud Setup
- [ ] Set up AWS/GCP project
- [ ] Configure VPC and networking
- [ ] Set up PostgreSQL database
- [ ] Configure Redis for queue/cache
- [ ] Set up S3/GCS for storage
- [ ] Configure Docker/Kubernetes
#### Database Schema
- [ ] Create users table
- [ ] Create test_configs table
- [ ] Create test_runs table
- [ ] Create subscriptions table
- [ ] Set up migrations (Alembic)
#### Authentication
- [ ] Integrate Auth0/Clerk
- [ ] Implement JWT validation
- [ ] Create user management endpoints
- [ ] Add RBAC for team tier
---
### Cloud Phase 2: Backend (Week 10-12)
#### FastAPI Application
- [ ] Set up FastAPI project structure
- [ ] Implement auth middleware
- [ ] Create test management endpoints
- [ ] Create config management endpoints
- [ ] Create report endpoints
- [ ] Implement async job queue (Celery)
#### Gemini Integration
- [ ] Create GeminiMutationService
- [ ] Implement mutation generation
- [ ] Add fallback to GPU models
- [ ] Rate limiting and retry logic
#### Tier Limits
- [ ] Implement free tier limits (5 lifetime runs)
- [ ] Implement Pro tier limits (200/month)
- [ ] Implement Team tier limits (1000/month)
- [ ] Create usage tracking
---
### Cloud Phase 3: Frontend (Week 12-14)
#### Next.js Setup
- [ ] Initialize Next.js project
- [ ] Configure Tailwind CSS
- [ ] Set up authentication flow
- [ ] Create layout components
#### Dashboard Pages
- [ ] Dashboard home (overview)
- [ ] Tests list and creation
- [ ] Reports viewer
- [ ] Billing management
- [ ] Team management (Team tier)
- [ ] Settings page
#### Marketing Pages
- [ ] Landing page
- [ ] Pricing page
- [ ] Documentation
- [ ] Blog (optional)
---
### Cloud Phase 4: Billing (Week 14-15)
#### Stripe Integration
- [ ] Set up Stripe products/prices
- [ ] Implement subscription creation
- [ ] Handle subscription updates
- [ ] Implement webhook handlers
- [ ] Create invoice history
#### Email Notifications
- [ ] Set up SendGrid/Mailgun
- [ ] Test failure alerts
- [ ] Subscription notifications
- [ ] Welcome emails
---
### Cloud Phase 5: Testing & Launch (Week 15-16)
#### Testing
- [ ] E2E tests with Cypress/Playwright
- [ ] Load testing
- [ ] Security audit
- [ ] Performance optimization
#### Deployment
- [ ] Set up CI/CD pipeline
- [ ] Configure production environment
- [ ] Set up monitoring (Sentry, etc.)
- [ ] Launch to production
---
## Progress Summary
| Phase | Status | Completion |
@@ -273,10 +158,6 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
| CLI Phase 4: CLI & Reporting | ✅ Complete | 100% |
| CLI Phase 5: V2 Features | ✅ Complete | 90% |
| Documentation | ✅ Complete | 100% |
| Cloud Phase 1: Infrastructure | ⏳ Pending | 0% |
| Cloud Phase 2: Backend | ⏳ Pending | 0% |
| Cloud Phase 3: Frontend | ⏳ Pending | 0% |
| Cloud Phase 4: Billing | ⏳ Pending | 0% |
---
@@ -285,5 +166,4 @@ This document tracks the implementation progress of flakestorm - The Agent Relia
1. **Rust Build**: Compile and integrate Rust performance module
2. **Integration Tests**: Add full integration test suite
3. **PyPI Release**: Prepare and publish to PyPI
4. **Cloud Infrastructure**: Begin AWS/GCP setup
5. **Community Launch**: Publish to Hacker News and Reddit
4. **Community Launch**: Publish to Hacker News and Reddit

@@ -61,8 +61,7 @@ flakestorm/
│ └── main.py # Typer CLI commands
└── integrations/ # External integrations
├── huggingface.py # HuggingFace model support
├── embeddings.py # Local embeddings
└── github_actions.py # CI/CD integration
└── embeddings.py # Local embeddings
```
---

@@ -680,15 +680,7 @@ open reports/entropix_report_*.html
# ...
# Re-run tests
flakestorm run --ci --min-score 0.9
```
### Step 6: Add to CI/CD
```yaml
# .github/workflows/test.yml
- name: Run flakestorm
run: flakestorm run --ci --min-score 0.85
flakestorm run --min-score 0.9
```
---

@@ -16,9 +16,8 @@ This comprehensive guide walks you through using flakestorm to test your AI agen
6. [Running Tests](#running-tests)
7. [Understanding Results](#understanding-results)
8. [Integration Patterns](#integration-patterns)
9. [CI/CD Integration](#cicd-integration)
10. [Advanced Usage](#advanced-usage)
11. [Troubleshooting](#troubleshooting)
9. [Advanced Usage](#advanced-usage)
10. [Troubleshooting](#troubleshooting)
---
@@ -450,19 +449,6 @@ flakestorm run --quiet
flakestorm run --verbose
```
### CI/CD Mode
```bash
# Fail if score < 0.8
flakestorm run --ci --min-score 0.8
# Exit codes:
# 0 = Score meets threshold
# 1 = Score below threshold
# 2 = Configuration error
# 3 = Runtime error
```
### Individual Commands
```bash
@@ -630,103 +616,6 @@ agent = AgentExecutor(...)
---
## CI/CD Integration
### GitHub Actions
Create `.github/workflows/flakestorm.yml`:
```yaml
name: Agent Reliability Tests
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
reliability-test:
runs-on: ubuntu-latest
services:
ollama:
image: ollama/ollama
ports:
- 11434:11434
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install flakestorm
pip install -r requirements.txt
- name: Pull Ollama model
run: |
curl -X POST http://localhost:11434/api/pull \
-d '{"name": "qwen2.5-coder:7b"}'
- name: Start agent
run: |
python -m my_agent &
sleep 5 # Wait for startup
- name: Run flakestorm tests
run: |
flakestorm run --ci --min-score 0.8 --output json
- name: Upload report
uses: actions/upload-artifact@v4
if: always()
with:
name: flakestorm-report
path: reports/
```
### GitLab CI
```yaml
flakestorm-test:
image: python:3.11
services:
- name: ollama/ollama
alias: ollama
variables:
OLLAMA_HOST: "http://ollama:11434"
script:
- pip install flakestorm
- flakestorm run --ci --min-score 0.8
artifacts:
paths:
- reports/
when: always
```
### Pre-commit Hook
Add to `.pre-commit-config.yaml`:
```yaml
repos:
- repo: local
hooks:
- id: flakestorm
name: flakestorm Agent Tests
entry: flakestorm run --ci --min-score 0.8
language: system
pass_filenames: false
always_run: true
```
---
## Advanced Usage
### Custom Mutation Templates

@@ -118,12 +118,12 @@ def run(
min_score: float | None = typer.Option(
None,
"--min-score",
help="Minimum score to pass (for CI/CD)",
help="Minimum score to pass",
),
ci: bool = typer.Option(
False,
"--ci",
help="CI mode: exit with error if below min-score",
help="Exit with error code if score is below min-score",
),
verify_only: bool = typer.Option(
False,