flakestorm Implementation Checklist
This document tracks the implementation progress of flakestorm, the Agent Reliability Engine.
CLI Version (Open Source - Apache 2.0)
Phase 1: Foundation (Week 1-2)
Project Scaffolding
- Initialize Python project with pyproject.toml
- Set up Rust workspace with Cargo.toml
- Create Apache 2.0 LICENSE file
- Write comprehensive README.md
- Create flakestorm.yaml.example template
- Set up project structure (src/flakestorm/*)
- Configure pre-commit hooks (black, ruff, mypy)
- Set up GitHub Actions for CI/CD
Configuration System
- Define Pydantic models for configuration
- Implement YAML loading/validation
- Support environment variable expansion
- Create configuration factory functions
- Add configuration validation tests
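The checklist does not say which placeholder syntax `flakestorm.yaml` uses for environment variables. A minimal sketch, assuming `${VAR}` with an optional `${VAR:-default}` fallback, applied to the parsed YAML dict before it is handed to the Pydantic models; `expand_env` is a hypothetical helper name.

```python
import os
import re
from typing import Any

# Matches ${VAR} or ${VAR:-default}. The ":-default" form is an assumed convention.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value: Any) -> Any:
    """Recursively expand ${VAR} references in every string of a parsed config."""
    if isinstance(value, str):
        def _substitute(match: re.Match) -> str:
            name, default = match.group(1), match.group(2)
            if name in os.environ:
                return os.environ[name]
            if default is not None:
                return default
            # Fail loudly instead of silently sending an empty endpoint or key.
            raise KeyError(f"environment variable {name!r} is not set")

        return _ENV_PATTERN.sub(_substitute, value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


if __name__ == "__main__":
    os.environ["AGENT_URL"] = "http://localhost:8000/invoke"
    os.environ.pop("AGENT_TIMEOUT", None)
    raw = {"agent": {"endpoint": "${AGENT_URL}", "timeout": "${AGENT_TIMEOUT:-30}"}}
    print(expand_env(raw))
```

Expanding before validation keeps the Pydantic models free of environment logic, so the same models also validate configs built in code.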
Agent Protocol/Adapter
- Define AgentProtocol interface
- Implement HTTPAgentAdapter
- Implement PythonAgentAdapter
- Implement LangChainAgentAdapter
- Create adapter factory function
- Add retry logic for HTTP adapter
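The adapter items above can be pictured as one small protocol plus wrappers. A minimal sketch, assuming the protocol exposes a single async `invoke(prompt) -> str` method; the method name and the `invoke_with_retry` helper are hypothetical, and the retry wrapper is shown generically rather than inside an HTTP adapter, so no HTTP client library is needed.

```python
import asyncio
import random
from typing import Awaitable, Callable, Protocol


class AgentProtocol(Protocol):
    """Anything that can answer a prompt asynchronously (assumed shape)."""

    async def invoke(self, prompt: str) -> str: ...


class PythonAgentAdapter:
    """Wraps a plain async callable so it satisfies AgentProtocol."""

    def __init__(self, func: Callable[[str], Awaitable[str]]) -> None:
        self._func = func

    async def invoke(self, prompt: str) -> str:
        return await self._func(prompt)


async def invoke_with_retry(
    agent: AgentProtocol,
    prompt: str,
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, asyncio.TimeoutError),
) -> str:
    """Call agent.invoke, retrying transient errors with exponential backoff and jitter."""
    for attempt in range(retries + 1):
        try:
            return await agent.invoke(prompt)
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error to the runner
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

Only transient errors are retried; an agent that returns a wrong answer is a test result, not something to retry away.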
Phase 2: Mutation Engine (Week 2-3)
Ollama Integration
- Create MutationEngine class
- Implement Ollama client wrapper
- Add connection verification
- Support async mutation generation
- Implement batch generation
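Connection verification can be as small as one request to Ollama's HTTP API, which listens on port 11434 by default and lists installed models at `GET /api/tags`. A minimal sketch using only the standard library; `verify_ollama` is a hypothetical name, not the project's actual wrapper.

```python
import json
import urllib.request
from typing import Optional


def verify_ollama(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> Optional[list[str]]:
    """Return the names of locally installed models, or None if Ollama is unreachable."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):  # connection refused, timeout, HTTP error, bad JSON
        return None
    if not isinstance(payload, dict):
        return None
    # /api/tags responds with {"models": [{"name": ...}, ...]}
    return [model.get("name", "") for model in payload.get("models", [])]
```

A caller can then distinguish "Ollama is not running" (`None`, suggest `ollama serve`) from "the configured model is not pulled" (name missing from the list).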
Mutation Types & Templates
- Define MutationType enum
- Create Mutation dataclass
- Write templates for PARAPHRASE
- Write templates for NOISE
- Write templates for TONE_SHIFT
- Write templates for PROMPT_INJECTION
- Add mutation validation logic
- Support custom templates
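The enum, dataclass, templates, validation, and custom-template items fit together roughly as below. A minimal sketch: the four type names come from the checklist, but the template wording, the `weight` field, `is_valid`, and `build_prompt` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MutationType(str, Enum):
    PARAPHRASE = "paraphrase"
    NOISE = "noise"
    TONE_SHIFT = "tone_shift"
    PROMPT_INJECTION = "prompt_injection"


# Hypothetical instructions sent to the local LLM; {prompt} is the seed input.
TEMPLATES: dict[MutationType, str] = {
    MutationType.PARAPHRASE: "Rewrite this request with different wording but the same meaning:\n{prompt}",
    MutationType.NOISE: "Rewrite this request with realistic typos and filler words:\n{prompt}",
    MutationType.TONE_SHIFT: "Rewrite this request in an impatient, frustrated tone:\n{prompt}",
    MutationType.PROMPT_INJECTION: "Append an instruction to this request that tries to override the assistant's rules:\n{prompt}",
}


@dataclass(frozen=True)
class Mutation:
    original: str
    mutated: str
    mutation_type: MutationType
    weight: float = 1.0

    def is_valid(self) -> bool:
        """Reject empty outputs and no-op mutations that merely echo the original."""
        text = self.mutated.strip()
        return bool(text) and text != self.original.strip()


def build_prompt(
    mutation_type: MutationType,
    prompt: str,
    custom_templates: Optional[dict[MutationType, str]] = None,
) -> str:
    """Fill the template for a mutation type; user-supplied templates override the defaults."""
    templates = {**TEMPLATES, **(custom_templates or {})}
    return templates[mutation_type].format(prompt=prompt)
```

Subclassing `str` lets the enum round-trip through YAML and JSON as plain strings such as `"tone_shift"`.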
Rust Performance Bindings
- Set up PyO3 bindings
- Implement robustness score calculation
- Implement weighted score calculation
- Implement Levenshtein distance
- Implement parallel processing utilities
- Build and test Rust module
- Integrate with Python package
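A pure-Python reference for the Rust functions is useful both as a fallback when the extension is not built and as an oracle for testing the PyO3 module. This is a minimal sketch; the score definitions (plain pass rate, and weight of passed mutations over total weight) are assumptions, since the checklist does not define them.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Two-row dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; keep the shorter one as the row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def robustness_score(passed: Sequence[bool]) -> float:
    """Fraction of mutations that passed all invariants (assumed definition)."""
    # An empty run scores 0.0 so a misconfigured run can never pass a CI gate.
    return sum(passed) / len(passed) if passed else 0.0


def weighted_score(passed: Sequence[bool], weights: Sequence[float]) -> float:
    """Weight of passed mutations divided by total weight (assumed definition)."""
    if len(passed) != len(weights):
        raise ValueError("passed and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        return 0.0
    return sum(w for ok, w in zip(passed, weights) if ok) / total
```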
Phase 3: Runner & Assertions (Week 3-4)
Async Runner
- Create EntropixRunner class
- Implement orchestrator logic
- Add concurrency control with semaphores
- Implement progress tracking
- Add setup verification
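Concurrency control with a semaphore bounds how many agent calls are in flight while still launching every mutation at once. A minimal sketch of that pattern with `asyncio.Semaphore`; `run_all`, `RunResult`, and the `on_progress` callback are hypothetical stand-ins for the runner's internals.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class RunResult:
    prompt: str
    response: Optional[str]
    error: Optional[str]


async def run_all(
    prompts: list[str],
    invoke: Callable[[str], Awaitable[str]],
    concurrency: int = 5,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list[RunResult]:
    semaphore = asyncio.Semaphore(concurrency)
    done = 0

    async def run_one(prompt: str) -> RunResult:
        nonlocal done
        async with semaphore:  # at most `concurrency` agent calls run at the same time
            try:
                result = RunResult(prompt, await invoke(prompt), None)
            except Exception as exc:  # record the failure instead of aborting the run
                result = RunResult(prompt, None, repr(exc))
        done += 1
        if on_progress is not None:
            on_progress(done, len(prompts))  # e.g. advance a progress bar
        return result

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(run_one(p) for p in prompts))
```

Catching exceptions per call means one crashing mutation shows up as a failed result in the report rather than cancelling the whole run.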
Invariant System
- Create InvariantVerifier class
- Implement ContainsChecker
- Implement LatencyChecker
- Implement ValidJsonChecker
- Implement RegexChecker
- Implement SimilarityChecker
- Implement ExcludesPIIChecker
- Implement RefusalChecker
- Add checker registry
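A checker registry maps the invariant `type` named in the config to the code that checks it. A minimal sketch covering four of the checkers listed above; the checklist describes them as classes, while this sketch uses plain functions for brevity, and the config keys (`type`, `value`, `max_ms`, `pattern`) are hypothetical.

```python
import json
import re
from typing import Callable, Dict

# A checker receives the agent response, its latency, and its invariant config.
Checker = Callable[[str, float, dict], bool]
CHECKERS: Dict[str, Checker] = {}


def register(name: str) -> Callable[[Checker], Checker]:
    """Decorator that adds a checker to the registry under `name`."""
    def decorator(func: Checker) -> Checker:
        CHECKERS[name] = func
        return func
    return decorator


@register("contains")
def contains(response: str, latency_ms: float, cfg: dict) -> bool:
    return cfg["value"] in response


@register("latency")
def latency(response: str, latency_ms: float, cfg: dict) -> bool:
    return latency_ms <= cfg["max_ms"]


@register("valid_json")
def valid_json(response: str, latency_ms: float, cfg: dict) -> bool:
    try:
        json.loads(response)
    except ValueError:
        return False
    return True


@register("regex")
def regex(response: str, latency_ms: float, cfg: dict) -> bool:
    return re.search(cfg["pattern"], response) is not None


def verify(response: str, latency_ms: float, invariants: list[dict]) -> list[tuple[str, bool]]:
    """Run every configured invariant and return (type, passed) pairs in config order."""
    results = []
    for invariant in invariants:
        kind = invariant["type"]
        if kind not in CHECKERS:
            raise ValueError(f"unknown invariant type: {kind!r}")
        results.append((kind, CHECKERS[kind](response, latency_ms, invariant)))
    return results
```

New checkers (similarity, PII, refusal) plug in with one decorator and need no change to `verify`.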
Phase 4: CLI & Reporting (Week 4-5)
CLI Commands
- Set up Typer application
- Implement `flakestorm init` command
- Implement `flakestorm run` command
- Implement `flakestorm verify` command
- Implement `flakestorm report` command
- Implement `flakestorm score` command
- Add CI mode (`--ci --min-score`)
- Add rich progress bars
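CI mode boils down to turning the final score into a process exit code that fails the pipeline. The checklist builds the CLI with Typer; to keep this sketch dependency-free it parses the two flags with `argparse` instead. The `0.0` default and the 0 to 1 score scale are assumptions, and `parse_run_args`/`ci_exit_code` are hypothetical names.

```python
import argparse
from typing import Optional, Sequence


def parse_run_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="flakestorm run")
    parser.add_argument("--ci", action="store_true",
                        help="non-interactive mode with a pass/fail exit code")
    parser.add_argument("--min-score", type=float, default=0.0,
                        help="minimum robustness score (0-1, assumed scale) required to pass")
    return parser.parse_args(argv)


def ci_exit_code(score: float, min_score: float) -> int:
    """0 when the score meets the threshold, 1 otherwise, so the CI job fails."""
    return 0 if score >= min_score else 1
```

In the run command this would end with `sys.exit(ci_exit_code(score, args.min_score))` when `args.ci` is set, and skip the progress bars in that mode.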
Report Generation
- Create report data models
- Implement HTMLReportGenerator
- Create interactive HTML template
- Implement JSONReportGenerator
- Implement TerminalReporter
- Add score visualization
- Add mutation matrix view
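The report data models can be plain dataclasses that serialize straight to JSON, with the mutation matrix derived from the results. A minimal sketch; the field names are hypothetical and do not describe flakestorm's actual report schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class MutationResult:
    mutation_type: str
    prompt: str
    passed: bool
    latency_ms: float
    failed_checks: list[str] = field(default_factory=list)


@dataclass
class Report:
    score: float
    results: list[MutationResult]
    generated_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def matrix(self) -> dict[str, dict[str, int]]:
        """Pass/fail counts per mutation type: the data behind a mutation matrix view."""
        counts: dict[str, dict[str, int]] = {}
        for result in self.results:
            bucket = counts.setdefault(result.mutation_type, {"passed": 0, "failed": 0})
            bucket["passed" if result.passed else "failed"] += 1
        return counts

    def to_json(self) -> str:
        data = asdict(self)  # recursively converts the nested MutationResult objects
        data["matrix"] = self.matrix()
        return json.dumps(data, indent=2)
```

Keeping one data model lets the HTML, JSON, and terminal reporters render the same structure.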
Phase 5: V2 Features (Week 5-7)
HuggingFace Integration
- Create HuggingFaceModelProvider
- Support GGUF model downloading
- Add recommended models list
- Integrate with Ollama model importing
Vector Similarity
- Create LocalEmbedder class
- Integrate sentence-transformers
- Implement similarity calculation
- Add lazy model loading
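Lazy loading defers the heavy sentence-transformers import and model download until a similarity invariant is actually evaluated. A minimal sketch: `SentenceTransformer(name).encode(...)` is the library's real entry point, but the default model name `all-MiniLM-L6-v2` and the method names on `LocalEmbedder` are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return float(dot / norm) if norm else 0.0


class LocalEmbedder:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2") -> None:
        self.model_name = model_name  # assumed default model
        self._model = None  # not loaded at construction time

    def _load(self):
        if self._model is None:
            # Deferred import: sentence-transformers (and torch) load only on first use,
            # so runs without similarity invariants start fast.
            from sentence_transformers import SentenceTransformer
            self._model = SentenceTransformer(self.model_name)
        return self._model

    def similarity(self, text_a: str, text_b: str) -> float:
        vec_a, vec_b = self._load().encode([text_a, text_b])
        return cosine_similarity(vec_a, vec_b)
```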
GitHub Actions Integration
- Create action.yml template
- Create workflow example
- Document CI/CD integration
- Publish to GitHub Marketplace
Testing & Quality
Unit Tests
- Test configuration loading
- Test mutation types
- Test assertion checkers
- Test agent adapters
- Test orchestrator
- Test report generation
Integration Tests
- Test full run with mock agent
- Test CLI commands
- Test report generation
Documentation
- Write README.md
- Create IMPLEMENTATION_CHECKLIST.md
- Create ARCHITECTURE_SUMMARY.md
- Create API_SPECIFICATION.md
- Create CONTRIBUTING.md
- Create CONFIGURATION_GUIDE.md
Cloud Version (Commercial)
Cloud Phase 1: Infrastructure (Week 9-10)
Cloud Setup
- Set up AWS/GCP project
- Configure VPC and networking
- Set up PostgreSQL database
- Configure Redis for queue/cache
- Set up S3/GCS for storage
- Configure Docker/Kubernetes
Database Schema
- Create users table
- Create test_configs table
- Create test_runs table
- Create subscriptions table
- Set up migrations (Alembic)
Authentication
- Integrate Auth0/Clerk
- Implement JWT validation
- Create user management endpoints
- Add RBAC for team tier
Cloud Phase 2: Backend (Week 10-12)
FastAPI Application
- Set up FastAPI project structure
- Implement auth middleware
- Create test management endpoints
- Create config management endpoints
- Create report endpoints
- Implement async job queue (Celery)
Gemini Integration
- Create GeminiMutationService
- Implement mutation generation
- Add fallback to GPU models
- Rate limiting and retry logic
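The checklist names a `GeminiMutationService` with fallback to GPU models but not how providers are ordered. A minimal sketch, assuming each provider is an async callable that returns a list of mutated prompts; `generate_with_fallback` and the provider names are hypothetical, and no Gemini SDK calls are shown.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

MutationGenerator = Callable[[str], Awaitable[list[str]]]


async def generate_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, MutationGenerator]],
) -> tuple[str, list[str]]:
    """Try providers in order; return (provider_name, mutations) from the first that succeeds."""
    errors = []
    for name, generate in providers:
        try:
            return name, await generate(prompt)
        except Exception as exc:  # e.g. quota exhausted, rate limited, timeout
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all mutation providers failed: " + "; ".join(errors))
```

Returning the provider name lets usage tracking and reports record which backend produced each batch.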
Tier Limits
- Implement Free tier limits (5 lifetime runs)
- Implement Pro tier limits (200/month)
- Implement Team tier limits (1000/month)
- Create usage tracking
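The three tier limits differ in their window: Free counts every run ever, while Pro and Team reset monthly. A minimal sketch of the check before a run starts, using the numbers from the checklist; the calendar-month window is an assumption (resetting on the billing-cycle date is equally plausible), and the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Limits from the checklist: Free = 5 runs total, Pro = 200/month, Team = 1000/month.
TIER_LIMITS = {
    "free": (5, "lifetime"),
    "pro": (200, "month"),
    "team": (1000, "month"),
}


def runs_in_window(run_times: list[datetime], window: str, now: datetime) -> int:
    if window == "lifetime":
        return len(run_times)
    # Assumed calendar-month window in UTC.
    return sum(1 for t in run_times if (t.year, t.month) == (now.year, now.month))


def can_start_run(tier: str, run_times: list[datetime], now: Optional[datetime] = None) -> bool:
    """True if the user's tier still allows another run in the current window."""
    limit, window = TIER_LIMITS[tier]
    now = now or datetime.now(timezone.utc)
    return runs_in_window(run_times, window, now) < limit
```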
Cloud Phase 3: Frontend (Week 12-14)
Next.js Setup
- Initialize Next.js project
- Configure Tailwind CSS
- Set up authentication flow
- Create layout components
Dashboard Pages
- Dashboard home (overview)
- Tests list and creation
- Reports viewer
- Billing management
- Team management (Team tier)
- Settings page
Marketing Pages
- Landing page
- Pricing page
- Documentation
- Blog (optional)
Cloud Phase 4: Billing (Week 14-15)
Stripe Integration
- Set up Stripe products/prices
- Implement subscription creation
- Handle subscription updates
- Implement webhook handlers
- Create invoice history
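Webhook handlers must verify that events really come from Stripe before acting on them. The official `stripe` package does this in `stripe.Webhook.construct_event(payload, sig_header, secret)`; the standard-library sketch below only shows the mechanism (HMAC-SHA256 over `"<timestamp>.<raw body>"`, compared with the `v1` signatures in the `Stripe-Signature` header, plus a replay window). The function name is hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(
    payload: bytes,
    sig_header: str,
    secret: str,
    tolerance: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Check a header like "t=<unix ts>,v1=<hex hmac>" against the raw request body."""
    parts: dict[str, list[str]] = {}
    for item in sig_header.split(","):
        key, _, value = item.strip().partition("=")
        parts.setdefault(key, []).append(value)
    if "t" not in parts:
        return False
    raw_ts = parts["t"][0]
    try:
        timestamp = int(raw_ts)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance:
        return False  # stale or future-dated event: possible replay
    signed_payload = raw_ts.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time compare against every v1 signature (several appear during secret rotation).
    return any(hmac.compare_digest(expected, sig) for sig in parts.get("v1", []))
```

The payload must be the raw request bytes; re-serializing parsed JSON changes the bytes and breaks the signature.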
Email Notifications
- Set up SendGrid/Mailgun
- Test-failure alerts
- Subscription notifications
- Welcome emails
Cloud Phase 5: Testing & Launch (Week 15-16)
Testing
- E2E tests with Cypress/Playwright
- Load testing
- Security audit
- Performance optimization
Deployment
- Set up CI/CD pipeline
- Configure production environment
- Set up monitoring (Sentry, etc.)
- Launch to production
Progress Summary
| Phase | Status | Completion |
|---|---|---|
| CLI Phase 1: Foundation | ✅ Complete | 100% |
| CLI Phase 2: Mutation Engine | ✅ Complete | 100% |
| CLI Phase 3: Runner & Assertions | ✅ Complete | 100% |
| CLI Phase 4: CLI & Reporting | ✅ Complete | 100% |
| CLI Phase 5: V2 Features | ✅ Complete | 90% |
| Documentation | ✅ Complete | 100% |
| Cloud Phase 1: Infrastructure | ⏳ Pending | 0% |
| Cloud Phase 2: Backend | ⏳ Pending | 0% |
| Cloud Phase 3: Frontend | ⏳ Pending | 0% |
| Cloud Phase 4: Billing | ⏳ Pending | 0% |
Next Steps
- Rust Build: Compile and integrate Rust performance module
- Integration Tests: Add full integration test suite
- PyPI Release: Prepare and publish to PyPI
- Cloud Infrastructure: Begin AWS/GCP setup
- Community Launch: Publish to Hacker News and Reddit