flakestorm Implementation Checklist
This document tracks the implementation progress of flakestorm - The Agent Reliability Engine.
CLI Version (Open Source - Apache 2.0)
Phase 1: Foundation (Week 1-2)
Project Scaffolding
- Initialize Python project with pyproject.toml
- Set up Rust workspace with Cargo.toml
- Create Apache 2.0 LICENSE file
- Write comprehensive README.md
- Create flakestorm.yaml.example template
- Set up project structure (src/flakestorm/*)
- Configure pre-commit hooks (black, ruff, mypy)
Configuration System
- Define Pydantic models for configuration
- Implement YAML loading/validation
- Support environment variable expansion
- Create configuration factory functions
- Add configuration validation tests
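The environment variable expansion item can be illustrated with a short sketch. This is not flakestorm's actual loader: the `${VAR}` and `${VAR:-default}` placeholder syntax, the `expand_env` helper name, and the config keys in the usage note are all assumptions made for illustration. The function walks an already-parsed config tree (for example the result of loading YAML) and substitutes placeholders in every string.

```python
import os
import re
from typing import Any, Mapping, Optional

# Assumed placeholder syntax: ${NAME} or ${NAME:-default}
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively expand placeholders in a parsed config tree."""
    env = os.environ if env is None else env

    def _substitute(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Fail loudly rather than silently leaving a placeholder in place.
        raise KeyError(f"environment variable {name!r} is not set")

    if isinstance(value, str):
        return _ENV_PATTERN.sub(_substitute, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

For example, `expand_env({"endpoint": "${AGENT_URL}"}, {"AGENT_URL": "http://localhost:8000"})` yields a dict whose `endpoint` is the expanded URL. Expanding after parsing, rather than on the raw file text, keeps substituted values from altering the YAML structure.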
Agent Protocol/Adapter
- Define AgentProtocol interface
- Implement HTTPAgentAdapter
- Implement PythonAgentAdapter
- Implement LangChainAgentAdapter
- Create adapter factory function
- Add retry logic for HTTP adapter
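A rough picture of the adapter interface and the retry item, under stated assumptions: the `invoke` method name, the `with_retries` helper, and the choice of exponential backoff are hypothetical and are not taken from flakestorm's source. The sketch uses only the standard library, so the HTTP call itself is left to the caller.

```python
import asyncio
from typing import Awaitable, Callable, Protocol


class AgentProtocol(Protocol):
    """Assumed shape: an agent takes a prompt and returns text."""

    async def invoke(self, prompt: str) -> str: ...


async def with_retries(
    call: Callable[[], Awaitable[str]],
    attempts: int = 3,
    base_delay: float = 0.1,
) -> str:
    """Retry an async call, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await call()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

An HTTP adapter could wrap its request in `with_retries(lambda: self._post(prompt))`, where `_post` is a hypothetical method that performs the request. Only transient errors are retried here; anything else propagates immediately.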
Phase 2: Mutation Engine (Week 2-3)
Ollama Integration
- Create MutationEngine class
- Implement Ollama client wrapper
- Add connection verification
- Support async mutation generation
- Implement batch generation
Mutation Types & Templates
- Define MutationType enum
- Create Mutation dataclass
- Write templates for PARAPHRASE
- Write templates for NOISE
- Write templates for TONE_SHIFT
- Write templates for PROMPT_INJECTION
- Add mutation validation logic
- Support custom templates
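The enum and dataclass items above might look roughly like this. Only the four type names come from this checklist; the string values, the field names, and the `is_valid` rule (reject empty or unchanged mutations) are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class MutationType(str, Enum):
    # Names from the Phase 2 checklist; the string values are assumed.
    PARAPHRASE = "paraphrase"
    NOISE = "noise"
    TONE_SHIFT = "tone_shift"
    PROMPT_INJECTION = "prompt_injection"


@dataclass(frozen=True)
class Mutation:
    original: str
    mutated: str
    type: MutationType

    def is_valid(self) -> bool:
        """Reject empty mutations and ones identical to the original."""
        text = self.mutated.strip()
        return bool(text) and text != self.original.strip()
```

Subclassing `str` lets enum members round-trip through YAML or JSON as plain strings, for example `MutationType("tone_shift")`.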
Rust Performance Bindings
- Set up PyO3 bindings
- Implement robustness score calculation
- Implement weighted score calculation
- Implement Levenshtein distance
- Implement parallel processing utilities
- Build and test Rust module
- Integrate with Python package
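The checklist places these routines in Rust behind PyO3. As a reference for what they compute, here is a pure-Python sketch rather than the Rust code itself. The Levenshtein function is the standard dynamic-programming algorithm; the weighted score formula (passed weight divided by total weight) is an assumption, since the checklist does not define it.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rolling rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def weighted_score(results: Sequence[Tuple[bool, float]]) -> float:
    """Assumed formula: sum of passed weights over total weight."""
    total = sum(weight for _, weight in results)
    if total == 0:
        return 0.0
    return sum(weight for passed, weight in results if passed) / total
```

A Python fallback of this kind is also useful as a test oracle for the Rust module: both implementations should agree on every input.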
Phase 3: Runner & Assertions (Week 3-4)
Async Runner
- Create FlakeStormRunner class
- Implement orchestrator logic
- Add concurrency control with semaphores
- Implement progress tracking
- Add setup verification
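Concurrency control with semaphores can be sketched with `asyncio.Semaphore`. The function name `run_bounded` and its parameters are hypothetical and do not describe the actual FlakeStormRunner API; the point is the pattern of acquiring the semaphore around each agent call.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bounded(
    prompts: List[str],
    invoke: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[str]:
    """Invoke the agent on every prompt, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def _one(prompt: str) -> str:
        async with semaphore:  # waits here while the limit is reached
            return await invoke(prompt)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(_one(prompt) for prompt in prompts))
```

All tasks are created up front, but only `max_concurrency` of them are inside `invoke` at any moment, which protects a local model or a rate-limited endpoint from being flooded.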
Invariant System
- Create InvariantVerifier class
- Implement ContainsChecker
- Implement LatencyChecker
- Implement ValidJsonChecker
- Implement RegexChecker
- Implement SimilarityChecker
- Implement ExcludesPIIChecker
- Implement RefusalChecker
- Add checker registry
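One way to picture the checker registry is a name-to-function mapping filled by a decorator. Everything here is illustrative: the registry keys (`contains`, `regex`, `valid_json`), the config fields (`value`, `pattern`), and the `verify` helper are assumptions, and the checklist's checkers are classes whereas plain functions are used below for brevity.

```python
import json
import re
from typing import Callable, Dict, List

# A checker receives the agent's response and its own config, returns pass/fail.
Checker = Callable[[str, dict], bool]
CHECKERS: Dict[str, Checker] = {}


def register(name: str) -> Callable[[Checker], Checker]:
    """Decorator that files a checker under the given invariant type."""
    def decorator(fn: Checker) -> Checker:
        CHECKERS[name] = fn
        return fn
    return decorator


@register("contains")
def contains(response: str, config: dict) -> bool:
    return config["value"].lower() in response.lower()


@register("regex")
def regex(response: str, config: dict) -> bool:
    return re.search(config["pattern"], response) is not None


@register("valid_json")
def valid_json(response: str, config: dict) -> bool:
    try:
        json.loads(response)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    return True


def verify(response: str, invariants: List[dict]) -> Dict[str, bool]:
    """Run each configured invariant and report pass/fail by type."""
    return {inv["type"]: CHECKERS[inv["type"]](response, inv) for inv in invariants}
```

With a registry, adding a new invariant type means registering one more function; the verifier loop does not change.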
Phase 4: CLI & Reporting (Week 4-5)
CLI Commands
- Set up Typer application
- Implement `flakestorm init` command
- Implement `flakestorm run` command
- Implement `flakestorm verify` command
- Implement `flakestorm report` command
- Implement `flakestorm score` command
- Add CI mode (--ci --min-score)
- Add rich progress bars
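The CI mode item boils down to turning a score into a process exit code so a pipeline can fail the build. A minimal sketch, assuming the score is a fraction between 0 and 1 and that a non-zero exit code signals failure; the function name `ci_exit_code` is hypothetical, and the Typer wiring is omitted.

```python
def ci_exit_code(score: float, min_score: float) -> int:
    """Return 0 when the score meets the threshold, 1 otherwise."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0 and 1 (assumed scale)")
    return 0 if score >= min_score else 1
```

A command handler would pass the result to `sys.exit`, so that a run below the `--min-score` threshold stops the pipeline.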
Report Generation
- Create report data models
- Implement HTMLReportGenerator
- Create interactive HTML template
- Implement JSONReportGenerator
- Implement TerminalReporter
- Add score visualization
- Add mutation matrix view
Phase 5: V2 Features (Week 5-7)
HuggingFace Integration
- Create HuggingFaceModelProvider
- Support GGUF model downloading
- Add recommended models list
- Integrate with Ollama model importing
Vector Similarity
- Create LocalEmbedder class
- Integrate sentence-transformers
- Implement similarity calculation
- Add lazy model loading
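The similarity calculation and lazy loading items can be sketched without the real model. Cosine similarity is a common choice for comparing sentence embeddings, though the checklist does not say which metric flakestorm uses. The `LazyEmbedder` class and its `loader` argument are hypothetical stand-ins for a class that would load a sentence-transformers model on first use.

```python
import math
from typing import Callable, List, Optional, Sequence

EmbedFn = Callable[[str], List[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LazyEmbedder:
    """Defers the expensive model load until the first embed() call."""

    def __init__(self, loader: Callable[[], EmbedFn]) -> None:
        self._loader = loader
        self._model: Optional[EmbedFn] = None

    def embed(self, text: str) -> List[float]:
        if self._model is None:
            self._model = self._loader()  # runs once, on first use
        return self._model(text)
```

Lazy loading keeps commands that never need embeddings fast, since the model is only read from disk when a similarity check actually runs.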
Testing & Quality
Unit Tests
- Test configuration loading
- Test mutation types
- Test assertion checkers
- Test agent adapters
- Test orchestrator
- Test report generation
Integration Tests
- Test full run with mock agent
- Test CLI commands
- Test report generation
Documentation
- Write README.md
- Create IMPLEMENTATION_CHECKLIST.md
- Create ARCHITECTURE_SUMMARY.md
- Create API_SPECIFICATION.md
- Create CONTRIBUTING.md
- Create CONFIGURATION_GUIDE.md
Phase 6: Essential Mutations (Week 7-8)
Core Mutation Types
- Add ENCODING_ATTACKS mutation type
- Add CONTEXT_MANIPULATION mutation type
- Add LENGTH_EXTREMES mutation type
- Update MutationType enum with all 8 types
- Create templates for new mutation types
- Update mutation validation for edge cases
Configuration Updates
- Update MutationConfig defaults
- Update example configuration files
- Update orchestrator comments
Documentation Updates
- Update README.md with comprehensive mutation types table
- Add Mutation Strategy section to README
- Update API_SPECIFICATION.md with all 8 types
- Update MODULES.md with detailed mutation documentation
- Add Mutation Types Guide to CONFIGURATION_GUIDE.md
- Add Understanding Mutation Types to USAGE_GUIDE.md
- Add Mutation Type Deep Dive to TEST_SCENARIOS.md
Progress Summary
| Phase | Status | Completion |
|---|---|---|
| CLI Phase 1: Foundation | ✅ Complete | 100% |
| CLI Phase 2: Mutation Engine | ✅ Complete | 100% |
| CLI Phase 3: Runner & Assertions | ✅ Complete | 100% |
| CLI Phase 4: CLI & Reporting | ✅ Complete | 100% |
| CLI Phase 5: V2 Features | 🚧 Nearly complete | 90% |
| CLI Phase 6: Essential Mutations | ✅ Complete | 100% |
| Documentation | ✅ Complete | 100% |
Next Steps
Immediate (Current Sprint)
- Rust Build: Compile and integrate Rust performance module
- Integration Tests: Add full integration test suite
- PyPI Release: Prepare and publish to PyPI
- Community Launch: Announce on Hacker News and Reddit
Future Roadmap
See ROADMAP.md for the full roadmap of advanced chaos engineering and adversarial testing features. These features are open for community contribution; see CONTRIBUTING.md for how to get involved.