# flakestorm Implementation Checklist

This document tracks the implementation progress of flakestorm - The Agent Reliability Engine.

## CLI Version (Open Source - Apache 2.0)

### Phase 1: Foundation (Week 1-2)

#### Project Scaffolding
- [x] Initialize Python project with pyproject.toml
- [x] Set up Rust workspace with Cargo.toml
- [x] Create Apache 2.0 LICENSE file
- [x] Write comprehensive README.md
- [x] Create flakestorm.yaml.example template
- [x] Set up project structure (src/flakestorm/*)
- [x] Configure pre-commit hooks (black, ruff, mypy)

#### Configuration System
- [x] Define Pydantic models for configuration
- [x] Implement YAML loading/validation
- [x] Support environment variable expansion
- [x] Create configuration factory functions
- [x] Add configuration validation tests
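
For readers unfamiliar with the "environment variable expansion" item, the sketch below shows one way such expansion could work on already-parsed configuration data. It is illustrative only: the `${VAR}` and `${VAR:-default}` placeholder syntax, the `expand_env` name, and the sample keys are assumptions, not flakestorm's actual implementation.

```python
import os
import re

# Assumed placeholder syntax: ${VAR} or ${VAR:-default}.
# flakestorm's real syntax may differ; this is a hypothetical sketch.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value, env=None):
    """Recursively expand placeholders in strings, dicts, and lists."""
    env = os.environ if env is None else env

    def _replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Fail loudly rather than silently leaving a placeholder in place.
        raise KeyError(f"environment variable {name!r} is not set")

    if isinstance(value, str):
        return _ENV_PATTERN.sub(_replace, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value
```

Running the expansion after YAML parsing but before model validation would let the validated configuration see only concrete values; whether flakestorm orders the steps this way is not stated here.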

#### Agent Protocol/Adapter
- [x] Define AgentProtocol interface
- [x] Implement HTTPAgentAdapter
- [x] Implement PythonAgentAdapter
- [x] Implement LangChainAgentAdapter
- [x] Create adapter factory function
- [x] Add retry logic for HTTP adapter

---

### Phase 2: Mutation Engine (Week 2-3)

#### Ollama Integration
- [x] Create MutationEngine class
- [x] Implement Ollama client wrapper
- [x] Add connection verification
- [x] Support async mutation generation
- [x] Implement batch generation

#### Mutation Types & Templates
- [x] Define MutationType enum
- [x] Create Mutation dataclass
- [x] Write templates for PARAPHRASE
- [x] Write templates for NOISE
- [x] Write templates for TONE_SHIFT
- [x] Write templates for PROMPT_INJECTION
- [x] Add mutation validation logic
- [x] Support custom templates

#### Rust Performance Bindings
- [x] Set up PyO3 bindings
- [x] Implement robustness score calculation
- [x] Implement weighted score calculation
- [x] Implement Levenshtein distance
- [x] Implement parallel processing utilities
- [x] Build and test Rust module
- [x] Integrate with Python package
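
As a point of reference for the Levenshtein item, here is a plain-Python version of the standard dynamic-programming algorithm. It is not the project's Rust code; the function name is hypothetical, and a sketch like this could serve as a slow but readable cross-check when testing a native binding.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]
```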

---

### Phase 3: Runner & Assertions (Week 3-4)

#### Async Runner
- [x] Create EntropixRunner class
- [x] Implement orchestrator logic
- [x] Add concurrency control with semaphores
- [x] Implement progress tracking
- [x] Add setup verification
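
The "concurrency control with semaphores" item can be pictured with the minimal `asyncio` sketch below. The `run_bounded` helper, its parameters, and the `worker` callable are hypothetical names chosen for illustration; the actual runner may be structured differently.

```python
import asyncio


async def run_bounded(items, worker, max_concurrency=4):
    """Run worker(item) for every item with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def _guarded(item):
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the input items.
    return await asyncio.gather(*(_guarded(item) for item in items))
```

A call might look like `asyncio.run(run_bounded(prompts, call_agent, max_concurrency=10))`, where `prompts` and `call_agent` are placeholders for the mutated inputs and an async agent call.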

#### Invariant System
- [x] Create InvariantVerifier class
- [x] Implement ContainsChecker
- [x] Implement LatencyChecker
- [x] Implement ValidJsonChecker
- [x] Implement RegexChecker
- [x] Implement SimilarityChecker
- [x] Implement ExcludesPIIChecker
- [x] Implement RefusalChecker
- [x] Add checker registry
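
A checker registry is commonly a mapping from a configured name to the function or class that evaluates it. The sketch below shows that pattern with two toy checkers. Every name in it (`CHECKERS`, `register_checker`, `run_check`, and the `"contains"` and `"valid_json"` keys) is an assumption made for illustration and may not match flakestorm's real classes.

```python
import json
from typing import Callable, Dict

# Hypothetical registry: checker name -> function returning True on pass.
CHECKERS: Dict[str, Callable[..., bool]] = {}


def register_checker(name: str):
    """Decorator that records a checker function under the given name."""
    def decorator(fn):
        if name in CHECKERS:
            raise ValueError(f"checker {name!r} is already registered")
        CHECKERS[name] = fn
        return fn
    return decorator


@register_checker("contains")
def contains(response: str, value: str) -> bool:
    # Passes when the expected substring appears in the response.
    return value in response


@register_checker("valid_json")
def valid_json(response: str) -> bool:
    # Passes when the response parses as JSON.
    try:
        json.loads(response)
    except ValueError:
        return False
    return True


def run_check(name: str, response: str, **params) -> bool:
    """Look up a checker by name and apply it to a response."""
    if name not in CHECKERS:
        raise ValueError(f"unknown checker: {name!r}")
    return CHECKERS[name](response, **params)
```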

---

### Phase 4: CLI & Reporting (Week 4-5)

#### CLI Commands
- [x] Set up Typer application
- [x] Implement `flakestorm init` command
- [x] Implement `flakestorm run` command
- [x] Implement `flakestorm verify` command
- [x] Implement `flakestorm report` command
- [x] Implement `flakestorm score` command
- [x] Add CI mode (--ci --min-score)
- [x] Add rich progress bars

#### Report Generation
- [x] Create report data models
- [x] Implement HTMLReportGenerator
- [x] Create interactive HTML template
- [x] Implement JSONReportGenerator
- [x] Implement TerminalReporter
- [x] Add score visualization
- [x] Add mutation matrix view

---

### Phase 5: V2 Features (Week 5-7)

#### HuggingFace Integration
- [x] Create HuggingFaceModelProvider
- [x] Support GGUF model downloading
- [x] Add recommended models list
- [x] Integrate with Ollama model importing

#### Vector Similarity
- [x] Create LocalEmbedder class
- [x] Integrate sentence-transformers
- [x] Implement similarity calculation
- [x] Add lazy model loading

---

### Testing & Quality

#### Unit Tests
- [x] Test configuration loading
- [x] Test mutation types
- [x] Test assertion checkers
- [ ] Test agent adapters
- [ ] Test orchestrator
- [ ] Test report generation

#### Integration Tests
- [ ] Test full run with mock agent
- [ ] Test CLI commands
- [ ] Test report generation

#### Documentation
- [x] Write README.md
- [x] Create IMPLEMENTATION_CHECKLIST.md
- [x] Create ARCHITECTURE_SUMMARY.md
- [x] Create API_SPECIFICATION.md
- [x] Create CONTRIBUTING.md
- [x] Create CONFIGURATION_GUIDE.md

---

## Progress Summary

| Phase | Status | Completion |
|-------|--------|------------|
| CLI Phase 1: Foundation | ✅ Complete | 100% |
| CLI Phase 2: Mutation Engine | ✅ Complete | 100% |
| CLI Phase 3: Runner & Assertions | ✅ Complete | 100% |
| CLI Phase 4: CLI & Reporting | ✅ Complete | 100% |
| CLI Phase 5: V2 Features | ✅ Complete | 90% |
| Documentation | ✅ Complete | 100% |

---

## Next Steps

1. **Rust Build**: Compile and integrate Rust performance module
2. **Integration Tests**: Add full integration test suite
3. **PyPI Release**: Prepare and publish to PyPI
4. **Community Launch**: Publish to Hacker News and Reddit