flakestorm Implementation Checklist

This document tracks the implementation progress of flakestorm - The Agent Reliability Engine.

CLI Version (Open Source - Apache 2.0)

Phase 1: Foundation (Week 1-2)

Project Scaffolding

  • Initialize Python project with pyproject.toml
  • Set up Rust workspace with Cargo.toml
  • Create Apache 2.0 LICENSE file
  • Write comprehensive README.md
  • Create flakestorm.yaml.example template
  • Set up project structure (src/flakestorm/*)
  • Configure pre-commit hooks (black, ruff, mypy)

Configuration System

  • Define Pydantic models for configuration
  • Implement YAML loading/validation
  • Support environment variable expansion
  • Create configuration factory functions
  • Add configuration validation tests
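
As an illustration of the environment variable expansion item, here is a minimal sketch that walks an already-parsed configuration (for example, the result of loading the YAML file) and substitutes variable references. The `${VAR}` and `${VAR:-default}` syntax and the `expand_env` helper name are assumptions made for this example, not the project's confirmed behavior.

```python
import os
import re
from typing import Any, Mapping

# Assumed syntax: ${VAR} or ${VAR:-default}. The real project may differ.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively expand variable references in strings, lists, and dicts."""
    if isinstance(value, str):
        def _substitute(match: re.Match) -> str:
            name, default = match.group(1), match.group(2)
            if name in env:
                return env[name]
            if default is not None:
                return default
            # Fail loudly instead of silently leaving a placeholder in place.
            raise KeyError(f"environment variable {name!r} is not set")

        return _ENV_PATTERN.sub(_substitute, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

Expanding after parsing, rather than on the raw file text, keeps substituted values from being reinterpreted as YAML syntax.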

Agent Protocol/Adapter

  • Define AgentProtocol interface
  • Implement HTTPAgentAdapter
  • Implement PythonAgentAdapter
  • Implement LangChainAgentAdapter
  • Create adapter factory function
  • Add retry logic for HTTP adapter

Phase 2: Mutation Engine (Week 2-3)

Ollama Integration

  • Create MutationEngine class
  • Implement Ollama client wrapper
  • Add connection verification
  • Support async mutation generation
  • Implement batch generation

Mutation Types & Templates

  • Define MutationType enum
  • Create Mutation dataclass
  • Write templates for PARAPHRASE
  • Write templates for NOISE
  • Write templates for TONE_SHIFT
  • Write templates for PROMPT_INJECTION
  • Add mutation validation logic
  • Support custom templates
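
A minimal sketch of how the enum, dataclass, validation, and custom-template items might fit together. Only the four type names listed above come from this checklist; the string values, field names, template wording, and the `render` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class MutationType(str, Enum):
    # Names are from the checklist; the string values are assumed.
    PARAPHRASE = "paraphrase"
    NOISE = "noise"
    TONE_SHIFT = "tone_shift"
    PROMPT_INJECTION = "prompt_injection"


# Hypothetical default templates; the real prompt wording is not shown here.
DEFAULT_TEMPLATES: Dict[MutationType, str] = {
    MutationType.PARAPHRASE: "Rephrase this, keeping the meaning: {prompt}",
    MutationType.NOISE: "Add realistic typos to: {prompt}",
}


def render(kind: MutationType, prompt: str,
           custom: Optional[Dict[MutationType, str]] = None) -> str:
    """Fill in a template, letting custom templates override the defaults."""
    templates = {**DEFAULT_TEMPLATES, **(custom or {})}
    return templates[kind].format(prompt=prompt)


@dataclass(frozen=True)
class Mutation:
    original: str
    mutated: str
    type: MutationType

    def is_valid(self) -> bool:
        """Reject empty output and output identical to the original."""
        text = self.mutated.strip()
        return bool(text) and text != self.original.strip()
```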

Rust Performance Bindings

  • Set up PyO3 bindings
  • Implement robustness score calculation
  • Implement weighted score calculation
  • Implement Levenshtein distance
  • Implement parallel processing utilities
  • Build and test Rust module
  • Integrate with Python package
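
For reference, the calculations named in this list can be written in pure Python as below. This is a sketch of standard definitions, not the Rust module's code: the Levenshtein routine is the usual two-row dynamic program, while the score formulas (pass ratio, and passed weight over total weight) are assumptions, since the checklist does not define them.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rows of the dynamic-programming table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def robustness_score(passed: int, total: int) -> float:
    """Assumed definition: fraction of mutations that passed every check."""
    return passed / total if total else 0.0


def weighted_score(results: Sequence[Tuple[bool, float]]) -> float:
    """Assumed definition: passed weight divided by total weight."""
    total_weight = sum(weight for _, weight in results)
    if total_weight == 0:
        return 0.0
    return sum(weight for ok, weight in results if ok) / total_weight
```

A pure-Python version like this can also serve as a fallback and as a test oracle for the compiled bindings.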

Phase 3: Runner & Assertions (Week 3-4)

Async Runner

  • Create FlakeStormRunner class
  • Implement orchestrator logic
  • Add concurrency control with semaphores
  • Implement progress tracking
  • Add setup verification
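
Concurrency control with a semaphore, as listed above, can be sketched as follows. The `run_bounded` name, its parameters, and the optional progress callback are hypothetical; the sketch only shows the general `asyncio.Semaphore` pattern.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def run_bounded(
    prompts: List[str],
    invoke: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
    on_done: Optional[Callable[[int, int], None]] = None,
) -> List[str]:
    """Invoke the agent on every prompt, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)
    completed = 0

    async def _one(prompt: str) -> str:
        nonlocal completed
        async with semaphore:  # waits here while the limit is reached
            result = await invoke(prompt)
        completed += 1
        if on_done is not None:
            on_done(completed, len(prompts))  # simple progress hook
        return result

    # gather() returns results in the same order as the input prompts.
    return await asyncio.gather(*(_one(p) for p in prompts))
```

Bounding concurrency this way keeps a large mutation batch from overwhelming the agent under test or the local model server.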

Invariant System

  • Create InvariantVerifier class
  • Implement ContainsChecker
  • Implement LatencyChecker
  • Implement ValidJsonChecker
  • Implement RegexChecker
  • Implement SimilarityChecker
  • Implement ExcludesPIIChecker
  • Implement RefusalChecker
  • Add checker registry
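
One way a checker registry could work is a decorator that maps a type name to a check function. This is a sketch under assumptions: the registry layout, the `type`, `value`, and `pattern` keys, and the function-based checkers are invented for the example, whereas the project lists class-based checkers such as `ContainsChecker`.

```python
import json
import re
from typing import Callable, Dict, List

Checker = Callable[[str, dict], bool]
CHECKERS: Dict[str, Checker] = {}


def register(name: str) -> Callable[[Checker], Checker]:
    """Decorator that records a checker under the given invariant type name."""
    def decorator(fn: Checker) -> Checker:
        CHECKERS[name] = fn
        return fn
    return decorator


@register("contains")
def contains(response: str, params: dict) -> bool:
    return params["value"] in response


@register("valid_json")
def valid_json(response: str, params: dict) -> bool:
    try:
        json.loads(response)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    return True


@register("regex")
def regex(response: str, params: dict) -> bool:
    return re.search(params["pattern"], response) is not None


def verify(response: str, invariants: List[dict]) -> Dict[str, bool]:
    """Run each configured invariant and report pass/fail by type name."""
    return {inv["type"]: CHECKERS[inv["type"]](response, inv) for inv in invariants}
```

With a registry, adding a new checker does not require touching the verifier's dispatch logic.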

Phase 4: CLI & Reporting (Week 4-5)

CLI Commands

  • Set up Typer application
  • Implement flakestorm init command
  • Implement flakestorm run command
  • Implement flakestorm verify command
  • Implement flakestorm report command
  • Implement flakestorm score command
  • Add CI mode (--ci --min-score)
  • Add rich progress bars
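
The CI mode item implies a non-zero exit status when the score falls below a threshold. The sketch below shows that gate using the standard library's `argparse` in place of Typer so it runs without extra dependencies. Attaching `--ci` and `--min-score` to the `run` command, the 0.0 default, and a 0-to-1 score scale are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """argparse stand-in for the Typer app; only the run command is modeled."""
    parser = argparse.ArgumentParser(prog="flakestorm")
    commands = parser.add_subparsers(dest="command", required=True)
    run = commands.add_parser("run")
    run.add_argument("--ci", action="store_true",
                     help="fail the process when the score is too low")
    run.add_argument("--min-score", type=float, default=0.0,
                     help="minimum acceptable score (assumed 0-1 scale)")
    return parser


def exit_code(score: float, ci: bool, min_score: float) -> int:
    """Return 1 only in CI mode when the score is below the threshold."""
    return 1 if ci and score < min_score else 0
```

A pipeline step would then pass the computed score to `exit_code` and hand the result to `sys.exit`, so the build fails only when CI mode is on.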

Report Generation

  • Create report data models
  • Implement HTMLReportGenerator
  • Create interactive HTML template
  • Implement JSONReportGenerator
  • Implement TerminalReporter
  • Add score visualization
  • Add mutation matrix view

Phase 5: V2 Features (Week 5-7)

HuggingFace Integration

  • Create HuggingFaceModelProvider
  • Support GGUF model downloading
  • Add recommended models list
  • Integrate with Ollama model importing

Vector Similarity

  • Create LocalEmbedder class
  • Integrate sentence-transformers
  • Implement similarity calculation
  • Add lazy model loading
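
Lazy loading and similarity calculation might be combined as in this sketch. The constructor signature, the injected `loader` callable, and the `all-MiniLM-L6-v2` model name are assumptions; only the `LocalEmbedder` class name and the use of sentence-transformers come from the checklist.

```python
import math
from typing import Any, Callable, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def load_sentence_transformer(name: str = "all-MiniLM-L6-v2") -> Any:
    """Default loader; the import is deferred so startup stays fast."""
    from sentence_transformers import SentenceTransformer  # optional dependency
    return SentenceTransformer(name)


class LocalEmbedder:
    def __init__(self, loader: Callable[[], Any] = load_sentence_transformer):
        self._loader = loader
        self._model: Optional[Any] = None

    @property
    def model(self) -> Any:
        # Load on first use only; later calls reuse the cached model.
        if self._model is None:
            self._model = self._loader()
        return self._model

    def similarity(self, a: str, b: str) -> float:
        vec_a, vec_b = self.model.encode([a, b])
        return cosine_similarity(vec_a, vec_b)
```

Injecting the loader also makes the class easy to test with a stub model, without downloading weights.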

Testing & Quality

Unit Tests

  • Test configuration loading
  • Test mutation types
  • Test assertion checkers
  • Test agent adapters
  • Test orchestrator
  • Test report generation

Integration Tests

  • Test full run with mock agent
  • Test CLI commands
  • Test report generation

Documentation

  • Write README.md
  • Create IMPLEMENTATION_CHECKLIST.md
  • Create ARCHITECTURE_SUMMARY.md
  • Create API_SPECIFICATION.md
  • Create CONTRIBUTING.md
  • Create CONFIGURATION_GUIDE.md

Phase 6: Essential Mutations (Week 7-8)

Core Mutation Types

  • Add ENCODING_ATTACKS mutation type
  • Add CONTEXT_MANIPULATION mutation type
  • Add LENGTH_EXTREMES mutation type
  • Update MutationType enum with all 8 types
  • Create templates for new mutation types
  • Update mutation validation for edge cases

Configuration Updates

  • Update MutationConfig defaults
  • Update example configuration files
  • Update orchestrator comments

Documentation Updates

  • Update README.md with comprehensive mutation types table
  • Add Mutation Strategy section to README
  • Update API_SPECIFICATION.md with all 8 types
  • Update MODULES.md with detailed mutation documentation
  • Add Mutation Types Guide to CONFIGURATION_GUIDE.md
  • Add Understanding Mutation Types to USAGE_GUIDE.md
  • Add Mutation Type Deep Dive to TEST_SCENARIOS.md

Progress Summary

| Phase | Status | Completion |
|---|---|---|
| CLI Phase 1: Foundation | Complete | 100% |
| CLI Phase 2: Mutation Engine | Complete | 100% |
| CLI Phase 3: Runner & Assertions | Complete | 100% |
| CLI Phase 4: CLI & Reporting | Complete | 100% |
| CLI Phase 5: V2 Features | Complete | 90% |
| CLI Phase 6: Essential Mutations | Complete | 100% |
| Documentation | Complete | 100% |

Next Steps

Immediate (Current Sprint)

  1. Rust Build: Compile and integrate Rust performance module
  2. Integration Tests: Add full integration test suite
  3. PyPI Release: Prepare and publish to PyPI
  4. Community Launch: Announce on Hacker News and Reddit

Future Roadmap

See ROADMAP.md for the full roadmap of advanced chaos engineering and adversarial testing features. These features are open for community contribution; see CONTRIBUTING.md for how to get involved.