flakestorm/docs/IMPLEMENTATION_CHECKLIST.md
2025-12-30 16:13:29 +08:00

flakestorm Implementation Checklist

This document tracks the implementation progress of flakestorm - The Agent Reliability Engine.

CLI Version (Open Source - Apache 2.0)

Phase 1: Foundation (Week 1-2)

Project Scaffolding

  • Initialize Python project with pyproject.toml
  • Set up Rust workspace with Cargo.toml
  • Create Apache 2.0 LICENSE file
  • Write comprehensive README.md
  • Create flakestorm.yaml.example template
  • Set up project structure (src/flakestorm/*)
  • Configure pre-commit hooks (black, ruff, mypy)

Configuration System

  • Define Pydantic models for configuration
  • Implement YAML loading/validation
  • Support environment variable expansion
  • Create configuration factory functions
  • Add configuration validation tests
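
The expansion and factory steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's Pydantic models: the `${NAME}` placeholder syntax, the `AgentConfig` fields, and the `load_agent_config` helper are all assumptions made for the example.

```python
import os
import re
from dataclasses import dataclass

# Matches ${NAME}. The ${...} syntax is an assumption for this sketch.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Replace each ${NAME} with its value, failing loudly if NAME is unset."""
    env = os.environ if env is None else env

    def _sub(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name!r} is not set")
        return env[name]

    return _ENV_PATTERN.sub(_sub, value)


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical config section; field names are illustrative only."""
    endpoint: str
    timeout_s: float = 30.0

    def __post_init__(self):
        # Validation runs at construction time, as a Pydantic model would.
        if self.timeout_s <= 0:
            raise ValueError("timeout_s must be positive")


def load_agent_config(raw, env=None):
    """Factory: expand env vars in string fields, then build and validate."""
    return AgentConfig(
        endpoint=expand_env(raw["endpoint"], env),
        timeout_s=float(raw.get("timeout_s", 30.0)),
    )
```

Passing `env` explicitly keeps the validation tests independent of the real process environment.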

Agent Protocol/Adapter

  • Define AgentProtocol interface
  • Implement HTTPAgentAdapter
  • Implement PythonAgentAdapter
  • Implement LangChainAgentAdapter
  • Create adapter factory function
  • Add retry logic for HTTP adapter
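
One way to picture the protocol, an adapter, and the retry step is the sketch below. The `invoke(prompt) -> str` signature, the `CallableAgentAdapter` and `RetryingAgent` names, and the exponential backoff policy are assumptions for illustration; the checklist does not specify them.

```python
import asyncio
from typing import Awaitable, Callable, Protocol


class AgentProtocol(Protocol):
    """Anything exposing an async invoke(prompt) -> str (assumed signature)."""

    async def invoke(self, prompt: str) -> str: ...


class CallableAgentAdapter:
    """Wraps a plain async function so it satisfies AgentProtocol."""

    def __init__(self, fn: Callable[[str], Awaitable[str]]):
        self._fn = fn

    async def invoke(self, prompt: str) -> str:
        return await self._fn(prompt)


class RetryingAgent:
    """Retries a wrapped agent with exponential backoff on any exception."""

    def __init__(self, agent: AgentProtocol, retries: int = 2, base_delay_s: float = 0.1):
        self._agent = agent
        self._retries = retries
        self._base_delay_s = base_delay_s

    async def invoke(self, prompt: str) -> str:
        for attempt in range(self._retries + 1):
            try:
                return await self._agent.invoke(prompt)
            except Exception:
                if attempt == self._retries:
                    raise  # out of attempts: surface the last error
                # Delay doubles after each failed attempt.
                await asyncio.sleep(self._base_delay_s * 2 ** attempt)
        raise RuntimeError("retries must be >= 0")
```

Layering retries as a wrapper keeps the adapters themselves free of retry state, so the same policy could wrap an HTTP adapter or a mock used in tests.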

Phase 2: Mutation Engine (Week 2-3)

Ollama Integration

  • Create MutationEngine class
  • Implement Ollama client wrapper
  • Add connection verification
  • Support async mutation generation
  • Implement batch generation

Mutation Types & Templates

  • Define MutationType enum
  • Create Mutation dataclass
  • Write templates for PARAPHRASE
  • Write templates for NOISE
  • Write templates for TONE_SHIFT
  • Write templates for PROMPT_INJECTION
  • Add mutation validation logic
  • Support custom templates
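
The enum, dataclass, and template items above might fit together roughly as follows. The four type names come from this checklist; the template wording, the `is_valid` rule, and the `render_template` helper are hypothetical and only show the shape of the data.

```python
from dataclasses import dataclass
from enum import Enum


class MutationType(str, Enum):
    PARAPHRASE = "paraphrase"
    NOISE = "noise"
    TONE_SHIFT = "tone_shift"
    PROMPT_INJECTION = "prompt_injection"


# Illustrative wording only; not the project's actual prompts.
DEFAULT_TEMPLATES = {
    MutationType.PARAPHRASE: "Rewrite this with the same meaning: {prompt}",
    MutationType.NOISE: "Add realistic typos to this text: {prompt}",
    MutationType.TONE_SHIFT: "Rewrite this in an impatient tone: {prompt}",
    MutationType.PROMPT_INJECTION: "Append an instruction-override attempt to: {prompt}",
}


@dataclass(frozen=True)
class Mutation:
    original: str
    mutated: str
    type: MutationType

    def is_valid(self) -> bool:
        """Usable only if non-empty and different from the original (assumed rule)."""
        text = self.mutated.strip()
        return bool(text) and text != self.original.strip()


def render_template(mutation_type, prompt, custom=None):
    """Fill the template for a type; custom templates override the defaults."""
    templates = {**DEFAULT_TEMPLATES, **(custom or {})}
    return templates[mutation_type].format(prompt=prompt)
```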

Rust Performance Bindings

  • Set up PyO3 bindings
  • Implement robustness score calculation
  • Implement weighted score calculation
  • Implement Levenshtein distance
  • Implement parallel processing utilities
  • Build and test Rust module
  • Integrate with Python package
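
For reference, the two calculations named above can be written in plain Python. This is not the Rust implementation: it is a sketch of the standard Levenshtein dynamic program, plus one plausible definition of a weighted score (passed weight divided by total weight), which is an assumption.

```python
def levenshtein(a, b):
    """Edit distance using two rolling rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def weighted_score(results):
    """results: iterable of (passed, weight) pairs; returns a value in [0, 1]."""
    results = list(results)
    total = sum(weight for _, weight in results)
    if total == 0:
        return 0.0
    return sum(weight for passed, weight in results if passed) / total
```

A pure-Python version like this can also serve as a fallback or as a test oracle for the compiled module.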

Phase 3: Runner & Assertions (Week 3-4)

Async Runner

  • Create FlakeStormRunner class
  • Implement orchestrator logic
  • Add concurrency control with semaphores
  • Implement progress tracking
  • Add setup verification
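
Concurrency control with a semaphore, together with progress tracking, could look like the sketch below. The `run_all` function, its parameters, and the choice to record exceptions as results are assumptions for illustration, not the `FlakeStormRunner` API.

```python
import asyncio


async def run_all(prompts, invoke, max_concurrency=4, on_progress=None):
    """Invoke the agent on every prompt, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)
    done = 0

    async def _one(prompt):
        nonlocal done
        async with semaphore:  # blocks while max_concurrency calls are in flight
            try:
                result = await invoke(prompt)
            except Exception as exc:  # record the failure, keep the run going
                result = exc
        done += 1
        if on_progress is not None:
            on_progress(done, len(prompts))
        return result

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(_one(p) for p in prompts))
```

Capturing exceptions per prompt means one failing agent call does not cancel the rest of the run.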

Invariant System

  • Create InvariantVerifier class
  • Implement ContainsChecker
  • Implement LatencyChecker
  • Implement ValidJsonChecker
  • Implement RegexChecker
  • Implement SimilarityChecker
  • Implement ExcludesPIIChecker
  • Implement RefusalChecker
  • Add checker registry
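
A checker registry of the kind listed above is often built with a decorator that maps a type name to a class. The sketch below shows four of the checkers; the `check(response, latency_ms)` signature, the registry keys, and the dict-based invariant format are assumptions made for the example.

```python
import json
import re

CHECKERS = {}  # maps an invariant type name to its checker class


def register(name):
    """Class decorator that adds a checker to the registry under `name`."""
    def decorator(cls):
        CHECKERS[name] = cls
        return cls
    return decorator


@register("contains")
class ContainsChecker:
    def __init__(self, value):
        self.value = value

    def check(self, response, latency_ms):
        return self.value.lower() in response.lower()  # case-insensitive


@register("latency")
class LatencyChecker:
    def __init__(self, max_ms):
        self.max_ms = max_ms

    def check(self, response, latency_ms):
        return latency_ms <= self.max_ms


@register("valid_json")
class ValidJsonChecker:
    def check(self, response, latency_ms):
        try:
            json.loads(response)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return False
        return True


@register("regex")
class RegexChecker:
    def __init__(self, pattern):
        self.pattern = re.compile(pattern)

    def check(self, response, latency_ms):
        return self.pattern.search(response) is not None


def verify(invariants, response, latency_ms):
    """Return the type names of every invariant that failed, in order."""
    failed = []
    for spec in invariants:
        params = {k: v for k, v in spec.items() if k != "type"}
        checker = CHECKERS[spec["type"]](**params)
        if not checker.check(response, latency_ms):
            failed.append(spec["type"])
    return failed
```

With this shape, adding a new checker is a matter of defining a class and decorating it; the verifier itself does not change.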

Phase 4: CLI & Reporting (Week 4-5)

CLI Commands

  • Set up Typer application
  • Implement flakestorm init command
  • Implement flakestorm run command
  • Implement flakestorm verify command
  • Implement flakestorm report command
  • Implement flakestorm score command
  • Add CI mode (--ci --min-score)
  • Add rich progress bars
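
The CI mode item amounts to a score gate: with `--ci`, a score below `--min-score` should produce a non-zero exit code. The sketch below uses the standard library's `argparse` as a dependency-free stand-in for Typer; the program name, the default threshold, and the 0 to 1 score scale are assumptions.

```python
import argparse


def build_parser():
    """argparse stand-in for the real Typer app (illustration only)."""
    parser = argparse.ArgumentParser(prog="flakestorm-sketch")
    parser.add_argument("--ci", action="store_true",
                        help="exit non-zero when the score is below --min-score")
    parser.add_argument("--min-score", type=float, default=0.8,
                        help="threshold used with --ci (default is an assumption)")
    return parser


def exit_code(score, ci, min_score):
    """Return 1 only when CI mode is on and the score misses the threshold."""
    if ci and score < min_score:
        return 1
    return 0
```

Outside CI mode the gate always returns 0, so a low score is reported without failing an interactive run.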

Report Generation

  • Create report data models
  • Implement HTMLReportGenerator
  • Create interactive HTML template
  • Implement JSONReportGenerator
  • Implement TerminalReporter
  • Add score visualization
  • Add mutation matrix view

Phase 5: V2 Features (Week 5-7)

HuggingFace Integration

  • Create HuggingFaceModelProvider
  • Support GGUF model downloading
  • Add recommended models list
  • Integrate with Ollama model importing

Vector Similarity

  • Create LocalEmbedder class
  • Integrate sentence-transformers
  • Implement similarity calculation
  • Add lazy model loading
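
Similarity calculation and lazy loading can be illustrated without any model dependency. In the sketch below, `load_model` is a hypothetical zero-argument callable that returns an encode function; in the real project that role would presumably be played by a sentence-transformers model, which is not used here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class LazyEmbedder:
    """Defers model construction until the first embed() call."""

    def __init__(self, load_model):
        self._load_model = load_model  # zero-arg callable returning an encode function
        self._encode = None

    def embed(self, text):
        if self._encode is None:
            self._encode = self._load_model()  # loaded once, then cached
        return self._encode(text)

    def similarity(self, a, b):
        return cosine_similarity(self.embed(a), self.embed(b))
```

Lazy loading keeps start-up fast for runs that never use a similarity invariant, since the model is only built on first use.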

Testing & Quality

Unit Tests

  • Test configuration loading
  • Test mutation types
  • Test assertion checkers
  • Test agent adapters
  • Test orchestrator
  • Test report generation

Integration Tests

  • Test full run with mock agent
  • Test CLI commands
  • Test report generation

Documentation

  • Write README.md
  • Create IMPLEMENTATION_CHECKLIST.md
  • Create ARCHITECTURE_SUMMARY.md
  • Create API_SPECIFICATION.md
  • Create CONTRIBUTING.md
  • Create CONFIGURATION_GUIDE.md

Progress Summary

| Phase | Status | Completion |
|-------|--------|------------|
| CLI Phase 1: Foundation | Complete | 100% |
| CLI Phase 2: Mutation Engine | Complete | 100% |
| CLI Phase 3: Runner & Assertions | Complete | 100% |
| CLI Phase 4: CLI & Reporting | Complete | 100% |
| CLI Phase 5: V2 Features | Nearly complete | 90% |
| Documentation | Complete | 100% |

Next Steps

  1. Rust Build: Compile and integrate Rust performance module
  2. Integration Tests: Add full integration test suite
  3. PyPI Release: Prepare and publish to PyPI
  4. Community Launch: Announce on Hacker News and Reddit