flakestorm Implementation Checklist

This document tracks the implementation progress of flakestorm - The Agent Reliability Engine.

CLI Version (Open Source - Apache 2.0)

Phase 1: Foundation (Week 1-2)

Project Scaffolding

  • Initialize Python project with pyproject.toml
  • Set up Rust workspace with Cargo.toml
  • Create Apache 2.0 LICENSE file
  • Write comprehensive README.md
  • Create flakestorm.yaml.example template
  • Set up project structure (src/flakestorm/*)
  • Configure pre-commit hooks (black, ruff, mypy)

Configuration System

  • Define Pydantic models for configuration
  • Implement YAML loading/validation
  • Support environment variable expansion
  • Create configuration factory functions
  • Add configuration validation tests
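
The environment variable expansion step can be sketched with the standard library alone. This is a minimal illustration, not flakestorm's actual loader: the `${NAME}` and `${NAME:-default}` placeholder syntax and the `expand_env` helper are assumptions made for the example.

```python
import os
import re

# Assumed placeholder syntax: ${NAME} or ${NAME:-default}.
# flakestorm's real syntax may differ.
_ENV_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def expand_env(value, env=None):
    """Recursively expand placeholders in strings, lists, and dicts."""
    env = os.environ if env is None else env

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name!r} is not set")

    if isinstance(value, str):
        return _ENV_PATTERN.sub(_substitute, value)
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    return value  # numbers, booleans, None pass through unchanged
```

Running the expansion on the parsed YAML tree, before model validation, means the validator only ever sees concrete values.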

Agent Protocol/Adapter

  • Define AgentProtocol interface
  • Implement HTTPAgentAdapter
  • Implement PythonAgentAdapter
  • Implement LangChainAgentAdapter
  • Create adapter factory function
  • Add retry logic for HTTP adapter

Phase 2: Mutation Engine (Week 2-3)

Ollama Integration

  • Create MutationEngine class
  • Implement Ollama client wrapper
  • Add connection verification
  • Support async mutation generation
  • Implement batch generation

Mutation Types & Templates

  • Define MutationType enum
  • Create Mutation dataclass
  • Write templates for PARAPHRASE
  • Write templates for NOISE
  • Write templates for TONE_SHIFT
  • Write templates for PROMPT_INJECTION
  • Add mutation validation logic
  • Support custom templates
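
The items above might fit together roughly as follows. The enum members come from the checklist; the field names, the template wording, and the validation rule (a mutation must be non-empty and differ from the original) are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class MutationType(Enum):
    PARAPHRASE = "paraphrase"
    NOISE = "noise"
    TONE_SHIFT = "tone_shift"
    PROMPT_INJECTION = "prompt_injection"


# Illustrative wording only; the project's real templates are not shown here.
DEFAULT_TEMPLATES = {
    MutationType.PARAPHRASE: "Rewrite this prompt with the same meaning:\n{prompt}",
    MutationType.NOISE: "Add realistic typos to this prompt:\n{prompt}",
    MutationType.TONE_SHIFT: "Rewrite this prompt in an impatient tone:\n{prompt}",
    MutationType.PROMPT_INJECTION: "Append an instruction-override attempt to:\n{prompt}",
}


@dataclass(frozen=True)
class Mutation:
    original: str
    mutated: str
    type: MutationType

    def is_valid(self) -> bool:
        """Reject empty mutations and ones identical to the original."""
        text = self.mutated.strip()
        return bool(text) and text != self.original.strip()


def render_template(mutation_type, prompt, templates=DEFAULT_TEMPLATES):
    """Fill a template; pass a custom `templates` mapping to override defaults."""
    return templates[mutation_type].format(prompt=prompt)
```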

Rust Performance Bindings

  • Set up PyO3 bindings
  • Implement robustness score calculation
  • Implement weighted score calculation
  • Implement Levenshtein distance
  • Implement parallel processing utilities
  • Build and test Rust module
  • Integrate with Python package
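
A pure-Python reference for two of the functions listed above can help when testing the Rust module against known answers. The Levenshtein routine is the standard two-row dynamic program; the `weighted_score` definition (passed weight divided by total weight) is an assumption, since the checklist does not state the formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rows of the dynamic-programming table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def weighted_score(results) -> float:
    """Assumed formula: sum of weights of passed checks / sum of all weights."""
    results = list(results)  # items are (passed: bool, weight: float)
    total = sum(weight for _, weight in results)
    if total == 0:
        return 0.0
    return sum(weight for passed, weight in results if passed) / total
```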

Phase 3: Runner & Assertions (Week 3-4)

Async Runner

  • Create EntropixRunner class
  • Implement orchestrator logic
  • Add concurrency control with semaphores
  • Implement progress tracking
  • Add setup verification

Invariant System

  • Create InvariantVerifier class
  • Implement ContainsChecker
  • Implement LatencyChecker
  • Implement ValidJsonChecker
  • Implement RegexChecker
  • Implement SimilarityChecker
  • Implement ExcludesPIIChecker
  • Implement RefusalChecker
  • Add checker registry
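
One way to picture the checker registry is a decorator that maps a type name to a checker class. The class names come from the checklist; the registry keys, constructor arguments, `check` method, and `build_checker` helper are assumptions for this sketch.

```python
import json
import re

CHECKERS = {}  # maps an invariant type name to its checker class


def register(name):
    """Class decorator that adds a checker to the registry."""
    def decorator(cls):
        CHECKERS[name] = cls
        return cls
    return decorator


@register("contains")
class ContainsChecker:
    def __init__(self, value):
        self.value = value

    def check(self, response: str) -> bool:
        return self.value in response


@register("regex")
class RegexChecker:
    def __init__(self, pattern):
        self.pattern = re.compile(pattern)

    def check(self, response: str) -> bool:
        return self.pattern.search(response) is not None


@register("valid_json")
class ValidJsonChecker:
    def check(self, response: str) -> bool:
        try:
            json.loads(response)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return False
        return True


def build_checker(spec: dict):
    """Instantiate a checker from a config entry such as {"type": "contains", "value": "ok"}."""
    options = dict(spec)
    return CHECKERS[options.pop("type")](**options)
```

With a registry like this, adding a new invariant type only requires defining and decorating a class; the code that reads the configuration does not change.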

Phase 4: CLI & Reporting (Week 4-5)

CLI Commands

  • Set up Typer application
  • Implement flakestorm init command
  • Implement flakestorm run command
  • Implement flakestorm verify command
  • Implement flakestorm report command
  • Implement flakestorm score command
  • Add CI mode (--ci --min-score)
  • Add rich progress bars

Report Generation

  • Create report data models
  • Implement HTMLReportGenerator
  • Create interactive HTML template
  • Implement JSONReportGenerator
  • Implement TerminalReporter
  • Add score visualization
  • Add mutation matrix view

Phase 5: V2 Features (Week 5-7)

HuggingFace Integration

  • Create HuggingFaceModelProvider
  • Support GGUF model downloading
  • Add recommended models list
  • Integrate with Ollama model importing

Vector Similarity

  • Create LocalEmbedder class
  • Integrate sentence-transformers
  • Implement similarity calculation
  • Add lazy model loading
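
Lazy model loading and the similarity calculation can be sketched without any third-party dependency by injecting the loader. Here `LazyEmbedder`, `load_model`, and the callable-model convention are hypothetical; a real `LocalEmbedder` would presumably load a sentence-transformers model inside the loader.

```python
import math
from functools import cached_property


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


class LazyEmbedder:
    def __init__(self, load_model):
        # load_model is assumed to return a callable mapping text to a vector
        self._load_model = load_model

    @cached_property
    def model(self):
        # Runs on first access only, so importing this module stays cheap
        return self._load_model()

    def similarity(self, a: str, b: str) -> float:
        return cosine_similarity(self.model(a), self.model(b))
```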

Testing & Quality

Unit Tests

  • Test configuration loading
  • Test mutation types
  • Test assertion checkers
  • Test agent adapters
  • Test orchestrator
  • Test report generation

Integration Tests

  • Test full run with mock agent
  • Test CLI commands
  • Test report generation

Documentation

  • Write README.md
  • Create IMPLEMENTATION_CHECKLIST.md
  • Create ARCHITECTURE_SUMMARY.md
  • Create API_SPECIFICATION.md
  • Create CONTRIBUTING.md
  • Create CONFIGURATION_GUIDE.md

Progress Summary

| Phase | Status | Completion |
|-------|--------|------------|
| CLI Phase 1: Foundation | Complete | 100% |
| CLI Phase 2: Mutation Engine | Complete | 100% |
| CLI Phase 3: Runner & Assertions | Complete | 100% |
| CLI Phase 4: CLI & Reporting | Complete | 100% |
| CLI Phase 5: V2 Features | Complete | 90% |
| Documentation | Complete | 100% |

Next Steps

  1. Rust Build: Compile and integrate Rust performance module
  2. Integration Tests: Add full integration test suite
  3. PyPI Release: Prepare and publish to PyPI
  4. Community Launch: Announce on Hacker News and Reddit