flakestorm/docs/IMPLEMENTATION_CHECKLIST.md
Entropix ee10da0b97 Add comprehensive documentation for flakestorm
2025-12-29 11:33:01 +08:00

flakestorm Implementation Checklist

This document tracks the implementation progress of flakestorm - The Agent Reliability Engine.

CLI Version (Open Source - Apache 2.0)

Phase 1: Foundation (Week 1-2)

Project Scaffolding

  • Initialize Python project with pyproject.toml
  • Set up Rust workspace with Cargo.toml
  • Create Apache 2.0 LICENSE file
  • Write comprehensive README.md
  • Create flakestorm.yaml.example template
  • Set up project structure (src/flakestorm/*)
  • Configure pre-commit hooks (black, ruff, mypy)
  • Set up GitHub Actions for CI/CD

Configuration System

  • Define Pydantic models for configuration
  • Implement YAML loading/validation
  • Support environment variable expansion
  • Create configuration factory functions
  • Add configuration validation tests
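
As a rough illustration of the environment variable expansion item above, the sketch below walks a parsed configuration and substitutes `${VAR}` references. The `${VAR}` and `${VAR:-default}` syntax, the `expand_env` name, and the sample keys are assumptions made for this example, not flakestorm's documented behavior.

```python
import os
import re
from typing import Any, Mapping, Optional

# Assumed syntax: ${NAME} or ${NAME:-default}. Not taken from flakestorm itself.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively expand ${VAR} references in strings, lists, and dicts."""
    env = os.environ if env is None else env

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name!r} is not set")

    if isinstance(value, str):
        return _ENV_PATTERN.sub(substitute, value)
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


# Hypothetical config, shaped like the result of loading a YAML file.
config = {"agent": {"endpoint": "${AGENT_URL}/invoke", "timeout": 30}}
expanded = expand_env(config, {"AGENT_URL": "http://localhost:8000"})
```

Running expansion on the parsed structure, before model validation, keeps the validation models free of any knowledge about environment lookups.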

Agent Protocol/Adapter

  • Define AgentProtocol interface
  • Implement HTTPAgentAdapter
  • Implement PythonAgentAdapter
  • Implement LangChainAgentAdapter
  • Create adapter factory function
  • Add retry logic for HTTP adapter
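
The adapter and retry items above might fit together roughly as follows. This is a minimal sketch: the `invoke` method name, the `CallableAgentAdapter` class, and the `invoke_with_retry` helper are hypothetical, and retrying only on `ConnectionError` is an assumption about which failures count as transient.

```python
import asyncio
from typing import Awaitable, Callable, Protocol


class AgentProtocol(Protocol):
    """Anything that turns a prompt into a response string (assumed shape)."""

    async def invoke(self, prompt: str) -> str: ...


class CallableAgentAdapter:
    """Hypothetical adapter that wraps a plain async function."""

    def __init__(self, func: Callable[[str], Awaitable[str]]) -> None:
        self._func = func

    async def invoke(self, prompt: str) -> str:
        return await self._func(prompt)


async def invoke_with_retry(
    agent: AgentProtocol, prompt: str, retries: int = 3, base_delay: float = 0.1
) -> str:
    """Retry transient failures with exponential backoff, then re-raise."""
    for attempt in range(retries + 1):
        try:
            return await agent.invoke(prompt)
        except ConnectionError:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

Because the protocol is structural, any object with a matching `invoke` coroutine satisfies it without inheriting from a base class.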

Phase 2: Mutation Engine (Week 2-3)

Ollama Integration

  • Create MutationEngine class
  • Implement Ollama client wrapper
  • Add connection verification
  • Support async mutation generation
  • Implement batch generation

Mutation Types & Templates

  • Define MutationType enum
  • Create Mutation dataclass
  • Write templates for PARAPHRASE
  • Write templates for NOISE
  • Write templates for TONE_SHIFT
  • Write templates for PROMPT_INJECTION
  • Add mutation validation logic
  • Support custom templates
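
One possible shape for the enum, dataclass, and validation items above is sketched here. The four member names come from the checklist; the string values, the field names, and the validation rules (non-empty, changed, bounded length) are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class MutationType(Enum):
    PARAPHRASE = "paraphrase"
    NOISE = "noise"
    TONE_SHIFT = "tone_shift"
    PROMPT_INJECTION = "prompt_injection"


@dataclass(frozen=True)
class Mutation:
    original: str
    mutated: str
    type: MutationType

    def is_valid(self, max_length_ratio: float = 3.0) -> bool:
        """Reject empty, unchanged, or runaway-length mutations (assumed rules)."""
        text = self.mutated.strip()
        if not text or text == self.original.strip():
            return False
        return len(text) <= max_length_ratio * max(len(self.original), 1)


mutation = Mutation("Book a flight", "book a flihgt pls", MutationType.NOISE)
```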

Rust Performance Bindings

  • Set up PyO3 bindings
  • Implement robustness score calculation
  • Implement weighted score calculation
  • Implement Levenshtein distance
  • Implement parallel processing utilities
  • Build and test Rust module
  • Integrate with Python package
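
For reference, the scoring and distance items above can be written in pure Python, which is also handy as a fallback or as a test oracle for the Rust module. The Levenshtein routine is the standard two-row dynamic program; the two score definitions (pass ratio and weight-normalized pass ratio) are assumptions, since the checklist does not define them.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rows of the dynamic-programming table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def robustness_score(passed: int, total: int) -> float:
    """Assumed definition: fraction of mutations that passed."""
    return passed / total if total else 0.0


def weighted_score(results: List[Tuple[bool, float]]) -> float:
    """Assumed definition: passed weight divided by total weight."""
    total_weight = sum(weight for _, weight in results)
    if total_weight == 0:
        return 0.0
    return sum(weight for ok, weight in results if ok) / total_weight
```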

Phase 3: Runner & Assertions (Week 3-4)

Async Runner

  • Create EntropixRunner class
  • Implement orchestrator logic
  • Add concurrency control with semaphores
  • Implement progress tracking
  • Add setup verification
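
The semaphore-based concurrency control mentioned above can be sketched with `asyncio` alone. The `run_bounded` helper and its parameters are hypothetical names, not the runner's actual interface.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 4,
) -> List[R]:
    """Run worker over items with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: T) -> R:
        async with semaphore:  # waits here once the limit is reached
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in items))
```

Bounding concurrency this way keeps a slow agent or a local model server from being flooded while still overlapping network waits.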

Invariant System

  • Create InvariantVerifier class
  • Implement ContainsChecker
  • Implement LatencyChecker
  • Implement ValidJsonChecker
  • Implement RegexChecker
  • Implement SimilarityChecker
  • Implement ExcludesPIIChecker
  • Implement RefusalChecker
  • Add checker registry
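
A checker registry like the one listed above could be as small as a dictionary filled by a decorator. In this sketch the registry keys (`contains`, `valid_json`, `regex`), the function names, and the invariant dictionary format are all assumptions; only three of the listed checkers are shown.

```python
import json
import re
from typing import Callable, Dict, List, Tuple

CHECKERS: Dict[str, Callable[..., bool]] = {}


def register(name: str):
    """Decorator that adds a checker function to the registry under name."""
    def decorator(func):
        CHECKERS[name] = func
        return func
    return decorator


@register("contains")
def check_contains(response: str, value: str) -> bool:
    return value.lower() in response.lower()  # case-insensitive by assumption


@register("valid_json")
def check_valid_json(response: str) -> bool:
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return False
    return True


@register("regex")
def check_regex(response: str, pattern: str) -> bool:
    return re.search(pattern, response) is not None


def verify(response: str, invariants: List[dict]) -> List[Tuple[str, bool]]:
    """Run each configured invariant and report (type, passed) pairs."""
    results = []
    for invariant in invariants:
        params = dict(invariant)  # copy so the caller's config is not mutated
        kind = params.pop("type")
        results.append((kind, CHECKERS[kind](response, **params)))
    return results
```

With a registry, adding a new checker only requires a decorated function; the verifier loop stays unchanged.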

Phase 4: CLI & Reporting (Week 4-5)

CLI Commands

  • Set up Typer application
  • Implement flakestorm init command
  • Implement flakestorm run command
  • Implement flakestorm verify command
  • Implement flakestorm report command
  • Implement flakestorm score command
  • Add CI mode (--ci --min-score)
  • Add rich progress bars
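
The CI mode item above implies a simple contract: when `--ci` is set and the score falls below `--min-score`, the process exits non-zero so the pipeline fails. The sketch below shows that contract with the standard library's `argparse` rather than Typer, to stay dependency-free. The flag names come from the checklist; the 0 to 1 score scale, the default threshold, and the function names are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="flakestorm-ci-sketch")
    parser.add_argument("--ci", action="store_true",
                        help="exit non-zero when the score is below --min-score")
    parser.add_argument("--min-score", type=float, default=0.9)  # assumed default
    return parser


def exit_code(score: float, ci: bool, min_score: float) -> int:
    """Return 1 only when CI mode is on and the score misses the threshold."""
    if ci and score < min_score:
        return 1
    return 0


args = build_parser().parse_args(["--ci", "--min-score", "0.8"])
code = exit_code(0.75, args.ci, args.min_score)  # 1: the CI job would fail
```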

Report Generation

  • Create report data models
  • Implement HTMLReportGenerator
  • Create interactive HTML template
  • Implement JSONReportGenerator
  • Implement TerminalReporter
  • Add score visualization
  • Add mutation matrix view

Phase 5: V2 Features (Week 5-7)

HuggingFace Integration

  • Create HuggingFaceModelProvider
  • Support GGUF model downloading
  • Add recommended models list
  • Integrate with Ollama model importing

Vector Similarity

  • Create LocalEmbedder class
  • Integrate sentence-transformers
  • Implement similarity calculation
  • Add lazy model loading
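
The similarity calculation and lazy loading items above can be illustrated without any ML dependency. Here `cosine_similarity` is the standard formula, while `LazyEmbedder` and the `load_model` callback are hypothetical stand-ins for whatever wraps the embedding model; the toy model in the usage lines is not a real embedding.

```python
import math
from typing import Callable, List, Optional, Sequence

Embed = Callable[[str], List[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors compare as 0.0


class LazyEmbedder:
    """Defers the expensive model load until the first embed() call."""

    def __init__(self, load_model: Callable[[], Embed]) -> None:
        self._load_model = load_model
        self._model: Optional[Embed] = None

    def embed(self, text: str) -> List[float]:
        if self._model is None:
            self._model = self._load_model()  # runs once, then is cached
        return self._model(text)


# Toy "model": a two-dimensional vector derived from text length.
embedder = LazyEmbedder(lambda: (lambda text: [float(len(text)), 1.0]))
score = cosine_similarity(embedder.embed("refund"), embedder.embed("refund"))
```

Lazy loading matters here because commands that never use a similarity check should not pay the model's start-up cost.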

GitHub Actions Integration

  • Create action.yml template
  • Create workflow example
  • Document CI/CD integration
  • Publish to GitHub Marketplace

Testing & Quality

Unit Tests

  • Test configuration loading
  • Test mutation types
  • Test assertion checkers
  • Test agent adapters
  • Test orchestrator
  • Test report generation

Integration Tests

  • Test full run with mock agent
  • Test CLI commands
  • Test report generation

Documentation

  • Write README.md
  • Create IMPLEMENTATION_CHECKLIST.md
  • Create ARCHITECTURE_SUMMARY.md
  • Create API_SPECIFICATION.md
  • Create CONTRIBUTING.md
  • Create CONFIGURATION_GUIDE.md

Cloud Version (Commercial)

Cloud Phase 1: Infrastructure (Week 9-10)

Cloud Setup

  • Set up AWS/GCP project
  • Configure VPC and networking
  • Set up PostgreSQL database
  • Configure Redis for queue/cache
  • Set up S3/GCS for storage
  • Configure Docker/Kubernetes

Database Schema

  • Create users table
  • Create test_configs table
  • Create test_runs table
  • Create subscriptions table
  • Set up migrations (Alembic)

Authentication

  • Integrate Auth0/Clerk
  • Implement JWT validation
  • Create user management endpoints
  • Add RBAC for team tier

Cloud Phase 2: Backend (Week 10-12)

FastAPI Application

  • Set up FastAPI project structure
  • Implement auth middleware
  • Create test management endpoints
  • Create config management endpoints
  • Create report endpoints
  • Implement async job queue (Celery)

Gemini Integration

  • Create GeminiMutationService
  • Implement mutation generation
  • Add fallback to GPU models
  • Add rate limiting and retry logic

Tier Limits

  • Implement free tier limits (5 lifetime runs)
  • Implement Pro tier limits (200/month)
  • Implement Team tier limits (1000/month)
  • Create usage tracking
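
The tier limits above reduce to a small lookup plus a counter comparison. In this sketch the run counts (5 lifetime, 200 per month, 1000 per month) come from the checklist, while the tier keys, `TierLimit`, and `can_start_run` are hypothetical names; how usage counters are stored is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimit:
    max_runs: int
    per_month: bool  # False means the cap applies to lifetime usage


TIER_LIMITS = {
    "free": TierLimit(5, per_month=False),
    "pro": TierLimit(200, per_month=True),
    "team": TierLimit(1000, per_month=True),
}


def can_start_run(tier: str, runs_this_month: int, runs_lifetime: int) -> bool:
    """Allow a new run only while the relevant counter is under the cap."""
    limit = TIER_LIMITS[tier]
    used = runs_this_month if limit.per_month else runs_lifetime
    return used < limit.max_runs
```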

Cloud Phase 3: Frontend (Week 12-14)

Next.js Setup

  • Initialize Next.js project
  • Configure Tailwind CSS
  • Set up authentication flow
  • Create layout components

Dashboard Pages

  • Dashboard home (overview)
  • Tests list and creation
  • Reports viewer
  • Billing management
  • Team management (Team tier)
  • Settings page

Marketing Pages

  • Landing page
  • Pricing page
  • Documentation
  • Blog (optional)

Cloud Phase 4: Billing (Week 14-15)

Stripe Integration

  • Set up Stripe products/prices
  • Implement subscription creation
  • Handle subscription updates
  • Implement webhook handlers
  • Create invoice history

Email Notifications

  • Set up SendGrid/Mailgun
  • Send test-failure alerts
  • Send subscription notifications
  • Send welcome emails

Cloud Phase 5: Testing & Launch (Week 15-16)

Testing

  • E2E tests with Cypress/Playwright
  • Load testing
  • Security audit
  • Performance optimization

Deployment

  • Set up CI/CD pipeline
  • Configure production environment
  • Set up monitoring (Sentry, etc.)
  • Launch to production

Progress Summary

| Phase | Status | Completion |
|---|---|---|
| CLI Phase 1: Foundation | Complete | 100% |
| CLI Phase 2: Mutation Engine | Complete | 100% |
| CLI Phase 3: Runner & Assertions | Complete | 100% |
| CLI Phase 4: CLI & Reporting | Complete | 100% |
| CLI Phase 5: V2 Features | In Progress | 90% |
| Documentation | Complete | 100% |
| Cloud Phase 1: Infrastructure | Pending | 0% |
| Cloud Phase 2: Backend | Pending | 0% |
| Cloud Phase 3: Frontend | Pending | 0% |
| Cloud Phase 4: Billing | Pending | 0% |

Next Steps

  1. Rust Build: Compile and integrate Rust performance module
  2. Integration Tests: Add full integration test suite
  3. PyPI Release: Prepare and publish to PyPI
  4. Cloud Infrastructure: Begin AWS/GCP setup
  5. Community Launch: Announce on Hacker News and Reddit