# Testing Guide

This guide explains how to run, write, and expand tests for flakestorm. It covers the remaining testing items from the implementation checklist.

---

## Table of Contents

1. [Running Tests](#running-tests)
2. [Test Structure](#test-structure)
3. [V2 Integration Tests](#v2-integration-tests)
4. [Writing Tests: Agent Adapters](#writing-tests-agent-adapters)
5. [Writing Tests: Orchestrator](#writing-tests-orchestrator)
6. [Writing Tests: Report Generation](#writing-tests-report-generation)
7. [Integration Tests](#integration-tests)
8. [CLI Tests](#cli-tests)
9. [Test Fixtures](#test-fixtures)

---

## Running Tests

### Prerequisites

```bash
# Install dev dependencies

# Create virtual environment first
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Then install
pip install -e ".[dev]"

# Or manually
pip install pytest pytest-asyncio pytest-cov
```

### Running All Tests

```bash
# Full test suite
pytest

# With coverage report
pytest --cov=src/flakestorm --cov-report=html

# Verbose output
pytest -v

# Run specific test file
pytest tests/test_config.py

# Run specific test class
pytest tests/test_assertions.py::TestContainsChecker

# Run specific test
pytest tests/test_assertions.py::TestContainsChecker::test_contains_match
```

### Test Categories

```bash
# Unit tests only (fast)
pytest tests/test_config.py tests/test_mutations.py tests/test_assertions.py

# Performance tests (requires Rust module)
pytest tests/test_performance.py

# Integration tests (requires Ollama)
pytest tests/test_integration.py

# V2 integration tests (chaos, contract, replay)
pytest tests/test_chaos_integration.py tests/test_contract_integration.py tests/test_replay_integration.py
```

---

## Test Structure

```
tests/
├── __init__.py
├── conftest.py                    # Shared fixtures
├── test_config.py                 # Configuration loading tests
├── test_mutations.py              # Mutation engine tests
├── test_assertions.py             # Assertion checkers tests
├── test_performance.py            # Rust/Python bridge tests
├── test_adapters.py               # Agent adapter tests
├── test_orchestrator.py           # Orchestrator tests
├── test_reports.py                # Report generation tests
├── test_cli.py                    # CLI command tests
├── test_integration.py            # Full integration tests
├── test_chaos_integration.py      # V2: chaos (tool/LLM faults, interceptor)
├── test_contract_integration.py   # V2: contract (N×M matrix, score, critical fail)
└── test_replay_integration.py     # V2: replay (session → replay → pass/fail)
```

---

## V2 Integration Tests

V2 adds three integration test modules; all gaps are closed (see [GAP_VERIFICATION](GAP_VERIFICATION.md)).

| Module | What it tests |
|--------|---------------|
| `test_chaos_integration.py` | Chaos interceptor, tool faults (match_url/tool *), LLM faults (truncated, empty, garbage, rate_limit, response_drift). |
| `test_contract_integration.py` | Contract engine: invariants × chaos matrix, reset between cells, resilience score (severity-weighted), critical failure → FAIL. |
| `test_replay_integration.py` | Replay loader (file/format), ReplayRunner verification against contract, contract resolution by name/path. |

For CI pipelines that use V2, run the full suite including these; `flakestorm ci` runs mutation, contract (if configured), chaos-only (if configured), and replay (if configured), then computes the overall weighted score from `scoring.weights`.

---

## Writing Tests: Agent Adapters

### Location: `tests/test_adapters.py`

### What to Test

1. **HTTPAgentAdapter**
   - Sends correct HTTP request format
   - Handles successful responses
   - Handles error responses (4xx, 5xx)
   - Respects timeout settings
   - Retries on transient failures
   - Extracts response using JSONPath

2. **PythonAgentAdapter**
   - Imports module correctly
   - Calls sync and async functions
   - Handles exceptions gracefully
   - Measures latency correctly

3.
   **LangChainAgentAdapter**
   - Invokes LangChain agents correctly
   - Handles different chain types

### Example Test File

```python
# tests/test_adapters.py
"""Tests for agent adapters."""
import pytest
from unittest.mock import AsyncMock, MagicMock, patch
import asyncio

# Import the modules to test
from flakestorm.core.protocol import (
    HTTPAgentAdapter,
    PythonAgentAdapter,
    AgentResponse,
)
from flakestorm.core.config import AgentConfig, AgentType


class TestHTTPAgentAdapter:
    """Tests for HTTP agent adapter."""

    @pytest.fixture
    def http_config(self):
        """Create a test HTTP agent config."""
        return AgentConfig(
            endpoint="http://localhost:8000/chat",
            type=AgentType.HTTP,
            timeout=30,
            request_template='{"message": "{prompt}"}',
            response_path="$.reply",
        )

    @pytest.fixture
    def adapter(self, http_config):
        """Create adapter instance."""
        return HTTPAgentAdapter(http_config)

    @pytest.mark.asyncio
    async def test_invoke_success(self, adapter):
        """Test successful invocation."""
        with patch("httpx.AsyncClient.post") as mock_post:
            mock_response = MagicMock()
            mock_response.status_code = 200
            mock_response.json.return_value = {"reply": "Hello there!"}
            mock_post.return_value = mock_response

            result = await adapter.invoke("Hello")

            assert isinstance(result, AgentResponse)
            assert result.text == "Hello there!"
            assert result.latency_ms > 0

    @pytest.mark.asyncio
    async def test_invoke_formats_request(self, adapter):
        """Test that request template is formatted correctly."""
        with patch("httpx.AsyncClient.post") as mock_post:
            mock_response = MagicMock()
            mock_response.status_code = 200
            mock_response.json.return_value = {"reply": "OK"}
            mock_post.return_value = mock_response

            await adapter.invoke("Test prompt")

            # Verify the request body
            call_args = mock_post.call_args
            assert '"message": "Test prompt"' in str(call_args)

    @pytest.mark.asyncio
    async def test_invoke_timeout(self, adapter):
        """Test timeout handling."""
        with patch("httpx.AsyncClient.post") as mock_post:
            mock_post.side_effect = asyncio.TimeoutError()

            with pytest.raises(TimeoutError):
                await adapter.invoke("Hello")

    @pytest.mark.asyncio
    async def test_invoke_http_error(self, adapter):
        """Test HTTP error handling."""
        with patch("httpx.AsyncClient.post") as mock_post:
            mock_response = MagicMock()
            mock_response.status_code = 500
            mock_response.text = "Internal Server Error"
            mock_post.return_value = mock_response

            with pytest.raises(Exception):
                await adapter.invoke("Hello")


class TestPythonAgentAdapter:
    """Tests for Python function adapter."""

    @pytest.fixture
    def python_config(self):
        """Create a test Python agent config."""
        return AgentConfig(
            endpoint="tests.fixtures.mock_agent:handle_message",
            type=AgentType.PYTHON,
            timeout=30,
        )

    @pytest.mark.asyncio
    async def test_invoke_sync_function(self):
        """Test invoking a sync function."""
        # Create a mock module with a sync function
        def mock_handler(prompt: str) -> str:
            return f"Echo: {prompt}"

        with patch.dict("sys.modules", {"mock_module": MagicMock(handler=mock_handler)}):
            config = AgentConfig(
                endpoint="mock_module:handler",
                type=AgentType.PYTHON,
            )
            adapter = PythonAgentAdapter(config)
            # This would need the actual implementation to work
            # For now, test the structure

    @pytest.mark.asyncio
    async def test_invoke_async_function(self):
        """Test invoking an async function."""
        async def mock_handler(prompt: str) -> str:
            await asyncio.sleep(0.01)
            return f"Async Echo: {prompt}"

        # Similar test structure


class TestAgentAdapterFactory:
    """Tests for adapter factory function."""

    def test_creates_http_adapter(self):
        """Factory creates HTTP adapter for HTTP type."""
        from flakestorm.core.protocol import create_agent_adapter

        config = AgentConfig(
            endpoint="http://localhost:8000/chat",
            type=AgentType.HTTP,
        )
        adapter = create_agent_adapter(config)
        assert isinstance(adapter, HTTPAgentAdapter)

    def test_creates_python_adapter(self):
        """Factory creates Python adapter for Python type."""
        from flakestorm.core.protocol import create_agent_adapter

        config = AgentConfig(
            endpoint="my_module:my_function",
            type=AgentType.PYTHON,
        )
        adapter = create_agent_adapter(config)
        assert isinstance(adapter, PythonAgentAdapter)
```

### How to Run

```bash
# Run adapter tests
pytest tests/test_adapters.py -v

# Run with coverage
pytest tests/test_adapters.py --cov=src/flakestorm/core/protocol
```

---

## Writing Tests: Orchestrator

### Location: `tests/test_orchestrator.py`

### What to Test

1. **Mutation Generation Phase**
   - Generates correct number of mutations
   - Handles all mutation types
   - Handles LLM failures gracefully

2. **Test Execution Phase**
   - Runs mutations in parallel
   - Respects concurrency limits
   - Handles agent failures
   - Measures latency correctly

3.
   **Result Aggregation**
   - Calculates statistics correctly
   - Scores results with correct weights
   - Groups results by mutation type

### Example Test File

```python
# tests/test_orchestrator.py
"""Tests for the flakestorm orchestrator."""
import pytest
from unittest.mock import AsyncMock, MagicMock, patch
from datetime import datetime

from flakestorm.core.orchestrator import Orchestrator, OrchestratorState
from flakestorm.core.config import FlakeStormConfig, AgentConfig, MutationConfig
from flakestorm.mutations.types import Mutation, MutationType
from flakestorm.assertions.verifier import CheckResult


class TestOrchestratorState:
    """Tests for orchestrator state tracking."""

    def test_initial_state(self):
        """State initializes correctly."""
        state = OrchestratorState()
        assert state.total_mutations == 0
        assert state.completed_mutations == 0
        assert state.completed_at is None

    def test_state_updates(self):
        """State updates as tests run."""
        state = OrchestratorState()
        state.total_mutations = 10
        state.completed_mutations = 5
        assert state.completed_mutations == 5


class TestOrchestrator:
    """Tests for main orchestrator."""

    @pytest.fixture
    def mock_config(self):
        """Create a minimal test config."""
        return FlakeStormConfig(
            agent=AgentConfig(
                endpoint="http://localhost:8000/chat",
                type="http",
            ),
            golden_prompts=["Test prompt 1", "Test prompt 2"],
            mutations=MutationConfig(
                count=5,
                types=[MutationType.PARAPHRASE],
            ),
        )

    @pytest.fixture
    def mock_agent(self):
        """Create a mock agent adapter."""
        agent = AsyncMock()
        agent.invoke.return_value = MagicMock(
            text="Agent response",
            latency_ms=100.0,
        )
        return agent

    @pytest.fixture
    def mock_mutation_engine(self):
        """Create a mock mutation engine."""
        engine = AsyncMock()
        engine.generate_mutations.return_value = [
            Mutation(
                original="Test",
                mutated="Test variation",
                type=MutationType.PARAPHRASE,
                difficulty=1.0,
            )
        ]
        return engine

    @pytest.fixture
    def mock_verifier(self):
        """Create a mock verifier."""
        verifier = MagicMock()
        verifier.verify.return_value = [
            CheckResult(passed=True, check_type="contains", details="OK")
        ]
        return verifier

    @pytest.mark.asyncio
    async def test_run_generates_mutations(
        self, mock_config, mock_agent, mock_mutation_engine, mock_verifier
    ):
        """Orchestrator generates mutations for all golden prompts."""
        orchestrator = Orchestrator(
            config=mock_config,
            agent=mock_agent,
            mutation_engine=mock_mutation_engine,
            verifier=mock_verifier,
        )

        await orchestrator.run()

        # Should have called generate_mutations for each golden prompt
        assert mock_mutation_engine.generate_mutations.call_count == 2

    @pytest.mark.asyncio
    async def test_run_invokes_agent(
        self, mock_config, mock_agent, mock_mutation_engine, mock_verifier
    ):
        """Orchestrator invokes agent for each mutation."""
        orchestrator = Orchestrator(
            config=mock_config,
            agent=mock_agent,
            mutation_engine=mock_mutation_engine,
            verifier=mock_verifier,
        )

        await orchestrator.run()

        # Should have invoked agent for each mutation:
        # 2 golden prompts × 1 mutation each = 2 invocations
        assert mock_agent.invoke.call_count >= 2

    @pytest.mark.asyncio
    async def test_run_returns_results(
        self, mock_config, mock_agent, mock_mutation_engine, mock_verifier
    ):
        """Orchestrator returns complete test results."""
        orchestrator = Orchestrator(
            config=mock_config,
            agent=mock_agent,
            mutation_engine=mock_mutation_engine,
            verifier=mock_verifier,
        )

        results = await orchestrator.run()

        assert results is not None
        assert hasattr(results, "statistics")
        assert hasattr(results, "mutations")

    @pytest.mark.asyncio
    async def test_handles_agent_failure(
        self, mock_config, mock_mutation_engine, mock_verifier
    ):
        """Orchestrator handles agent failures gracefully."""
        failing_agent = AsyncMock()
        failing_agent.invoke.side_effect = Exception("Agent error")

        orchestrator = Orchestrator(
            config=mock_config,
            agent=failing_agent,
            mutation_engine=mock_mutation_engine,
            verifier=mock_verifier,
        )

        # Should not raise; should mark the test as failed
        results = await orchestrator.run()
        assert results is not None
```

---

## Writing Tests: Report Generation

### Location: `tests/test_reports.py`

### What to Test

1. **HTMLReportGenerator**
   - Generates valid HTML
   - Contains all required sections
   - Includes statistics
   - Includes mutation details

2. **JSONReportGenerator**
   - Generates valid JSON
   - Contains all required fields
   - Serializes datetime correctly

3. **TerminalReporter**
   - Formats output correctly
   - Handles different result types

### Example Test File

```python
# tests/test_reports.py
"""Tests for report generation."""
import pytest
import json
from datetime import datetime
from pathlib import Path
import tempfile

from flakestorm.reports.models import TestResults, TestStatistics, MutationResult
from flakestorm.reports.html import HTMLReportGenerator
from flakestorm.reports.json_export import JSONReportGenerator


class TestHTMLReportGenerator:
    """Tests for HTML report generation."""

    @pytest.fixture
    def sample_results(self):
        """Create sample test results."""
        return TestResults(
            config=None,  # Simplified for testing
            mutations=[
                MutationResult(
                    mutation=None,
                    response="Test response",
                    latency_ms=100.0,
                    passed=True,
                    checks=[],
                )
            ],
            statistics=TestStatistics(
                total_mutations=10,
                passed_mutations=8,
                failed_mutations=2,
                robustness_score=0.8,
                avg_latency_ms=150.0,
                p50_latency_ms=120.0,
                p95_latency_ms=300.0,
                p99_latency_ms=450.0,
                by_type=[],
            ),
            timestamp=datetime.now(),
        )

    def test_generate_returns_string(self, sample_results):
        """Generator returns HTML string."""
        generator = HTMLReportGenerator(sample_results)
        html = generator.generate()
        assert isinstance(html, str)
        assert len(html) > 0

    def test_generate_valid_html(self, sample_results):
        """Generated HTML is valid."""
        generator = HTMLReportGenerator(sample_results)
        html = generator.generate()
        assert "<!DOCTYPE html>" in html
        assert "<html" in html
        assert "</html>" in html

    def test_contains_robustness_score(self, sample_results):
        """Report contains robustness score."""
        generator = HTMLReportGenerator(sample_results)
        html = generator.generate()
        assert "0.8" in html or "80%" in html

    def test_save_creates_file(self, sample_results):
        """save() creates file on disk."""
        with tempfile.TemporaryDirectory() as tmpdir:
            generator = HTMLReportGenerator(sample_results)
            path = generator.save(Path(tmpdir) / "report.html")
            assert path.exists()
            assert path.read_text().startswith("<!DOCTYPE")


class TestJSONReportGenerator:
    """Tests for JSON report generation."""

    @pytest.fixture
    def sample_results(self):
        """Create sample test results."""
        return TestResults(
            config=None,
            mutations=[],
            statistics=TestStatistics(
                total_mutations=10,
                passed_mutations=8,
                failed_mutations=2,
                robustness_score=0.8,
                avg_latency_ms=150.0,
                p50_latency_ms=120.0,
                p95_latency_ms=300.0,
                p99_latency_ms=450.0,
                by_type=[],
            ),
            timestamp=datetime(2024, 1, 15, 12, 0, 0),
        )

    def test_generate_valid_json(self, sample_results):
        """Generator produces valid JSON."""
        generator = JSONReportGenerator(sample_results)
        json_str = generator.generate()
        # Should not raise
        data = json.loads(json_str)
        assert isinstance(data, dict)

    def test_contains_statistics(self, sample_results):
        """JSON contains statistics."""
        generator = JSONReportGenerator(sample_results)
        data = json.loads(generator.generate())
        assert "statistics" in data
        assert data["statistics"]["robustness_score"] == 0.8
```

---

## Integration Tests

### Location: `tests/test_integration.py`

### Prerequisites

Integration tests require:

1. Ollama running locally
2. A model pulled (e.g., `ollama pull qwen2.5-coder:7b`)
3.
   A mock agent running

### Example Test File

```python
# tests/test_integration.py
"""Integration tests for full flakestorm workflow."""
import pytest
import asyncio
from pathlib import Path
import tempfile

# Skip all tests if Ollama is not running
pytest_plugins = ["pytest_asyncio"]


def ollama_available():
    """Check if Ollama is running."""
    from flakestorm.integrations.huggingface import HuggingFaceModelProvider
    return HuggingFaceModelProvider.verify_ollama_connection()


@pytest.mark.skipif(not ollama_available(), reason="Ollama not running")
class TestFullWorkflow:
    """Integration tests for complete test runs."""

    @pytest.mark.asyncio
    async def test_full_run_with_mock_agent(self):
        """Test complete workflow with mock agent."""
        # This test would:
        # 1. Start a mock agent
        # 2. Create config
        # 3. Run flakestorm
        # 4. Verify results
        pass

    @pytest.mark.asyncio
    async def test_mutation_generation(self):
        """Test that mutation engine generates valid mutations."""
        from flakestorm.mutations.engine import MutationEngine
        from flakestorm.mutations.types import MutationType
        from flakestorm.core.config import LLMConfig

        config = LLMConfig(
            model="qwen2.5-coder:7b",
            host="http://localhost:11434",
        )
        engine = MutationEngine(config)

        mutations = await engine.generate_mutations(
            prompt="Hello, world!",
            types=[MutationType.PARAPHRASE],
            count=3,
        )

        assert len(mutations) > 0
        assert all(m.mutated != "Hello, world!"
                   for m in mutations)
```

---

## CLI Tests

### Location: `tests/test_cli.py`

### How to Test CLI Commands

Use the `CliRunner` from Typer for testing:

```python
# tests/test_cli.py
"""Tests for CLI commands."""
import pytest
from typer.testing import CliRunner
import tempfile
from pathlib import Path

from flakestorm.cli.main import app

runner = CliRunner()


class TestInitCommand:
    """Tests for `flakestorm init`."""

    def test_init_creates_config(self):
        """init creates flakestorm.yaml."""
        with tempfile.TemporaryDirectory() as tmpdir:
            result = runner.invoke(
                app, ["init", "--dir", tmpdir]
            )
            assert result.exit_code == 0
            assert (Path(tmpdir) / "flakestorm.yaml").exists()

    def test_init_no_overwrite(self):
        """init doesn't overwrite existing config."""
        with tempfile.TemporaryDirectory() as tmpdir:
            config_path = Path(tmpdir) / "flakestorm.yaml"
            config_path.write_text("existing: content")

            result = runner.invoke(
                app, ["init", "--dir", tmpdir]
            )
            # Should warn about existing file
            assert "exists" in result.output.lower() or result.exit_code != 0


class TestVerifyCommand:
    """Tests for `flakestorm verify`."""

    def test_verify_valid_config(self):
        """verify accepts valid config."""
        with tempfile.TemporaryDirectory() as tmpdir:
            config_path = Path(tmpdir) / "flakestorm.yaml"
            config_path.write_text("""
agent:
  endpoint: "http://localhost:8000/chat"
  type: http
golden_prompts:
  - "Test prompt"
""")
            result = runner.invoke(
                app, ["verify", "--config", str(config_path)]
            )
            assert result.exit_code == 0

    def test_verify_invalid_config(self):
        """verify rejects invalid config."""
        with tempfile.TemporaryDirectory() as tmpdir:
            config_path = Path(tmpdir) / "flakestorm.yaml"
            config_path.write_text("invalid: yaml: content:")

            result = runner.invoke(
                app, ["verify", "--config", str(config_path)]
            )
            assert result.exit_code != 0


class TestHelpCommand:
    """Tests for help output."""

    def test_main_help(self):
        """Main help displays commands."""
        result = runner.invoke(app, ["--help"])
        assert result.exit_code == 0
        assert "run" in result.output
        assert "init" in result.output

    def test_run_help(self):
        """Run command help displays options."""
        result = runner.invoke(app, ["run", "--help"])
        assert result.exit_code == 0
        assert "--config" in result.output
        assert "--output" in result.output
```

---

## Test Fixtures

### Shared Fixtures in `conftest.py`

```python
# tests/conftest.py
"""Shared test fixtures."""
import pytest
from pathlib import Path
import tempfile


@pytest.fixture
def temp_dir():
    """Create a temporary directory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        yield Path(tmpdir)


@pytest.fixture
def sample_config_yaml():
    """Sample valid config YAML."""
    return """
agent:
  endpoint: "http://localhost:8000/chat"
  type: http
  timeout: 30

golden_prompts:
  - "Test prompt 1"
  - "Test prompt 2"

mutations:
  count: 5
  types:
    - paraphrase
    - noise

invariants:
  - type: latency
    max_ms: 5000
"""


@pytest.fixture
def config_file(temp_dir, sample_config_yaml):
    """Create a config file in temp directory."""
    config_path = temp_dir / "flakestorm.yaml"
    config_path.write_text(sample_config_yaml)
    return config_path
```

---

## Summary: Remaining Test Items

| Checklist Item | Test File | Status |
|----------------|-----------|--------|
| Test agent adapters | `tests/test_adapters.py` | Template provided above |
| Test orchestrator | `tests/test_orchestrator.py` | Template provided above |
| Test report generation | `tests/test_reports.py` | Template provided above |
| Test CLI commands | `tests/test_cli.py` | Template provided above |
| Full integration test | `tests/test_integration.py` | Template provided above |

### Quick Start

1. Copy the templates above to create test files
2. Run: `pytest tests/test_`
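### Appendix: A Mock Agent Sketch

The adapter and integration templates reference a mock agent (`tests.fixtures.mock_agent:handle_message` for the Python adapter, plus an HTTP endpoint for integration runs) that this guide does not show. Below is a minimal, self-contained sketch. The module path and handler names are taken from the templates; the HTTP wire format (`{"message": ...}` in, `{"reply": ...}` out) is an assumption inferred from the `http_config` fixture's `request_template` and `response_path`, not flakestorm's actual API.

```python
# tests/fixtures/mock_agent.py — hypothetical mock agent for local testing.
import asyncio
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_message(prompt: str) -> str:
    """Sync echo handler, invokable via the Python adapter."""
    return f"Echo: {prompt}"


async def handle_message_async(prompt: str) -> str:
    """Async variant with a tiny simulated latency."""
    await asyncio.sleep(0.01)
    return f"Async Echo: {prompt}"


class _Handler(BaseHTTPRequestHandler):
    """HTTP wrapper matching the assumed wire format of the templates."""

    def do_POST(self):
        # Read the JSON body sent by the HTTP adapter: {"message": "..."}
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        # Respond with {"reply": "..."} so "$.reply" extraction works
        payload = json.dumps({"reply": handle_message(body.get("message", ""))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep test output quiet


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), _Handler).serve_forever()
```

Running `python tests/fixtures/mock_agent.py` serves the HTTP variant on port 8000, which satisfies prerequisite 3 of the integration tests under the assumptions above.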