mirror of https://github.com/flakestorm/flakestorm.git
synced 2026-04-26 09:16:25 +02:00

Fix .gitignore to allow docs files and add documentation files

- Fix .gitignore pattern: un-ignore docs/ directory first, then ignore docs/*, then un-ignore specific files
- Add all documentation files referenced in README.md:
  - USAGE_GUIDE.md
  - CONFIGURATION_GUIDE.md
  - TEST_SCENARIOS.md
  - MODULES.md
  - DEVELOPER_FAQ.md
  - PUBLISHING.md
  - CONTRIBUTING.md
  - API_SPECIFICATION.md
  - TESTING_GUIDE.md
  - IMPLEMENTATION_CHECKLIST.md
- Pre-commit hooks fixed trailing whitespace and end-of-file formatting

parent 4dd882a2d2
commit 69e0f8deeb

11 changed files with 5936 additions and 2 deletions

679 docs/DEVELOPER_FAQ.md (new file)
@@ -0,0 +1,679 @@
# flakestorm Developer FAQ

This document answers common questions developers might have about the flakestorm codebase. It's designed to help project maintainers explain design decisions and help contributors understand the codebase.

---

## Table of Contents

1. [Architecture Questions](#architecture-questions)
2. [Configuration System](#configuration-system)
3. [Mutation Engine](#mutation-engine)
4. [Assertion System](#assertion-system)
5. [Performance & Rust](#performance--rust)
6. [Agent Adapters](#agent-adapters)
7. [Testing & Quality](#testing--quality)
8. [Extending flakestorm](#extending-flakestorm)
9. [Common Issues](#common-issues)

---

## Architecture Questions

### Q: Why is the codebase split into core, mutations, assertions, and reports?

**A:** This follows the **Single Responsibility Principle (SRP)** and makes the codebase maintainable:

| Module | Responsibility |
|--------|---------------|
| `core/` | Orchestration, configuration, agent communication |
| `mutations/` | Adversarial input generation |
| `assertions/` | Response validation |
| `reports/` | Output formatting |

This separation means:
- Changes to mutation logic don't affect assertions
- New report formats can be added without touching core logic
- Each module can be tested independently

---

### Q: Why use async/await throughout the codebase?

**A:** Agent testing is **I/O-bound**, not CPU-bound. The bottleneck is waiting for:
1. LLM responses (mutation generation)
2. Agent responses (test execution)

Async allows running many operations concurrently:

```python
# Without async: 100 tests × 500ms = 50 seconds
# With async (10 concurrent): 100 tests / 10 × 500ms = 5 seconds
```

The semaphore in `orchestrator.py` controls concurrency:

```python
semaphore = asyncio.Semaphore(self.config.advanced.concurrency)

async def _run_single_mutation(self, mutation):
    async with semaphore:  # Limits concurrent executions
        return await self.agent.invoke(mutation.mutated)
```
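
The semaphore pattern above can be combined into a runnable sketch (names here are illustrative, not the project's actual code): the semaphore caps how many fake agent calls run at once, while `asyncio.gather` drives all of them concurrently.

```python
import asyncio

async def invoke_agent(prompt: str, semaphore: asyncio.Semaphore) -> str:
    # The semaphore admits at most N coroutines into this block at a time
    async with semaphore:
        await asyncio.sleep(0)  # stand-in for real agent I/O
        return f"response to {prompt}"

async def run_all(prompts: list[str], concurrency: int) -> list[str]:
    semaphore = asyncio.Semaphore(concurrency)
    tasks = [invoke_agent(p, semaphore) for p in prompts]
    # gather preserves input order, regardless of completion order
    return await asyncio.gather(*tasks)

results = asyncio.run(run_all([f"test-{i}" for i in range(5)], concurrency=2))
print(results[0])  # response to test-0
```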

---

### Q: Why is there both an `orchestrator.py` and a `runner.py`?

**A:** They serve different purposes:

- **`runner.py`**: High-level API for users - simple `EntropixRunner.run()` interface
- **`orchestrator.py`**: Internal coordination logic - handles the complex flow

This separation allows:
- `runner.py` to provide a clean facade
- `orchestrator.py` to be refactored without breaking the public API
- Different entry points (CLI, programmatic) to use the same core logic

---

## Configuration System

### Q: Why Pydantic instead of dataclasses or attrs?

**A:** Pydantic was chosen for several reasons:

1. **Automatic Validation**: Built-in validators with clear error messages

   ```python
   class MutationConfig(BaseModel):
       count: int = Field(ge=1, le=100)  # Validates range automatically
   ```

2. **Environment Variable Support**: Native expansion

   ```python
   endpoint: str = Field(default="${AGENT_URL}")
   ```

3. **YAML/JSON Serialization**: Works out of the box
4. **IDE Support**: Type hints provide autocomplete

---

### Q: Why use environment variable expansion in config?

**A:** Security best practice - secrets should never be in config files:

```yaml
# BAD: Secret in file (gets committed to git)
headers:
  Authorization: "Bearer sk-1234567890"

# GOOD: Reference environment variable
headers:
  Authorization: "Bearer ${API_KEY}"
```

Implementation in `config.py`:

```python
import os
import re

def expand_env_vars(value: str) -> str:
    """Replace ${VAR} with environment variable value."""
    pattern = r'\$\{([^}]+)\}'

    def replacer(match):
        var_name = match.group(1)
        return os.environ.get(var_name, match.group(0))

    return re.sub(pattern, replacer, value)
```
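
A quick standalone demonstration of the expansion behavior (the function is reproduced in condensed form so the example runs on its own): variables that are not set are left intact rather than silently replaced with an empty string, so misconfiguration surfaces downstream.

```python
import os
import re

def expand_env_vars(value: str) -> str:
    """Replace ${VAR} with the environment variable's value; leave unset vars as-is."""
    return re.sub(r'\$\{([^}]+)\}', lambda m: os.environ.get(m.group(1), m.group(0)), value)

os.environ["API_KEY"] = "sk-test"
print(expand_env_vars("Bearer ${API_KEY}"))               # Bearer sk-test
print(expand_env_vars("Bearer ${FLAKESTORM_UNSET_VAR}"))  # unset: left intact
```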

---

### Q: Why is MutationType defined as `str, Enum`?

**A:** String enums serialize directly to YAML/JSON:

```python
class MutationType(str, Enum):
    PARAPHRASE = "paraphrase"
```

This allows:

```yaml
# In config file - uses string value directly
mutations:
  types:
    - paraphrase  # Works!
    - noise
```

If we used a regular Enum, we'd need custom serialization logic.
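
A self-contained sketch of the round trip (the enum members here are illustrative): because each member *is* a string, `json.dumps` needs no custom encoder, and the raw string from a config file converts straight back to the enum.

```python
import json
from enum import Enum

class MutationType(str, Enum):
    PARAPHRASE = "paraphrase"
    NOISE = "noise"

# Serializes as a plain string, no custom encoder needed
print(json.dumps({"types": [MutationType.PARAPHRASE]}))  # {"types": ["paraphrase"]}

# The config-file string converts straight back to the enum member
print(MutationType("noise") is MutationType.NOISE)  # True
```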

---

## Mutation Engine

### Q: Why use a local LLM (Ollama) instead of cloud APIs?

**A:** Several important reasons:

| Factor | Local LLM | Cloud API |
|--------|-----------|-----------|
| **Cost** | Free | $0.01-0.10 per mutation |
| **Privacy** | Data stays local | Prompts sent to third party |
| **Rate Limits** | None | Often restrictive |
| **Latency** | Low | Network dependent |
| **Offline** | Works | Requires internet |

For a test run with 100 prompts × 20 mutations = 2000 API calls, cloud costs would add up quickly.

---

### Q: Why Qwen Coder 3 8B as the default model?

**A:** We evaluated several models:

| Model | Mutation Quality | Speed | Memory |
|-------|-----------------|-------|--------|
| Qwen Coder 3 8B | ⭐⭐⭐⭐ | ⭐⭐⭐ | 8GB |
| Llama 3 8B | ⭐⭐⭐ | ⭐⭐⭐ | 8GB |
| Mistral 7B | ⭐⭐⭐ | ⭐⭐⭐⭐ | 6GB |
| Phi-3 Mini | ⭐⭐ | ⭐⭐⭐⭐⭐ | 4GB |

Qwen Coder 3 was chosen because:
1. Excellent at understanding and modifying prompts
2. Good balance of quality vs. speed
3. Runs on consumer hardware (8GB VRAM)

---

### Q: How does the mutation template system work?

**A:** Templates are stored in `templates.py` and formatted with the original prompt:

```python
TEMPLATES = {
    MutationType.PARAPHRASE: """
Rewrite this prompt with different words but same meaning.

Original: {prompt}

Rewritten:
""",
    MutationType.NOISE: """
Add 2-3 realistic typos to this prompt:

Original: {prompt}

With typos:
"""
}
```

The engine fills in `{prompt}` and sends to the LLM:

```python
template = TEMPLATES[mutation_type]
filled = template.format(prompt=original_prompt)
response = await self.client.generate(model=self.model, prompt=filled)
```

---

### Q: What if the LLM returns malformed mutations?

**A:** We have several safeguards:

1. **Parsing Logic**: Extracts text between known markers
2. **Validation**: Checks mutation isn't identical to original
3. **Retry Logic**: Regenerates if parsing fails
4. **Fallback**: Uses simple string manipulation if LLM fails

```python
def _parse_mutation(self, response: str) -> str:
    # Try to extract the mutated text
    lines = response.strip().split('\n')
    for line in lines:
        if line and not line.startswith('#'):
            return line.strip()
    raise MutationParseError("Could not extract mutation")
```
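
The retry and fallback safeguards are not shown above; here is one hedged sketch of how they could fit together (`generate_with_fallback` and the character-swap fallback are illustrative, not the project's actual implementation):

```python
import random

class MutationParseError(Exception):
    pass

def parse_mutation(response: str) -> str:
    # Same extraction logic as _parse_mutation above
    for line in response.strip().split("\n"):
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    raise MutationParseError("Could not extract mutation")

def generate_with_fallback(original: str, llm_generate, max_retries: int = 3) -> str:
    """Retry the LLM a few times; fall back to simple string noise if it keeps failing."""
    for _ in range(max_retries):
        try:
            mutation = parse_mutation(llm_generate(original))
            if mutation != original:  # reject no-op mutations
                return mutation
        except MutationParseError:
            continue
    # Fallback: swap two adjacent characters so the run never stalls
    chars = list(original)
    i = random.randrange(len(chars) - 1)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```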

---

## Assertion System

### Q: Why separate deterministic and semantic assertions?

**A:** They have fundamentally different characteristics:

| Aspect | Deterministic | Semantic |
|--------|---------------|----------|
| **Speed** | Nanoseconds | Milliseconds |
| **Dependencies** | None | sentence-transformers |
| **Reproducibility** | 100% | May vary slightly |
| **Use Case** | Exact matching | Meaning matching |

Separating them allows:
- Running deterministic checks first (fast-fail)
- Making semantic checks optional (lighter installation)

---

### Q: How does the SimilarityChecker work internally?

**A:** It uses sentence embeddings and cosine similarity:

```python
class SimilarityChecker:
    def check(self, response: str, latency_ms: float) -> CheckResult:
        # 1. Embed both texts to vectors
        response_vec = self.embedder.embed(response)       # [0.1, 0.2, ...]
        expected_vec = self.embedder.embed(self.expected)  # [0.15, 0.18, ...]

        # 2. Calculate cosine similarity
        similarity = cosine_similarity(response_vec, expected_vec)
        # Returns value between -1 and 1 (typically 0-1 for text)

        # 3. Compare to threshold
        return CheckResult(passed=similarity >= self.threshold)
```

The embedding model (`all-MiniLM-L6-v2`) converts text to 384-dimensional vectors that capture semantic meaning.
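
`cosine_similarity` itself is not defined in the snippet; a minimal pure-Python version of the formula dot(a, b) / (|a|·|b|) looks like this:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```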

---

### Q: Why is the embedder a class variable with lazy loading?

**A:** The embedding model is large (23MB) and takes 1-2 seconds to load:

```python
class SimilarityChecker:
    _embedder: LocalEmbedder | None = None  # Class variable, shared

    @property
    def embedder(self) -> LocalEmbedder:
        if SimilarityChecker._embedder is None:
            SimilarityChecker._embedder = LocalEmbedder()  # Load once
        return SimilarityChecker._embedder
```

Benefits:
1. **Lazy Loading**: Only loads if semantic checks are used
2. **Shared Instance**: All SimilarityCheckers share one model
3. **Memory Efficient**: One copy in memory, not one per checker

---

### Q: How does PII detection work?

**A:** Uses regex patterns for common PII formats:

```python
PII_PATTERNS = [
    (r'\b\d{3}-\d{2}-\d{4}\b', 'SSN'),                        # 123-45-6789
    (r'\b\d{16}\b', 'Credit Card'),                           # 1234567890123456
    (r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b', 'Email'),
    (r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', 'Phone'),              # 123-456-7890
]

def check(self, response: str, latency_ms: float) -> CheckResult:
    for pattern, pii_type in self.PII_PATTERNS:
        if re.search(pattern, response, re.IGNORECASE):
            return CheckResult(
                passed=False,
                details=f"Found potential {pii_type}"
            )
    return CheckResult(passed=True)
```

---

## Performance & Rust

### Q: Why Rust for performance-critical code?

**A:** Python is slow for CPU-bound operations. Benchmarks show:

```
Levenshtein Distance (5000 iterations):
  Python: 5864ms
  Rust:     67ms
  Speedup:  88x
```

Rust was chosen over alternatives because:
- **vs C/C++**: Memory safety, easier to write correct code
- **vs Cython**: Better tooling (cargo), cleaner code
- **vs NumPy**: Works on strings, not just numbers

---

### Q: How does the Rust/Python bridge work?

**A:** Uses PyO3 for bindings:

```rust
// Rust side (lib.rs)
#[pyfunction]
fn levenshtein_distance(s1: &str, s2: &str) -> usize {
    // Rust implementation
}

#[pymodule]
fn flakestorm_rust(m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(levenshtein_distance, m)?)?;
    Ok(())
}
```

```python
# Python side (performance.py)
try:
    import flakestorm_rust
    _RUST_AVAILABLE = True
except ImportError:
    _RUST_AVAILABLE = False

def levenshtein_distance(s1: str, s2: str) -> int:
    if _RUST_AVAILABLE:
        return flakestorm_rust.levenshtein_distance(s1, s2)
    # Pure Python fallback
    ...
```
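
The pure Python fallback is elided above; as an illustrative sketch (not necessarily the project's exact fallback), the classic two-row dynamic-programming version looks like this:

```python
def levenshtein_py(s1: str, s2: str) -> int:
    """Pure-Python Levenshtein distance (two-row dynamic programming)."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the shorter string as the inner row
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current = [i + 1]
        for j, c2 in enumerate(s2):
            insert_cost = current[j] + 1
            delete_cost = previous[j + 1] + 1
            substitute_cost = previous[j] + (c1 != c2)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]

print(levenshtein_py("kitten", "sitting"))  # 3
```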

---

### Q: Why provide pure Python fallbacks?

**A:** Accessibility and reliability:

1. **Easy Installation**: `pip install flakestorm` works without Rust toolchain
2. **Platform Support**: Works on any Python platform
3. **Development**: Faster iteration without recompiling Rust
4. **Testing**: Can test both implementations for parity

The tradeoff is speed, but most time is spent waiting for LLM/agent responses anyway.

---

## Agent Adapters

### Q: Why use the Protocol pattern for agents?

**A:** Enables type-safe duck typing:

```python
class AgentProtocol(Protocol):
    async def invoke(self, prompt: str) -> AgentResponse: ...
```

Any class with a matching `invoke` method works, even if it doesn't inherit from a base class. This is more Pythonic than Java-style interfaces.
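
A minimal self-contained sketch (the `AgentResponse` fields shown are assumptions): `EchoAgent` never inherits from anything, yet satisfies the protocol purely by shape, so a type checker accepts it wherever `AgentProtocol` is expected.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol

@dataclass
class AgentResponse:
    text: str
    latency_ms: float = 0.0

class AgentProtocol(Protocol):
    async def invoke(self, prompt: str) -> AgentResponse: ...

# No inheritance: EchoAgent conforms to the protocol by shape alone
class EchoAgent:
    async def invoke(self, prompt: str) -> AgentResponse:
        return AgentResponse(text=f"echo: {prompt}")

def describe(agent: AgentProtocol) -> str:
    # Type-checks for any conforming class
    return type(agent).__name__

response = asyncio.run(EchoAgent().invoke("hello"))
print(response.text)          # echo: hello
print(describe(EchoAgent()))  # EchoAgent
```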

---

### Q: How does the HTTP adapter handle different API formats?

**A:** Through configurable templates:

```yaml
agent:
  endpoint: "https://api.example.com/v1/chat"
  request_template: |
    {"messages": [{"role": "user", "content": "{prompt}"}]}
  response_path: "$.choices[0].message.content"
```

The adapter:
1. Replaces `{prompt}` in the template
2. Sends the formatted JSON
3. Uses JSONPath to extract the response

This supports OpenAI, Anthropic, custom APIs, etc.
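
A naive standalone sketch of steps 1 and 3 (the manual dictionary walk stands in for a real JSONPath library, and the string substitution does no JSON escaping, so it is illustration only):

```python
import json

request_template = '{"messages": [{"role": "user", "content": "{prompt}"}]}'

def build_request(prompt: str) -> dict:
    # Step 1: fill the template (a real adapter must JSON-escape the prompt)
    return json.loads(request_template.replace("{prompt}", prompt))

def extract(payload: dict) -> str:
    # Step 3: the equivalent of "$.choices[0].message.content" done by hand
    return payload["choices"][0]["message"]["content"]

print(build_request("hi")["messages"][0]["content"])            # hi
print(extract({"choices": [{"message": {"content": "ok"}}]}))   # ok
```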

---

### Q: Why is there a Python adapter?

**A:** Bypasses HTTP overhead for local testing:

```python
# Instead of: HTTP request → your server → your code → HTTP response
# Just: your_function(prompt) → response

class PythonAgentAdapter:
    async def invoke(self, prompt: str) -> AgentResponse:
        # Import the module dynamically
        module_path, func_name = self.endpoint.rsplit(":", 1)
        module = importlib.import_module(module_path)
        func = getattr(module, func_name)

        # Call directly
        start = time.perf_counter()
        response = await func(prompt) if asyncio.iscoroutinefunction(func) else func(prompt)
        latency = (time.perf_counter() - start) * 1000

        return AgentResponse(text=response, latency_ms=latency)
```

---

## Testing & Quality

### Q: Why are tests split by module?

**A:** Mirrors the source structure for maintainability:

```
tests/
├── test_config.py       # Tests for core/config.py
├── test_mutations.py    # Tests for mutations/
├── test_assertions.py   # Tests for assertions/
├── test_performance.py  # Tests for performance module
```

When fixing a bug in `config.py`, you immediately know to check `test_config.py`.

---

### Q: Why use pytest over unittest?

**A:** Pytest is more Pythonic and powerful:

```python
# unittest style (verbose)
class TestConfig(unittest.TestCase):
    def test_load_config(self):
        self.assertEqual(config.agent.type, AgentType.HTTP)

# pytest style (concise)
def test_load_config():
    assert config.agent.type == AgentType.HTTP
```

Pytest also offers:
- Fixtures for setup/teardown
- Parametrized tests
- Better assertion introspection
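
As a small illustration of parametrized tests (the test case itself is hypothetical): one function covers several input/expectation pairs, and each pair is reported as a separate test.

```python
import pytest

@pytest.mark.parametrize(
    "raw,expected",
    [
        ("paraphrase", True),
        ("noise", True),
        ("unknown_type", False),
    ],
)
def test_is_known_mutation(raw, expected):
    # Hypothetical check: is the string a recognized mutation type?
    known = {"paraphrase", "noise"}
    assert (raw in known) == expected
```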

---

### Q: How should I add tests for a new feature?

**A:** Follow this pattern:

1. **Create test file** if needed: `tests/test_<module>.py`
2. **Write failing test first** (TDD)
3. **Group related tests** in a class
4. **Use fixtures** for common setup

```python
# tests/test_new_feature.py
import pytest
from flakestorm.new_module import NewFeature

class TestNewFeature:
    @pytest.fixture
    def feature(self):
        return NewFeature(config={...})

    def test_basic_functionality(self, feature):
        result = feature.do_something()
        assert result == expected

    def test_edge_case(self, feature):
        with pytest.raises(ValueError):
            feature.do_something(invalid_input)
```

---

## Extending flakestorm

### Q: How do I add a new mutation type?

**A:** Three steps:

1. **Add to enum** (`mutations/types.py`):

```python
class MutationType(str, Enum):
    # ... existing types
    MY_NEW_TYPE = "my_new_type"
```

2. **Add template** (`mutations/templates.py`):

```python
TEMPLATES[MutationType.MY_NEW_TYPE] = """
Your prompt template here.

Original: {prompt}

Modified:
"""
```

3. **Add default weight** (`core/config.py`):

```python
class MutationConfig(BaseModel):
    weights: dict = {
        # ... existing weights
        MutationType.MY_NEW_TYPE: 1.0,
    }
```
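
How the weights drive selection is not shown here; one plausible sketch uses `random.choices`, which draws types proportionally to their weights (the string keys and the sampling helper are assumptions, not the project's actual code):

```python
import random

# Hypothetical weight table: my_new_type should be drawn twice as often
weights = {"paraphrase": 1.0, "noise": 1.0, "my_new_type": 2.0}

def pick_mutation_type(rng: random.Random) -> str:
    types, ws = zip(*weights.items())
    return rng.choices(types, weights=ws, k=1)[0]

rng = random.Random(42)
counts = {t: 0 for t in weights}
for _ in range(4000):
    counts[pick_mutation_type(rng)] += 1
# With weight 2.0, my_new_type is drawn roughly twice as often as the others
print(counts["my_new_type"] > counts["paraphrase"])  # True
```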

---

### Q: How do I add a new assertion type?

**A:** Four steps:

1. **Create checker class** (`assertions/deterministic.py` or `semantic.py`):

```python
class MyNewChecker(BaseChecker):
    def check(self, response: str, latency_ms: float) -> CheckResult:
        # Your logic here
        passed = some_condition(response)
        return CheckResult(
            passed=passed,
            check_type=InvariantType.MY_NEW_TYPE,
            details="Explanation"
        )
```

2. **Add to enum** (`core/config.py`):

```python
class InvariantType(str, Enum):
    # ... existing types
    MY_NEW_TYPE = "my_new_type"
```

3. **Register in verifier** (`assertions/verifier.py`):

```python
CHECKER_REGISTRY = {
    # ... existing checkers
    InvariantType.MY_NEW_TYPE: MyNewChecker,
}
```

4. **Add tests** (`tests/test_assertions.py`)

---

### Q: How do I add a new report format?

**A:** Create a new generator:

```python
# reports/markdown.py
from datetime import datetime
from pathlib import Path

class MarkdownReportGenerator:
    def __init__(self, results: TestResults):
        self.results = results

    def generate(self) -> str:
        """Generate markdown content."""
        md = "# flakestorm Report\n\n"
        md += f"**Score:** {self.results.statistics.robustness_score:.2f}\n"
        # ... more content
        return md

    def save(self, path: Path | None = None) -> Path:
        if path is None:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            path = Path(f"reports/report_{timestamp}.md")
        path.write_text(self.generate())
        return path
```

Then add a CLI option in `cli/main.py`.

---

## Common Issues

### Q: Why am I getting "Cannot connect to Ollama"?

**A:** The Ollama service isn't running. Fix:

```bash
# Start Ollama
ollama serve

# Verify it's running
curl http://localhost:11434/api/version
```

---

### Q: Why is mutation generation slow?

**A:** LLM inference is inherently slow. Options:
1. Use a faster model: `ollama pull phi3:mini`
2. Reduce mutation count: `mutations.count: 10`
3. Use GPU: Ensure Ollama uses GPU acceleration

---

### Q: Why do tests pass locally but fail in CI?

**A:** Common causes:
1. **Missing Ollama**: CI needs the Ollama service
2. **Different model**: Ensure the same model is pulled
3. **Timing**: CI may be slower; increase timeouts
4. **Environment variables**: Ensure secrets are set in CI

---

### Q: How do I debug a failing assertion?

**A:** Enable verbose mode and check the report:

```bash
flakestorm run --verbose --output html
```

The HTML report shows:
- Original prompt
- Mutated prompt
- Agent response
- Which assertion failed and why

---

*Have more questions? Open an issue on GitHub!*