Mirror of https://github.com/flakestorm/flakestorm.git (synced 2026-04-30 03:16:29 +02:00)
Refactor README.md and USAGE_GUIDE.md to streamline installation instructions and enhance clarity on robustness scoring and mutation strategies. Removed outdated sections and added detailed explanations for mutation types and their applications in testing. This update aims to improve user understanding and facilitate easier setup and usage of Flakestorm.
Commit d339d5e436 (parent 732a7bd990)
2 changed files with 28 additions and 186 deletions

@@ -870,13 +870,23 @@ invariants:
### Robustness Score

A number from 0.0 to 1.0 indicating how reliable your agent is.

The Robustness Score is calculated as:

$$R = \frac{W_s \cdot S_{passed} + W_d \cdot D_{passed}}{N_{total}}$$

Where:
- $S_{passed}$ = Semantic variations passed
- $D_{passed}$ = Deterministic tests passed
- $W_s$, $W_d$ = Weights assigned by mutation difficulty, for semantic and deterministic tests respectively
- $N_{total}$ = Total tests, weighted the same way

**Simplified formula:**
```
Score = (Weighted Passed Tests) / (Total Weighted Tests)
```

**Weights by mutation type:**
- `prompt_injection`: 1.5 (harder to defend against)
- `encoding_attacks`: 1.3 (security and parsing critical)
- `length_extremes`: 1.2 (edge cases important)
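
The simplified formula and the weights above can be sketched in a few lines of Python. This is only an illustration: the function name `robustness_score`, the `(mutation_type, passed)` result format, the fallback weight of 1.0 for types not listed above, and the 0.0 returned for an empty run are all assumptions, not Flakestorm's actual implementation.

```python
from typing import Iterable, Tuple

# Weights quoted in the list above.
WEIGHTS = {
    "prompt_injection": 1.5,
    "encoding_attacks": 1.3,
    "length_extremes": 1.2,
}
# Assumption of this sketch: any type not listed above counts with weight 1.0.
DEFAULT_WEIGHT = 1.0


def robustness_score(results: Iterable[Tuple[str, bool]]) -> float:
    """Return (weighted passed tests) / (total weighted tests)."""
    total = 0.0
    passed = 0.0
    for mutation_type, ok in results:
        weight = WEIGHTS.get(mutation_type, DEFAULT_WEIGHT)
        total += weight
        if ok:
            passed += weight
    # Assumption: an empty run scores 0.0 instead of dividing by zero.
    return passed / total if total else 0.0


# Hypothetical run: four mutated prompts, one prompt-injection failure.
results = [
    ("prompt_injection", True),
    ("prompt_injection", False),
    ("encoding_attacks", True),
    ("length_extremes", True),
]
score = robustness_score(results)
print(round(score, 3))  # → 0.727
```

In this hypothetical run, 4.0 of 5.5 weighted points pass. The same single failure on a lower-weighted type would cost less, which is the point of weighting by difficulty.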

@@ -1001,6 +1011,20 @@ types:
- noise
```
### Mutation Strategy

The 8 mutation types work together to provide comprehensive robustness testing:

- **Semantic Robustness**: Paraphrase, Context Manipulation
- **Input Robustness**: Noise, Encoding Attacks, Length Extremes
- **Security**: Prompt Injection, Encoding Attacks
- **User Experience**: Tone Shift, Noise, Context Manipulation

For comprehensive testing, use all 8 types. For focused testing:
- **Security-focused**: Emphasize Prompt Injection, Encoding Attacks
- **UX-focused**: Emphasize Noise, Tone Shift, Context Manipulation
- **Edge case testing**: Emphasize Length Extremes, Encoding Attacks
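
Because some mutation types belong to more than one group, a single failing type can point at several concerns at once. The sketch below encodes the grouping from the lists above as a lookup table. The snake_case identifiers `paraphrase`, `tone_shift`, and `context_manipulation` are assumed spellings, and `affected_categories` is a hypothetical helper, not part of Flakestorm.

```python
from typing import Iterable, List

# Grouping copied from the lists above. Identifiers other than
# prompt_injection, encoding_attacks, length_extremes and noise are
# assumed snake_case spellings of the names used in the text.
CATEGORIES = {
    "semantic_robustness": {"paraphrase", "context_manipulation"},
    "input_robustness": {"noise", "encoding_attacks", "length_extremes"},
    "security": {"prompt_injection", "encoding_attacks"},
    "user_experience": {"tone_shift", "noise", "context_manipulation"},
}


def affected_categories(failing_types: Iterable[str]) -> List[str]:
    """Return, in alphabetical order, every category containing a failing type."""
    failing = set(failing_types)
    return sorted(name for name, members in CATEGORIES.items() if members & failing)


print(affected_categories(["encoding_attacks"]))
# → ['input_robustness', 'security']
```

A failure in Encoding Attacks, for example, is both an input-robustness and a security signal, so it is worth checking both before deciding where to focus.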
### Interpreting Results by Mutation Type

When analyzing test results, pay attention to which mutation types are failing:

@@ -1045,7 +1069,7 @@ mutations:
mutations:
  types:
    - custom  # Enable custom mutations

  custom_templates:
    extreme_encoding: |
      Multi-layer encoding (Base64 + URL + Unicode): {prompt}