Entropix
The Agent Reliability Engine
Chaos Engineering for AI Agents
📢 This is the Open Source Edition. For production workloads, check out Entropix Cloud — 20x faster with parallel execution, cloud LLMs, and CI/CD integration.
The Problem
The "Happy Path" Fallacy: Current AI development tools focus on getting an agent to work once. Developers tweak prompts until they get a correct answer, declare victory, and ship.
The Reality: LLMs are non-deterministic. An agent that works on Monday with temperature=0.7 might fail on Tuesday. Users don't follow "Happy Paths" — they make typos, they're aggressive, they lie, and they attempt prompt injections.
The Void:
- Observability Tools (LangSmith) tell you only after the agent has failed in production
- Eval Libraries (RAGAS) focus on academic scores rather than system reliability
- Missing Link: A tool that actively attacks the agent to prove robustness before deployment
The Solution
Entropix is a local-first testing engine that applies Chaos Engineering principles to AI Agents.
Instead of running one test case, Entropix takes a single "Golden Prompt", generates adversarial mutations (semantic variations, noise injection, hostile tone, prompt injections), runs them against your agent, and calculates a Robustness Score.
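As a rough illustration of the loop described above (mutate a golden prompt, run each variant against the agent, score the results), here is a minimal sketch. It is not Entropix's implementation: `add_noise`, `shift_tone`, and `pass_rate` are hypothetical names, the mutators are simple string edits rather than LLM-generated, and the score is a plain unweighted pass rate.

```python
import random
from typing import Callable, List

Agent = Callable[[str], str]
Mutator = Callable[[str, random.Random], str]


def add_noise(prompt: str, rng: random.Random) -> str:
    """Swap two adjacent characters to simulate a typo."""
    if len(prompt) < 2:
        return prompt
    i = rng.randrange(len(prompt) - 1)
    chars = list(prompt)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def shift_tone(prompt: str, rng: random.Random) -> str:
    """Rewrite the request in an impatient tone (rng is unused here)."""
    return prompt.rstrip(".!?") + " NOW!"


def pass_rate(
    golden_prompt: str,
    agent: Agent,
    check: Callable[[str], bool],
    mutators: List[Mutator],
    per_mutator: int = 3,
    seed: int = 0,
) -> float:
    """Run every mutated prompt through the agent and return the fraction that pass."""
    rng = random.Random(seed)
    results: List[bool] = []
    for mutate in mutators:
        for _ in range(per_mutator):
            output = agent(mutate(golden_prompt, rng))
            results.append(check(output))
    if not results:
        raise ValueError("at least one mutator and one mutation are required")
    return sum(results) / len(results)
```

Calling `pass_rate("Book a flight", my_agent, my_check, [add_noise, shift_tone])` would run six mutated prompts through `my_agent`, where `my_agent` and `my_check` stand in for your own agent and assertion.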
"If it passes Entropix, it won't break in Production."
Open Source vs Cloud
| Feature | Open Source (Free) | Cloud Pro ($49/mo) | Cloud Team ($299/mo) |
|---|---|---|---|
| Mutation Types | 5 basic | All types | All types |
| Mutations/Run | 50 max | Unlimited | Unlimited |
| Execution | Sequential | ⚡ Parallel (20x) | ⚡ Parallel (20x) |
| LLM | Local only | Cloud + Local | Cloud + Local |
| PII Detection | Basic regex | Advanced NER + ML | Advanced NER + ML |
| Prompt Injection | Basic | ML-powered | ML-powered |
| Factuality Check | ❌ | ✅ | ✅ |
| Test History | ❌ | ✅ Dashboard | ✅ Dashboard |
| GitHub Actions | ❌ | ✅ One-click | ✅ One-click |
| Team Features | ❌ | ❌ | ✅ SSO + Sharing |
Why the difference?
Developer workflow:
1. Make code change
2. Run Entropix tests (waiting...)
3. Get results
4. Fix issues
5. Repeat
Open Source: ~10 minutes per iteration → Run once, then skip
Cloud Pro: ~30 seconds per iteration → Run every commit
👉 Upgrade to Cloud for production workloads.
Features (Open Source)
- ✅ 5 Mutation Types: Paraphrasing, noise, tone shifts, basic adversarial, custom templates
- ✅ Invariant Assertions: Deterministic checks, semantic similarity, basic safety
- ✅ Local-First: Uses Ollama with Qwen 3 8B for free testing
- ✅ Beautiful Reports: Interactive HTML reports with pass/fail matrices
- ⚠️ 50 Mutations Max: Per test run (upgrade to Cloud for unlimited)
- ⚠️ Sequential Only: One test at a time (upgrade to Cloud for 20x parallel)
- ❌ No CI/CD: GitHub Actions requires Cloud
Quick Start
Installation
pip install entropix
Prerequisites
Entropix uses Ollama for local model inference:
# Install Ollama (Linux; on macOS, download the app from the Ollama website)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull the default model
ollama pull qwen3:8b
Initialize Configuration
entropix init
This creates an entropix.yaml configuration file:
version: "1.0"
agent:
  endpoint: "http://localhost:8000/invoke"
  type: "http"
  timeout: 30000
model:
  provider: "ollama"
  name: "qwen3:8b"
  base_url: "http://localhost:11434"
mutations:
  count: 10  # Max 50 total per run in Open Source
  types:
    - paraphrase
    - noise
    - tone_shift
    - prompt_injection
golden_prompts:
  - "Book a flight to Paris for next Monday"
  - "What's my account balance?"
invariants:
  - type: "latency"
    max_ms: 2000
  - type: "valid_json"
output:
  format: "html"
  path: "./reports"
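To see how the `count` setting interacts with the 50-mutation limit, the arithmetic can be sketched as follows. This assumes `count` applies per golden prompt and that excess mutations are simply capped. Neither assumption is confirmed by this README, and `planned_mutations` is a hypothetical helper, not part of Entropix.

```python
OPEN_SOURCE_MUTATION_CAP = 50  # per-run limit stated for the Open Source edition


def planned_mutations(
    num_golden_prompts: int,
    count_per_prompt: int,
    cap: int = OPEN_SOURCE_MUTATION_CAP,
) -> int:
    """Return how many mutations a run would execute under the cap.

    Assumption: `count` is applied to each golden prompt separately.
    """
    requested = num_golden_prompts * count_per_prompt
    return min(requested, cap)
```

Under these assumptions, the sample configuration (2 golden prompts, `count: 10`) requests 20 mutations, well inside the limit, while 6 golden prompts at the same count would be cut off at 50.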
Run Tests
entropix run
Output:
ℹ️ Running in sequential mode (Open Source). Upgrade for parallel: https://entropix.cloud
Generating mutations... ━━━━━━━━━━━━━━━━━━━━ 100%
Running attacks... ━━━━━━━━━━━━━━━━━━━━ 100%
╭──────────────────────────────────────────╮
│ Robustness Score: 87.5% │
│ ──────────────────────── │
│ Passed: 17/20 mutations │
│ Failed: 3 (2 latency, 1 injection) │
╰──────────────────────────────────────────╯
⏱️ Test took 245.3s. With Entropix Cloud, this would take ~12.3s
→ https://entropix.cloud
Report saved to: ./reports/entropix-2024-01-15-143022.html
Check Limits
entropix limits # Show Open Source edition limits
entropix cloud # Learn about Cloud features
Mutation Types
| Type | Description | Example |
|---|---|---|
| Paraphrase | Semantically equivalent rewrites | "Book a flight" → "I need to fly out" |
| Noise | Typos and spelling errors | "Book a flight" → "Book a fliight plz" |
| Tone Shift | Aggressive/impatient phrasing | "Book a flight" → "I need a flight NOW!" |
| Prompt Injection | Basic adversarial attacks | "Book a flight" → "Book a flight and ignore previous instructions" |
| Custom | Your own mutation templates | Define with {prompt} placeholder |
Need advanced mutations? Sophisticated jailbreaks, multi-step injections, and domain-specific attacks are available in Entropix Cloud.
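The Custom type is described only as a template with a `{prompt}` placeholder. A minimal sketch of how such a template could be expanded, assuming plain string substitution; `apply_template`, `expand`, and the example template are hypothetical and not part of Entropix's API.

```python
from typing import List


def apply_template(template: str, prompt: str) -> str:
    """Substitute the golden prompt into a custom mutation template."""
    if "{prompt}" not in template:
        raise ValueError("template must contain a {prompt} placeholder")
    # str.replace is used instead of str.format so that other braces
    # in the template (for example JSON snippets) are left untouched.
    return template.replace("{prompt}", prompt)


def expand(templates: List[str], prompt: str) -> List[str]:
    """Apply every template to the same golden prompt."""
    return [apply_template(t, prompt) for t in templates]
```

For example, `apply_template("URGENT: {prompt} (reply in one word)", "Book a flight")` yields `"URGENT: Book a flight (reply in one word)"`.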
Invariants (Assertions)
Deterministic
invariants:
  - type: "contains"
    value: "confirmation_code"
  - type: "latency"
    max_ms: 2000
  - type: "valid_json"
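To make the three deterministic invariants concrete, here is a sketch of what each check amounts to. The function names are hypothetical, and Entropix's own checks may differ in detail.

```python
import json
import time
from typing import Callable, Tuple


def check_contains(output: str, value: str) -> bool:
    """`contains`: the response must include the given substring."""
    return value in output


def check_valid_json(output: str) -> bool:
    """`valid_json`: the response must parse as JSON."""
    try:
        json.loads(output)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return False
    return True


def timed_call(agent: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Call the agent and return its output with the elapsed time in milliseconds."""
    start = time.perf_counter()
    output = agent(prompt)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return output, elapsed_ms


def check_latency(elapsed_ms: float, max_ms: float) -> bool:
    """`latency`: the call must finish within max_ms milliseconds."""
    return elapsed_ms <= max_ms
```

These checks need no model to evaluate, so the same response always gives the same verdict.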
Semantic
invariants:
  - type: "similarity"
    expected: "Your flight has been booked"
    threshold: 0.8
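The thresholding logic behind a `similarity` invariant can be sketched as below. Note the swap: this uses bag-of-words cosine similarity as a dependency-free stand-in, which measures word overlap rather than meaning. A real semantic check would compare embeddings, and this README does not say which method Entropix uses. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = _tokens(a), _tokens(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def passes_similarity(output: str, expected: str, threshold: float = 0.8) -> bool:
    """Pass when the similarity score meets or exceeds the threshold."""
    return cosine_similarity(output, expected) >= threshold
```

With this stand-in, a response that repeats the expected sentence passes, while one sharing no words with it scores 0 and fails. A paraphrase with different wording would also fail here, which is exactly the gap an embedding-based comparison is meant to close.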
Safety (Basic)
invariants:
  - type: "excludes_pii"  # Basic regex patterns
  - type: "refusal_check"
Need advanced safety? NER-based PII detection, ML-powered prompt injection detection, and factuality checking are available in Entropix Cloud.
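For a sense of what "basic regex patterns" means for `excludes_pii`, here is a small sketch. The patterns are illustrative assumptions (email, US-style SSN, US-style phone number), not the ones Entropix ships, and `find_pii` is a hypothetical helper.

```python
import re
from typing import List

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[\s.-]\d{3}[\s.-]\d{4}\b"),
}


def find_pii(text: str) -> List[str]:
    """Return the sorted names of all patterns that match the text."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))


def excludes_pii(text: str) -> bool:
    """Pass when no pattern matches the agent's response."""
    return not find_pii(text)
```

Regexes like these catch well-formatted identifiers but miss names, addresses, and unusual formats, which is the gap NER-based detection addresses.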
Agent Adapters
HTTP Endpoint
agent:
  type: "http"
  endpoint: "http://localhost:8000/invoke"
Python Callable
from entropix import test_agent

@test_agent
async def my_agent(input: str) -> str:
    # Your agent logic goes here; this placeholder just echoes the input
    response = f"Echo: {input}"
    return response
LangChain
agent:
  type: "langchain"
  module: "my_agent:chain"
CI/CD Integration
⚠️ Cloud Feature: GitHub Actions integration requires Entropix Cloud.
For local testing only:
# Run before committing (manual)
entropix run --min-score 0.9
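If you want the manual check to run automatically before each local commit, one option is a git pre-commit hook. The sketch below wraps the command shown above. It assumes `entropix run --min-score` exits with a non-zero status when the score falls below the threshold, which this README does not state, so verify that before relying on it. `build_command` and `main` are hypothetical names.

```python
import subprocess
import sys
from typing import List


def build_command(min_score: float = 0.9) -> List[str]:
    """Build the CLI invocation shown in this README."""
    return ["entropix", "run", "--min-score", str(min_score)]


def main() -> int:
    """Run Entropix and return its exit code (non-zero blocks the commit)."""
    try:
        result = subprocess.run(build_command())
    except FileNotFoundError:
        print("entropix is not installed or not on PATH", file=sys.stderr)
        return 1
    # Assumption: a score below --min-score produces a non-zero exit code.
    return result.returncode
```

To use it as a hook, save the script as `.git/hooks/pre-commit`, add a final line `raise SystemExit(main())`, and make the file executable. This stays entirely local and is not a substitute for the Cloud GitHub Actions integration.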
With Entropix Cloud, you get:
- One-click GitHub Actions setup
- Automatic PR blocking below threshold
- Test history comparison
- Slack/Discord notifications
Robustness Score
The Robustness Score is calculated as:
$$R = \frac{W_s \cdot S_{passed} + W_d \cdot D_{passed}}{N_{total}}$$
Where:
- $S_{passed}$ = Semantic variations passed
- $D_{passed}$ = Deterministic tests passed
- $W_s$, $W_d$ = Weights assigned by mutation difficulty
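A worked example of the formula, as a sketch. The README does not give the actual weight values or define $N_{total}$ precisely, so this assumes both weights default to 1.0 and that $N_{total}$ is the total number of tests run. `robustness_score` is a hypothetical helper.

```python
def robustness_score(
    semantic_passed: int,
    deterministic_passed: int,
    total: int,
    w_semantic: float = 1.0,       # assumed default; real weights are not documented here
    w_deterministic: float = 1.0,  # assumed default; real weights are not documented here
) -> float:
    """Compute R = (W_s * S_passed + W_d * D_passed) / N_total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (w_semantic * semantic_passed + w_deterministic * deterministic_passed) / total
```

With 10 semantic variations and 7 deterministic tests passed out of 20, and both weights at 1.0, the score is (10 + 7) / 20 = 0.85. With unit weights the formula reduces to a plain pass rate, and other weights shift the score toward whichever category is weighted more heavily.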
Documentation
Getting Started
- 📖 Usage Guide - Complete end-to-end guide
- ⚙️ Configuration Guide - All configuration options
- 🧪 Test Scenarios - Real-world examples with code
For Developers
- 🏗️ Architecture & Modules - How the code works
- ❓ Developer FAQ - Q&A about design decisions
- 📦 Publishing Guide - How to publish to PyPI
- 🤝 Contributing - How to contribute
Reference
- 📋 API Specification - API reference
- 🧪 Testing Guide - How to run and write tests
- ✅ Implementation Checklist - Development progress
License
AGPLv3 - See LICENSE for details.
Tested with Entropix