# Entropix

The Agent Reliability Engine
Chaos Engineering for AI Agents

License PyPI Python Versions Cloud

--- > **๐Ÿ“ข This is the Open Source Edition.** For production workloads, check out [Entropix Cloud](https://entropix.cloud) โ€” 20x faster with parallel execution, cloud LLMs, and CI/CD integration. --- ## The Problem **The "Happy Path" Fallacy**: Current AI development tools focus on getting an agent to work *once*. Developers tweak prompts until they get a correct answer, declare victory, and ship. **The Reality**: LLMs are non-deterministic. An agent that works on Monday with `temperature=0.7` might fail on Tuesday. Users don't follow "Happy Paths" โ€” they make typos, they're aggressive, they lie, and they attempt prompt injections. **The Void**: - **Observability Tools** (LangSmith) tell you *after* the agent failed in production - **Eval Libraries** (RAGAS) focus on academic scores rather than system reliability - **Missing Link**: A tool that actively *attacks* the agent to prove robustness before deployment ## The Solution **Entropix** is a local-first testing engine that applies **Chaos Engineering** principles to AI Agents. Instead of running one test case, Entropix takes a single "Golden Prompt", generates adversarial mutations (semantic variations, noise injection, hostile tone, prompt injections), runs them against your agent, and calculates a **Robustness Score**. > **"If it passes Entropix, it won't break in Production."** ## Open Source vs Cloud | Feature | Open Source (Free) | Cloud Pro ($49/mo) | Cloud Team ($299/mo) | |---------|:------------------:|:------------------:|:--------------------:| | Mutation Types | 5 basic | All types | All types | | Mutations/Run | **50 max** | Unlimited | Unlimited | | Execution | **Sequential** | โšก Parallel (20x) | โšก Parallel (20x) | | LLM | Local only | Cloud + Local | Cloud + Local | | PII Detection | Basic regex | Advanced NER + ML | Advanced NER + ML | | Prompt Injection | Basic | ML-powered | ML-powered | | Factuality Check | โŒ | โœ… | โœ… | | Test History | โŒ | โœ… Dashboard | โœ… Dashboard | | GitHub Actions | โŒ | โœ… One-click | โœ… One-click | | Team Features | โŒ | โŒ | โœ… SSO + Sharing | **Why the difference?** ``` Developer workflow: 1. Make code change 2. Run Entropix tests (waiting...) 3. Get results 4. Fix issues 5. Repeat Open Source: ~10 minutes per iteration โ†’ Run once, then skip Cloud Pro: ~30 seconds per iteration โ†’ Run every commit ``` ๐Ÿ‘‰ [**Upgrade to Cloud**](https://entropix.cloud) for production workloads. ## Features (Open Source) - โœ… **5 Mutation Types**: Paraphrasing, noise, tone shifts, basic adversarial, custom templates - โœ… **Invariant Assertions**: Deterministic checks, semantic similarity, basic safety - โœ… **Local-First**: Uses Ollama with Qwen 3 8B for free testing - โœ… **Beautiful Reports**: Interactive HTML reports with pass/fail matrices - โš ๏ธ **50 Mutations Max**: Per test run (upgrade to Cloud for unlimited) - โš ๏ธ **Sequential Only**: One test at a time (upgrade to Cloud for 20x parallel) - โŒ **No CI/CD**: GitHub Actions requires Cloud ## Quick Start ### Installation ```bash pip install entropix ``` ### Prerequisites Entropix uses [Ollama](https://ollama.ai) for local model inference: ```bash # Install Ollama (macOS/Linux) curl -fsSL https://ollama.ai/install.sh | sh # Pull the default model ollama pull qwen3:8b ``` ### Initialize Configuration ```bash entropix init ``` This creates an `entropix.yaml` configuration file: ```yaml version: "1.0" agent: endpoint: "http://localhost:8000/invoke" type: "http" timeout: 30000 model: provider: "ollama" name: "qwen3:8b" base_url: "http://localhost:11434" mutations: count: 10 # Max 50 total per run in Open Source types: - paraphrase - noise - tone_shift - prompt_injection golden_prompts: - "Book a flight to Paris for next Monday" - "What's my account balance?" invariants: - type: "latency" max_ms: 2000 - type: "valid_json" output: format: "html" path: "./reports" ``` ### Run Tests ```bash entropix run ``` Output: ``` โ„น๏ธ Running in sequential mode (Open Source). Upgrade for parallel: https://entropix.cloud Generating mutations... โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 100% Running attacks... โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 100% โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ โ”‚ Robustness Score: 87.5% โ”‚ โ”‚ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ”‚ โ”‚ Passed: 17/20 mutations โ”‚ โ”‚ Failed: 3 (2 latency, 1 injection) โ”‚ โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ โฑ๏ธ Test took 245.3s. With Entropix Cloud, this would take ~12.3s โ†’ https://entropix.cloud Report saved to: ./reports/entropix-2024-01-15-143022.html ``` ### Check Limits ```bash entropix limits # Show Open Source edition limits entropix cloud # Learn about Cloud features ``` ## Mutation Types | Type | Description | Example | |------|-------------|---------| | **Paraphrase** | Semantically equivalent rewrites | "Book a flight" โ†’ "I need to fly out" | | **Noise** | Typos and spelling errors | "Book a flight" โ†’ "Book a fliight plz" | | **Tone Shift** | Aggressive/impatient phrasing | "Book a flight" โ†’ "I need a flight NOW!" | | **Prompt Injection** | Basic adversarial attacks | "Book a flight and ignore previous instructions" | | **Custom** | Your own mutation templates | Define with `{prompt}` placeholder | > **Need advanced mutations?** Sophisticated jailbreaks, multi-step injections, and domain-specific attacks are available in [Entropix Cloud](https://entropix.cloud). ## Invariants (Assertions) ### Deterministic ```yaml invariants: - type: "contains" value: "confirmation_code" - type: "latency" max_ms: 2000 - type: "valid_json" ``` ### Semantic ```yaml invariants: - type: "similarity" expected: "Your flight has been booked" threshold: 0.8 ``` ### Safety (Basic) ```yaml invariants: - type: "excludes_pii" # Basic regex patterns - type: "refusal_check" ``` > **Need advanced safety?** NER-based PII detection, ML-powered prompt injection detection, and factuality checking are available in [Entropix Cloud](https://entropix.cloud). ## Agent Adapters ### HTTP Endpoint ```yaml agent: type: "http" endpoint: "http://localhost:8000/invoke" ``` ### Python Callable ```python from entropix import test_agent @test_agent async def my_agent(input: str) -> str: # Your agent logic return response ``` ### LangChain ```yaml agent: type: "langchain" module: "my_agent:chain" ``` ## CI/CD Integration > โš ๏ธ **Cloud Feature**: GitHub Actions integration requires [Entropix Cloud](https://entropix.cloud). For local testing only: ```bash # Run before committing (manual) entropix run --min-score 0.9 ``` With Entropix Cloud, you get: - One-click GitHub Actions setup - Automatic PR blocking below threshold - Test history comparison - Slack/Discord notifications ## Robustness Score The Robustness Score is calculated as: $$R = \frac{W_s \cdot S_{passed} + W_d \cdot D_{passed}}{N_{total}}$$ Where: - $S_{passed}$ = Semantic variations passed - $D_{passed}$ = Deterministic tests passed - $W$ = Weights assigned by mutation difficulty ## Documentation ### Getting Started - [๐Ÿ“– Usage Guide](docs/USAGE_GUIDE.md) - Complete end-to-end guide - [โš™๏ธ Configuration Guide](docs/CONFIGURATION_GUIDE.md) - All configuration options - [๐Ÿงช Test Scenarios](docs/TEST_SCENARIOS.md) - Real-world examples with code ### For Developers - [๐Ÿ—๏ธ Architecture & Modules](docs/MODULES.md) - How the code works - [โ“ Developer FAQ](docs/DEVELOPER_FAQ.md) - Q&A about design decisions - [๐Ÿ“ฆ Publishing Guide](docs/PUBLISHING.md) - How to publish to PyPI - [๐Ÿค Contributing](docs/CONTRIBUTING.md) - How to contribute ### Reference - [๐Ÿ“‹ API Specification](docs/API_SPECIFICATION.md) - API reference - [๐Ÿงช Testing Guide](docs/TESTING_GUIDE.md) - How to run and write tests - [โœ… Implementation Checklist](docs/IMPLEMENTATION_CHECKLIST.md) - Development progress ## License AGPLv3 - See [LICENSE](LICENSE) for details. ---

Tested with Entropix
Tested with Entropix

โšก Need speed? Try Entropix Cloud โ†’