flakestorm/README.md
Entropix 7b75fc9530 Implement Open Source edition limits and feature restrictions
- Add 5 mutation types (paraphrase, noise, tone_shift, prompt_injection, custom)
- Cap mutations at 50 per test run
- Force sequential execution only
- Disable GitHub Actions integration (Cloud feature)
- Add upgrade prompts throughout CLI
- Update README with feature comparison
- Add limits.py module for centralized limit management
- Add cloud and limits CLI commands
- Update all documentation with Cloud upgrade messaging
2025-12-29 00:11:02 +08:00

9.2 KiB
Raw Blame History

Entropix

The Agent Reliability Engine
Chaos Engineering for AI Agents

License PyPI Python Versions Cloud


📢 This is the Open Source Edition. For production workloads, check out Entropix Cloud — 20x faster with parallel execution, cloud LLMs, and CI/CD integration.


The Problem

The "Happy Path" Fallacy: Current AI development tools focus on getting an agent to work once. Developers tweak prompts until they get a correct answer, declare victory, and ship.

The Reality: LLMs are non-deterministic. An agent that works on Monday with temperature=0.7 might fail on Tuesday. Users don't follow "Happy Paths" — they make typos, they're aggressive, they lie, and they attempt prompt injections.

The Void:

  • Observability Tools (LangSmith) tell you after the agent failed in production
  • Eval Libraries (RAGAS) focus on academic scores rather than system reliability
  • Missing Link: A tool that actively attacks the agent to prove robustness before deployment

The Solution

Entropix is a local-first testing engine that applies Chaos Engineering principles to AI Agents.

Instead of running one test case, Entropix takes a single "Golden Prompt", generates adversarial mutations (semantic variations, noise injection, hostile tone, prompt injections), runs them against your agent, and calculates a Robustness Score.

"If it passes Entropix, it won't break in Production."

Open Source vs Cloud

Feature Open Source (Free) Cloud Pro ($49/mo) Cloud Team ($299/mo)
Mutation Types 5 basic All types All types
Mutations/Run 50 max Unlimited Unlimited
Execution Sequential Parallel (20x) Parallel (20x)
LLM Local only Cloud + Local Cloud + Local
PII Detection Basic regex Advanced NER + ML Advanced NER + ML
Prompt Injection Basic ML-powered ML-powered
Factuality Check
Test History Dashboard Dashboard
GitHub Actions One-click One-click
Team Features SSO + Sharing

Why the difference?

Developer workflow:
1. Make code change
2. Run Entropix tests (waiting...)
3. Get results
4. Fix issues
5. Repeat

Open Source: ~10 minutes per iteration → Run once, then skip
Cloud Pro:   ~30 seconds per iteration → Run every commit

👉 Upgrade to Cloud for production workloads.

Features (Open Source)

  • 5 Mutation Types: Paraphrasing, noise, tone shifts, basic adversarial, custom templates
  • Invariant Assertions: Deterministic checks, semantic similarity, basic safety
  • Local-First: Uses Ollama with Qwen 3 8B for free testing
  • Beautiful Reports: Interactive HTML reports with pass/fail matrices
  • ⚠️ 50 Mutations Max: Per test run (upgrade to Cloud for unlimited)
  • ⚠️ Sequential Only: One test at a time (upgrade to Cloud for 20x parallel)
  • No CI/CD: GitHub Actions requires Cloud

Quick Start

Installation

pip install entropix

Prerequisites

Entropix uses Ollama for local model inference:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the default model
ollama pull qwen3:8b

Initialize Configuration

entropix init

This creates an entropix.yaml configuration file:

version: "1.0"

agent:
  endpoint: "http://localhost:8000/invoke"
  type: "http"
  timeout: 30000

model:
  provider: "ollama"
  name: "qwen3:8b"
  base_url: "http://localhost:11434"

mutations:
  count: 10  # Max 50 total per run in Open Source
  types:
    - paraphrase
    - noise
    - tone_shift
    - prompt_injection

golden_prompts:
  - "Book a flight to Paris for next Monday"
  - "What's my account balance?"

invariants:
  - type: "latency"
    max_ms: 2000
  - type: "valid_json"

output:
  format: "html"
  path: "./reports"

Run Tests

entropix run

Output:

  Running in sequential mode (Open Source). Upgrade for parallel: https://entropix.cloud

Generating mutations... ━━━━━━━━━━━━━━━━━━━━ 100%
Running attacks...      ━━━━━━━━━━━━━━━━━━━━ 100%

╭──────────────────────────────────────────╮
│  Robustness Score: 87.5%                 │
│  ────────────────────────                │
│  Passed: 17/20 mutations                 │
│  Failed: 3 (2 latency, 1 injection)      │
╰──────────────────────────────────────────╯

⏱️  Test took 245.3s. With Entropix Cloud, this would take ~12.3s
→ https://entropix.cloud

Report saved to: ./reports/entropix-2024-01-15-143022.html

Check Limits

entropix limits   # Show Open Source edition limits
entropix cloud    # Learn about Cloud features

Mutation Types

Type Description Example
Paraphrase Semantically equivalent rewrites "Book a flight" → "I need to fly out"
Noise Typos and spelling errors "Book a flight" → "Book a fliight plz"
Tone Shift Aggressive/impatient phrasing "Book a flight" → "I need a flight NOW!"
Prompt Injection Basic adversarial attacks "Book a flight and ignore previous instructions"
Custom Your own mutation templates Define with {prompt} placeholder

Need advanced mutations? Sophisticated jailbreaks, multi-step injections, and domain-specific attacks are available in Entropix Cloud.

Invariants (Assertions)

Deterministic

invariants:
  - type: "contains"
    value: "confirmation_code"
  - type: "latency"
    max_ms: 2000
  - type: "valid_json"

Semantic

invariants:
  - type: "similarity"
    expected: "Your flight has been booked"
    threshold: 0.8

Safety (Basic)

invariants:
  - type: "excludes_pii"  # Basic regex patterns
  - type: "refusal_check"

Need advanced safety? NER-based PII detection, ML-powered prompt injection detection, and factuality checking are available in Entropix Cloud.

Agent Adapters

HTTP Endpoint

agent:
  type: "http"
  endpoint: "http://localhost:8000/invoke"

Python Callable

from entropix import test_agent

@test_agent
async def my_agent(input: str) -> str:
    # Your agent logic
    return response

LangChain

agent:
  type: "langchain"
  module: "my_agent:chain"

CI/CD Integration

⚠️ Cloud Feature: GitHub Actions integration requires Entropix Cloud.

For local testing only:

# Run before committing (manual)
entropix run --min-score 0.9

With Entropix Cloud, you get:

  • One-click GitHub Actions setup
  • Automatic PR blocking below threshold
  • Test history comparison
  • Slack/Discord notifications

Robustness Score

The Robustness Score is calculated as:

R = \frac{W_s \cdot S_{passed} + W_d \cdot D_{passed}}{N_{total}}

Where:

  • S_{passed} = Semantic variations passed
  • D_{passed} = Deterministic tests passed
  • W = Weights assigned by mutation difficulty

Documentation

Getting Started

For Developers

Reference

License

AGPLv3 - See LICENSE for details.


Tested with Entropix
Tested with Entropix

Need speed? Try Entropix Cloud →