flakestorm/examples/langchain_agent/README.md

364 lines
11 KiB
Markdown

# LangChain Agent Example
This example demonstrates how to test a LangChain agent with flakestorm. The agent uses LangChain's `LLMChain` to process user queries.
## Overview
The example includes:
- A LangChain agent that uses **Google Gemini AI** (if API key is set) or falls back to a mock LLM
- A `flakestorm.yaml` configuration file for testing the agent
- Instructions for running flakestorm against the agent
- Automatic fallback to mock LLM if API key is not set (no API keys required for basic testing)
## Features
- **Real LLM Support**: Uses Google Gemini AI (if API key is set) for realistic testing
- **Automatic Fallback**: Falls back to a mock LLM if API key is not set (no API keys required for basic testing)
- **Input-Aware Processing**: Actually processes input and can fail on certain inputs, making it realistic for testing
- **Realistic Failure Modes**: The agent can fail on empty inputs, very long inputs, and prompt injection attempts
- **flakestorm Integration**: Ready-to-use configuration for testing robustness with meaningful results
## Setup
### 1. Create Virtual Environment (Recommended)
```bash
cd examples/langchain_agent
# Create virtual environment
python -m venv lc_test_venv
# Activate virtual environment
# On macOS/Linux:
source lc_test_venv/bin/activate
# On Windows (PowerShell):
# lc_test_venv\Scripts\Activate.ps1
# On Windows (Command Prompt):
# lc_test_venv\Scripts\activate.bat
```
**Note:** You should see `(venv)` in your terminal prompt after activation.
### 2. Install Dependencies
```bash
# Make sure virtual environment is activated
pip install -r requirements.txt
# This will install:
# - langchain-core, langchain-community (LangChain packages)
# - langchain-google-genai (for Google Gemini support)
# - flakestorm (for testing)
# Or install manually:
# For modern LangChain (0.3.x+) with Gemini:
# pip install langchain-core langchain-community langchain-google-genai flakestorm
# For older LangChain (0.1.x, 0.2.x):
# pip install langchain flakestorm
```
**Note:** The agent code automatically handles different LangChain versions. If you encounter import errors, try:
```bash
# Install all LangChain packages for maximum compatibility
pip install langchain langchain-core langchain-community
```
### 3. Verify the Agent Works
```bash
# Test the agent directly
python -c "from agent import chain; result = chain.invoke({'input': 'Hello!'}); print(result)"
```
Expected output:
```
{'input': 'Hello!', 'text': 'I can help you with that!'}
```
## Running flakestorm Tests
### From the Project Root (Recommended)
```bash
# Make sure you're in the project root (not in examples/langchain_agent)
cd /path/to/flakestorm
# Run flakestorm against the LangChain agent
flakestorm run --config examples/langchain_agent/flakestorm.yaml
```
**This is the easiest way** - no PYTHONPATH setup needed!
### From the Example Directory
If you want to run from `examples/langchain_agent`, you need to set the Python path:
```bash
# If you're in examples/langchain_agent
cd examples/langchain_agent
# Option 1: Set PYTHONPATH (recommended)
# On macOS/Linux:
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
flakestorm run
# On Windows (PowerShell):
$env:PYTHONPATH = "$env:PYTHONPATH;$PWD"
flakestorm run
# Option 2: Update flakestorm.yaml to use full path
# Change: endpoint: "examples.langchain_agent.agent:chain"
# To: endpoint: "agent:chain"
# Then run: flakestorm run
```
**Note:** The `flakestorm.yaml` is configured to run from the project root by default. For easiest setup, run from the project root. If running from the example directory, either set `PYTHONPATH` or update the `endpoint` in `flakestorm.yaml`.
## Understanding the Configuration
### Agent Configuration
The `flakestorm.yaml` file configures flakestorm to test the LangChain agent:
```yaml
agent:
endpoint: "examples.langchain_agent.agent:chain" # Module path: imports chain from agent.py
type: "langchain" # Tells flakestorm to use LangChain adapter
timeout: 30000 # 30 second timeout
```
**How it works:**
- flakestorm imports `chain` from the `agent` module
- It calls `chain.invoke({"input": prompt})` or `chain.ainvoke({"input": prompt})`
- The adapter handles different LangChain interfaces automatically
### Choosing the Right Invariants
**Important:** Only use invariants that match your agent's expected output format!
**For Text-Only Agents (like this example):**
```yaml
invariants:
- type: "latency"
max_ms: 10000
- type: "not_contains"
value: "" # Response shouldn't be empty
- type: "excludes_pii"
- type: "refusal_check"
```
**For JSON-Only Agents:**
```yaml
invariants:
- type: "valid_json" # ✅ Use this if agent returns JSON
- type: "latency"
max_ms: 5000
```
**For Agents with Mixed Output:**
```yaml
invariants:
- type: "latency"
max_ms: 5000
# Use prompt_filter to apply JSON check only to specific prompts
- type: "valid_json"
prompt_filter: "api|json|data" # Only check JSON for prompts containing these words
```
### Golden Prompts
The configuration includes 8 example prompts that should work correctly:
- Weather queries
- Educational questions
- Help requests
- Technical explanations
flakestorm will generate mutations of these prompts to test robustness.
### Invariants
The tests verify:
- **Latency**: Response under 10 seconds
- **Contains "help"**: Response should contain helpful content (stricter than just checking for space)
- **Minimum Length**: Response must be at least 20 characters (ensures meaningful response)
- **PII Safety**: No personally identifiable information
- **Refusal**: Agent should refuse dangerous prompt injections
**Important:**
- flakestorm requires **at least 3 invariants** to ensure comprehensive testing
- This agent returns plain text responses, so we don't use `valid_json` invariant
- Only use `valid_json` if your agent is supposed to return JSON responses
- The invariants are **stricter** than before to catch more issues and produce meaningful test results
## Using Google Gemini (Real LLM)
This example **already uses Google Gemini** if you set the API key! Just set the environment variable:
```bash
# macOS/Linux:
export GOOGLE_AI_API_KEY=your-api-key-here
# Windows (PowerShell):
$env:GOOGLE_AI_API_KEY="your-api-key-here"
# Windows (Command Prompt):
set GOOGLE_AI_API_KEY=your-api-key-here
```
**Get your API key:**
1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Create a new API key
3. Copy and set it as the environment variable above
**Without API Key:**
If you don't set the API key, the agent automatically falls back to a mock LLM that still processes input meaningfully. This is useful for testing without API costs.
**Other LLM Options:**
You can modify `agent.py` to use other LLMs:
- `ChatOpenAI` - OpenAI GPT models (requires `langchain-openai`)
- `ChatAnthropic` - Anthropic Claude (requires `langchain-anthropic`)
- `ChatOllama` - Local Ollama models (requires `langchain-ollama`)
## Expected Test Results
When you run flakestorm, you'll see:
1. **Mutation Generation**: flakestorm generates 20 mutations per golden prompt (200 total tests with 10 golden prompts)
2. **Test Execution**: Each mutation is tested against the agent
3. **Results Report**: HTML report showing:
- Robustness score (0.0 - 1.0)
- Pass/fail breakdown by mutation type
- Detailed failure analysis
- Recommendations for improvement
### Why This Agent is Better for Testing
**Previous Issue:** The original agent used `FakeListLLM`, which ignored input and just cycled through 8 predefined responses. This meant:
- Mutations had no effect (agent didn't read them)
- Invariants were too lax (always passed)
- 100% reliability score was meaningless
**Current Solution:** The agent uses **Google Gemini AI** (if API key is set) or a mock LLM:
-**With Gemini**: Real LLM that processes input naturally, can fail on edge cases
-**Without API Key**: Mock LLM that still processes input meaningfully
- ✅ Reads and analyzes the input
- ✅ Can fail on empty/whitespace inputs
- ✅ Can fail on very long inputs (>5000 chars)
- ✅ Detects and refuses prompt injection attempts
- ✅ Returns context-aware responses based on input content
- ✅ Stricter invariants (checks for meaningful content, not just non-empty)
**Expected Results:**
- **With Gemini**: More realistic failures, reliability score typically 70-90% (real LLM behavior)
- **With Mock LLM**: Some failures on edge cases, reliability score typically 80-95%
- You should see **some failures** on edge cases (empty inputs, prompt injections, etc.)
- This makes the test results **meaningful** and helps identify real robustness issues
## Common Issues
### "ModuleNotFoundError: No module named 'agent'" or "No module named 'examples'"
**Solution 1 (Recommended):** Run from the project root:
```bash
cd /path/to/flakestorm # Go to project root
flakestorm run --config examples/langchain_agent/flakestorm.yaml
```
**Solution 2:** If running from `examples/langchain_agent`, set PYTHONPATH:
```bash
# macOS/Linux:
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
flakestorm run
# Windows (PowerShell):
$env:PYTHONPATH = "$env:PYTHONPATH;$PWD"
flakestorm run
```
**Solution 3:** Update `flakestorm.yaml` to use relative path:
```yaml
agent:
endpoint: "agent:chain" # Instead of "examples.langchain_agent.agent:chain"
```
### "ModuleNotFoundError: No module named 'langchain.chains'" or "cannot import name 'LLMChain'"
**Solution:** This happens with newer LangChain versions (0.3.x+). Install the required packages:
```bash
# Install all LangChain packages for compatibility
pip install langchain langchain-core langchain-community
# Or if using requirements.txt:
pip install -r requirements.txt
```
The agent code automatically tries multiple import strategies, so installing all packages ensures compatibility.
### "AttributeError: 'LLMChain' object has no attribute 'invoke'"
**Solution:** Update your LangChain version:
```bash
pip install --upgrade langchain langchain-core
```
### "Timeout errors"
**Solution:** Increase timeout in `flakestorm.yaml`:
```yaml
agent:
timeout: 60000 # 60 seconds
```
## Customizing the Agent
### Add Tools/Agents
You can extend the agent to use LangChain tools or agents:
```python
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
llm = OpenAI(temperature=0)
tools = [
Tool(
name="Calculator",
func=lambda x: str(eval(x)),
description="Useful for mathematical calculations"
)
]
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
# Export for flakestorm
chain = agent
```
### Add Memory
Add conversation memory to your agent:
```python
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()
chain = LLMChain(llm=llm, prompt=prompt_template, memory=memory)
```
## Next Steps
1. **Run the tests**: `flakestorm run --config examples/langchain_agent/flakestorm.yaml`
2. **Review the report**: Check `reports/flakestorm-*.html`
3. **Improve robustness**: Fix issues found in the report
4. **Re-test**: Run flakestorm again to verify improvements
## Learn More
- [LangChain Documentation](https://python.langchain.com/)
- [flakestorm Usage Guide](../docs/USAGE_GUIDE.md)
- [flakestorm Configuration Guide](../docs/CONFIGURATION_GUIDE.md)