# LangChain Agent Example This example demonstrates how to test a LangChain agent with flakestorm. The agent uses LangChain's `LLMChain` to process user queries. ## Overview The example includes: - A LangChain agent that uses **Google Gemini AI** (if API key is set) or falls back to a mock LLM - A `flakestorm.yaml` configuration file for testing the agent - Instructions for running flakestorm against the agent - Automatic fallback to mock LLM if API key is not set (no API keys required for basic testing) ## Features - **Real LLM Support**: Uses Google Gemini AI (if API key is set) for realistic testing - **Automatic Fallback**: Falls back to a mock LLM if API key is not set (no API keys required for basic testing) - **Input-Aware Processing**: Actually processes input and can fail on certain inputs, making it realistic for testing - **Realistic Failure Modes**: The agent can fail on empty inputs, very long inputs, and prompt injection attempts - **flakestorm Integration**: Ready-to-use configuration for testing robustness with meaningful results ## Setup ### 1. Create Virtual Environment (Recommended) ```bash cd examples/langchain_agent # Create virtual environment python -m venv lc_test_venv # Activate virtual environment # On macOS/Linux: source lc_test_venv/bin/activate # On Windows (PowerShell): # lc_test_venv\Scripts\Activate.ps1 # On Windows (Command Prompt): # lc_test_venv\Scripts\activate.bat ``` **Note:** You should see `(venv)` in your terminal prompt after activation. ### 2. Install Dependencies ```bash # Make sure virtual environment is activated pip install -r requirements.txt # This will install: # - langchain-core, langchain-community (LangChain packages) # - langchain-google-genai (for Google Gemini support) # - flakestorm (for testing) # Or install manually: # For modern LangChain (0.3.x+) with Gemini: # pip install langchain-core langchain-community langchain-google-genai flakestorm # For older LangChain (0.1.x, 0.2.x): # pip install langchain flakestorm ``` **Note:** The agent code automatically handles different LangChain versions. If you encounter import errors, try: ```bash # Install all LangChain packages for maximum compatibility pip install langchain langchain-core langchain-community ``` ### 3. Verify the Agent Works ```bash # Test the agent directly python -c "from agent import chain; result = chain.invoke({'input': 'Hello!'}); print(result)" ``` Expected output: ``` {'input': 'Hello!', 'text': 'I can help you with that!'} ``` ## Running flakestorm Tests ### From the Project Root (Recommended) ```bash # Make sure you're in the project root (not in examples/langchain_agent) cd /path/to/flakestorm # Run flakestorm against the LangChain agent flakestorm run --config examples/langchain_agent/flakestorm.yaml ``` **This is the easiest way** - no PYTHONPATH setup needed! ### From the Example Directory If you want to run from `examples/langchain_agent`, you need to set the Python path: ```bash # If you're in examples/langchain_agent cd examples/langchain_agent # Option 1: Set PYTHONPATH (recommended) # On macOS/Linux: export PYTHONPATH="${PYTHONPATH}:$(pwd)" flakestorm run # On Windows (PowerShell): $env:PYTHONPATH = "$env:PYTHONPATH;$PWD" flakestorm run # Option 2: Update flakestorm.yaml to use full path # Change: endpoint: "examples.langchain_agent.agent:chain" # To: endpoint: "agent:chain" # Then run: flakestorm run ``` **Note:** The `flakestorm.yaml` is configured to run from the project root by default. For easiest setup, run from the project root. If running from the example directory, either set `PYTHONPATH` or update the `endpoint` in `flakestorm.yaml`. ## Understanding the Configuration ### Agent Configuration The `flakestorm.yaml` file configures flakestorm to test the LangChain agent: ```yaml agent: endpoint: "examples.langchain_agent.agent:chain" # Module path: imports chain from agent.py type: "langchain" # Tells flakestorm to use LangChain adapter timeout: 30000 # 30 second timeout ``` **How it works:** - flakestorm imports `chain` from the `agent` module - It calls `chain.invoke({"input": prompt})` or `chain.ainvoke({"input": prompt})` - The adapter handles different LangChain interfaces automatically ### Choosing the Right Invariants **Important:** Only use invariants that match your agent's expected output format! **For Text-Only Agents (like this example):** ```yaml invariants: - type: "latency" max_ms: 10000 - type: "not_contains" value: "" # Response shouldn't be empty - type: "excludes_pii" - type: "refusal_check" ``` **For JSON-Only Agents:** ```yaml invariants: - type: "valid_json" # ✅ Use this if agent returns JSON - type: "latency" max_ms: 5000 ``` **For Agents with Mixed Output:** ```yaml invariants: - type: "latency" max_ms: 5000 # Use prompt_filter to apply JSON check only to specific prompts - type: "valid_json" prompt_filter: "api|json|data" # Only check JSON for prompts containing these words ``` ### Golden Prompts The configuration includes 8 example prompts that should work correctly: - Weather queries - Educational questions - Help requests - Technical explanations flakestorm will generate mutations of these prompts to test robustness. ### Invariants The tests verify: - **Latency**: Response under 10 seconds - **Contains "help"**: Response should contain helpful content (stricter than just checking for space) - **Minimum Length**: Response must be at least 20 characters (ensures meaningful response) - **PII Safety**: No personally identifiable information - **Refusal**: Agent should refuse dangerous prompt injections **Important:** - flakestorm requires **at least 3 invariants** to ensure comprehensive testing - This agent returns plain text responses, so we don't use `valid_json` invariant - Only use `valid_json` if your agent is supposed to return JSON responses - The invariants are **stricter** than before to catch more issues and produce meaningful test results ## Using Google Gemini (Real LLM) This example **already uses Google Gemini** if you set the API key! Just set the environment variable: ```bash # macOS/Linux: export GOOGLE_AI_API_KEY=your-api-key-here # Windows (PowerShell): $env:GOOGLE_AI_API_KEY="your-api-key-here" # Windows (Command Prompt): set GOOGLE_AI_API_KEY=your-api-key-here ``` **Get your API key:** 1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey) 2. Create a new API key 3. Copy and set it as the environment variable above **Without API Key:** If you don't set the API key, the agent automatically falls back to a mock LLM that still processes input meaningfully. This is useful for testing without API costs. **Other LLM Options:** You can modify `agent.py` to use other LLMs: - `ChatOpenAI` - OpenAI GPT models (requires `langchain-openai`) - `ChatAnthropic` - Anthropic Claude (requires `langchain-anthropic`) - `ChatOllama` - Local Ollama models (requires `langchain-ollama`) ## Expected Test Results When you run flakestorm, you'll see: 1. **Mutation Generation**: flakestorm generates 20 mutations per golden prompt (200 total tests with 10 golden prompts) 2. **Test Execution**: Each mutation is tested against the agent 3. **Results Report**: HTML report showing: - Robustness score (0.0 - 1.0) - Pass/fail breakdown by mutation type - Detailed failure analysis - Recommendations for improvement ### Why This Agent is Better for Testing **Previous Issue:** The original agent used `FakeListLLM`, which ignored input and just cycled through 8 predefined responses. This meant: - Mutations had no effect (agent didn't read them) - Invariants were too lax (always passed) - 100% reliability score was meaningless **Current Solution:** The agent uses **Google Gemini AI** (if API key is set) or a mock LLM: - ✅ **With Gemini**: Real LLM that processes input naturally, can fail on edge cases - ✅ **Without API Key**: Mock LLM that still processes input meaningfully - ✅ Reads and analyzes the input - ✅ Can fail on empty/whitespace inputs - ✅ Can fail on very long inputs (>5000 chars) - ✅ Detects and refuses prompt injection attempts - ✅ Returns context-aware responses based on input content - ✅ Stricter invariants (checks for meaningful content, not just non-empty) **Expected Results:** - **With Gemini**: More realistic failures, reliability score typically 70-90% (real LLM behavior) - **With Mock LLM**: Some failures on edge cases, reliability score typically 80-95% - You should see **some failures** on edge cases (empty inputs, prompt injections, etc.) - This makes the test results **meaningful** and helps identify real robustness issues ## Common Issues ### "ModuleNotFoundError: No module named 'agent'" or "No module named 'examples'" **Solution 1 (Recommended):** Run from the project root: ```bash cd /path/to/flakestorm # Go to project root flakestorm run --config examples/langchain_agent/flakestorm.yaml ``` **Solution 2:** If running from `examples/langchain_agent`, set PYTHONPATH: ```bash # macOS/Linux: export PYTHONPATH="${PYTHONPATH}:$(pwd)" flakestorm run # Windows (PowerShell): $env:PYTHONPATH = "$env:PYTHONPATH;$PWD" flakestorm run ``` **Solution 3:** Update `flakestorm.yaml` to use relative path: ```yaml agent: endpoint: "agent:chain" # Instead of "examples.langchain_agent.agent:chain" ``` ### "ModuleNotFoundError: No module named 'langchain.chains'" or "cannot import name 'LLMChain'" **Solution:** This happens with newer LangChain versions (0.3.x+). Install the required packages: ```bash # Install all LangChain packages for compatibility pip install langchain langchain-core langchain-community # Or if using requirements.txt: pip install -r requirements.txt ``` The agent code automatically tries multiple import strategies, so installing all packages ensures compatibility. ### "AttributeError: 'LLMChain' object has no attribute 'invoke'" **Solution:** Update your LangChain version: ```bash pip install --upgrade langchain langchain-core ``` ### "Timeout errors" **Solution:** Increase timeout in `flakestorm.yaml`: ```yaml agent: timeout: 60000 # 60 seconds ``` ## Customizing the Agent ### Add Tools/Agents You can extend the agent to use LangChain tools or agents: ```python from langchain.agents import initialize_agent, Tool from langchain.llms import OpenAI llm = OpenAI(temperature=0) tools = [ Tool( name="Calculator", func=lambda x: str(eval(x)), description="Useful for mathematical calculations" ) ] agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True) # Export for flakestorm chain = agent ``` ### Add Memory Add conversation memory to your agent: ```python from langchain.memory import ConversationBufferMemory memory = ConversationBufferMemory() chain = LLMChain(llm=llm, prompt=prompt_template, memory=memory) ``` ## Next Steps 1. **Run the tests**: `flakestorm run --config examples/langchain_agent/flakestorm.yaml` 2. **Review the report**: Check `reports/flakestorm-*.html` 3. **Improve robustness**: Fix issues found in the report 4. **Re-test**: Run flakestorm again to verify improvements ## Learn More - [LangChain Documentation](https://python.langchain.com/) - [flakestorm Usage Guide](../docs/USAGE_GUIDE.md) - [flakestorm Configuration Guide](../docs/CONFIGURATION_GUIDE.md)