Implement flexible HTTP agent adapter with request templates and connection guides

- Add request_template, response_path, method, query_params, and parse_structured_input to AgentConfig
- Implement structured input parser for key-value extraction from golden prompts
- Implement template engine with variable substitution for {prompt} and {field_name}
- Implement response extractor supporting JSONPath and dot notation
- Update HTTPAgentAdapter to support all HTTP methods (GET, POST, PUT, PATCH, DELETE)
- Add comprehensive connection guide explaining localhost vs public endpoints
- Update documentation with examples for TypeScript/JavaScript developers
- Add tests for all new features

This commit is contained in:
Entropix 2025-12-31 23:04:47 +08:00
parent 050204ef42
commit 859566ee59
10 changed files with 1839 additions and 31 deletions


@ -47,6 +47,10 @@ Define how flakestorm connects to your AI agent.
### HTTP Agent
FlakeStorm's HTTP adapter can be configured to match your endpoint's request and response format through request templates and response path extraction.
#### Basic Configuration
```yaml
agent:
  endpoint: "http://localhost:8000/invoke"
@ -57,7 +61,7 @@ agent:
    Content-Type: "application/json"
```
**Expected API Format:**
**Default Format (if no template specified):**
Request:
```json
@ -70,6 +74,126 @@ Response:
{"output": "agent response text"}
```
#### Custom Request Template
Map your endpoint's exact format using `request_template`:
```yaml
agent:
  endpoint: "http://localhost:8000/api/chat"
  type: "http"
  method: "POST"
  request_template: |
    {"message": "{prompt}", "stream": false}
  response_path: "$.reply"
```
**Template Variables:**
- `{prompt}` - Full golden prompt text
- `{field_name}` - Parsed structured input fields (see Structured Input below)
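The substitution step can be sketched in Python as follows. This is a minimal illustration, not FlakeStorm's template engine: `render_template` is a hypothetical helper, and JSON-escaping the substituted value is an assumption made here so that prompts containing quotes or newlines still yield a valid JSON body.

```python
import json
import re

# Matches {name} placeholders; JSON braces such as {"message": ...} do not match
# because the character after "{" is not a letter or underscore.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace {name} placeholders with JSON-escaped values (hypothetical helper)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            return match.group(0)  # leave unknown placeholders untouched
        # json.dumps adds surrounding quotes; strip them because the template
        # already wraps the placeholder in quotes.
        return json.dumps(values[name])[1:-1]

    return PLACEHOLDER.sub(substitute, template)


template = '{"message": "{prompt}", "stream": false}'
body = render_template(template, {"prompt": 'Say "hi"\nthen stop'})
payload = json.loads(body)  # the rendered body parses as JSON
```

Leaving unknown placeholders untouched makes a missing field easy to spot in the outgoing request; a stricter variant could raise an error instead.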
#### Structured Input Parsing
For agents that accept structured input (for example, a Reddit query generator):
```yaml
agent:
  endpoint: "http://localhost:8000/generate-query"
  type: "http"
  method: "POST"
  request_template: |
    {
      "industry": "{industry}",
      "productName": "{productName}",
      "businessModel": "{businessModel}",
      "targetMarket": "{targetMarket}",
      "description": "{description}"
    }
  response_path: "$.query"
  parse_structured_input: true  # Default: true
```
**Golden Prompt Format:**
```yaml
golden_prompts:
  - |
    Industry: Fitness tech
    Product/Service: AI personal trainer app
    Business Model: B2C
    Target Market: fitness enthusiasts
    Description: An app that provides personalized workout plans
```
FlakeStorm will automatically parse this and map fields to your template.
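A key-value parser of this kind might look like the Python sketch below. It is an illustration under stated assumptions: `parse_key_value_prompt` is a hypothetical name, and the key normalization (lowercase, with runs of non-alphanumeric characters turned into `_`) is invented for the example. How FlakeStorm actually maps a label such as `Product/Service` to a template variable such as `{productName}` is not specified in this document.

```python
import re


def parse_key_value_prompt(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines into a dict with normalized keys (hypothetical)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip lines that are not key-value pairs
        # Assumed normalization: lowercase, non-alphanumerics become "_".
        normalized = re.sub(r"[^a-z0-9]+", "_", key.strip().lower()).strip("_")
        fields[normalized] = value.strip()
    return fields


prompt = """Industry: Fitness tech
Product/Service: AI personal trainer app
Business Model: B2C
"""
fields = parse_key_value_prompt(prompt)
```

The resulting dict could then be passed to a template renderer alongside the full prompt text.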
#### HTTP Methods
The adapter supports `GET`, `POST`, `PUT`, `PATCH`, and `DELETE`:
**GET Request:**
```yaml
agent:
  endpoint: "http://api.example.com/search"
  type: "http"
  method: "GET"
  request_template: "q={prompt}"
  query_params:
    api_key: "${API_KEY}"
    format: "json"
```
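How a `GET` template such as `q={prompt}` is combined with `query_params` is not spelled out here. One plausible reading, sketched in Python with the standard library, is that both end up in the URL's query string with the prompt percent-encoded. `build_get_url` is a hypothetical helper, not part of FlakeStorm.

```python
from urllib.parse import parse_qsl, urlencode


def build_get_url(endpoint: str, template: str, prompt: str,
                  query_params: dict[str, str]) -> str:
    """Merge a 'key={prompt}' template with static query parameters (hypothetical)."""
    # Parse the template first, then substitute, so that "&" or "=" inside the
    # prompt cannot be mistaken for query-string separators.
    pairs = parse_qsl(template, keep_blank_values=True)
    params = {key: value.replace("{prompt}", prompt) for key, value in pairs}
    params.update(query_params)
    return f"{endpoint}?{urlencode(params)}"


url = build_get_url(
    "http://api.example.com/search",
    "q={prompt}",
    "Book a flight & hotel",
    {"format": "json"},
)
```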
**PUT Request:**
```yaml
agent:
  endpoint: "http://api.example.com/update"
  type: "http"
  method: "PUT"
  request_template: |
    {"id": "123", "content": "{prompt}"}
```
#### Response Path Extraction
Extract responses from complex JSON structures:
```yaml
agent:
  endpoint: "http://api.example.com/chat"
  type: "http"
  response_path: "$.choices[0].message.content"  # JSONPath
  # OR (use only one response_path key per config):
  # response_path: "data.result"                 # Dot notation
```
**Supported Formats:**
- JSONPath: `"$.data.result"`, `"$.choices[0].message.content"`
- Dot notation: `"data.result"`, `"response.text"`
- Simple key: `"output"`, `"response"`
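The three path styles listed above can be resolved by one small function. The Python sketch below handles only the subset shown (keys and numeric indexes); it is an illustration, and `extract` is a hypothetical name rather than FlakeStorm's response extractor.

```python
import re
from typing import Any

# A path segment is either a key (no dots or brackets) or a [number] index.
TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def extract(data: Any, path: str) -> Any:
    """Resolve a simple JSONPath ($.a.b[0]), dot-notation (a.b), or plain key."""
    if path.startswith("$"):
        path = path[1:]  # drop the JSONPath root marker
    current = data
    for key, index in TOKEN.findall(path):
        current = current[int(index)] if index else current[key]
    return current


response = {
    "choices": [{"message": {"content": "hi"}}],
    "data": {"result": 42},
    "output": "ok",
}
content = extract(response, "$.choices[0].message.content")
```

A missing key or index raises `KeyError` or `IndexError` here, which is one way to surface a mismatched `response_path` early.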
#### Complete Example
```yaml
agent:
  endpoint: "http://localhost:8000/api/v1/agent"
  type: "http"
  method: "POST"
  timeout: 30000
  headers:
    Authorization: "Bearer ${API_KEY}"
    Content-Type: "application/json"
  request_template: |
    {
      "messages": [
        {"role": "user", "content": "{prompt}"}
      ],
      "temperature": 0.7
    }
  response_path: "$.choices[0].message.content"
  query_params:
    version: "v1"
  parse_structured_input: true
```
### Python Agent
```yaml
@ -109,6 +233,11 @@ chain: Runnable = ... # Your LangChain chain
|--------|------|---------|-------------|
| `endpoint` | string | required | URL or module path |
| `type` | string | `"http"` | `http`, `python`, or `langchain` |
| `method` | string | `"POST"` | HTTP method: `GET`, `POST`, `PUT`, `PATCH`, `DELETE` |
| `request_template` | string | `null` | Template for request body/query with `{prompt}` or `{field_name}` variables |
| `response_path` | string | `null` | JSONPath or dot notation to extract response (e.g., `"$.data.result"`) |
| `query_params` | object | `{}` | Static query parameters (supports env vars) |
| `parse_structured_input` | boolean | `true` | Whether to parse structured golden prompts into key-value pairs |
| `timeout` | integer | `30000` | Request timeout in ms (1000-300000) |
| `headers` | object | `{}` | HTTP headers (supports env vars) |
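The defaults and constraints in this table can be mirrored in a small validation sketch. The Python dataclass below is only an illustration of the documented options: `AgentConfigSketch` is a hypothetical stand-in, and the real `AgentConfig` class may be structured and validated differently.

```python
from dataclasses import dataclass, field
from typing import Optional

ALLOWED_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE"}


@dataclass
class AgentConfigSketch:
    """Hypothetical mirror of the options table, with its documented defaults."""

    endpoint: str
    type: str = "http"
    method: str = "POST"
    request_template: Optional[str] = None
    response_path: Optional[str] = None
    query_params: dict = field(default_factory=dict)
    parse_structured_input: bool = True
    timeout: int = 30000  # milliseconds
    headers: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Accepting lowercase method names is an assumption of this sketch.
        self.method = self.method.upper()
        if self.method not in ALLOWED_METHODS:
            raise ValueError(f"unsupported method: {self.method}")
        if not 1000 <= self.timeout <= 300000:
            raise ValueError("timeout must be between 1000 and 300000 ms")


cfg = AgentConfigSketch(endpoint="http://localhost:8000/invoke", method="get")
```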

docs/CONNECTION_GUIDE.md (new file, 317 lines)

@ -0,0 +1,317 @@
# FlakeStorm Connection Guide
This guide explains how to connect FlakeStorm to your agent, covering different scenarios from localhost to public endpoints, and options for internal code.
---
## Table of Contents
1. [Connection Requirements](#connection-requirements)
2. [Localhost vs Public Endpoints](#localhost-vs-public-endpoints)
3. [Internal Code Options](#internal-code-options)
4. [Exposing Local Endpoints](#exposing-local-endpoints)
5. [Troubleshooting](#troubleshooting)
---
## Connection Requirements
### When Do You Need an HTTP Endpoint?
| Your Agent Code | Adapter Type | Endpoint Needed? | Notes |
|----------------|--------------|------------------|-------|
| Python (internal) | Python adapter | ❌ No | Use `type: "python"`, call function directly |
| TypeScript/JavaScript | HTTP adapter | ✅ Yes | Must create HTTP endpoint (can be localhost) |
| Java/Go/Rust | HTTP adapter | ✅ Yes | Must create HTTP endpoint (can be localhost) |
| Already has HTTP API | HTTP adapter | ✅ Yes | Use existing endpoint |
**Key Point:** FlakeStorm is a Python CLI tool. It can only directly call Python functions. For non-Python code, you **must** create an HTTP endpoint wrapper.
---
## Localhost vs Public Endpoints
### When Localhost Works
| FlakeStorm Location | Agent Location | Endpoint Type | Works? |
|---------------------|----------------|---------------|--------|
| Same machine | Same machine | `localhost:8000` | ✅ Yes |
| Different machine | Your machine | `localhost:8000` | ❌ No |
| CI/CD server | Your machine | `localhost:8000` | ❌ No |
| CI/CD server | Cloud (AWS/GCP) | `https://api.example.com` | ✅ Yes |
**Rule of Thumb:** If FlakeStorm and your agent run on the **same machine**, use `localhost`. Otherwise, you need a **public endpoint**.
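The rule of thumb can be turned into a quick check, for example to warn before a CI run that the configured endpoint points at the local machine. The Python sketch below uses only the standard library; `is_loopback_endpoint` is a hypothetical helper and expects a full URL including the scheme.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name; this sketch does not resolve it


local = is_loopback_endpoint("http://localhost:8000/invoke")
remote = is_loopback_endpoint("https://api.example.com")
```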
---
## Internal Code Options
### Option 1: Python Adapter (Recommended for Python Code)
If your agent code is in Python, use the Python adapter - **no HTTP endpoint needed**:
```python
# my_agent.py
async def flakestorm_agent(input: str) -> str:
    """
    FlakeStorm will call this function directly.

    Args:
        input: The golden prompt text (may be structured)

    Returns:
        The agent's response as a string
    """
    # parse_structured_input and your_internal_function are placeholders
    # for your own code.
    params = parse_structured_input(input)
    result = await your_internal_function(params)
    return result
```
```yaml
# flakestorm.yaml
agent:
  endpoint: "my_agent:flakestorm_agent"
  type: "python"  # ← No HTTP endpoint needed!
```
**Benefits:**
- No server setup required
- Faster (no HTTP overhead)
- Works offline
- No network configuration
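An endpoint of the form `module:function` can be resolved with the standard library. The Python sketch below shows one way a `"my_agent:flakestorm_agent"` string could be turned into a callable and invoked; `load_callable` and `call_agent` are hypothetical helpers, and FlakeStorm's own Python adapter may work differently.

```python
import asyncio
import importlib
import inspect
from typing import Any, Callable


def load_callable(spec: str) -> Callable[..., Any]:
    """Resolve a 'module:function' string to the function object."""
    module_name, _, attr = spec.partition(":")
    if not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    return getattr(importlib.import_module(module_name), attr)


def call_agent(func: Callable[..., Any], prompt: str) -> Any:
    """Call a sync or async agent function and return its result."""
    result = func(prompt)
    if inspect.iscoroutine(result):
        result = asyncio.run(result)  # run async agents to completion
    return result


# Demonstrated with a standard-library function so the example is self-contained.
dumps = load_callable("json:dumps")
encoded = dumps({"a": 1})
```

The module must be importable from the directory where the tool runs, which is why the sketch relies on `importlib.import_module` rather than a file path.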
### Option 2: HTTP Wrapper Endpoint (Required for Non-Python Code)
For TypeScript/JavaScript/Java/Go/Rust, create a simple HTTP wrapper:
**TypeScript/Node.js Example:**
```typescript
// test-endpoint.ts
import express from 'express';
import { generateRedditSearchQuery } from './your-internal-code';

const app = express();
app.use(express.json());

app.post('/flakestorm-test', async (req, res) => {
  // FlakeStorm sends: {"input": "Industry: X\nProduct: Y..."}
  const structuredText = req.body.input;

  // Parse structured input (parseStructuredInput is a helper you provide)
  const params = parseStructuredInput(structuredText);

  // Call your internal function
  const query = await generateRedditSearchQuery(params);

  // Return in FlakeStorm's expected format
  res.json({ output: query });
});

app.listen(8000, () => {
  console.log('FlakeStorm test endpoint: http://localhost:8000/flakestorm-test');
});
```
**Python FastAPI Example:**
```python
# test_endpoint.py
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Request(BaseModel):
    input: str


@app.post("/flakestorm-test")
async def flakestorm_test(request: Request):
    # parse_structured_input and your_internal_function are placeholders
    # for your own code.
    params = parse_structured_input(request.input)
    result = await your_internal_function(params)
    return {"output": result}
```
Then in `flakestorm.yaml`:
```yaml
agent:
  endpoint: "http://localhost:8000/flakestorm-test"
  type: "http"
  # No request_template here: the wrappers above read the default
  # {"input": "..."} body and parse the structured text themselves.
  response_path: "$.output"
```
---
## Exposing Local Endpoints
If FlakeStorm runs on a different machine (e.g., CI/CD), you need to expose your local endpoint publicly.
### Option 1: ngrok (Recommended)
```bash
# Install ngrok
brew install ngrok # macOS
# Or download from https://ngrok.com/download
# Expose local port 8000
ngrok http 8000
# Output:
# Forwarding https://abc123.ngrok.io -> http://localhost:8000
```
Then use the ngrok URL in your config:
```yaml
agent:
  endpoint: "https://abc123.ngrok.io/flakestorm-test"
  type: "http"
```
### Option 2: localtunnel
```bash
# Install
npm install -g localtunnel
# Expose port
lt --port 8000
# Output:
# your url is: https://xyz.localtunnel.me
```
### Option 3: Deploy to Cloud
Deploy your test endpoint to a cloud service:
- **Vercel** (for Node.js/TypeScript)
- **Railway** (any language)
- **Fly.io** (any language)
- **AWS Lambda** (serverless)
### Option 4: VPN/SSH Tunnel
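If the FlakeStorm machine can reach the agent machine over SSH:
```bash
# Run on the FlakeStorm machine: forward local port 8000 to port 8000 on the agent machine
ssh -L 8000:localhost:8000 user@agent-machine
# Then use http://localhost:8000 in the config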
```
---
## Troubleshooting
### "Connection Refused" Error
**Problem:** FlakeStorm can't reach your endpoint.
**Solutions:**
1. **Check if agent is running:**
```bash
curl http://localhost:8000/health
```
2. **Verify endpoint URL in config:**
```yaml
agent:
  endpoint: "http://localhost:8000/invoke"  # Check this matches your server
```
3. **Check firewall:**
```bash
# macOS: System Preferences > Security & Privacy > Firewall
# Linux: sudo ufw allow 8000
```
4. **For Docker/containers:**
- Use `host.docker.internal:8000` instead of `localhost:8000`
- Or use container networking
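The first check above can also be scripted. The Python sketch below tests only whether a TCP connection to the endpoint's host and port can be opened, which is enough to tell "connection refused" apart from application-level errors. `endpoint_reachable` is a hypothetical helper, not a FlakeStorm command.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or host not resolvable
```

Running this from the same machine (or container) that runs FlakeStorm matters: a check that passes on your laptop says nothing about reachability from a CI runner.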
### "Timeout" Error
**Problem:** Agent takes too long to respond.
**Solutions:**
1. **Increase timeout:**
```yaml
agent:
  timeout: 60000  # 60 seconds
```
2. **Check agent performance:**
- Is the agent actually processing requests?
- Are there network issues?
### "Invalid Response Format" Error
**Problem:** Response doesn't match expected format.
**Solutions:**
1. **Use response_path:**
```yaml
agent:
  response_path: "$.data.result"  # Extract from nested JSON
```
2. **Check actual response:**
```bash
curl -X POST http://localhost:8000/invoke \
-H "Content-Type: application/json" \
-d '{"input": "test"}'
```
3. **Update request_template if needed:**
```yaml
agent:
  request_template: |
    {"your_field": "{prompt}"}
```
### Network Connectivity Issues
**Problem:** Can't connect from CI/CD or remote machine.
**Solutions:**
1. **Use public endpoint** (ngrok, cloud deployment)
2. **Check network policies** (corporate firewall, VPN)
3. **Verify DNS resolution** (if using domain name)
4. **Test with curl** from the same machine FlakeStorm runs on
---
## Best Practices
1. **For Development:** Use Python adapter if possible (fastest, simplest)
2. **For Testing:** Use localhost HTTP endpoint (easy to debug)
3. **For CI/CD:** Use public endpoint or cloud deployment
4. **For Production Testing:** Use production endpoint with proper authentication
5. **Security:** Never commit API keys - use environment variables
---
## Quick Reference
| Scenario | Solution |
|----------|----------|
| Python code, same machine | Python adapter (`type: "python"`) |
| TypeScript/JS, same machine | HTTP endpoint (`localhost:8000`) |
| Any language, CI/CD | Public endpoint (ngrok/cloud) |
| Already has HTTP API | Use existing endpoint |
| Need custom request format | Use `request_template` |
| Complex response structure | Use `response_path` |
---
*For more examples, see [Configuration Guide](CONFIGURATION_GUIDE.md) and [Usage Guide](USAGE_GUIDE.md).*


@ -456,6 +456,109 @@ class PythonAgentAdapter:
---
### Q: When do I need to create an HTTP endpoint vs use Python adapter?
**A:** It depends on your agent's language and setup:
| Your Agent Code | Adapter Type | Endpoint Needed? | Notes |
|----------------|--------------|------------------|-------|
| Python (internal) | Python adapter | ❌ No | Use `type: "python"`, call function directly |
| TypeScript/JavaScript | HTTP adapter | ✅ Yes | Must create HTTP endpoint (can be localhost) |
| Java/Go/Rust | HTTP adapter | ✅ Yes | Must create HTTP endpoint (can be localhost) |
| Already has HTTP API | HTTP adapter | ✅ Yes | Use existing endpoint |
**For non-Python code (TypeScript example):**
Since FlakeStorm is a Python CLI tool, it can only directly call Python functions. For TypeScript/JavaScript/other languages, you **must** create an HTTP endpoint:
```typescript
// test-endpoint.ts - Wrapper endpoint for FlakeStorm
import express from 'express';
import { generateRedditSearchQuery } from './your-internal-code';

const app = express();
app.use(express.json());

app.post('/flakestorm-test', async (req, res) => {
  // FlakeStorm sends: {"input": "Industry: X\nProduct: Y..."}
  const structuredText = req.body.input;

  // Parse structured input (parseStructuredInput is a helper you provide)
  const params = parseStructuredInput(structuredText);

  // Call your internal function
  const query = await generateRedditSearchQuery(params);

  // Return in FlakeStorm's expected format
  res.json({ output: query });
});

app.listen(8000, () => {
  console.log('FlakeStorm test endpoint: http://localhost:8000/flakestorm-test');
});
```
Then in `flakestorm.yaml`:
```yaml
agent:
  endpoint: "http://localhost:8000/flakestorm-test"
  type: "http"
  # No request_template here: the wrapper above reads the default
  # {"input": "..."} body and parses the structured text itself.
  response_path: "$.output"
```
---
### Q: Do I need a public endpoint or can I use localhost?
**A:** It depends on where FlakeStorm runs:
| FlakeStorm Location | Agent Location | Endpoint Type | Works? |
|---------------------|----------------|---------------|--------|
| Same machine | Same machine | `localhost:8000` | ✅ Yes |
| Different machine | Your machine | `localhost:8000` | ❌ No - use public endpoint or ngrok |
| CI/CD server | Your machine | `localhost:8000` | ❌ No - use public endpoint |
| CI/CD server | Cloud (AWS/GCP) | `https://api.example.com` | ✅ Yes |
**Options for exposing local endpoint:**
1. **ngrok**: `ngrok http 8000` → get public URL
2. **localtunnel**: `lt --port 8000` → get public URL
3. **Deploy to cloud**: Deploy your test endpoint to a cloud service
4. **VPN/SSH tunnel**: If both machines are on same network
---
### Q: Can I test internal code without creating an endpoint?
**A:** Only if your code is in Python:
```python
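# my_agent.py
async def flakestorm_agent(input: str) -> str:
    # Parse input, call your internal functions
    # (your_internal_function is a placeholder for your own code)
    result = await your_internal_function(input)
    return result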
```
```yaml
# flakestorm.yaml
agent:
  endpoint: "my_agent:flakestorm_agent"
  type: "python"  # ← No HTTP endpoint needed!
```
For non-Python code, you **must** create an HTTP endpoint wrapper.
See [Connection Guide](CONNECTION_GUIDE.md) for detailed examples and troubleshooting.
---
## Testing & Quality
### Q: Why are tests split by module?


@ -455,23 +455,280 @@ open reports/flakestorm-*.html
**What they are:** Carefully crafted prompts that represent your agent's core use cases. These are prompts that *should always work correctly*.
**How to choose them:**
- Cover all major user intents
- Include edge cases you've seen in production
- Represent different complexity levels
#### Understanding Golden Prompts vs System Prompts
**Key Distinction:**
- **System Prompt**: Instructions that define your agent's role and behavior (stays in your code)
- **Golden Prompt**: Example user inputs that should work correctly (what FlakeStorm mutates and tests)
**Example:**
```yaml
# System prompt (lives in your agent code, NOT in flakestorm.yaml), e.g.:
#   const systemPrompt = `You are a helpful assistant that books flights...`;

# Golden prompts (in flakestorm.yaml; these are what FlakeStorm mutates and tests)
golden_prompts:
  - "Book a flight from NYC to LA"
  - "I need to fly to Paris next Monday"
```
FlakeStorm takes your golden prompts, mutates them (adds typos, paraphrases, etc.), and sends them to your agent. Your agent processes them using its system prompt.
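To make the idea of a mutation concrete, here is a toy typo mutation in Python that swaps one pair of adjacent characters. It is purely illustrative: `swap_adjacent_typo` is a hypothetical function and does not represent how FlakeStorm generates its mutations.

```python
import random


def swap_adjacent_typo(prompt: str, seed: int = 0) -> str:
    """Swap one randomly chosen pair of adjacent characters to simulate a typo."""
    if len(prompt) < 2:
        return prompt  # nothing to swap
    rng = random.Random(seed)  # seeded so the mutation is reproducible
    i = rng.randrange(len(prompt) - 1)
    chars = list(prompt)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


golden = "Book a flight from NYC to LA"
mutated = swap_adjacent_typo(golden, seed=1)
```

The golden prompt stays unchanged in your config; only the mutated copy is sent to the agent, and the agent's system prompt is never touched.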
#### How to Choose Golden Prompts
**1. Cover All Major User Intents**
```yaml
golden_prompts:
# Primary use case
- "Book a flight from New York to Los Angeles"
# Secondary use case
- "What's my account balance?"
# Another feature
- "Cancel my reservation #12345"
```
**2. Include Different Complexity Levels**
```yaml
golden_prompts:
# Simple intent
- "Hello, how are you?"
# Complex intent with parameters
- "Book a flight from New York to Los Angeles departing March 15th"
# Medium complexity
- "Book a flight to Paris"
# Edge case
- "What if I need to cancel my booking?"
# Complex with multiple parameters
- "Book a flight from New York to Los Angeles departing March 15th, returning March 22nd, economy class, window seat"
```
**3. Include Edge Cases**
```yaml
golden_prompts:
# Normal case
- "Book a flight to Paris"
# Edge case: unusual request
- "What if I need to cancel my booking?"
# Edge case: minimal input
- "Paris"
# Edge case: ambiguous request
- "I need to travel somewhere warm"
```
#### Examples by Agent Type
**1. Simple Chat Agent**
```yaml
golden_prompts:
- "What is the weather in New York?"
- "Tell me a joke"
- "How do I make a paper airplane?"
- "What's 2 + 2?"
```
**2. E-commerce Assistant**
```yaml
golden_prompts:
- "I'm looking for a red dress size medium"
- "Show me running shoes under $100"
- "What's the return policy?"
- "Add this to my cart"
- "Track my order #ABC123"
```
**3. Structured Input Agent (Reddit Search Query Generator)**
For agents that accept structured input (like a Reddit community discovery assistant):
```yaml
golden_prompts:
  # B2C SaaS example
  - |
    Industry: Fitness tech
    Product/Service: AI personal trainer app
    Business Model: B2C
    Target Market: fitness enthusiasts, people who want to lose weight
    Description: An app that provides personalized workout plans using AI
  # B2B SaaS example
  - |
    Industry: Marketing tech
    Product/Service: Email automation platform
    Business Model: B2B SaaS
    Target Market: small business owners, marketing teams
    Description: Automated email campaigns for small businesses
  # Marketplace example
  - |
    Industry: E-commerce
    Product/Service: Handmade crafts marketplace
    Business Model: Marketplace
    Target Market: crafters, DIY enthusiasts, gift buyers
    Description: Platform connecting artisans with buyers
  # Edge case - minimal description
  - |
    Industry: Healthcare tech
    Product/Service: Telemedicine platform
    Business Model: B2C
    Target Market: busy professionals
    Description: Video consultations
```
**4. API/Function-Calling Agent**
```yaml
golden_prompts:
- "Get the weather for San Francisco"
- "Send an email to john@example.com with subject 'Meeting'"
- "Create a calendar event for tomorrow at 3pm"
- "What's my schedule for next week?"
```
**5. Code Generation Agent**
```yaml
golden_prompts:
- "Write a Python function to sort a list"
- "Create a React component for a login form"
- "How do I connect to a PostgreSQL database in Node.js?"
- "Fix this bug: [code snippet]"
```
#### Best Practices
**1. Start Small, Then Expand**
```yaml
# Phase 1: Start with 2-3 core prompts
golden_prompts:
- "Primary use case 1"
- "Primary use case 2"
# Phase 2: Add more as you validate
golden_prompts:
- "Primary use case 1"
- "Primary use case 2"
- "Secondary use case"
- "Edge case 1"
- "Edge case 2"
```
**2. Cover Different User Personas**
```yaml
golden_prompts:
# Professional user
- "I need to schedule a meeting with the team for Q4 planning"
# Casual user
- "hey can u help me book something"
# Technical user
- "Query the database for all users created after 2024-01-01"
# Non-technical user
- "Show me my account"
```
**3. Include Real Production Examples**
```yaml
golden_prompts:
# From your production logs
- "Actual user query from logs"
- "Another real example"
- "Edge case that caused issues before"
```
**4. Test Different Input Formats**
```yaml
golden_prompts:
# Well-formatted
- "Book a flight from New York to Los Angeles on March 15th"
# Informal
- "need a flight nyc to la march 15"
# With extra context
- "Hi! I'm planning a trip and I need to book a flight from New York City to Los Angeles on March 15th, 2024. Can you help?"
```
**5. For Structured Input: Cover All Variations**
```yaml
golden_prompts:
  # Complete input
  - |
    Industry: Tech
    Product: SaaS platform
    Model: B2B
    Market: Enterprises
    Description: Full description here
  # Minimal input (edge case)
  - |
    Industry: Tech
    Product: Platform
  # Different business models
  - |
    Industry: Retail
    Product: E-commerce site
    Model: B2C
    Market: Consumers
```
#### Common Patterns
**Pattern 1: Question-Answer Agent**
```yaml
golden_prompts:
- "What is X?"
- "How do I Y?"
- "Why does Z happen?"
- "When should I do A?"
```
**Pattern 2: Task-Oriented Agent**
```yaml
golden_prompts:
- "Do X"                      # imperative
- "I need to do X"            # declarative
- "Can you help me with X?"   # question form
- "X please"                  # polite request
```
**Pattern 3: Multi-Turn Context Agent**
```yaml
golden_prompts:
# First turn
- "I'm looking for a hotel"
# Second turn (test separately)
- "In Paris"
# Third turn (test separately)
- "Under $200 per night"
```
**Pattern 4: Data Processing Agent**
```yaml
golden_prompts:
- "Analyze this data: [data]"
- "Summarize the following: [text]"
- "Extract key information from: [content]"
```
#### What NOT to Include
❌ **Don't include:**
- Prompts that are known to fail (those are edge cases to test, not golden prompts)
- System prompts or instructions (those stay in your code)
- Malformed inputs (FlakeStorm will generate those as mutations)
- Test-only prompts that users would never send
✅ **Do include:**
- Real user queries from production
- Expected use cases
- Prompts that should always work
- Representative examples of your user base
### Mutation Types
flakestorm generates adversarial variations of your golden prompts:
@ -862,6 +1119,143 @@ agent = AgentExecutor(...)
---
## Request Templates and Connection Setup
### Understanding Request Templates
Request templates allow you to map FlakeStorm's format to your agent's exact API format.
#### Basic Template
```yaml
agent:
  endpoint: "http://localhost:8000/api/chat"
  type: "http"
  request_template: |
    {"message": "{prompt}", "stream": false}
  response_path: "$.reply"
```
**What happens:**
1. FlakeStorm takes golden prompt: `"Book a flight to Paris"`
2. Replaces `{prompt}` in template: `{"message": "Book a flight to Paris", "stream": false}`
3. Sends to your endpoint
4. Extracts response from `$.reply` path
#### Structured Input Mapping
For agents that accept structured input:
```yaml
agent:
  endpoint: "http://localhost:8000/generate-query"
  type: "http"
  method: "POST"
  request_template: |
    {
      "industry": "{industry}",
      "productName": "{productName}",
      "businessModel": "{businessModel}",
      "targetMarket": "{targetMarket}",
      "description": "{description}"
    }
  response_path: "$.query"
  parse_structured_input: true
```
**Golden Prompt:**
```yaml
golden_prompts:
  - |
    Industry: Fitness tech
    Product/Service: AI personal trainer app
    Business Model: B2C
    Target Market: fitness enthusiasts
    Description: An app that provides personalized workout plans
```
**What happens:**
1. FlakeStorm parses structured input into key-value pairs
2. Maps fields to template: `{"industry": "Fitness tech", "productName": "AI personal trainer app", ...}`
3. Sends to your endpoint
4. Extracts response from `$.query`
#### Different HTTP Methods
**GET Request:**
```yaml
agent:
  endpoint: "http://api.example.com/search"
  type: "http"
  method: "GET"
  request_template: "q={prompt}"
  query_params:
    api_key: "${API_KEY}"
    format: "json"
```
**PUT Request:**
```yaml
agent:
  endpoint: "http://api.example.com/update"
  type: "http"
  method: "PUT"
  request_template: |
    {"id": "123", "content": "{prompt}"}
```
### Connection Setup
#### For Python Code (No Endpoint Needed)
```python
# my_agent.py
async def flakestorm_agent(input: str) -> str:
    # Your agent logic (your_agent_logic is a placeholder for your own code)
    result = await your_agent_logic(input)
    return result
```
```yaml
agent:
  endpoint: "my_agent:flakestorm_agent"
  type: "python"
```
#### For TypeScript/JavaScript (Need HTTP Endpoint)
Create a wrapper endpoint:
```typescript
// test-endpoint.ts
import express from 'express';
import { yourAgentFunction } from './your-code';

const app = express();
app.use(express.json());

app.post('/flakestorm-test', async (req, res) => {
  const result = await yourAgentFunction(req.body.input);
  res.json({ output: result });
});

app.listen(8000);
```
```yaml
agent:
  endpoint: "http://localhost:8000/flakestorm-test"
  type: "http"
```
#### Localhost vs Public Endpoint
- **Same machine:** Use `localhost:8000`
- **Different machine/CI/CD:** Use public endpoint (ngrok, cloud deployment)
See [Connection Guide](CONNECTION_GUIDE.md) for detailed setup instructions.
---
## Advanced Usage
### Custom Mutation Templates
@ -921,6 +1315,306 @@ advanced:
retries: 3 # Retry failed requests 3 times
```
### Golden Prompt Guide
A comprehensive guide to creating effective golden prompts for your agent.
#### Step-by-Step: Creating Golden Prompts
**Step 1: Identify Core Use Cases**
```yaml
# List your agent's primary functions
# Example: Flight booking agent
golden_prompts:
- "Book a flight" # Core function
- "Check flight status" # Core function
- "Cancel booking" # Core function
```
**Step 2: Add Variations for Each Use Case**
```yaml
golden_prompts:
# Booking variations
- "Book a flight from NYC to LA"
- "I need to fly to Paris"
- "Reserve a ticket to Tokyo"
- "Can you book me a flight?"
# Status check variations
- "What's my flight status?"
- "Check my booking"
- "Is my flight on time?"
```
**Step 3: Include Edge Cases**
```yaml
golden_prompts:
# Normal cases (from Step 2)
- "Book a flight from NYC to LA"
# Edge cases
- "Book a flight" # Minimal input
- "I need to travel somewhere" # Vague request
- "What if I need to change my flight?" # Conditional
- "Book a flight for next year" # Far future
```
**Step 4: Cover Different User Styles**
```yaml
golden_prompts:
# Formal
- "I would like to book a flight from New York to Los Angeles"
# Casual
- "hey can u book me a flight nyc to la"
# Technical/precise
- "Flight booking: JFK -> LAX, 2024-03-15, economy"
# Verbose
- "Hi! I'm planning a trip and I need to book a flight from New York City to Los Angeles on March 15th, 2024. Can you help me with that?"
```
#### Golden Prompts for Structured Input Agents
For agents that accept structured data (JSON, YAML, key-value pairs):
**Example: Reddit Community Discovery Agent**
```yaml
golden_prompts:
  # Complete structured input
  - |
    Industry: Fitness tech
    Product/Service: AI personal trainer app
    Business Model: B2C
    Target Market: fitness enthusiasts, people who want to lose weight
    Description: An app that provides personalized workout plans using AI
  # Different business model
  - |
    Industry: Marketing tech
    Product/Service: Email automation platform
    Business Model: B2B SaaS
    Target Market: small business owners, marketing teams
    Description: Automated email campaigns for small businesses
  # Minimal input (edge case)
  - |
    Industry: Healthcare tech
    Product/Service: Telemedicine platform
    Business Model: B2C
  # Different industry
  - |
    Industry: E-commerce
    Product/Service: Handmade crafts marketplace
    Business Model: Marketplace
    Target Market: crafters, DIY enthusiasts
    Description: Platform connecting artisans with buyers
```
**Example: API Request Builder Agent**
```yaml
golden_prompts:
  - |
    Method: GET
    Endpoint: /users
    Headers: {"Authorization": "Bearer token"}
  - |
    Method: POST
    Endpoint: /orders
    Body: {"product_id": 123, "quantity": 2}
  - |
    Method: PUT
    Endpoint: /users/123
    Body: {"name": "John Doe"}
```
#### Domain-Specific Examples
**E-commerce Agent:**
```yaml
golden_prompts:
# Product search
- "I'm looking for a red dress size medium"
- "Show me running shoes under $100"
- "Find blue jeans for men"
# Cart operations
- "Add this to my cart"
- "What's in my cart?"
- "Remove item from cart"
# Orders
- "Track my order #ABC123"
- "What's my order status?"
- "Cancel my order"
# Support
- "What's the return policy?"
- "How do I exchange an item?"
- "Contact customer service"
```
**Code Generation Agent:**
```yaml
golden_prompts:
# Simple functions
- "Write a Python function to sort a list"
- "Create a function to calculate factorial"
# Components
- "Create a React component for a login form"
- "Build a Vue component for a todo list"
# Integration
- "How do I connect to PostgreSQL in Node.js?"
- "Show me how to use Redis with Python"
# Debugging
- "Fix this bug: [code snippet]"
- "Why is this code not working?"
```
**Customer Support Agent:**
```yaml
golden_prompts:
# Account questions
- "What's my account balance?"
- "How do I change my password?"
- "Update my email address"
# Product questions
- "How do I use feature X?"
- "What are the system requirements?"
- "Is there a mobile app?"
# Billing
- "What's my subscription status?"
- "How do I cancel my subscription?"
- "Update my payment method"
```
#### Quality Checklist
Before finalizing your golden prompts, verify:
- [ ] **Coverage**: All major features/use cases included
- [ ] **Diversity**: Different complexity levels (simple, medium, complex)
- [ ] **Realism**: Based on actual user queries from production
- [ ] **Edge Cases**: Unusual but valid inputs included
- [ ] **User Styles**: Formal, casual, technical, verbose variations
- [ ] **Quantity**: 5-15 prompts recommended (start with 5, expand)
- [ ] **Clarity**: Each prompt represents a distinct use case
- [ ] **Relevance**: All prompts are things users would actually send
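Parts of this checklist can be checked mechanically. The Python sketch below flags a prompt count outside the recommended 5-15 range, duplicate prompts, and entries that look like system prompts. `lint_golden_prompts` is a hypothetical helper, and its heuristics (such as treating a leading "You are" as a system prompt) are assumptions chosen for illustration.

```python
def lint_golden_prompts(prompts: list[str]) -> list[str]:
    """Return human-readable warnings for a list of golden prompts."""
    warnings: list[str] = []
    if not 5 <= len(prompts) <= 15:
        warnings.append(f"expected 5-15 prompts, found {len(prompts)}")
    seen: set[str] = set()
    for prompt in prompts:
        # Compare case-insensitively and ignore extra whitespace.
        normalized = " ".join(prompt.lower().split())
        if normalized in seen:
            warnings.append(f"duplicate prompt: {prompt!r}")
        seen.add(normalized)
        if normalized.startswith("you are "):
            warnings.append(f"looks like a system prompt: {prompt!r}")
    return warnings


issues = lint_golden_prompts([
    "Book a flight",
    "book a  flight",
    "You are a helpful assistant that books flights",
])
```

Coverage, realism, and relevance still need human judgment; a script like this only catches the mechanical items.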
#### Iterative Improvement
**Phase 1: Initial Set (5 prompts)**
```yaml
golden_prompts:
- "Primary use case 1"
- "Primary use case 2"
- "Primary use case 3"
- "Secondary use case 1"
- "Edge case 1"
```
**Phase 2: Expand (10 prompts)**
```yaml
# Add variations and more edge cases
golden_prompts:
# ... previous 5 ...
- "Primary use case 1 variation"
- "Primary use case 2 variation"
- "Secondary use case 2"
- "Edge case 2"
- "Edge case 3"
```
**Phase 3: Refine (15+ prompts)**
```yaml
# Add based on test results and production data
golden_prompts:
# ... previous 10 ...
- "Real user query from logs"
- "Another production example"
- "Failure case that should work"
```
#### Common Mistakes to Avoid
❌ **Too Generic**
```yaml
# Bad: Too vague
golden_prompts:
- "Help me"
- "Do something"
- "Question"
```
✅ **Specific and Actionable**
```yaml
# Good: Clear intent
golden_prompts:
- "Book a flight from NYC to LA"
- "What's my account balance?"
- "Cancel my subscription"
```
❌ **Including System Prompts**
```yaml
# Bad: This is a system prompt, not a golden prompt
golden_prompts:
- "You are a helpful assistant that..."
```
✅ **User Inputs Only**
```yaml
# Good: Actual user queries
golden_prompts:
- "Book a flight"
- "What's the weather?"
```
❌ **Only Happy Path**
```yaml
# Bad: Only perfect inputs
golden_prompts:
- "Book a flight from New York to Los Angeles on March 15th, 2024, economy class, window seat, no meals"
```
✅ **Include Variations**
```yaml
# Good: Various input styles
golden_prompts:
- "Book a flight from NYC to LA"
- "I need to fly to Los Angeles"
- "flight booking please"
- "Can you help me book a flight?"
```
#### Testing Your Golden Prompts
Before running FlakeStorm, manually test your golden prompts:
```bash
# Test each golden prompt manually
curl -X POST http://localhost:8000/invoke \
-H "Content-Type: application/json" \
-d '{"input": "Your golden prompt here"}'
```
Verify:
- ✅ Agent responds correctly
- ✅ Response time is reasonable
- ✅ No errors occur
- ✅ Response format matches expectations
If a golden prompt fails manually, fix your agent first, then use it in FlakeStorm.
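The same manual check can be scripted for a whole list of prompts. The Python sketch below assumes the default format used in the `curl` command above (`{"input": ...}` in, `{"output": ...}` out) and uses only the standard library. `build_request` and `smoke_test` are hypothetical helpers.

```python
import json
import urllib.request


def build_request(endpoint: str, prompt: str) -> urllib.request.Request:
    """Build the POST request that the curl command above would send."""
    body = json.dumps({"input": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(endpoint: str, prompts: list[str], timeout: float = 30.0) -> dict[str, str]:
    """Send each prompt once and collect the agent's 'output' field."""
    results: dict[str, str] = {}
    for prompt in prompts:
        with urllib.request.urlopen(build_request(endpoint, prompt), timeout=timeout) as resp:
            results[prompt] = json.loads(resp.read())["output"]
    return results


# Example (requires your agent to be running):
# print(smoke_test("http://localhost:8000/invoke", ["Book a flight from NYC to LA"]))
```

If your agent uses a custom `request_template` or `response_path`, adjust the body and the extracted field accordingly.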
---
## Troubleshooting