> Best practices for building agents and agentic applications with Plano — the AI-native proxy and dataplane. Covers configuration, routing, agent orchestration, filter chains, observability, CLI operations, and deployment patterns.
- [7.2 Use PostgreSQL State Storage for Multi-Turn Conversations in Production](#use-postgresql-state-storage-for-multi-turn-conversations-in-production)
- [7.3 Verify Listener Health Before Sending Requests](#verify-listener-health-before-sending-requests)
- [Section 8: Advanced Patterns](#section-8)
- [8.1 Combine Multiple Listener Types for Layered Agent Architectures](#combine-multiple-listener-types-for-layered-agent-architectures)
- [8.2 Design Prompt Targets with Precise Parameter Schemas](#design-prompt-targets-with-precise-parameter-schemas)
---
## Section 1: Configuration Fundamentals
*Core config.yaml structure, versioning, listener types, and provider setup — the entry point for every Plano deployment.*
### 1.1 Always Specify a Supported Config Version
**Impact:** `CRITICAL` — Plano rejects configs with missing or unsupported version fields — the version field gates all other validation
**Tags:** `config`, `versioning`, `validation`
## Always Specify a Supported Config Version
Every Plano `config.yaml` must include a `version` field at the top level. Plano validates configs against a versioned JSON schema — an unrecognized or missing version will cause `planoai up` to fail immediately with a schema validation error before the container starts.
**Incorrect (missing or invalid version):**
```yaml
# No version field: fails schema validation
listeners:
  - type: model
    name: model_listener
    port: 12000
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
```
**Correct (explicit supported version):**
```yaml
version: v0.3.0
listeners:
  - type: model
    name: model_listener
    port: 12000
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
```
Use the latest supported version unless you are targeting a specific deployed Plano image. Current supported versions: `v0.1`, `v0.1.0`, `0.1-beta`, `v0.2.0`, `v0.3.0`. Prefer `v0.3.0` for all new projects.
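As a pre-flight illustration, the version gate can be sketched in a few lines of Python. This assumes the config has already been parsed into a dict (for example by a YAML loader); `check_version` and `SUPPORTED_VERSIONS` are hypothetical names, and the set simply mirrors the versions listed above rather than Plano's actual schema.

```python
# Hypothetical pre-flight check; the set mirrors the versions documented above.
SUPPORTED_VERSIONS = {"v0.1", "v0.1.0", "0.1-beta", "v0.2.0", "v0.3.0"}


def check_version(config: dict) -> str:
    """Return the config version, or raise ValueError if it is missing or unsupported."""
    version = config.get("version")
    if version is None:
        raise ValueError("config is missing the top-level 'version' field")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported config version: {version!r}")
    return version
```

Running a check like this in CI catches a missing or mistyped version before `planoai up` is ever invoked.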
### 1.2 Choose the Right Listener Type for Your Use Case
**Impact:** `CRITICAL` — The listener type determines the entire request processing pipeline — choosing the wrong type means features like prompt functions or agent routing are unavailable
## Choose the Right Listener Type for Your Use Case
Plano supports three listener types, each serving a distinct purpose. `listeners` is the only required top-level array in a Plano config. Every listener needs at minimum a `type`, `name`, and `port`.
| Type | Use When | Key Feature |
|------|----------|-------------|
| `model` | You want an OpenAI-compatible LLM gateway | Routes to multiple LLM providers, supports model aliases and routing preferences |
| `prompt` | You want LLM-callable custom functions | Define `prompt_targets` that the LLM dispatches as function calls |
| `agent` | You want multi-agent orchestration | Routes user requests to specialized sub-agents by matching agent descriptions |
**Incorrect (using `model` when agents need orchestration):**
```yaml
version: v0.3.0
# Wrong: a model listener cannot route to backend agent services
listeners:
  - type: model
    name: main
    port: 12000
agents:
  - id: weather_agent
    url: http://host.docker.internal:8001
```
**Correct (use `agent` listener for multi-agent systems):**
```yaml
version: v0.3.0
agents:
  - id: weather_agent
    url: http://host.docker.internal:8001
  - id: travel_agent
    url: http://host.docker.internal:8002
listeners:
  - type: agent
    name: orchestrator
    port: 8000
    router: plano_orchestrator_v1
    agents:
      - id: weather_agent
        description: Provides real-time weather, forecasts, and conditions for any city.
      - id: travel_agent
        description: Books flights, hotels, and travel itineraries.
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
```
A single Plano instance can expose multiple listeners on different ports, each with a different type, to serve different clients simultaneously.
Reference: https://github.com/katanemo/archgw
---
### 1.3 Register Model Providers with Correct Format Identifiers
**Impact:** `CRITICAL` — Incorrect provider format causes request translation failures — Plano must know the wire format each provider expects
## Register Model Providers with Correct Format Identifiers
Plano translates requests between its internal format and each provider's API. The `model` field uses `provider/model-name` syntax which determines both the upstream endpoint and the request/response translation layer. Some providers require an explicit `provider_interface` override.
## Use Environment Variable Substitution for All Secrets
Plano supports `$VAR_NAME` substitution in config values. This applies to `access_key` fields, `connection_string` for state storage, and `http_headers` in prompt targets and endpoints. Never hardcode credentials — Plano reads them from environment variables or a `.env` file at startup via `planoai up`.
**Incorrect (hardcoded secrets):**
```yaml
version: v0.3.0
model_providers:
  - model: openai/gpt-4o
    access_key: abcdefghijklmnopqrstuvwxyz...  # Hardcoded: never do this
```
**`.env` file pattern (loaded automatically by `planoai up`):**
```bash
# .env — add to .gitignore
OPENAI_API_KEY=abcdefghijklmnopqrstuvwxyz...
ANTHROPIC_API_KEY=abcdefghijklmnopqrstuvwxyz...
DB_USER=plano
DB_PASS=secure-password
DB_HOST=localhost
MY_API_TOKEN=abcdefghijklmnopqrstuvwxyz...
```
Plano also accepts keys set directly in the shell environment. Variables referenced in config but not found at startup cause `planoai up` to fail with a clear error listing the missing keys.
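A minimal sketch of the same missing-key check, useful in CI before `planoai up` runs. The regular expression is an assumption about the `$VAR_NAME` syntax, and `missing_env_vars` is a hypothetical helper, not part of the Plano CLI.

```python
import re

# Matches $VAR_NAME references; the exact syntax Plano accepts is assumed here.
VAR_PATTERN = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def missing_env_vars(config_text: str, env: dict) -> list:
    """Return the sorted names referenced as $VAR in config_text but absent from env."""
    referenced = set(VAR_PATTERN.findall(config_text))
    return sorted(name for name in referenced if name not in env)
```

Pass the raw text of `config.yaml` and a mapping such as `os.environ` (merged with any `.env` values you load yourself); an empty list means every referenced variable is defined.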
Reference: https://github.com/katanemo/archgw
---
## Section 2: Routing & Model Selection
*Intelligent LLM routing using preferences, aliases, and defaults to match tasks to the best model.*
### 2.1 Always Set Exactly One Default Model Provider
**Impact:** `HIGH` — Without a default provider, Plano has no fallback when routing preferences do not match — requests with unclassified intent will fail
When a request does not match any routing preference, Plano forwards it to the `default: true` provider. Without a default, unmatched requests fail. If multiple providers are marked `default: true`, Plano uses the first one — which can produce unexpected behavior.
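The rule can be checked mechanically. The sketch below assumes the config is already parsed into a dict with a top-level `model_providers` list, as in the examples in this guide; `default_providers` and `check_single_default` are hypothetical helper names.

```python
def default_providers(config: dict) -> list:
    """Return the model names of all providers marked default: true."""
    return [
        provider.get("model")
        for provider in config.get("model_providers", [])
        if provider.get("default") is True
    ]


def check_single_default(config: dict) -> None:
    """Raise ValueError unless exactly one provider is marked as the default."""
    defaults = default_providers(config)
    if len(defaults) != 1:
        raise ValueError(
            f"expected exactly one default provider, found {len(defaults)}: {defaults}"
        )
```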
### 2.2 Use Model Aliases for Semantic, Stable Model References
**Impact:** `MEDIUM` — Hardcoded model names in client code require code changes when you swap providers; aliases let you update routing in config.yaml alone
## Use Model Aliases for Semantic, Stable Model References
`model_aliases` map human-readable names to specific model identifiers. Client applications reference the alias, not the underlying model. When you want to upgrade from `gpt-4o` to a new model, you change one line in `config.yaml` — not every client calling the API.
**Incorrect (clients hardcode specific model names):**
```yaml
# config.yaml: no aliases defined
version: v0.3.0
listeners:
  - type: model
    name: model_listener
    port: 12000
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
```
```python
# Client code: brittle, must be updated when the model changes
```
- Bumping `.v2` → `.v3` lets you run old and new aliases simultaneously during rollouts
- `plano.v1` as a canonical default gives clients a single stable entry point
Reference: https://github.com/katanemo/archgw
---
### 2.3 Use Passthrough Auth for Proxy and Multi-Tenant Setups
**Impact:** `MEDIUM` — Without passthrough auth, self-hosted proxy services (LiteLLM, vLLM, etc.) reject Plano's requests because the wrong Authorization header is sent
## Use Passthrough Auth for Proxy and Multi-Tenant Setups
When routing to a self-hosted LLM proxy (LiteLLM, vLLM, OpenRouter, Azure APIM) or in multi-tenant setups where clients supply their own keys, set `passthrough_auth: true`. This forwards the client's `Authorization` header rather than Plano's configured `access_key`. Combine with a `base_url` pointing to the proxy.
**Incorrect (Plano sends its own key to a proxy that expects the client's key):**
```yaml
model_providers:
  - model: custom/proxy
    base_url: http://host.docker.internal:8000
    access_key: $SOME_KEY  # Plano overwrites the client's auth, so the proxy rejects it
```
**Correct (forward client Authorization header to the proxy):**
```yaml
version: v0.3.0
listeners:
  - type: model
    name: model_listener
    port: 12000
model_providers:
  - model: custom/litellm-proxy
    base_url: http://host.docker.internal:4000  # LiteLLM server
    provider_interface: openai  # LiteLLM uses OpenAI format
    passthrough_auth: true  # Forward the client's Authorization header
```
Plano's `plano_orchestrator_v1` router uses a 1.5B preference-aligned LLM to classify incoming requests against your `routing_preferences` descriptions. It returns an ordered `models` list for the matched route; the client uses `models[0]` as primary and falls back to `models[1]`, `models[2]`... on `429`/`5xx` errors. Description quality directly determines routing accuracy.
Starting in `v0.4.0`, `routing_preferences` lives at the **top level** of the config and each entry carries its own `models: [...]` candidate pool. Listing multiple models under a single route gives you automatic provider fallback without extra client logic. Configs still using the legacy v0.3.0 inline shape (under each `model_provider`) are auto-migrated with a deprecation warning — prefer the top-level form below.
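The client-side fallback described above can be sketched as follows. This is an illustration only: `send` is a hypothetical callable that issues one request for a given model and returns a `(status, body)` pair, and `call_with_fallback` is not a Plano API.

```python
def is_retryable(status: int) -> bool:
    """429 and any 5xx status trigger a fallback to the next candidate model."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(models, send):
    """Try each model in order and return (model, status, body) for the first usable reply.

    If every candidate fails with a retryable status, the last failure is returned.
    """
    last = None
    for model in models:
        status, body = send(model)
        if not is_retryable(status):
            return model, status, body
        last = (model, status, body)
    if last is None:
        raise ValueError("models list is empty")
    return last
```

Non-retryable errors such as `400` are returned immediately, since switching models would not fix a malformed request.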
- Test with representative queries using `planoai trace` and `--where` filters to verify routing decisions
Reference: https://github.com/katanemo/archgw
---
## Section 3: Agent Orchestration
*Multi-agent patterns, agent descriptions, and orchestration strategies for building agentic applications.*
### 3.1 Register All Sub-Agents in Both `agents` and `listeners.agents`
**Impact:** `CRITICAL` — An agent registered only in `agents` but not referenced in a listener's agent list is unreachable; an agent listed in a listener but missing from `agents` causes a startup error
## Register All Sub-Agents in Both `agents` and `listeners.agents`
Plano's agent system has two separate concepts: the global `agents` array (defines the agent's ID and backend URL) and the `listeners[].agents` array (controls which agents are available to an orchestrator and provides their routing descriptions). Both must reference the same agent ID.
**Incorrect (agent defined globally but not referenced in listener):**
```yaml
version: v0.3.0
agents:
  - id: weather_agent
    url: http://host.docker.internal:8001
  - id: news_agent  # Defined but never referenced in any listener
    url: http://host.docker.internal:8002
listeners:
  - type: agent
    name: orchestrator
    port: 8000
    router: plano_orchestrator_v1
    agents:
      - id: weather_agent
        description: Provides weather forecasts and current conditions.
      # news_agent is missing here, so the orchestrator cannot route to it
```
**Incorrect (listener references an agent ID not in the global agents list):**
```yaml
agents:
  - id: weather_agent
    url: http://host.docker.internal:8001
listeners:
  - type: agent
    name: orchestrator
    port: 8000
    router: plano_orchestrator_v1
    agents:
      - id: weather_agent
        description: Provides weather forecasts.
      - id: flights_agent  # ID not in global agents[]: startup error
        description: Provides flight status information.
```
**Correct (every agent ID appears in both places):**
```yaml
version: v0.3.0
agents:
  - id: weather_agent
    url: http://host.docker.internal:8001
  - id: flights_agent
    url: http://host.docker.internal:8002
  - id: hotels_agent
    url: http://host.docker.internal:8003
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
listeners:
  - type: agent
    name: travel_orchestrator
    port: 8000
    router: plano_orchestrator_v1
    agents:
      - id: weather_agent
        description: Real-time weather, forecasts, and climate data for any city.
      - id: flights_agent
        description: Live flight status, schedules, gates, and delays.
      - id: hotels_agent
        description: Hotel search, availability, pricing, and booking.
        default: true  # Fallback if no other agent matches
```
Set `default: true` on one agent in each listener's agents list to handle unmatched requests. The agent's URL in the global `agents` array is the HTTP endpoint Plano forwards matching requests to — it must be reachable from within the Docker container (use `host.docker.internal` for services on the host).
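Both failure modes can be caught with a small cross-reference check. The sketch assumes the config is already parsed into a dict shaped like the examples above; `check_agent_references` is a hypothetical helper, not a Plano command.

```python
def check_agent_references(config: dict) -> dict:
    """Compare global agent IDs against the IDs referenced by listeners."""
    defined = {agent["id"] for agent in config.get("agents", [])}
    referenced = set()
    for listener in config.get("listeners", []):
        # Model and prompt listeners may have no agents list at all.
        for agent in listener.get("agents", []):
            referenced.add(agent["id"])
    return {
        "unreachable": sorted(defined - referenced),  # defined globally, never routed to
        "undefined": sorted(referenced - defined),    # listed in a listener, no backend URL
    }
```

An empty `unreachable` and `undefined` list means every agent ID appears in both places.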
Reference: https://github.com/katanemo/archgw
---
### 3.2 Write Capability-Focused Agent Descriptions for Accurate Routing
**Impact:** `HIGH` — The orchestrator LLM routes requests purely by reading agent descriptions — poor descriptions cause misroutes to the wrong specialized agent
## Write Capability-Focused Agent Descriptions for Accurate Routing
In an `agent` listener, Plano's orchestrator reads each agent's `description` and routes user requests to the best-matching agent. This is LLM-based intent matching — the description is the entire specification the router sees. Write it as a capability manifest: what can this agent do, what data does it have access to, and what types of requests should it handle?
## Configure MCP Filters with Explicit Type and Transport
Plano filters integrate with external services via MCP (Model Context Protocol) or plain HTTP. MCP filters call a specific tool on a remote MCP server. Always specify `type`, `transport`, and optionally `tool` (defaults to the filter `id`) to ensure Plano connects correctly to your filter implementation.
**Incorrect (minimal filter definition relying on all defaults):**
```yaml
filters:
  - id: my_guard  # Plano infers type=mcp, transport=streamable-http, tool=my_guard
    url: http://localhost:10500
    # If your MCP server uses a different tool name or transport, this silently misroutes
```
**Correct (explicit configuration for each filter):**
```yaml
version: v0.3.0
filters:
  - id: input_guards
    url: http://host.docker.internal:10500
    type: mcp  # Explicitly MCP protocol
    transport: streamable-http  # Streamable HTTP transport
    tool: input_guards  # MCP tool name (matches MCP server registration)
  - id: query_rewriter
    url: http://host.docker.internal:10501
    type: mcp
    transport: streamable-http
    tool: rewrite_query  # Tool name differs from filter ID, so explicit is safer
  - id: custom_validator
    url: http://host.docker.internal:10503
    type: http  # Plain HTTP filter (not MCP)
    # No tool field for HTTP filters
```
**MCP filter implementation contract:**
Your MCP server must expose a tool matching the `tool` name. The tool receives the request payload and must return either:
- A modified request (to pass through with changes)
- A rejection response (to short-circuit the pipeline)
**HTTP filter alternative** — use `type: http` for simpler request/response interceptors that don't need the MCP protocol:
```yaml
filters:
  - id: auth_validator
    url: http://host.docker.internal:9000/validate
    type: http  # Plano POSTs the request, expects the modified request back
```
Reference: https://github.com/katanemo/archgw
---
### 4.2 Configure Prompt Guards with Actionable Rejection Messages
**Impact:** `MEDIUM` — A generic or empty rejection message leaves users confused about why their request was blocked and unable to rephrase appropriately
## Configure Prompt Guards with Actionable Rejection Messages
Plano has built-in `prompt_guards` for detecting jailbreak attempts. When triggered, Plano returns the `on_exception.message` instead of forwarding the request. Write messages that explain the restriction and suggest what the user can do instead — both for user experience and to reduce support burden.
**Incorrect (no message configured — returns a generic error):**
`prompt_guards` applies globally to all listeners. Use `filter_chain` on individual agents for per-agent policies.
Reference: https://github.com/katanemo/archgw
---
### 4.3 Order Filter Chains with Guards First, Enrichment Last
**Impact:** `HIGH` — Running context builders before input guards means jailbreak attempts get RAG-enriched context before being blocked — wasting compute and risking data exposure
## Order Filter Chains with Guards First, Enrichment Last
A `filter_chain` is an ordered list of filter IDs applied sequentially to each request. The order is semantically meaningful: each filter receives the output of the previous one. Safety and validation filters must run first to short-circuit bad requests before expensive enrichment filters process them.
4. **Output guards**: validate or sanitize the LLM response before returning it
**Incorrect (context built before guards — wasteful and potentially unsafe):**
```yaml
filters:
  - id: context_builder
    url: http://host.docker.internal:10502  # Runs expensive RAG retrieval first
  - id: query_rewriter
    url: http://host.docker.internal:10501
  - id: input_guards
    url: http://host.docker.internal:10500  # Guards run last, so a jailbreak gets context
listeners:
  - type: agent
    name: rag_orchestrator
    port: 8000
    router: plano_orchestrator_v1
    agents:
      - id: rag_agent
        filter_chain:
          - context_builder  # Wrong: expensive enrichment before safety check
          - query_rewriter
          - input_guards
```
**Correct (guards block bad requests before any enrichment):**
```yaml
version: v0.3.0
filters:
  - id: input_guards
    url: http://host.docker.internal:10500
    type: mcp
    transport: streamable-http
  - id: query_rewriter
    url: http://host.docker.internal:10501
    type: mcp
    transport: streamable-http
  - id: context_builder
    url: http://host.docker.internal:10502
    type: mcp
    transport: streamable-http
listeners:
  - type: agent
    name: rag_orchestrator
    port: 8000
    router: plano_orchestrator_v1
    agents:
      - id: rag_agent
        description: Answers questions using internal knowledge base documents.
        filter_chain:
          - input_guards  # 1. Block jailbreaks and policy violations
          - query_rewriter  # 2. Normalize the safe query
          - context_builder  # 3. Retrieve relevant context for the clean query
```
Different agents within the same listener can have different filter chains — a public-facing agent may need all guards while an internal admin agent may skip them.
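To see why ordering matters, here is a toy model of a sequential chain with short-circuiting. It is a sketch of the semantics described above, not Plano's implementation: the three filter functions are stand-ins named after the example filter IDs, and `Rejected`, `run_chain`, and the call counter are hypothetical.

```python
class Rejected(Exception):
    """Raised by a filter to short-circuit the rest of the chain."""


def run_chain(request: dict, filters: list) -> dict:
    """Apply filters in order; each one receives the previous filter's output."""
    for apply_filter in filters:
        request = apply_filter(request)  # a Rejected exception stops the chain here
    return request


def input_guards(request: dict) -> dict:
    if "ignore previous instructions" in request["query"].lower():
        raise Rejected("blocked by input_guards")
    return request


def query_rewriter(request: dict) -> dict:
    return {**request, "query": request["query"].strip()}


def context_builder(request: dict) -> dict:
    # Stand-in for expensive retrieval; the counter makes the cost visible.
    context_builder.calls += 1
    return {**request, "context": ["doc-1"]}


context_builder.calls = 0
```

With the chain `[input_guards, query_rewriter, context_builder]`, a rejected request never reaches `context_builder`, so its call counter stays unchanged; with the guard placed last, retrieval would run before the rejection.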
Reference: https://github.com/katanemo/archgw
---
## Section 5: Observability & Debugging
*OpenTelemetry tracing, log levels, span attributes, and sampling for production visibility.*
### 5.1 Add Custom Span Attributes for Correlation and Filtering
**Impact:** `MEDIUM` — Without custom span attributes, traces cannot be filtered by user, session, or environment — making production debugging significantly harder
## Add Custom Span Attributes for Correlation and Filtering
Plano can automatically extract HTTP request headers and attach them as span attributes, plus attach static key-value pairs to every span. This enables filtering traces by user, session, tenant, environment, or any other dimension that matters to your application.
**Incorrect (no span attributes — traces are unfiltered blobs):**
```yaml
tracing:
  random_sampling: 20
  # No span_attributes: cannot filter by user, session, or environment
```
**Correct (rich span attributes for production correlation):**
```yaml
version: v0.3.0
tracing:
  random_sampling: 20
  trace_arch_internal: true
  span_attributes:
    # Match all headers with this prefix, then map to span attributes by:
    # 1) stripping the prefix and 2) converting hyphens to dots
    header_prefixes:
      - x-katanemo-
    # Static attributes added to every span from this Plano instance
    static:
      environment: production
      service.name: plano-gateway
      deployment.region: us-east-1
      service.version: "2.1.0"
      team: platform-engineering
```
**Sending correlation headers from client code:**
```python
import httpx

response = httpx.post(
    "http://localhost:12000/v1/chat/completions",
    headers={
        "x-katanemo-request-id": "req_abc123",
        "x-katanemo-user-id": "usr_12",
        "x-katanemo-session-id": "sess_xyz456",
        "x-katanemo-tenant-id": "acme-corp",
    },
    json={
        "model": "plano.v1",
        "messages": [{"role": "user", "content": "Hello"}],  # placeholder message
    },
)
```
**Querying by custom attribute:**
```bash
# Find all requests from a specific user
planoai trace --where user.id=usr_12
# Find all traces from production environment
planoai trace --where environment=production
# Find traces from a specific tenant
planoai trace --where tenant.id=acme-corp
```
Headers are matched by prefix: with `x-katanemo-` configured, a header such as `x-katanemo-user-id` maps to the span attribute `user.id`.
### 5.2 Enable Tracing with Appropriate Sampling for Your Environment
**Impact:** `HIGH` — Without tracing enabled, debugging routing decisions, latency issues, and model selection is guesswork — traces are the primary observability primitive in Plano
## Enable Tracing with Appropriate Sampling for Your Environment
Plano emits OpenTelemetry (OTEL) traces for every request, capturing routing decisions, LLM provider selection, filter chain execution, and response latency. Traces are the best tool for understanding why a request was routed to a particular model and debugging unexpected behavior.
**Incorrect (no tracing configured — flying blind in production):**
```yaml
version: v0.3.0
listeners:
  - type: model
    name: model_listener
    port: 12000
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
# No tracing block: no visibility into routing, latency, or errors
```
**Correct (tracing enabled with environment-appropriate sampling):**
```yaml
version: v0.3.0
listeners:
  - type: model
    name: model_listener
    port: 12000
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
tracing:
  random_sampling: 100  # 100% for development/debugging
  trace_arch_internal: true  # Include Plano's internal routing spans
```
**Production configuration (sampled to control volume):**
```yaml
tracing:
  random_sampling: 10  # Sample 10% of requests in production
  trace_arch_internal: false  # Skip internal spans to reduce noise
  span_attributes:
    header_prefixes:
      - x-katanemo-  # Match all x-katanemo-* headers
    static:
      environment: production
      service.name: my-plano-service
      version: "1.0.0"
```
With `x-katanemo-` configured, Plano maps headers to attributes by stripping the prefix and converting hyphens to dots:
- `x-katanemo-user-id` -> `user.id`
- `x-katanemo-session-id` -> `session.id`
- `x-katanemo-request-id` -> `request.id`
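The mapping rule is simple enough to express directly. The sketch below follows the two steps stated above (strip the prefix, convert hyphens to dots); lowercasing the header name first is an assumption based on HTTP header names being case-insensitive, and `header_to_attribute` is a hypothetical helper.

```python
def header_to_attribute(header: str, prefix: str = "x-katanemo-"):
    """Map a header name to a span attribute name, or return None if the prefix does not match."""
    name = header.lower()  # assumption: matching is case-insensitive
    if not name.startswith(prefix):
        return None
    # Strip the prefix, then convert hyphens to dots.
    return name[len(prefix):].replace("-", ".")
```

This is handy for working out which attribute name to pass to `planoai trace --where` for a given header.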
**Starting the trace collector:**
```bash
# Start Plano with built-in OTEL collector
planoai up config.yaml --with-tracing
```
Sampling rates: 100% for dev/staging, 5–20% for high-traffic production, 100% for low-traffic production. `trace_arch_internal: true` adds spans showing which routing preference matched — essential for debugging preference configuration.
### 5.3 Use `planoai trace` to Inspect Routing Decisions
**Impact:** `MEDIUM-HIGH` — The trace CLI lets you verify which model was selected, why, and how long each step took — without setting up a full OTEL backend
## Use `planoai trace` to Inspect Routing Decisions
`planoai trace` provides a built-in trace viewer backed by an in-memory OTEL collector. Use it to inspect routing decisions, verify preference matching, measure filter latency, and debug failed requests — all from the CLI without configuring Jaeger, Zipkin, or another backend.
**Workflow: start collector, run requests, then inspect traces:**
```bash
# 1. Start Plano with the built-in trace collector (recommended)
planoai up config.yaml --with-tracing
# 2. Send test requests through Plano
curl http://localhost:12000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "plano.v1", "messages": [{"role": "user", "content": "Write a Python function to sort a list"}]}'
# 3. Show the latest trace
planoai trace
```
You can also run the trace listener directly:
```bash
planoai trace listen # available on a process ID running OTEL collector
```
Stop the background trace listener:
```bash
planoai trace down
```
**Useful trace viewer patterns:**
```bash
# Show latest trace (default target is "last")
planoai trace
# List available trace IDs
planoai trace --list
# Show all traces
planoai trace any
# Show a specific trace (short 8-char or full 32-char ID)
planoai trace 7f4e9a1c
planoai trace 7f4e9a1c0d9d4a0bb9bf5a8a7d13f62a
# Filter by specific span attributes (AND semantics for repeated --where)
planoai trace any --where llm.model=gpt-4o-mini
# Filter by user ID (if header prefix is x-katanemo-, x-katanemo-user-id maps to user.id)
```
*Using the planoai CLI for startup, tracing, CLI agents, project init, and code generation.*
### 6.1 Follow the `planoai up` Validation Workflow Before Debugging Runtime Issues
**Impact:** `HIGH` — `planoai up` validates config, checks API keys, and health-checks all listeners — skipping this diagnostic information leads to unnecessary debugging of container or network issues
## Follow the `planoai up` Validation Workflow Before Debugging Runtime Issues
`planoai up` is the entry point for running Plano. It performs sequential checks before the container starts: schema validation, API key presence check, container startup, and health checks on all configured listener ports. Understanding what each failure stage means prevents chasing the wrong root cause.
**Validation stages and failure signals:**
```
Stage 1: Schema validation → "config.yaml: invalid against schema"
Stage 3: Container start → "Docker daemon not running" or image pull errors
Stage 4: Health check (/healthz) → "Listener not healthy after 120s" (timeout)
```
**Development startup workflow:**
```bash
# Standard startup — config.yaml in current directory
planoai up
# Explicit config file path
planoai up my-config.yaml
# Start in foreground to see all logs immediately (great for debugging)
planoai up config.yaml --foreground
# Start with built-in OTEL trace collector
planoai up config.yaml --with-tracing
# Enable verbose logging for debugging routing decisions
LOG_LEVEL=debug planoai up config.yaml --foreground
```
**Checking what's running:**
```bash
# Stream recent logs (last N lines, then exit)
planoai logs
# Follow logs in real-time
planoai logs --follow
# Include Envoy/gateway debug messages
planoai logs --debug --follow
```
**Stopping and restarting after config changes:**
```bash
# Stop the current container
planoai down
# Restart with updated config
planoai up config.yaml
```
**Common failure patterns:**
```bash
# API key missing — check your .env file or shell environment
export OPENAI_API_KEY=sk-proj-...
planoai up config.yaml
# Health check timeout — listener port may conflict
# Check if another process uses port 12000
lsof -i :12000
# Container fails to start — verify Docker daemon is running
docker ps
```
`planoai down` fully stops and removes the Plano container. Always run `planoai down` before `planoai up` when changing config to avoid stale container state.
Reference: https://github.com/katanemo/archgw
---
### 6.2 Generate Prompt Targets from Python Functions with `planoai generate_prompt_targets`
**Impact:** `MEDIUM` — Manually writing prompt_targets YAML for existing Python APIs is error-prone — the generator introspects function signatures and produces correct YAML automatically
## Generate Prompt Targets from Python Functions with `planoai generate_prompt_targets`
`planoai generate_prompt_targets` introspects Python function signatures and docstrings to generate `prompt_targets` YAML for your Plano config. This is the fastest way to expose existing Python APIs as LLM-callable functions without manually writing the YAML schema.
**Python function requirements for generation:**
- Use simple type annotations: `int`, `float`, `bool`, `str`, `list`, `tuple`, `set`, `dict`
- Include a docstring describing what the function does (becomes the `description`)
- Complex Pydantic models must be flattened into primitive typed parameters first
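As an illustration of a function that meets these requirements, consider the sketch below. `get_weather` is a made-up example, and `uses_simple_annotations` is a hypothetical self-check that approximates the rules above; it is not how the generator itself works.

```python
import inspect


def get_weather(city: str, days: int = 3, metric: bool = True) -> dict:
    """Get the weather forecast for a city over the next few days."""
    return {"city": city, "days": days, "metric": metric}


# The simple annotation types listed in the requirements above.
SIMPLE_TYPES = {int, float, bool, str, list, tuple, set, dict}


def uses_simple_annotations(func) -> bool:
    """True if func has a docstring and every parameter has a simple type annotation."""
    params = inspect.signature(func).parameters.values()
    return bool(func.__doc__) and all(p.annotation in SIMPLE_TYPES for p in params)
```

A function that takes a Pydantic model would fail this check until its fields are flattened into individually typed parameters.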
After generation, manually add the `endpoint` blocks pointing to your actual API. The generator produces the schema; you wire in the connectivity.
Reference: https://github.com/katanemo/archgw
---
### 6.3 Use `planoai cli_agent` to Connect Claude Code Through Plano
**Impact:** `MEDIUM-HIGH` — Running Claude Code directly against provider APIs bypasses Plano's routing, observability, and guardrails — cli_agent routes all Claude Code traffic through your configured Plano instance
## Use `planoai cli_agent` to Connect Claude Code Through Plano
`planoai cli_agent` starts a Claude Code session that routes all LLM traffic through your running Plano instance instead of directly to Anthropic. This gives you routing preferences, model aliases, tracing, and guardrails for your coding agent workflows — making Claude Code a first-class citizen of your Plano configuration.
**Prerequisites:**
```bash
# 1. Plano must be running with a model listener
planoai up config.yaml
# 2. ANTHROPIC_API_KEY must be set (Claude Code uses it for auth)
export ANTHROPIC_API_KEY=sk-ant-...
```
**Starting the CLI agent:**
```bash
# Start CLI agent using config.yaml in current directory
```
### 6.4 Use `planoai init` Templates to Bootstrap New Projects Correctly
**Impact:** `MEDIUM` — Starting from a blank config.yaml leads to missing required fields and common structural mistakes — templates provide validated, idiomatic starting points
## Use `planoai init` Templates to Bootstrap New Projects Correctly
`planoai init` generates a valid `config.yaml` from built-in templates. Each template demonstrates a specific Plano capability with correct structure, realistic examples, and comments. Use this instead of writing config from scratch — it ensures you start with a valid, working configuration.
**Available templates:**
| Template ID | What It Demonstrates | Best For |
|---|---|---|
| `sub_agent_orchestration` | Multi-agent routing with specialized sub-agents | Building agentic applications |
| `coding_agent_routing` | Routing preferences + model aliases for coding workflows | Claude Code and coding assistants |
| `preference_aware_routing` | Automatic LLM routing based on task type | Multi-model cost optimization |
Start with `preference_aware_routing` for most LLM gateway use cases and `sub_agent_orchestration` for multi-agent applications. Both can be combined after you understand each independently.
Reference: https://github.com/katanemo/archgw
---
## Section 7: Deployment & Security
*Docker deployment, environment variable management, health checks, and state storage for production.*
### 7.1 Understand Plano's Docker Network Topology for Agent URL Configuration
**Impact:** `HIGH` — Using `localhost` for agent URLs inside Docker always fails — Plano runs in a container and cannot reach host services via localhost
## Understand Plano's Docker Network Topology for Agent URL Configuration
Plano runs inside a Docker container managed by `planoai up`. Services running on your host machine (agent servers, filter servers, databases) are not accessible as `localhost` from inside the container. Use Docker's special hostname `host.docker.internal` to reach host services.
**Docker network rules:**
- `localhost` / `127.0.0.1` inside the container → Plano's own container (not your host)
- `host.docker.internal` → Your host machine's loopback interface
- Container name or `docker network` hostname → Other Docker containers
- External domain / IP → Reachable if Docker has network access
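The first two rules can be applied mechanically when migrating a config written for bare-metal use. This is a sketch under the assumption that every loopback URL in the config refers to a service on the host; `to_container_url` is a hypothetical helper, and it drops any userinfo in the URL for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def to_container_url(url: str) -> str:
    """Rewrite a loopback URL so it reaches the host from inside the container."""
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url  # service names and external hosts are left untouched
    netloc = "host.docker.internal"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```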
**Incorrect (using localhost — agent unreachable from inside container):**
```yaml
version: v0.3.0
agents:
  - id: weather_agent
    url: http://localhost:8001  # Wrong: this is Plano's own container
  - id: flight_agent
    url: http://127.0.0.1:8002  # Wrong: same issue
filters:
  - id: input_guards
    url: http://localhost:10500  # Wrong: filter server unreachable
```
**Correct (using host.docker.internal for host-side services):**
```yaml
version: v0.3.0
agents:
  - id: weather_agent
    url: http://host.docker.internal:8001  # Correct: reaches host port 8001
  - id: flight_agent
    url: http://host.docker.internal:8002  # Correct: reaches host port 8002
filters:
  - id: input_guards
    url: http://host.docker.internal:10500  # Correct: reaches filter server on host
endpoints:
  internal_api:
    endpoint: host.docker.internal  # Correct for internal API on host
    protocol: http
```
**Production deployment patterns:**
```yaml
# Kubernetes / Docker Compose: use service names
agents:
  - id: weather_agent
    url: http://weather-service:8001  # Kubernetes service DNS
```
### 7.2 Use PostgreSQL State Storage for Multi-Turn Conversations in Production
**Impact:** `HIGH` — The default in-memory state storage loses all conversation history when the container restarts — production multi-turn agents require persistent PostgreSQL storage
## Use PostgreSQL State Storage for Multi-Turn Conversations in Production
`state_storage` enables Plano to maintain conversation context across requests. Without it, each request is stateless. The `memory` type works for development and testing — all state is lost on container restart. Use `postgres` for any production deployment where conversation continuity matters.
**Incorrect (memory storage in production):**
```yaml
version: v0.3.0
# Memory storage: all conversations lost on planoai down / container restart
state_storage:
  type: memory
listeners:
  - type: agent
    name: customer_support
    port: 8000
    router: plano_orchestrator_v1
    agents:
      - id: support_agent
        description: Customer support assistant with conversation history.
```
**Correct (PostgreSQL for production persistence):**
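```yaml
version: v0.3.0
state_storage:
  type: postgres
  connection_string: "postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:5432/${DB_NAME}"
listeners:
  - type: agent
    name: customer_support
    port: 8000
    router: plano_orchestrator_v1
    agents:
      - id: support_agent
        description: Customer support assistant with access to full conversation history.
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
```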
**Setting up PostgreSQL for local development:**
```bash
# Start PostgreSQL with Docker
docker run -d \
  --name plano-postgres \
  -e POSTGRES_USER=plano \
  -e POSTGRES_PASSWORD=devpassword \
  -e POSTGRES_DB=plano \
  -p 5432:5432 \
  postgres:16
# Set environment variables
export DB_USER=plano
export DB_PASS=devpassword
export DB_HOST=host.docker.internal # Use host.docker.internal from inside Plano container
export DB_NAME=plano
```
**Production `.env` pattern:**
```bash
DB_USER=plano_prod
DB_PASS=<strong-random-password>
DB_HOST=your-rds-endpoint.amazonaws.com
DB_NAME=plano
```
Plano automatically creates its state tables on first startup. The `connection_string` supports all standard PostgreSQL connection parameters including SSL: `postgresql://user:pass@host:5432/db?sslmode=require`.
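To make the connection string format concrete, the following sketch assembles one from the `DB_*` variables used above and optionally appends `sslmode`. The function name `build_connection_string` and the variables `DB_PORT` and `DB_SSLMODE` are assumptions introduced for illustration; Plano itself only needs the final string.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a PostgreSQL connection URI from DB_* variables.
# DB_PORT and DB_SSLMODE are optional, illustrative variables (not Plano settings).
build_connection_string() {
  # Abort with a clear message if a required variable is unset or empty.
  : "${DB_USER:?DB_USER is required}"
  : "${DB_PASS:?DB_PASS is required}"
  : "${DB_HOST:?DB_HOST is required}"
  : "${DB_NAME:?DB_NAME is required}"
  local port="${DB_PORT:-5432}"
  local uri="postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:${port}/${DB_NAME}"
  # Append the SSL mode only when one was requested.
  if [ -n "${DB_SSLMODE:-}" ]; then
    uri="${uri}?sslmode=${DB_SSLMODE}"
  fi
  printf '%s\n' "$uri"
}
```

Running `DB_SSLMODE=require build_connection_string` with the production variables set prints a URI ending in `?sslmode=require`. Passwords containing reserved URI characters such as `@` or `/` must be percent-encoded before they are placed in the string.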
Reference: https://github.com/katanemo/archgw
---
### 7.3 Verify Listener Health Before Sending Requests
**Impact:** `MEDIUM` — Sending requests to Plano before listeners are healthy results in connection refused errors that look like application bugs — always confirm health before testing
Each Plano listener exposes a `/healthz` HTTP endpoint. `planoai up` automatically health-checks all listeners during startup (120s timeout), but in CI/CD pipelines, custom scripts, or when troubleshooting, you may need to check health manually.
**Health check endpoints:**
```bash
# Check model listener health (port from your config)
curl -f http://localhost:12000/healthz
# Returns 200 OK when healthy
# Check prompt listener
curl -f http://localhost:10000/healthz
# Check agent listener
curl -f http://localhost:8000/healthz
```
**Polling health in scripts (CI/CD pattern):**
```bash
#!/bin/bash
# wait-for-plano.sh
LISTENER_PORT=${1:-12000}
MAX_WAIT=120
INTERVAL=2
elapsed=0
echo "Waiting for Plano listener on port $LISTENER_PORT..."
until curl -sf "http://localhost:$LISTENER_PORT/healthz" > /dev/null; do
  if [ "$elapsed" -ge "$MAX_WAIT" ]; then
    echo "ERROR: Plano listener not healthy after ${MAX_WAIT}s"
    planoai logs --debug
    exit 1
  fi
  sleep "$INTERVAL"
  elapsed=$((elapsed + INTERVAL))
done
echo "Plano listener healthy after ${elapsed}s"
```
**Docker Compose health check:**
```yaml
# docker-compose.yml for services that depend on Plano
services:
  plano:
    image: katanemo/plano:latest
    # Plano is managed by planoai, not directly via compose in most setups:
    #   planoai down && planoai up config.yaml --foreground
```
Reference: https://github.com/katanemo/archgw
---
## Section 8: Advanced Patterns
*Prompt targets, external API integration, rate limiting, and multi-listener architectures.*
### 8.1 Combine Multiple Listener Types for Layered Agent Architectures
**Impact:** `MEDIUM` — Using a single listener type forces all traffic through one gateway pattern — combining types lets you serve different clients with the right interface without running multiple Plano instances
## Combine Multiple Listener Types for Layered Agent Architectures
A single Plano `config.yaml` can define multiple listeners of different types, each on a separate port. This lets you serve different client types simultaneously: an OpenAI-compatible model gateway for direct API clients, a prompt gateway for LLM-callable function applications, and an agent orchestrator for multi-agent workflows — all from one Plano instance sharing the same model providers.
**Single listener (limited — forces all clients through one interface):**
```yaml
version: v0.3.0
listeners:
  - type: model  # Only model clients can use this
    name: model_gateway
    port: 12000
# Prompt target clients and agent clients cannot connect
```
**Multi-listener architecture (serves all client types):**
```yaml
version: v0.3.0
# --- Listener 1: OpenAI-compatible API gateway ---
# For: SDK clients, Claude Code, LangChain, etc.
listeners:
  - type: model
    name: model_gateway
    port: 12000
    timeout: "120s"
  # --- Listener 2: Prompt function gateway ---
  # For: Applications that expose LLM-callable APIs
  - type: prompt
    name: function_gateway
    port: 10000
    timeout: "60s"
  # --- Listener 3: Agent orchestration gateway ---
  # For: Multi-agent application clients
  - type: agent
    name: agent_orchestrator
    port: 8000
    timeout: "90s"
    router: plano_orchestrator_v1
    agents:
      - id: research_agent
        description: Searches, synthesizes, and summarizes information from multiple sources.
        filter_chain:
          - input_guards
          - context_builder
      - id: code_agent
        description: Writes, reviews, debugs, and explains code across all languages.
        default: true
# --- Agents ---
agents:
  - id: research_agent
    url: http://host.docker.internal:8001
  - id: code_agent
    url: http://host.docker.internal:8002
# --- Filters ---
filters:
  - id: input_guards
    url: http://host.docker.internal:10500
    type: mcp
    transport: streamable-http
  - id: context_builder
    url: http://host.docker.internal:10501
    type: mcp
    transport: streamable-http
# --- Prompt targets (for function gateway) ---
endpoints:
  internal_api:
    endpoint: host.docker.internal
    protocol: http
prompt_targets:
  - name: search_knowledge_base
    description: Search the internal knowledge base for relevant documents and facts.
    parameters:
      - name: query
        type: str
        required: true
        description: Search query to find relevant information
    endpoint:
      name: internal_api
      path: /kb/search?q={query}
      http_method: GET
# --- Model aliases ---
model_aliases:
  plano.fast.v1:
    target: gpt-4o-mini
  plano.smart.v1:
    target: gpt-4o
# --- Observability ---
tracing:
  random_sampling: 50
  trace_arch_internal: true
  span_attributes:
    static:
      environment: production
    header_prefixes:
      - x-katanemo-
```
This architecture serves: SDK clients on `:12000`, function-calling apps on `:10000`, and multi-agent orchestration on `:8000` — with shared cost-optimized routing across all three.
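With three listeners on one instance, it can help to confirm that every port answers `/healthz` before pointing clients at them. The sketch below loops over the ports from the config above; `check_listeners`, `probe_listener`, and the `PROBE` override variable are hypothetical names introduced here, and the 5-second curl timeout is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Default probe: succeeds only if the listener's /healthz returns a 2xx status.
probe_listener() {
  curl -sf --max-time 5 "http://localhost:$1/healthz" > /dev/null
}

# Hypothetical helper: report the health of each listener port given as an
# argument. Returns 1 if any listener is unhealthy, 0 otherwise.
# Set PROBE to another function name to swap the probe (useful in tests).
check_listeners() {
  local failed=0
  local port
  for port in "$@"; do
    if "${PROBE:-probe_listener}" "$port"; then
      echo "port $port: healthy"
    else
      echo "port $port: NOT healthy"
      failed=1
    fi
  done
  return "$failed"
}
```

For the configuration above, `check_listeners 12000 10000 8000` prints one status line per listener and returns non-zero if any of them is not ready.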
### 8.2 Design Prompt Targets with Precise Parameter Schemas
**Impact:** `HIGH` — Imprecise parameter definitions cause the LLM to hallucinate values, skip required fields, or produce malformed API calls — the schema is the contract between the LLM and your API
## Design Prompt Targets with Precise Parameter Schemas
`prompt_targets` define functions that Plano's LLM can call autonomously when it determines a user request matches the function's description. The parameter schema tells the LLM exactly what values to extract from user input — vague schemas lead to hallucinated parameters and failed API calls.
**Incorrect (too few constraints — LLM must guess):**