Configuration Reference
The following is a complete reference for the plano_config.yml file that controls the behavior of a single instance of
the Plano gateway. This is where you enable capabilities such as routing to upstream LLM providers, defining the prompt_targets
that prompts are routed to, applying guardrails, and enabling critical agent observability features.
Model provider headers
Each entry under model_providers (or the legacy llm_providers alias) may include a headers map of extra
HTTP headers that Plano adds to upstream LLM requests. Plano applies these headers after it sets authentication from
access_key or passthrough_auth, so you can supply provider-specific metadata without replacing the configured
credentials.
Type: map of strings (header name → value)
Optional: yes
Common uses: required User-Agent values, organization or account identifiers, or other headers some APIs expect
model_providers:
  - model: moonshotai/kimi-for-coding
    access_key: $MOONSHOTAI_API_KEY
    base_url: https://api.kimi.com/coding/v1
    headers:
      User-Agent: "KimiCLI/1.3"
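As a further sketch of the headers map described above, a provider entry can carry more than one extra header, for example a client identifier together with an organization identifier. Everything specific in this fragment is a hypothetical placeholder rather than a value taken from a real provider: the model name, the base_url, the EXAMPLE_API_KEY variable, and the header names and values.

```yaml
model_providers:
  # Hypothetical OpenAI-compatible provider; all names and values are placeholders.
  - model: openai/example-model
    access_key: $EXAMPLE_API_KEY            # authentication is still set from access_key
    base_url: https://llm.example.com/v1
    headers:                                # applied after authentication is set
      User-Agent: "ExampleClient/1.0"       # client identifier the API might require
      X-Organization-Id: "org-12345"        # hypothetical organization header
```

Because Plano applies these headers after it sets authentication, the entry keeps access_key for credentials and uses headers only for the additional metadata.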
The full example below shows the headers option alongside the other provider options in context.
# Plano Gateway configuration version
version: v0.4.0

# External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)
agents:
  - id: weather_agent # Example agent for weather
    url: http://localhost:10510

  - id: flight_agent # Example agent for flights
    url: http://localhost:10520

# MCP filters applied to requests/responses (e.g., input validation, query rewriting)
filters:
  - id: input_guards # Example filter for input validation
    url: http://localhost:10500
    # type: mcp (default)
    # transport: streamable-http (default)
    # tool: input_guards (default - same as filter id)

# LLM provider configurations with API keys and model routing
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true

  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY

  - model: anthropic/claude-sonnet-4-0
    access_key: $ANTHROPIC_API_KEY

  - model: mistral/ministral-3b-latest
    access_key: $MISTRAL_API_KEY

  - model: groq/llama-3.3-70b-versatile
    access_key: $GROQ_API_KEY

  # passthrough_auth: forwards the client's Authorization header upstream instead of
  # using the configured access_key. Useful for LiteLLM or similar proxy setups.
  - model: openai/gpt-4o-litellm
    base_url: https://litellm.example.com
    passthrough_auth: true

  # Custom/self-hosted endpoint with explicit http_host override
  - model: openai/llama-3.3-70b
    base_url: https://api.custom-provider.com
    http_host: api.custom-provider.com
    access_key: $CUSTOM_API_KEY

  # headers: optional map of extra HTTP headers sent on upstream requests (after auth).
  # Use for provider-specific requirements such as User-Agent, org IDs, or account headers.
  - model: moonshotai/kimi-for-coding
    access_key: $MOONSHOTAI_API_KEY
    base_url: https://api.kimi.com/coding/v1
    headers:
      User-Agent: "KimiCLI/1.3"

# Model aliases - use friendly names instead of full provider model names
model_aliases:
  fast-llm:
    target: gpt-4o-mini

  smart-llm:
    target: gpt-4o

# routing_preferences: top-level list that tags named task categories with an
# ordered pool of candidate models. Plano's LLM router matches incoming requests
# against these descriptions and returns an ordered list of models; the client
# uses models[0] as primary and retries with models[1], models[2]... on 429/5xx.
# Requires overrides.llm_routing_model to point at Plano-Orchestrator (or equivalent).
# Each model in `models` must be declared in model_providers above.
# selection_policy is optional: {prefer: cheapest|fastest|none} lets the router
# reorder candidates using live cost/latency data from model_metrics_sources.
routing_preferences:
  - name: code generation
    description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
    models:
      - anthropic/claude-sonnet-4-0
      - openai/gpt-4o
      - groq/llama-3.3-70b-versatile
  - name: code review
    description: reviewing, analyzing, and suggesting improvements to existing code
    models:
      - anthropic/claude-sonnet-4-0
      - groq/llama-3.3-70b-versatile
    selection_policy:
      prefer: cheapest

# model_metrics_sources: external catalogs the router reads to reorder candidate
# models for selection_policy.prefer. A `cost` source ranks `prefer: cheapest`;
# a `latency` source ranks `prefer: fastest`. Both are optional.
model_metrics_sources:
  # Cost catalog. provider: models.dev | digitalocean (default url per provider).
  - type: cost
    provider: models.dev
    url: https://models.dev/api.json # optional; omit to use the provider default
    refresh_interval: 3600 # optional, seconds
    model_aliases: # optional: catalog key -> Plano model name
      openai/gpt-oss-120b: openai/gpt-4o
  # Latency catalog (Prometheus). Used for selection_policy.prefer: fastest.
  - type: latency
    provider: prometheus
    url: http://prometheus:9090
    query: avg by (model_name) (rate(plano_llm_latency_seconds_sum[5m]))
    refresh_interval: 60

# HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access
listeners:
  # Agent listener for routing requests to multiple agents
  - type: agent
    name: travel_booking_service
    port: 8001
    router: plano_orchestrator_v1
    address: 0.0.0.0
    agents:
      - id: rag_agent
        description: virtual assistant for retrieval augmented generation tasks
        input_filters:
          - input_guards

  # Model listener for direct LLM access
  - type: model
    name: model_1
    address: 0.0.0.0
    port: 12000
    timeout: 30s # Request timeout (e.g. "30s", "60s")
    max_retries: 3 # Number of retries on upstream failure
    input_filters: # Filters applied before forwarding to LLM
      - input_guards
    output_filters: # Filters applied to LLM responses before returning to client
      - input_guards

  # Prompt listener for function calling (for prompt_targets)
  - type: prompt
    name: prompt_function_listener
    address: 0.0.0.0
    port: 10000

# Reusable service endpoints
endpoints:
  app_server:
    endpoint: 127.0.0.1:80
    connect_timeout: 0.005s
    protocol: http # http or https

  mistral_local:
    endpoint: 127.0.0.1:8001

  secure_service:
    endpoint: api.example.com:443
    protocol: https
    http_host: api.example.com # Override the Host header sent upstream

# Optional top-level system prompt applied to all prompt_targets
system_prompt: |
  You are a helpful assistant. Always respond concisely and accurately.

# Prompt targets for function calling and API orchestration
prompt_targets:
  - name: get_current_weather
    description: Get current weather at a location.
    parameters:
      - name: location
        description: The location to get the weather for
        required: true
        type: string
        format: City, State
      - name: days
        description: the number of days for the request
        required: true
        type: int
    endpoint:
      name: app_server
      path: /weather
      http_method: POST
    # Per-target system prompt (overrides top-level system_prompt for this target)
    system_prompt: You are a weather expert. Provide accurate and concise weather information.
    # auto_llm_dispatch_on_response: when true, the LLM is called again with the
    # function response to produce a final natural-language answer for the user
    auto_llm_dispatch_on_response: true

# Rate limits - control token usage per model and request selector
ratelimits:
  - model: openai/gpt-4o
    selector:
      key: x-user-id # HTTP header key used to identify the rate-limit subject
      value: "*" # Wildcard matches any value; use a specific string to target one
    limit:
      tokens: 100000 # Maximum tokens allowed in the given time unit
      unit: hour # Time unit: "minute", "hour", or "day"

  - model: openai/gpt-4o-mini
    selector:
      key: x-org-id
      value: acme-corp
    limit:
      tokens: 500000
      unit: day

# Global behavior overrides
overrides:
  # Threshold for routing a request to a prompt_target (0.0–1.0). Lower = more permissive.
  prompt_target_intent_matching_threshold: 0.7
  # Trim conversation history to fit within the model's context window
  optimize_context_window: true
  # Use Plano's agent orchestrator for multi-agent request routing
  use_agent_orchestrator: false
  # Connect timeout for upstream provider clusters (e.g., "5s", "10s"). Default: "5s"
  upstream_connect_timeout: 10s
  # Path to the trusted CA bundle for upstream TLS verification
  upstream_tls_ca_path: /etc/ssl/certs/ca-certificates.crt
  # Model used for intent-based LLM routing (must be listed in model_providers)
  llm_routing_model: Plano-Orchestrator
  # Model used for agent orchestration (must be listed in model_providers)
  agent_orchestration_model: Plano-Orchestrator
  # Disable agentic signal analysis (frustration, repetition, escalation, etc.)
  # on LLM responses to save CPU. Default: false.
  disable_signals: false

# Model affinity - pin routing decisions for agentic loops
routing:
  session_ttl_seconds: 600 # How long a pinned session lasts (default: 600s / 10 min)
  session_max_entries: 10000 # Max cached sessions before eviction (upper limit: 10000)
  # session_cache controls the backend used to store affinity state.
  # "memory" (default) is in-process and works for single-instance deployments.
  # "redis" shares state across replicas; required for multi-replica / Kubernetes setups.
  session_cache:
    type: memory # "memory" (default) or "redis"
    # url is required when type is "redis". Supports redis:// and rediss:// (TLS).
    # url: redis://localhost:6379
    # tenant_header: x-org-id # optional; when set, keys are scoped as plano:affinity:{tenant_id}:{session_id}

# State storage for multi-turn conversation history
state_storage:
  type: memory # "memory" (in-process) or "postgres" (persistent)
  # connection_string is required when type is postgres.
  # Supports environment variable substitution: $VAR or ${VAR}
  # connection_string: postgresql://user:$DB_PASS@localhost:5432/plano

# Input guardrails applied globally to all incoming requests
prompt_guards:
  input_guards:
    jailbreak:
      on_exception:
        message: "I'm sorry, I can't help with that request."

# OpenTelemetry tracing configuration
tracing:
  # Random sampling percentage (1-100)
  random_sampling: 100
  # Include internal Plano spans in traces
  trace_arch_internal: false
  # gRPC endpoint for OpenTelemetry collector (e.g., Jaeger, Tempo)
  opentracing_grpc_endpoint: http://localhost:4317
  span_attributes:
    # Propagate request headers whose names start with these prefixes as span attributes
    header_prefixes:
      - x-user-
      - x-org-
    # Static key/value pairs added to every span
    static:
      environment: production
      service.team: platform
  # Provider-agnostic export destinations. LLM spans are streamed to each of
  # these in addition to any opentracing_grpc_endpoint above.
  exporters:
    # PostHog AI observability: each LLM call is captured as an $ai_generation event.
    - type: posthog
      # PostHog host. The /batch/ capture path is appended automatically.
      url: https://us.i.posthog.com
      # PostHog project API key (token). Supports $ENV_VAR expansion.
      api_key: $POSTHOG_API_KEY
      # Optional: request header used as the PostHog distinct_id. Omit for anonymous capture.
      distinct_id_header: x-user-id
      # Optional: include the (truncated) user message as $ai_input. Defaults to false.
      capture_messages: false
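
The reference above leaves the Redis and Postgres options commented out. As a minimal sketch of how they might be enabled together for a multi-replica deployment, the fragment below switches session_cache to redis and state_storage to postgres. The host names, database user, and database name are hypothetical placeholders, and DB_PASS is assumed to be set in the environment; only the keys themselves come from the reference.

```yaml
# Sketch only: shared affinity and conversation state across replicas.
# Host names, user, and database name below are placeholders.
routing:
  session_ttl_seconds: 600
  session_max_entries: 10000
  session_cache:
    type: redis                                   # shares affinity state across replicas
    url: rediss://redis.internal.example:6379     # required for redis; rediss:// uses TLS
    tenant_header: x-org-id                       # optional; scopes keys as plano:affinity:{tenant_id}:{session_id}

state_storage:
  type: postgres                                  # persistent conversation history
  connection_string: postgresql://plano:${DB_PASS}@db.internal.example:5432/plano
```

With type: memory, both stores live inside a single Plano process, so this variant is only worth the extra dependencies when more than one replica serves the same sessions or when conversation history must survive restarts.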