Configuration Reference

The following is a complete reference for plano_config.yml, the file that controls the behavior of a single instance of the Plano gateway. This is where you enable capabilities such as routing to upstream LLM providers, defining the prompt_targets that prompts are routed to, applying guardrails, and enabling agent observability features.

Model provider headers

Each entry under model_providers (or the legacy llm_providers alias) may include a headers map of extra HTTP headers that Plano adds to upstream LLM requests. Plano applies these headers after it sets authentication from access_key or passthrough_auth, so you can supply provider-specific metadata without replacing the configured credentials.

  • Type: map of strings (header name → value)

  • Optional: yes

  • Common uses: required User-Agent values, organization or account identifiers, or other headers some APIs expect

model_providers:
  - model: moonshotai/kimi-for-coding
    access_key: $MOONSHOTAI_API_KEY
    base_url: https://api.kimi.com/coding/v1
    headers:
      User-Agent: "KimiCLI/1.3"
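
Because headers is a map, a single provider entry can carry several headers at once. The fragment below is only a sketch: the model, base URL, and environment variable are reused from the example above, while the X-Org-Id header name and its value are hypothetical placeholders, not something this provider is known to require.

```yaml
model_providers:
  - model: moonshotai/kimi-for-coding
    access_key: $MOONSHOTAI_API_KEY   # authentication is still set from access_key
    base_url: https://api.kimi.com/coding/v1
    headers:
      User-Agent: "KimiCLI/1.3"
      X-Org-Id: "org-12345"           # hypothetical header name and value
```

Quoting each value keeps it a string even when it looks numeric; an unquoted 12345 would be parsed by YAML as an integer, which does not match the documented map-of-strings type.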

The full example below shows the headers option in context, alongside the other provider options.

# Plano Gateway configuration version
version: v0.4.0

# External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)
agents:
  - id: weather_agent # Example agent for weather
    url: http://localhost:10510

  - id: flight_agent # Example agent for flights
    url: http://localhost:10520

# MCP filters applied to requests/responses (e.g., input validation, query rewriting)
filters:
  - id: input_guards # Example filter for input validation
    url: http://localhost:10500
    # type: mcp (default)
    # transport: streamable-http (default)
    # tool: input_guards (default - same as filter id)

# LLM provider configurations with API keys and model routing
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true

  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY

  - model: anthropic/claude-sonnet-4-0
    access_key: $ANTHROPIC_API_KEY

  - model: mistral/ministral-3b-latest
    access_key: $MISTRAL_API_KEY

  - model: groq/llama-3.3-70b-versatile
    access_key: $GROQ_API_KEY

  # passthrough_auth: forwards the client's Authorization header upstream instead of
  # using the configured access_key. Useful for LiteLLM or similar proxy setups.
  - model: openai/gpt-4o-litellm
    base_url: https://litellm.example.com
    passthrough_auth: true

  # Custom/self-hosted endpoint with explicit http_host override
  - model: openai/llama-3.3-70b
    base_url: https://api.custom-provider.com
    http_host: api.custom-provider.com
    access_key: $CUSTOM_API_KEY

  # headers: optional map of extra HTTP headers sent on upstream requests (after auth).
  # Use for provider-specific requirements such as User-Agent, org IDs, or account headers.
  - model: moonshotai/kimi-for-coding
    access_key: $MOONSHOTAI_API_KEY
    base_url: https://api.kimi.com/coding/v1
    headers:
      User-Agent: "KimiCLI/1.3"

# Model aliases - use friendly names instead of full provider model names
model_aliases:
  fast-llm:
    target: gpt-4o-mini

  smart-llm:
    target: gpt-4o

# routing_preferences: top-level list that tags named task categories with an
# ordered pool of candidate models. Plano's LLM router matches incoming requests
# against these descriptions and returns an ordered list of models; the client
# uses models[0] as primary and retries with models[1], models[2]... on 429/5xx.
# Requires overrides.llm_routing_model to point at Plano-Orchestrator (or equivalent).
# Each model in `models` must be declared in model_providers above.
# selection_policy is optional: {prefer: cheapest|fastest|none} lets the router
# reorder candidates using live cost/latency data from model_metrics_sources.
routing_preferences:
  - name: code generation
    description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
    models:
      - anthropic/claude-sonnet-4-0
      - openai/gpt-4o
      - groq/llama-3.3-70b-versatile
  - name: code review
    description: reviewing, analyzing, and suggesting improvements to existing code
    models:
      - anthropic/claude-sonnet-4-0
      - groq/llama-3.3-70b-versatile
    selection_policy:
      prefer: cheapest

# HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access
listeners:
  # Agent listener for routing requests to multiple agents
  - type: agent
    name: travel_booking_service
    port: 8001
    router: plano_orchestrator_v1
    address: 0.0.0.0
    agents:
      - id: rag_agent
        description: virtual assistant for retrieval augmented generation tasks
        input_filters:
          - input_guards

  # Model listener for direct LLM access
  - type: model
    name: model_1
    address: 0.0.0.0
    port: 12000
    timeout: 30s          # Request timeout (e.g. "30s", "60s")
    max_retries: 3        # Number of retries on upstream failure
    input_filters:        # Filters applied before forwarding to LLM
      - input_guards
    output_filters:       # Filters applied to LLM responses before returning to client
      - input_guards

  # Prompt listener for function calling (for prompt_targets)
  - type: prompt
    name: prompt_function_listener
    address: 0.0.0.0
    port: 10000

# Reusable service endpoints
endpoints:
  app_server:
    endpoint: 127.0.0.1:80
    connect_timeout: 0.005s
    protocol: http        # http or https

  mistral_local:
    endpoint: 127.0.0.1:8001

  secure_service:
    endpoint: api.example.com:443
    protocol: https
    http_host: api.example.com  # Override the Host header sent upstream

# Optional top-level system prompt applied to all prompt_targets
system_prompt: |
  You are a helpful assistant. Always respond concisely and accurately.

# Prompt targets for function calling and API orchestration
prompt_targets:
  - name: get_current_weather
    description: Get current weather at a location.
    parameters:
      - name: location
        description: The location to get the weather for
        required: true
        type: string
        format: City, State
      - name: days
        description: the number of days for the request
        required: true
        type: int
    endpoint:
      name: app_server
      path: /weather
      http_method: POST
    # Per-target system prompt (overrides top-level system_prompt for this target)
    system_prompt: You are a weather expert. Provide accurate and concise weather information.
    # auto_llm_dispatch_on_response: when true, the LLM is called again with the
    # function response to produce a final natural-language answer for the user
    auto_llm_dispatch_on_response: true

# Rate limits - control token usage per model and request selector
ratelimits:
  - model: openai/gpt-4o
    selector:
      key: x-user-id       # HTTP header key used to identify the rate-limit subject
      value: "*"           # Wildcard matches any value; use a specific string to target one
    limit:
      tokens: 100000       # Maximum tokens allowed in the given time unit
      unit: hour           # Time unit: "minute", "hour", or "day"

  - model: openai/gpt-4o-mini
    selector:
      key: x-org-id
      value: acme-corp
    limit:
      tokens: 500000
      unit: day

# Global behavior overrides
overrides:
  # Threshold for routing a request to a prompt_target (0.0–1.0). Lower = more permissive.
  prompt_target_intent_matching_threshold: 0.7
  # Trim conversation history to fit within the model's context window
  optimize_context_window: true
  # Use Plano's agent orchestrator for multi-agent request routing
  use_agent_orchestrator: false
  # Connect timeout for upstream provider clusters (e.g., "5s", "10s"). Default: "5s"
  upstream_connect_timeout: 10s
  # Path to the trusted CA bundle for upstream TLS verification
  upstream_tls_ca_path: /etc/ssl/certs/ca-certificates.crt
  # Model used for intent-based LLM routing (must be listed in model_providers)
  llm_routing_model: Plano-Orchestrator
  # Model used for agent orchestration (must be listed in model_providers)
  agent_orchestration_model: Plano-Orchestrator
  # Disable agentic signal analysis (frustration, repetition, escalation, etc.)
  # on LLM responses to save CPU. Default: false.
  disable_signals: false

# Model affinity - pin routing decisions for agentic loops
routing:
  session_ttl_seconds: 600    # How long a pinned session lasts (default: 600s / 10 min)
  session_max_entries: 10000  # Max cached sessions before eviction (upper limit: 10000)
  # session_cache controls the backend used to store affinity state.
  # "memory" (default) is in-process and works for single-instance deployments.
  # "redis" shares state across replicas - required for multi-replica / Kubernetes setups.
  session_cache:
    type: memory              # "memory" (default) or "redis"
    # url is required when type is "redis". Supports redis:// and rediss:// (TLS).
    # url: redis://localhost:6379
    # tenant_header: x-org-id  # optional; when set, keys are scoped as plano:affinity:{tenant_id}:{session_id}

# State storage for multi-turn conversation history
state_storage:
  type: memory            # "memory" (in-process) or "postgres" (persistent)
  # connection_string is required when type is postgres.
  # Supports environment variable substitution: $VAR or ${VAR}
  # connection_string: postgresql://user:$DB_PASS@localhost:5432/plano

# Input guardrails applied globally to all incoming requests
prompt_guards:
  input_guards:
    jailbreak:
      on_exception:
        message: "I'm sorry, I can't help with that request."

# OpenTelemetry tracing configuration
tracing:
  # Random sampling percentage (1-100)
  random_sampling: 100
  # Include internal Plano spans in traces
  trace_arch_internal: false
  # gRPC endpoint for OpenTelemetry collector (e.g., Jaeger, Tempo)
  opentracing_grpc_endpoint: http://localhost:4317
  span_attributes:
    # Propagate request headers whose names start with these prefixes as span attributes
    header_prefixes:
      - x-user-
      - x-org-
    # Static key/value pairs added to every span
    static:
      environment: production
      service.team: platform
223prompt_guards:
224  input_guards:
225    jailbreak:
226      on_exception:
227        message: "I'm sorry, I can't help with that request."
228
229# OpenTelemetry tracing configuration
230tracing:
231  # Random sampling percentage (1-100)
232  random_sampling: 100
233  # Include internal Plano spans in traces
234  trace_arch_internal: false
235  # gRPC endpoint for OpenTelemetry collector (e.g., Jaeger, Tempo)
236  opentracing_grpc_endpoint: http://localhost:4317
237  span_attributes:
238    # Propagate request headers whose names start with these prefixes as span attributes
239    header_prefixes:
240      - x-user-
241      - x-org-
242    # Static key/value pairs added to every span
243    static:
244      environment: production
245      service.team: platform