fix(routing): auto-migrate v0.3.0 inline routing_preferences to v0.4.0 top-level

Lift inline routing_preferences under each model_provider into the
top-level routing_preferences list with merged models[] and bump
version to v0.4.0, with a deprecation warning. Existing v0.3.0
demo configs (Claude Code, Codex, preference_based_routing, etc.)
keep working unchanged. Schema flags the inline shape as deprecated
but still accepts it. Docs and skills updated to canonical top-level
multi-model form.
Spherrrical 2026-04-24 11:28:22 -07:00
parent b81eb7266c
commit dde90cae82
11 changed files with 693 additions and 224 deletions
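
A minimal sketch of the lift described in the commit message above, written in Python for illustration. It is not the Plano CLI's actual code: the function name `migrate_inline_routing_preferences` is hypothetical, and the sketch assumes a v0.3.0 config that has no top-level `routing_preferences` list yet. Only the key names (`model_providers`, `routing_preferences`, `models`, `version`) come from the diff.

```python
import copy
import warnings


def migrate_inline_routing_preferences(config: dict) -> dict:
    """Hypothetical helper: lift inline routing_preferences to the top level.

    Assumes `config` is an already-parsed v0.3.0 config with no top-level
    routing_preferences list. Returns a new dict; the input is not mutated.
    """
    migrated = copy.deepcopy(config)
    merged = {}  # preference name -> top-level entry (insertion order kept)

    for provider in migrated.get("model_providers", []):
        # Remove the deprecated inline list from the provider, if present.
        inline = provider.pop("routing_preferences", None)
        if not inline:
            continue
        for pref in inline:
            entry = merged.setdefault(
                pref["name"],
                {
                    "name": pref["name"],
                    "description": pref.get("description", ""),
                    "models": [],
                },
            )
            # Providers that share a preference name are merged into one
            # models[] pool, in the order the providers were declared.
            if provider["model"] not in entry["models"]:
                entry["models"].append(provider["model"])

    if not merged:
        return migrated  # nothing inline, leave version untouched

    migrated["routing_preferences"] = list(merged.values())
    migrated["version"] = "v0.4.0"
    warnings.warn(
        "inline routing_preferences under model_providers is deprecated; "
        "declare routing_preferences at the top level instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return migrated
```

In this sketch, the order of `models` follows the order of the providers in the original config, which is one plausible choice; the real migration may order or deduplicate candidates differently.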

View file

@ -158,7 +158,9 @@ Anthropic
.. code-block:: yaml
llm_providers:
version: v0.4.0
model_providers:
# Configure all Anthropic models with wildcard
- model: anthropic/*
access_key: $ANTHROPIC_API_KEY
@ -179,8 +181,12 @@ Anthropic
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_PROD_API_KEY
routing_preferences:
- name: code_generation
routing_preferences:
- name: code_generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
models:
- anthropic/claude-sonnet-4-20250514
DeepSeek
~~~~~~~~
@ -798,7 +804,9 @@ You can configure specific models with custom settings even when using wildcards
.. code-block:: yaml
llm_providers:
version: v0.4.0
model_providers:
# Expand to all Anthropic models
- model: anthropic/*
access_key: $ANTHROPIC_API_KEY
@ -807,14 +815,17 @@ You can configure specific models with custom settings even when using wildcards
# This model will NOT be included in the wildcard expansion above
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_PROD_API_KEY
routing_preferences:
- name: code_generation
priority: 1
# Another specific override
- model: anthropic/claude-3-haiku-20240307
access_key: $ANTHROPIC_DEV_API_KEY
routing_preferences:
- name: code_generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
models:
- anthropic/claude-sonnet-4-20250514
**Custom Provider Wildcards:**
For providers not in Plano's registry, wildcards enable dynamic model routing:
@ -856,24 +867,36 @@ Mark one model as the default for fallback scenarios:
Routing Preferences
~~~~~~~~~~~~~~~~~~~
Configure routing preferences for dynamic model selection:
Starting in ``v0.4.0``, configure routing preferences at the top level of the config. Each preference declares an ordered ``models`` candidate pool; the first entry is primary and the rest are fallbacks the client tries on ``429``/``5xx`` errors. Multiple providers can serve the same route — just list them all under ``models``. See :doc:`/guides/llm_router` for the full routing model.
.. code-block:: yaml
llm_providers:
version: v0.4.0
model_providers:
- model: openai/gpt-5.2
access_key: $OPENAI_API_KEY
routing_preferences:
- name: complex_reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
- name: code_review
description: reviewing and analyzing existing code for bugs and improvements
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative_writing
description: creative content generation, storytelling, and writing assistance
routing_preferences:
- name: complex_reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
models:
- openai/gpt-5.2
- anthropic/claude-sonnet-4-5
- name: code_review
description: reviewing and analyzing existing code for bugs and improvements
models:
- openai/gpt-5.2
- name: creative_writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
.. note::
``v0.3.0`` configs that declare ``routing_preferences`` inline under each ``model_provider`` are auto-migrated to this top-level shape by the Plano CLI at compile time, with a deprecation warning. Update to the form above to silence the warning and gain the multi-model fallback behavior.
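
The fallback order described in this section (first entry in ``models`` is primary, the rest are tried on ``429``/``5xx``) can be sketched from the client side as follows. This is an illustrative Python sketch under stated assumptions, not a Plano client API: ``call_with_fallback`` and the ``send`` callable are hypothetical, and ``send`` stands in for whatever HTTP call the client actually makes.

```python
from typing import Callable, Sequence, Tuple


def is_retryable(status: int) -> bool:
    """429 and any 5xx status trigger a fallback to the next candidate."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(
    models: Sequence[str],
    send: Callable[[str], Tuple[int, str]],
) -> Tuple[str, int, str]:
    """Try each candidate in order and return (model, status, body).

    `send` is a hypothetical callable that issues the request for one model
    and returns (http_status, body). The first non-retryable response wins;
    if every candidate is retryable, the last response is returned.
    """
    if not models:
        raise ValueError("models must contain at least one candidate")

    result = None
    for model in models:
        status, body = send(model)
        result = (model, status, body)
        if not is_retryable(status):
            break  # success or a non-retryable error: stop here
    return result
```

A non-retryable error such as ``400`` stops the loop in this sketch, since retrying the same request against another model is unlikely to help; a real client may choose differently.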
.. _passthrough_auth:

View file

@ -147,38 +147,53 @@ Plano-Orchestrator analyzes each prompt to infer domain and action, then applies
Configuration
^^^^^^^^^^^^^
To configure preference-aligned dynamic routing, define routing preferences that map domains and actions to specific models:
To configure preference-aligned dynamic routing, declare a top-level ``routing_preferences`` list and attach an ordered ``models`` candidate pool to each route. Starting in ``v0.4.0``, ``routing_preferences`` lives at the root of the config (not inline under ``model_providers``), which lets multiple models serve the same route — the first entry in ``models`` is primary, the rest are fallbacks that the client tries on ``429``/``5xx`` errors.
.. code-block:: yaml
:caption: Preference-Aligned Dynamic Routing Configuration
version: v0.4.0
listeners:
egress_traffic:
- name: egress_traffic
type: model
address: 0.0.0.0
port: 12000
message_format: openai
timeout: 30s
llm_providers:
model_providers:
- model: openai/gpt-5.2
access_key: $OPENAI_API_KEY
default: true
- model: openai/gpt-5
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code understanding
description: understand and explain existing code snippets, functions, or libraries
- name: complex reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts
routing_preferences:
- name: code understanding
description: understand and explain existing code snippets, functions, or libraries
models:
- openai/gpt-5
- anthropic/claude-sonnet-4-5
- name: complex reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
models:
- openai/gpt-5
- name: creative writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts
models:
- anthropic/claude-sonnet-4-5
- openai/gpt-5
.. note::
Configs still using the ``v0.3.0`` inline style (``routing_preferences`` nested under each ``model_provider``) are auto-migrated to this top-level shape by the Plano CLI at compile time, with a deprecation warning. Update your config to the form above to silence the warning.
Client usage
^^^^^^^^^^^^
@ -253,6 +268,8 @@ Using Ollama (recommended for local development)
.. code-block:: yaml
version: v0.4.0
overrides:
llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
@ -266,9 +283,12 @@ Using Ollama (recommended for local development)
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
4. **Verify the model is running**
@ -322,6 +342,8 @@ vLLM provides higher throughput and GPU optimizations suitable for production de
.. code-block:: yaml
version: v0.4.0
overrides:
llm_routing_model: plano/Plano-Orchestrator
@ -335,9 +357,12 @@ vLLM provides higher throughput and GPU optimizations suitable for production de
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
5. **Verify the server is running**
@ -468,22 +493,30 @@ You can combine static model selection with dynamic routing preferences for maxi
.. code-block:: yaml
:caption: Hybrid Routing Configuration
llm_providers:
version: v0.4.0
model_providers:
- model: openai/gpt-5.2
access_key: $OPENAI_API_KEY
default: true
- model: openai/gpt-5
access_key: $OPENAI_API_KEY
routing_preferences:
- name: complex_reasoning
description: deep analysis and complex problem solving
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative_tasks
description: creative writing and content generation
routing_preferences:
- name: complex_reasoning
description: deep analysis and complex problem solving
models:
- openai/gpt-5
- anthropic/claude-sonnet-4-5
- name: creative_tasks
description: creative writing and content generation
models:
- anthropic/claude-sonnet-4-5
- openai/gpt-5
model_aliases:
# Model aliases - friendly names that map to actual provider names

View file

@ -1,5 +1,5 @@
# Plano Gateway configuration version
version: v0.3.0
version: v0.4.0
# External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)
agents:
@ -32,17 +32,8 @@ model_providers:
- model: mistral/ministral-3b-latest
access_key: $MISTRAL_API_KEY
# routing_preferences: tags a model with named capabilities so Plano's LLM router
# can select the best model for each request based on intent. Requires the
# Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
# Each preference has a name (short label) and a description (used for intent matching).
- model: groq/llama-3.3-70b-versatile
access_key: $GROQ_API_KEY
routing_preferences:
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
- name: code review
description: reviewing, analyzing, and suggesting improvements to existing code
# passthrough_auth: forwards the client's Authorization header upstream instead of
# using the configured access_key. Useful for LiteLLM or similar proxy setups.
@ -64,6 +55,29 @@ model_aliases:
smart-llm:
target: gpt-4o
# routing_preferences: top-level list that tags named task categories with an
# ordered pool of candidate models. Plano's LLM router matches incoming requests
# against these descriptions and returns an ordered list of models; the client
# uses models[0] as primary and retries with models[1], models[2]... on 429/5xx.
# Requires overrides.llm_routing_model to point at Plano-Orchestrator (or equivalent).
# Each model in `models` must be declared in model_providers above.
# selection_policy is optional: {prefer: cheapest|fastest|none} lets the router
# reorder candidates using live cost/latency data from model_metrics_sources.
routing_preferences:
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
models:
- anthropic/claude-sonnet-4-0
- openai/gpt-4o
- groq/llama-3.3-70b-versatile
- name: code review
description: reviewing, analyzing, and suggesting improvements to existing code
models:
- anthropic/claude-sonnet-4-0
- groq/llama-3.3-70b-versatile
selection_policy:
prefer: cheapest
# HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access
listeners:
# Agent listener for routing requests to multiple agents

View file

@ -69,12 +69,6 @@ listeners:
model: llama-3.3-70b-versatile
name: groq/llama-3.3-70b-versatile
provider_interface: groq
routing_preferences:
- description: generating new code snippets, functions, or boilerplate based on
user prompts or requirements
name: code generation
- description: reviewing, analyzing, and suggesting improvements to existing code
name: code review
- base_url: https://litellm.example.com
cluster_name: openai_litellm.example.com
endpoint: litellm.example.com
@ -131,12 +125,6 @@ model_providers:
model: llama-3.3-70b-versatile
name: groq/llama-3.3-70b-versatile
provider_interface: groq
routing_preferences:
- description: generating new code snippets, functions, or boilerplate based on
user prompts or requirements
name: code generation
- description: reviewing, analyzing, and suggesting improvements to existing code
name: code review
- base_url: https://litellm.example.com
cluster_name: openai_litellm.example.com
endpoint: litellm.example.com
@ -221,6 +209,21 @@ routing:
type: memory
session_max_entries: 10000
session_ttl_seconds: 600
routing_preferences:
- description: generating new code snippets, functions, or boilerplate based on user
prompts or requirements
models:
- anthropic/claude-sonnet-4-0
- openai/gpt-4o
- groq/llama-3.3-70b-versatile
name: code generation
- description: reviewing, analyzing, and suggesting improvements to existing code
models:
- anthropic/claude-sonnet-4-0
- groq/llama-3.3-70b-versatile
name: code review
selection_policy:
prefer: cheapest
state_storage:
type: memory
system_prompt: 'You are a helpful assistant. Always respond concisely and accurately.
@ -237,4 +240,4 @@ tracing:
environment: production
service.team: platform
trace_arch_internal: false
version: v0.3.0
version: v0.4.0