fix(routing): auto-migrate v0.3.0 inline routing_preferences to v0.4.0 top-level (#912)

* fix(routing): auto-migrate v0.3.0 inline routing_preferences to v0.4.0 top-level

Lift inline routing_preferences under each model_provider into the
top-level routing_preferences list with merged models[] and bump
version to v0.4.0, with a deprecation warning. Existing v0.3.0
demo configs (Claude Code, Codex, preference_based_routing, etc.)
keep working unchanged. Schema flags the inline shape as deprecated
but still accepts it. Docs and skills updated to canonical top-level
multi-model form.

* test(common): bump reference config assertion to v0.4.0

The rendered reference config was bumped to v0.4.0 when its inline
routing_preferences were lifted to the top level; align the
configuration deserialization test with that change.
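The realigned assertion can be sketched roughly as follows; the test name and the config stand-in are illustrative assumptions, not the actual common test suite, which deserializes the full rendered YAML:

```python
# Stand-in for the rendered reference config; the real test loads the
# full rendered YAML. Names here are illustrative assumptions.
REFERENCE_CONFIG = {"version": "v0.4.0", "routing_preferences": []}

def test_reference_config_version():
    # The reference config now reports v0.4.0 after its inline
    # routing_preferences were lifted to the top level.
    assert REFERENCE_CONFIG["version"] == "v0.4.0"
```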

* fix(config_generator): bump version to v0.4.0 up front in migration

Move the v0.3.0 -> v0.4.0 version bump to the top of
migrate_inline_routing_preferences so it runs unconditionally,
including for configs that already declare top-level
routing_preferences at v0.3.0. Previously the bump only fired
when inline migration produced entries, leaving top-level v0.3.0
configs rejected by brightstaff's v0.4.0 gate. Tests updated to
cover the new behavior and to confirm we never downgrade newer
versions.

* fix(config_generator): gate routing_preferences migration on version < v0.4.0

Short-circuit the migration when the config already declares v0.4.0
or newer. Anything at v0.4.0+ is assumed to be on the canonical
top-level shape and is passed through untouched, including stray
inline preferences (which are the author's bug to fix). Only v0.3.0
and older configs are rewritten and bumped.
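The gated migration described across these commits can be sketched roughly as follows. The function shape, field names, and version parsing are assumptions for illustration; the actual config_generator code may differ:

```python
from copy import deepcopy

def _ver(v: str) -> tuple:
    # Naive "v0.4.0" -> (0, 4, 0) parse; real code may use a proper semver library.
    return tuple(int(x) for x in v.lstrip("v").split("."))

def migrate_inline_routing_preferences(config: dict) -> dict:
    """Lift v0.3.0 inline routing_preferences to the v0.4.0 top level."""
    cfg = deepcopy(config)
    # Gate: v0.4.0+ configs are assumed canonical and pass through untouched,
    # including any stray inline preferences (the author's bug to fix).
    if _ver(cfg.get("version", "v0.3.0")) >= (0, 4, 0):
        return cfg
    # Bump up front so even v0.3.0 configs that already declare top-level
    # routing_preferences clear the downstream v0.4.0 gate.
    cfg["version"] = "v0.4.0"
    # Merge inline entries into the top-level list, keyed by preference name.
    merged = {p["name"]: p for p in cfg.get("routing_preferences", [])}
    for provider in cfg.get("model_providers", []):
        for pref in provider.pop("routing_preferences", []):
            entry = merged.setdefault(
                pref["name"],
                {"name": pref["name"],
                 "description": pref.get("description", ""),
                 "models": []},
            )
            models = entry.setdefault("models", [])
            if provider["model"] not in models:
                models.append(provider["model"])
    if merged:
        cfg["routing_preferences"] = list(merged.values())
    return cfg
```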
Musa 2026-04-24 12:31:44 -07:00 committed by GitHub
parent 5a652eb666
commit 897fda2deb
12 changed files with 748 additions and 225 deletions


@@ -158,7 +158,9 @@ Anthropic
.. code-block:: yaml
llm_providers:
version: v0.4.0
model_providers:
# Configure all Anthropic models with wildcard
- model: anthropic/*
access_key: $ANTHROPIC_API_KEY
@@ -179,8 +181,12 @@ Anthropic
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_PROD_API_KEY
routing_preferences:
- name: code_generation
routing_preferences:
- name: code_generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
models:
- anthropic/claude-sonnet-4-20250514
DeepSeek
~~~~~~~~
@@ -798,7 +804,9 @@ You can configure specific models with custom settings even when using wildcards
.. code-block:: yaml
llm_providers:
version: v0.4.0
model_providers:
# Expand to all Anthropic models
- model: anthropic/*
access_key: $ANTHROPIC_API_KEY
@@ -807,14 +815,17 @@ You can configure specific models with custom settings even when using wildcards
# This model will NOT be included in the wildcard expansion above
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_PROD_API_KEY
routing_preferences:
- name: code_generation
priority: 1
# Another specific override
- model: anthropic/claude-3-haiku-20240307
access_key: $ANTHROPIC_DEV_API_KEY
routing_preferences:
- name: code_generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
models:
- anthropic/claude-sonnet-4-20250514
**Custom Provider Wildcards:**
For providers not in Plano's registry, wildcards enable dynamic model routing:
@@ -856,24 +867,36 @@ Mark one model as the default for fallback scenarios:
Routing Preferences
~~~~~~~~~~~~~~~~~~~
Configure routing preferences for dynamic model selection:
Starting in ``v0.4.0``, configure routing preferences at the top level of the config. Each preference declares an ordered ``models`` candidate pool; the first entry is primary and the rest are fallbacks the client tries on ``429``/``5xx`` errors. Multiple providers can serve the same route — just list them all under ``models``. See :doc:`/guides/llm_router` for the full routing model.
.. code-block:: yaml
llm_providers:
version: v0.4.0
model_providers:
- model: openai/gpt-5.2
access_key: $OPENAI_API_KEY
routing_preferences:
- name: complex_reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
- name: code_review
description: reviewing and analyzing existing code for bugs and improvements
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative_writing
description: creative content generation, storytelling, and writing assistance
routing_preferences:
- name: complex_reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
models:
- openai/gpt-5.2
- anthropic/claude-sonnet-4-5
- name: code_review
description: reviewing and analyzing existing code for bugs and improvements
models:
- openai/gpt-5.2
- name: creative_writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
.. note::
``v0.3.0`` configs that declare ``routing_preferences`` inline under each ``model_provider`` are auto-migrated to this top-level shape by the Plano CLI at compile time, with a deprecation warning. Update to the form above to silence the warning and gain the multi-model fallback behavior.
.. _passthrough_auth:


@@ -147,38 +147,53 @@ Plano-Orchestrator analyzes each prompt to infer domain and action, then applies
Configuration
^^^^^^^^^^^^^
To configure preference-aligned dynamic routing, define routing preferences that map domains and actions to specific models:
To configure preference-aligned dynamic routing, declare a top-level ``routing_preferences`` list and attach an ordered ``models`` candidate pool to each route. Starting in ``v0.4.0``, ``routing_preferences`` lives at the root of the config (not inline under ``model_providers``), which lets multiple models serve the same route — the first entry in ``models`` is primary, the rest are fallbacks that the client tries on ``429``/``5xx`` errors.
.. code-block:: yaml
:caption: Preference-Aligned Dynamic Routing Configuration
version: v0.4.0
listeners:
egress_traffic:
- name: egress_traffic
type: model
address: 0.0.0.0
port: 12000
message_format: openai
timeout: 30s
llm_providers:
model_providers:
- model: openai/gpt-5.2
access_key: $OPENAI_API_KEY
default: true
- model: openai/gpt-5
access_key: $OPENAI_API_KEY
routing_preferences:
- name: code understanding
description: understand and explain existing code snippets, functions, or libraries
- name: complex reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts
routing_preferences:
- name: code understanding
description: understand and explain existing code snippets, functions, or libraries
models:
- openai/gpt-5
- anthropic/claude-sonnet-4-5
- name: complex reasoning
description: deep analysis, mathematical problem solving, and logical reasoning
models:
- openai/gpt-5
- name: creative writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts
models:
- anthropic/claude-sonnet-4-5
- openai/gpt-5
.. note::
Configs still using the ``v0.3.0`` inline style (``routing_preferences`` nested under each ``model_provider``) are auto-migrated to this top-level shape by the Plano CLI at compile time, with a deprecation warning. Update your config to the form above to silence the warning.
Client usage
^^^^^^^^^^^^
@@ -253,6 +268,8 @@ Using Ollama (recommended for local development)
.. code-block:: yaml
version: v0.4.0
overrides:
llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
@@ -266,9 +283,12 @@ Using Ollama (recommended for local development)
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
4. **Verify the model is running**
@@ -322,6 +342,8 @@ vLLM provides higher throughput and GPU optimizations suitable for production de
.. code-block:: yaml
version: v0.4.0
overrides:
llm_routing_model: plano/Plano-Orchestrator
@@ -335,9 +357,12 @@ vLLM provides higher throughput and GPU optimizations suitable for production de
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
routing_preferences:
- name: creative writing
description: creative content generation, storytelling, and writing assistance
models:
- anthropic/claude-sonnet-4-5
5. **Verify the server is running**
@@ -468,22 +493,30 @@ You can combine static model selection with dynamic routing preferences for maxi
.. code-block:: yaml
:caption: Hybrid Routing Configuration
llm_providers:
version: v0.4.0
model_providers:
- model: openai/gpt-5.2
access_key: $OPENAI_API_KEY
default: true
- model: openai/gpt-5
access_key: $OPENAI_API_KEY
routing_preferences:
- name: complex_reasoning
description: deep analysis and complex problem solving
- model: anthropic/claude-sonnet-4-5
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: creative_tasks
description: creative writing and content generation
routing_preferences:
- name: complex_reasoning
description: deep analysis and complex problem solving
models:
- openai/gpt-5
- anthropic/claude-sonnet-4-5
- name: creative_tasks
description: creative writing and content generation
models:
- anthropic/claude-sonnet-4-5
- openai/gpt-5
model_aliases:
# Model aliases - friendly names that map to actual provider names


@@ -1,5 +1,5 @@
# Plano Gateway configuration version
version: v0.3.0
version: v0.4.0
# External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)
agents:
@@ -32,17 +32,8 @@ model_providers:
- model: mistral/ministral-3b-latest
access_key: $MISTRAL_API_KEY
# routing_preferences: tags a model with named capabilities so Plano's LLM router
# can select the best model for each request based on intent. Requires the
# Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
# Each preference has a name (short label) and a description (used for intent matching).
- model: groq/llama-3.3-70b-versatile
access_key: $GROQ_API_KEY
routing_preferences:
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
- name: code review
description: reviewing, analyzing, and suggesting improvements to existing code
# passthrough_auth: forwards the client's Authorization header upstream instead of
# using the configured access_key. Useful for LiteLLM or similar proxy setups.
@@ -64,6 +55,29 @@ model_aliases:
smart-llm:
target: gpt-4o
# routing_preferences: top-level list that tags named task categories with an
# ordered pool of candidate models. Plano's LLM router matches incoming requests
# against these descriptions and returns an ordered list of models; the client
# uses models[0] as primary and retries with models[1], models[2]... on 429/5xx.
# Requires overrides.llm_routing_model to point at Plano-Orchestrator (or equivalent).
# Each model in `models` must be declared in model_providers above.
# selection_policy is optional: {prefer: cheapest|fastest|none} lets the router
# reorder candidates using live cost/latency data from model_metrics_sources.
routing_preferences:
- name: code generation
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
models:
- anthropic/claude-sonnet-4-0
- openai/gpt-4o
- groq/llama-3.3-70b-versatile
- name: code review
description: reviewing, analyzing, and suggesting improvements to existing code
models:
- anthropic/claude-sonnet-4-0
- groq/llama-3.3-70b-versatile
selection_policy:
prefer: cheapest
# HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access
listeners:
# Agent listener for routing requests to multiple agents


@@ -69,12 +69,6 @@ listeners:
model: llama-3.3-70b-versatile
name: groq/llama-3.3-70b-versatile
provider_interface: groq
routing_preferences:
- description: generating new code snippets, functions, or boilerplate based on
user prompts or requirements
name: code generation
- description: reviewing, analyzing, and suggesting improvements to existing code
name: code review
- base_url: https://litellm.example.com
cluster_name: openai_litellm.example.com
endpoint: litellm.example.com
@@ -131,12 +125,6 @@ model_providers:
model: llama-3.3-70b-versatile
name: groq/llama-3.3-70b-versatile
provider_interface: groq
routing_preferences:
- description: generating new code snippets, functions, or boilerplate based on
user prompts or requirements
name: code generation
- description: reviewing, analyzing, and suggesting improvements to existing code
name: code review
- base_url: https://litellm.example.com
cluster_name: openai_litellm.example.com
endpoint: litellm.example.com
@@ -221,6 +209,21 @@ routing:
type: memory
session_max_entries: 10000
session_ttl_seconds: 600
routing_preferences:
- description: generating new code snippets, functions, or boilerplate based on user
prompts or requirements
models:
- anthropic/claude-sonnet-4-0
- openai/gpt-4o
- groq/llama-3.3-70b-versatile
name: code generation
- description: reviewing, analyzing, and suggesting improvements to existing code
models:
- anthropic/claude-sonnet-4-0
- groq/llama-3.3-70b-versatile
name: code review
selection_policy:
prefer: cheapest
state_storage:
type: memory
system_prompt: 'You are a helpful assistant. Always respond concisely and accurately.
@@ -237,4 +240,4 @@ tracing:
environment: production
service.team: platform
trace_arch_internal: false
version: v0.3.0
version: v0.4.0