mirror of
https://github.com/katanemo/plano.git
synced 2026-05-30 14:25:15 +02:00
fix(routing): auto-migrate v0.3.0 inline routing_preferences to v0.4.0 top-level
Lift inline routing_preferences under each model_provider into the top-level routing_preferences list with merged models[] and bump version to v0.4.0, with a deprecation warning. Existing v0.3.0 demo configs (Claude Code, Codex, preference_based_routing, etc.) keep working unchanged. Schema flags the inline shape as deprecated but still accepts it. Docs and skills updated to canonical top-level multi-model form.
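The lift-and-merge step described in this commit message can be sketched roughly as below. This is an illustrative sketch, not the Plano CLI's actual code: the function name `migrate_inline_routing_preferences`, the dict-shaped config, and the warning text are all assumptions made for the example. It also assumes no route name already exists in a top-level list, and that the first description seen for a route name wins.

```python
import copy
import warnings


def migrate_inline_routing_preferences(config: dict) -> dict:
    """Hypothetical helper: lift inline routing_preferences to the top level."""
    migrated = copy.deepcopy(config)  # leave the caller's config untouched
    merged = {}  # route name -> top-level entry; dicts keep insertion order
    found_inline = False

    for provider in migrated.get("model_providers", []):
        inline = provider.pop("routing_preferences", None)
        if not inline:
            continue
        found_inline = True
        for pref in inline:
            # First description seen for a route name wins (an assumption).
            entry = merged.setdefault(
                pref["name"],
                {
                    "name": pref["name"],
                    "description": pref["description"],
                    "models": [],
                },
            )
            # Providers that share a route name are merged into one models[].
            if provider["model"] not in entry["models"]:
                entry["models"].append(provider["model"])

    if not found_inline:
        return migrated  # nothing to migrate, version left as-is

    warnings.warn(
        "inline routing_preferences under model_providers is deprecated; "
        "move it to the top-level routing_preferences list",
        DeprecationWarning,
        stacklevel=2,
    )
    existing = migrated.get("routing_preferences", [])
    migrated["routing_preferences"] = existing + list(merged.values())
    migrated["version"] = "v0.4.0"
    return migrated
```

Two providers that both declare an inline `code_generation` preference would end up as a single top-level entry whose `models` lists both, in provider order.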
parent b81eb7266c
commit dde90cae82
11 changed files with 693 additions and 224 deletions
|
|
@ -34,11 +34,13 @@ POST /v1/chat/completions
|
|||
|
||||
### `routing_preferences` fields
|
||||
|
||||
| Field | Type | Required | Description |
|
||||
|---|---|---|---|
|
||||
| `name` | string | yes | Route identifier. Must match the LLM router's route classification. |
|
||||
| `description` | string | yes | Natural language description used by the router to match user intent. |
|
||||
| `models` | string[] | yes | Ordered candidate pool. At least one entry required. Must be declared in `model_providers`. |
|
||||
|
||||
| Field | Type | Required | Description |
|
||||
| ------------- | -------- | -------- | ------------------------------------------------------------------------------------------- |
|
||||
| `name` | string | yes | Route identifier. Must match the LLM router's route classification. |
|
||||
| `description` | string | yes | Natural language description used by the router to match user intent. |
|
||||
| `models` | string[] | yes | Ordered candidate pool. At least one entry required. Must be declared in `model_providers`. |
|
||||
|
||||
|
||||
### Notes
|
||||
|
||||
|
|
@ -64,11 +66,13 @@ POST /v1/chat/completions
|
|||
|
||||
### Fields
|
||||
|
||||
| Field | Type | Description |
|
||||
|---|---|---|
|
||||
| `models` | string[] | Ranked model list. Use `models[0]` as primary; retry with `models[1]` on 429/5xx, and so on. |
|
||||
| `route` | string \| null | Name of the matched route. `null` if no route matched — client should use the original request `model`. |
|
||||
| `trace_id` | string | Trace ID for distributed tracing and observability. |
|
||||
|
||||
| Field | Type | Description |
|
||||
| ---------- | ------------- | ------------------------------------------------------------------------------------------------------- |
|
||||
| `models` | string[] | Ranked model list. Use `models[0]` as primary; retry with `models[1]` on 429/5xx, and so on. |
|
||||
| `route` | string \| null | Name of the matched route. `null` if no route matched — client should use the original request `model`. |
|
||||
| `trace_id` | string | Trace ID for distributed tracing and observability. |
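The client-side behavior implied by the `models` and `route` fields can be sketched as follows. This is a minimal sketch under stated assumptions: `call_with_fallback` and the injected `send` callable (which takes a model name and returns a `(status, body)` pair) are hypothetical names for illustration, not part of Plano or any client SDK.

```python
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    """429 (rate limited) and any 5xx status trigger a fallback."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(
    routing: dict,
    original_model: str,
    send: Callable[[str], Tuple[int, object]],
) -> Tuple[str, int, object]:
    """Try each ranked model in order; `send` is a hypothetical transport."""
    if routing.get("route") is None:
        # No route matched: use the model from the original request.
        candidates = [original_model]
    else:
        candidates = list(routing["models"])

    last_status = None
    for model in candidates:
        status, body = send(model)
        if not is_retryable(status):
            return model, status, body
        last_status = status  # remember why we moved to the next candidate
    raise RuntimeError(f"all candidate models failed, last status: {last_status}")
```

Passing the transport in as a callable keeps the retry policy separate from whichever HTTP client actually issues the request.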
|
||||
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -142,6 +146,7 @@ X-Model-Affinity: a1b2c3d4-5678-...
|
|||
```
|
||||
|
||||
Response when pinned:
|
||||
|
||||
```json
|
||||
{
|
||||
"models": ["anthropic/claude-sonnet-4-20250514"],
|
||||
|
|
@ -155,6 +160,7 @@ Response when pinned:
|
|||
Without the header, routing runs fresh every time (no breaking change).
|
||||
|
||||
Configure TTL and cache size:
|
||||
|
||||
```yaml
|
||||
routing:
|
||||
session_ttl_seconds: 600 # default: 10 min
|
||||
|
|
@ -165,7 +171,8 @@ routing:
|
|||
|
||||
## Version Requirements
|
||||
|
||||
| Version | Top-level `routing_preferences` |
|
||||
|---|---|
|
||||
|
||||
| Version | Top-level `routing_preferences` |
|
||||
| ---------- | -------------------------------------- |
|
||||
| `< v0.4.0` | Not allowed — startup error if present |
|
||||
| `v0.4.0+` | Supported (required for model routing) |
|
||||
| `v0.4.0+` | Supported (required for model routing) |
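The version rule in this table could be checked along these lines. This is a sketch only, assuming a dict-shaped config and a `vMAJOR.MINOR.PATCH` version string; the names `parse_version` and `check_top_level_routing_preferences` are hypothetical, and the real startup check may differ.

```python
from typing import Tuple

MIN_TOP_LEVEL_VERSION = (0, 4, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as 'v0.4.0' into a comparable tuple (0, 4, 0)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def check_top_level_routing_preferences(config: dict) -> None:
    """Raise if top-level routing_preferences appears in a pre-v0.4.0 config."""
    if "routing_preferences" not in config:
        return
    if parse_version(config["version"]) < MIN_TOP_LEVEL_VERSION:
        raise ValueError(
            "top-level routing_preferences requires version v0.4.0 or later, "
            f"got {config['version']}"
        )
```

Comparing integer tuples rather than raw strings avoids mis-ordering versions such as `v0.10.0` and `v0.4.0`.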
|
||||
|
|
|
|||
|
|
@ -158,7 +158,9 @@ Anthropic
|
|||
|
||||
.. code-block:: yaml
|
||||
|
||||
llm_providers:
|
||||
version: v0.4.0
|
||||
|
||||
model_providers:
|
||||
# Configure all Anthropic models with wildcard
|
||||
- model: anthropic/*
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
|
|
@ -179,8 +181,12 @@ Anthropic
|
|||
|
||||
- model: anthropic/claude-sonnet-4-20250514
|
||||
access_key: $ANTHROPIC_PROD_API_KEY
|
||||
routing_preferences:
|
||||
- name: code_generation
|
||||
|
||||
routing_preferences:
|
||||
- name: code_generation
|
||||
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-20250514
|
||||
|
||||
DeepSeek
|
||||
~~~~~~~~
|
||||
|
|
@ -798,7 +804,9 @@ You can configure specific models with custom settings even when using wildcards
|
|||
|
||||
.. code-block:: yaml
|
||||
|
||||
llm_providers:
|
||||
version: v0.4.0
|
||||
|
||||
model_providers:
|
||||
# Expand to all Anthropic models
|
||||
- model: anthropic/*
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
|
|
@ -807,14 +815,17 @@ You can configure specific models with custom settings even when using wildcards
|
|||
# This model will NOT be included in the wildcard expansion above
|
||||
- model: anthropic/claude-sonnet-4-20250514
|
||||
access_key: $ANTHROPIC_PROD_API_KEY
|
||||
routing_preferences:
|
||||
- name: code_generation
|
||||
priority: 1
|
||||
|
||||
# Another specific override
|
||||
- model: anthropic/claude-3-haiku-20240307
|
||||
access_key: $ANTHROPIC_DEV_API_KEY
|
||||
|
||||
routing_preferences:
|
||||
- name: code_generation
|
||||
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-20250514
|
||||
|
||||
**Custom Provider Wildcards:**
|
||||
|
||||
For providers not in Plano's registry, wildcards enable dynamic model routing:
|
||||
|
|
@ -856,24 +867,36 @@ Mark one model as the default for fallback scenarios:
|
|||
Routing Preferences
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Configure routing preferences for dynamic model selection:
|
||||
Starting in ``v0.4.0``, configure routing preferences at the top level of the config. Each preference declares an ordered ``models`` candidate pool; the first entry is primary and the rest are fallbacks the client tries on ``429``/``5xx`` errors. Multiple providers can serve the same route — just list them all under ``models``. See :doc:`/guides/llm_router` for the full routing model.
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
llm_providers:
|
||||
version: v0.4.0
|
||||
|
||||
model_providers:
|
||||
- model: openai/gpt-5.2
|
||||
access_key: $OPENAI_API_KEY
|
||||
routing_preferences:
|
||||
- name: complex_reasoning
|
||||
description: deep analysis, mathematical problem solving, and logical reasoning
|
||||
- name: code_review
|
||||
description: reviewing and analyzing existing code for bugs and improvements
|
||||
|
||||
- model: anthropic/claude-sonnet-4-5
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
routing_preferences:
|
||||
- name: creative_writing
|
||||
description: creative content generation, storytelling, and writing assistance
|
||||
|
||||
routing_preferences:
|
||||
- name: complex_reasoning
|
||||
description: deep analysis, mathematical problem solving, and logical reasoning
|
||||
models:
|
||||
- openai/gpt-5.2
|
||||
- anthropic/claude-sonnet-4-5
|
||||
- name: code_review
|
||||
description: reviewing and analyzing existing code for bugs and improvements
|
||||
models:
|
||||
- openai/gpt-5.2
|
||||
- name: creative_writing
|
||||
description: creative content generation, storytelling, and writing assistance
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-5
|
||||
|
||||
.. note::
|
||||
``v0.3.0`` configs that declare ``routing_preferences`` inline under each ``model_provider`` are auto-migrated to this top-level shape by the Plano CLI at compile time, with a deprecation warning. Update to the form above to silence the warning and gain the multi-model fallback behavior.
|
||||
|
||||
.. _passthrough_auth:
|
||||
|
||||
|
|
|
|||
|
|
@ -147,38 +147,53 @@ Plano-Orchestrator analyzes each prompt to infer domain and action, then applies
|
|||
Configuration
|
||||
^^^^^^^^^^^^^
|
||||
|
||||
To configure preference-aligned dynamic routing, define routing preferences that map domains and actions to specific models:
|
||||
To configure preference-aligned dynamic routing, declare a top-level ``routing_preferences`` list and attach an ordered ``models`` candidate pool to each route. Starting in ``v0.4.0``, ``routing_preferences`` lives at the root of the config (not inline under ``model_providers``), which lets multiple models serve the same route — the first entry in ``models`` is primary, the rest are fallbacks that the client tries on ``429``/``5xx`` errors.
|
||||
|
||||
.. code-block:: yaml
|
||||
:caption: Preference-Aligned Dynamic Routing Configuration
|
||||
|
||||
version: v0.4.0
|
||||
|
||||
listeners:
|
||||
egress_traffic:
|
||||
- name: egress_traffic
|
||||
type: model
|
||||
address: 0.0.0.0
|
||||
port: 12000
|
||||
message_format: openai
|
||||
timeout: 30s
|
||||
|
||||
llm_providers:
|
||||
model_providers:
|
||||
- model: openai/gpt-5.2
|
||||
access_key: $OPENAI_API_KEY
|
||||
default: true
|
||||
|
||||
- model: openai/gpt-5
|
||||
access_key: $OPENAI_API_KEY
|
||||
routing_preferences:
|
||||
- name: code understanding
|
||||
description: understand and explain existing code snippets, functions, or libraries
|
||||
- name: complex reasoning
|
||||
description: deep analysis, mathematical problem solving, and logical reasoning
|
||||
|
||||
- model: anthropic/claude-sonnet-4-5
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
routing_preferences:
|
||||
- name: creative writing
|
||||
description: creative content generation, storytelling, and writing assistance
|
||||
- name: code generation
|
||||
description: generating new code snippets, functions, or boilerplate based on user prompts
|
||||
|
||||
routing_preferences:
|
||||
- name: code understanding
|
||||
description: understand and explain existing code snippets, functions, or libraries
|
||||
models:
|
||||
- openai/gpt-5
|
||||
- anthropic/claude-sonnet-4-5
|
||||
- name: complex reasoning
|
||||
description: deep analysis, mathematical problem solving, and logical reasoning
|
||||
models:
|
||||
- openai/gpt-5
|
||||
- name: creative writing
|
||||
description: creative content generation, storytelling, and writing assistance
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-5
|
||||
- name: code generation
|
||||
description: generating new code snippets, functions, or boilerplate based on user prompts
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-5
|
||||
- openai/gpt-5
|
||||
|
||||
.. note::
|
||||
Configs still using the ``v0.3.0`` inline style (``routing_preferences`` nested under each ``model_provider``) are auto-migrated to this top-level shape by the Plano CLI at compile time, with a deprecation warning. Update your config to the form above to silence the warning.
|
||||
|
||||
Client usage
|
||||
^^^^^^^^^^^^
|
||||
|
|
@ -253,6 +268,8 @@ Using Ollama (recommended for local development)
|
|||
|
||||
.. code-block:: yaml
|
||||
|
||||
version: v0.4.0
|
||||
|
||||
overrides:
|
||||
llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
|
||||
|
||||
|
|
@ -266,9 +283,12 @@ Using Ollama (recommended for local development)
|
|||
|
||||
- model: anthropic/claude-sonnet-4-5
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
routing_preferences:
|
||||
- name: creative writing
|
||||
description: creative content generation, storytelling, and writing assistance
|
||||
|
||||
routing_preferences:
|
||||
- name: creative writing
|
||||
description: creative content generation, storytelling, and writing assistance
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-5
|
||||
|
||||
4. **Verify the model is running**
|
||||
|
||||
|
|
@ -322,6 +342,8 @@ vLLM provides higher throughput and GPU optimizations suitable for production de
|
|||
|
||||
.. code-block:: yaml
|
||||
|
||||
version: v0.4.0
|
||||
|
||||
overrides:
|
||||
llm_routing_model: plano/Plano-Orchestrator
|
||||
|
||||
|
|
@ -335,9 +357,12 @@ vLLM provides higher throughput and GPU optimizations suitable for production de
|
|||
|
||||
- model: anthropic/claude-sonnet-4-5
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
routing_preferences:
|
||||
- name: creative writing
|
||||
description: creative content generation, storytelling, and writing assistance
|
||||
|
||||
routing_preferences:
|
||||
- name: creative writing
|
||||
description: creative content generation, storytelling, and writing assistance
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-5
|
||||
|
||||
5. **Verify the server is running**
|
||||
|
||||
|
|
@ -468,22 +493,30 @@ You can combine static model selection with dynamic routing preferences for maxi
|
|||
.. code-block:: yaml
|
||||
:caption: Hybrid Routing Configuration
|
||||
|
||||
llm_providers:
|
||||
version: v0.4.0
|
||||
|
||||
model_providers:
|
||||
- model: openai/gpt-5.2
|
||||
access_key: $OPENAI_API_KEY
|
||||
default: true
|
||||
|
||||
- model: openai/gpt-5
|
||||
access_key: $OPENAI_API_KEY
|
||||
routing_preferences:
|
||||
- name: complex_reasoning
|
||||
description: deep analysis and complex problem solving
|
||||
|
||||
- model: anthropic/claude-sonnet-4-5
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
routing_preferences:
|
||||
- name: creative_tasks
|
||||
description: creative writing and content generation
|
||||
|
||||
routing_preferences:
|
||||
- name: complex_reasoning
|
||||
description: deep analysis and complex problem solving
|
||||
models:
|
||||
- openai/gpt-5
|
||||
- anthropic/claude-sonnet-4-5
|
||||
- name: creative_tasks
|
||||
description: creative writing and content generation
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-5
|
||||
- openai/gpt-5
|
||||
|
||||
model_aliases:
|
||||
# Model aliases - friendly names that map to actual provider names
|
||||
|
|
|
|||
|
|
@ -1,5 +1,5 @@
|
|||
# Plano Gateway configuration version
|
||||
version: v0.3.0
|
||||
version: v0.4.0
|
||||
|
||||
# External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)
|
||||
agents:
|
||||
|
|
@ -32,17 +32,8 @@ model_providers:
|
|||
- model: mistral/ministral-3b-latest
|
||||
access_key: $MISTRAL_API_KEY
|
||||
|
||||
# routing_preferences: tags a model with named capabilities so Plano's LLM router
|
||||
# can select the best model for each request based on intent. Requires the
|
||||
# Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
|
||||
# Each preference has a name (short label) and a description (used for intent matching).
|
||||
- model: groq/llama-3.3-70b-versatile
|
||||
access_key: $GROQ_API_KEY
|
||||
routing_preferences:
|
||||
- name: code generation
|
||||
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
|
||||
- name: code review
|
||||
description: reviewing, analyzing, and suggesting improvements to existing code
|
||||
|
||||
# passthrough_auth: forwards the client's Authorization header upstream instead of
|
||||
# using the configured access_key. Useful for LiteLLM or similar proxy setups.
|
||||
|
|
@ -64,6 +55,29 @@ model_aliases:
|
|||
smart-llm:
|
||||
target: gpt-4o
|
||||
|
||||
# routing_preferences: top-level list that tags named task categories with an
|
||||
# ordered pool of candidate models. Plano's LLM router matches incoming requests
|
||||
# against these descriptions and returns an ordered list of models; the client
|
||||
# uses models[0] as primary and retries with models[1], models[2]... on 429/5xx.
|
||||
# Requires overrides.llm_routing_model to point at Plano-Orchestrator (or equivalent).
|
||||
# Each model in `models` must be declared in model_providers above.
|
||||
# selection_policy is optional: {prefer: cheapest|fastest|none} lets the router
|
||||
# reorder candidates using live cost/latency data from model_metrics_sources.
|
||||
routing_preferences:
|
||||
- name: code generation
|
||||
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-0
|
||||
- openai/gpt-4o
|
||||
- groq/llama-3.3-70b-versatile
|
||||
- name: code review
|
||||
description: reviewing, analyzing, and suggesting improvements to existing code
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-0
|
||||
- groq/llama-3.3-70b-versatile
|
||||
selection_policy:
|
||||
prefer: cheapest
|
||||
|
||||
# HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access
|
||||
listeners:
|
||||
# Agent listener for routing requests to multiple agents
|
||||
|
|
|
|||
|
|
@ -69,12 +69,6 @@ listeners:
|
|||
model: llama-3.3-70b-versatile
|
||||
name: groq/llama-3.3-70b-versatile
|
||||
provider_interface: groq
|
||||
routing_preferences:
|
||||
- description: generating new code snippets, functions, or boilerplate based on
|
||||
user prompts or requirements
|
||||
name: code generation
|
||||
- description: reviewing, analyzing, and suggesting improvements to existing code
|
||||
name: code review
|
||||
- base_url: https://litellm.example.com
|
||||
cluster_name: openai_litellm.example.com
|
||||
endpoint: litellm.example.com
|
||||
|
|
@ -131,12 +125,6 @@ model_providers:
|
|||
model: llama-3.3-70b-versatile
|
||||
name: groq/llama-3.3-70b-versatile
|
||||
provider_interface: groq
|
||||
routing_preferences:
|
||||
- description: generating new code snippets, functions, or boilerplate based on
|
||||
user prompts or requirements
|
||||
name: code generation
|
||||
- description: reviewing, analyzing, and suggesting improvements to existing code
|
||||
name: code review
|
||||
- base_url: https://litellm.example.com
|
||||
cluster_name: openai_litellm.example.com
|
||||
endpoint: litellm.example.com
|
||||
|
|
@ -221,6 +209,21 @@ routing:
|
|||
type: memory
|
||||
session_max_entries: 10000
|
||||
session_ttl_seconds: 600
|
||||
routing_preferences:
|
||||
- description: generating new code snippets, functions, or boilerplate based on user
|
||||
prompts or requirements
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-0
|
||||
- openai/gpt-4o
|
||||
- groq/llama-3.3-70b-versatile
|
||||
name: code generation
|
||||
- description: reviewing, analyzing, and suggesting improvements to existing code
|
||||
models:
|
||||
- anthropic/claude-sonnet-4-0
|
||||
- groq/llama-3.3-70b-versatile
|
||||
name: code review
|
||||
selection_policy:
|
||||
prefer: cheapest
|
||||
state_storage:
|
||||
type: memory
|
||||
system_prompt: 'You are a helpful assistant. Always respond concisely and accurately.
|
||||
|
|
@ -237,4 +240,4 @@ tracing:
|
|||
environment: production
|
||||
service.team: platform
|
||||
trace_arch_internal: false
|
||||
version: v0.3.0
|
||||
version: v0.4.0
|
||||
|
|
|
|||