# Plano Routing API — Request & Response Format

## Overview

Plano intercepts LLM requests and routes them to the best available model based on semantic intent and live cost/latency data. The developer sends a standard OpenAI-compatible request with an optional `routing_preferences` field. Plano returns an ordered list of candidate models; the client uses the first and falls back to the next on 429 or 5xx errors.

---

## Request Format

Standard OpenAI chat completion body. The only addition is the optional `routing_preferences` field, which is stripped before the request is forwarded upstream.

```json
POST /v1/chat/completions

{
  "model": "openai/gpt-4o-mini",
  "messages": [
    {"role": "user", "content": "write a sorting algorithm in Python"}
  ],
  "routing_preferences": [
    {
      "name": "code generation",
      "description": "generating new code snippets",
      "models": ["anthropic/claude-sonnet-4-20250514", "openai/gpt-4o", "openai/gpt-4o-mini"]
    },
    {
      "name": "general questions",
      "description": "casual conversation and simple queries",
      "models": ["openai/gpt-4o-mini"]
    }
  ]
}
```
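For example, a client could send this body with the `requests` library. A minimal sketch; the gateway address is an assumption, and the inline `routing_preferences` overrides the config for this request only (see Notes below):

```python
# Hedged sketch: POST a chat completion with a per-request
# routing_preferences override. The base URL is illustrative.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # your Plano gateway
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [
            {"role": "user", "content": "write a sorting algorithm in Python"}
        ],
        "routing_preferences": [
            {
                "name": "code generation",
                "description": "generating new code snippets",
                "models": ["anthropic/claude-sonnet-4-20250514", "openai/gpt-4o"],
            }
        ],
    },
    timeout=30,
)
resp.raise_for_status()
```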
### `routing_preferences` fields

| Field         | Type     | Required | Description                                                                                               |
| ------------- | -------- | -------- | --------------------------------------------------------------------------------------------------------- |
| `name`        | string   | yes      | Route identifier. Must match the LLM router's route classification.                                        |
| `description` | string   | yes      | Natural-language description used by the router to match user intent.                                      |
| `models`      | string[] | yes      | Ordered candidate pool. At least one entry is required. Each model must be declared in `model_providers`.  |
### Notes

- `routing_preferences` is **optional**. If omitted, the config-defined preferences are used.
- If provided in the request body, it **overrides** the config for that single request only (as in the sketch above).
- `model` is still required and is used as the fallback if no route is matched.

---
## Response Format

```json
{
  "models": [
    "anthropic/claude-sonnet-4-20250514",
    "openai/gpt-4o",
    "openai/gpt-4o-mini"
  ],
  "route": "code generation",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"
}
```
### Fields

| Field      | Type           | Description                                                                                                           |
| ---------- | -------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `models`   | string[]       | Ranked model list. Use `models[0]` as the primary; retry with `models[1]` on 429/5xx, and so on.                         |
| `route`    | string \| null | Name of the matched route. `null` if no route matched — the client should fall back to the original request `model`.     |
| `trace_id` | string         | Trace ID for distributed tracing and observability.                                                                      |

---

## Client Usage Pattern

```python
response = plano.routing_decision(request)
models = response["models"]

for model in models:
    try:
        result = call_llm(model, messages)
        break  # success — stop trying
    except (RateLimitError, ServerError):
        continue  # try next model in the ranked list
else:
    raise RuntimeError("all candidate models failed")  # ranked list exhausted
```
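One case the loop above does not cover: when `route` comes back `null`, no preference matched and the client should fall back to the `model` from its original request (per the fields table). A small guard, reusing the same hypothetical `plano` client:

```python
decision = plano.routing_decision(request)

if decision["route"] is None:
    # No route matched: fall back to the model named in the original request.
    candidates = [request["model"]]
else:
    candidates = decision["models"]  # ranked candidates; try in order
```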
---

## Configuration (set by platform/ops team)

Requires `version: v0.4.0` or above. Models listed under `routing_preferences` must be declared in `model_providers`.
```yaml
version: v0.4.0

model_providers:
  - model: anthropic/claude-sonnet-4-20250514
    access_key: $ANTHROPIC_API_KEY
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    default: true

routing_preferences:
  - name: code generation
    description: generating new code snippets or boilerplate
    models:
      - anthropic/claude-sonnet-4-20250514
      - openai/gpt-4o

  - name: general questions
    description: casual conversation and simple queries
    models:
      - openai/gpt-4o-mini
      - openai/gpt-4o
```
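The cross-reference rule above is easy to check before deploying. A minimal pre-deploy sketch, assuming PyYAML and an illustrative `plano_config.yaml` path (this check is not part of Plano itself):

```python
# Verify every model referenced in routing_preferences is declared
# under model_providers before shipping the config.
import yaml

with open("plano_config.yaml") as f:  # illustrative path
    config = yaml.safe_load(f)

declared = {p["model"] for p in config.get("model_providers", [])}

for pref in config.get("routing_preferences", []):
    for model in pref["models"]:
        if model not in declared:
            raise ValueError(
                f"route '{pref['name']}' references undeclared model '{model}'"
            )
```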
---

## Model Affinity

In agentic loops where the same session makes multiple LLM calls, send an `X-Model-Affinity` header to pin the routing decision. The first request routes normally and caches the result. All subsequent requests with the same affinity ID return the cached model without re-running routing.
```json
POST /v1/chat/completions
X-Model-Affinity: a1b2c3d4-5678-...

{
  "model": "openai/gpt-4o-mini",
  "messages": [...]
}
```
The routing decision endpoint also supports model affinity:

```json
POST /routing/v1/chat/completions
X-Model-Affinity: a1b2c3d4-5678-...
```
Response when pinned:

```json
{
  "models": ["anthropic/claude-sonnet-4-20250514"],
  "route": "code generation",
  "trace_id": "...",
  "session_id": "a1b2c3d4-5678-...",
  "pinned": true
}
```
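In practice an agentic client mints one affinity ID per session and reuses it on every call. A hedged sketch with `uuid` and `requests`; the gateway URL and payload are illustrative:

```python
# One affinity ID per session keeps every call in the loop pinned
# to the model chosen on the first request.
import uuid
import requests

session_affinity = str(uuid.uuid4())  # reused for the whole session

def chat(messages):
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # illustrative gateway URL
        headers={"X-Model-Affinity": session_affinity},
        json={"model": "openai/gpt-4o-mini", "messages": messages},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Both calls below are routed to the same pinned model.
plan = chat([{"role": "user", "content": "plan the refactor"}])
code = chat([{"role": "user", "content": "now write the code"}])
```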
Without the header, routing runs fresh every time (no breaking change).

Configure TTL and cache size:

```yaml
routing:
  session_ttl_seconds: 600      # default: 10 min
  session_max_entries: 10000    # upper limit
```
---

## Version Requirements

| Version    | Top-level `routing_preferences`        |
| ---------- | -------------------------------------- |
| `< v0.4.0` | Not allowed — startup error if present |
| `v0.4.0+`  | Supported (required for model routing) |