Spherrrical 2026-06-24 17:14:50 +00:00
parent 3f910c4943
commit e8c1f79969
6 changed files with 561 additions and 159 deletions

@@ -1,6 +1,6 @@
Plano Docs v0.4.25
llms.txt (auto-generated)
Generated (UTC): 2026-06-24T17:14:10.499782+00:00
Generated (UTC): 2026-06-24T17:14:45.476309+00:00
Table of contents
- Agents (concepts/agents)
@@ -4287,6 +4287,178 @@ response = client.chat.completions.create(
# No model specified - router will analyze and choose claude-sonnet-4-5
)
Cost- and latency-aware selection
When a route lists more than one candidate model, you can let Plano reorder that
candidate pool using periodically refreshed cost or latency data instead of relying
solely on the order in which you declared the models. This is controlled per route
with selection_policy and backed by one or more model_metrics_sources.
This is useful when several models are equally capable for a route and you want Plano
to always reach for the cheapest (or fastest) option first, with the others kept as
fallbacks.
Selection policy
Attach an optional selection_policy to any entry in routing_preferences:
Per-route selection policy
routing_preferences:
  - name: code review
    description: reviewing, analyzing, and suggesting improvements to existing code
    models:
      - anthropic/claude-sonnet-4-5
      - groq/llama-3.3-70b-versatile
    selection_policy:
      prefer: cheapest # cheapest | fastest | none
prefer accepts one of:
cheapest: order candidates by ascending total price (input rate + output rate per million tokens), using a cost metrics source.
fastest: order candidates by ascending observed latency, using a latency metrics source.
none (default): keep the declared order; no reordering.
Models that have no data in the selected source are ranked last, in their original
order, so routing always degrades gracefully rather than dropping a candidate.
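The ordering rules above can be sketched in Python as follows. This is an illustration, not Plano's router code: the function name order_candidates and the plain dictionaries standing in for the cost and latency sources are assumptions made for the example.

```python
from typing import Mapping, Optional, Sequence


def order_candidates(
    models: Sequence[str],
    prefer: str,
    cost: Optional[Mapping[str, float]] = None,
    latency: Optional[Mapping[str, float]] = None,
) -> list[str]:
    """Hypothetical sketch: reorder a route's candidates for selection_policy.prefer.

    cost maps model name -> input + output price per million tokens;
    latency maps model name -> observed latency (lower is faster).
    """
    if prefer == "none":
        return list(models)  # keep the declared order
    if prefer == "cheapest":
        metric = cost or {}
    elif prefer == "fastest":
        metric = latency or {}
    else:
        raise ValueError(f"unknown prefer value: {prefer!r}")

    # Models with data, sorted ascending; sorted() is stable, so ties keep declared order.
    known = sorted((m for m in models if m in metric), key=lambda m: metric[m])
    # Models with no data are never dropped: they go last, in declared order.
    unknown = [m for m in models if m not in metric]
    return known + unknown
```

For example, order_candidates(["a", "b", "c"], "cheapest", cost={"b": 1.0, "c": 0.5}) returns ["c", "b", "a"]: c and b are sorted by price, and a, which has no cost data, keeps a place at the end.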
Configuring the pricing source
Routing with prefer: cheapest needs a price catalog. Plano's default pricing provider is
DigitalOcean: its GenAI model catalog is public (no API key, no signup), so cost data
is available out of the box, and it is what planoai obs uses if you don't configure
anything. The pricing source is fully swappable: point Plano at models.dev,
or at any endpoint that returns one of the supported pricing structures.
The provider field selects which response schema Plano expects (and therefore how it
parses the catalog); the optional url lets you override the endpoint — for example to
use a mirror, a cached copy, or an internal catalog service that returns the same shape.
provider | Default catalog URL | Key format | Expected structure
digitalocean (default) | DigitalOcean GenAI model catalog | lowercase(creator)/model_id | { data: [ { model_id, pricing: { input_price_per_million, output_price_per_million } } ] }
models.dev | https://models.dev/api.json | creator/model (e.g. anthropic/claude-sonnet-4-5) | { <provider>: { models: { <model>: { cost: { input, output } } } } }
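To make the two structures concrete, here is a Python sketch that flattens either catalog into one price per model key, computed as input rate plus output rate. The helper name parse_catalog is hypothetical, and so is the "creator" field read from DigitalOcean entries: the structure in the table does not show where the creator part of the lowercase(creator)/model_id key comes from, so treat that part as an assumption.

```python
from typing import Any


def parse_catalog(provider: str, payload: dict[str, Any]) -> dict[str, float]:
    """Hypothetical helper: flatten a catalog into {model key: input + output per million}."""
    prices: dict[str, float] = {}
    if provider == "models.dev":
        # { <provider>: { models: { <model>: { cost: { input, output } } } } }
        for creator, entry in payload.items():
            for model, info in entry.get("models", {}).items():
                cost = info.get("cost") or {}
                inp, out = cost.get("input"), cost.get("output")
                if inp is not None and out is not None:
                    prices[f"{creator}/{model}"] = inp + out
    elif provider == "digitalocean":
        # { data: [ { model_id, pricing: { input_price_per_million, output_price_per_million } } ] }
        for item in payload.get("data", []):
            pricing = item.get("pricing") or {}
            inp = pricing.get("input_price_per_million")
            out = pricing.get("output_price_per_million")
            if inp is None or out is None:
                continue  # no usable price: the model counts as having no data
            # ASSUMPTION: "creator" is a hypothetical field name used only in this sketch.
            key = f"{item.get('creator', '').lower()}/{item['model_id']}"
            prices[key] = inp + out
    else:
        raise ValueError(f"unsupported pricing provider: {provider!r}")
    return prices
```

Entries without a complete price are skipped rather than given a guessed value, so they fall into the "no data" group and are ranked last.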
Because the source is chosen by the provider field, switching is a one-line change. To
stay on the default DigitalOcean catalog, you can omit model_metrics_sources entirely if
you only need it for planoai obs; to use it for routing, declare it explicitly:
Default cost source (DigitalOcean)
model_metrics_sources:
  - type: cost
    provider: digitalocean # default; uses the public DO GenAI catalog
To switch to models.dev — an open, community-maintained catalog covering a broad range of
providers and models — change the provider (and optionally url):
Cost source backed by models.dev
model_metrics_sources:
  - type: cost
    provider: models.dev # models.dev | digitalocean
    url: https://models.dev/api.json # optional; defaults per provider
    refresh_interval: 3600 # optional, seconds; refetch on this interval
    model_aliases: # optional; see below
      openai/gpt-oss-120b: openai/gpt-4o
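refresh_interval controls how often the catalog is refetched. The Python sketch below shows one way to get that behavior with a time-based cache; the class name, the fetch callable, and the injectable clock are assumptions for illustration, and Plano's own refresh logic may differ (for example, it may refresh in the background rather than on read).

```python
import time
from typing import Callable, Optional


class RefreshingCatalog:
    """Hypothetical cache: refetch the catalog once refresh_interval seconds have passed."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        refresh_interval: float = 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch                # zero-argument callable returning the catalog
        self._interval = refresh_interval  # seconds, as in the config field
        self._clock = clock
        self._data: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Fetch on first use, then again whenever the cached copy is at least one interval old.
        # Fetch errors are not handled in this sketch.
        if self._data is None or now - self._fetched_at >= self._interval:
            self._data = self._fetch()
            self._fetched_at = now
        return self._data
```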
To use your own endpoint, pick the provider whose structure your endpoint matches and
override url; Plano parses the response with that provider's schema:
Custom endpoint exposing the DigitalOcean catalog structure
model_metrics_sources:
  - type: cost
    provider: digitalocean # selects the DO response schema
    url: https://catalog.internal.example.com/pricing
The cost metric used for ranking is the sum of the input and output per-million-token
rates — a relative signal for ordering candidates, not a per-request bill. For actual
per-request cost, see the observability console below.
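To see why the summed rate is only a relative signal, compare it with the per-request cost computed from token counts. The prices below are made up for illustration and do not come from any catalog; the function names are hypothetical. For an input-heavy request, the model ranked second by the summed rate can still be the cheaper one per request.

```python
def rank_metric(input_per_m: float, output_per_m: float) -> float:
    # Value used to order candidates: input rate + output rate (USD per million tokens).
    return input_per_m + output_per_m


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    # USD cost of one request at per-million-token rates.
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical prices (input, output) in USD per million tokens.
model_a = (1.00, 10.00)
model_b = (4.00, 4.00)

print(rank_metric(*model_a), rank_metric(*model_b))  # 11.0 8.0 -> B is ordered first
# An input-heavy request: 100,000 input tokens, 100 output tokens.
print(request_cost(100_000, 100, *model_a))  # 0.101
print(request_cost(100_000, 100, *model_b))  # 0.4004
```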
Matching catalog keys to your models
The router looks up each candidate model by the exact name you use in
routing_preferences (e.g. anthropic/claude-sonnet-4-5). models.dev keys models as
creator/model, which lines up with Plano's provider/model naming, so most models
match automatically.
When a catalog key does not match your model name — for example a version skew, or an
open-weight model you serve under a different provider — use model_aliases to map the
catalog key to the Plano model name used in your routing preferences:
model_metrics_sources:
  - type: cost
    provider: models.dev
    model_aliases:
      # catalog key : plano model name
      openai/gpt-oss-120b: openai/gpt-4o
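The alias lookup described above amounts to a dictionary remap, sketched here in Python. This is an illustration, not Plano's code: the function name is hypothetical, and letting an alias replace an existing entry stored under the same Plano name is an assumption.

```python
def apply_aliases(catalog: dict[str, float], aliases: dict[str, str]) -> dict[str, float]:
    """Hypothetical helper: key prices by the model names used in routing_preferences.

    aliases maps catalog key -> Plano model name, mirroring model_aliases.
    """
    resolved = dict(catalog)  # keys that already match Plano names are used as-is
    for catalog_key, plano_name in aliases.items():
        if catalog_key in catalog:
            # ASSUMPTION: the aliased price overrides any existing entry for plano_name.
            resolved[plano_name] = catalog[catalog_key]
    return resolved
```

An alias whose catalog key is missing from the catalog is ignored, so the target model simply has no data and is ranked last.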
Latency source
Routing with prefer: fastest reads observed latency from a Prometheus instance. Provide
a query that returns one latency value per model (lower is faster), labelled by model_name:
Latency source backed by Prometheus
model_metrics_sources:
  - type: latency
    provider: prometheus
    url: http://prometheus:9090
    query: sum by (model_name) (rate(plano_llm_latency_seconds_sum[5m])) / sum by (model_name) (rate(plano_llm_latency_seconds_count[5m]))
    refresh_interval: 60
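A Prometheus histogram or summary exports two cumulative counters per series, <metric>_sum (total seconds observed) and <metric>_count (number of observations). The average latency over a window is the increase in _sum divided by the increase in _count. The Python sketch below does that arithmetic on hypothetical counter snapshots; the CounterSnapshot type and mean_latency function are illustrative only and do not query Prometheus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterSnapshot:
    """Cumulative counters for one model at one point in time (hypothetical type)."""
    sum_seconds: float  # value of the metric's _sum series
    count: float        # value of the metric's _count series


def mean_latency(before: CounterSnapshot, after: CounterSnapshot) -> Optional[float]:
    """Average seconds per request between two snapshots, or None if no requests finished.

    Counter resets are not handled here; PromQL's rate() and increase() handle them.
    """
    requests = after.count - before.count
    if requests <= 0:
        return None
    return (after.sum_seconds - before.sum_seconds) / requests


# Made-up snapshots taken five minutes apart.
busy = mean_latency(CounterSnapshot(100.0, 200), CounterSnapshot(400.0, 800))
quiet = mean_latency(CounterSnapshot(10.0, 10), CounterSnapshot(40.0, 30))
print(busy, quiet)  # 0.5 1.5
```

In this made-up data the busy model accumulates ten times more latency seconds (300 versus 30) yet is three times faster per request, which is why the per-request mean is the value worth ranking on.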
You can declare both a cost and a latency source at the same time; each route
picks whichever it needs based on its selection_policy.
Cost in the observability console
planoai obs displays a per-request USD cost column derived from the same pricing
catalog. By default it reads the cost source from your config (the first
type: cost entry under model_metrics_sources); you can also override it on the
command line:
# Use the cost source from ./config.yaml (default)
planoai obs
# Or override the provider / endpoint explicitly
planoai obs --pricing-provider models.dev
planoai obs --pricing-url https://models.dev/api.json
If no source is configured and no override is given, planoai obs falls back to the
DigitalOcean catalog so the cost column still populates out of the box.
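The lookup order described above (config first, command-line override, DigitalOcean fallback) can be sketched in Python. This is a hedged illustration, not planoai's code: the function name is hypothetical, and how the two flags combine (here, --pricing-provider without --pricing-url switches to that provider's default URL) is an assumption.

```python
from typing import Optional

DEFAULT_PROVIDER = "digitalocean"


def resolve_pricing_source(
    config_sources: list[dict],
    cli_provider: Optional[str] = None,
    cli_url: Optional[str] = None,
) -> tuple[str, Optional[str]]:
    """Hypothetical sketch: return (provider, url); url None means the provider's default."""
    # Default: the first `type: cost` entry under model_metrics_sources.
    configured = next((s for s in config_sources if s.get("type") == "cost"), None)
    if configured is not None:
        provider, url = configured.get("provider", DEFAULT_PROVIDER), configured.get("url")
    else:
        # No configured source: fall back to the DigitalOcean catalog.
        provider, url = DEFAULT_PROVIDER, None
    # ASSUMPTION: a new provider uses its own default URL unless a URL is also given.
    if cli_provider is not None:
        provider, url = cli_provider, cli_url
    elif cli_url is not None:
        url = cli_url
    return provider, url
```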
Plano-Orchestrator
Plano-Orchestrator is a preference-based routing model specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
@@ -7072,6 +7244,24 @@ routing_preferences:
selection_policy:
prefer: cheapest
# model_metrics_sources: external catalogs the router reads to reorder candidate
# models for selection_policy.prefer. A `cost` source ranks `prefer: cheapest`;
# a `latency` source ranks `prefer: fastest`. Both are optional.
model_metrics_sources:
  # Cost catalog. provider: models.dev | digitalocean (default url per provider).
  - type: cost
    provider: models.dev
    url: https://models.dev/api.json # optional; omit to use the provider default
    refresh_interval: 3600 # optional, seconds
    model_aliases: # optional: catalog key -> Plano model name
      openai/gpt-oss-120b: openai/gpt-4o
  # Latency catalog (Prometheus). Used for selection_policy.prefer: fastest.
  - type: latency
    provider: prometheus
    url: http://prometheus:9090
    query: sum by (model_name) (rate(plano_llm_latency_seconds_sum[5m])) / sum by (model_name) (rate(plano_llm_latency_seconds_count[5m]))
    refresh_interval: 60
# HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access
listeners:
# Agent listener for routing requests to multiple agents