<span id="llm-router"></span><h1>LLM Routing</h1>
<p>Large language models (LLMs) differ in strengths, style, latency, and cost, so routing requests across them has become an essential technique for running multiple models in production. Plano provides three routing approaches for different use cases: <a class="reference internal" href="#model-based-routing"><span class="std std-ref">Model-based routing</span></a>, <a class="reference internal" href="#alias-based-routing"><span class="std std-ref">Alias-based routing</span></a>, and <a class="reference internal" href="#preference-aligned-routing"><span class="std std-ref">Preference-aligned routing</span></a>. Together they let you match each request with the most suitable model in your LLM fleet, balancing performance, cost, and response quality.</p>
<divclass="admonition note">
<pclass="admonition-title">Note</p>
<p>For details on supported model providers, configuration options, and client libraries, see <a class="reference internal" href="../concepts/llm_providers/llm_providers.html#llm-providers"><span class="std std-ref">LLM Providers</span></a>.</p>
</div>
<p>Direct routing allows you to specify exact provider and model combinations using the format <codeclass="docutils literal notranslate"><spanclass="pre">provider/model-name</span></code>:</p>
<h4>Configuration</h4>
<p>Configure your LLM providers with specific provider/model names:</p>
<h4>Client usage</h4>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="c1"># Direct provider/model specification</span>
</span><span id="line-9"><span class="n">messages</span><span class="o">=</span><span class="p">[{</span><span class="s2">"role"</span><span class="p">:</span> <span class="s2">"user"</span><span class="p">,</span> <span class="s2">"content"</span><span class="p">:</span> <span class="s2">"Write a story"</span><span class="p">}]</span>
</span></code></pre></div>
</div>
<span id="id2"></span><h3>Alias-based routing</h3>
<p>Alias-based routing lets you create semantic model names that decouple your application from specific providers:</p>
<ulclass="simple">
<li><p>Use meaningful names like <codeclass="docutils literal notranslate"><spanclass="pre">fast-model</span></code>, <codeclass="docutils literal notranslate"><spanclass="pre">reasoning-model</span></code>, or <codeclass="docutils literal notranslate"><spanclass="pre">plano.summarize.v1</span></code> (see <aclass="reference internal"href="../concepts/llm_providers/model_aliases.html#model-aliases"><spanclass="std std-ref">Model Aliases</span></a>)</p></li>
<li><p>Map semantic names to underlying provider models for easier experimentation and provider switching</p></li>
<li><p>Ideal for applications that want abstraction from specific model names while maintaining control</p></li>
</ul>
<sectionid="id3">
<h4>Configuration</h4>
<h4>Client usage</h4>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="c1"># Using semantic aliases</span>
</span><span id="line-3"><span class="n">model</span><span class="o">=</span><span class="s2">"fast-model"</span><span class="p">,</span>  <span class="c1"># Routes to best available fast model</span>
</span><span id="line-8"><span class="n">model</span><span class="o">=</span><span class="s2">"reasoning-model"</span><span class="p">,</span>  <span class="c1"># Routes to best reasoning model</span>
</span><span id="line-9"><span class="n">messages</span><span class="o">=</span><span class="p">[{</span><span class="s2">"role"</span><span class="p">:</span> <span class="s2">"user"</span><span class="p">,</span> <span class="s2">"content"</span><span class="p">:</span> <span class="s2">"Solve this complex problem"</span><span class="p">}]</span>
</span></code></pre></div>
</div>
<span id="preference-aligned-routing"></span><h3>Preference-aligned routing (Plano-Orchestrator)</h3>
<p>Preference-aligned routing uses the <a class="reference external" href="https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B" rel="nofollow noopener">Plano-Orchestrator</a> model to pick the best LLM based on domain, action, and your configured preferences, instead of hard-coding a model.</p>
<p>Plano-Orchestrator analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples <strong>routing policy</strong> (how to choose) from <strong>model assignment</strong> (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.</p>
<h4>Configuration</h4>
<p>To configure preference-aligned dynamic routing, declare a top-level <codeclass="docutils literal notranslate"><spanclass="pre">routing_preferences</span></code> list and attach an ordered <codeclass="docutils literal notranslate"><spanclass="pre">models</span></code> candidate pool to each route. Starting in <codeclass="docutils literal notranslate"><spanclass="pre">v0.4.0</span></code>, <codeclass="docutils literal notranslate"><spanclass="pre">routing_preferences</span></code> lives at the root of the config (not inline under <codeclass="docutils literal notranslate"><spanclass="pre">model_providers</span></code>), which lets multiple models serve the same route — the first entry in <codeclass="docutils literal notranslate"><spanclass="pre">models</span></code> is primary, the rest are fallbacks that the client tries on <codeclass="docutils literal notranslate"><spanclass="pre">429</span></code>/<codeclass="docutils literal notranslate"><spanclass="pre">5xx</span></code> errors.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-28"><span class="w"></span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">deep analysis, mathematical problem solving, and logical reasoning</span>
</span><span id="line-36"><span class="w"></span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">generating new code snippets, functions, or boilerplate based on user prompts</span>
</span></code></pre></div>
</div>
<p>Configs still using the <codeclass="docutils literal notranslate"><spanclass="pre">v0.3.0</span></code> inline style (<codeclass="docutils literal notranslate"><spanclass="pre">routing_preferences</span></code> nested under each <codeclass="docutils literal notranslate"><spanclass="pre">model_provider</span></code>) are auto-migrated to this top-level shape by the Plano CLI at compile time, with a deprecation warning. Update your config to the form above to silence the warning.</p>
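<p>The primary-then-fallback behaviour of an ordered <code class="docutils literal notranslate"><span class="pre">models</span></code> pool can be pictured with the rough Python sketch below. It is only an illustration under stated assumptions: <code class="docutils literal notranslate"><span class="pre">UpstreamError</span></code>, <code class="docutils literal notranslate"><span class="pre">call_model</span></code>, and the model names in the usage note are hypothetical stand-ins, not part of Plano or any client library.</p>

```python
from typing import Callable, Optional, Sequence, Tuple


class UpstreamError(Exception):
    """Hypothetical error that carries the HTTP status of a failed call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    """Treat rate limiting (429) and server errors (5xx) as retryable."""
    return status == 429 or 500 <= status <= 599


def complete_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    prompt: str,
) -> Tuple[str, str]:
    """Try each candidate in declared order and return (model, reply)."""
    last_error: Optional[UpstreamError] = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except UpstreamError as err:
            if not is_retryable(err.status):
                raise  # non-retryable errors are surfaced immediately
            last_error = err  # remember the failure and try the next model
    if last_error is None:
        raise ValueError("models must contain at least one candidate")
    raise last_error
```

<p>Called with a pool such as <code class="docutils literal notranslate"><span class="pre">["primary-model",</span> <span class="pre">"backup-model"]</span></code>, the helper only reaches the second entry when the first one fails with a retryable status.</p>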
<h4>Client usage</h4>
<p>Clients can let the router decide or still specify aliases:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="c1"># Let Plano-Orchestrator choose based on content</span>
</span><span id="line-3"><span class="n">messages</span><span class="o">=</span><span class="p">[{</span><span class="s2">"role"</span><span class="p">:</span> <span class="s2">"user"</span><span class="p">,</span> <span class="s2">"content"</span><span class="p">:</span> <span class="s2">"Write a creative story about space exploration"</span><span class="p">}]</span>
</span></code></pre></div>
</div>
<span id="cost-latency-aware-selection"></span><h2>Cost- and latency-aware selection</h2>
<p>When a route lists more than one candidate model, you can let Plano reorder that
candidate pool using <strong>live cost or latency data</strong> instead of relying solely on the
order you wrote them in. This is controlled per route with <codeclass="docutils literal notranslate"><spanclass="pre">selection_policy</span></code> and
backed by one or more <codeclass="docutils literal notranslate"><spanclass="pre">model_metrics_sources</span></code>.</p>
<p>This is useful when several models are equally capable for a route and you want Plano
to always reach for the cheapest (or fastest) option first, with the others kept as
fallbacks.</p>
<sectionid="selection-policy">
<h3>Selection policy</h3>
<p>Attach an optional <codeclass="docutils literal notranslate"><spanclass="pre">selection_policy</span></code> to any entry in <codeclass="docutils literal notranslate"><spanclass="pre">routing_preferences</span></code>:</p>
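<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-3"><span class="w"></span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">reviewing, analyzing, and suggesting improvements to existing code</span>
</span></code></pre></div>
</div>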
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">cheapest</span></code>: order candidates by total price (input + output rate) ascending, using a <code class="docutils literal notranslate"><span class="pre">cost</span></code> metrics source.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">fastest</span></code>: order candidates by observed latency ascending, using a <code class="docutils literal notranslate"><span class="pre">latency</span></code> metrics source.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">none</span></code> (default): keep the order you declared; no reordering.</p></li>
</ul>
<p>Models that have no data in the selected source are ranked <strong>last</strong>, in their original
order, so routing always degrades gracefully rather than dropping a candidate.</p>
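<p>The ordering rules above can be sketched in a few lines of Python. This is a minimal illustration rather than Plano's implementation: the function name, the plain dictionaries standing in for the metrics sources, and the model names in the usage note are all assumptions, and each cost value is taken to be the already-summed input + output rate.</p>

```python
from typing import List, Mapping, Sequence


def order_candidates(
    models: Sequence[str],
    policy: str,
    cost: Mapping[str, float],
    latency: Mapping[str, float],
) -> List[str]:
    """Reorder a route's candidate pool according to its selection policy."""
    if policy == "none":
        return list(models)  # keep the declared order
    if policy == "cheapest":
        metric = cost
    elif policy == "fastest":
        metric = latency
    else:
        raise ValueError(f"unknown selection policy: {policy!r}")
    # sorted() is stable, so candidates with equal values keep declared order.
    ranked = sorted((m for m in models if m in metric), key=lambda m: metric[m])
    # Candidates without data go last, in their original order.
    unranked = [m for m in models if m not in metric]
    return ranked + unranked
```

<p>For a pool declared as <code class="docutils literal notranslate"><span class="pre">["model-a",</span> <span class="pre">"model-b",</span> <span class="pre">"model-c"]</span></code> with prices known only for <code class="docutils literal notranslate"><span class="pre">model-a</span></code> and <code class="docutils literal notranslate"><span class="pre">model-c</span></code>, the <code class="docutils literal notranslate"><span class="pre">cheapest</span></code> policy places the cheaper of those two first and leaves <code class="docutils literal notranslate"><span class="pre">model-b</span></code> at the end instead of dropping it.</p>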
</section>
<sectionid="configuring-the-pricing-source">
<h3>Configuring the pricing source</h3>
<p><code class="docutils literal notranslate"><span class="pre">cheapest</span></code> routing needs a price catalog. Plano’s <strong>default pricing provider is
DigitalOcean</strong>: its GenAI model catalog is public (no API key, no signup), so cost data
is available out of the box and is what <code class="docutils literal notranslate"><span class="pre">planoai</span> <span class="pre">obs</span></code> uses if you don’t configure
anything. The pricing source is fully swappable: point Plano at <a class="reference external" href="https://models.dev/" rel="nofollow noopener">models.dev</a>,
or at <strong>any endpoint that exposes a supported pricing structure</strong>.</p>
<p>The <codeclass="docutils literal notranslate"><spanclass="pre">provider</span></code> field selects which response schema Plano expects (and therefore how it
parses the catalog); the optional <codeclass="docutils literal notranslate"><spanclass="pre">url</span></code> lets you override the endpoint — for example to
use a mirror, a cached copy, or an internal catalog service that returns the same shape.</p>
<p>Because the source is selected per <code class="docutils literal notranslate"><span class="pre">provider</span></code>, switching is a one-line change. To stay
on the default DigitalOcean catalog you can omit <code class="docutils literal notranslate"><span class="pre">model_metrics_sources</span></code> entirely for
<code class="docutils literal notranslate"><span class="pre">planoai</span> <span class="pre">obs</span></code>, or declare it explicitly for routing:</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-3"><span class="w"></span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">digitalocean</span><span class="w"> </span><span class="c1"># default; uses the public DO GenAI catalog</span>
</span></code></pre></div>
</div>
</div>
<p>To switch to models.dev — an open, community-maintained catalog covering a broad range of
providers and models — change the <codeclass="docutils literal notranslate"><spanclass="pre">provider</span></code> (and optionally <codeclass="docutils literal notranslate"><spanclass="pre">url</span></code>):</p>
<div class="code-block-caption"><span class="caption-text">Cost source backed by models.dev</span></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-4"><span class="w"></span><span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">https://models.dev/api.json</span><span class="w"> </span><span class="c1"># optional; defaults per provider</span>
</span><span id="line-5"><span class="w"></span><span class="nt">refresh_interval</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3600</span><span class="w"> </span><span class="c1"># optional, seconds; refetch on this interval</span>
</span><span id="line-6"><span class="w"></span><span class="nt">model_aliases</span><span class="p">:</span><span class="w"> </span><span class="c1"># optional; see below</span>
</span></code></pre></div>
</div>
<p>To use your own endpoint, pick the <codeclass="docutils literal notranslate"><spanclass="pre">provider</span></code> whose structure your endpoint matches and
override <codeclass="docutils literal notranslate"><spanclass="pre">url</span></code> — Plano parses the response with that provider’s schema:</p>
<div class="code-block-caption"><span class="caption-text">Custom endpoint exposing the DigitalOcean catalog structure</span></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-3"><span class="w"></span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">digitalocean</span><span class="w"> </span><span class="c1"># selects the DO response schema</span>
</span></code></pre></div>
</div>
<h3>Matching catalog keys to your models</h3>
<p>The router looks up each candidate model in the pricing catalog by the exact name you use in
your routing preferences. Catalog keys take the form <code class="docutils literal notranslate"><span class="pre">creator/model</span></code>, which lines up with
Plano’s <code class="docutils literal notranslate"><span class="pre">provider/model</span></code> naming, so most models match automatically.</p>
<p>When a catalog key does not match your model name — for example a version skew, or an
open-weight model you serve under a different provider — use <codeclass="docutils literal notranslate"><spanclass="pre">model_aliases</span></code> to map the
<strong>catalog key</strong> to the <strong>Plano model name</strong> used in your routing preferences:</p>
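<p>Conceptually, the alias map re-keys catalog entries before the router looks them up. The Python sketch below illustrates that idea only; the function name, the flat price dictionary, and the model names in the usage note are hypothetical and are not taken from Plano or from any real catalog.</p>

```python
from typing import Dict, Mapping


def prices_by_plano_name(
    catalog_prices: Mapping[str, float],
    model_aliases: Mapping[str, str],
) -> Dict[str, float]:
    """Re-key catalog prices so they can be looked up by Plano model name.

    model_aliases maps a catalog key to the Plano model name used in the
    routing preferences; keys without an alias are kept unchanged.
    """
    return {
        model_aliases.get(catalog_key, catalog_key): price
        for catalog_key, price in catalog_prices.items()
    }
```

<p>With a made-up alias from <code class="docutils literal notranslate"><span class="pre">acme/fast-v2-20250101</span></code> to <code class="docutils literal notranslate"><span class="pre">selfhosted/fast-v2</span></code>, the price published under the first key becomes available under the second, while every other catalog entry keeps its own key.</p>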
<h3>Latency source</h3>
<p><codeclass="docutils literal notranslate"><spanclass="pre">fastest</span></code> routing reads observed latency from a Prometheus instance. Provide the query
that returns a per-model latency value (lower is faster), labelled by <codeclass="docutils literal notranslate"><spanclass="pre">model_name</span></code>:</p>
<div class="code-block-caption"><span class="caption-text">Latency source backed by Prometheus</span></div>
<p>You can declare both a <codeclass="docutils literal notranslate"><spanclass="pre">cost</span></code> and a <codeclass="docutils literal notranslate"><spanclass="pre">latency</span></code> source at the same time; each route
picks whichever it needs based on its <codeclass="docutils literal notranslate"><spanclass="pre">selection_policy</span></code>.</p>
</section>
<sectionid="cost-in-the-observability-console">
<h3>Cost in the observability console</h3>
<p><code class="docutils literal notranslate"><span class="pre">planoai</span> <span class="pre">obs</span></code> displays a per-request USD cost column derived from the same pricing
catalog. By default it reads the <code class="docutils literal notranslate"><span class="pre">cost</span></code> source from your config (the first
<code class="docutils literal notranslate"><span class="pre">type:</span> <span class="pre">cost</span></code> entry under <code class="docutils literal notranslate"><span class="pre">model_metrics_sources</span></code>); you can also override it on the
command line:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="c1"># Use the cost source from ./config.yaml (default)</span>
</span></code></pre></div>
</div>
<p>If no source is configured and no override is given, <code class="docutils literal notranslate"><span class="pre">planoai</span> <span class="pre">obs</span></code> falls back to the
DigitalOcean catalog so the cost column still populates out of the box.</p>
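<p>The precedence described in this section (command-line override, then the first <code class="docutils literal notranslate"><span class="pre">type:</span> <span class="pre">cost</span></code> entry in the config, then the DigitalOcean default) can be summarised with the Python sketch below. It is an illustration only: the function name, the dictionary representation of a source, and the <code class="docutils literal notranslate"><span class="pre">latency</span></code> type value are assumptions, not the CLI's actual internals.</p>

```python
from typing import Any, Mapping, Optional, Sequence

Source = Mapping[str, Any]

# Assumed representation of the built-in default described above.
DEFAULT_COST_SOURCE: Source = {"type": "cost", "provider": "digitalocean"}


def resolve_cost_source(
    cli_override: Optional[Source],
    model_metrics_sources: Sequence[Source],
) -> Source:
    """Pick the pricing source used for the cost column."""
    if cli_override is not None:
        return cli_override  # an explicit override wins
    for source in model_metrics_sources:
        if source.get("type") == "cost":
            return source  # first cost entry in the config
    return DEFAULT_COST_SOURCE  # nothing configured
```

<p>An empty <code class="docutils literal notranslate"><span class="pre">model_metrics_sources</span></code> list with no override therefore resolves to the DigitalOcean default.</p>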
<h2>Plano-Orchestrator</h2>
<p>Plano-Orchestrator is a <strong>preference-based routing model</strong> specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.</p>
<p>Unlike benchmark-driven approaches, Plano-Orchestrator learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, so routing decisions align with real-world user needs.</p>
<p>The system supports adding new models for routing without retraining or architectural modifications, so it can adapt as the model landscape evolves.</p>
<p><strong>Preference-Encoded Routing</strong>
Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.</p>
<ul class="simple">
<li><p><strong>Domain</strong> – the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).</p></li>
<li><p><strong>Action</strong> – the specific type of operation the user wants performed (e.g., summarization, code generation, appointment booking, translation).</p></li>
</ul>
<p>Both domain and action configs are associated with preferred models or model variants. At inference time, Plano-Orchestrator analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.</p>
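<p>Once the route for a prompt has been inferred, applying the preferences is a plain lookup. The Python sketch below shows only that lookup step and makes several assumptions: the inference itself is done by Plano-Orchestrator and is represented here by an already-known route name, and the <code class="docutils literal notranslate"><span class="pre">name</span></code> key and the model names in the usage note are hypothetical.</p>

```python
from typing import Any, List, Mapping, Optional, Sequence


def candidates_for(
    route_name: Optional[str],
    routing_preferences: Sequence[Mapping[str, Any]],
) -> List[str]:
    """Return the ordered candidate models for an inferred route.

    route_name is the route the orchestrator inferred for the prompt, or
    None when no configured preference matched.
    """
    if route_name is None:
        return []
    for route in routing_preferences:
        if route["name"] == route_name:
            return list(route["models"])
    return []
```

<p>Because the policy (which route a prompt belongs to) is separate from the assignment (which models serve that route), swapping a model only changes the <code class="docutils literal notranslate"><span class="pre">models</span></code> list, not the lookup.</p>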
<ul class="simple">
<li><p><strong>Structured Preference Routing</strong>: Aligns each request with model strengths using explicit domain–action mappings.</p></li>
<li><p><strong>Transparent and Controllable</strong>: Makes routing decisions transparent and configurable, so users can customize system behavior.</p></li>
<li><p><strong>Flexible and Adaptive</strong>: Supports evolving user needs, model updates, and new domains/actions without retraining the router.</p></li>
<li><p><strong>Production-Ready Performance</strong>: Optimized for low-latency, high-throughput applications in multi-model environments.</p></li>
</ul>
<h2>Self-hosting Plano-Orchestrator</h2>
<p>By default, Plano uses a hosted Plano-Orchestrator endpoint. To run Plano-Orchestrator locally, you can serve the model yourself using either <strong>Ollama</strong> or <strong>vLLM</strong>.</p>
<h3>Using Ollama (recommended for local development)<a@click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)"aria-label="Copy link to this element"class="headerlink"data-tooltip="Copy link to this element"href="#using-ollama-recommended-for-local-development"x-intersect.margin.0%.0%.-70%.0%="activeSection ='#using-ollama-recommended-for-local-development'"><svgheight="1em"viewbox="0 0 24 24"width="1em"xmlns="http://www.w3.org/2000/svg"><pathd="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>This downloads the quantized GGUF model from Hugging Face and starts serving on <codeclass="docutils literal notranslate"><spanclass="pre">http://localhost:11434</span></code>.</p>
<h3>Using vLLM (recommended for production / EC2)<a@click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)"aria-label="Copy link to this element"class="headerlink"data-tooltip="Copy link to this element"href="#using-vllm-recommended-for-production-ec2"x-intersect.margin.0%.0%.-70%.0%="activeSection ='#using-vllm-recommended-for-production-ec2'"><svgheight="1em"viewbox="0 0 24 24"width="1em"xmlns="http://www.w3.org/2000/svg"><pathd="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>vLLM provides higher throughput and GPU optimizations suitable for production deployments.</p>
<h3>Using vLLM on Kubernetes (GPU nodes)<a@click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)"aria-label="Copy link to this element"class="headerlink"data-tooltip="Copy link to this element"href="#using-vllm-on-kubernetes-gpu-nodes"x-intersect.margin.0%.0%.-70%.0%="activeSection ='#using-vllm-on-kubernetes-gpu-nodes'"><svgheight="1em"viewbox="0 0 24 24"width="1em"xmlns="http://www.w3.org/2000/svg"><pathd="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
The <codeclass="docutils literal notranslate"><spanclass="pre">demos/llm_routing/model_routing_service/</span></code> directory includes ready-to-use manifests:</p>
<li><p><codeclass="docutils literal notranslate"><spanclass="pre">vllm-deployment.yaml</span></code> — Plano-Orchestrator served by vLLM, with an init container to download
<li><p><codeclass="docutils literal notranslate"><spanclass="pre">plano-deployment.yaml</span></code> — Plano proxy configured to use the in-cluster Plano-Orchestrator</p></li>
<li><p><codeclass="docutils literal notranslate"><spanclass="pre">config_k8s.yaml</span></code> — Plano config with <codeclass="docutils literal notranslate"><spanclass="pre">llm_routing_model</span></code> pointing at
<codeclass="docutils literal notranslate"><spanclass="pre">http://plano-orchestrator:10000</span></code> instead of the default hosted endpoint</p></li>
<li><p>GPU nodes commonly have a <codeclass="docutils literal notranslate"><spanclass="pre">nvidia.com/gpu:NoSchedule</span></code> taint — the <codeclass="docutils literal notranslate"><spanclass="pre">vllm-deployment.yaml</span></code>
includes a matching toleration. The <codeclass="docutils literal notranslate"><spanclass="pre">nvidia.com/gpu:</span><spanclass="pre">"1"</span></code> resource request is sufficient
for scheduling in most clusters; a <codeclass="docutils literal notranslate"><spanclass="pre">nodeSelector</span></code> is optional and commented out in the
manifest for cases where you need to pin to a specific GPU node pool.</p></li>
<li><p>Model download takes ~1 minute; vLLM loads the model in ~1-2 minutes after that. The
<codeclass="docutils literal notranslate"><spanclass="pre">livenessProbe</span></code> has a 180-second <codeclass="docutils literal notranslate"><spanclass="pre">initialDelaySeconds</span></code> to avoid premature restarts.</p></li>
<li><p>The Plano config ConfigMap must use <codeclass="docutils literal notranslate"><spanclass="pre">--from-file=plano_config.yaml=config_k8s.yaml</span></code> with
<codeclass="docutils literal notranslate"><spanclass="pre">subPath</span></code> in the Deployment — omitting <codeclass="docutils literal notranslate"><spanclass="pre">subPath</span></code> causes Kubernetes to mount a directory
instead of a file.</p></li>
</ul>
<p>For the canonical Plano Kubernetes deployment (ConfigMap, Secrets, Deployment YAML), see
<aclass="reference internal"href="../resources/deployment.html#deployment"><spanclass="std std-ref">Deployment</span></a>. For full step-by-step commands specific to this demo, see the
<p>In agentic loops — where a single user request triggers multiple LLM calls through tool use — Plano’s router classifies each turn independently. Because successive prompts differ in intent (tool selection looks like code generation, reasoning about results looks like analysis), the router may select different models mid-session. This causes behavioral inconsistency and invalidates provider-side KV caches, increasing both latency and cost.</p>
<p><strong>Model affinity</strong> pins the routing decision for the duration of a session. Send an <codeclass="docutils literal notranslate"><spanclass="pre">X-Model-Affinity</span></code> header with any string identifier (typically a UUID). The first request routes normally and caches the result. All subsequent requests with the same affinity ID skip routing and reuse the cached model.</p>
</span><spanid="line-2"><spanclass="w"></span><spanclass="nt">session_ttl_seconds</span><spanclass="p">:</span><spanclass="w"></span><spanclass="l l-Scalar l-Scalar-Plain">600</span><spanclass="w"></span><spanclass="c1"># How long affinity lasts (default: 10 min)</span>
</span><spanid="line-3"><spanclass="w"></span><spanclass="nt">session_max_entries</span><spanclass="p">:</span><spanclass="w"></span><spanclass="l l-Scalar l-Scalar-Plain">10000</span><spanclass="w"></span><spanclass="c1"># Max cached sessions (upper limit: 10000)</span>
</span></code></pre></div>
</div>
<p>To start a new routing decision (e.g., when the agent’s task changes), generate a new affinity ID.</p>
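<p>On the client side, this amounts to keeping one identifier per task and replacing it when the task changes. A minimal sketch, assuming a hypothetical <code class="docutils literal notranslate"><span class="pre">AffinitySession</span></code> helper (only the <code class="docutils literal notranslate"><span class="pre">X-Model-Affinity</span></code> header name comes from the text above):</p>

```python
import uuid


class AffinitySession:
    """Hypothetical helper that holds one affinity ID and its header."""

    def __init__(self) -> None:
        # Any string identifier works; a UUID is the typical choice.
        self.affinity_id = str(uuid.uuid4())

    def headers(self) -> dict:
        # Attach these headers to every LLM call in the same agentic loop.
        return {"X-Model-Affinity": self.affinity_id}

    def rotate(self) -> str:
        # A new ID means the next request is routed normally and pinned again.
        self.affinity_id = str(uuid.uuid4())
        return self.affinity_id


session = AffinitySession()
turn_1 = session.headers()
turn_2 = session.headers()  # same ID, so the pinned model is reused
session.rotate()            # the agent's task changed
turn_3 = session.headers()  # different ID, so routing runs again
```

<p>The returned dictionary can be passed as extra headers to whichever HTTP or OpenAI-compatible client you already use to call Plano.</p>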
<h3>Session Cache Backends<a@click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)"aria-label="Copy link to this element"class="headerlink"data-tooltip="Copy link to this element"href="#session-cache-backends"x-intersect.margin.0%.0%.-70%.0%="activeSection ='#session-cache-backends'"><svgheight="1em"viewbox="0 0 24 24"width="1em"xmlns="http://www.w3.org/2000/svg"><pathd="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>By default, Plano stores session affinity state in an in-process LRU cache. This works well for single-instance deployments, but sessions are not shared across replicas — each instance has its own independent cache.</p>
<p>For deployments with multiple Plano replicas (Kubernetes, Docker Compose with <codeclass="docutils literal notranslate"><spanclass="pre">scale</span></code>, or any load-balanced setup), use Redis as the session cache backend. All replicas connect to the same Redis instance, so an affinity decision made by one replica is honoured by every other replica in the pool.</p>
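<p>The difference between per-replica and shared state can be sketched with a toy simulation. The <code class="docutils literal notranslate"><span class="pre">Replica</span></code> class and the model names below are hypothetical stand-ins rather than Plano code, and a plain dictionary plays the role of the cache backend (TTL and eviction are left out for brevity).</p>

```python
from typing import Callable, Dict


class Replica:
    """Toy stand-in for one Plano replica (not Plano's actual code)."""

    def __init__(self, name: str, store: Dict[str, str]) -> None:
        self.name = name
        self.store = store  # affinity ID -> pinned model

    def handle(self, affinity_id: str, route: Callable[[], str]) -> str:
        # The first request for an ID routes normally and caches the result;
        # later requests skip routing and reuse the cached model.
        if affinity_id not in self.store:
            self.store[affinity_id] = route()
        return self.store[affinity_id]


# In-process caches: each replica owns a separate dictionary.
a, b = Replica("a", {}), Replica("b", {})
a.handle("abc-123", lambda: "model-x")
isolated = b.handle("abc-123", lambda: "model-y")
print(isolated)  # model-y: replica b never saw replica a's decision

# Shared backend (the role Redis plays): both replicas use one store.
shared: Dict[str, str] = {}
c, d = Replica("c", shared), Replica("d", shared)
c.handle("abc-123", lambda: "model-x")
pinned = d.handle("abc-123", lambda: "model-y")
print(pinned)  # model-x: replica d reuses replica c's decision
```

<p>With separate stores, a load balancer that alternates between replicas can undo the pinning; with a shared store, the first decision holds regardless of which replica serves the request.</p>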
<p><strong>In-memory (default)</strong></p>
<p>No configuration required. Sessions live only for the lifetime of the process and are lost on restart.</p>
</span><spanid="line-2"><spanclass="w"></span><spanclass="nt">session_ttl_seconds</span><spanclass="p">:</span><spanclass="w"></span><spanclass="l l-Scalar l-Scalar-Plain">600</span><spanclass="w"></span><spanclass="c1"># How long affinity lasts (default: 10 min)</span>
<p>Requires a reachable Redis instance. The <codeclass="docutils literal notranslate"><spanclass="pre">url</span></code> field supports standard Redis URI syntax, including authentication (<codeclass="docutils literal notranslate"><spanclass="pre">redis://:password@host:6379</span></code>) and TLS (<codeclass="docutils literal notranslate"><spanclass="pre">rediss://host:6380</span></code>). Redis handles TTL expiry natively, so no periodic cleanup is needed.</p>
<p>When using Redis in a multi-tenant environment, construct the <codeclass="docutils literal notranslate"><spanclass="pre">X-Model-Affinity</span></code> header value to include a tenant identifier, for example <codeclass="docutils literal notranslate"><spanclass="pre">{tenant_id}:{session_id}</span></code>. Plano stores each key under the internal namespace <codeclass="docutils literal notranslate"><spanclass="pre">plano:affinity:{key}</span></code>, so tenant-scoped values avoid cross-tenant collisions without any additional configuration.</p>
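<p>A small sketch of the tenant-scoped header value and the key it would map to. The helper names are hypothetical; only the <code class="docutils literal notranslate"><span class="pre">{tenant_id}:{session_id}</span></code> format and the <code class="docutils literal notranslate"><span class="pre">plano:affinity:{key}</span></code> namespace are taken from the paragraph above.</p>

```python
def affinity_value(tenant_id: str, session_id: str) -> str:
    """Build a tenant-scoped X-Model-Affinity header value."""
    return f"{tenant_id}:{session_id}"


def redis_key(header_value: str) -> str:
    """Key shape described above: plano:affinity:{key}."""
    return f"plano:affinity:{header_value}"


# Two tenants that happen to reuse the same session ID.
key_a = redis_key(affinity_value("tenant-a", "session-1"))
key_b = redis_key(affinity_value("tenant-b", "session-1"))

print(key_a)  # plano:affinity:tenant-a:session-1
print(key_b)  # plano:affinity:tenant-b:session-1
```

<p>Because the tenant identifier is part of the header value, the two sessions resolve to different keys even though their session IDs are identical.</p>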
<p>With this configuration, any replica that first receives a request for affinity ID <codeclass="docutils literal notranslate"><spanclass="pre">abc-123</span></code> caches the routing decision in Redis. Subsequent requests for <codeclass="docutils literal notranslate"><spanclass="pre">abc-123</span></code> — regardless of which replica they land on — retrieve the same pinned model.</p>
</span><spanid="line-16"><spanclass="w"></span><spanclass="nt">description</span><spanclass="p">:</span><spanclass="w"></span><spanclass="l l-Scalar l-Scalar-Plain">deep analysis and complex problem solving</span>
<li><p><strong>Use direct model selection</strong>: <codeclass="docutils literal notranslate"><spanclass="pre">model="fast-model"</span></code></p></li>
<li><p><strong>Let the router decide</strong>: No model specified, router analyzes content</p></li>
<h2>Example Use Cases<a@click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)"aria-label="Copy link to this element"class="headerlink"data-tooltip="Copy link to this element"href="#example-use-cases"x-intersect.margin.0%.0%.-70%.0%="activeSection ='#example-use-cases'"><svgheight="1em"viewbox="0 0 24 24"width="1em"xmlns="http://www.w3.org/2000/svg"><pathd="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<li><p><strong>Coding Tasks</strong>: Distinguish between code generation requests (“write a Python function”), debugging needs (“fix this error”), and code optimization (“make this faster”), routing each to appropriately specialized models.</p></li>
<li><p><strong>Content Processing Workflows</strong>: Classify requests as summarization (“summarize this document”), translation (“translate to Spanish”), or analysis (“what are the key themes”), enabling targeted model selection.</p></li>
<li><p><strong>Multi-Domain Applications</strong>: Accurately identify whether requests fall into legal, healthcare, technical, or general domains, even when the subject matter isn’t explicitly stated in the prompt.</p></li>
<li><p><strong>Conversational Routing</strong>: Track conversation context to identify when topics shift between domains or when the type of assistance needed changes mid-conversation.</p></li>
<li><p><strong>💡 Clear Usage Description:</strong> Make your route names and descriptions specific and unambiguous, and minimize overlap between routes. The router performs better when it can clearly distinguish between different types of requests.</p>
<li><p><strong>💡 Noun Descriptors:</strong> Preference-based routers perform better with noun-centric descriptors, as they offer more stable and semantically rich signals for matching.</p></li>
<li><p><strong>💡 Domain Inclusion:</strong> For the best user experience, always include a domain route. This lets the router fall back to the domain when the action is not confidently inferred.</p></li>
</ul>
</section>
<sectionid="unsupported-features">
<h2>Unsupported Features<a@click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)"aria-label="Copy link to this element"class="headerlink"data-tooltip="Copy link to this element"href="#unsupported-features"x-intersect.margin.0%.0%.-70%.0%="activeSection ='#unsupported-features'"><svgheight="1em"viewbox="0 0 24 24"width="1em"xmlns="http://www.w3.org/2000/svg"><pathd="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<li><p><strong>Multi-modality</strong>: The model is not trained to process raw image or audio inputs. It can handle textual queries <em>about</em> these modalities (e.g., “generate an image of a cat”), but cannot interpret encoded multimedia data directly.</p></li>
<li><p><strong>Function calling</strong>: Plano-Orchestrator is designed for <strong>semantic preference matching</strong>, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.</p></li>
<li><p><strong>System prompt dependency</strong>: Plano-Orchestrator routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.</p></li>