adilhafeez 2026-04-15 23:42:15 +00:00
parent 0dd2552f91
commit 07b84a0d42
35 changed files with 105 additions and 105 deletions


@ -34,7 +34,7 @@ model_providers:
# routing_preferences: tags a model with named capabilities so Plano's LLM router
# can select the best model for each request based on intent. Requires the
# Arch-Router model (or equivalent) to be configured in overrides.llm_routing_model.
# Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
# Each preference has a name (short label) and a description (used for intent matching).
- model: groq/llama-3.3-70b-versatile
access_key: $GROQ_API_KEY
@ -170,7 +170,7 @@ overrides:
# Path to the trusted CA bundle for upstream TLS verification
upstream_tls_ca_path: /etc/ssl/certs/ca-certificates.crt
# Model used for intent-based LLM routing (must be listed in model_providers)
llm_routing_model: Arch-Router
llm_routing_model: Plano-Orchestrator
# Model used for agent orchestration (must be listed in model_providers)
agent_orchestration_model: Plano-Orchestrator
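
The comments in this hunk state that both override models must be listed in model_providers. A minimal sketch of a pre-deploy check for that rule follows. It assumes the config has already been parsed into a dict and that an override matches a provider entry by its `name` or `model` field, with or without the provider prefix. The matching rule, the helper names, and the `plano/Plano-Orchestrator` entry are assumptions for illustration, not Plano's actual validation logic.

```python
from typing import Any, Dict, Set

# Override keys that, per the config comments, must reference a listed provider.
ROUTING_KEYS = ("llm_routing_model", "agent_orchestration_model")


def provider_names(config: Dict[str, Any]) -> Set[str]:
    """Collect every identifier a provider entry could be referenced by.

    Assumption: an entry may be referenced by its full `model` value
    (e.g. "groq/llama-3.3-70b-versatile"), by the part after the provider
    prefix, or by an optional `name` field.
    """
    names: Set[str] = set()
    for entry in config.get("model_providers", []):
        for key in ("name", "model"):
            value = entry.get(key)
            if isinstance(value, str):
                names.add(value)
                names.add(value.split("/", 1)[-1])
    return names


def missing_override_models(config: Dict[str, Any]) -> Dict[str, str]:
    """Return the override keys whose model is not found in model_providers."""
    known = provider_names(config)
    overrides = config.get("overrides", {})
    return {
        key: overrides[key]
        for key in ROUTING_KEYS
        if key in overrides and overrides[key] not in known
    }


if __name__ == "__main__":
    example = {
        "model_providers": [
            {"model": "groq/llama-3.3-70b-versatile"},
            # Hypothetical entry; the real provider prefix is not shown in the diff.
            {"model": "plano/Plano-Orchestrator"},
        ],
        "overrides": {
            "llm_routing_model": "Plano-Orchestrator",
            "agent_orchestration_model": "Plano-Orchestrator",
        },
    }
    print(missing_override_models(example))
```

Running a check like this after a rename such as Arch-Router to Plano-Orchestrator would flag an override that still points at a model name no longer present in the provider list.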


@ -267,7 +267,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -333,7 +333,7 @@ powerful abstraction for evolving your agent workflows over time.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -270,7 +270,7 @@ application to LLMs (API-based or hosted) via prompt targets.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -660,7 +660,7 @@ Implement fallback logic for better reliability:</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -185,7 +185,7 @@ Three powerful routing approaches to optimize model selection:</p>
<ul class="simple">
<li><p>Model-based Routing: Direct routing to specific models using provider/model names (see <a class="reference internal" href="supported_providers.html#supported-providers"><span class="std std-ref">Supported Providers &amp; Configuration</span></a>)</p></li>
<li><p>Alias-based Routing: Semantic routing using custom aliases (see <a class="reference internal" href="model_aliases.html#model-aliases"><span class="std std-ref">Model Aliases</span></a>)</p></li>
<li><p>Preference-aligned Routing: Intelligent routing using the Plano-Router model (see <a class="reference internal" href="../../guides/llm_router.html#preference-aligned-routing"><span class="std std-ref">Preference-aligned routing (Arch-Router)</span></a>)</p></li>
<li><p>Preference-aligned Routing: Intelligent routing using the Plano-Router model (see <a class="reference internal" href="../../guides/llm_router.html#preference-aligned-routing"><span class="std std-ref">Preference-aligned routing (Plano-Orchestrator)</span></a>)</p></li>
</ul>
<p><strong>Unified Client Interface</strong>
Use your preferred client library without changing existing code (see <a class="reference internal" href="client_libraries.html#client-libraries"><span class="std std-ref">Client Libraries</span></a> for details):</p>
@ -225,7 +225,7 @@ Use your preferred client library without changing existing code (see <a class="
<section id="advanced-features">
<h2>Advanced Features<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#advanced-features" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#advanced-features'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<ul class="simple">
<li><p><a class="reference internal" href="../../guides/llm_router.html#preference-aligned-routing"><span class="std std-ref">Preference-aligned routing (Arch-Router)</span></a> - Learn about preference-aligned dynamic routing and intelligent model selection</p></li>
<li><p><a class="reference internal" href="../../guides/llm_router.html#preference-aligned-routing"><span class="std std-ref">Preference-aligned routing (Plano-Orchestrator)</span></a> - Learn about preference-aligned dynamic routing and intelligent model selection</p></li>
</ul>
</section>
<section id="getting-started">
@ -304,7 +304,7 @@ Use your preferred client library without changing existing code (see <a class="
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -434,7 +434,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -1190,7 +1190,7 @@ Any provider that implements the OpenAI API interface can be configured using cu
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -473,7 +473,7 @@ that you can test and modify locally for multi-turn RAG scenarios.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -540,7 +540,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -226,7 +226,7 @@ This gives Plano several advantages:</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -337,7 +337,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -521,7 +521,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -372,7 +372,7 @@ on the stuff that matters most.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>


@ -180,8 +180,8 @@
<section id="configuration">
<h4>Configuration<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#configuration"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h4>
<p>Configure your LLM providers with specific provider/model names:</p>
<div class="literal-block-wrapper docutils container" id="id10">
<div class="code-block-caption"><span class="caption-text">Model-based Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id10"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="literal-block-wrapper docutils container" id="id9">
<div class="code-block-caption"><span class="caption-text">Model-based Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id9"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">listeners</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">egress_traffic</span><span class="p">:</span>
</span><span id="line-3"><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>
@ -231,8 +231,8 @@
<section id="id3">
<h4>Configuration<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id3"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h4>
<p>Configure semantic aliases that map to underlying models:</p>
<div class="literal-block-wrapper docutils container" id="id11">
<div class="code-block-caption"><span class="caption-text">Alias-based Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id11"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="literal-block-wrapper docutils container" id="id10">
<div class="code-block-caption"><span class="caption-text">Alias-based Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id10"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">listeners</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">egress_traffic</span><span class="p">:</span>
</span><span id="line-3"><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>
@ -281,20 +281,20 @@
</div>
</section>
</section>
<section id="preference-aligned-routing-arch-router">
<span id="preference-aligned-routing"></span><h3>Preference-aligned routing (Arch-Router)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#preference-aligned-routing-arch-router" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#preference-aligned-routing-arch-router'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>Preference-aligned routing uses the <a class="reference external" href="https://huggingface.co/katanemo/Arch-Router-1.5B" rel="nofollow noopener">Arch-Router<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a> model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.</p>
<section id="preference-aligned-routing-plano-orchestrator">
<span id="preference-aligned-routing"></span><h3>Preference-aligned routing (Plano-Orchestrator)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#preference-aligned-routing-plano-orchestrator" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#preference-aligned-routing-plano-orchestrator'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>Preference-aligned routing uses the <a class="reference external" href="https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B" rel="nofollow noopener">Plano-Orchestrator<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a> model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.</p>
<ul class="simple">
<li><p><strong>Domain</strong>: High-level topic of the request (e.g., legal, healthcare, programming).</p></li>
<li><p><strong>Action</strong>: What the user wants to do (e.g., summarize, generate code, translate).</p></li>
<li><p><strong>Routing preferences</strong>: Your mapping from (domain, action) to preferred models.</p></li>
</ul>
<p>Arch-Router analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples <strong>routing policy</strong> (how to choose) from <strong>model assignment</strong> (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.</p>
<p>Plano-Orchestrator analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples <strong>routing policy</strong> (how to choose) from <strong>model assignment</strong> (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.</p>
<section id="id5">
<h4>Configuration<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id5"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h4>
<p>To configure preference-aligned dynamic routing, define routing preferences that map domains and actions to specific models:</p>
<div class="literal-block-wrapper docutils container" id="id12">
<div class="code-block-caption"><span class="caption-text">Preference-Aligned Dynamic Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id12"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="literal-block-wrapper docutils container" id="id11">
<div class="code-block-caption"><span class="caption-text">Preference-Aligned Dynamic Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id11"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">listeners</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">egress_traffic</span><span class="p">:</span>
</span><span id="line-3"><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>
@ -329,7 +329,7 @@
<section id="id6">
<h4>Client usage<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id6"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h4>
<p>Clients can let the router decide or still specify aliases:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="c1"># Let Arch-Router choose based on content</span>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="c1"># Let Plano-Orchestrator choose based on content</span>
</span><span id="line-2"><span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">chat</span><span class="o">.</span><span class="n">completions</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span><span id="line-3"> <span class="n">messages</span><span class="o">=</span><span class="p">[{</span><span class="s2">"role"</span><span class="p">:</span> <span class="s2">"user"</span><span class="p">,</span> <span class="s2">"content"</span><span class="p">:</span> <span class="s2">"Write a creative story about space exploration"</span><span class="p">}]</span>
</span><span id="line-4"> <span class="c1"># No model specified - router will analyze and choose claude-sonnet-4-5</span>
@ -340,22 +340,22 @@
</section>
</section>
<section id="id7">
<h2>Arch-Router<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id7" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#id7'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>The <a class="reference external" href="https://huggingface.co/katanemo/Arch-Router-1.5B" rel="nofollow noopener">Arch-Router<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a> is a state-of-the-art <strong>preference-based routing model</strong> specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.</p>
<h2>Plano-Orchestrator<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id7" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#id7'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>Plano-Orchestrator is a <strong>preference-based routing model</strong> specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.</p>
<p><strong>Addressing Traditional Routing Limitations:</strong></p>
<p><strong>Human Preference Alignment</strong>
Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.</p>
Unlike benchmark-driven approaches, Plano-Orchestrator learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.</p>
<p><strong>Flexible Model Integration</strong>
The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.</p>
<p><strong>Preference-Encoded Routing</strong>
Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.</p>
<p>To support effective routing, Arch-Router introduces two key concepts:</p>
<p>To support effective routing, Plano-Orchestrator introduces two key concepts:</p>
<ul class="simple">
<li><p><strong>Domain</strong>: the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).</p></li>
<li><p><strong>Action</strong>: the specific type of operation the user wants performed (e.g., summarization, code generation, appointment booking, translation).</p></li>
</ul>
<p>Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.</p>
<p>In summary, Arch-Router demonstrates:</p>
<p>Both domain and action configs are associated with preferred models or model variants. At inference time, Plano-Orchestrator analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.</p>
<p>In summary, Plano-Orchestrator demonstrates:</p>
<ul class="simple">
<li><p><strong>Structured Preference Routing</strong>: Aligns each request with model strengths using explicit domain-action mappings.</p></li>
<li><p><strong>Transparent and Controllable</strong>: Makes routing decisions transparent and configurable, empowering users to customize system behavior.</p></li>
@ -363,23 +363,23 @@ Provides a practical mechanism to encode user preferences through domain-action
<li><p><strong>Production-Ready Performance</strong>: Optimized for low-latency, high-throughput applications in multi-model environments.</p></li>
</ul>
</section>
<section id="self-hosting-arch-router">
<h2>Self-hosting Arch-Router<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#self-hosting-arch-router" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#self-hosting-arch-router'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>By default, Plano uses a hosted Arch-Router endpoint. To run Arch-Router locally, you can serve the model yourself using either <strong>Ollama</strong> or <strong>vLLM</strong>.</p>
<section id="self-hosting-plano-orchestrator">
<h2>Self-hosting Plano-Orchestrator<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#self-hosting-plano-orchestrator" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#self-hosting-plano-orchestrator'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>By default, Plano uses a hosted Plano-Orchestrator endpoint. To run Plano-Orchestrator locally, you can serve the model yourself using either <strong>Ollama</strong> or <strong>vLLM</strong>.</p>
<section id="using-ollama-recommended-for-local-development">
<h3>Using Ollama (recommended for local development)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#using-ollama-recommended-for-local-development" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#using-ollama-recommended-for-local-development'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<ol class="arabic">
<li><p><strong>Install Ollama</strong></p>
<p>Download and install from <a class="reference external" href="https://ollama.ai" rel="nofollow noopener">ollama.ai<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>.</p>
</li>
<li><p><strong>Pull and serve Arch-Router</strong></p>
<li><p><strong>Pull and serve the routing model</strong></p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><code><span id="line-1">ollama<span class="w"> </span>pull<span class="w"> </span>hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
</span><span id="line-2">ollama<span class="w"> </span>serve
</span></code></pre></div>
</div>
<p>This downloads the quantized GGUF model from HuggingFace and starts serving on <code class="docutils literal notranslate"><span class="pre">http://localhost:11434</span></code>.</p>
</li>
<li><p><strong>Configure Plano to use local Arch-Router</strong></p>
<li><p><strong>Configure Plano to use the local routing model</strong></p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">overrides</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M</span>
</span><span id="line-3">
@ -434,7 +434,7 @@ Provides a practical mechanism to encode user preferences through domain-action
</span><span id="line-7"><span class="w"> </span>--load-format<span class="w"> </span>gguf<span class="w"> </span><span class="se">\</span>
</span><span id="line-8"><span class="w"> </span>--chat-template<span class="w"> </span><span class="si">${</span><span class="nv">SNAPSHOT_DIR</span><span class="si">}</span>template.jinja<span class="w"> </span><span class="se">\</span>
</span><span id="line-9"><span class="w"> </span>--tokenizer<span class="w"> </span>katanemo/Arch-Router-1.5B<span class="w"> </span><span class="se">\</span>
</span><span id="line-10"><span class="w"> </span>--served-model-name<span class="w"> </span>Arch-Router<span class="w"> </span><span class="se">\</span>
</span><span id="line-10"><span class="w"> </span>--served-model-name<span class="w"> </span>Plano-Orchestrator<span class="w"> </span><span class="se">\</span>
</span><span id="line-11"><span class="w"> </span>--gpu-memory-utilization<span class="w"> </span><span class="m">0</span>.3<span class="w"> </span><span class="se">\</span>
</span><span id="line-12"><span class="w"> </span>--tensor-parallel-size<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
</span><span id="line-13"><span class="w"> </span>--enable-prefix-caching
@ -443,10 +443,10 @@ Provides a practical mechanism to encode user preferences through domain-action
</li>
<li><p><strong>Configure Plano to use the vLLM endpoint</strong></p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">overrides</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano/Arch-Router</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano/Plano-Orchestrator</span>
</span><span id="line-3">
</span><span id="line-4"><span class="nt">model_providers</span><span class="p">:</span>
</span><span id="line-5"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano/Arch-Router</span>
</span><span id="line-5"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano/Plano-Orchestrator</span>
</span><span id="line-6"><span class="w"> </span><span class="nt">base_url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://&lt;your-server-ip&gt;:10000</span>
</span><span id="line-7">
</span><span id="line-8"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">openai/gpt-5.2</span>
@ -471,14 +471,14 @@ Provides a practical mechanism to encode user preferences through domain-action
</section>
<section id="using-vllm-on-kubernetes-gpu-nodes">
<h3>Using vLLM on Kubernetes (GPU nodes)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#using-vllm-on-kubernetes-gpu-nodes" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#using-vllm-on-kubernetes-gpu-nodes'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
<p>For teams running Kubernetes, Plano-Orchestrator and Plano can be deployed as in-cluster services.
The <code class="docutils literal notranslate"><span class="pre">demos/llm_routing/model_routing_service/</span></code> directory includes ready-to-use manifests:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">vllm-deployment.yaml</span></code> – Arch-Router served by vLLM, with an init container to download
<li><p><code class="docutils literal notranslate"><span class="pre">vllm-deployment.yaml</span></code> – Plano-Orchestrator served by vLLM, with an init container to download
the model from HuggingFace</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">plano-deployment.yaml</span></code> — Plano proxy configured to use the in-cluster Arch-Router</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">plano-deployment.yaml</span></code> — Plano proxy configured to use the in-cluster Plano-Orchestrator</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">config_k8s.yaml</span></code> — Plano config with <code class="docutils literal notranslate"><span class="pre">llm_routing_model</span></code> pointing at
<code class="docutils literal notranslate"><span class="pre">http://arch-router:10000</span></code> instead of the default hosted endpoint</p></li>
<code class="docutils literal notranslate"><span class="pre">http://plano-orchestrator:10000</span></code> instead of the default hosted endpoint</p></li>
</ul>
<p>Key things to know before deploying:</p>
<ul class="simple">
@ -498,7 +498,7 @@ instead of a file.</p></li>
</section>
</section>
<section id="model-affinity">
<span id="id9"></span><h2>Model Affinity<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#model-affinity" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#model-affinity'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<span id="id8"></span><h2>Model Affinity<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#model-affinity" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#model-affinity'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>In agentic loops, where a single user request triggers multiple LLM calls through tool use, Plano's router classifies each turn independently. Because successive prompts differ in intent (tool selection looks like code generation, reasoning about results looks like analysis), the router may select different models mid-session. This causes behavioral inconsistency and invalidates provider-side KV caches, increasing both latency and cost.</p>
<p><strong>Model affinity</strong> pins the routing decision for the duration of a session. Send an <code class="docutils literal notranslate"><span class="pre">X-Model-Affinity</span></code> header with any string identifier (typically a UUID). The first request routes normally and caches the result. All subsequent requests with the same affinity ID skip routing and reuse the cached model.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="kn">import</span><span class="w"> </span><span class="nn">uuid</span>
@ -563,8 +563,8 @@ instead of a file.</p></li>
<section id="combining-routing-methods">
<h2>Combining Routing Methods<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#combining-routing-methods" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#combining-routing-methods'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>You can combine static model selection with dynamic routing preferences for maximum flexibility:</p>
<div class="literal-block-wrapper docutils container" id="id13">
<div class="code-block-caption"><span class="caption-text">Hybrid Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id13"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="literal-block-wrapper docutils container" id="id12">
<div class="code-block-caption"><span class="caption-text">Hybrid Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id12"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">llm_providers</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">openai/gpt-5.2</span>
</span><span id="line-3"><span class="w"> </span><span class="nt">access_key</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">$OPENAI_API_KEY</span>
@ -604,7 +604,7 @@ instead of a file.</p></li>
</section>
<section id="example-use-cases">
<h2>Example Use Cases<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#example-use-cases" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#example-use-cases'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>Here are common scenarios where Arch-Router excels:</p>
<p>Here are common scenarios where Plano-Orchestrator excels:</p>
<ul class="simple">
<li><p><strong>Coding Tasks</strong>: Distinguish between code generation requests (“write a Python function”), debugging needs (“fix this error”), and code optimization (“make this faster”), routing each to appropriately specialized models.</p></li>
<li><p><strong>Content Processing Workflows</strong>: Classify requests as summarization (“summarize this document”), translation (“translate to Spanish”), or analysis (“what are the key themes”), enabling targeted model selection.</p></li>
@ -645,11 +645,11 @@ instead of a file.</p></li>
</section>
<section id="unsupported-features">
<h2>Unsupported Features<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#unsupported-features" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#unsupported-features'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>The following features are <strong>not supported</strong> by the Arch-Router model:</p>
<p>The following features are <strong>not supported</strong> by the Plano-Orchestrator routing model:</p>
<ul class="simple">
<li><p><strong>Multi-modality</strong>: The model is not trained to process raw image or audio inputs. It can handle textual queries <em>about</em> these modalities (e.g., “generate an image of a cat”), but cannot interpret encoded multimedia data directly.</p></li>
<li><p><strong>Function calling</strong>: Arch-Router is designed for <strong>semantic preference matching</strong>, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.</p></li>
<li><p><strong>System prompt dependency</strong>: Arch-Router routes based solely on the user's conversation history. It does not use or rely on system prompts for routing decisions.</p></li>
<li><p><strong>Function calling</strong>: Plano-Orchestrator is designed for <strong>semantic preference matching</strong>, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.</p></li>
<li><p><strong>System prompt dependency</strong>: Plano-Orchestrator routes based solely on the user's conversation history. It does not use or rely on system prompts for routing decisions.</p></li>
</ul>
</section>
</section>
@ -684,15 +684,15 @@ instead of a file.</p></li>
<li><a :data-current="activeSection === '#id4'" class="reference internal" href="#id4">Client usage</a></li>
</ul>
</li>
<li><a :data-current="activeSection === '#preference-aligned-routing-arch-router'" class="reference internal" href="#preference-aligned-routing-arch-router">Preference-aligned routing (Arch-Router)</a><ul>
<li><a :data-current="activeSection === '#preference-aligned-routing-plano-orchestrator'" class="reference internal" href="#preference-aligned-routing-plano-orchestrator">Preference-aligned routing (Plano-Orchestrator)</a><ul>
<li><a :data-current="activeSection === '#id5'" class="reference internal" href="#id5">Configuration</a></li>
<li><a :data-current="activeSection === '#id6'" class="reference internal" href="#id6">Client usage</a></li>
</ul>
</li>
</ul>
</li>
<li><a :data-current="activeSection === '#id7'" class="reference internal" href="#id7">Arch-Router</a></li>
<li><a :data-current="activeSection === '#self-hosting-arch-router'" class="reference internal" href="#self-hosting-arch-router">Self-hosting Arch-Router</a><ul>
<li><a :data-current="activeSection === '#id7'" class="reference internal" href="#id7">Plano-Orchestrator</a></li>
<li><a :data-current="activeSection === '#self-hosting-plano-orchestrator'" class="reference internal" href="#self-hosting-plano-orchestrator">Self-hosting Plano-Orchestrator</a><ul>
<li><a :data-current="activeSection === '#using-ollama-recommended-for-local-development'" class="reference internal" href="#using-ollama-recommended-for-local-development">Using Ollama (recommended for local development)</a></li>
<li><a :data-current="activeSection === '#using-vllm-recommended-for-production-ec2'" class="reference internal" href="#using-vllm-recommended-for-production-ec2">Using vLLM (recommended for production / EC2)</a></li>
<li><a :data-current="activeSection === '#using-vllm-on-kubernetes-gpu-nodes'" class="reference internal" href="#using-vllm-on-kubernetes-gpu-nodes">Using vLLM on Kubernetes (GPU nodes)</a></li>
@ -714,7 +714,7 @@ instead of a file.</p></li>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

View file

@ -248,7 +248,7 @@ Access logs can be exported to centralized logging systems (e.g., ELK stack or F
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

View file

@ -260,7 +260,7 @@ are some sample configuration files for both, respectively.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

View file

@ -216,7 +216,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

View file

@ -792,7 +792,7 @@ tools like AWS X-Ray and Datadog, enhancing observability and facilitating faste
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

View file

@ -1003,7 +1003,7 @@ Plano makes it easy to build and scale these systems by managing the orchestrati
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

View file

@ -298,7 +298,7 @@ the agent. If validation fails (<code class="docutils literal notranslate"><span
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

View file

@ -453,7 +453,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

View file

@ -1,6 +1,6 @@
Plano Docs v0.4.18
llms.txt (auto-generated)
Generated (UTC): 2026-04-14T02:31:14.825020+00:00
Generated (UTC): 2026-04-15T23:42:11.682797+00:00
Table of contents
- Agents (concepts/agents)
@ -3760,9 +3760,9 @@ response = client.chat.completions.create(
Preference-aligned routing (Arch-Router)
Preference-aligned routing (Plano-Orchestrator)
Preference-aligned routing uses the Arch-Router model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
Preference-aligned routing uses the Plano-Orchestrator model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
Domain: High-level topic of the request (e.g., legal, healthcare, programming).
@ -3770,7 +3770,7 @@ Action: What the user wants to do (e.g., summarize, generate code, translate).
Routing preferences: Your mapping from (domain, action) to preferred models.
Arch-Router analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
Plano-Orchestrator analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
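The split between routing policy and model assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Preference class, the select_model function, and the example/... model names are hypothetical and are not part of Plano's API or configuration. The sketch assumes the routing model has already inferred the domain and action, and shows only the lookup that follows.

```python
from dataclasses import dataclass


# Hypothetical illustration: none of these names come from Plano itself.
@dataclass(frozen=True)
class Preference:
    domain: str   # high-level topic, e.g. "programming"
    action: str   # what the user wants done, e.g. "code generation"
    model: str    # model assigned to this (domain, action) pair


# Model assignment: edit this table to add or swap models.
PREFERENCES = [
    Preference("programming", "code generation", "example/code-model"),
    Preference("legal", "summarization", "example/long-context-model"),
]
DEFAULT_MODEL = "example/general-model"


def select_model(domain, action, preferences=PREFERENCES, default=DEFAULT_MODEL):
    """Routing policy: return the model mapped to (domain, action), else the default."""
    for pref in preferences:
        if pref.domain == domain and pref.action == action:
            return pref.model
    return default
```

In this sketch, swapping a model only changes an entry in the preference table, and the selection function stays the same. That is the sense in which the policy and the assignment are decoupled.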
Configuration
@ -3810,20 +3810,20 @@ Client usage
Clients can let the router decide or still specify aliases:
# Let Arch-Router choose based on content
# Let Plano-Orchestrator choose based on content
response = client.chat.completions.create(
messages=[{"role": "user", "content": "Write a creative story about space exploration"}]
# No model specified - router will analyze and choose claude-sonnet-4-5
)
Arch-Router
Plano-Orchestrator
The Arch-Router is a state-of-the-art preference-based routing model specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
Plano-Orchestrator is a preference-based routing model specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
Addressing Traditional Routing Limitations:
Human Preference Alignment
Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
Unlike benchmark-driven approaches, Plano-Orchestrator learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
Flexible Model Integration
The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.
@ -3831,15 +3831,15 @@ The system supports seamlessly adding new models for routing without requiring r
Preference-Encoded Routing
Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.
To support effective routing, Arch-Router introduces two key concepts:
To support effective routing, Plano-Orchestrator introduces two key concepts:
Domain: the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).
Action: the specific type of operation the user wants performed (e.g., summarization, code generation, appointment booking, translation).
Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
Both domain and action configs are associated with preferred models or model variants. At inference time, Plano-Orchestrator analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
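As a rough illustration of the last step described above, the sketch below applies user-defined preferences once a domain and action have been inferred. It is not Plano's implementation: the `ROUTING_PREFERENCES` table, the `select_model` helper, and the particular domain-action pairs are hypothetical, and the inference step that the routing model performs is assumed to have already happened.

```python
from typing import Dict, Tuple

# Hypothetical preference table: (domain, action) -> preferred model.
# The model names come from the config examples; the pairings are made up.
ROUTING_PREFERENCES: Dict[Tuple[str, str], str] = {
    ("programming", "code generation"): "openai/gpt-5.2",
    ("legal", "summarization"): "groq/llama-3.3-70b-versatile",
}


def select_model(domain: str, action: str, default_model: str) -> str:
    """Return the preferred model for an inferred (domain, action) pair.

    Falls back to default_model when no preference matches (an assumption
    made for this sketch, not documented behavior).
    """
    return ROUTING_PREFERENCES.get((domain, action), default_model)


if __name__ == "__main__":
    print(select_model("programming", "code generation", "openai/gpt-5.2"))
    print(select_model("healthcare", "translation", "groq/llama-3.3-70b-versatile"))
```

Keeping the preferences in a plain lookup table mirrors the point made above: a new model can be added by editing the mapping, without retraining anything.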
In summary, Arch-Router demonstrates:
In summary, Plano-Orchestrator demonstrates:
Structured Preference Routing: Aligns each prompt with model strengths using explicit domain-action mappings.
@ -3849,9 +3849,9 @@ Flexible and Adaptive: Supports evolving user needs, model updates, and new doma
Production-Ready Performance: Optimized for low-latency, high-throughput applications in multi-model environments.
Self-hosting Arch-Router
Self-hosting Plano-Orchestrator
By default, Plano uses a hosted Arch-Router endpoint. To run Arch-Router locally, you can serve the model yourself using either Ollama or vLLM.
By default, Plano uses a hosted Plano-Orchestrator endpoint. To run Plano-Orchestrator locally, you can serve the model yourself using either Ollama or vLLM.
Using Ollama (recommended for local development)
@ -3859,14 +3859,14 @@ Install Ollama
Download and install from ollama.ai.
Pull and serve Arch-Router
Pull and serve the routing model
ollama pull hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
ollama serve
This downloads the quantized GGUF model from HuggingFace and starts serving on http://localhost:11434.
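Before pointing Plano at the local server, it can help to confirm that the model was actually pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models. The helper names are hypothetical, and it is an assumption that the listed model name is identical to the string passed to `ollama pull`.

```python
import json
import urllib.request


def model_is_available(tags_payload: dict, model_name: str) -> bool:
    """Check whether model_name appears in an Ollama /api/tags payload."""
    models = tags_payload.get("models", [])
    return any(entry.get("name") == model_name for entry in models)


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the local model list from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Example usage (requires `ollama serve` to be running):
#   tags = fetch_tags()
#   model_is_available(tags, "hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M")
```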
Configure Plano to use local Arch-Router
Configure Plano to use local routing model
overrides:
llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
@ -3919,7 +3919,7 @@ vllm serve ${SNAPSHOT_DIR}Arch-Router-1.5B-Q4_K_M.gguf \
--load-format gguf \
--chat-template ${SNAPSHOT_DIR}template.jinja \
--tokenizer katanemo/Arch-Router-1.5B \
--served-model-name Arch-Router \
--served-model-name Plano-Orchestrator \
--gpu-memory-utilization 0.3 \
--tensor-parallel-size 1 \
--enable-prefix-caching
@ -3927,10 +3927,10 @@ vllm serve ${SNAPSHOT_DIR}Arch-Router-1.5B-Q4_K_M.gguf \
Configure Plano to use the vLLM endpoint
overrides:
llm_routing_model: plano/Arch-Router
llm_routing_model: plano/Plano-Orchestrator
model_providers:
- model: plano/Arch-Router
- model: plano/Plano-Orchestrator
base_url: http://<your-server-ip>:10000
- model: openai/gpt-5.2
@ -3950,16 +3950,16 @@ curl http://localhost:10000/v1/models
Using vLLM on Kubernetes (GPU nodes)
For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
For teams running Kubernetes, Plano-Orchestrator and Plano can be deployed as in-cluster services.
The demos/llm_routing/model_routing_service/ directory includes ready-to-use manifests:
vllm-deployment.yaml — Arch-Router served by vLLM, with an init container to download
vllm-deployment.yaml — Plano-Orchestrator served by vLLM, with an init container to download
the model from HuggingFace
plano-deployment.yaml — Plano proxy configured to use the in-cluster Arch-Router
plano-deployment.yaml — Plano proxy configured to use the in-cluster Plano-Orchestrator
config_k8s.yaml — Plano config with llm_routing_model pointing at
http://arch-router:10000 instead of the default hosted endpoint
http://plano-orchestrator:10000 instead of the default hosted endpoint
Key things to know before deploying:
@ -4092,7 +4092,7 @@ Let the router decide: No model specified, router analyzes content
Example Use Cases
Here are common scenarios where Arch-Router excels:
Here are common scenarios where Plano-Orchestrator excels:
Coding Tasks: Distinguish between code generation requests (“write a Python function”), debugging needs (“fix this error”), and code optimization (“make this faster”), routing each to appropriately specialized models.
@ -4134,13 +4134,13 @@ Best practices
Unsupported Features
The following features are not supported by the Arch-Router model:
The following features are not supported by the Plano-Orchestrator routing model:
Multi-modality: The model is not trained to process raw image or audio inputs. It can handle textual queries about these modalities (e.g., “generate an image of a cat”), but cannot interpret encoded multimedia data directly.
Function calling: Arch-Router is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.
Function calling: Plano-Orchestrator is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.
System prompt dependency: Arch-Router routes based solely on the user's conversation history. It does not use or rely on system prompts for routing decisions.
System prompt dependency: Plano-Orchestrator routes based solely on the user's conversation history. It does not use or rely on system prompts for routing decisions.
---
@ -6455,7 +6455,7 @@ model_providers:
# routing_preferences: tags a model with named capabilities so Plano's LLM router
# can select the best model for each request based on intent. Requires the
# Arch-Router model (or equivalent) to be configured in overrides.llm_routing_model.
# Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
# Each preference has a name (short label) and a description (used for intent matching).
- model: groq/llama-3.3-70b-versatile
access_key: $GROQ_API_KEY
@ -6591,7 +6591,7 @@ overrides:
# Path to the trusted CA bundle for upstream TLS verification
upstream_tls_ca_path: /etc/ssl/certs/ca-certificates.crt
# Model used for intent-based LLM routing (must be listed in model_providers)
llm_routing_model: Arch-Router
llm_routing_model: Plano-Orchestrator
# Model used for agent orchestration (must be listed in model_providers)
agent_orchestration_model: Plano-Orchestrator
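The comments above note that the routing model must be listed in model_providers. A rename like this one is easy to apply in one place and miss in the other, so a small pre-deploy check may be useful. The sketch below is hypothetical and not part of Plano: it takes the already-parsed config as a dict and assumes a name matches either exactly or after dropping the provider prefix (for example `plano/`).

```python
def routing_model_is_listed(config: dict) -> bool:
    """Return True if overrides.llm_routing_model names a configured provider."""
    routing_model = config.get("overrides", {}).get("llm_routing_model")
    if not routing_model:
        # Nothing to check; assumed here to mean the default endpoint is used.
        return True
    listed = {p.get("model", "") for p in config.get("model_providers", [])}
    # Assumption: a bare name may refer to a provider-prefixed entry.
    short_names = {name.split("/", 1)[-1] for name in listed}
    return routing_model in listed or routing_model in short_names


if __name__ == "__main__":
    cfg = {
        "overrides": {"llm_routing_model": "plano/Plano-Orchestrator"},
        "model_providers": [
            {"model": "plano/Plano-Orchestrator"},
            {"model": "openai/gpt-5.2"},
        ],
    }
    print(routing_model_is_listed(cfg))
```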

@ -247,7 +247,7 @@ Resources</label><div class="sd-tab-content docutils">
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

Binary file not shown.

@ -437,7 +437,7 @@ Use this page as the canonical source for command syntax, options, and recommend
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

@ -203,7 +203,7 @@ where prompts get routed to, apply guardrails, and enable critical agent observa
</span><span id="line-34"><span class="linenos"> 34</span>
</span><span id="line-35"><span class="linenos"> 35</span><span class="w"> </span><span class="c1"># routing_preferences: tags a model with named capabilities so Plano's LLM router</span>
</span><span id="line-36"><span class="linenos"> 36</span><span class="w"> </span><span class="c1"># can select the best model for each request based on intent. Requires the</span>
</span><span id="line-37"><span class="linenos"> 37</span><span class="w"> </span><span class="c1"># Arch-Router model (or equivalent) to be configured in overrides.llm_routing_model.</span>
</span><span id="line-37"><span class="linenos"> 37</span><span class="w"> </span><span class="c1"># Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.</span>
</span><span id="line-38"><span class="linenos"> 38</span><span class="w"> </span><span class="c1"># Each preference has a name (short label) and a description (used for intent matching).</span>
</span><span id="line-39"><span class="linenos"> 39</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">groq/llama-3.3-70b-versatile</span>
</span><span id="line-40"><span class="linenos"> 40</span><span class="w"> </span><span class="nt">access_key</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">$GROQ_API_KEY</span>
@ -339,7 +339,7 @@ where prompts get routed to, apply guardrails, and enable critical agent observa
</span><span id="line-170"><span class="linenos">170</span><span class="w"> </span><span class="c1"># Path to the trusted CA bundle for upstream TLS verification</span>
</span><span id="line-171"><span class="linenos">171</span><span class="w"> </span><span class="nt">upstream_tls_ca_path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/etc/ssl/certs/ca-certificates.crt</span>
</span><span id="line-172"><span class="linenos">172</span><span class="w"> </span><span class="c1"># Model used for intent-based LLM routing (must be listed in model_providers)</span>
</span><span id="line-173"><span class="linenos">173</span><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Arch-Router</span>
</span><span id="line-173"><span class="linenos">173</span><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Plano-Orchestrator</span>
</span><span id="line-174"><span class="linenos">174</span><span class="w"> </span><span class="c1"># Model used for agent orchestration (must be listed in model_providers)</span>
</span><span id="line-175"><span class="linenos">175</span><span class="w"> </span><span class="nt">agent_orchestration_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Plano-Orchestrator</span>
</span><span id="line-176"><span class="linenos">176</span>
@ -414,7 +414,7 @@ where prompts get routed to, apply guardrails, and enable critical agent observa
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

@ -542,7 +542,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

@ -179,7 +179,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

@ -199,7 +199,7 @@ own deployments), and Plano reaches them via HTTP.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

@ -485,7 +485,7 @@ processing request headers and then finalized by the HCM during post-request pro
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

@ -200,7 +200,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

@ -200,7 +200,7 @@ hardware threads on the machine.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>

@ -221,7 +221,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company&nbsp;Last updated: Apr 14, 2026.&nbsp;</p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company&nbsp;Last updated: Apr 15, 2026.&nbsp;</p>
</div>
</div>
</footer>

File diff suppressed because one or more lines are too long