mirror of https://github.com/katanemo/plano.git
synced 2026-06-08 14:55:14 +02:00
deploy: 90b926c2ce
parent 0dd2552f91
commit 07b84a0d42
35 changed files with 105 additions and 105 deletions
@@ -34,7 +34,7 @@ model_providers:
 # routing_preferences: tags a model with named capabilities so Plano's LLM router
 # can select the best model for each request based on intent. Requires the
-# Arch-Router model (or equivalent) to be configured in overrides.llm_routing_model.
+# Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
 # Each preference has a name (short label) and a description (used for intent matching).
 - model: groq/llama-3.3-70b-versatile
 access_key: $GROQ_API_KEY

@@ -170,7 +170,7 @@ overrides:
 # Path to the trusted CA bundle for upstream TLS verification
 upstream_tls_ca_path: /etc/ssl/certs/ca-certificates.crt
 # Model used for intent-based LLM routing (must be listed in model_providers)
-llm_routing_model: Arch-Router
+llm_routing_model: Plano-Orchestrator
 # Model used for agent orchestration (must be listed in model_providers)
 agent_orchestration_model: Plano-Orchestrator
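The two config comments above describe a dependency chain. routing_preferences needs overrides.llm_routing_model to be set, and both override models must themselves be listed in model_providers. The sketch below checks those rules against a config that has already been parsed into a Python dict (for example, with a YAML loader). It is an illustration, not Plano's own validator. The function name check_routing_models, the dict shape, and the matching rule are all assumptions. Under that rule, a name counts as "listed" if it equals either the full provider/model string or the part after the provider prefix.

```python
from typing import Any, Dict, List, Set


def check_routing_models(config: Dict[str, Any]) -> List[str]:
    """Return problems found in a parsed Plano config dict (hypothetical check).

    Assumption: a model counts as "listed" if it equals a provider's full
    `model` string or the part after the `provider/` prefix.
    """
    problems: List[str] = []
    providers = config.get("model_providers", [])

    # Every name under which a configured provider model might be referenced.
    known: Set[str] = set()
    for entry in providers:
        name = entry.get("model", "")
        known.add(name)
        known.add(name.split("/", 1)[-1])

    overrides = config.get("overrides", {})
    # Both override keys carry the comment "must be listed in model_providers".
    for key in ("llm_routing_model", "agent_orchestration_model"):
        value = overrides.get(key)
        if value is not None and value not in known:
            problems.append(f"overrides.{key}={value!r} is not listed in model_providers")

    # routing_preferences require a routing model to be configured.
    uses_preferences = any("routing_preferences" in entry for entry in providers)
    if uses_preferences and "llm_routing_model" not in overrides:
        problems.append("routing_preferences is set but overrides.llm_routing_model is missing")
    return problems


if __name__ == "__main__":
    # Hypothetical config: the routing model is referenced but never listed.
    config = {
        "model_providers": [
            {
                "model": "groq/llama-3.3-70b-versatile",
                "access_key": "$GROQ_API_KEY",
                "routing_preferences": [
                    {"name": "code generation", "description": "writing or fixing code"},
                ],
            },
        ],
        "overrides": {"llm_routing_model": "Plano-Orchestrator"},
    }
    for problem in check_routing_models(config):
        print(problem)
```

The example config reports one problem: Plano-Orchestrator is named as the routing model but no provider entry lists it. The provider string you would add for it (for instance a hypothetical plano/Plano-Orchestrator) depends on how your deployment serves the model.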

@@ -267,7 +267,7 @@
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -333,7 +333,7 @@ powerful abstraction for evolving your agent workflows over time.</p>
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -270,7 +270,7 @@ application to LLMs (API-based or hosted) via prompt targets.</p>
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -660,7 +660,7 @@ Implement fallback logic for better reliability:</p>
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -185,7 +185,7 @@ Three powerful routing approaches to optimize model selection:</p>
 <ul class="simple">
 <li><p>Model-based Routing: Direct routing to specific models using provider/model names (see <a class="reference internal" href="supported_providers.html#supported-providers"><span class="std std-ref">Supported Providers & Configuration</span></a>)</p></li>
 <li><p>Alias-based Routing: Semantic routing using custom aliases (see <a class="reference internal" href="model_aliases.html#model-aliases"><span class="std std-ref">Model Aliases</span></a>)</p></li>
-<li><p>Preference-aligned Routing: Intelligent routing using the Plano-Router model (see <a class="reference internal" href="../../guides/llm_router.html#preference-aligned-routing"><span class="std std-ref">Preference-aligned routing (Arch-Router)</span></a>)</p></li>
+<li><p>Preference-aligned Routing: Intelligent routing using the Plano-Router model (see <a class="reference internal" href="../../guides/llm_router.html#preference-aligned-routing"><span class="std std-ref">Preference-aligned routing (Plano-Orchestrator)</span></a>)</p></li>
 </ul>
 <p><strong>Unified Client Interface</strong>
 Use your preferred client library without changing existing code (see <a class="reference internal" href="client_libraries.html#client-libraries"><span class="std std-ref">Client Libraries</span></a> for details):</p>

@@ -225,7 +225,7 @@ Use your preferred client library without changing existing code (see <a class="
 <section id="advanced-features">
 <h2>Advanced Features<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#advanced-features" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#advanced-features'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
 <ul class="simple">
-<li><p><a class="reference internal" href="../../guides/llm_router.html#preference-aligned-routing"><span class="std std-ref">Preference-aligned routing (Arch-Router)</span></a> - Learn about preference-aligned dynamic routing and intelligent model selection</p></li>
+<li><p><a class="reference internal" href="../../guides/llm_router.html#preference-aligned-routing"><span class="std std-ref">Preference-aligned routing (Plano-Orchestrator)</span></a> - Learn about preference-aligned dynamic routing and intelligent model selection</p></li>
 </ul>
 </section>
 <section id="getting-started">

@@ -304,7 +304,7 @@ Use your preferred client library without changing existing code (see <a class="
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -434,7 +434,7 @@
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -1190,7 +1190,7 @@ Any provider that implements the OpenAI API interface can be configured using cu
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -473,7 +473,7 @@ that you can test and modify locally for multi-turn RAG scenarios.</p>
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -540,7 +540,7 @@
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -226,7 +226,7 @@ This gives Plano several advantages:</p>
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -337,7 +337,7 @@
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -521,7 +521,7 @@
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -372,7 +372,7 @@ on the stuff that matters most.</p>
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>

@@ -180,8 +180,8 @@
 <section id="configuration">
 <h4>Configuration<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#configuration"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h4>
 <p>Configure your LLM providers with specific provider/model names:</p>
-<div class="literal-block-wrapper docutils container" id="id10">
-<div class="code-block-caption"><span class="caption-text">Model-based Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id10"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
+<div class="literal-block-wrapper docutils container" id="id9">
+<div class="code-block-caption"><span class="caption-text">Model-based Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id9"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
 <div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">listeners</span><span class="p">:</span>
 </span><span id="line-2"><span class="w"> </span><span class="nt">egress_traffic</span><span class="p">:</span>
 </span><span id="line-3"><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>

@@ -231,8 +231,8 @@
 <section id="id3">
 <h4>Configuration<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id3"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h4>
 <p>Configure semantic aliases that map to underlying models:</p>
-<div class="literal-block-wrapper docutils container" id="id11">
-<div class="code-block-caption"><span class="caption-text">Alias-based Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id11"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
+<div class="literal-block-wrapper docutils container" id="id10">
+<div class="code-block-caption"><span class="caption-text">Alias-based Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id10"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
 <div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">listeners</span><span class="p">:</span>
 </span><span id="line-2"><span class="w"> </span><span class="nt">egress_traffic</span><span class="p">:</span>
 </span><span id="line-3"><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>

@@ -281,20 +281,20 @@
 </div>
 </section>
 </section>
-<section id="preference-aligned-routing-arch-router">
-<span id="preference-aligned-routing"></span><h3>Preference-aligned routing (Arch-Router)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#preference-aligned-routing-arch-router" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#preference-aligned-routing-arch-router'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
-<p>Preference-aligned routing uses the <a class="reference external" href="https://huggingface.co/katanemo/Arch-Router-1.5B" rel="nofollow noopener">Arch-Router<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a> model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.</p>
+<section id="preference-aligned-routing-plano-orchestrator">
+<span id="preference-aligned-routing"></span><h3>Preference-aligned routing (Plano-Orchestrator)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#preference-aligned-routing-plano-orchestrator" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#preference-aligned-routing-plano-orchestrator'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
+<p>Preference-aligned routing uses the <a class="reference external" href="https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B" rel="nofollow noopener">Plano-Orchestrator<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a> model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.</p>
 <ul class="simple">
 <li><p><strong>Domain</strong>: High-level topic of the request (e.g., legal, healthcare, programming).</p></li>
 <li><p><strong>Action</strong>: What the user wants to do (e.g., summarize, generate code, translate).</p></li>
 <li><p><strong>Routing preferences</strong>: Your mapping from (domain, action) to preferred models.</p></li>
 </ul>
-<p>Arch-Router analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples <strong>routing policy</strong> (how to choose) from <strong>model assignment</strong> (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.</p>
+<p>Plano-Orchestrator analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples <strong>routing policy</strong> (how to choose) from <strong>model assignment</strong> (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.</p>
 <section id="id5">
 <h4>Configuration<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id5"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h4>
 <p>To configure preference-aligned dynamic routing, define routing preferences that map domains and actions to specific models:</p>
-<div class="literal-block-wrapper docutils container" id="id12">
-<div class="code-block-caption"><span class="caption-text">Preference-Aligned Dynamic Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id12"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
+<div class="literal-block-wrapper docutils container" id="id11">
+<div class="code-block-caption"><span class="caption-text">Preference-Aligned Dynamic Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id11"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
 <div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">listeners</span><span class="p">:</span>
 </span><span id="line-2"><span class="w"> </span><span class="nt">egress_traffic</span><span class="p">:</span>
 </span><span id="line-3"><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>
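The text in this hunk separates routing policy (how to choose) from model assignment (what to run). The router infers a domain and an action, and your preferences then map that pair to a model. The sketch below shows only the second step, as a plain lookup table. It assumes the (domain, action) pair has already been inferred, which in Plano is the routing model's job. The Preference type, the choose_model function, the None-as-wildcard rule, the first-match ordering, and the model assignments are all illustrative assumptions, not Plano's matching logic. According to the config comments, Plano's real preferences are written as name/description pairs and matched by intent, not by exact string equality.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Preference:
    """One routing rule; None acts as a wildcard (an illustrative convention)."""
    domain: Optional[str]
    action: Optional[str]
    model: str


def choose_model(domain: str, action: str, prefs: List[Preference], default: str) -> str:
    """Return the model of the first rule matching (domain, action), else `default`.

    The inferred pair is an input here; producing it is the router model's job.
    """
    for pref in prefs:
        if pref.domain in (None, domain) and pref.action in (None, action):
            return pref.model
    return default


# Routing policy lives in data: swapping a model means editing this list only.
# The assignments below are arbitrary and only for illustration.
PREFS = [
    Preference("programming", "generate code", "groq/llama-3.3-70b-versatile"),
    Preference(None, "summarize", "claude-sonnet-4-5"),
]

if __name__ == "__main__":
    print(choose_model("programming", "generate code", PREFS, default="fallback-model"))
    print(choose_model("legal", "summarize", PREFS, default="fallback-model"))
    print(choose_model("healthcare", "translate", PREFS, default="fallback-model"))
```

Keeping the rules in a list, rather than in branching code, mirrors the decoupling the hunk describes: adding or swapping a model changes only the data, not the selection function. "fallback-model" is a hypothetical placeholder name.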

@@ -329,7 +329,7 @@
 <section id="id6">
 <h4>Client usage<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id6"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h4>
 <p>Clients can let the router decide or still specify aliases:</p>
-<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="c1"># Let Arch-Router choose based on content</span>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="c1"># Let Plano-Orchestrator choose based on content</span>
 </span><span id="line-2"><span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">chat</span><span class="o">.</span><span class="n">completions</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
 </span><span id="line-3"> <span class="n">messages</span><span class="o">=</span><span class="p">[{</span><span class="s2">"role"</span><span class="p">:</span> <span class="s2">"user"</span><span class="p">,</span> <span class="s2">"content"</span><span class="p">:</span> <span class="s2">"Write a creative story about space exploration"</span><span class="p">}]</span>
 </span><span id="line-4"> <span class="c1"># No model specified - router will analyze and choose claude-sonnet-4-5</span>
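The client snippet in this hunk omits the model so that the router can decide. To make the request shape concrete without any SDK, the sketch below uses only the standard library to build an OpenAI-style chat completions request, without sending it. It rests on three assumptions:

- Plano serves an OpenAI-compatible /v1/chat/completions endpoint on its egress listener.
- The base URL http://localhost:12000/v1 is a placeholder; the hunks only show the listener's bind address, 0.0.0.0.
- The server accepts a body with no model field, as the snippet's comment implies.

In recent versions of the official openai Python SDK, `model` is a required argument of `chat.completions.create`. Check Plano's client documentation for the value to pass there when you want the router to choose.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Placeholder: point this at the host/port your listeners.egress_traffic uses.
PLANO_BASE_URL = "http://localhost:12000/v1"


def build_chat_request(content: str, model: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) an OpenAI-style chat completions request.

    model=None omits the field so the router can choose; passing an alias
    or a provider/model name pins the request instead.
    """
    body: Dict[str, Any] = {"messages": [{"role": "user", "content": content}]}
    if model is not None:
        body["model"] = model
    return urllib.request.Request(
        PLANO_BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("Write a creative story about space exploration")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it (requires a running gateway): urllib.request.urlopen(req)
```

Passing model="..." covers the other case mentioned in the hunk ("or still specify aliases"). The only difference on the wire is the extra "model" key in the JSON body.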
|
||||
|
|
@ -340,22 +340,22 @@
|
|||
</section>
|
||||
</section>
|
||||
<section id="id7">
|
||||
<h2>Arch-Router<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id7" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#id7'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>The <a class="reference external" href="https://huggingface.co/katanemo/Arch-Router-1.5B" rel="nofollow noopener">Arch-Router<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a> is a state-of-the-art <strong>preference-based routing model</strong> specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.</p>
|
||||
<h2>Plano-Orchestrator<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id7" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#id7'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>Plano-Orchestrator is a <strong>preference-based routing model</strong> specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.</p>
|
||||
<p><strong>Addressing Traditional Routing Limitations:</strong></p>
|
||||
<p><strong>Human Preference Alignment</strong>
|
||||
Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.</p>
|
||||
Unlike benchmark-driven approaches, Plano-Orchestrator learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.</p>
|
||||
<p><strong>Flexible Model Integration</strong>
|
||||
The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.</p>
<p><strong>Preference-Encoded Routing</strong>
User preferences are encoded as domain-action mappings, which keeps routing decisions transparent and controllable and lets you customize them for specific use cases.</p>
<p>To support effective routing, Arch-Router introduces two key concepts:</p>
<p>To support effective routing, Plano-Orchestrator introduces two key concepts:</p>
<ul class="simple">
<li><p><strong>Domain</strong> – the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).</p></li>
<li><p><strong>Action</strong> – the specific type of operation the user wants performed (e.g., summarization, code generation, booking appointment, translation).</p></li>
</ul>
<p>Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.</p>
<p>In summary, Arch-Router demonstrates:</p>
<p>Both domain and action configs are associated with preferred models or model variants. At inference time, Plano-Orchestrator analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.</p>
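<p>As a rough illustration of how an inferred (domain, action) pair can be mapped to a model, the sketch below uses a hypothetical preference table and a toy keyword classifier. Treat it as a stand-in under stated assumptions: Plano-Orchestrator infers domain and action with a language model, not keyword matching, and Plano's configuration declares preferences per model under <code>routing_preferences</code> rather than as a Python dictionary. The model names, cue words, and function names are illustrative only.</p>

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical routing preferences: (domain, action) -> preferred model.
# This is NOT Plano's config format; it only illustrates the lookup step.
PREFERENCES: Dict[Tuple[str, str], str] = {
    ("programming", "code_generation"): "anthropic/claude-sonnet-4-5",
    ("programming", "debugging"): "openai/gpt-5.2",
    ("legal", "summarization"): "groq/llama-3.3-70b-versatile",
}
DEFAULT_MODEL = "openai/gpt-5.2"  # used when no preference matches

# Toy cue words standing in for the router model's semantic inference.
DOMAIN_CUES: Dict[str, List[str]] = {
    "programming": ["python", "function", "error", "code"],
    "legal": ["contract", "clause", "statute"],
}
ACTION_CUES: Dict[str, List[str]] = {
    "debugging": ["fix", "bug", "error"],
    "code_generation": ["write", "implement"],
    "summarization": ["summarize", "summary"],
}


def infer_label(prompt: str, cues: Dict[str, List[str]]) -> Optional[str]:
    """Return the first label whose cue words appear in the prompt, else None."""
    text = prompt.lower()
    for label, words in cues.items():
        if any(word in text for word in words):
            return label
    return None


def select_model(prompt: str) -> str:
    """Infer (domain, action), then look up the preferred model."""
    domain = infer_label(prompt, DOMAIN_CUES)
    action = infer_label(prompt, ACTION_CUES)
    if domain is None or action is None:
        return DEFAULT_MODEL
    return PREFERENCES.get((domain, action), DEFAULT_MODEL)


print(select_model("Write a Python function that parses dates"))
# → anthropic/claude-sonnet-4-5
```

<p>Keeping the preference table separate from the classifier means swapping a model only touches the table, which mirrors the separation between routing policy (how to choose) and model assignment (what to run).</p>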
<p>In summary, Plano-Orchestrator demonstrates:</p>
<ul class="simple">
<li><p><strong>Structured Preference Routing</strong>: Aligns each request with model strengths using explicit domain–action mappings.</p></li>
<li><p><strong>Transparent and Controllable</strong>: Makes routing decisions transparent and configurable, empowering users to customize system behavior.</p></li>
@@ -363,23 +363,23 @@ Provides a practical mechanism to encode user preferences through domain-action
<li><p><strong>Production-Ready Performance</strong>: Optimized for low-latency, high-throughput applications in multi-model environments.</p></li>
</ul>
</section>
<section id="self-hosting-arch-router">
<h2>Self-hosting Arch-Router<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#self-hosting-arch-router" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#self-hosting-arch-router'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>By default, Plano uses a hosted Arch-Router endpoint. To run Arch-Router locally, you can serve the model yourself using either <strong>Ollama</strong> or <strong>vLLM</strong>.</p>
<section id="self-hosting-plano-orchestrator">
<h2>Self-hosting Plano-Orchestrator<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#self-hosting-plano-orchestrator" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#self-hosting-plano-orchestrator'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>By default, Plano uses a hosted Plano-Orchestrator endpoint. To run Plano-Orchestrator locally, you can serve the model yourself using either <strong>Ollama</strong> or <strong>vLLM</strong>.</p>
<section id="using-ollama-recommended-for-local-development">
<h3>Using Ollama (recommended for local development)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#using-ollama-recommended-for-local-development" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#using-ollama-recommended-for-local-development'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<ol class="arabic">
<li><p><strong>Install Ollama</strong></p>
<p>Download and install from <a class="reference external" href="https://ollama.ai" rel="nofollow noopener">ollama.ai<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>.</p>
</li>
<li><p><strong>Pull and serve Arch-Router</strong></p>
<li><p><strong>Pull and serve the routing model</strong></p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><code><span id="line-1">ollama<span class="w"> </span>pull<span class="w"> </span>hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
</span><span id="line-2">ollama<span class="w"> </span>serve
</span></code></pre></div>
</div>
<p>This downloads the quantized GGUF model from HuggingFace and starts serving on <code class="docutils literal notranslate"><span class="pre">http://localhost:11434</span></code>.</p>
</li>
<li><p><strong>Configure Plano to use local Arch-Router</strong></p>
<li><p><strong>Configure Plano to use local routing model</strong></p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">overrides</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M</span>
</span><span id="line-3">
@@ -434,7 +434,7 @@ Provides a practical mechanism to encode user preferences through domain-action
</span><span id="line-7"><span class="w"> </span>--load-format<span class="w"> </span>gguf<span class="w"> </span><span class="se">\</span>
</span><span id="line-8"><span class="w"> </span>--chat-template<span class="w"> </span><span class="si">${</span><span class="nv">SNAPSHOT_DIR</span><span class="si">}</span>template.jinja<span class="w"> </span><span class="se">\</span>
</span><span id="line-9"><span class="w"> </span>--tokenizer<span class="w"> </span>katanemo/Arch-Router-1.5B<span class="w"> </span><span class="se">\</span>
</span><span id="line-10"><span class="w"> </span>--served-model-name<span class="w"> </span>Arch-Router<span class="w"> </span><span class="se">\</span>
</span><span id="line-10"><span class="w"> </span>--served-model-name<span class="w"> </span>Plano-Orchestrator<span class="w"> </span><span class="se">\</span>
</span><span id="line-11"><span class="w"> </span>--gpu-memory-utilization<span class="w"> </span><span class="m">0</span>.3<span class="w"> </span><span class="se">\</span>
</span><span id="line-12"><span class="w"> </span>--tensor-parallel-size<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
</span><span id="line-13"><span class="w"> </span>--enable-prefix-caching
@@ -443,10 +443,10 @@ Provides a practical mechanism to encode user preferences through domain-action
</li>
<li><p><strong>Configure Plano to use the vLLM endpoint</strong></p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">overrides</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano/Arch-Router</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano/Plano-Orchestrator</span>
</span><span id="line-3">
</span><span id="line-4"><span class="nt">model_providers</span><span class="p">:</span>
</span><span id="line-5"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano/Arch-Router</span>
</span><span id="line-5"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano/Plano-Orchestrator</span>
</span><span id="line-6"><span class="w"> </span><span class="nt">base_url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://<your-server-ip>:10000</span>
</span><span id="line-7">
</span><span id="line-8"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">openai/gpt-5.2</span>
@@ -471,14 +471,14 @@ Provides a practical mechanism to encode user preferences through domain-action
</section>
<section id="using-vllm-on-kubernetes-gpu-nodes">
<h3>Using vLLM on Kubernetes (GPU nodes)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#using-vllm-on-kubernetes-gpu-nodes" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#using-vllm-on-kubernetes-gpu-nodes'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
<p>For teams running Kubernetes, Plano-Orchestrator and Plano can be deployed as in-cluster services.
The <code class="docutils literal notranslate"><span class="pre">demos/llm_routing/model_routing_service/</span></code> directory includes ready-to-use manifests:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">vllm-deployment.yaml</span></code> — Arch-Router served by vLLM, with an init container to download
<li><p><code class="docutils literal notranslate"><span class="pre">vllm-deployment.yaml</span></code> — Plano-Orchestrator served by vLLM, with an init container to download
the model from HuggingFace</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">plano-deployment.yaml</span></code> — Plano proxy configured to use the in-cluster Arch-Router</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">plano-deployment.yaml</span></code> — Plano proxy configured to use the in-cluster Plano-Orchestrator</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">config_k8s.yaml</span></code> — Plano config with <code class="docutils literal notranslate"><span class="pre">llm_routing_model</span></code> pointing at
<code class="docutils literal notranslate"><span class="pre">http://arch-router:10000</span></code> instead of the default hosted endpoint</p></li>
<code class="docutils literal notranslate"><span class="pre">http://plano-orchestrator:10000</span></code> instead of the default hosted endpoint</p></li>
</ul>
<p>Key things to know before deploying:</p>
<ul class="simple">
@@ -498,7 +498,7 @@ instead of a file.</p></li>
</section>
</section>
<section id="model-affinity">
<span id="id9"></span><h2>Model Affinity<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#model-affinity" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#model-affinity'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<span id="id8"></span><h2>Model Affinity<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#model-affinity" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#model-affinity'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>In agentic loops, where a single user request triggers multiple LLM calls through tool use, Plano’s router classifies each turn independently. Because successive prompts differ in intent (tool selection looks like code generation, reasoning about results looks like analysis), the router may select different models mid-session. Switching models this way causes inconsistent behavior and invalidates provider-side KV caches, increasing both latency and cost.</p>
<p><strong>Model affinity</strong> pins the routing decision for the duration of a session. Send an <code class="docutils literal notranslate"><span class="pre">X-Model-Affinity</span></code> header with any string identifier (typically a UUID). The first request routes normally and caches the result. All subsequent requests with the same affinity ID skip routing and reuse the cached model.</p>
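<p>To make the caching behavior concrete, here is a minimal sketch of the semantics just described: route on the first request for an affinity ID, then reuse the pinned model for every later request with that ID. The class name, the <code>route</code> callable, and the in-memory dictionary are hypothetical; this is not Plano's implementation, and it omits any expiry or eviction policy the real proxy may apply.</p>

```python
from typing import Callable, Dict, Optional


class AffinityCache:
    """Hypothetical sketch: pin the first routing decision per affinity ID."""

    def __init__(self, route: Callable[[str], str]) -> None:
        self._route = route                # stand-in for a call to the router model
        self._pinned: Dict[str, str] = {}  # affinity ID -> chosen model (never expires here)

    def select_model(self, prompt: str, affinity_id: Optional[str] = None) -> str:
        if affinity_id is None:
            # No X-Model-Affinity header: every turn is routed independently.
            return self._route(prompt)
        if affinity_id not in self._pinned:
            # First request for this ID: route normally and cache the result.
            self._pinned[affinity_id] = self._route(prompt)
        # Later requests with the same ID skip routing entirely.
        return self._pinned[affinity_id]


def toy_router(prompt: str) -> str:
    # Hypothetical classifier: code-like prompts go to one model, the rest to another.
    return "model-for-code" if "code" in prompt.lower() else "model-for-analysis"


cache = AffinityCache(toy_router)
print(cache.select_model("Generate code to call the search tool", "session-1"))  # → model-for-code
print(cache.select_model("Analyze the search results", "session-1"))             # → model-for-code
print(cache.select_model("Analyze the search results"))                          # → model-for-analysis
```

<p>The second call shows the point of affinity: without the pinned ID, an analysis-style prompt would be routed to a different model mid-session.</p>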
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="kn">import</span><span class="w"> </span><span class="nn">uuid</span>
@@ -563,8 +563,8 @@ instead of a file.</p></li>
<section id="combining-routing-methods">
<h2>Combining Routing Methods<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#combining-routing-methods" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#combining-routing-methods'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>You can combine static model selection with dynamic routing preferences for maximum flexibility:</p>
<div class="literal-block-wrapper docutils container" id="id13">
<div class="code-block-caption"><span class="caption-text">Hybrid Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id13"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="literal-block-wrapper docutils container" id="id12">
<div class="code-block-caption"><span class="caption-text">Hybrid Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id12"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">llm_providers</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">openai/gpt-5.2</span>
</span><span id="line-3"><span class="w"> </span><span class="nt">access_key</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">$OPENAI_API_KEY</span>
@@ -604,7 +604,7 @@ instead of a file.</p></li>
</section>
<section id="example-use-cases">
<h2>Example Use Cases<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#example-use-cases" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#example-use-cases'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>Here are common scenarios where Arch-Router excels:</p>
<p>Here are common scenarios where Plano-Orchestrator excels:</p>
<ul class="simple">
<li><p><strong>Coding Tasks</strong>: Distinguish between code generation requests (“write a Python function”), debugging needs (“fix this error”), and code optimization (“make this faster”), routing each to appropriately specialized models.</p></li>
<li><p><strong>Content Processing Workflows</strong>: Classify requests as summarization (“summarize this document”), translation (“translate to Spanish”), or analysis (“what are the key themes”), enabling targeted model selection.</p></li>
@@ -645,11 +645,11 @@ instead of a file.</p></li>
</section>
<section id="unsupported-features">
<h2>Unsupported Features<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#unsupported-features" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#unsupported-features'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>The following features are <strong>not supported</strong> by the Arch-Router model:</p>
<p>The following features are <strong>not supported</strong> by the Plano-Orchestrator routing model:</p>
<ul class="simple">
<li><p><strong>Multi-modality</strong>: The model is not trained to process raw image or audio inputs. It can handle textual queries <em>about</em> these modalities (e.g., “generate an image of a cat”), but cannot interpret encoded multimedia data directly.</p></li>
<li><p><strong>Function calling</strong>: Arch-Router is designed for <strong>semantic preference matching</strong>, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.</p></li>
<li><p><strong>System prompt dependency</strong>: Arch-Router routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.</p></li>
<li><p><strong>Function calling</strong>: Plano-Orchestrator is designed for <strong>semantic preference matching</strong>, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.</p></li>
<li><p><strong>System prompt dependency</strong>: Plano-Orchestrator routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.</p></li>
</ul>
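<p>The system-prompt note above can be pictured as a filtering step that runs before routing. The sketch below assumes OpenAI-style chat messages (the same shape as the client examples in this guide) and uses a hypothetical helper name; it only shows which messages would be considered for routing, not how Plano actually builds the router's input.</p>

```python
from typing import Dict, List

Message = Dict[str, str]


def routing_history(messages: List[Message]) -> List[Message]:
    """Hypothetical helper: keep the conversation turns, drop system messages."""
    return [m for m in messages if m.get("role") != "system"]


conversation: List[Message] = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "Summarize this contract."},
    {"role": "assistant", "content": "It covers payment terms."},
    {"role": "user", "content": "Now translate the summary to Spanish."},
]
print([m["role"] for m in routing_history(conversation)])
# → ['user', 'assistant', 'user']
```

<p>Changing the system prompt in this example would not change what the router sees, which is why routing behavior should be tuned through routing preferences rather than system instructions.</p>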
</section>
</section>
@@ -684,15 +684,15 @@ instead of a file.</p></li>
<li><a :data-current="activeSection === '#id4'" class="reference internal" href="#id4">Client usage</a></li>
</ul>
</li>
<li><a :data-current="activeSection === '#preference-aligned-routing-arch-router'" class="reference internal" href="#preference-aligned-routing-arch-router">Preference-aligned routing (Arch-Router)</a><ul>
<li><a :data-current="activeSection === '#preference-aligned-routing-plano-orchestrator'" class="reference internal" href="#preference-aligned-routing-plano-orchestrator">Preference-aligned routing (Plano-Orchestrator)</a><ul>
<li><a :data-current="activeSection === '#id5'" class="reference internal" href="#id5">Configuration</a></li>
<li><a :data-current="activeSection === '#id6'" class="reference internal" href="#id6">Client usage</a></li>
</ul>
</li>
</ul>
</li>
<li><a :data-current="activeSection === '#id7'" class="reference internal" href="#id7">Arch-Router</a></li>
<li><a :data-current="activeSection === '#self-hosting-arch-router'" class="reference internal" href="#self-hosting-arch-router">Self-hosting Arch-Router</a><ul>
<li><a :data-current="activeSection === '#id7'" class="reference internal" href="#id7">Plano-Orchestrator</a></li>
<li><a :data-current="activeSection === '#self-hosting-plano-orchestrator'" class="reference internal" href="#self-hosting-plano-orchestrator">Self-hosting Plano-Orchestrator</a><ul>
<li><a :data-current="activeSection === '#using-ollama-recommended-for-local-development'" class="reference internal" href="#using-ollama-recommended-for-local-development">Using Ollama (recommended for local development)</a></li>
<li><a :data-current="activeSection === '#using-vllm-recommended-for-production-ec2'" class="reference internal" href="#using-vllm-recommended-for-production-ec2">Using vLLM (recommended for production / EC2)</a></li>
<li><a :data-current="activeSection === '#using-vllm-on-kubernetes-gpu-nodes'" class="reference internal" href="#using-vllm-on-kubernetes-gpu-nodes">Using vLLM on Kubernetes (GPU nodes)</a></li>
@@ -714,7 +714,7 @@ instead of a file.</p></li>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>
@@ -248,7 +248,7 @@ Access logs can be exported to centralized logging systems (e.g., ELK stack or F
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>
@@ -260,7 +260,7 @@ are some sample configuration files for both, respectively.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>
@@ -216,7 +216,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>
@@ -792,7 +792,7 @@ tools like AWS X-Ray and Datadog, enhancing observability and facilitating faste
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>
@@ -1003,7 +1003,7 @@ Plano makes it easy to build and scale these systems by managing the orchestrati
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>
@@ -298,7 +298,7 @@ the agent. If validation fails (<code class="docutils literal notranslate"><span
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>
@@ -453,7 +453,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
</div>
</div>
</footer>
@@ -1,6 +1,6 @@
Plano Docs v0.4.18
llms.txt (auto-generated)
Generated (UTC): 2026-04-14T02:31:14.825020+00:00
Generated (UTC): 2026-04-15T23:42:11.682797+00:00
|
Table of contents
- Agents (concepts/agents)
@@ -3760,9 +3760,9 @@ response = client.chat.completions.create(
|
|
|
Preference-aligned routing (Arch-Router)
Preference-aligned routing (Plano-Orchestrator)
|
Preference-aligned routing uses the Arch-Router model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
Preference-aligned routing uses the Plano-Orchestrator model to pick the best LLM based on domain, action, and your configured preferences instead of hard-coding a model.
|
||||
|
||||
Domain: High-level topic of the request (e.g., legal, healthcare, programming).
|
||||
|
||||
|
|
@@ -3770,7 +3770,7 @@ Action: What the user wants to do (e.g., summarize, generate code, translate).

 Routing preferences: Your mapping from (domain, action) to preferred models.

-Arch-Router analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.
+Plano-Orchestrator analyzes each prompt to infer domain and action, then applies your preferences to select a model. This decouples routing policy (how to choose) from model assignment (what to run), making routing transparent, controllable, and easy to extend as you add or swap models.

 Configuration

@@ -3810,20 +3810,20 @@ Client usage

 Clients can let the router decide or still specify aliases:

-# Let Arch-Router choose based on content
+# Let Plano-Orchestrator choose based on content
 response = client.chat.completions.create(
 messages=[{"role": "user", "content": "Write a creative story about space exploration"}]
 # No model specified - router will analyze and choose claude-sonnet-4-5
 )

-Arch-Router
+Plano-Orchestrator

-The Arch-Router is a state-of-the-art preference-based routing model specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
+Plano-Orchestrator is a preference-based routing model specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.

 Addressing Traditional Routing Limitations:

 Human Preference Alignment
-Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
+Unlike benchmark-driven approaches, Plano-Orchestrator learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.

 Flexible Model Integration
 The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.
@@ -3831,15 +3831,15 @@ The system supports seamlessly adding new models for routing without requiring r
 Preference-Encoded Routing
 Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.

-To support effective routing, Arch-Router introduces two key concepts:
+To support effective routing, Plano-Orchestrator introduces two key concepts:

 Domain – the high-level thematic category or subject matter of a request (e.g., legal, healthcare, programming).

 Action – the specific type of operation the user wants performed (e.g., summarization, code generation, booking appointment, translation).

-Both domain and action configs are associated with preferred models or model variants. At inference time, Arch-Router analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.
+Both domain and action configs are associated with preferred models or model variants. At inference time, Plano-Orchestrator analyzes the incoming prompt to infer its domain and action using semantic similarity, task indicators, and contextual cues. It then applies the user-defined routing preferences to select the model best suited to handle the request.

-In summary, Arch-Router demonstrates:
+In summary, Plano-Orchestrator demonstrates:

 Structured Preference Routing: Aligns prompt request with model strengths using explicit domain–action mappings.

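The domain/action mechanism described in this hunk can be illustrated with a small lookup sketch. It is a hypothetical illustration, not Plano-Orchestrator's implementation: in Plano the domain and action are inferred by the routing model from the prompt, whereas here they are passed in as plain strings. The names `select_model`, `RoutingPreferences`, `EXAMPLE_PREFERENCES`, and the `(domain, "*")` wildcard fallback are assumptions made for this example.

```python
from typing import Dict, Tuple

# Hypothetical preference table: (domain, action) -> model name.
# In Plano, domain and action are inferred by the routing model;
# here they are supplied directly so the lookup step can be shown on its own.
RoutingPreferences = Dict[Tuple[str, str], str]


def select_model(
    domain: str,
    action: str,
    preferences: RoutingPreferences,
    default_model: str,
) -> str:
    """Pick a model for an inferred (domain, action) pair.

    Lookup order (an assumption for this sketch, not Plano's documented rule):
      1. exact (domain, action) match
      2. domain-wide preference stored under (domain, "*")
      3. default_model
    """
    exact = preferences.get((domain, action))
    if exact is not None:
        return exact
    domain_wide = preferences.get((domain, "*"))
    if domain_wide is not None:
        return domain_wide
    return default_model


# Example table using model names that appear elsewhere in these docs.
EXAMPLE_PREFERENCES: RoutingPreferences = {
    ("programming", "code generation"): "openai/gpt-5.2",
    ("programming", "*"): "groq/llama-3.3-70b-versatile",
}

if __name__ == "__main__":
    # A debugging request has no exact entry, so the domain-wide preference applies.
    print(select_model("programming", "debugging", EXAMPLE_PREFERENCES, "openai/gpt-5.2"))
```

Keeping the table separate from the selection function mirrors the split between routing policy and model assignment: swapping the model that handles a given request type means editing `EXAMPLE_PREFERENCES`, not `select_model`.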
@@ -3849,9 +3849,9 @@ Flexible and Adaptive: Supports evolving user needs, model updates, and new doma

 Production-Ready Performance: Optimized for low-latency, high-throughput applications in multi-model environments.

-Self-hosting Arch-Router
+Self-hosting Plano-Orchestrator

-By default, Plano uses a hosted Arch-Router endpoint. To run Arch-Router locally, you can serve the model yourself using either Ollama or vLLM.
+By default, Plano uses a hosted Plano-Orchestrator endpoint. To run Plano-Orchestrator locally, you can serve the model yourself using either Ollama or vLLM.

 Using Ollama (recommended for local development)

@@ -3859,14 +3859,14 @@ Install Ollama

 Download and install from ollama.ai.

-Pull and serve Arch-Router
+Pull and serve the routing model

 ollama pull hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
 ollama serve

 This downloads the quantized GGUF model from HuggingFace and starts serving on http://localhost:11434.

-Configure Plano to use local Arch-Router
+Configure Plano to use local routing model

 overrides:
 llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
@@ -3919,7 +3919,7 @@ vllm serve ${SNAPSHOT_DIR}Arch-Router-1.5B-Q4_K_M.gguf \
 --load-format gguf \
 --chat-template ${SNAPSHOT_DIR}template.jinja \
 --tokenizer katanemo/Arch-Router-1.5B \
---served-model-name Arch-Router \
+--served-model-name Plano-Orchestrator \
 --gpu-memory-utilization 0.3 \
 --tensor-parallel-size 1 \
 --enable-prefix-caching
@@ -3927,10 +3927,10 @@ vllm serve ${SNAPSHOT_DIR}Arch-Router-1.5B-Q4_K_M.gguf \
 Configure Plano to use the vLLM endpoint

 overrides:
-llm_routing_model: plano/Arch-Router
+llm_routing_model: plano/Plano-Orchestrator

 model_providers:
-- model: plano/Arch-Router
+- model: plano/Plano-Orchestrator
 base_url: http://<your-server-ip>:10000

 - model: openai/gpt-5.2
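Before pointing Plano at the vLLM server, it can help to confirm that the model is exposed under the name used in `llm_routing_model`. vLLM serves an OpenAI-compatible `/v1/models` endpoint whose JSON body lists served models under `data[].id`, and `--served-model-name` sets that id. The sketch below uses only the Python standard library; the function names `served_model_ids`, `fetch_models`, and `is_served` are hypothetical, and the `localhost:10000` address is an assumption taken from the `curl` command in the next hunk header.

```python
import json
import urllib.request
from typing import Any, Dict, List


def served_model_ids(payload: Dict[str, Any]) -> List[str]:
    """Return model ids from an OpenAI-compatible /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_models(base_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET <base_url>/v1/models and decode the JSON response."""
    url = base_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_served(base_url: str, model_name: str = "Plano-Orchestrator") -> bool:
    """True if model_name matches a served model id (set via --served-model-name)."""
    return model_name in served_model_ids(fetch_models(base_url))
```

For example, `is_served("http://localhost:10000")` should return `True` once a server started with `--served-model-name Plano-Orchestrator` is reachable at that address.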
@@ -3950,16 +3950,16 @@ curl http://localhost:10000/v1/models

 Using vLLM on Kubernetes (GPU nodes)

-For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
+For teams running Kubernetes, Plano-Orchestrator and Plano can be deployed as in-cluster services.
 The demos/llm_routing/model_routing_service/ directory includes ready-to-use manifests:

-vllm-deployment.yaml — Arch-Router served by vLLM, with an init container to download
+vllm-deployment.yaml — Plano-Orchestrator served by vLLM, with an init container to download
 the model from HuggingFace

-plano-deployment.yaml — Plano proxy configured to use the in-cluster Arch-Router
+plano-deployment.yaml — Plano proxy configured to use the in-cluster Plano-Orchestrator

 config_k8s.yaml — Plano config with llm_routing_model pointing at
-http://arch-router:10000 instead of the default hosted endpoint
+http://plano-orchestrator:10000 instead of the default hosted endpoint

 Key things to know before deploying:

@@ -4092,7 +4092,7 @@ Let the router decide: No model specified, router analyzes content

 Example Use Cases

-Here are common scenarios where Arch-Router excels:
+Here are common scenarios where Plano-Orchestrator excels:

 Coding Tasks: Distinguish between code generation requests (“write a Python function”), debugging needs (“fix this error”), and code optimization (“make this faster”), routing each to appropriately specialized models.

@@ -4134,13 +4134,13 @@ Best practices

 Unsupported Features

-The following features are not supported by the Arch-Router model:
+The following features are not supported by the Plano-Orchestrator routing model:

 Multi-modality: The model is not trained to process raw image or audio inputs. It can handle textual queries about these modalities (e.g., “generate an image of a cat”), but cannot interpret encoded multimedia data directly.

-Function calling: Arch-Router is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.
+Function calling: Plano-Orchestrator is designed for semantic preference matching, not exact intent classification or tool execution. For structured function invocation, use models in the Plano Function Calling collection instead.

-System prompt dependency: Arch-Router routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.
+System prompt dependency: Plano-Orchestrator routes based solely on the user’s conversation history. It does not use or rely on system prompts for routing decisions.

 ---

@@ -6455,7 +6455,7 @@ model_providers:

 # routing_preferences: tags a model with named capabilities so Plano's LLM router
 # can select the best model for each request based on intent. Requires the
-# Arch-Router model (or equivalent) to be configured in overrides.llm_routing_model.
+# Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.
 # Each preference has a name (short label) and a description (used for intent matching).
 - model: groq/llama-3.3-70b-versatile
 access_key: $GROQ_API_KEY
@@ -6591,7 +6591,7 @@ overrides:
 # Path to the trusted CA bundle for upstream TLS verification
 upstream_tls_ca_path: /etc/ssl/certs/ca-certificates.crt
 # Model used for intent-based LLM routing (must be listed in model_providers)
-llm_routing_model: Arch-Router
+llm_routing_model: Plano-Orchestrator
 # Model used for agent orchestration (must be listed in model_providers)
 agent_orchestration_model: Plano-Orchestrator

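The comments in this configuration state that `llm_routing_model` and `agent_orchestration_model` must be listed in `model_providers`. A pre-flight check along those lines is sketched below for a config that has already been loaded into a Python dict (for example with a YAML parser). It is a hypothetical helper, not Plano's own validation, and the matching rule is an assumption: an override counts as listed if it equals a provider's `model` value either exactly or after dropping one leading `provider/` segment, so `Plano-Orchestrator` matches `plano/Plano-Orchestrator`.

```python
from typing import Any, Dict, List, Set


def _strip_provider(model: str) -> str:
    """Drop one leading 'provider/' segment: 'plano/Plano-Orchestrator' -> 'Plano-Orchestrator'."""
    return model.split("/", 1)[1] if "/" in model else model


def check_override_models(config: Dict[str, Any]) -> List[str]:
    """Return an error message for each override model missing from model_providers.

    Assumed matching rule (hypothetical): exact match, or match after
    stripping a leading provider prefix on either side.
    """
    names: Set[str] = set()
    for entry in config.get("model_providers", []):
        model = entry.get("model")
        if model:
            names.add(model)
            names.add(_strip_provider(model))

    errors: List[str] = []
    overrides = config.get("overrides", {})
    for key in ("llm_routing_model", "agent_orchestration_model"):
        value = overrides.get(key)
        if value is None:
            continue  # override not set; nothing to check
        if value not in names and _strip_provider(value) not in names:
            errors.append(f"overrides.{key}={value!r} is not listed in model_providers")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a CI step report every missing model at once.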
@@ -247,7 +247,7 @@ Resources</label><div class="sd-tab-content docutils">
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>
BIN objects.inv
Binary file not shown.
@@ -437,7 +437,7 @@ Use this page as the canonical source for command syntax, options, and recommend
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>
@@ -203,7 +203,7 @@ where prompts get routed to, apply guardrails, and enable critical agent observa
 </span><span id="line-34"><span class="linenos"> 34</span>
 </span><span id="line-35"><span class="linenos"> 35</span><span class="w"> </span><span class="c1"># routing_preferences: tags a model with named capabilities so Plano's LLM router</span>
 </span><span id="line-36"><span class="linenos"> 36</span><span class="w"> </span><span class="c1"># can select the best model for each request based on intent. Requires the</span>
-</span><span id="line-37"><span class="linenos"> 37</span><span class="w"> </span><span class="c1"># Arch-Router model (or equivalent) to be configured in overrides.llm_routing_model.</span>
+</span><span id="line-37"><span class="linenos"> 37</span><span class="w"> </span><span class="c1"># Plano-Orchestrator model (or equivalent) to be configured in overrides.llm_routing_model.</span>
 </span><span id="line-38"><span class="linenos"> 38</span><span class="w"> </span><span class="c1"># Each preference has a name (short label) and a description (used for intent matching).</span>
 </span><span id="line-39"><span class="linenos"> 39</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">groq/llama-3.3-70b-versatile</span>
 </span><span id="line-40"><span class="linenos"> 40</span><span class="w"> </span><span class="nt">access_key</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">$GROQ_API_KEY</span>
@@ -339,7 +339,7 @@ where prompts get routed to, apply guardrails, and enable critical agent observa
 </span><span id="line-170"><span class="linenos">170</span><span class="w"> </span><span class="c1"># Path to the trusted CA bundle for upstream TLS verification</span>
 </span><span id="line-171"><span class="linenos">171</span><span class="w"> </span><span class="nt">upstream_tls_ca_path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/etc/ssl/certs/ca-certificates.crt</span>
 </span><span id="line-172"><span class="linenos">172</span><span class="w"> </span><span class="c1"># Model used for intent-based LLM routing (must be listed in model_providers)</span>
-</span><span id="line-173"><span class="linenos">173</span><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Arch-Router</span>
+</span><span id="line-173"><span class="linenos">173</span><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Plano-Orchestrator</span>
 </span><span id="line-174"><span class="linenos">174</span><span class="w"> </span><span class="c1"># Model used for agent orchestration (must be listed in model_providers)</span>
 </span><span id="line-175"><span class="linenos">175</span><span class="w"> </span><span class="nt">agent_orchestration_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Plano-Orchestrator</span>
 </span><span id="line-176"><span class="linenos">176</span>
@@ -414,7 +414,7 @@ where prompts get routed to, apply guardrails, and enable critical agent observa
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>
@@ -542,7 +542,7 @@
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>
@@ -179,7 +179,7 @@
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>
@@ -199,7 +199,7 @@ own deployments), and Plano reaches them via HTTP.</p>
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>
@@ -485,7 +485,7 @@ processing request headers and then finalized by the HCM during post-request pro
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>
@@ -200,7 +200,7 @@
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>
@@ -200,7 +200,7 @@ hardware threads on the machine.</p>
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>
@@ -221,7 +221,7 @@
 </div><footer class="py-6 border-t border-border md:py-0">
 <div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
 <div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 15, 2026. </p>
 </div>
 </div>
 </footer>
File diff suppressed because one or more lines are too long