mirror of
https://github.com/katanemo/plano.git
synced 2026-06-02 14:35:14 +02:00
deploy: 5388c6777f
This commit is contained in:
parent
498b2615d6
commit
c042debbfa
33 changed files with 92 additions and 33 deletions
|
|
@ -267,7 +267,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -294,7 +294,7 @@ powerful abstraction for evolving your agent workflows over time.</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -267,7 +267,7 @@ application to LLMs (API-based or hosted) via prompt targets.</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -660,7 +660,7 @@ Implement fallback logic for better reliability:</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -304,7 +304,7 @@ Use your preferred client library without changing existing code (see <a class="
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -434,7 +434,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -1140,7 +1140,7 @@ Any provider that implements the OpenAI API interface can be configured using cu
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -473,7 +473,7 @@ that you can test and modify locally for multi-turn RAG scenarios.</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -540,7 +540,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -226,7 +226,7 @@ This gives Plano several advantages:</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -337,7 +337,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -521,7 +521,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -372,7 +372,7 @@ on the stuff that matters most.</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -469,6 +469,33 @@ Provides a practical mechanism to encode user preferences through domain-action
|
|||
</li>
|
||||
</ol>
|
||||
</section>
|
||||
<section id="using-vllm-on-kubernetes-gpu-nodes">
|
||||
<h3>Using vLLM on Kubernetes (GPU nodes)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#using-vllm-on-kubernetes-gpu-nodes" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#using-vllm-on-kubernetes-gpu-nodes'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
|
||||
<p>For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
|
||||
The <code class="docutils literal notranslate"><span class="pre">demos/llm_routing/model_routing_service/</span></code> directory includes ready-to-use manifests:</p>
|
||||
<ul class="simple">
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">vllm-deployment.yaml</span></code> — Arch-Router served by vLLM, with an init container to download
|
||||
the model from HuggingFace</p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">plano-deployment.yaml</span></code> — Plano proxy configured to use the in-cluster Arch-Router</p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">config_k8s.yaml</span></code> — Plano config with <code class="docutils literal notranslate"><span class="pre">llm_routing_model</span></code> pointing at
|
||||
<code class="docutils literal notranslate"><span class="pre">http://arch-router:10000</span></code> instead of the default hosted endpoint</p></li>
|
||||
</ul>
|
||||
<p>Key things to know before deploying:</p>
|
||||
<ul class="simple">
|
||||
<li><p>GPU nodes commonly have a <code class="docutils literal notranslate"><span class="pre">nvidia.com/gpu:NoSchedule</span></code> taint — the <code class="docutils literal notranslate"><span class="pre">vllm-deployment.yaml</span></code>
|
||||
includes a matching toleration. The <code class="docutils literal notranslate"><span class="pre">nvidia.com/gpu:</span> <span class="pre">"1"</span></code> resource request is sufficient
|
||||
for scheduling in most clusters; a <code class="docutils literal notranslate"><span class="pre">nodeSelector</span></code> is optional and commented out in the
|
||||
manifest for cases where you need to pin to a specific GPU node pool.</p></li>
|
||||
<li><p>Model download takes ~1 minute; vLLM loads the model in ~1-2 minutes after that. The
|
||||
<code class="docutils literal notranslate"><span class="pre">livenessProbe</span></code> has a 180-second <code class="docutils literal notranslate"><span class="pre">initialDelaySeconds</span></code> to avoid premature restarts.</p></li>
|
||||
<li><p>The Plano config ConfigMap must use <code class="docutils literal notranslate"><span class="pre">--from-file=plano_config.yaml=config_k8s.yaml</span></code> with
|
||||
<code class="docutils literal notranslate"><span class="pre">subPath</span></code> in the Deployment — omitting <code class="docutils literal notranslate"><span class="pre">subPath</span></code> causes Kubernetes to mount a directory
|
||||
instead of a file.</p></li>
|
||||
</ul>
|
||||
<p>For the canonical Plano Kubernetes deployment (ConfigMap, Secrets, Deployment YAML), see
|
||||
<a class="reference internal" href="../resources/deployment.html#deployment"><span class="std std-ref">Deployment</span></a>. For full step-by-step commands specific to this demo, see the
|
||||
<a class="reference external" href="https://github.com/katanemo/plano/tree/main/demos/llm_routing/model_routing_service/README.md" rel="nofollow noopener">demo README<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>.</p>
|
||||
</section>
|
||||
</section>
|
||||
<section id="combining-routing-methods">
|
||||
<h2>Combining Routing Methods<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#combining-routing-methods" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#combining-routing-methods'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
|
|
@ -605,6 +632,7 @@ Provides a practical mechanism to encode user preferences through domain-action
|
|||
<li><a :data-current="activeSection === '#self-hosting-arch-router'" class="reference internal" href="#self-hosting-arch-router">Self-hosting Arch-Router</a><ul>
|
||||
<li><a :data-current="activeSection === '#using-ollama-recommended-for-local-development'" class="reference internal" href="#using-ollama-recommended-for-local-development">Using Ollama (recommended for local development)</a></li>
|
||||
<li><a :data-current="activeSection === '#using-vllm-recommended-for-production-ec2'" class="reference internal" href="#using-vllm-recommended-for-production-ec2">Using vLLM (recommended for production / EC2)</a></li>
|
||||
<li><a :data-current="activeSection === '#using-vllm-on-kubernetes-gpu-nodes'" class="reference internal" href="#using-vllm-on-kubernetes-gpu-nodes">Using vLLM on Kubernetes (GPU nodes)</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a :data-current="activeSection === '#combining-routing-methods'" class="reference internal" href="#combining-routing-methods">Combining Routing Methods</a></li>
|
||||
|
|
@ -619,7 +647,7 @@ Provides a practical mechanism to encode user preferences through domain-action
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -248,7 +248,7 @@ Access logs can be exported to centralized logging systems (e.g., ELK stack or F
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -260,7 +260,7 @@ are some sample configuration files for both, respectively.</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -216,7 +216,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -792,7 +792,7 @@ tools like AWS X-Ray and Datadog, enhancing observability and facilitating faste
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -1003,7 +1003,7 @@ Plano makes it easy to build and scale these systems by managing the orchestrati
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -298,7 +298,7 @@ the agent. If validation fails (<code class="docutils literal notranslate"><span
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -453,7 +453,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
Plano Docs v0.4.12
|
||||
llms.txt (auto-generated)
|
||||
Generated (UTC): 2026-03-15T20:04:02.309985+00:00
|
||||
Generated (UTC): 2026-03-16T19:05:58.621874+00:00
|
||||
|
||||
Table of contents
|
||||
- Agents (concepts/agents)
|
||||
|
|
@ -3855,6 +3855,37 @@ Verify the server is running
|
|||
curl http://localhost:10000/health
|
||||
curl http://localhost:10000/v1/models
|
||||
|
||||
Using vLLM on Kubernetes (GPU nodes)
|
||||
|
||||
For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
|
||||
The demos/llm_routing/model_routing_service/ directory includes ready-to-use manifests:
|
||||
|
||||
vllm-deployment.yaml — Arch-Router served by vLLM, with an init container to download
|
||||
the model from HuggingFace
|
||||
|
||||
plano-deployment.yaml — Plano proxy configured to use the in-cluster Arch-Router
|
||||
|
||||
config_k8s.yaml — Plano config with llm_routing_model pointing at
|
||||
http://arch-router:10000 instead of the default hosted endpoint
|
||||
|
||||
Key things to know before deploying:
|
||||
|
||||
GPU nodes commonly have a nvidia.com/gpu:NoSchedule taint — the vllm-deployment.yaml
|
||||
includes a matching toleration. The nvidia.com/gpu: "1" resource request is sufficient
|
||||
for scheduling in most clusters; a nodeSelector is optional and commented out in the
|
||||
manifest for cases where you need to pin to a specific GPU node pool.
|
||||
|
||||
Model download takes ~1 minute; vLLM loads the model in ~1-2 minutes after that. The
|
||||
livenessProbe has a 180-second initialDelaySeconds to avoid premature restarts.
|
||||
|
||||
The Plano config ConfigMap must use --from-file=plano_config.yaml=config_k8s.yaml with
|
||||
subPath in the Deployment — omitting subPath causes Kubernetes to mount a directory
|
||||
instead of a file.
|
||||
|
||||
For the canonical Plano Kubernetes deployment (ConfigMap, Secrets, Deployment YAML), see
|
||||
deployment. For full step-by-step commands specific to this demo, see the
|
||||
demo README.
|
||||
|
||||
Combining Routing Methods
|
||||
|
||||
You can combine static model selection with dynamic routing preferences for maximum flexibility:
|
||||
|
|
|
|||
|
|
@ -247,7 +247,7 @@ Resources</label><div class="sd-tab-content docutils">
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -437,7 +437,7 @@ Use this page as the canonical source for command syntax, options, and recommend
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -302,7 +302,7 @@ where prompts get routed to, apply guardrails, and enable critical agent observa
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -542,7 +542,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -179,7 +179,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -199,7 +199,7 @@ own deployments), and Plano reaches them via HTTP.</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -485,7 +485,7 @@ processing request headers and then finalized by the HCM during post-request pro
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -200,7 +200,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -200,7 +200,7 @@ hardware threads on the machine.</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -221,7 +221,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 15, 2026. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Mar 16, 2026. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
File diff suppressed because one or more lines are too long
Loading…
Add table
Add a link
Reference in a new issue