adilhafeez 2026-03-16 19:06:02 +00:00
parent 498b2615d6
commit c042debbfa
33 changed files with 92 additions and 33 deletions

@@ -1,6 +1,6 @@
Plano Docs v0.4.12
llms.txt (auto-generated)
Generated (UTC): 2026-03-15T20:04:02.309985+00:00
Generated (UTC): 2026-03-16T19:05:58.621874+00:00
Table of contents
- Agents (concepts/agents)
@@ -3855,6 +3855,37 @@ Verify the server is running
curl http://localhost:10000/health
curl http://localhost:10000/v1/models
Using vLLM on Kubernetes (GPU nodes)
For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
The demos/llm_routing/model_routing_service/ directory includes ready-to-use manifests:
vllm-deployment.yaml — Arch-Router served by vLLM, with an init container to download
the model from HuggingFace
plano-deployment.yaml — Plano proxy configured to use the in-cluster Arch-Router
config_k8s.yaml — Plano config with llm_routing_model pointing at
http://arch-router:10000 instead of the default hosted endpoint
Key things to know before deploying:
GPU nodes commonly carry an nvidia.com/gpu:NoSchedule taint, and vllm-deployment.yaml
includes a matching toleration. The nvidia.com/gpu: "1" resource request is sufficient
for scheduling in most clusters; a nodeSelector is optional and is commented out in the
manifest for cases where you need to pin the pod to a specific GPU node pool.
Model download takes ~1 minute; vLLM loads the model in ~1-2 minutes after that. The
livenessProbe has a 180-second initialDelaySeconds to avoid premature restarts.
The Plano config ConfigMap must be created with
--from-file=plano_config.yaml=config_k8s.yaml and mounted with subPath in the
Deployment. Omitting subPath causes Kubernetes to mount a directory instead of a file.
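The three points above can be sketched as manifest fragments. This is an illustrative
sketch, not a copy of the demo manifests: the container names, image, ConfigMap and
volume names, mountPath, and the /health probe path are assumptions made for the
example, so treat vllm-deployment.yaml and plano-deployment.yaml in the demo directory
as the source of truth.

```yaml
# Fragment 1: pod spec for the vLLM-served Arch-Router (illustrative only).
spec:
  tolerations:
    - key: nvidia.com/gpu          # tolerate the common GPU-node taint
      operator: Exists
      effect: NoSchedule
  # nodeSelector:                  # optional: pin to a specific GPU node pool
  #   <label-key>: <label-value>   # label is cluster-specific, left unfilled
  containers:
    - name: vllm                   # assumed container name
      image: vllm/vllm-openai      # assumed image; check the demo manifest
      ports:
        - containerPort: 10000
      resources:
        limits:
          nvidia.com/gpu: "1"      # for GPUs, the request defaults to the limit
      livenessProbe:
        httpGet:
          path: /health            # assumed probe path
          port: 10000
        initialDelaySeconds: 180   # covers model download plus vLLM load time
---
# Fragment 2: pod spec for the Plano proxy (illustrative only).
# Assumes the ConfigMap was created with something like:
#   kubectl create configmap plano-config \
#     --from-file=plano_config.yaml=config_k8s.yaml
spec:
  containers:
    - name: plano                  # assumed container name
      volumeMounts:
        - name: plano-config       # assumed volume name
          mountPath: /app/plano_config.yaml   # assumed path inside the container
          subPath: plano_config.yaml          # ConfigMap key; mounts one file
  volumes:
    - name: plano-config
      configMap:
        name: plano-config         # assumed ConfigMap name
```

With subPath set to the ConfigMap key, only that single file appears at mountPath.
Without it, mountPath becomes a directory containing one file per ConfigMap key.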
For the canonical Plano Kubernetes deployment (ConfigMap, Secrets, Deployment YAML), see
the deployment documentation. For full step-by-step commands specific to this demo, see
the demo README.
Combining Routing Methods
You can combine static model selection with dynamic routing preferences for maximum flexibility: