mirror of
https://github.com/katanemo/plano.git
synced 2026-06-05 14:45:15 +02:00
deploy: 5388c6777f
parent
498b2615d6
commit
c042debbfa
33 changed files with 92 additions and 33 deletions
|
|
@ -1,6 +1,6 @@
|
|||
Plano Docs v0.4.12
|
||||
llms.txt (auto-generated)
|
||||
Generated (UTC): 2026-03-15T20:04:02.309985+00:00
|
||||
Generated (UTC): 2026-03-16T19:05:58.621874+00:00
|
||||
|
||||
Table of contents
|
||||
- Agents (concepts/agents)
|
||||
|
|
@ -3855,6 +3855,37 @@ Verify the server is running
|
|||
curl http://localhost:10000/health
|
||||
curl http://localhost:10000/v1/models
|
||||
|
||||
Using vLLM on Kubernetes (GPU nodes)
|
||||
|
||||
For teams running Kubernetes, Arch-Router and Plano can be deployed as in-cluster services.
|
||||
The demos/llm_routing/model_routing_service/ directory includes ready-to-use manifests:
|
||||
|
||||
- vllm-deployment.yaml: Arch-Router served by vLLM, with an init container that downloads the model from Hugging Face
- plano-deployment.yaml: Plano proxy configured to use the in-cluster Arch-Router
- config_k8s.yaml: Plano config with llm_routing_model pointing at http://arch-router:10000 instead of the default hosted endpoint
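For http://arch-router:10000 to resolve from the Plano pod, the cluster needs a Service named arch-router exposing port 10000 in the same namespace. The following is a minimal sketch of such a Service, not a copy of the demo's manifest; the app: arch-router label and the container port are assumptions made for illustration.

```yaml
# Illustrative sketch only; the manifests shipped with the demo may differ.
apiVersion: v1
kind: Service
metadata:
  name: arch-router        # matches the host in http://arch-router:10000
spec:
  selector:
    app: arch-router       # hypothetical label; must match the vLLM pod labels
  ports:
    - port: 10000          # port used in the llm_routing_model URL
      targetPort: 10000    # assumes vLLM listens on 10000 inside the container
```

If Plano runs in a different namespace, the short name will not resolve and the URL would need the namespace-qualified form (for example arch-router.<namespace>.svc).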
|
||||
|
||||
Key things to know before deploying:
|
||||
|
||||
- GPU nodes commonly carry an nvidia.com/gpu:NoSchedule taint, so vllm-deployment.yaml includes a matching toleration. The nvidia.com/gpu: "1" resource request is sufficient for scheduling in most clusters; a nodeSelector is optional and is commented out in the manifest for cases where you need to pin to a specific GPU node pool.
- Model download takes ~1 minute, and vLLM then needs ~1-2 minutes to load the model. The livenessProbe sets initialDelaySeconds to 180 so the container is not restarted before loading finishes.
- The Plano config ConfigMap must be created with --from-file=plano_config.yaml=config_k8s.yaml, and the Deployment must mount it with subPath. Omitting subPath causes Kubernetes to mount a directory instead of a file.
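The three points above map onto pod-spec fields roughly as sketched below. These are hand-written fragments for illustration, not excerpts from the demo manifests: the container names, the ConfigMap name plano-config, the node pool label, and the mount path are assumptions, and the probe path reuses the /health endpoint from the verification step.

```yaml
# Fragment 1: pod template of the vLLM Deployment (illustrative)
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule         # tolerates the common GPU node taint
  # nodeSelector:                # optional; uncomment to pin to a GPU node pool
  #   gpu-pool: example          # hypothetical label
  containers:
    - name: vllm                 # hypothetical container name
      resources:
        limits:
          nvidia.com/gpu: "1"    # GPUs go under limits; the request defaults to the same value
      livenessProbe:
        httpGet:
          path: /health
          port: 10000
        initialDelaySeconds: 180 # covers model download plus vLLM load time
---
# Fragment 2: pod template of the Plano Deployment (illustrative)
spec:
  volumes:
    - name: plano-config
      configMap:
        name: plano-config       # hypothetical; created with --from-file=plano_config.yaml=config_k8s.yaml
  containers:
    - name: plano                # hypothetical container name
      volumeMounts:
        - name: plano-config
          mountPath: /app/plano_config.yaml  # assumed path; use the path your Plano image expects
          subPath: plano_config.yaml         # key in the ConfigMap; mounts a single file, not a directory
```

The subPath value has to match the key given on the left side of --from-file, which is why the ConfigMap is created with plano_config.yaml=config_k8s.yaml rather than the bare file name.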
|
||||
|
||||
For the canonical Plano Kubernetes deployment (ConfigMap, Secrets, Deployment YAML), see deployment. For full step-by-step commands specific to this demo, see the demo README.
|
||||
|
||||
Combining Routing Methods
|
||||
|
||||
You can combine static model selection with dynamic routing preferences for maximum flexibility:
|
||||
|
|
|
|||