fix nodeSelector placeholder, add provider examples and cluster check step

Adil Hafeez 2026-03-16 11:42:49 -07:00
parent 8dc9744c84
commit 1d4d212c4c
GPG key ID: 9B18EF7691369645
3 changed files with 23 additions and 23 deletions

@@ -107,22 +107,23 @@ The response tells you which model would handle this request and which route was
To run Arch-Router in-cluster using vLLM instead of the default hosted endpoint:
-**1. Update `vllm-deployment.yaml`**, setting `nodeSelector` to match your GPU node's labels:
-```yaml
-# Examples:
-# GKE: cloud.google.com/gke-accelerator: nvidia-l4
-# EKS: eks.amazonaws.com/nodegroup: gpu-nodes
-# AKS: kubernetes.azure.com/agentpool: gpupool
-nodeSelector:
-node.kubernetes.io/instance-type: gpu-node
-```
-**2. Deploy Arch-Router and Plano:**
+**0. Check your GPU node labels and taints**
+```bash
+kubectl get nodes --show-labels | grep -i gpu
+kubectl get node <gpu-node-name> -o jsonpath='{.spec.taints}'
+```
+GPU nodes commonly carry an `nvidia.com/gpu:NoSchedule` taint, and `vllm-deployment.yaml` includes a matching toleration. If you have multiple GPU node pools and need to pin the deployment to a specific one, uncomment the `nodeSelector` in `vllm-deployment.yaml` and set it to the label for your cloud provider.
+**1. Deploy Arch-Router and Plano:**
```bash
# arch-router deployment
kubectl apply -f vllm-deployment.yaml
# plano deployment
kubectl create secret generic plano-secrets \
--from-literal=OPENAI_API_KEY=$OPENAI_API_KEY \
--from-literal=ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY

@@ -18,13 +18,13 @@ spec:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
-nodeSelector:
-# Replace with the label that identifies GPU nodes in your cluster
-# Examples:
-# GKE: cloud.google.com/gke-accelerator: nvidia-l4
-# EKS: eks.amazonaws.com/nodegroup: gpu-nodes
-# AKS: kubernetes.azure.com/agentpool: gpupool
-node.kubernetes.io/instance-type: gpu-node
+# Optional: add a nodeSelector to pin to a specific GPU node pool.
+# The nvidia.com/gpu resource request below is sufficient for most clusters.
+# nodeSelector:
+# DigitalOcean: doks.digitalocean.com/gpu-model: l40s
+# GKE: cloud.google.com/gke-accelerator: nvidia-l4
+# EKS: eks.amazonaws.com/nodegroup: gpu-nodes
+# AKS: kubernetes.azure.com/agentpool: gpupool
initContainers:
- name: download-model
image: python:3.11-slim
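
As a rough sketch of the optional pinning described in the comment above, uncommenting the selector for a GKE L4 node pool might look like the fragment below. The label key and value come from the GKE example in the diff. The indentation, and the placement of `nodeSelector` alongside `tolerations` in the pod spec, are assumptions, since this view has lost the file's original indentation. Check the label against `kubectl get nodes --show-labels` before relying on it.

```yaml
# Hypothetical pod-spec fragment; the nesting is assumed, not taken from the file.
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
# Pin to one GPU pool. Swap in the label for your provider
# (see the DigitalOcean, EKS and AKS examples in the comment above).
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-l4
```

Only one provider label should be active at a time: `nodeSelector` entries are ANDed, so listing labels from several providers would match no node.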