Mirror of https://github.com/katanemo/plano.git (synced 2026-05-10 08:12:48 +02:00)
Merge origin/main into musa/chatgpt-subscription
commit 6f67048c04
118 changed files with 11627 additions and 2194 deletions

@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 # Pretty-print Plano MODEL_RESOLUTION lines from docker logs
-# - hides Arch-Router
+# - hides Plano-Orchestrator
 # - prints timestamp
 # - colors MODEL_RESOLUTION red
 # - colors req_model cyan
@@ -9,7 +9,7 @@
 
 docker logs -f plano 2>&1 \
   | awk '
-  /MODEL_RESOLUTION:/ && $0 !~ /Arch-Router/ {
+  /MODEL_RESOLUTION:/ && $0 !~ /Plano-Orchestrator/ {
     # extract timestamp between first [ and ]
     ts=""
     if (match($0, /\[[0-9-]+ [0-9:.]+\]/)) {

@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 # Pretty-print Plano MODEL_RESOLUTION lines from docker logs
-# - hides Arch-Router
+# - hides Plano-Orchestrator
 # - prints timestamp
 # - colors MODEL_RESOLUTION red
 # - colors req_model cyan
@@ -9,7 +9,7 @@
 
 docker logs -f plano 2>&1 \
   | awk '
-  /MODEL_RESOLUTION:/ && $0 !~ /Arch-Router/ {
+  /MODEL_RESOLUTION:/ && $0 !~ /Plano-Orchestrator/ {
     # extract timestamp between first [ and ]
     ts=""
     if (match($0, /\[[0-9-]+ [0-9:.]+\]/)) {
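
The filter that the awk hunks above change can be sketched in Python for readers who want to test it against sample log lines. This is a minimal sketch, not the script from the commit: the function name `filter_line` and the sample log format are hypothetical, and stripping the brackets from the timestamp is an assumption, since the diff cuts off before showing what the script does with the match.

```python
import re

# Same timestamp pattern as the awk script: a bracketed date and time.
TS_RE = re.compile(r"\[[0-9-]+ [0-9:.]+\]")


def filter_line(line, hidden="Plano-Orchestrator"):
    """Return (timestamp, line) for MODEL_RESOLUTION lines, else None.

    Mirrors the awk condition: keep lines containing "MODEL_RESOLUTION:"
    unless they also mention the routing model named in `hidden`.
    """
    if "MODEL_RESOLUTION:" not in line or hidden in line:
        return None
    match = TS_RE.search(line)
    # Assumption: the timestamp is reported without its brackets.
    ts = match.group(0)[1:-1] if match else ""
    return ts, line


# Hypothetical log lines, for illustration only.
kept = "[2026-05-10 08:12:48.123] MODEL_RESOLUTION: req_model=openai/gpt-4o"
hidden_line = "[2026-05-10 08:12:49.001] MODEL_RESOLUTION: req_model=Plano-Orchestrator"
other = "[2026-05-10 08:12:50.500] some unrelated log line"
```

Passing `hidden="Arch-Router"` reproduces the pre-commit behavior, which makes it easy to compare both sides of the rename on the same log sample.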

@@ -6,7 +6,7 @@ Plano is an AI-native proxy and data plane for agentic apps — with built-in or
 ┌───────────┐      ┌─────────────────────────────────┐      ┌──────────────┐
 │  Client   │ ───► │              Plano              │ ───► │  OpenAI      │
 │  (any     │      │                                 │      │  Anthropic   │
-│  language)│      │  Arch-Router (1.5B model)       │      │  Any Provider│
+│  language)│      │  Plano-Orchestrator             │      │  Any Provider│
 └───────────┘      │  analyzes intent → picks model  │      └──────────────┘
                    └─────────────────────────────────┘
 ```
@@ -39,17 +39,17 @@ routing_preferences:
 
 When a request arrives, Plano:
 
-1. Sends the conversation + route descriptions to Arch-Router for intent classification
+1. Sends the conversation + route descriptions to Plano-Orchestrator for intent classification
 2. Looks up the matched route and returns its candidate models
 3. Returns an ordered list — client uses `models[0]`, falls back to `models[1]` on 429/5xx
 
 ```
 1. Request arrives → "Write binary search in Python"
-2. Arch-Router classifies → route: "code_generation"
+2. Plano-Orchestrator classifies → route: "code_generation"
 3. Response → models: ["anthropic/claude-sonnet-4-20250514", "openai/gpt-4o"]
 ```
 
-No match? Arch-Router returns `null` route → client falls back to the model in the original request.
+No match? Plano-Orchestrator returns an empty route → client falls back to the model in the original request.
 
 The `/routing/v1/*` endpoints return the routing decision **without** forwarding to the LLM — useful for testing routing behavior before going to production.
 
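
The client-side behavior described in the hunk above (use `models[0]`, fall back to the next candidate on 429/5xx, and use the originally requested model when no route matches) could look roughly like the following. This is a sketch under stated assumptions: `call_with_fallback` and the `send` callable are hypothetical names, and treating an empty or missing `models` list as "no match" is an assumption about how the empty route appears to the client.

```python
def is_retryable(status):
    """429 (rate limited) and any 5xx status trigger a fallback."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(models, requested_model, send):
    """Try each candidate model in order until one is not retryable.

    models          -- ordered candidate list from the routing decision
    requested_model -- model named in the original request
    send            -- hypothetical callable: send(model) -> (status, body)
    Returns (model, status, body) for the last attempt made.
    """
    # Assumption: no matched route means an empty or missing list.
    candidates = list(models) if models else [requested_model]
    result = None
    for model in candidates:
        status, body = send(model)
        result = (model, status, body)
        if not is_retryable(status):
            break
    return result


# Fake transport for illustration: the first model is rate limited.
def fake_send(model):
    if model == "anthropic/claude-sonnet-4-20250514":
        return 429, "rate limited"
    return 200, "ok from " + model


routed = call_with_fallback(
    ["anthropic/claude-sonnet-4-20250514", "openai/gpt-4o"],
    "openai/gpt-4o-mini",
    fake_send,
)
unrouted = call_with_fallback([], "openai/gpt-4o-mini", fake_send)
```

If every candidate returns a retryable status, the sketch returns the last failure so the caller can decide whether to surface it or retry later.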
@@ -163,9 +163,9 @@ routing:
 
 Without the `X-Model-Affinity` header, routing runs fresh every time (no breaking change).
 
-## Kubernetes Deployment (Self-hosted Arch-Router on GPU)
+## Kubernetes Deployment (Self-hosted Plano-Orchestrator on GPU)
 
-To run Arch-Router in-cluster using vLLM instead of the default hosted endpoint:
+To run Plano-Orchestrator in-cluster using vLLM instead of the default hosted endpoint:
 
 **0. Check your GPU node labels and taints**
 
@@ -176,10 +176,10 @@ kubectl get node <gpu-node-name> -o jsonpath='{.spec.taints}'
 
 GPU nodes commonly have a `nvidia.com/gpu:NoSchedule` taint — `vllm-deployment.yaml` includes a matching toleration. If you have multiple GPU node pools and need to pin to a specific one, uncomment and set the `nodeSelector` in `vllm-deployment.yaml` using the label for your cloud provider.
 
-**1. Deploy Arch-Router and Plano:**
+**1. Deploy Plano-Orchestrator and Plano:**
 
 ```bash
-# arch-router deployment
+# plano-orchestrator deployment
 kubectl apply -f vllm-deployment.yaml
 
 # plano deployment
@@ -197,8 +197,8 @@ kubectl apply -f plano-deployment.yaml
 **3. Wait for both pods to be ready:**
 
 ```bash
-# Arch-Router downloads the model (~1 min) then vLLM loads it (~2 min)
-kubectl get pods -l app=arch-router -w
+# Plano-Orchestrator downloads the model (~1 min) then vLLM loads it (~2 min)
+kubectl get pods -l app=plano-orchestrator -w
 kubectl rollout status deployment/plano
 ```
 
@@ -209,10 +209,10 @@ kubectl port-forward svc/plano 12000:12000
 ./demo.sh
 ```
 
-To confirm requests are hitting your in-cluster Arch-Router (not just health checks):
+To confirm requests are hitting your in-cluster Plano-Orchestrator (not just health checks):
 
 ```bash
-kubectl logs -l app=arch-router -f --tail=0
+kubectl logs -l app=plano-orchestrator -f --tail=0
 # Look for POST /v1/chat/completions entries
 ```
 

@@ -1,7 +1,7 @@
 version: v0.3.0
 
 overrides:
-  llm_routing_model: plano/Arch-Router
+  llm_routing_model: plano/Plano-Orchestrator
 
 listeners:
   - type: model
@@ -10,8 +10,8 @@ listeners:
 
 model_providers:
 
-  - model: plano/Arch-Router
-    base_url: http://arch-router:10000
+  - model: plano/Plano-Orchestrator
+    base_url: http://plano-orchestrator:10000
 
   - model: openai/gpt-4o-mini
     access_key: $OPENAI_API_KEY

@@ -1,18 +1,18 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: arch-router
+  name: plano-orchestrator
   labels:
-    app: arch-router
+    app: plano-orchestrator
 spec:
   replicas: 1
   selector:
     matchLabels:
-      app: arch-router
+      app: plano-orchestrator
   template:
     metadata:
       labels:
-        app: arch-router
+        app: plano-orchestrator
     spec:
       tolerations:
         - key: nvidia.com/gpu
@@ -53,7 +53,7 @@ spec:
             - "--tokenizer"
             - "katanemo/Arch-Router-1.5B"
             - "--served-model-name"
-            - "Arch-Router"
+            - "Plano-Orchestrator"
             - "--gpu-memory-utilization"
             - "0.3"
             - "--tensor-parallel-size"
@@ -94,10 +94,10 @@ spec:
 apiVersion: v1
 kind: Service
 metadata:
-  name: arch-router
+  name: plano-orchestrator
 spec:
   selector:
-    app: arch-router
+    app: plano-orchestrator
   ports:
     - name: http
       port: 10000

@@ -1,7 +1,7 @@
 version: v0.1.0
 
 overrides:
-  llm_routing_model: Arch-Router
+  llm_routing_model: Plano-Orchestrator
 
 listeners:
   egress_traffic:

@@ -3,7 +3,7 @@ This demo shows how you can use user preferences to route user prompts to approp
 
 ## How to start the demo
 
-Make sure you have Plano CLI installed (`pip install planoai==0.4.18` or `uv tool install planoai==0.4.18`).
+Make sure you have Plano CLI installed (`pip install planoai==0.4.20` or `uv tool install planoai==0.4.20`).
 
 ```bash
 cd demos/llm_routing/preference_based_routing
@@ -32,9 +32,9 @@ planoai up config.yaml
 
 3. Test with curl or open AnythingLLM http://localhost:3001/
 
-## Running with local Arch-Router (via Ollama)
+## Running with local routing model (via Ollama)
 
-By default, Plano uses a hosted Arch-Router endpoint. To self-host Arch-Router locally using Ollama:
+By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host a routing model locally using Ollama:
 
 1. Install [Ollama](https://ollama.ai) and pull the model:
 ```bash

@@ -22,11 +22,11 @@ Content-Type: application/json
 
 ### get model list from arch-function
 GET https://archfc.katanemo.dev/v1/models HTTP/1.1
-model: Arch-Router
+model: Plano-Orchestrator
 
-### get model list from Arch-Router (notice model header)
+### get model list from Plano-Orchestrator (notice model header)
 GET https://archfc.katanemo.dev/v1/models HTTP/1.1
-model: Arch-Router
+model: Plano-Orchestrator
 
 
 ### test try code generating
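
The model-list requests in the hunk above select the target through a `model` request header. A minimal Python sketch of building the same request with the standard library follows; the helper name `build_models_request` is hypothetical, and the sketch only constructs the request object without sending it, so nothing here confirms how the endpoint responds.

```python
from urllib.request import Request


def build_models_request(model_name, base_url="https://archfc.katanemo.dev"):
    """Build a GET /v1/models request carrying the `model` header.

    The URL and header name come from the request file in the diff;
    urllib stores header names capitalized (here as "Model").
    """
    return Request(
        base_url + "/v1/models",
        headers={"model": model_name},
        method="GET",
    )


req = build_models_request("Plano-Orchestrator")
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, which is left out here to keep the example free of network access.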