Merge origin/main into musa/chatgpt-subscription

2026-05-10 08:12:48 +02:00 · 2026-04-20 23:36:25 +00:00
commit 6f67048c04
parent 5af3199f5a 9812540602
118 changed files with 11627 additions and 2194 deletions
--- a/demos/llm_routing/claude_code_router/pretty_model_resolution.sh
+++ b/demos/llm_routing/claude_code_router/pretty_model_resolution.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 # Pretty-print Plano MODEL_RESOLUTION lines from docker logs
-# - hides Arch-Router
+# - hides Plano-Orchestrator
 # - prints timestamp
 # - colors MODEL_RESOLUTION red
 # - colors req_model cyan
@@ -9,7 +9,7 @@

 docker logs -f plano 2>&1 \
 | awk '
-/MODEL_RESOLUTION:/ && $0 !~ /Arch-Router/ {
+/MODEL_RESOLUTION:/ && $0 !~ /Plano-Orchestrator/ {
  # extract timestamp between first [ and ]
  ts=""
  if (match($0, /\[[0-9-]+ [0-9:.]+\]/)) {
--- a/demos/llm_routing/codex_router/pretty_model_resolution.sh
+++ b/demos/llm_routing/codex_router/pretty_model_resolution.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 # Pretty-print Plano MODEL_RESOLUTION lines from docker logs
-# - hides Arch-Router
+# - hides Plano-Orchestrator
 # - prints timestamp
 # - colors MODEL_RESOLUTION red
 # - colors req_model cyan
@@ -9,7 +9,7 @@

 docker logs -f plano 2>&1 \
 | awk '
-/MODEL_RESOLUTION:/ && $0 !~ /Arch-Router/ {
+/MODEL_RESOLUTION:/ && $0 !~ /Plano-Orchestrator/ {
  # extract timestamp between first [ and ]
  ts=""
  if (match($0, /\[[0-9-]+ [0-9:.]+\]/)) {
--- a/demos/llm_routing/model_routing_service/README.md
+++ b/demos/llm_routing/model_routing_service/README.md
@@ -6,7 +6,7 @@ Plano is an AI-native proxy and data plane for agentic apps — with built-in or
 ┌───────────┐      ┌─────────────────────────────────┐      ┌──────────────┐
 │  Client   │ ───► │  Plano                          │ ───► │  OpenAI      │
 │  (any     │      │                                 │      │  Anthropic   │
-│  language)│      │  Arch-Router (1.5B model)       │      │  Any Provider│
+│  language)│      │  Plano-Orchestrator              │      │  Any Provider│
 └───────────┘      │  analyzes intent → picks model  │      └──────────────┘
                   └─────────────────────────────────┘
 ```
@@ -39,17 +39,17 @@ routing_preferences:

 When a request arrives, Plano:

-1. Sends the conversation + route descriptions to Arch-Router for intent classification
+1. Sends the conversation + route descriptions to Plano-Orchestrator for intent classification
 2. Looks up the matched route and returns its candidate models
 3. Returns an ordered list — client uses `models[0]`, falls back to `models[1]` on 429/5xx

 ```
 1. Request arrives          → "Write binary search in Python"
-2. Arch-Router classifies   → route: "code_generation"
+2. Plano-Orchestrator classifies → route: "code_generation"
 3. Response                 → models: ["anthropic/claude-sonnet-4-20250514", "openai/gpt-4o"]
 ```

-No match? Arch-Router returns `null` route → client falls back to the model in the original request.
+No match? Plano-Orchestrator returns an empty route → client falls back to the model in the original request.

 The `/routing/v1/*` endpoints return the routing decision **without** forwarding to the LLM — useful for testing routing behavior before going to production.

@@ -163,9 +163,9 @@ routing:

 Without the `X-Model-Affinity` header, routing runs fresh every time (no breaking change).

-## Kubernetes Deployment (Self-hosted Arch-Router on GPU)
+## Kubernetes Deployment (Self-hosted Plano-Orchestrator on GPU)

-To run Arch-Router in-cluster using vLLM instead of the default hosted endpoint:
+To run Plano-Orchestrator in-cluster using vLLM instead of the default hosted endpoint:

 **0. Check your GPU node labels and taints**

@@ -176,10 +176,10 @@ kubectl get node <gpu-node-name> -o jsonpath='{.spec.taints}'

 GPU nodes commonly have a `nvidia.com/gpu:NoSchedule` taint — `vllm-deployment.yaml` includes a matching toleration. If you have multiple GPU node pools and need to pin to a specific one, uncomment and set the `nodeSelector` in `vllm-deployment.yaml` using the label for your cloud provider.

-**1. Deploy Arch-Router and Plano:**
+**1. Deploy Plano-Orchestrator and Plano:**

 ```bash
-# arch-router deployment
+# plano-orchestrator deployment
 kubectl apply -f vllm-deployment.yaml

 # plano deployment
@@ -197,8 +197,8 @@ kubectl apply -f plano-deployment.yaml
 **3. Wait for both pods to be ready:**

 ```bash
-# Arch-Router downloads the model (~1 min) then vLLM loads it (~2 min)
-kubectl get pods -l app=arch-router -w
+# Plano-Orchestrator downloads the model (~1 min) then vLLM loads it (~2 min)
+kubectl get pods -l app=plano-orchestrator -w
 kubectl rollout status deployment/plano
 ```

@@ -209,10 +209,10 @@ kubectl port-forward svc/plano 12000:12000
 ./demo.sh
 ```

-To confirm requests are hitting your in-cluster Arch-Router (not just health checks):
+To confirm requests are hitting your in-cluster Plano-Orchestrator (not just health checks):

 ```bash
-kubectl logs -l app=arch-router -f --tail=0
+kubectl logs -l app=plano-orchestrator -f --tail=0
 # Look for POST /v1/chat/completions entries
 ```

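The README hunks above describe the client-side contract: use `models[0]`, fall back to `models[1]` on 429/5xx, and fall back to the model in the original request when no route matches. A minimal Python sketch of that loop follows. The `send(model)` callable, the `requested_model` argument, and both function names are hypothetical and not part of Plano. Treating an empty candidate list as "no route matched" and raising once every candidate fails are assumptions the README does not spell out.

```python
from typing import Callable, Sequence, Tuple


def is_retryable(status: int) -> bool:
    """Return True for the statuses the README treats as 'try the next model'."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(
    models: Sequence[str],
    requested_model: str,
    send: Callable[[str], Tuple[int, str]],
) -> Tuple[str, int, str]:
    """Try each candidate in order and return (model, status, body).

    `send` is a hypothetical transport hook returning (status, body).
    """
    # Assumption: an empty list means no route matched, so use the
    # model named in the original request.
    candidates = list(models) or [requested_model]
    last_status = None
    for model in candidates:
        status, body = send(model)
        if not is_retryable(status):
            return model, status, body
        last_status = status
    # Assumption: the README does not say what happens when every
    # candidate fails, so this sketch surfaces the last status.
    raise RuntimeError(f"all candidates failed, last status {last_status}")
```

Passing the transport in as a callable keeps the ordering logic independent of any particular HTTP client.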
--- a/demos/llm_routing/model_routing_service/config_k8s.yaml
+++ b/demos/llm_routing/model_routing_service/config_k8s.yaml
@@ -1,7 +1,7 @@
 version: v0.3.0

 overrides:
-  llm_routing_model: plano/Arch-Router
+  llm_routing_model: plano/Plano-Orchestrator

 listeners:
  - type: model
@@ -10,8 +10,8 @@ listeners:

 model_providers:

-  - model: plano/Arch-Router
-    base_url: http://arch-router:10000
+  - model: plano/Plano-Orchestrator
+    base_url: http://plano-orchestrator:10000

  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
--- a/demos/llm_routing/model_routing_service/vllm-deployment.yaml
+++ b/demos/llm_routing/model_routing_service/vllm-deployment.yaml
@@ -1,18 +1,18 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: arch-router
+  name: plano-orchestrator
  labels:
-    app: arch-router
+    app: plano-orchestrator
 spec:
  replicas: 1
  selector:
    matchLabels:
-      app: arch-router
+      app: plano-orchestrator
  template:
    metadata:
      labels:
-        app: arch-router
+        app: plano-orchestrator
    spec:
      tolerations:
        - key: nvidia.com/gpu
@@ -53,7 +53,7 @@ spec:
            - "--tokenizer"
            - "katanemo/Arch-Router-1.5B"
            - "--served-model-name"
-            - "Arch-Router"
+            - "Plano-Orchestrator"
            - "--gpu-memory-utilization"
            - "0.3"
            - "--tensor-parallel-size"
@@ -94,10 +94,10 @@ spec:
 apiVersion: v1
 kind: Service
 metadata:
-  name: arch-router
+  name: plano-orchestrator
 spec:
  selector:
-    app: arch-router
+    app: plano-orchestrator
  ports:
    - name: http
      port: 10000
--- a/demos/llm_routing/openclaw_routing/config.yaml
+++ b/demos/llm_routing/openclaw_routing/config.yaml
@@ -1,7 +1,7 @@
 version: v0.1.0

 overrides:
-  llm_routing_model: Arch-Router
+  llm_routing_model: Plano-Orchestrator

 listeners:
  egress_traffic:
--- a/demos/llm_routing/preference_based_routing/README.md
+++ b/demos/llm_routing/preference_based_routing/README.md
@@ -3,7 +3,7 @@ This demo shows how you can use user preferences to route user prompts to approp

 ## How to start the demo

-Make sure you have Plano CLI installed (`pip install planoai==0.4.18` or `uv tool install planoai==0.4.18`).
+Make sure you have Plano CLI installed (`pip install planoai==0.4.20` or `uv tool install planoai==0.4.20`).

 ```bash
 cd demos/llm_routing/preference_based_routing
@@ -32,9 +32,9 @@ planoai up config.yaml

 3. Test with curl or open AnythingLLM http://localhost:3001/

-## Running with local Arch-Router (via Ollama)
+## Running with local routing model (via Ollama)

-By default, Plano uses a hosted Arch-Router endpoint. To self-host Arch-Router locally using Ollama:
+By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host a routing model locally using Ollama:

 1. Install [Ollama](https://ollama.ai) and pull the model:
 ```bash
--- a/demos/llm_routing/preference_based_routing/test_router_endpoint.rest
+++ b/demos/llm_routing/preference_based_routing/test_router_endpoint.rest
@@ -22,11 +22,11 @@ Content-Type: application/json

 ### get model list from arch-function
 GET https://archfc.katanemo.dev/v1/models HTTP/1.1
-model: Arch-Router
+model: Plano-Orchestrator

-### get model list from Arch-Router (notice model header)
+### get model list from Plano-Orchestrator (notice model header)
 GET https://archfc.katanemo.dev/v1/models HTTP/1.1
-model: Arch-Router
+model: Plano-Orchestrator


 ### test try code generating