Unified overrides for custom router and orchestrator models (#820)

* support configurable orchestrator model via orchestration config section
* add self-hosting docs and demo for Plano-Orchestrator
* list all Plano-Orchestrator model variants in docs
* use overrides for custom routing and orchestration model
* update docs
* update orchestrator model name
* rename arch provider to plano, use llm_routing_model and agent_orchestration_model
* regenerate rendered config reference
2026-04-25 00:36:34 +02:00 · 2026-03-15 09:36:11 -07:00 · 2026-03-15 09:36:11 -07:00 · bc059aed4d
commit bc059aed4d
parent 785bf7e021
20 changed files with 312 additions and 103 deletions
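
Reviewer note: the config change this commit applies to the routing demos (dropping the `routing:` section, renaming the `arch/` provider prefix to `plano/`, and pointing `overrides.llm_routing_model` at the model) can be sketched as a small transformation over an already-parsed config. This is only an illustration under stated assumptions: `migrate_routing_config` is a hypothetical helper, not part of Plano or its CLI, and it assumes the providers live in a `model_providers` list as in the `plano_config_local.yaml` hunk of this commit.

```python
import copy


def migrate_routing_config(config: dict) -> dict:
    """Hypothetical helper (not part of Plano): rewrite a parsed config dict
    from the old `routing:` section to the `overrides:` form."""
    new = copy.deepcopy(config)  # leave the caller's dict untouched
    routing = new.pop("routing", None)
    if routing is None:
        return new  # nothing to migrate

    provider_name = routing.get("llm_provider")
    routing_model = routing.get("model")

    # If a provider entry was referenced by name, drop the name, rename the
    # `arch/` prefix to `plano/`, and use its model string as the override.
    for provider in new.get("model_providers", []):
        if provider_name is not None and provider.get("name") == provider_name:
            provider.pop("name")
            model = provider["model"]
            if model.startswith("arch/"):
                model = "plano/" + model[len("arch/"):]
            provider["model"] = model
            routing_model = model
            break

    new.setdefault("overrides", {})["llm_routing_model"] = routing_model
    return new
```

When no provider entry matches the old `llm_provider` name, the sketch simply carries the old `routing.model` value over, which mirrors the `openclaw_routing/config.yaml` hunk.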
--- a/demos/agent_orchestration/travel_agents/README.md
+++ b/demos/agent_orchestration/travel_agents/README.md
@@ -123,6 +123,42 @@ Each agent:

 Both agents run as native local processes and communicate with Plano running natively on the host.

+## Running with local Plano-Orchestrator (via vLLM)
+
+By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the orchestrator model locally using vLLM on a server with an NVIDIA GPU:
+
+1. Install vLLM and download the model:
+```bash
+pip install vllm
+```
+
+2. Start the vLLM server with the 4B model:
+```bash
+vllm serve katanemo/Plano-Orchestrator-4B \
+    --host 0.0.0.0 \
+    --port 8000 \
+    --tensor-parallel-size 1 \
+    --gpu-memory-utilization 0.3 \
+    --tokenizer katanemo/Plano-Orchestrator-4B \
+    --chat-template chat_template.jinja \
+    --served-model-name katanemo/Plano-Orchestrator-4B \
+    --enable-prefix-caching
+```
+
+3. Start the demo with the local orchestrator config:
+```bash
+./run_demo.sh --local-orchestrator
+```
+
+4. Test with curl:
+```bash
+curl -X POST http://localhost:8001/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "What is the weather in Istanbul?"}]}'
+```
+
+You should see Plano use your local orchestrator to route the request to the weather agent.
+
 ## Observability

 This demo includes full OpenTelemetry (OTel) compatible distributed tracing to monitor and debug agent interactions:
--- a/demos/agent_orchestration/travel_agents/config_local_orchestrator.yaml
+++ b/demos/agent_orchestration/travel_agents/config_local_orchestrator.yaml
@@ -0,0 +1,66 @@
+version: v0.3.0
+
+overrides:
+  agent_orchestration_model: plano/katanemo/Plano-Orchestrator-4B
+
+agents:
+  - id: weather_agent
+    url: http://localhost:10510
+  - id: flight_agent
+    url: http://localhost:10520
+
+model_providers:
+  - model: plano/katanemo/Plano-Orchestrator-4B
+    base_url: http://localhost:8000
+
+  - model: openai/gpt-5.2
+    access_key: $OPENAI_API_KEY
+    default: true
+  - model: openai/gpt-4o-mini
+    access_key: $OPENAI_API_KEY # smaller, faster, cheaper model for extracting entities like location
+
+listeners:
+  - type: agent
+    name: travel_booking_service
+    port: 8001
+    router: plano_orchestrator_v1
+    agents:
+      - id: weather_agent
+        description: |
+
+          WeatherAgent is a specialized AI assistant for real-time weather information and forecasts. It provides accurate weather data for any city worldwide using the Open-Meteo API, helping travelers plan their trips with up-to-date weather conditions.
+
+          Capabilities:
+            * Get real-time weather conditions and multi-day forecasts for any city worldwide using Open-Meteo API (free, no API key needed)
+            * Provides current temperature
+            * Provides multi-day forecasts
+            * Provides weather conditions
+            * Provides sunrise/sunset times
+            * Provides detailed weather information
+            * Understands conversation context to resolve location references from previous messages
+            * Handles weather-related questions including "What's the weather in [city]?", "What's the forecast for [city]?", "How's the weather in [city]?"
+            * When queries include both weather and other travel questions (e.g., flights, currency), this agent answers ONLY the weather part
+
+      - id: flight_agent
+        description: |
+
+          FlightAgent is an AI-powered tool specialized in providing live flight information between airports. It leverages the FlightAware AeroAPI to deliver real-time flight status, gate information, and delay updates.
+
+          Capabilities:
+            * Get live flight information between airports using FlightAware AeroAPI
+            * Shows real-time flight status
+            * Shows scheduled/estimated/actual departure and arrival times
+            * Shows gate and terminal information
+            * Shows delays
+            * Shows aircraft type
+            * Shows flight status
+            * Automatically resolves city names to airport codes (IATA/ICAO)
+            * Understands conversation context to infer origin/destination from follow-up questions
+            * Handles flight-related questions including "What flights go from [city] to [city]?", "Do flights go to [city]?", "Are there direct flights from [city]?"
+            * When queries include both flight and other travel questions (e.g., weather, currency), this agent answers ONLY the flight part
+
+tracing:
+  random_sampling: 100
+  span_attributes:
+    header_prefixes:
+      - x-acme-
--- a/demos/agent_orchestration/travel_agents/run_demo.sh
+++ b/demos/agent_orchestration/travel_agents/run_demo.sh
@@ -31,8 +31,13 @@ start_demo() {
  fi

  # Step 4: Start Plano
-  echo "Starting Plano with config.yaml..."
-  planoai up config.yaml
+  PLANO_CONFIG="config.yaml"
+  if [ "$1" == "--local-orchestrator" ]; then
+    PLANO_CONFIG="config_local_orchestrator.yaml"
+    echo "Using local orchestrator config..."
+  fi
+  echo "Starting Plano with $PLANO_CONFIG..."
+  planoai up "$PLANO_CONFIG"

  # Step 5: Start agents natively
  echo "Starting agents..."
--- a/demos/llm_routing/openclaw_routing/config.yaml
+++ b/demos/llm_routing/openclaw_routing/config.yaml
@@ -1,8 +1,7 @@
 version: v0.1.0

-routing:
-  model: Arch-Router
-  llm_provider: arch-router
+overrides:
+  llm_routing_model: Arch-Router

 listeners:
  egress_traffic:
--- a/demos/llm_routing/preference_based_routing/plano_config_local.yaml
+++ b/demos/llm_routing/preference_based_routing/plano_config_local.yaml
@@ -1,8 +1,7 @@
 version: v0.3.0

-routing:
-  model: Arch-Router
-  llm_provider: arch-router
+overrides:
+  llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M

 listeners:
  - type: model
@@ -11,8 +10,7 @@ listeners:

 model_providers:

-  - name: arch-router
-    model: arch/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
+  - model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
    base_url: http://localhost:11434

  - model: openai/gpt-4o-mini