Unified overrides for custom router and orchestrator models (#820)

* support configurable orchestrator model via orchestration config section

* add self-hosting docs and demo for Plano-Orchestrator

* list all Plano-Orchestrator model variants in docs

* use overrides for custom routing and orchestration model

* update docs

* update orchestrator model name

* rename arch provider to plano, use llm_routing_model and agent_orchestration_model

* regenerate rendered config reference
Adil Hafeez 2026-03-15 09:36:11 -07:00 committed by GitHub
parent 785bf7e021
commit bc059aed4d
20 changed files with 312 additions and 103 deletions
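
The diffs below replace the `routing:` block (`model` / `llm_provider`) and the `arch/` provider prefix with `overrides` keys and a `plano/` prefix. For configs kept outside this repository, a rough sketch like the following could list lines that still use the old keys. The function name `find_legacy_routing` is hypothetical, the patterns are inferred only from the removed lines shown in this commit, and whether the old keys are still accepted is not stated here.

```bash
#!/usr/bin/env bash
# Sketch: list YAML lines that still use the routing keys removed in this commit.
# The patterns are assumptions derived from the removed lines in the diffs.

# Prints file:line:match for each legacy key found under the given directory
# (defaults to the current directory). Exits non-zero when nothing matches.
find_legacy_routing() {
  grep -rnE --include='*.yaml' --include='*.yml' \
    '^[[:space:]]*(llm_provider:|-[[:space:]]*name:[[:space:]]*arch-router|(-[[:space:]]*)?model:[[:space:]]*arch/)' \
    "${1:-.}"
}
```

Running `find_legacy_routing path/to/configs` prints each match with its file name and line number, so the affected files can be updated by hand.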


@@ -123,6 +123,42 @@ Each agent:
Both agents run as native local processes and communicate with Plano running natively on the host.
## Running with local Plano-Orchestrator (via vLLM)
By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the orchestrator model locally using vLLM on a server with an NVIDIA GPU:
1. Install vLLM (the model weights are downloaded from Hugging Face the first time the server starts):
```bash
pip install vllm
```
2. Start the vLLM server with the 4B model:
```bash
vllm serve katanemo/Plano-Orchestrator-4B \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.3 \
--tokenizer katanemo/Plano-Orchestrator-4B \
--chat-template chat_template.jinja \
--served-model-name katanemo/Plano-Orchestrator-4B \
--enable-prefix-caching
```
3. Start the demo with the local orchestrator config:
```bash
./run_demo.sh --local-orchestrator
```
4. Test with curl:
```bash
curl -X POST http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "What is the weather in Istanbul?"}]}'
```
You should see Plano use your local orchestrator to route the request to the weather agent.
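
Before starting the demo in step 3, it can help to confirm that the vLLM server is actually serving the orchestrator model. The following is a minimal sketch, assuming the host, port, and `--served-model-name` from step 2 and vLLM's OpenAI-compatible `/v1/models` endpoint. The helper names `has_model` and `check_orchestrator` are hypothetical and not part of Plano or vLLM.

```bash
#!/usr/bin/env bash
# Sketch: confirm the local vLLM server lists the orchestrator model.
# MODEL and VLLM_URL mirror the flags used in step 2 (assumed values).

MODEL="katanemo/Plano-Orchestrator-4B"
VLLM_URL="http://localhost:8000"

# Succeeds if the JSON on stdin contains a model entry with the given id.
# Plain grep keeps the sketch dependency-free; jq would be stricter.
has_model() {
  grep -Eq "\"id\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Queries /v1/models and checks for the served model name.
# Fails if the server is unreachable or the model is not listed.
check_orchestrator() {
  curl -sf "$VLLM_URL/v1/models" | has_model "$MODEL"
}

# Usage:
#   check_orchestrator && ./run_demo.sh --local-orchestrator
```

If the check fails, the server may still be loading the model, or the name passed to `--served-model-name` may differ from the one used in the Plano config.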
## Observability
This demo includes full OpenTelemetry (OTel) compatible distributed tracing to monitor and debug agent interactions:


@@ -0,0 +1,66 @@
version: v0.3.0
overrides:
  agent_orchestration_model: plano/katanemo/Plano-Orchestrator-4B
agents:
  - id: weather_agent
    url: http://localhost:10510
  - id: flight_agent
    url: http://localhost:10520
model_providers:
  - model: plano/katanemo/Plano-Orchestrator-4B
    base_url: http://localhost:8000
  - model: openai/gpt-5.2
    access_key: $OPENAI_API_KEY
    default: true
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY # smaller, faster, cheaper model for extracting entities like location
listeners:
  - type: agent
    name: travel_booking_service
    port: 8001
    router: plano_orchestrator_v1
    agents:
      - id: weather_agent
        description: |
          WeatherAgent is a specialized AI assistant for real-time weather information and forecasts. It provides accurate weather data for any city worldwide using the Open-Meteo API, helping travelers plan their trips with up-to-date weather conditions.
          Capabilities:
          * Get real-time weather conditions and multi-day forecasts for any city worldwide using Open-Meteo API (free, no API key needed)
          * Provides current temperature
          * Provides multi-day forecasts
          * Provides weather conditions
          * Provides sunrise/sunset times
          * Provides detailed weather information
          * Understands conversation context to resolve location references from previous messages
          * Handles weather-related questions including "What's the weather in [city]?", "What's the forecast for [city]?", "How's the weather in [city]?"
          * When queries include both weather and other travel questions (e.g., flights, currency), this agent answers ONLY the weather part
      - id: flight_agent
        description: |
          FlightAgent is an AI-powered tool specialized in providing live flight information between airports. It leverages the FlightAware AeroAPI to deliver real-time flight status, gate information, and delay updates.
          Capabilities:
          * Get live flight information between airports using FlightAware AeroAPI
          * Shows real-time flight status
          * Shows scheduled/estimated/actual departure and arrival times
          * Shows gate and terminal information
          * Shows delays
          * Shows aircraft type
          * Shows flight status
          * Automatically resolves city names to airport codes (IATA/ICAO)
          * Understands conversation context to infer origin/destination from follow-up questions
          * Handles flight-related questions including "What flights go from [city] to [city]?", "Do flights go to [city]?", "Are there direct flights from [city]?"
          * When queries include both flight and other travel questions (e.g., weather, currency), this agent answers ONLY the flight part
tracing:
  random_sampling: 100
  span_attributes:
    header_prefixes:
      - x-acme-


@@ -31,8 +31,13 @@ start_demo() {
     fi
     # Step 4: Start Plano
-    echo "Starting Plano with config.yaml..."
-    planoai up config.yaml
+    PLANO_CONFIG="config.yaml"
+    if [ "$1" == "--local-orchestrator" ]; then
+        PLANO_CONFIG="config_local_orchestrator.yaml"
+        echo "Using local orchestrator config..."
+    fi
+    echo "Starting Plano with $PLANO_CONFIG..."
+    planoai up "$PLANO_CONFIG"
     # Step 5: Start agents natively
     echo "Starting agents..."


@@ -1,8 +1,7 @@
 version: v0.1.0
-routing:
-  model: Arch-Router
-  llm_provider: arch-router
+overrides:
+  llm_routing_model: Arch-Router
 listeners:
   egress_traffic:


@@ -1,8 +1,7 @@
 version: v0.3.0
-routing:
-  model: Arch-Router
-  llm_provider: arch-router
+overrides:
+  llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
 listeners:
   - type: model
@@ -11,8 +10,7 @@ listeners:
 model_providers:
-  - name: arch-router
-    model: arch/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
+  - model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
     base_url: http://localhost:11434
   - model: openai/gpt-4o-mini