Mirror of https://github.com/katanemo/plano.git (synced 2026-04-25 00:36:34 +02:00)
Unified overrides for custom router and orchestrator models (#820)
* support configurable orchestrator model via orchestration config section
* add self-hosting docs and demo for Plano-Orchestrator
* list all Plano-Orchestrator model variants in docs
* use overrides for custom routing and orchestration model
* update docs
* update orchestrator model name
* rename arch provider to plano, use llm_routing_model and agent_orchestration_model
* regenerate rendered config reference
Parent: 785bf7e021
Commit: bc059aed4d
20 changed files with 312 additions and 103 deletions

@@ -123,6 +123,42 @@ Each agent:

Both agents run as native local processes and communicate with Plano running natively on the host.

## Running with local Plano-Orchestrator (via vLLM)

By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the orchestrator model locally using vLLM on a server with an NVIDIA GPU:

1. Install vLLM (`vllm serve` downloads the model weights from Hugging Face the first time it starts):

```bash
pip install vllm
```

2. Start the vLLM server with the 4B model:

```bash
vllm serve katanemo/Plano-Orchestrator-4B \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.3 \
  --tokenizer katanemo/Plano-Orchestrator-4B \
  --chat-template chat_template.jinja \
  --served-model-name katanemo/Plano-Orchestrator-4B \
  --enable-prefix-caching
```

3. Start the demo with the local orchestrator config:

```bash
./run_demo.sh --local-orchestrator
```

4. Test with curl:

```bash
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "What is the weather in Istanbul?"}]}'
```

You should see Plano use your local orchestrator to route the request to the weather agent.
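
Loading the model can take a while, especially on the first start, so it can help to wait for vLLM to come up before running step 3. The following is a minimal sketch in Bourne-style shell (bash or dash). It assumes `curl` is installed and that the server from step 2 is listening on `localhost:8000`. The `wait_until_ready` helper is hypothetical and not part of the demo scripts. `/v1/models` is the model-listing route of vLLM's OpenAI-compatible server.

```shell
# Hypothetical helper (not part of run_demo.sh): run a probe command
# repeatedly until it succeeds or the attempt budget is used up.
#   $1     maximum number of attempts
#   $2     seconds to sleep between attempts
#   $3...  probe command and its arguments
wait_until_ready() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Discard the probe's output; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only if another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage, assuming the server from step 2 listens on localhost:8000:
#   wait_until_ready 60 5 curl -sf http://localhost:8000/v1/models && echo "vLLM is ready"
```

A fixed delay keeps the sketch simple. The budget in the usage line (60 attempts at 5-second intervals) is arbitrary and may need to be raised when the weights are being downloaded for the first time.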

## Observability

This demo includes OpenTelemetry (OTel)-compatible distributed tracing for monitoring and debugging agent interactions:

@@ -0,0 +1,66 @@
version: v0.3.0

overrides:
  agent_orchestration_model: plano/katanemo/Plano-Orchestrator-4B

agents:
  - id: weather_agent
    url: http://localhost:10510
  - id: flight_agent
    url: http://localhost:10520

model_providers:
  - model: plano/katanemo/Plano-Orchestrator-4B
    base_url: http://localhost:8000

  - model: openai/gpt-5.2
    access_key: $OPENAI_API_KEY
    default: true
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY # smaller, faster, cheaper model for extracting entities like location

listeners:
  - type: agent
    name: travel_booking_service
    port: 8001
    router: plano_orchestrator_v1
    agents:
      - id: weather_agent
        description: |

          WeatherAgent is a specialized AI assistant for real-time weather information and forecasts. It provides accurate weather data for any city worldwide using the Open-Meteo API, helping travelers plan their trips with up-to-date weather conditions.

          Capabilities:
            * Get real-time weather conditions and multi-day forecasts for any city worldwide using Open-Meteo API (free, no API key needed)
            * Provides current temperature
            * Provides multi-day forecasts
            * Provides weather conditions
            * Provides sunrise/sunset times
            * Provides detailed weather information
            * Understands conversation context to resolve location references from previous messages
            * Handles weather-related questions including "What's the weather in [city]?", "What's the forecast for [city]?", "How's the weather in [city]?"
            * When queries include both weather and other travel questions (e.g., flights, currency), this agent answers ONLY the weather part

      - id: flight_agent
        description: |

          FlightAgent is an AI-powered tool specialized in providing live flight information between airports. It leverages the FlightAware AeroAPI to deliver real-time flight status, gate information, and delay updates.

          Capabilities:
            * Get live flight information between airports using FlightAware AeroAPI
            * Shows real-time flight status
            * Shows scheduled/estimated/actual departure and arrival times
            * Shows gate and terminal information
            * Shows delays
            * Shows aircraft type
            * Shows flight status
            * Automatically resolves city names to airport codes (IATA/ICAO)
            * Understands conversation context to infer origin/destination from follow-up questions
            * Handles flight-related questions including "What flights go from [city] to [city]?", "Do flights go to [city]?", "Are there direct flights from [city]?"
            * When queries include both flight and other travel questions (e.g., weather, currency), this agent answers ONLY the flight part

tracing:
  random_sampling: 100
  span_attributes:
    header_prefixes:
      - x-acme-

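In this config, `overrides.agent_orchestration_model` names the same model as the `model_providers` entry whose `base_url` points at the vLLM server on port 8000. The following sketch assumes that Plano treats the part before the first `/` as the provider, as in `openai/gpt-5.2`, and forwards the remainder as the model name. Under that assumption, the remainder has to equal vLLM's `--served-model-name`, which is presumably why step 2 sets it to `katanemo/Plano-Orchestrator-4B`. The sketch splits a model reference that way using shell parameter expansion. The helpers `model_provider` and `model_name` are hypothetical.

```shell
# Hypothetical helpers for the assumed "<provider>/<model>" naming convention:
# the provider is everything before the first "/", the model name is the rest.
model_provider() {
  # ${1%%/*} strips the longest suffix matching "/*", i.e. from the first "/" on.
  printf '%s\n' "${1%%/*}"
}

model_name() {
  # ${1#*/} strips the shortest prefix matching "*/", i.e. up to the first "/".
  printf '%s\n' "${1#*/}"
}

# Values copied from the config above and from the vllm serve command.
orchestrator_ref="plano/katanemo/Plano-Orchestrator-4B"
served_model_name="katanemo/Plano-Orchestrator-4B"

if [ "$(model_name "$orchestrator_ref")" = "$served_model_name" ]; then
  echo "provider: $(model_provider "$orchestrator_ref"); model name matches --served-model-name"
else
  echo "model name does not match --served-model-name" >&2
fi
```
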
@@ -31,8 +31,13 @@ start_demo() {
     fi

     # Step 4: Start Plano
-    echo "Starting Plano with config.yaml..."
-    planoai up config.yaml
+    PLANO_CONFIG="config.yaml"
+    if [ "$1" == "--local-orchestrator" ]; then
+        PLANO_CONFIG="config_local_orchestrator.yaml"
+        echo "Using local orchestrator config..."
+    fi
+    echo "Starting Plano with $PLANO_CONFIG..."
+    planoai up "$PLANO_CONFIG"

     # Step 5: Start agents natively
     echo "Starting agents..."

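Inside `start_demo()`, `$1` is the function's own first argument, not the script's. As a result, `./run_demo.sh --local-orchestrator` only selects the local config if the script forwards its arguments, for example with `start_demo "$@"`, which this sketch assumes. The sketch expresses the same choice as a standalone function that also accepts the flag in any argument position. The name `select_plano_config` is hypothetical.

```shell
# Hypothetical helper mirroring the selection logic added to run_demo.sh:
# print config.yaml by default, or config_local_orchestrator.yaml when
# --local-orchestrator appears anywhere in the argument list.
select_plano_config() {
  local config="config.yaml" arg
  for arg in "$@"; do
    case "$arg" in
      --local-orchestrator) config="config_local_orchestrator.yaml" ;;
    esac
  done
  printf '%s\n' "$config"
}

# Usage inside start_demo(), assuming it receives the script's arguments:
#   PLANO_CONFIG="$(select_plano_config "$@")"
#   echo "Starting Plano with $PLANO_CONFIG..."
#   planoai up "$PLANO_CONFIG"
```

Using `case` rather than `[ ... == ... ]` also avoids the bash-only `==` operator, so the helper runs under plain `sh` implementations that support `local`.
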
@@ -1,8 +1,7 @@
 version: v0.1.0

-routing:
-  model: Arch-Router
-  llm_provider: arch-router
+overrides:
+  llm_routing_model: Arch-Router

 listeners:
   egress_traffic:

@@ -1,8 +1,7 @@
 version: v0.3.0

-routing:
-  model: Arch-Router
-  llm_provider: arch-router
+overrides:
+  llm_routing_model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M

 listeners:
   - type: model

@@ -11,8 +10,7 @@ listeners:

 model_providers:

-  - name: arch-router
-    model: arch/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
+  - model: plano/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
     base_url: http://localhost:11434

   - model: openai/gpt-4o-mini