
# Travel Booking Agent Demo

A multi-agent travel booking system demonstrating Plano's intelligent agent routing and orchestration capabilities. This demo showcases two specialized agents working together to help users plan trips with weather information and flight searches. All agent interactions are fully traced with OpenTelemetry-compatible tracing for complete observability.

## Overview

This demo consists of two intelligent agents that work together seamlessly:

- Weather Agent - Real-time weather conditions and multi-day forecasts for any city worldwide
- Flight Agent - Live flight information between airports with real-time tracking

All agents use Plano's agent orchestration LLM to intelligently route user requests to the appropriate specialized agent based on conversation context and user intent.

## Features

- Intelligent Routing: Plano automatically routes requests to the right agent
- Conversation Context: Agents understand follow-up questions and references
- Real-Time Data: Live weather and flight data from public APIs
- Multi-Day Forecasts: Weather agent supports up to 16-day forecasts
- LLM-Powered: Uses GPT-4o-mini for extraction and GPT-5.2 for responses
- Streaming Responses: Real-time streaming for better user experience

## Prerequisites

Note: You'll need to obtain a FlightAware AeroAPI key for live flight data. Visit https://www.flightaware.com/aeroapi/portal to get your API key.
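To confirm your key works before launching the demo, here is a minimal sketch. It assumes AeroAPI's documented base URL and `x-apikey` auth header; `build_aeroapi_request` is a hypothetical helper, not part of the demo:

```python
import os
import urllib.request

# AeroAPI base URL and header name per FlightAware's documentation;
# adjust if your account uses a different endpoint.
AEROAPI_BASE = "https://aeroapi.flightaware.com/aeroapi"

def build_aeroapi_request(path: str, api_key: str) -> urllib.request.Request:
    """Return a Request for an AeroAPI route with the auth header attached."""
    return urllib.request.Request(
        f"{AEROAPI_BASE}{path}",
        headers={"x-apikey": api_key},
    )

if __name__ == "__main__":
    # A 200 response means the key is valid.
    req = build_aeroapi_request("/airports/SEA", os.environ["AEROAPI_KEY"])
    with urllib.request.urlopen(req) as resp:
        print(resp.status)
```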

## Quick Start

### 1. Set Environment Variables

Create a .env file or export environment variables:

```bash
export AEROAPI_KEY="your-flightaware-api-key"
export OPENAI_API_KEY="your-openai-api-key"
```

### 2. Start the Demo

```bash
./run_demo.sh
```

This starts Plano natively on the host (port 8001) and runs the agents as local processes:

- Weather Agent on port 10510
- Flight Agent on port 10520

To also start Open WebUI, Jaeger tracing, and other optional services, pass `--with-ui`:

```bash
./run_demo.sh --with-ui
```

This additionally starts:

- Open WebUI on port 8080
- Jaeger tracing UI on port 16686

### 3. Test the System

#### Option A: Using curl

```bash
curl -X POST http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "What is the weather in Istanbul?"}]}'
```

#### Option B: Using Open WebUI (requires `--with-ui`)

Navigate to http://localhost:8080

Note: Open WebUI may take a few minutes to finish initializing. Once it is ready, select the gpt-5.2 model from the model dropdown menu in the UI.
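Rather than refreshing the browser, you can poll the port until the UI answers. This is an illustrative sketch, not part of the demo; `wait_for_service` is a hypothetical helper, and 8080 is the Open WebUI port from this setup:

```python
import time
import urllib.request
import urllib.error

def wait_for_service(url: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll `url` until it responds with any HTTP answer or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # service is up
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not ready yet; retry
    return False

if __name__ == "__main__":
    print("ready" if wait_for_service("http://localhost:8080") else "timed out")
```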

## Example Conversations

### Multi-Agent Conversation

```
User: What's the weather in Istanbul?
Assistant: [Weather information]

User: Do they fly out from Seattle?
Assistant: [Flight information from Istanbul to Seattle]
```

The system understands context and pronouns, automatically routing to the right agent.

### Multi-Intent Single Query

```
User: What's the weather in Seattle, and do any flights go direct to New York?
Assistant: [Both weather_agent and flight_agent respond simultaneously]
  - Weather Agent: [Weather information for Seattle]
  - Flight Agent: [Flight information from Seattle to New York]
```
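Because Plano's endpoint is OpenAI-compatible (as the curl examples show), the same multi-intent query can be sent from Python. `build_chat_request` is a hypothetical helper that just mirrors the curl payload:

```python
import json
import urllib.request

def build_chat_request(user_message: str, model: str = "gpt-5.2") -> dict:
    """Assemble the OpenAI-style chat-completions body used by this demo."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

if __name__ == "__main__":
    body = json.dumps(build_chat_request(
        "What's the weather in Seattle, and do any flights go direct to New York?"
    )).encode()
    req = urllib.request.Request(
        "http://localhost:8001/v1/chat/completions",  # Plano's port in this demo
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```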

## Architecture

```
    User Request
         ↓
    Plano (8001)
    [Orchestrator]
         |
    ┌────┴────┐
    ↓         ↓
 Weather    Flight
  Agent      Agent
 (10510)    (10520)
```

Each agent:

  1. Extracts intent using GPT-4o-mini (with OpenTelemetry tracing)
  2. Fetches real-time data from APIs
  3. Generates response using GPT-5.2
  4. Streams response back to user

Both agents run as native local processes and communicate with Plano running natively on the host.
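The four-step pipeline above can be sketched as follows. The function names are illustrative, and the rule-based stubs stand in for the demo's actual GPT-4o-mini extraction and GPT-5.2 generation calls:

```python
from dataclasses import dataclass

@dataclass
class Intent:
    city: str

def extract_intent(user_message: str) -> Intent:
    """Step 1: intent extraction. The demo uses GPT-4o-mini; this stub
    just treats the last capitalized word as the city."""
    words = [w.strip("?.,!") for w in user_message.split()]
    caps = [w for w in words if w[:1].isupper()]
    return Intent(city=caps[-1] if caps else "")

def fetch_weather(city: str) -> dict:
    """Step 2: the demo fetches live data from a public API; stubbed here."""
    return {"city": city, "temp_c": 21}

def render_response(data: dict) -> str:
    """Steps 3-4: the demo streams a GPT-5.2 completion; stubbed as a template."""
    return f"It is {data['temp_c']}°C in {data['city']}."

if __name__ == "__main__":
    intent = extract_intent("What is the weather in Istanbul?")
    print(render_response(fetch_weather(intent.city)))
```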

## Running with a Local Plano-Orchestrator (via vLLM)

By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the orchestrator model locally using vLLM on a server with an NVIDIA GPU:

1. Install vLLM (the model is downloaded automatically on first launch):

   ```bash
   pip install vllm
   ```

2. Start the vLLM server with the 4B model:

   ```bash
   vllm serve katanemo/Plano-Orchestrator-4B \
       --host 0.0.0.0 \
       --port 8000 \
       --tensor-parallel-size 1 \
       --gpu-memory-utilization 0.3 \
       --tokenizer katanemo/Plano-Orchestrator-4B \
       --chat-template chat_template.jinja \
       --served-model-name katanemo/Plano-Orchestrator-4B \
       --enable-prefix-caching
   ```

3. Start the demo with the local orchestrator config:

   ```bash
   ./run_demo.sh --local-orchestrator
   ```

4. Test with curl:

   ```bash
   curl -X POST http://localhost:8001/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "What is the weather in Istanbul?"}]}'
   ```

You should see Plano use your local orchestrator to route the request to the weather agent.
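Before pointing the demo at the local server, you can confirm the model is loaded via vLLM's OpenAI-compatible `/v1/models` route. `model_ids` is a hypothetical helper for parsing the response:

```python
import json
import urllib.request

def model_ids(models_response: dict) -> list:
    """Pull the model ids out of an OpenAI-style /v1/models response."""
    return [m["id"] for m in models_response.get("data", [])]

if __name__ == "__main__":
    # Port 8000 matches the `vllm serve` command above; expect the list to
    # contain "katanemo/Plano-Orchestrator-4B".
    with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
        print(model_ids(json.load(resp)))
```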

## Observability

This demo includes full OpenTelemetry (OTel)-compatible distributed tracing for monitoring and debugging agent interactions. The tracing data provides complete visibility into the multi-agent system, making it easy to identify bottlenecks, debug issues, and optimize performance.

For more details on setting up and using tracing, see the Plano Observability documentation.
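Once Jaeger is running (with `--with-ui`), you can check which services have reported spans through its HTTP query API, which is served on the same port as the UI. `service_names` is a hypothetical helper; `/api/services` is Jaeger's standard query route:

```python
import json
import urllib.request

def service_names(services_response: dict) -> list:
    """Extract the service name list from Jaeger's /api/services response."""
    return services_response.get("data") or []

if __name__ == "__main__":
    # 16686 is the Jaeger port from this demo; once the agents have handled
    # a request, they should appear in this list.
    with urllib.request.urlopen("http://localhost:16686/api/services") as resp:
        print(sorted(service_names(json.load(resp))))
```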
