Unified overrides for custom router and orchestrator models (#820)

* support configurable orchestrator model via orchestration config section

* add self-hosting docs and demo for Plano-Orchestrator

* list all Plano-Orchestrator model variants in docs

* use overrides for custom routing and orchestration model

* update docs

* update orchestrator model name

* rename arch provider to plano, use llm_routing_model and agent_orchestration_model

* regenerate rendered config reference
This commit is contained in:
Adil Hafeez 2026-03-15 09:36:11 -07:00 committed by GitHub
parent 785bf7e021
commit bc059aed4d
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
20 changed files with 312 additions and 103 deletions

View file

@ -123,6 +123,42 @@ Each agent:
Both agents run as native local processes and communicate with Plano running natively on the host.
## Running with local Plano-Orchestrator (via vLLM)
By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the orchestrator model locally using vLLM on a server with an NVIDIA GPU:
1. Install vLLM and download the model:
```bash
pip install vllm
```
2. Start the vLLM server with the 4B model:
```bash
vllm serve katanemo/Plano-Orchestrator-4B \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.3 \
--tokenizer katanemo/Plano-Orchestrator-4B \
--chat-template chat_template.jinja \
--served-model-name katanemo/Plano-Orchestrator-4B \
--enable-prefix-caching
```
3. Start the demo with the local orchestrator config:
```bash
./run_demo.sh --local-orchestrator
```
4. Test with curl:
```bash
curl -X POST http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "What is the weather in Istanbul?"}]}'
```
You should see Plano use your local orchestrator to route the request to the weather agent.
## Observability
This demo includes full OpenTelemetry (OTel) compatible distributed tracing to monitor and debug agent interactions: