mirror of
https://github.com/katanemo/plano.git
synced 2026-05-08 23:32:43 +02:00
Unified overrides for custom router and orchestrator models (#820)
* support configurable orchestrator model via orchestration config section
* add self-hosting docs and demo for Plano-Orchestrator
* list all Plano-Orchestrator model variants in docs
* use overrides for custom routing and orchestration model
* update docs
* update orchestrator model name
* rename arch provider to plano, use llm_routing_model and agent_orchestration_model
* regenerate rendered config reference
This commit is contained in:
parent
785bf7e021
commit
bc059aed4d
20 changed files with 312 additions and 103 deletions
@@ -123,6 +123,42 @@ Each agent:
Both agents run as native local processes and communicate with Plano running natively on the host.
## Running with local Plano-Orchestrator (via vLLM)
By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the orchestrator model locally using vLLM on a server with an NVIDIA GPU:
1. Install vLLM and download the model:
   ```bash
   pip install vllm
   ```
2. Start the vLLM server with the 4B model:
   ```bash
   vllm serve katanemo/Plano-Orchestrator-4B \
     --host 0.0.0.0 \
     --port 8000 \
     --tensor-parallel-size 1 \
     --gpu-memory-utilization 0.3 \
     --tokenizer katanemo/Plano-Orchestrator-4B \
     --chat-template chat_template.jinja \
     --served-model-name katanemo/Plano-Orchestrator-4B \
     --enable-prefix-caching
   ```
3. Start the demo with the local orchestrator config:
   ```bash
   ./run_demo.sh --local-orchestrator
   ```
4. Test with curl:
   ```bash
   curl -X POST http://localhost:8001/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "What is the weather in Istanbul?"}]}'
   ```
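If you want just the assistant's reply rather than the full JSON body, you can pipe the response through `jq`. A small sketch (assumes `jq` is installed; the sample payload below is illustrative, not captured from a real run):

```bash
# Pull the assistant's message out of an OpenAI-style chat completion.
# The payload here is a made-up sample, not real Plano output.
response='{"choices":[{"message":{"role":"assistant","content":"Sunny and 24C in Istanbul."}}]}'
echo "$response" | jq -r '.choices[0].message.content'
# prints: Sunny and 24C in Istanbul.
```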
You should see Plano use your local orchestrator to route the request to the weather agent.
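Per the commit message, custom models are wired through unified overrides named `llm_routing_model` and `agent_orchestration_model`. A hypothetical sketch of what the local-orchestrator override could look like (the key names come from the commit message; the surrounding structure and endpoint are assumptions, not the actual rendered config schema):

```yaml
# Hypothetical sketch only -- key names from the commit message;
# the structure and the endpoint comment are assumptions.
overrides:
  llm_routing_model: katanemo/Plano-Orchestrator-4B
  agent_orchestration_model: katanemo/Plano-Orchestrator-4B  # served locally by vLLM at http://localhost:8000/v1
```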
## Observability
This demo includes full OpenTelemetry (OTel)-compatible distributed tracing to monitor and debug agent interactions: