# Preference-based LLM Routing
This demo shows how user preferences can be used to route prompts to the most appropriate LLM. See [config.yaml](config.yaml) for details on how to define routing preferences.
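As a rough illustration of the idea (the field names below are assumptions for this sketch; treat [config.yaml](config.yaml) as the source of truth), preference-based routing pairs each model with a short natural-language description of the traffic it should receive, and the router matches incoming prompts against those descriptions:

```yaml
# Hypothetical sketch only -- see config.yaml for the actual schema.
llm_providers:
  - model: openai/gpt-4o            # example model choice, not prescriptive
    routing_preferences:
      - name: code_generation
        description: generating new code, scaffolding projects, writing functions
  - model: openai/gpt-4o-mini
    routing_preferences:
      - name: code_understanding
        description: explaining, summarizing, or reviewing existing code
```

The route names `code_generation` and `code_understanding` are the ones that show up in the router logs later in this demo.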
## How to start the demo
Make sure you have the Plano CLI installed (`pip install planoai==0.4.18` or `uv tool install planoai==0.4.18`).
```bash
cd demos/llm_routing/preference_based_routing
./run_demo.sh
```
To also start AnythingLLM (chat UI) and Jaeger (tracing):
```bash
./run_demo.sh --with-ui
```
Then open AnythingLLM at http://localhost:3001/.

Alternatively, start the components manually:

1. (Optional) Start AnythingLLM and Jaeger:
```bash
docker compose up -d
```
2. Start Plano:
```bash
planoai up config.yaml
```
3. Test with curl or open AnythingLLM at http://localhost:3001/.
## Running with local routing model (via Ollama)
By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host a routing model locally using Ollama:
1. Install [Ollama](https://ollama.ai) and pull the model:
```bash
ollama pull hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
```
2. Make sure Ollama is running (`ollama serve` or the macOS app).
3. Start Plano with the local config:
```bash
planoai up plano_config_local.yaml
```
4. Test routing:
```bash
curl -s "http://localhost:12000/routing/v1/messages" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Create a REST API endpoint in Rust using actix-web"}
]
}'
```
You should see the router select the appropriate model based on the routing preferences defined in `plano_config_local.yaml`.
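If you prefer scripting the same request instead of using curl, a minimal Python sketch (standard library only) looks like the following. The URL and payload mirror the curl example above; `build_request` and `send` are helper names invented for this sketch, and running `send` requires Plano to be up locally:

```python
import json
import urllib.request

# Routing endpoint from the curl example above.
PLANO_URL = "http://localhost:12000/routing/v1/messages"


def build_request(prompt: str, model: str = "gpt-4o-mini", max_tokens: int = 1024) -> dict:
    """Build the JSON payload shown in the curl example."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


def send(prompt: str) -> dict:
    """POST the prompt to Plano; the router picks the target LLM from the configured preferences."""
    req = urllib.request.Request(
        PLANO_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Requires a running Plano instance, e.g.:
# send("Create a REST API endpoint in Rust using actix-web")
```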
## Testing out preference-based routing

We have defined two routes: 1. code generation and 2. code understanding.

For a code generation query, the LLM better suited for code generation handles the request. In the logs you'll see that the code generation route was selected:
```
...
2025-05-31T01:02:19.382716Z INFO brightstaff::router::llm_router: router response: {'route': 'code_generation'}, response time: 203ms
...
```
<img width="1036" alt="image" src="https://github.com/user-attachments/assets/f923944b-ddbe-462e-9fd5-c75504adc8cf" />
Now if you ask a query related to code understanding, you'll see that the LLM better suited for code understanding handles it:
```
...
2025-05-31T01:06:33.555680Z INFO brightstaff::router::llm_router: router response: {'route': 'code_understanding'}, response time: 327ms
...
```
<img width="1081" alt="image" src="https://github.com/user-attachments/assets/e50d167c-46a0-4e3a-ba77-e84db1bd376d" />