plano/skills/rules/routing-passthrough.md

70 lines
2.3 KiB
Markdown
Raw Normal View History

---
title: Use Passthrough Auth for Proxy and Multi-Tenant Setups
impact: MEDIUM
impactDescription: Without passthrough auth, self-hosted proxy services (LiteLLM, vLLM, etc.) reject Plano's requests because the wrong Authorization header is sent
tags: routing, authentication, proxy, litellm, multi-tenant
---
## Use Passthrough Auth for Proxy and Multi-Tenant Setups
When routing to a self-hosted LLM proxy (LiteLLM, vLLM, OpenRouter, Azure APIM) or in multi-tenant setups where clients supply their own keys, set `passthrough_auth: true`. This forwards the client's `Authorization` header rather than Plano's configured `access_key`. Combine with a `base_url` pointing to the proxy.
**Incorrect (Plano sends its own key to a proxy that expects the client's key):**
```yaml
model_providers:
- model: custom/proxy
base_url: http://host.docker.internal:8000
access_key: $SOME_KEY # Plano overwrites the client's auth — proxy rejects it
```
**Correct (forward client Authorization header to the proxy):**
```yaml
version: v0.3.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: custom/litellm-proxy
base_url: http://host.docker.internal:4000 # LiteLLM server
provider_interface: openai # LiteLLM uses OpenAI format
passthrough_auth: true # Forward client's Bearer token
default: true
```
**Multi-tenant pattern (client supplies their own API key):**
```yaml
model_providers:
# Plano acts as a passthrough gateway; each client has their own OpenAI key
- model: openai/gpt-4o
passthrough_auth: true # No access_key here — client's key is forwarded
default: true
```
**Combined: proxy for some models, Plano-managed for others:**
```yaml
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY # Plano manages this key
default: true
routing_preferences:
- name: quick tasks
description: Short answers, simple lookups, fast completions
- model: custom/vllm-llama
base_url: http://gpu-server:8000
provider_interface: openai
passthrough_auth: true # vLLM cluster handles its own auth
routing_preferences:
- name: long context
description: Processing very long documents, multi-document analysis
```
Reference: https://github.com/katanemo/archgw