plano/skills/rules/routing-passthrough.md

---
title: Use Passthrough Auth for Proxy and Multi-Tenant Setups
impact: MEDIUM
impactDescription: Without passthrough auth, self-hosted proxy services (LiteLLM, vLLM, etc.) reject Plano's requests because the wrong Authorization header is sent
tags: routing, authentication, proxy, litellm, multi-tenant
---

## Use Passthrough Auth for Proxy and Multi-Tenant Setups

When routing to a self-hosted LLM proxy (LiteLLM, vLLM, OpenRouter, Azure APIM) or in multi-tenant setups where clients supply their own keys, set `passthrough_auth: true`. This forwards the client's `Authorization` header rather than Plano's configured `access_key`. Combine with a `base_url` pointing to the proxy.

**Incorrect (Plano sends its own key to a proxy that expects the client's key):**

```yaml
model_providers:
  - model: custom/proxy
    base_url: http://host.docker.internal:8000
    access_key: $SOME_KEY    # Plano overwrites the client's auth — proxy rejects it
```

**Correct (forward client Authorization header to the proxy):**

```yaml
version: v0.3.0

listeners:
  - type: model
    name: model_listener
    port: 12000

model_providers:
  - model: custom/litellm-proxy
    base_url: http://host.docker.internal:4000    # LiteLLM server
    provider_interface: openai                    # LiteLLM uses OpenAI format
    passthrough_auth: true                        # Forward client's Bearer token
    default: true
```

**Multi-tenant pattern (client supplies their own API key):**

```yaml
model_providers:
  # Plano acts as a passthrough gateway; each client has their own OpenAI key
  - model: openai/gpt-4o
    passthrough_auth: true    # No access_key here — client's key is forwarded
    default: true
```

**Combined: proxy for some models, Plano-managed for others:**

```yaml
model_providers:
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY    # Plano manages this key
    default: true
    routing_preferences:
      - name: quick tasks
        description: Short answers, simple lookups, fast completions

  - model: custom/vllm-llama
    base_url: http://gpu-server:8000
    provider_interface: openai
    passthrough_auth: true         # vLLM cluster handles its own auth
    routing_preferences:
      - name: long context
        description: Processing very long documents, multi-document analysis
```

Reference: https://github.com/katanemo/archgw
add Plano agent skills framework and rule set (#797) * feat: add initial documentation for Plano Agent Skills * feat: readme with examples * feat: add detailed skills documentation and examples for Plano --------- Co-authored-by: Adil Hafeez <adil.hafeez@gmail.com> 2026-04-16 13:16:51 -07:00			`---`
			`title: Use Passthrough Auth for Proxy and Multi-Tenant Setups`
			`impact: MEDIUM`
			`impactDescription: Without passthrough auth, self-hosted proxy services (LiteLLM, vLLM, etc.) reject Plano's requests because the wrong Authorization header is sent`
			`tags: routing, authentication, proxy, litellm, multi-tenant`
			`---`

			`## Use Passthrough Auth for Proxy and Multi-Tenant Setups`

			When routing to a self-hosted LLM proxy (LiteLLM, vLLM, OpenRouter, Azure APIM) or in multi-tenant setups where clients supply their own keys, set `passthrough_auth: true`. This forwards the client's `Authorization` header rather than Plano's configured `access_key`. Combine with a `base_url` pointing to the proxy.

			`Incorrect (Plano sends its own key to a proxy that expects the client's key):`

			```yaml
			`model_providers:`
			`- model: custom/proxy`
			`base_url: http://host.docker.internal:8000`
			`access_key: $SOME_KEY # Plano overwrites the client's auth — proxy rejects it`
			```

			`Correct (forward client Authorization header to the proxy):`

			```yaml
			`version: v0.3.0`

			`listeners:`
			`- type: model`
			`name: model_listener`
			`port: 12000`

			`model_providers:`
			`- model: custom/litellm-proxy`
			`base_url: http://host.docker.internal:4000 # LiteLLM server`
			`provider_interface: openai # LiteLLM uses OpenAI format`
			`passthrough_auth: true # Forward client's Bearer token`
			`default: true`
			```

			`Multi-tenant pattern (client supplies their own API key):`

			```yaml
			`model_providers:`
			`# Plano acts as a passthrough gateway; each client has their own OpenAI key`
			`- model: openai/gpt-4o`
			`passthrough_auth: true # No access_key here — client's key is forwarded`
			`default: true`
			```

			`Combined: proxy for some models, Plano-managed for others:`

			```yaml
			`model_providers:`
			`- model: openai/gpt-4o-mini`
			`access_key: $OPENAI_API_KEY # Plano manages this key`
			`default: true`
			`routing_preferences:`
			`- name: quick tasks`
			`description: Short answers, simple lookups, fast completions`

			`- model: custom/vllm-llama`
			`base_url: http://gpu-server:8000`
			`provider_interface: openai`
			`passthrough_auth: true # vLLM cluster handles its own auth`
			`routing_preferences:`
			`- name: long context`
			`description: Processing very long documents, multi-document analysis`
			```

			`Reference: https://github.com/katanemo/archgw`