
.. _model_aliases:

Model Aliases
=============

Model aliases provide semantic, version-controlled names for your models, enabling cleaner client code, easier model management, and advanced routing capabilities. Instead of using provider-specific model names like ``gpt-4o-mini`` or ``claude-3-5-sonnet-20241022``, you can create meaningful aliases like ``fast-model`` or ``arch.summarize.v1``.

**Benefits of Model Aliases:**

- **Semantic Naming**: Use descriptive names that reflect the model's purpose
- **Version Control**: Implement versioning schemes (e.g., ``v1``, ``v2``) for model upgrades
- **Environment Management**: Different aliases can point to different models across environments
- **Client Simplification**: Clients use consistent, meaningful names regardless of the underlying provider
- **Advanced Routing (Coming Soon)**: Enable guardrails, fallbacks, and traffic splitting at the alias level
Basic Configuration
-------------------

**Simple Alias Mapping**

.. code-block:: yaml
   :caption: Basic Model Aliases

   llm_providers:
     - model: openai/gpt-4o-mini
       access_key: $OPENAI_API_KEY

     - model: openai/gpt-4o
       access_key: $OPENAI_API_KEY

     - model: anthropic/claude-3-5-sonnet-20241022
       access_key: $ANTHROPIC_API_KEY

     - model: ollama/llama3.1
       base_url: http://host.docker.internal:11434

   # Define aliases that map to the models above
   model_aliases:
     # Semantic versioning approach
     arch.summarize.v1:
       target: gpt-4o-mini
     arch.reasoning.v1:
       target: gpt-4o
     arch.creative.v1:
       target: claude-3-5-sonnet-20241022

     # Functional aliases
     fast-model:
       target: gpt-4o-mini
     smart-model:
       target: gpt-4o
     creative-model:
       target: claude-3-5-sonnet-20241022

     # Local model alias
     local-chat:
       target: llama3.1
Using Aliases
-------------

**Client Code Examples**

Once aliases are configured, clients can use semantic names instead of provider-specific model names:

.. code-block:: python
   :caption: Python Client Usage

   from openai import OpenAI

   client = OpenAI(base_url="http://127.0.0.1:12000/")

   # Use semantic alias instead of provider model name
   response = client.chat.completions.create(
       model="arch.summarize.v1",  # Points to gpt-4o-mini
       messages=[{"role": "user", "content": "Summarize this document..."}],
   )

   # Switch to a different capability
   response = client.chat.completions.create(
       model="arch.reasoning.v1",  # Points to gpt-4o
       messages=[{"role": "user", "content": "Solve this complex problem..."}],
   )

.. code-block:: bash
   :caption: cURL Example

   curl -X POST http://127.0.0.1:12000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
       "model": "fast-model",
       "messages": [{"role": "user", "content": "Hello!"}]
     }'
Naming Best Practices
---------------------

**Semantic Versioning**

Use version numbers for backward compatibility and gradual model upgrades:

.. code-block:: yaml

   model_aliases:
     # Current production version
     arch.summarize.v1:
       target: gpt-4o-mini

     # Beta version for testing
     arch.summarize.v2:
       target: gpt-4o

     # Stable alias that always points to the latest version
     arch.summarize.latest:
       target: gpt-4o-mini

**Purpose-Based Naming**

Create aliases that reflect the intended use case:

.. code-block:: yaml

   model_aliases:
     # Task-specific
     code-reviewer:
       target: gpt-4o
     document-summarizer:
       target: gpt-4o-mini
     creative-writer:
       target: claude-3-5-sonnet-20241022
     data-analyst:
       target: gpt-4o

**Environment-Specific Aliases**

Different environments can use different underlying models:

.. code-block:: yaml

   model_aliases:
     # Development environment - use faster/cheaper models
     dev.chat.v1:
       target: gpt-4o-mini

     # Production environment - use more capable models
     prod.chat.v1:
       target: gpt-4o

     # Staging environment - test new models
     staging.chat.v1:
       target: claude-3-5-sonnet-20241022
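
Client code can then select the right alias from its deployment environment without touching any call sites. A minimal sketch, assuming an ``APP_ENV`` environment variable as the convention (the variable name, default, and helper function here are illustrative, not part of the gateway):

.. code-block:: python

   import os

   # Map deployment environments to the aliases defined above.
   ALIAS_BY_ENV = {
       "dev": "dev.chat.v1",
       "staging": "staging.chat.v1",
       "prod": "prod.chat.v1",
   }

   def chat_alias() -> str:
       """Return the chat alias for the current environment (default: dev)."""
       return ALIAS_BY_ENV.get(os.environ.get("APP_ENV", "dev"), "dev.chat.v1")

The rest of the client passes ``chat_alias()`` as the ``model`` argument, so promoting a new model is purely a gateway configuration change.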
Advanced Features (Coming Soon)
-------------------------------

The following features are planned for future releases of model aliases:

**Guardrails Integration**

Apply safety, cost, or latency rules at the alias level:

.. code-block:: yaml
   :caption: Future Feature - Guardrails

   model_aliases:
     arch.reasoning.v1:
       target: gpt-oss-120b
       guardrails:
         max_latency: 5s
         max_cost_per_request: 0.10
         block_categories: ["jailbreak", "PII"]
         content_filters:
           - type: "profanity"
           - type: "sensitive_data"
**Fallback Chains**

Provide a chain of models if the primary target fails or hits quota limits:

.. code-block:: yaml
   :caption: Future Feature - Fallbacks

   model_aliases:
     arch.summarize.v1:
       target: gpt-4o-mini
       fallbacks:
         - target: llama3.1
           conditions: ["quota_exceeded", "timeout"]
         - target: claude-3-haiku-20240307
           conditions: ["primary_and_first_fallback_failed"]
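
Conceptually, a fallback chain is a loop over an ordered list of targets that stops at the first success. A hedged sketch of that resolution logic in plain Python (illustrative only; it is not the gateway's implementation, and ``call`` stands in for whatever function invokes one model):

.. code-block:: python

   def resolve_with_fallbacks(targets, call):
       """Try each target in order; return the first successful result.

       `targets` is an ordered list of model names and `call` is any
       function that invokes one model and raises on failure.
       """
       last_error = None
       for target in targets:
           try:
               return call(target)
           except Exception as exc:  # e.g. quota exceeded, timeout
               last_error = exc
       raise RuntimeError("all targets in the fallback chain failed") from last_error

For the alias above, the chain would be ``["gpt-4o-mini", "llama3.1", "claude-3-haiku-20240307"]``, with the conditions deciding which failures trigger the next hop.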
**Traffic Splitting & Canary Deployments**

Distribute traffic across multiple models for A/B testing or gradual rollouts:

.. code-block:: yaml
   :caption: Future Feature - Traffic Splitting

   model_aliases:
     arch.v1:
       targets:
         - model: llama3.1
           weight: 80
         - model: gpt-4o-mini
           weight: 20

     # Canary deployment
     arch.experimental.v1:
       targets:
         - model: gpt-4o       # Current stable
           weight: 95
         - model: o1-preview   # New model being tested
           weight: 5
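
Under the hood, weighted splitting amounts to weighted random selection over the targets. An illustrative sketch of that selection in plain Python (a conceptual model, not the gateway's internal implementation):

.. code-block:: python

   import random

   def pick_target(targets, rng=random):
       """Pick one target model according to its weight (weights sum to 100)."""
       models = [t["model"] for t in targets]
       weights = [t["weight"] for t in targets]
       # random.choices performs weighted sampling; no normalization needed.
       return rng.choices(models, weights=weights, k=1)[0]

With the 80/20 split above, roughly four out of five requests would land on ``llama3.1`` over a large sample.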
**Load Balancing**

Distribute requests across multiple instances of the same model:

.. code-block:: yaml
   :caption: Future Feature - Load Balancing

   model_aliases:
     high-throughput-chat:
       load_balance:
         algorithm: "round_robin"  # or "least_connections", "weighted"
         targets:
           - model: gpt-4o-mini
             endpoint: "https://api-1.example.com"
           - model: gpt-4o-mini
             endpoint: "https://api-2.example.com"
           - model: gpt-4o-mini
             endpoint: "https://api-3.example.com"
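
The ``round_robin`` algorithm simply rotates through the endpoint list one request at a time. An illustrative sketch of that rotation (again a conceptual model, not the gateway's implementation):

.. code-block:: python

   from itertools import cycle

   class RoundRobinBalancer:
       """Rotate through a fixed list of endpoints, one request at a time."""

       def __init__(self, endpoints):
           self._cycle = cycle(endpoints)

       def next_endpoint(self):
           return next(self._cycle)

``least_connections`` would instead track in-flight requests per endpoint and pick the least loaded one.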
Validation Rules
----------------

- Alias names must be valid identifiers (alphanumeric, dots, hyphens, underscores)
- Target models must be defined in the ``llm_providers`` section
- Circular references between aliases are not allowed
- Weights in traffic splitting must sum to 100
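
As an illustration, the first three rules can be checked mechanically against a configuration. A hedged sketch (the function name, regex, and messages are assumptions for illustration, not a gateway API):

.. code-block:: python

   import re

   _NAME_OK = re.compile(r"^[A-Za-z0-9._-]+$")

   def validate_aliases(aliases, provider_models):
       """Check an alias mapping against the rules above.

       `aliases` maps alias name -> target model name; `provider_models`
       is the set of models defined under ``llm_providers``.
       Returns a list of human-readable problems (empty if valid).
       """
       problems = []
       for name, target in aliases.items():
           if not _NAME_OK.match(name):
               problems.append(f"invalid alias name: {name!r}")
           if target in aliases:
               # Alias-to-alias targets are what would create circular references.
               problems.append(f"alias {name!r} targets another alias: {target!r}")
           elif target not in provider_models:
               problems.append(f"alias {name!r} targets undefined model: {target!r}")
       return problems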
See Also
--------

- :ref:`llm_providers` - Learn about configuring LLM providers
- :ref:`llm_router` - Understand how aliases work with intelligent routing