.. _model_aliases:

Model Aliases
=============

Model aliases provide semantic, version-controlled names for your models, enabling cleaner client code, easier model management, and advanced routing capabilities. Instead of using provider-specific model names like ``gpt-4o-mini`` or ``claude-3-5-sonnet-20241022``, you can create meaningful aliases like ``fast-model`` or ``arch.summarize.v1``.

**Benefits of Model Aliases:**

- **Semantic Naming**: Use descriptive names that reflect the model's purpose
- **Version Control**: Implement versioning schemes (e.g., ``v1``, ``v2``) for model upgrades
- **Environment Management**: Different aliases can point to different models across environments
- **Client Simplification**: Clients use consistent, meaningful names regardless of underlying provider
- **Advanced Routing (Coming Soon)**: Enable guardrails, fallbacks, and traffic splitting at the alias level

Basic Configuration
-------------------

**Simple Alias Mapping**

.. code-block:: yaml
    :caption: Basic Model Aliases

    llm_providers:
      - model: openai/gpt-4o-mini
        access_key: $OPENAI_API_KEY

      - model: openai/gpt-4o
        access_key: $OPENAI_API_KEY

      - model: anthropic/claude-3-5-sonnet-20241022
        access_key: $ANTHROPIC_API_KEY

      - model: ollama/llama3.1
        base_url: http://host.docker.internal:11434

    # Define aliases that map to the models above
    model_aliases:
      # Semantic versioning approach
      arch.summarize.v1:
        target: gpt-4o-mini

      arch.reasoning.v1:
        target: gpt-4o

      arch.creative.v1:
        target: claude-3-5-sonnet-20241022

      # Functional aliases
      fast-model:
        target: gpt-4o-mini

      smart-model:
        target: gpt-4o

      creative-model:
        target: claude-3-5-sonnet-20241022

      # Local model alias
      local-chat:
        target: llama3.1

Using Aliases
-------------

**Client Code Examples**

Once aliases are configured, clients can use semantic names instead of provider-specific model names:

.. code-block:: python
    :caption: Python Client Usage

    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:12000/")

    # Use semantic alias instead of provider model name
    response = client.chat.completions.create(
        model="arch.summarize.v1",  # Points to gpt-4o-mini
        messages=[{"role": "user", "content": "Summarize this document..."}]
    )

    # Switch to a different capability
    response = client.chat.completions.create(
        model="arch.reasoning.v1",  # Points to gpt-4o
        messages=[{"role": "user", "content": "Solve this complex problem..."}]
    )

.. code-block:: bash
    :caption: cURL Example

    curl -X POST http://127.0.0.1:12000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "fast-model",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

Naming Best Practices
---------------------

**Semantic Versioning**

Use version numbers for backward compatibility and gradual model upgrades:

.. code-block:: yaml

    model_aliases:
      # Current production version
      arch.summarize.v1:
        target: gpt-4o-mini

      # Beta version for testing
      arch.summarize.v2:
        target: gpt-4o

      # Stable alias that always points to latest
      arch.summarize.latest:
        target: gpt-4o-mini

**Purpose-Based Naming**

Create aliases that reflect the intended use case:

.. code-block:: yaml

    model_aliases:
      # Task-specific
      code-reviewer:
        target: gpt-4o

      document-summarizer:
        target: gpt-4o-mini

      creative-writer:
        target: claude-3-5-sonnet-20241022

      data-analyst:
        target: gpt-4o

**Environment-Specific Aliases**

Different environments can use different underlying models:

.. code-block:: yaml

    model_aliases:
      # Development environment - use faster/cheaper models
      dev.chat.v1:
        target: gpt-4o-mini

      # Production environment - use more capable models
      prod.chat.v1:
        target: gpt-4o

      # Staging environment - test new models
      staging.chat.v1:
        target: claude-3-5-sonnet-20241022

Advanced Features (Coming Soon)
--------------------------------

The following features are planned for future releases of model aliases:

**Guardrails Integration**

Apply safety, cost, or latency rules at the alias level:

.. code-block:: yaml
    :caption: Future Feature - Guardrails

    model_aliases:
      arch.reasoning.v1:
        target: gpt-oss-120b
        guardrails:
          max_latency: 5s
          max_cost_per_request: 0.10
          block_categories: ["jailbreak", "PII"]
          content_filters:
            - type: "profanity"
            - type: "sensitive_data"

**Fallback Chains**

Provide a chain of models if the primary target fails or hits quota limits:

.. code-block:: yaml
    :caption: Future Feature - Fallbacks

    model_aliases:
      arch.summarize.v1:
        target: gpt-4o-mini
        fallbacks:
          - target: llama3.1
            conditions: ["quota_exceeded", "timeout"]
          - target: claude-3-haiku-20240307
            conditions: ["primary_and_first_fallback_failed"]

**Traffic Splitting & Canary Deployments**

Distribute traffic across multiple models for A/B testing or gradual rollouts:

.. code-block:: yaml
    :caption: Future Feature - Traffic Splitting

    model_aliases:
      arch.v1:
        targets:
          - model: llama3.1
            weight: 80
          - model: gpt-4o-mini
            weight: 20

      # Canary deployment
      arch.experimental.v1:
        targets:
          - model: gpt-4o      # Current stable
            weight: 95
          - model: o1-preview  # New model being tested
            weight: 5

**Load Balancing**

Distribute requests across multiple instances of the same model:

.. code-block:: yaml
    :caption: Future Feature - Load Balancing

    model_aliases:
      high-throughput-chat:
        load_balance:
          algorithm: "round_robin"  # or "least_connections", "weighted"
          targets:
            - model: gpt-4o-mini
              endpoint: "https://api-1.example.com"
            - model: gpt-4o-mini
              endpoint: "https://api-2.example.com"
            - model: gpt-4o-mini
              endpoint: "https://api-3.example.com"

Validation Rules
----------------

- Alias names must be valid identifiers (alphanumeric, dots, hyphens, underscores)
- Target models must be defined in the ``llm_providers`` section
- Circular references between aliases are not allowed
- Weights in traffic splitting must sum to 100

See Also
--------

- :ref:`llm_providers` - Learn about configuring LLM providers
- :ref:`llm_router` - Understand how aliases work with intelligent routing
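
The rules listed under Validation Rules can be approximated with a small pre-flight check run before deploying a configuration. The following is a minimal sketch, not the gateway's own validator: ``validate_aliases`` and ``ALIAS_NAME`` are hypothetical names, the configuration is assumed to be already parsed into a Python dict, the name pattern is one reading of "alphanumeric, dots, hyphens, underscores", and the ``targets``/``weight`` form is the planned traffic-splitting syntax, which may change. Circular-reference detection is left out.

.. code-block:: python
    :caption: Illustrative sketch - checking aliases against the validation rules

    import re

    # Assumed reading of "alphanumeric, dots, hyphens, underscores".
    ALIAS_NAME = re.compile(r"[A-Za-z0-9._-]+")


    def validate_aliases(config):
        """Return a list of problems found in a parsed configuration dict."""
        problems = []
        # Providers are declared as "provider/model"; aliases use the part after the slash.
        known_models = {
            provider["model"].split("/", 1)[-1]
            for provider in config.get("llm_providers", [])
        }
        for name, alias in config.get("model_aliases", {}).items():
            if not ALIAS_NAME.fullmatch(name):
                problems.append(f"{name!r}: invalid alias name")
            if "targets" in alias:
                # Planned traffic-splitting form: weights must sum to 100.
                models = [entry["model"] for entry in alias["targets"]]
                total = sum(entry.get("weight", 0) for entry in alias["targets"])
                if total != 100:
                    problems.append(f"{name!r}: weights sum to {total}, expected 100")
            else:
                models = [alias.get("target")]
            for model in models:
                if model not in known_models:
                    problems.append(f"{name!r}: target {model!r} is not in llm_providers")
        return problems


    config = {
        "llm_providers": [
            {"model": "openai/gpt-4o-mini"},
            {"model": "ollama/llama3.1"},
        ],
        "model_aliases": {
            "fast-model": {"target": "gpt-4o-mini"},
            "bad alias!": {"target": "gpt-4o"},
            "arch.v1": {
                "targets": [
                    {"model": "llama3.1", "weight": 80},
                    {"model": "gpt-4o-mini", "weight": 10},
                ]
            },
        },
    }

    for problem in validate_aliases(config):
        print(problem)

With this sample input the check reports three problems: the alias name containing a space and ``!``, its target ``gpt-4o`` that has no matching provider entry, and the ``arch.v1`` weights that sum to 90.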