
.. _model_aliases:

Model Aliases
=============

Model aliases provide semantic, version-controlled names for your models, enabling cleaner client code, easier model management, and advanced routing capabilities. Instead of using provider-specific model names like ``gpt-4o-mini`` or ``claude-3-5-sonnet-20241022``, you can create meaningful aliases like ``fast-model`` or ``arch.summarize.v1``.

**Benefits of Model Aliases:**

- **Semantic Naming**: Use descriptive names that reflect the model's purpose
- **Version Control**: Implement versioning schemes (e.g., ``v1``, ``v2``) for model upgrades
- **Environment Management**: Different aliases can point to different models across environments
- **Client Simplification**: Clients use consistent, meaningful names regardless of the underlying provider
- **Advanced Routing (Coming Soon)**: Enable guardrails, fallbacks, and traffic splitting at the alias level
Basic Configuration
-------------------

**Simple Alias Mapping**

.. code-block:: yaml
   :caption: Basic Model Aliases

   llm_providers:
     - model: openai/gpt-4o-mini
       access_key: $OPENAI_API_KEY

     - model: openai/gpt-4o
       access_key: $OPENAI_API_KEY

     - model: anthropic/claude-3-5-sonnet-20241022
       access_key: $ANTHROPIC_API_KEY

     - model: ollama/llama3.1
       base_url: http://host.docker.internal:11434

   # Define aliases that map to the models above
   model_aliases:
     # Semantic versioning approach
     arch.summarize.v1:
       target: gpt-4o-mini
     arch.reasoning.v1:
       target: gpt-4o
     arch.creative.v1:
       target: claude-3-5-sonnet-20241022

     # Functional aliases
     fast-model:
       target: gpt-4o-mini
     smart-model:
       target: gpt-4o
     creative-model:
       target: claude-3-5-sonnet-20241022

     # Local model alias
     local-chat:
       target: llama3.1
Using Aliases
-------------

**Client Code Examples**

Once aliases are configured, clients can use semantic names instead of provider-specific model names:

.. code-block:: python
   :caption: Python Client Usage

   from openai import OpenAI

   client = OpenAI(base_url="http://127.0.0.1:12000/")

   # Use semantic alias instead of provider model name
   response = client.chat.completions.create(
       model="arch.summarize.v1",  # Points to gpt-4o-mini
       messages=[{"role": "user", "content": "Summarize this document..."}],
   )

   # Switch to a different capability
   response = client.chat.completions.create(
       model="arch.reasoning.v1",  # Points to gpt-4o
       messages=[{"role": "user", "content": "Solve this complex problem..."}],
   )

.. code-block:: bash
   :caption: cURL Example

   curl -X POST http://127.0.0.1:12000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
       "model": "fast-model",
       "messages": [{"role": "user", "content": "Hello!"}]
     }'
Naming Best Practices
---------------------

**Semantic Versioning**

Use version numbers for backward compatibility and gradual model upgrades:

.. code-block:: yaml

   model_aliases:
     # Current production version
     arch.summarize.v1:
       target: gpt-4o-mini

     # Beta version for testing
     arch.summarize.v2:
       target: gpt-4o

     # Stable alias that always points to the latest version
     arch.summarize.latest:
       target: gpt-4o-mini

**Purpose-Based Naming**

Create aliases that reflect the intended use case:

.. code-block:: yaml

   model_aliases:
     # Task-specific
     code-reviewer:
       target: gpt-4o
     document-summarizer:
       target: gpt-4o-mini
     creative-writer:
       target: claude-3-5-sonnet-20241022
     data-analyst:
       target: gpt-4o

**Environment-Specific Aliases**

Different environments can use different underlying models:

.. code-block:: yaml

   model_aliases:
     # Development environment - use faster/cheaper models
     dev.chat.v1:
       target: gpt-4o-mini

     # Production environment - use more capable models
     prod.chat.v1:
       target: gpt-4o

     # Staging environment - test new models
     staging.chat.v1:
       target: claude-3-5-sonnet-20241022
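
Client code can then select the right alias from its deployment environment without touching any call sites. A minimal sketch, assuming an ``APP_ENV`` environment variable as the convention (the variable name, default, and helper function here are illustrative, not part of the gateway):

.. code-block:: python

   import os

   # Map deployment environments to the aliases defined above.
   ALIAS_BY_ENV = {
       "dev": "dev.chat.v1",
       "staging": "staging.chat.v1",
       "prod": "prod.chat.v1",
   }

   def chat_alias() -> str:
       """Return the chat alias for the current environment (default: dev)."""
       return ALIAS_BY_ENV.get(os.environ.get("APP_ENV", "dev"), "dev.chat.v1")

The rest of the client passes ``chat_alias()`` as the ``model`` argument, so promoting a new model is purely a gateway configuration change.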
Advanced Features (Coming Soon)
-------------------------------

The following features are planned for future releases of model aliases:

**Guardrails Integration**

Apply safety, cost, or latency rules at the alias level:

.. code-block:: yaml
   :caption: Future Feature - Guardrails

   model_aliases:
     arch.reasoning.v1:
       target: gpt-oss-120b
       guardrails:
         max_latency: 5s
         max_cost_per_request: 0.10
         block_categories: ["jailbreak", "PII"]
         content_filters:
           - type: "profanity"
           - type: "sensitive_data"
**Fallback Chains**

Provide a chain of models if the primary target fails or hits quota limits:

.. code-block:: yaml
   :caption: Future Feature - Fallbacks

   model_aliases:
     arch.summarize.v1:
       target: gpt-4o-mini
       fallbacks:
         - target: llama3.1
           conditions: ["quota_exceeded", "timeout"]
         - target: claude-3-haiku-20240307
           conditions: ["primary_and_first_fallback_failed"]
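
Conceptually, a fallback chain is a loop over an ordered list of targets that stops at the first success. A hedged sketch of that resolution logic in plain Python (illustrative only; it is not the gateway's implementation, and ``call`` stands in for whatever function invokes one model):

.. code-block:: python

   def resolve_with_fallbacks(targets, call):
       """Try each target in order; return the first successful result.

       `targets` is an ordered list of model names and `call` is any
       function that invokes one model and raises on failure.
       """
       last_error = None
       for target in targets:
           try:
               return call(target)
           except Exception as exc:  # e.g. quota exceeded, timeout
               last_error = exc
       raise RuntimeError("all targets in the fallback chain failed") from last_error

For the alias above, the chain would be ``["gpt-4o-mini", "llama3.1", "claude-3-haiku-20240307"]``, with the conditions deciding which failures trigger the next hop.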
**Traffic Splitting & Canary Deployments**

Distribute traffic across multiple models for A/B testing or gradual rollouts:

.. code-block:: yaml
   :caption: Future Feature - Traffic Splitting

   model_aliases:
     arch.v1:
       targets:
         - model: llama3.1
           weight: 80
         - model: gpt-4o-mini
           weight: 20

     # Canary deployment
     arch.experimental.v1:
       targets:
         - model: gpt-4o       # Current stable
           weight: 95
         - model: o1-preview   # New model being tested
           weight: 5
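
Under the hood, weighted splitting amounts to weighted random selection over the targets. An illustrative sketch of that selection in plain Python (a conceptual model, not the gateway's internal implementation):

.. code-block:: python

   import random

   def pick_target(targets, rng=random):
       """Pick one target model according to its weight (weights sum to 100)."""
       models = [t["model"] for t in targets]
       weights = [t["weight"] for t in targets]
       # random.choices performs weighted sampling; no normalization needed.
       return rng.choices(models, weights=weights, k=1)[0]

With the 80/20 split above, roughly four out of five requests would land on ``llama3.1`` over a large sample.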
**Load Balancing**

Distribute requests across multiple instances of the same model:

.. code-block:: yaml
   :caption: Future Feature - Load Balancing

   model_aliases:
     high-throughput-chat:
       load_balance:
         algorithm: "round_robin"  # or "least_connections", "weighted"
         targets:
           - model: gpt-4o-mini
             endpoint: "https://api-1.example.com"
           - model: gpt-4o-mini
             endpoint: "https://api-2.example.com"
           - model: gpt-4o-mini
             endpoint: "https://api-3.example.com"
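
The ``round_robin`` algorithm simply rotates through the endpoint list one request at a time. An illustrative sketch of that rotation (again a conceptual model, not the gateway's implementation):

.. code-block:: python

   from itertools import cycle

   class RoundRobinBalancer:
       """Rotate through a fixed list of endpoints, one request at a time."""

       def __init__(self, endpoints):
           self._cycle = cycle(endpoints)

       def next_endpoint(self):
           return next(self._cycle)

``least_connections`` would instead track in-flight requests per endpoint and pick the least loaded one.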
Validation Rules
----------------

- Alias names must be valid identifiers (alphanumeric, dots, hyphens, underscores)
- Target models must be defined in the ``llm_providers`` section
- Circular references between aliases are not allowed
- Weights in traffic splitting must sum to 100
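
As an illustration, the first three rules can be checked mechanically against a configuration. A hedged sketch (the function name, regex, and messages are assumptions for illustration, not a gateway API):

.. code-block:: python

   import re

   _NAME_OK = re.compile(r"^[A-Za-z0-9._-]+$")

   def validate_aliases(aliases, provider_models):
       """Check an alias mapping against the rules above.

       `aliases` maps alias name -> target model name; `provider_models`
       is the set of models defined under ``llm_providers``.
       Returns a list of human-readable problems (empty if valid).
       """
       problems = []
       for name, target in aliases.items():
           if not _NAME_OK.match(name):
               problems.append(f"invalid alias name: {name!r}")
           if target in aliases:
               # Alias-to-alias targets are what would create circular references.
               problems.append(f"alias {name!r} targets another alias: {target!r}")
           elif target not in provider_models:
               problems.append(f"alias {name!r} targets undefined model: {target!r}")
       return problems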
See Also
--------

- :ref:`llm_providers` - Learn about configuring LLM providers
- :ref:`llm_router` - Understand how aliases work with intelligent routing