matching the LLM routing section on the README.md to the docs

Salman Paracha 2025-09-18 23:23:26 -07:00
parent f7c9d04da9
commit 3925836858
2 changed files with 120 additions and 47 deletions


@ -147,15 +147,15 @@ llm_providers:
access_key: $ANTHROPIC_API_KEY
model_aliases:
- alias: fast-model
models:
- openai/gpt-4o-mini
- anthropic/claude-3-5-haiku-20241022
# Model aliases - friendly names that map to actual model names
fast-model:
target: gpt-4o-mini
- alias: reasoning-model
models:
- openai/gpt-4o
- anthropic/claude-3-5-sonnet-20241022
reasoning-model:
target: gpt-4o
creative-model:
target: claude-3-5-sonnet-20241022
```
Use semantic aliases in your application code:


@ -5,10 +5,11 @@ LLM Routing
With the rapid proliferation of large language models (LLMs), each optimized for different strengths, styles, or latency/cost profiles, routing has become an essential technique for operationalizing the use of different models.
Arch provides two distinct routing approaches to meet different use cases:
Arch provides three distinct routing approaches to meet different use cases:
1. **Static Model Selection**: Direct routing to specific models based on provider configuration and model aliases
2. **Preference-Aligned Dynamic Routing**: Intelligent routing using the Arch-Router model based on context and user-defined preferences
1. **Model-based Routing**: Direct routing to specific models using provider/model names
2. **Alias-based Routing**: Semantic routing using custom aliases that map to underlying models
3. **Preference-aligned Routing**: Intelligent routing using the Arch-Router model based on context and user-defined preferences
This enables optimal performance, cost efficiency, and response quality by matching requests with the most suitable model from your available LLM fleet.
@ -16,38 +17,44 @@ This enables optimal performance, cost efficiency, and response quality by match
Routing Methods
---------------
**Static Model Selection**
**Model-based Routing**
Static routing allows you to directly specify which model to use, either through:
Direct routing allows you to specify exact provider and model combinations using the format ``provider/model-name``:
- **Direct Model Names**: Use provider-specific names like ``openai/gpt-4o-mini``
- **Model Aliases**: Use semantic names like ``fast-model`` or ``arch.summarize.v1`` (see :ref:`model_aliases`)
- Use provider-specific names like ``openai/gpt-4o`` or ``anthropic/claude-3-5-sonnet-20241022``
- Provides full control and transparency over which model handles each request
- Ideal for production workloads where you want predictable routing behavior
This approach is ideal when you know exactly which model you want to use for specific tasks or when implementing your own routing logic at the application level.
**Alias-based Routing**
**Preference-Aligned Dynamic Routing (Arch-Router)**
Alias-based routing lets you create semantic model names that decouple your application from specific providers:
Dynamic routing uses the Arch-Router model to automatically select the most appropriate LLM for each request based on:
- Use meaningful names like ``fast-model``, ``reasoning-model``, or ``arch.summarize.v1`` (see :ref:`model_aliases`)
- Maps semantic names to underlying provider models for easier experimentation and provider switching
- Ideal for applications that want abstraction from specific model names while maintaining control
**Preference-aligned Routing (Arch-Router)**
Intelligent routing uses the Arch-Router model to automatically select the most appropriate LLM based on:
- **Domain Analysis**: Identifies the subject matter (e.g., legal, healthcare, programming)
- **Action Classification**: Determines the type of operation (e.g., summarization, code generation, translation)
- **User-Defined Preferences**: Maps domains and actions to preferred models
This approach is ideal when you want intelligent, context-aware routing that adapts to the content and intent of each request.
- Ideal for dynamic, context-aware routing that adapts to request content and intent
Static Model Selection Workflow
--------------------------------
Model-based Routing Workflow
----------------------------
For static routing, the process is straightforward:
For direct model routing, the process is straightforward:
#. **Client Request**
The client specifies the exact model to use, either by provider name (``openai/gpt-4o``) or alias (``fast-model``).
The client specifies the exact model using provider/model format (``openai/gpt-4o``).
#. **Model Resolution**
#. **Provider Validation**
If using an alias, Arch resolves it to the actual provider model name.
Arch validates that the specified provider and model are configured and available.
#. **Direct Routing**
@ -58,8 +65,34 @@ For static routing, the process is straightforward:
The response is returned to the client with optional metadata about the routing decision.
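The validation step in the model-based workflow can be pictured roughly as follows. This is a minimal sketch, not Arch's actual implementation: ``CONFIGURED_MODELS`` and ``validate_model`` are hypothetical names, and the set simply mirrors the ``llm_providers`` entries shown in this change.

```python
# Hypothetical sketch of "provider/model" validation; not Arch's real code.
# The entries mirror the llm_providers list from the configuration in this change.
CONFIGURED_MODELS = {
    "openai/gpt-4o-mini",
    "openai/gpt-4o",
    "anthropic/claude-3-5-sonnet-20241022",
}


def validate_model(name: str) -> tuple[str, str]:
    """Split a 'provider/model' string and check that it is configured."""
    provider, sep, model = name.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {name!r}")
    if name not in CONFIGURED_MODELS:
        raise LookupError(f"model {name!r} is not configured")
    return provider, model
```

A request naming ``openai/gpt-4o`` would pass this check, while a bare ``gpt-4o`` would be rejected as malformed.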
Preference-Aligned Dynamic Routing Workflow (Arch-Router)
---------------------------------------
Alias-based Routing Workflow
-----------------------------
For alias-based routing, the process includes name resolution:
#. **Client Request**
The client specifies a semantic alias name (``reasoning-model``).
#. **Alias Resolution**
Arch resolves the alias to the actual provider/model name based on configuration.
#. **Model Selection**
If the alias maps to multiple models, Arch selects one based on availability and load balancing.
#. **Request Forwarding**
The request is forwarded to the resolved model.
#. **Response Handling**
The response is returned with optional metadata about the alias resolution.
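The alias workflow above can be sketched in a few lines. This is an illustration under stated assumptions, not Arch's resolver: ``MODEL_ALIASES`` and ``resolve_alias`` are hypothetical names, each alias is stored as a list so that the "multiple models" case can be shown, and a random pick stands in for whatever availability and load-balancing policy Arch actually applies.

```python
import random

# Hypothetical alias table, modeled on the model_aliases block in this change.
# Targets are lists here only to illustrate aliases with several candidates.
MODEL_ALIASES = {
    "fast-model": ["gpt-4o-mini"],
    "reasoning-model": ["gpt-4o"],
    "creative-model": ["claude-3-5-sonnet-20241022"],
}


def resolve_alias(name, available=None, rng=random):
    """Resolve an alias to one target model name.

    Names that are not aliases are returned unchanged, so direct model
    names keep working. `available` optionally restricts the candidates.
    """
    targets = MODEL_ALIASES.get(name)
    if targets is None:
        return name  # not an alias: treat as a direct model name
    candidates = [t for t in targets if available is None or t in available]
    if not candidates:
        raise LookupError(f"no available model for alias {name!r}")
    # Random choice is a placeholder for a real load-balancing policy.
    return rng.choice(candidates)
```

With the table above, ``resolve_alias("fast-model")`` yields ``gpt-4o-mini``, and an unknown name such as ``openai/gpt-4o`` passes through untouched.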
Preference-aligned Routing Workflow (Arch-Router)
-------------------------------------------------
For preference-aligned dynamic routing, the process involves intelligent analysis:
@ -105,12 +138,12 @@ In summary, Arch-Router demonstrates:
Implementing Routing
--------------------
**Static Model Selection**
**Model-based Routing**
For static routing, simply configure your LLM providers and optionally define model aliases:
For direct model routing, configure your LLM providers with specific provider/model names:
.. code-block:: yaml
:caption: Static Routing Configuration
:caption: Model-based Routing Configuration
listeners:
egress_traffic:
@ -130,32 +163,72 @@ For static routing, simply configure your LLM providers and optionally define mo
- model: anthropic/claude-3-5-sonnet-20241022
access_key: $ANTHROPIC_API_KEY
# Optional: Define aliases for easier client usage
model_aliases:
fast-model:
target: gpt-4o-mini
smart-model:
target: gpt-4o
creative-model:
target: claude-3-5-sonnet-20241022
Clients can then specify models directly:
Clients specify exact models:
.. code-block:: python
# Using provider model names
# Direct provider/model specification
response = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}]
)
# Using aliases
response = client.chat.completions.create(
model="fast-model",
messages=[{"role": "user", "content": "Hello!"}]
model="anthropic/claude-3-5-sonnet-20241022",
messages=[{"role": "user", "content": "Write a story"}]
)
**Preference-Aligned Dynamic Routing (Arch-Router)**
**Alias-based Routing**
Configure semantic aliases that map to underlying models:
.. code-block:: yaml
:caption: Alias-based Routing Configuration
listeners:
egress_traffic:
address: 0.0.0.0
port: 12000
message_format: openai
timeout: 30s
llm_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
- model: anthropic/claude-3-5-sonnet-20241022
access_key: $ANTHROPIC_API_KEY
model_aliases:
# Model aliases - friendly names that map to actual model names
fast-model:
target: gpt-4o-mini
reasoning-model:
target: gpt-4o
creative-model:
target: claude-3-5-sonnet-20241022
Clients use semantic names:
.. code-block:: python
# Using semantic aliases
response = client.chat.completions.create(
model="fast-model", # Routes to best available fast model
messages=[{"role": "user", "content": "Quick summary please"}]
)
response = client.chat.completions.create(
model="reasoning-model", # Routes to best reasoning model
messages=[{"role": "user", "content": "Solve this complex problem"}]
)
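The client snippets use a ``client`` object without showing what goes over the wire. As a rough sketch using only the standard library, the request for an alias could be built like this. The base URL is an assumption: it combines the listener port ``12000`` from the configuration with an OpenAI-style ``/v1`` prefix, and ``build_chat_request`` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed endpoint: the egress listener on port 12000 with an OpenAI-style /v1 prefix.
BASE_URL = "http://127.0.0.1:12000/v1"


def build_chat_request(model, prompt, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-format chat completion request."""
    body = json.dumps({
        "model": model,  # an alias such as "fast-model" or a provider/model name
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("fast-model", "Quick summary please")
# urllib.request.urlopen(req) would send it; that requires a running gateway.
```

Only the ``model`` field changes between alias-based and model-based requests, which is what lets an application switch providers without touching the rest of its code.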
**Preference-aligned Routing (Arch-Router)**
To configure preference-aligned dynamic routing, you need to define routing preferences that map domains and actions to specific models:
@ -227,7 +300,7 @@ You can combine static model selection with dynamic routing preferences for maxi
description: creative writing and content generation
model_aliases:
# Static aliases for direct routing
# Model aliases - friendly names that map to actual model names
fast-model:
target: gpt-4o-mini