updated the section on preference-based routing

Salman Paracha 2025-09-18 23:55:34 -07:00
parent 3925836858
commit acc3803a02
2 changed files with 54 additions and 44 deletions

@@ -16,27 +16,20 @@ Core Capabilities
-----------------
**Multi-Provider Support**
Connect to any combination of providers simultaneously:
Connect to any combination of providers simultaneously (see :ref:`supported_providers` for full details):
- **First-Class Providers**: Native integrations with OpenAI, Anthropic, DeepSeek, Mistral, Groq, Google Gemini, Together AI, xAI, Azure OpenAI, and Ollama
- **OpenAI-Compatible Providers**: Support for any provider implementing OpenAI's API interface
- **OpenAI-Compatible Providers**: Any provider implementing the OpenAI Chat Completions API standard
**Intelligent Routing**
Two powerful routing approaches to optimize model selection:
Three powerful routing approaches to optimize model selection:
- **Static Model Selection**: Direct routing using provider names or semantic model aliases
- **Preference-Aligned Dynamic Routing**: Intelligent, context-aware routing using the Arch-Router model that analyzes prompts and selects optimal models based on domain and action preferences
**Model Aliases & Management**
Create semantic, version-controlled names for simplified model management:
- **Semantic Naming**: Use descriptive names like ``fast-model``, ``reasoning-model``, or ``arch.summarize.v1``
- **Environment Management**: Different aliases for dev/staging/production environments
- **Version Control**: Implement versioning schemes for gradual model upgrades
- **Future Features**: Planned support for guardrails, fallback chains, and traffic splitting
- **Model-based Routing**: Direct routing to specific models using provider/model names (see :ref:`supported_providers`)
- **Alias-based Routing**: Semantic routing using custom aliases (see :ref:`model_aliases`)
- **Preference-aligned Routing**: Intelligent routing using the Arch-Router model (see :ref:`preference_aligned_routing`)
**Unified Client Interface**
Use your preferred client library without changing existing code:
Use your preferred client library without changing existing code (see :ref:`client_libraries` for details):
- **OpenAI Python SDK**: Full compatibility with all providers
- **Anthropic Python SDK**: Native support with cross-provider capabilities
@@ -47,26 +40,12 @@ Key Benefits
------------
- **Provider Flexibility**: Switch between providers without changing client code
- **Intelligent Routing**: Automatically select the best model for each request
- **Three Routing Methods**: Choose from model-based, alias-based, or preference-aligned routing strategies (the last uses `Arch-Router-1.5B <https://huggingface.co/katanemo/Arch-Router-1.5B>`_)
- **Cost Optimization**: Route requests to cost-effective models based on complexity
- **Performance Optimization**: Use fast models for simple tasks, powerful models for complex reasoning
- **Environment Management**: Configure different models for different environments
- **Future-Proof**: Easy to add new providers and upgrade models
Getting Started
---------------
Dive into specific areas based on your needs:
.. toctree::
:maxdepth: 2
supported_providers
client_libraries
model_aliases
**3. Advanced Features**
- **:ref:`llm_router`**: Learn about preference-aligned dynamic routing and intelligent model selection
Common Use Cases
----------------
@@ -85,10 +64,17 @@ Common Use Cases
- Apply consistent security and governance policies across all providers
- Scale across regions using different provider endpoints
Next Steps
----------
Advanced Features
-----------------
- :ref:`preference_aligned_routing` - Learn about preference-aligned dynamic routing and intelligent model selection
1. **:ref:`supported_providers`** - See all supported providers, models, and configuration examples
2. **:ref:`client_libraries`** - Start using with your preferred client
3. **:ref:`model_aliases`** - Create semantic model names
4. **:ref:`llm_router`** - Set up intelligent routing
Getting Started
---------------
Dive into specific areas based on your needs:
.. toctree::
:maxdepth: 2
supported_providers
client_libraries
model_aliases

@@ -17,7 +17,8 @@ This enables optimal performance, cost efficiency, and response quality by match
Routing Methods
---------------
**Model-based Routing**
Model-based Routing
~~~~~~~~~~~~~~~~~~~
Direct routing allows you to specify exact provider and model combinations using the format ``provider/model-name``:
@@ -25,7 +26,8 @@ Direct routing allows you to specify exact provider and model combinations using
- Provides full control and transparency over which model handles each request
- Ideal for production workloads where you want predictable routing behavior
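The ``provider/model-name`` format described above can be illustrated with a small parser. This is a minimal sketch, not Arch's actual implementation: ``ModelRef`` and ``parse_model_ref`` are hypothetical names, and the model strings are illustrative only. It assumes the provider is everything before the first slash, so model names that themselves contain slashes stay intact.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    """A provider and model pair (hypothetical type, for illustration only)."""
    provider: str
    model: str


def parse_model_ref(value: str) -> ModelRef:
    """Split 'provider/model-name' at the first slash.

    Assumption: the provider never contains a slash, while the model
    name may (for example, a hosted model with an organization prefix).
    """
    provider, sep, model = value.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model-name', got {value!r}")
    return ModelRef(provider, model)


# Illustrative model names, not a list of supported models.
ref = parse_model_ref("openai/gpt-4o")
nested = parse_model_ref("together_ai/meta-llama/Llama-3")
```

Rejecting strings without a provider prefix keeps routing explicit, which matches the predictable behavior that model-based routing is meant to provide.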
**Alias-based Routing**
Alias-based Routing
~~~~~~~~~~~~~~~~~~~
Alias-based routing lets you create semantic model names that decouple your application from specific providers:
@@ -33,14 +35,23 @@ Alias-based routing lets you create semantic model names that decouple your appl
- Maps semantic names to underlying provider models for easier experimentation and provider switching
- Ideal for applications that want abstraction from specific model names while maintaining control
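Conceptually, alias resolution is a lookup that runs before model-based routing. The sketch below assumes a plain in-memory mapping; ``ALIASES`` and ``resolve_model`` are hypothetical names, and the target models are placeholders rather than recommendations. In Arch itself, aliases are defined in configuration (see :ref:`model_aliases`).

```python
from typing import Mapping

# Hypothetical alias table: semantic name -> provider/model-name.
# The targets are illustrative placeholders.
ALIASES = {
    "fast-model": "openai/gpt-4o-mini",
    "reasoning-model": "openai/gpt-4o",
}


def resolve_model(name: str, aliases: Mapping[str, str]) -> str:
    """Return the underlying provider/model for an alias.

    Names that are not aliases pass through unchanged, so callers can
    mix aliases and explicit provider/model names.
    """
    return aliases.get(name, name)


resolved = resolve_model("fast-model", ALIASES)
passthrough = resolve_model("openai/gpt-4o", ALIASES)
```

Because the application only ever sends ``fast-model``, switching providers means editing one entry in the mapping rather than touching client code.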
**Preference-aligned Routing (Arch-Router)**
.. _preference_aligned_routing:
Intelligent routing uses the Arch-Router model to automatically select the most appropriate LLM based on:
Preference-aligned Routing (Arch-Router)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- **Domain Analysis**: Identifies the subject matter (e.g., legal, healthcare, programming)
- **Action Classification**: Determines the type of operation (e.g., summarization, code generation, translation)
- **User-Defined Preferences**: Maps domains and actions to preferred models
- Ideal for dynamic, context-aware routing that adapts to request content and intent
Traditional LLM routing approaches face significant limitations: they evaluate performance using benchmarks that often fail to capture human preferences, select from fixed model pools, and operate as "black boxes" without practical mechanisms for encoding user preferences.
Arch's preference-aligned routing addresses these challenges by applying a fundamental engineering principle: decoupling. The framework separates route selection (matching queries to human-readable policies) from model assignment (mapping policies to specific LLMs). This separation allows you to define routing policies using descriptive labels like ``Domain: 'finance', Action: 'analyze_earnings_report'`` rather than cryptic identifiers, while independently configuring which models handle each policy.
The `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ model automatically selects the most appropriate LLM based on:
- Domain Analysis: Identifies the subject matter (e.g., legal, healthcare, programming)
- Action Classification: Determines the type of operation (e.g., summarization, code generation, translation)
- User-Defined Preferences: Maps domains and actions to preferred models using transparent, configurable routing decisions
- Human Preference Alignment: Uses domain-action mappings that capture subjective evaluation criteria, ensuring routing aligns with real-world user needs rather than just benchmark scores
Because model assignment is configured separately from route selection, new models can be added without retraining the router. This makes the approach well suited to dynamic, context-aware routing that adapts to request content and intent.
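The decoupling of route selection from model assignment can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Arch-Router works internally: the real route selection is performed by the Arch-Router model, whereas ``select_route`` below is a keyword-matching stand-in. All names (``RoutePolicy``, ``POLICIES``, ``MODEL_FOR_POLICY``, ``route_request``) and the model strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence, Tuple


@dataclass(frozen=True)
class RoutePolicy:
    """A human-readable routing policy (hypothetical structure)."""
    name: str
    description: str
    keywords: Tuple[str, ...]  # stand-in for what the router model would infer


# Step 1 input: policies described in plain language.
POLICIES = [
    RoutePolicy("analyze_earnings_report", "Finance: analyze earnings reports",
                ("earnings", "quarterly report")),
    RoutePolicy("code_generation", "Programming: write new code",
                ("implement", "write a function")),
]

# Step 2 input: policy -> model, configured independently of the policies.
MODEL_FOR_POLICY = {
    "analyze_earnings_report": "provider-a/large-model",
    "code_generation": "provider-b/code-model",
}
DEFAULT_MODEL = "provider-b/small-model"


def select_route(prompt: str, policies: Sequence[RoutePolicy]) -> Optional[str]:
    """Route selection: match the prompt to a policy name (toy keyword matcher)."""
    text = prompt.lower()
    for policy in policies:
        if any(keyword in text for keyword in policy.keywords):
            return policy.name
    return None


def assign_model(route: Optional[str], mapping: Mapping[str, str], default: str) -> str:
    """Model assignment: map the selected policy to a concrete model."""
    if route is None:
        return default
    return mapping.get(route, default)


def route_request(prompt: str) -> str:
    return assign_model(select_route(prompt, POLICIES), MODEL_FOR_POLICY, DEFAULT_MODEL)


chosen = route_request("Implement a binary search in Go")
```

Changing which model serves ``code_generation`` only touches ``MODEL_FOR_POLICY``; the policies and the selection step stay the same, which is the property that lets new models be added without retraining.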
Model-based Routing Workflow
@@ -91,6 +102,8 @@ For alias-based routing, the process includes name resolution:
The response is returned with optional metadata about the alias resolution.
.. _preference_aligned_routing_workflow:
Preference-aligned Routing Workflow (Arch-Router)
-------------------------------------------------
@@ -114,7 +127,18 @@ For preference-aligned dynamic routing, the process involves intelligent analysi
Arch-Router
-------------------------
The `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ is a state-of-the-art **preference-based routing model** specifically designed for intelligent LLM selection. This model delivers production-ready performance with low latency and high accuracy.
The `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ is a state-of-the-art **preference-based routing model** specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
**Addressing Traditional Routing Limitations:**
**Human Preference Alignment**
Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
**Flexible Model Integration**
New models can be added to the routing pool without retraining or architectural changes, so routing keeps pace as the set of available models evolves.
**Preference-Encoded Routing**
Domain-action mappings provide a practical mechanism for encoding user preferences, yielding transparent, controllable routing decisions that can be customized for specific use cases.
To support effective routing, Arch-Router introduces two key concepts: