From acc3803a0209f20f0ff61844c8fd51e574dfcff4 Mon Sep 17 00:00:00 2001
From: Salman Paracha
Date: Thu, 18 Sep 2025 23:55:34 -0700
Subject: [PATCH] updated the section on preference-based routing

---
 .../concepts/llm_providers/llm_providers.rst | 56 +++++++------------
 docs/source/guides/llm_router.rst            | 42 +++++++++++---
 2 files changed, 54 insertions(+), 44 deletions(-)

diff --git a/docs/source/concepts/llm_providers/llm_providers.rst b/docs/source/concepts/llm_providers/llm_providers.rst
index 5e4ab671..782a7163 100644
--- a/docs/source/concepts/llm_providers/llm_providers.rst
+++ b/docs/source/concepts/llm_providers/llm_providers.rst
@@ -16,27 +16,20 @@ Core Capabilities
 -----------------

 **Multi-Provider Support**
-Connect to any combination of providers simultaneously:
+Connect to any combination of providers simultaneously (see :ref:`supported_providers` for full details):

 - **First-Class Providers**: Native integrations with OpenAI, Anthropic, DeepSeek, Mistral, Groq, Google Gemini, Together AI, xAI, Azure OpenAI, and Ollama
-- **OpenAI-Compatible Providers**: Support for any provider implementing OpenAI's API interface
+- **OpenAI-Compatible Providers**: Any provider implementing the OpenAI Chat Completions API standard

 **Intelligent Routing**
-Two powerful routing approaches to optimize model selection:
+Three powerful routing approaches to optimize model selection:

-- **Static Model Selection**: Direct routing using provider names or semantic model aliases
-- **Preference-Aligned Dynamic Routing**: Intelligent, context-aware routing using the Arch-Router model that analyzes prompts and selects optimal models based on domain and action preferences
-
-**Model Aliases & Management**
-Create semantic, version-controlled names for simplified model management:
-
-- **Semantic Naming**: Use descriptive names like ``fast-model``, ``reasoning-model``, or ``arch.summarize.v1``
-- **Environment Management**: Different aliases for dev/staging/production environments
-- **Version Control**: Implement versioning schemes for gradual model upgrades
-- **Future Features**: Planned support for guardrails, fallback chains, and traffic splitting
+- **Model-based Routing**: Direct routing to specific models using provider/model names (see :ref:`supported_providers`)
+- **Alias-based Routing**: Semantic routing using custom aliases (see :ref:`model_aliases`)
+- **Preference-aligned Routing**: Intelligent routing using the Arch-Router model (see :ref:`preference_aligned_routing`)

 **Unified Client Interface**
-Use your preferred client library without changing existing code:
+Use your preferred client library without changing existing code (see :ref:`client_libraries` for details):

 - **OpenAI Python SDK**: Full compatibility with all providers
 - **Anthropic Python SDK**: Native support with cross-provider capabilities
@@ -47,26 +40,12 @@ Key Benefits
 ------------

 - **Provider Flexibility**: Switch between providers without changing client code
-- **Intelligent Routing**: Automatically select the best model for each request
+- **Three Routing Methods**: Choose from model-based, alias-based, or preference-aligned routing (using `Arch-Router `_) strategies
 - **Cost Optimization**: Route requests to cost-effective models based on complexity
 - **Performance Optimization**: Use fast models for simple tasks, powerful models for complex reasoning
 - **Environment Management**: Configure different models for different environments
 - **Future-Proof**: Easy to add new providers and upgrade models

-Getting Started
----------------
-Dive into specific areas based on your needs:
-
-.. toctree::
-   :maxdepth: 2
-
-   supported_providers
-   client_libraries
-   model_aliases
-
-**3. Advanced Features**
-- **:ref:`llm_router`**: Learn about preference-aligned dynamic routing and intelligent model selection
-
 Common Use Cases
 ----------------

@@ -85,10 +64,17 @@ Common Use Cases
 - Apply consistent security and governance policies across all providers
 - Scale across regions using different provider endpoints

-Next Steps
----------
-
-1. **:ref:`supported_providers`** - See all supported providers, models, and configuration examples
-2. **:ref:`client_libraries`** - Start using with your preferred client
-3. **:ref:`model_aliases`** - Create semantic model names
-4. **:ref:`llm_router`** - Set up intelligent routing
+Advanced Features
+-----------------
+- :ref:`preference_aligned_routing` - Learn about preference-aligned dynamic routing and intelligent model selection
+
+Getting Started
+---------------
+Dive into specific areas based on your needs:
+
+.. toctree::
+   :maxdepth: 2
+
+   supported_providers
+   client_libraries
+   model_aliases
diff --git a/docs/source/guides/llm_router.rst b/docs/source/guides/llm_router.rst
index 4f7595a7..963df0f0 100644
--- a/docs/source/guides/llm_router.rst
+++ b/docs/source/guides/llm_router.rst
@@ -17,7 +17,8 @@ This enables optimal performance, cost efficiency, and response quality by match
 Routing Methods
 ---------------

-**Model-based Routing**
+Model-based Routing
+~~~~~~~~~~~~~~~~~~~

 Direct routing allows you to specify exact provider and model combinations using the format ``provider/model-name``:

@@ -25,7 +26,8 @@ Direct routing allows you to specify exact provider and model combinations using
 - Provides full control and transparency over which model handles each request
 - Ideal for production workloads where you want predictable routing behavior

-**Alias-based Routing**
+Alias-based Routing
+~~~~~~~~~~~~~~~~~~~

 Alias-based routing lets you create semantic model names that decouple your application from specific providers:

@@ -33,14 +35,23 @@ Alias-based routing lets you create semantic model names that decouple your appl
 - Maps semantic names to underlying provider models for easier experimentation and provider switching
 - Ideal for applications that want abstraction from specific model names while maintaining control

-**Preference-aligned Routing (Arch-Router)**
+.. _preference_aligned_routing:

-Intelligent routing uses the Arch-Router model to automatically select the most appropriate LLM based on:
+Preference-aligned Routing (Arch-Router)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- **Domain Analysis**: Identifies the subject matter (e.g., legal, healthcare, programming)
-- **Action Classification**: Determines the type of operation (e.g., summarization, code generation, translation)
-- **User-Defined Preferences**: Maps domains and actions to preferred models
-- Ideal for dynamic, context-aware routing that adapts to request content and intent
+Traditional LLM routing approaches face significant limitations: they evaluate performance using benchmarks that often fail to capture human preferences, select from fixed model pools, and operate as "black boxes" without practical mechanisms for encoding user preferences.
+
+Arch's preference-aligned routing addresses these challenges by applying a fundamental engineering principle: decoupling. The framework separates route selection (matching queries to human-readable policies) from model assignment (mapping policies to specific LLMs). This separation allows you to define routing policies using descriptive labels like ``Domain: 'finance', Action: 'analyze_earnings_report'`` rather than cryptic identifiers, while independently configuring which models handle each policy.
+
+The `Arch-Router `_ model automatically selects the most appropriate LLM based on:
+
+- Domain Analysis: Identifies the subject matter (e.g., legal, healthcare, programming)
+- Action Classification: Determines the type of operation (e.g., summarization, code generation, translation)
+- User-Defined Preferences: Maps domains and actions to preferred models using transparent, configurable routing decisions
+- Human Preference Alignment: Uses domain-action mappings that capture subjective evaluation criteria, ensuring routing aligns with real-world user needs rather than just benchmark scores
+
+This approach supports seamlessly adding new models without retraining and is ideal for dynamic, context-aware routing that adapts to request content and intent.


 Model-based Routing Workflow
@@ -91,6 +102,8 @@ For alias-based routing, the process includes name resolution:
 The response is returned with optional metadata about the alias resolution.


+.. _preference_aligned_routing_workflow:
+
 Preference-aligned Routing Workflow (Arch-Router)
 -------------------------------------------------

@@ -114,7 +127,18 @@ For preference-aligned dynamic routing, the process involves intelligent analysi
 Arch-Router
 -------------------------

-The `Arch-Router `_ is a state-of-the-art **preference-based routing model** specifically designed for intelligent LLM selection. This model delivers production-ready performance with low latency and high accuracy.
+The `Arch-Router `_ is a state-of-the-art **preference-based routing model** specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
+
+**Addressing Traditional Routing Limitations:**
+
+**Human Preference Alignment**
+Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
+
+**Flexible Model Integration**
+The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.
+
+**Preference-Encoded Routing**
+Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.

 To support effective routing, Arch-Router introduces two key concepts: