mirror of https://github.com/katanemo/plano.git (synced 2026-07-02 15:51:02 +02:00)
updated the section on preference-based routing
parent 3925836858
commit acc3803a02
2 changed files with 54 additions and 44 deletions
|
|
@ -16,27 +16,20 @@ Core Capabilities
|
||||||
-----------------
|
-----------------
|
||||||
|
|
||||||
**Multi-Provider Support**
|
**Multi-Provider Support**
|
||||||
Connect to any combination of providers simultaneously:
|
Connect to any combination of providers simultaneously (see :ref:`supported_providers` for full details):
|
||||||
|
|
||||||
- **First-Class Providers**: Native integrations with OpenAI, Anthropic, DeepSeek, Mistral, Groq, Google Gemini, Together AI, xAI, Azure OpenAI, and Ollama
|
- **First-Class Providers**: Native integrations with OpenAI, Anthropic, DeepSeek, Mistral, Groq, Google Gemini, Together AI, xAI, Azure OpenAI, and Ollama
|
||||||
- **OpenAI-Compatible Providers**: Support for any provider implementing OpenAI's API interface
|
- **OpenAI-Compatible Providers**: Any provider implementing the OpenAI Chat Completions API standard
|
||||||
|
|
||||||
**Intelligent Routing**
|
**Intelligent Routing**
|
||||||
Two powerful routing approaches to optimize model selection:
|
Three powerful routing approaches to optimize model selection:
|
||||||
|
|
||||||
- **Static Model Selection**: Direct routing using provider names or semantic model aliases
|
- **Model-based Routing**: Direct routing to specific models using provider/model names (see :ref:`supported_providers`)
|
||||||
- **Preference-Aligned Dynamic Routing**: Intelligent, context-aware routing using the Arch-Router model that analyzes prompts and selects optimal models based on domain and action preferences
|
- **Alias-based Routing**: Semantic routing using custom aliases (see :ref:`model_aliases`)
|
||||||
|
- **Preference-aligned Routing**: Intelligent routing using the Arch-Router model (see :ref:`preference_aligned_routing`)
|
||||||
**Model Aliases & Management**
|
|
||||||
Create semantic, version-controlled names for simplified model management:
|
|
||||||
|
|
||||||
- **Semantic Naming**: Use descriptive names like ``fast-model``, ``reasoning-model``, or ``arch.summarize.v1``
|
|
||||||
- **Environment Management**: Different aliases for dev/staging/production environments
|
|
||||||
- **Version Control**: Implement versioning schemes for gradual model upgrades
|
|
||||||
- **Future Features**: Planned support for guardrails, fallback chains, and traffic splitting
|
|
||||||
|
|
||||||
**Unified Client Interface**
|
**Unified Client Interface**
|
||||||
Use your preferred client library without changing existing code:
|
Use your preferred client library without changing existing code (see :ref:`client_libraries` for details):
|
||||||
|
|
||||||
- **OpenAI Python SDK**: Full compatibility with all providers
|
- **OpenAI Python SDK**: Full compatibility with all providers
|
||||||
- **Anthropic Python SDK**: Native support with cross-provider capabilities
|
- **Anthropic Python SDK**: Native support with cross-provider capabilities
|
||||||
|
|
@ -47,26 +40,12 @@ Key Benefits
|
||||||
------------
|
------------
|
||||||
|
|
||||||
- **Provider Flexibility**: Switch between providers without changing client code
|
- **Provider Flexibility**: Switch between providers without changing client code
|
||||||
- **Intelligent Routing**: Automatically select the best model for each request
|
- **Three Routing Methods**: Choose from model-based, alias-based, or preference-aligned routing (using `Arch-Router-1.5B <https://huggingface.co/katanemo/Arch-Router-1.5B>`_) strategies
|
||||||
- **Cost Optimization**: Route requests to cost-effective models based on complexity
|
- **Cost Optimization**: Route requests to cost-effective models based on complexity
|
||||||
- **Performance Optimization**: Use fast models for simple tasks, powerful models for complex reasoning
|
- **Performance Optimization**: Use fast models for simple tasks, powerful models for complex reasoning
|
||||||
- **Environment Management**: Configure different models for different environments
|
- **Environment Management**: Configure different models for different environments
|
||||||
- **Future-Proof**: Easy to add new providers and upgrade models
|
- **Future-Proof**: Easy to add new providers and upgrade models
|
||||||
|
|
||||||
Getting Started
|
|
||||||
---------------
|
|
||||||
Dive into specific areas based on your needs:
|
|
||||||
|
|
||||||
.. toctree::
|
|
||||||
:maxdepth: 2
|
|
||||||
|
|
||||||
supported_providers
|
|
||||||
client_libraries
|
|
||||||
model_aliases
|
|
||||||
|
|
||||||
**3. Advanced Features**
|
|
||||||
- **:ref:`llm_router`**: Learn about preference-aligned dynamic routing and intelligent model selection
|
|
||||||
|
|
||||||
Common Use Cases
|
Common Use Cases
|
||||||
----------------
|
----------------
|
||||||
|
|
||||||
|
|
@ -85,10 +64,17 @@ Common Use Cases
|
||||||
- Apply consistent security and governance policies across all providers
|
- Apply consistent security and governance policies across all providers
|
||||||
- Scale across regions using different provider endpoints
|
- Scale across regions using different provider endpoints
|
||||||
|
|
||||||
Next Steps
|
Advanced Features
|
||||||
----------
|
-----------------
|
||||||
|
- :ref:`preference_aligned_routing` - Learn about preference-aligned dynamic routing and intelligent model selection
|
||||||
|
|
||||||
1. **:ref:`supported_providers`** - See all supported providers, models, and configuration examples
|
Getting Started
|
||||||
2. **:ref:`client_libraries`** - Start using with your preferred client
|
---------------
|
||||||
3. **:ref:`model_aliases`** - Create semantic model names
|
Dive into specific areas based on your needs:
|
||||||
4. **:ref:`llm_router`** - Set up intelligent routing
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 2
|
||||||
|
|
||||||
|
supported_providers
|
||||||
|
client_libraries
|
||||||
|
model_aliases
|
||||||
|
|
|
||||||
|
|
@ -17,7 +17,8 @@ This enables optimal performance, cost efficiency, and response quality by match
|
||||||
Routing Methods
|
Routing Methods
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
**Model-based Routing**
|
Model-based Routing
|
||||||
|
~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
Direct routing allows you to specify exact provider and model combinations using the format ``provider/model-name``:
|
Direct routing allows you to specify exact provider and model combinations using the format ``provider/model-name``:
|
||||||
|
|
||||||
|
|
@ -25,7 +26,8 @@ Direct routing allows you to specify exact provider and model combinations using
|
||||||
- Provides full control and transparency over which model handles each request
|
- Provides full control and transparency over which model handles each request
|
||||||
- Ideal for production workloads where you want predictable routing behavior
|
- Ideal for production workloads where you want predictable routing behavior
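
As a rough illustration of the ``provider/model-name`` format described above, the following sketch splits such a string into its two parts. The helper ``parse_model_ref`` and the ``ModelRef`` type are hypothetical names introduced here, not part of the project's API, and the model strings are only examples.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    """Hypothetical container for a parsed 'provider/model-name' string."""
    provider: str
    model: str


def parse_model_ref(value: str) -> ModelRef:
    """Split 'provider/model-name' at the first slash.

    Everything after the first slash is treated as the model name, so
    model names that themselves contain slashes are kept intact.
    """
    provider, sep, model = value.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model-name', got {value!r}")
    return ModelRef(provider, model)


if __name__ == "__main__":
    # Illustrative values only.
    print(parse_model_ref("openai/gpt-4o"))
```

Splitting at the first slash only is an assumption made for this sketch; the actual gateway may validate provider names differently.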
|
||||||
|
|
||||||
**Alias-based Routing**
|
Alias-based Routing
|
||||||
|
~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
Alias-based routing lets you create semantic model names that decouple your application from specific providers:
|
Alias-based routing lets you create semantic model names that decouple your application from specific providers:
|
||||||
|
|
||||||
|
|
@ -33,14 +35,23 @@ Alias-based routing lets you create semantic model names that decouple your appl
|
||||||
- Maps semantic names to underlying provider models for easier experimentation and provider switching
|
- Maps semantic names to underlying provider models for easier experimentation and provider switching
|
||||||
- Ideal for applications that want abstraction from specific model names while maintaining control
|
- Ideal for applications that want abstraction from specific model names while maintaining control
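
The alias idea above can be sketched as a simple lookup, assuming aliases are stored as a mapping from semantic name to a ``provider/model-name`` target. The alias names ``fast-model`` and ``reasoning-model`` come from the documentation; the targets and the ``resolve_model`` helper are placeholders invented for this example.

```python
from typing import Dict

# Hypothetical alias table: the targets are placeholders, not real models.
ALIASES: Dict[str, str] = {
    "fast-model": "provider-a/small-model",
    "reasoning-model": "provider-b/large-model",
}


def resolve_model(name: str, aliases: Dict[str, str]) -> str:
    """Return the target for a known alias, or the name unchanged.

    Passing unknown names through lets callers mix aliases with
    direct 'provider/model-name' references.
    """
    return aliases.get(name, name)


if __name__ == "__main__":
    print(resolve_model("fast-model", ALIASES))
    print(resolve_model("provider-c/other-model", ALIASES))
```

Because clients only ever send the alias, pointing ``fast-model`` at a different target is a change to the table alone, which is the decoupling the text describes.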
|
||||||
|
|
||||||
**Preference-aligned Routing (Arch-Router)**
|
.. _preference_aligned_routing:
|
||||||
|
|
||||||
Intelligent routing uses the Arch-Router model to automatically select the most appropriate LLM based on:
|
Preference-aligned Routing (Arch-Router)
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
- **Domain Analysis**: Identifies the subject matter (e.g., legal, healthcare, programming)
|
Traditional LLM routing approaches face significant limitations: they evaluate performance using benchmarks that often fail to capture human preferences, select from fixed model pools, and operate as "black boxes" without practical mechanisms for encoding user preferences.
|
||||||
- **Action Classification**: Determines the type of operation (e.g., summarization, code generation, translation)
|
|
||||||
- **User-Defined Preferences**: Maps domains and actions to preferred models
|
Arch's preference-aligned routing addresses these challenges by applying a fundamental engineering principle: decoupling. The framework separates route selection (matching queries to human-readable policies) from model assignment (mapping policies to specific LLMs). This separation allows you to define routing policies using descriptive labels like ``Domain: 'finance', Action: 'analyze_earnings_report'`` rather than cryptic identifiers, while independently configuring which models handle each policy.
|
||||||
- Ideal for dynamic, context-aware routing that adapts to request content and intent
|
|
||||||
|
The `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ model automatically selects the most appropriate LLM based on:
|
||||||
|
|
||||||
|
- Domain Analysis: Identifies the subject matter (e.g., legal, healthcare, programming)
|
||||||
|
- Action Classification: Determines the type of operation (e.g., summarization, code generation, translation)
|
||||||
|
- User-Defined Preferences: Maps domains and actions to preferred models using transparent, configurable routing decisions
|
||||||
|
- Human Preference Alignment: Uses domain-action mappings that capture subjective evaluation criteria, ensuring routing aligns with real-world user needs rather than just benchmark scores
|
||||||
|
|
||||||
|
This approach supports seamlessly adding new models without retraining and is ideal for dynamic, context-aware routing that adapts to request content and intent.
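
The decoupling of route selection from model assignment described above can be sketched as two independent steps. This is a minimal illustration under stated assumptions: in the real system the route is chosen by the Arch-Router model, whereas here a naive keyword match stands in for it, and every policy name, keyword, and model identifier below is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RoutePolicy:
    """A human-readable routing policy (hypothetical structure)."""
    name: str
    description: str
    keywords: Tuple[str, ...]  # stand-in for the router model's judgment


POLICIES: List[RoutePolicy] = [
    RoutePolicy("code_generation", "Writing or fixing source code",
                ("function", "code", "bug")),
    RoutePolicy("finance/analyze_earnings_report",
                "Analyzing company earnings reports",
                ("earnings", "revenue")),
]

# Model assignment is configured separately from the policies themselves.
MODEL_ASSIGNMENT: Dict[str, str] = {
    "code_generation": "provider-a/code-model",
    "finance/analyze_earnings_report": "provider-b/analysis-model",
}
DEFAULT_MODEL = "provider-a/general-model"


def select_route(prompt: str, policies: List[RoutePolicy]) -> Optional[str]:
    """Step 1: match the prompt to a policy (keyword stand-in, not Arch-Router)."""
    text = prompt.lower()
    for policy in policies:
        if any(keyword in text for keyword in policy.keywords):
            return policy.name
    return None


def assign_model(route: Optional[str], assignment: Dict[str, str],
                 default: str) -> str:
    """Step 2: map the selected policy to a model, falling back to a default."""
    if route is None:
        return default
    return assignment.get(route, default)


def route_request(prompt: str) -> str:
    """Run both steps for a single prompt."""
    return assign_model(select_route(prompt, POLICIES),
                        MODEL_ASSIGNMENT, DEFAULT_MODEL)


if __name__ == "__main__":
    print(route_request("Summarize the quarterly earnings report"))
```

In this sketch, swapping the model behind a policy only touches ``MODEL_ASSIGNMENT``, while ``select_route`` is untouched. That mirrors the claim that new models can be added without retraining the router.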
|
||||||
|
|
||||||
|
|
||||||
Model-based Routing Workflow
|
Model-based Routing Workflow
|
||||||
|
|
@ -91,6 +102,8 @@ For alias-based routing, the process includes name resolution:
|
||||||
The response is returned with optional metadata about the alias resolution.
|
The response is returned with optional metadata about the alias resolution.
|
||||||
|
|
||||||
|
|
||||||
|
.. _preference_aligned_routing_workflow:
|
||||||
|
|
||||||
Preference-aligned Routing Workflow (Arch-Router)
|
Preference-aligned Routing Workflow (Arch-Router)
|
||||||
-------------------------------------------------
|
-------------------------------------------------
|
||||||
|
|
||||||
|
|
@ -114,7 +127,18 @@ For preference-aligned dynamic routing, the process involves intelligent analysi
|
||||||
|
|
||||||
Arch-Router
|
Arch-Router
|
||||||
-------------------------
|
-------------------------
|
||||||
The `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ is a state-of-the-art **preference-based routing model** specifically designed for intelligent LLM selection. This model delivers production-ready performance with low latency and high accuracy.
|
The `Arch-Router <https://huggingface.co/katanemo/Arch-Router-1.5B>`_ is a state-of-the-art **preference-based routing model** specifically designed to address the limitations of traditional LLM routing. This compact 1.5B model delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
|
||||||
|
|
||||||
|
**Addressing Traditional Routing Limitations:**
|
||||||
|
|
||||||
|
**Human Preference Alignment**
|
||||||
|
Unlike benchmark-driven approaches, Arch-Router learns to match queries with human preferences by using domain-action mappings that capture subjective evaluation criteria, ensuring routing decisions align with real-world user needs.
|
||||||
|
|
||||||
|
**Flexible Model Integration**
|
||||||
|
The system supports seamlessly adding new models for routing without requiring retraining or architectural modifications, enabling dynamic adaptation to evolving model landscapes.
|
||||||
|
|
||||||
|
**Preference-Encoded Routing**
|
||||||
|
Provides a practical mechanism to encode user preferences through domain-action mappings, offering transparent and controllable routing decisions that can be customized for specific use cases.
|
||||||
|
|
||||||
To support effective routing, Arch-Router introduces two key concepts:
|
To support effective routing, Arch-Router introduces two key concepts:
|
||||||
|
|
||||||
|
|
|
||||||