mirror of
https://github.com/katanemo/plano.git
synced 2026-06-17 15:25:17 +02:00
fixed docs and added ollama as a first-class LLM provider
parent
8d0b468345
commit
f7c9d04da9
16 changed files with 1612 additions and 149 deletions
|
|
@ -134,7 +134,7 @@ It will automatically validate parameters, and ensure that the required paramete
|
|||
|
||||
|
||||
Once a downstream function (API) is called, Arch Gateway takes the response and sends it to an upstream LLM to complete the request (for summarization, Q/A, text generation tasks).
|
||||
For more details on how Arch Gateway enables you to centralize usage of LLMs, please read :ref:`LLM providers <llm_provider>`.
|
||||
For more details on how Arch Gateway enables you to centralize usage of LLMs, please read :ref:`LLM providers <llm_providers>`.
|
||||
|
||||
By completing these steps, you enable Arch to manage the process from validation to response, ensuring users receive consistent, reliable results - and that you are focused
|
||||
on the stuff that matters most.
|
||||
|
|
|
|||
|
|
@ -5,18 +5,67 @@ LLM Routing
|
|||
|
||||
With the rapid proliferation of large language models (LLMs), each optimized for different strengths, styles, or latency/cost profiles, routing has become an essential technique for operationalizing the use of different models.
|
||||
|
||||
Arch-Router is an intelligent routing system that automatically selects the most appropriate LLM for each user request based on user-defined usage preferences. Specifically, Arch-Router guides model selection by matching queries to user-defined domains (e.g., finance and healthcare) and action types (e.g., code generation and image editing).
|
||||
Our preference-aligned approach matches practical definitions of performance in the real world and makes routing decisions more transparent and adaptable.
|
||||
Arch provides two distinct routing approaches to meet different use cases:
|
||||
|
||||
1. **Static Model Selection**: Direct routing to specific models based on provider configuration and model aliases
|
||||
2. **Preference-Aligned Dynamic Routing**: Intelligent routing using the Arch-Router model based on context and user-defined preferences
|
||||
|
||||
This enables optimal performance, cost efficiency, and response quality by matching requests with the most suitable model from your available LLM fleet.
|
||||
|
||||
|
||||
Routing Workflow
|
||||
-------------------------
|
||||
Routing Methods
|
||||
---------------
|
||||
|
||||
**Static Model Selection**
|
||||
|
||||
Static routing allows you to directly specify which model to use, either through:
|
||||
|
||||
- **Direct Model Names**: Use provider-specific names like ``openai/gpt-4o-mini``
|
||||
- **Model Aliases**: Use semantic names like ``fast-model`` or ``arch.summarize.v1`` (see :ref:`model_aliases`)
|
||||
|
||||
This approach is ideal when you know exactly which model you want to use for specific tasks or when implementing your own routing logic at the application level.
|
||||
|
||||
**Preference-Aligned Dynamic Routing (Arch-Router)**
|
||||
|
||||
Dynamic routing uses the Arch-Router model to automatically select the most appropriate LLM for each request based on:
|
||||
|
||||
- **Domain Analysis**: Identifies the subject matter (e.g., legal, healthcare, programming)
|
||||
- **Action Classification**: Determines the type of operation (e.g., summarization, code generation, translation)
|
||||
- **User-Defined Preferences**: Maps domains and actions to preferred models
|
||||
|
||||
This approach is ideal when you want intelligent, context-aware routing that adapts to the content and intent of each request.
|
||||
|
||||
|
||||
Static Model Selection Workflow
|
||||
--------------------------------
|
||||
|
||||
For static routing, the process is straightforward:
|
||||
|
||||
#. **Client Request**
|
||||
|
||||
The client specifies the exact model to use, either by provider name (``openai/gpt-4o``) or alias (``fast-model``).
|
||||
|
||||
#. **Model Resolution**
|
||||
|
||||
If using an alias, Arch resolves it to the actual provider model name.
|
||||
|
||||
#. **Direct Routing**
|
||||
|
||||
The request is sent directly to the specified model without analysis or decision-making.
|
||||
|
||||
#. **Response Handling**
|
||||
|
||||
The response is returned to the client with optional metadata about the routing decision.
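The model resolution step above can be sketched in a few lines of Python. This is a minimal illustration, not Arch's actual implementation: the ``MODEL_ALIASES`` and ``KNOWN_MODELS`` tables and the ``resolve_model`` helper are hypothetical names, and mapping each alias to a full provider-prefixed name is an assumption made for clarity.

```python
# Hypothetical tables for illustration only; Arch reads these from its config.
MODEL_ALIASES = {
    "fast-model": "openai/gpt-4o-mini",
    "smart-model": "openai/gpt-4o",
}
KNOWN_MODELS = {"openai/gpt-4o-mini", "openai/gpt-4o"}


def resolve_model(requested: str) -> str:
    """Return the provider model name for a requested model name or alias."""
    # An alias is replaced by its target; a direct model name passes through.
    target = MODEL_ALIASES.get(requested, requested)
    if target not in KNOWN_MODELS:
        raise ValueError(f"unknown model or alias: {requested!r}")
    return target
```

With these assumed tables, ``resolve_model("fast-model")`` and ``resolve_model("openai/gpt-4o-mini")`` resolve to the same provider model, which is the point of static routing: the client's choice is honored without any analysis of the prompt.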
|
||||
|
||||
|
||||
Preference-Aligned Dynamic Routing Workflow (Arch-Router)
|
||||
---------------------------------------------------------
|
||||
|
||||
For preference-aligned dynamic routing, the process involves intelligent analysis:
|
||||
|
||||
#. **Prompt Analysis**
|
||||
|
||||
When a user submits a prompt, the Router analyzes it to determine the domain (subject matter) or action (type of operation requested).
|
||||
When a user submits a prompt without specifying a model, the Arch-Router analyzes it to determine the domain (subject matter) and action (type of operation requested).
|
||||
|
||||
#. **Model Selection**
|
||||
|
||||
|
|
@ -53,51 +102,146 @@ In summary, Arch-Router demonstrates:
|
|||
- **Production-Ready Performance**: Optimized for low-latency, high-throughput applications in multi-model environments.
|
||||
|
||||
|
||||
Implementing LLM Routing
|
||||
-----------------------------
|
||||
Implementing Routing
|
||||
--------------------
|
||||
|
||||
To configure LLM routing in our gateway, you need to define a prompt target configuration that specifies the routing model and the LLM providers. This configuration will allow Arch Gateway to route incoming prompts to the appropriate model based on the defined routes.
|
||||
|
||||
Below is an example to show how to set up a prompt target for the Arch Router:
|
||||
|
||||
- **Step 1: Define the routing model in the `routing` section**. You can use the `archgw-v1-router-model` as the katanemo routing model or any other routing model you prefer.
|
||||
|
||||
- **Step 2: Define the listeners in the `listeners` section**. This is where you specify the address and port for incoming traffic, as well as the message format (e.g., OpenAI).
|
||||
|
||||
- **Step 3: Define the LLM providers in the `llm_providers` section**. This is where you specify the routing model, and any other models you want to use for specific tasks and their route usage descriptions (e.g., code generation, code understanding).
|
||||
|
||||
.. Note::
|
||||
Make sure you define a model for default usage, such as ``gpt-4o``, which will be used when no specific route is matched for a user prompt.
|
||||
**Static Model Selection**
|
||||
|
||||
For static routing, simply configure your LLM providers and optionally define model aliases:
|
||||
|
||||
.. code-block:: yaml
|
||||
:caption: Route Config Example
|
||||
|
||||
:caption: Static Routing Configuration
|
||||
|
||||
listeners:
|
||||
egress_traffic:
|
||||
egress_traffic:
|
||||
address: 0.0.0.0
|
||||
port: 12000
|
||||
message_format: openai
|
||||
timeout: 30s
|
||||
|
||||
llm_providers:
|
||||
- model: openai/gpt-4o-mini
|
||||
access_key: $OPENAI_API_KEY
|
||||
default: true
|
||||
|
||||
- model: openai/gpt-4o-mini
|
||||
access_key: $OPENAI_API_KEY
|
||||
default: true
|
||||
- model: openai/gpt-4o
|
||||
access_key: $OPENAI_API_KEY
|
||||
|
||||
- model: openai/gpt-4o
|
||||
access_key: $OPENAI_API_KEY
|
||||
routing_preferences:
|
||||
- name: code understanding
|
||||
description: understand and explain existing code snippets, functions, or libraries
|
||||
- model: anthropic/claude-3-5-sonnet-20241022
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
|
||||
- model: openai/gpt-4.1
|
||||
access_key: $OPENAI_API_KEY
|
||||
routing_preferences:
|
||||
- name: code generation
|
||||
description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
|
||||
# Optional: Define aliases for easier client usage
|
||||
model_aliases:
|
||||
fast-model:
|
||||
target: gpt-4o-mini
|
||||
smart-model:
|
||||
target: gpt-4o
|
||||
creative-model:
|
||||
target: claude-3-5-sonnet-20241022
|
||||
|
||||
Clients can then specify models directly:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Using provider model names
|
||||
response = client.chat.completions.create(
|
||||
model="openai/gpt-4o-mini",
|
||||
messages=[{"role": "user", "content": "Hello!"}]
|
||||
)
|
||||
|
||||
# Using aliases
|
||||
response = client.chat.completions.create(
|
||||
model="fast-model",
|
||||
messages=[{"role": "user", "content": "Hello!"}]
|
||||
)
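The snippets above use a ``client`` object without showing what it sends. As a rough sketch using only the Python standard library, the request for a static model choice might be built as follows. The local address, the ``/v1/chat/completions`` path, and the ``build_chat_request`` helper are assumptions for illustration; only the port (12000) and the OpenAI message format come from the listener configuration.

```python
import json
import urllib.request

# Assumption: the egress listener is reachable locally on port 12000 and
# exposes an OpenAI-style chat completions path.
BASE_URL = "http://127.0.0.1:12000/v1"


def build_chat_request(prompt: str, model: str, base_url: str = BASE_URL):
    """Build (but do not send) an OpenAI-format chat completion request."""
    payload = {
        "model": model,  # a provider model name or an alias such as "fast-model"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Hello!", model="fast-model")
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the example runnable without a gateway in place.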
|
||||
|
||||
**Preference-Aligned Dynamic Routing (Arch-Router)**
|
||||
|
||||
To configure preference-aligned dynamic routing, you need to define routing preferences that map domains and actions to specific models:
|
||||
|
||||
.. code-block:: yaml
|
||||
:caption: Preference-Aligned Dynamic Routing Configuration
|
||||
|
||||
listeners:
|
||||
egress_traffic:
|
||||
address: 0.0.0.0
|
||||
port: 12000
|
||||
message_format: openai
|
||||
timeout: 30s
|
||||
|
||||
llm_providers:
|
||||
- model: openai/gpt-4o-mini
|
||||
access_key: $OPENAI_API_KEY
|
||||
default: true
|
||||
|
||||
- model: openai/gpt-4o
|
||||
access_key: $OPENAI_API_KEY
|
||||
routing_preferences:
|
||||
- name: code understanding
|
||||
description: understand and explain existing code snippets, functions, or libraries
|
||||
- name: complex reasoning
|
||||
description: deep analysis, mathematical problem solving, and logical reasoning
|
||||
|
||||
- model: anthropic/claude-3-5-sonnet-20241022
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
routing_preferences:
|
||||
- name: creative writing
|
||||
description: creative content generation, storytelling, and writing assistance
|
||||
- name: code generation
|
||||
description: generating new code snippets, functions, or boilerplate based on user prompts
|
||||
|
||||
Clients can let the router decide or use aliases:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Let Arch-Router choose based on content
|
||||
response = client.chat.completions.create(
|
||||
messages=[{"role": "user", "content": "Write a creative story about space exploration"}]
|
||||
# No model specified - router will analyze and choose claude-3-5-sonnet-20241022
|
||||
)
|
||||
|
||||
|
||||
Combining Routing Methods
|
||||
-------------------------
|
||||
|
||||
You can combine static model selection with dynamic routing preferences for maximum flexibility:
|
||||
|
||||
.. code-block:: yaml
|
||||
:caption: Hybrid Routing Configuration
|
||||
|
||||
llm_providers:
|
||||
- model: openai/gpt-4o-mini
|
||||
access_key: $OPENAI_API_KEY
|
||||
default: true
|
||||
|
||||
- model: openai/gpt-4o
|
||||
access_key: $OPENAI_API_KEY
|
||||
routing_preferences:
|
||||
- name: complex_reasoning
|
||||
description: deep analysis and complex problem solving
|
||||
|
||||
- model: anthropic/claude-3-5-sonnet-20241022
|
||||
access_key: $ANTHROPIC_API_KEY
|
||||
routing_preferences:
|
||||
- name: creative_tasks
|
||||
description: creative writing and content generation
|
||||
|
||||
model_aliases:
|
||||
# Static aliases for direct routing
|
||||
fast-model:
|
||||
target: gpt-4o-mini
|
||||
|
||||
reasoning-model:
|
||||
target: gpt-4o
|
||||
|
||||
# Aliases that can also participate in dynamic routing
|
||||
creative-model:
|
||||
target: claude-3-5-sonnet-20241022
|
||||
|
||||
This configuration allows clients to:
|
||||
|
||||
1. **Use direct model selection**: ``model="fast-model"``
|
||||
2. **Let the router decide**: No model specified, router analyzes content
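The two client options above amount to a simple decision: honor an explicit model if one is given, otherwise fall back to preference-based selection. The sketch below illustrates that decision under stated assumptions. The tables mirror the hybrid configuration, but ``choose_model`` and ``keyword_classifier`` are hypothetical; in particular, the keyword matcher is only a stand-in for Arch-Router, which is a model and not a keyword rule.

```python
from typing import Callable, Optional

# Values mirror the hybrid configuration; the structure itself is hypothetical.
DEFAULT_MODEL = "openai/gpt-4o-mini"
ALIASES = {
    "fast-model": "openai/gpt-4o-mini",
    "reasoning-model": "openai/gpt-4o",
    "creative-model": "anthropic/claude-3-5-sonnet-20241022",
}
ROUTE_TO_MODEL = {
    "complex_reasoning": "openai/gpt-4o",
    "creative_tasks": "anthropic/claude-3-5-sonnet-20241022",
}


def keyword_classifier(prompt: str) -> Optional[str]:
    """Toy stand-in for the router: return a route name or None."""
    text = prompt.lower()
    if "story" in text or "poem" in text:
        return "creative_tasks"
    if "prove" in text or "analyze" in text:
        return "complex_reasoning"
    return None


def choose_model(
    requested_model: Optional[str],
    prompt: str,
    classify: Callable[[str], Optional[str]] = keyword_classifier,
) -> str:
    """Pick a model: static if the client named one, dynamic otherwise."""
    if requested_model is not None:
        # Static path: resolve an alias, or pass a direct model name through.
        return ALIASES.get(requested_model, requested_model)
    # Dynamic path: map the matched route to its model, else use the default.
    return ROUTE_TO_MODEL.get(classify(prompt), DEFAULT_MODEL)
```

Passing the classifier in as a parameter keeps the dispatch logic independent of how routes are actually matched.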
|
||||
|
||||
Example Use Cases
|
||||
-------------------------
|
||||
|
|
@ -112,7 +256,7 @@ Here are common scenarios where Arch-Router excels:
|
|||
- **Conversational Routing**: Track conversation context to identify when topics shift between domains or when the type of assistance needed changes mid-conversation.
|
||||
|
||||
|
||||
Best practice
|
||||
Best practices
|
||||
-------------------------
|
||||
- **💡Consistent Naming:** Route names should align with their descriptions.
|
||||
|
||||
|
|
|
|||