mirror of https://github.com/katanemo/plano.git
synced 2026-04-30 11:26:27 +02:00

Update docs to Plano (#639)

This commit is contained in: parent 15fbb6c3af · commit e224cba3e3
139 changed files with 4407 additions and 24735 deletions
@@ -3,7 +3,7 @@
 Client Libraries
 ================

-Arch provides a unified interface that works seamlessly with multiple client libraries and tools. You can use your preferred client library without changing your existing code - just point it to Arch's gateway endpoints.
+Plano provides a unified interface that works seamlessly with multiple client libraries and tools. You can use your preferred client library without changing your existing code - just point it to Plano's gateway endpoints.

 Supported Clients
 ------------------
@@ -16,7 +16,7 @@ Supported Clients
 Gateway Endpoints
 -----------------

-Arch exposes two main endpoints:
+Plano exposes three main endpoints:

 .. list-table::
    :header-rows: 1
@@ -26,13 +26,15 @@ Arch exposes two main endpoints:
      - Purpose
    * - ``http://127.0.0.1:12000/v1/chat/completions``
      - OpenAI-compatible chat completions (LLM Gateway)
+   * - ``http://127.0.0.1:12000/v1/responses``
+     - OpenAI Responses API with :ref:`conversational state management <managing_conversational_state>` (LLM Gateway)
    * - ``http://127.0.0.1:12000/v1/messages``
      - Anthropic-compatible messages (LLM Gateway)

 OpenAI (Python) SDK
 -------------------

-The OpenAI SDK works with any provider through Arch's OpenAI-compatible endpoint.
+The OpenAI SDK works with any provider through Plano's OpenAI-compatible endpoint.

 **Installation:**
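The endpoint table above maps one base URL to three API styles. As a small standalone illustration (the helper and its names are ours, not part of any Plano SDK), the mapping can be expressed directly:

```python
# Hypothetical helper mirroring the endpoint table above; the gateway
# address comes from the docs, the function name is illustrative only.
BASE = "http://127.0.0.1:12000"

ENDPOINTS = {
    "chat_completions": f"{BASE}/v1/chat/completions",  # OpenAI-style
    "responses": f"{BASE}/v1/responses",                # stateful Responses API
    "messages": f"{BASE}/v1/messages",                  # Anthropic-style
}

def gateway_url(style: str) -> str:
    """Return the gateway URL for a given API style."""
    return ENDPOINTS[style]

print(gateway_url("messages"))  # http://127.0.0.1:12000/v1/messages
```

All three URLs share one host and port, which is why a client only needs its `base_url` changed to route through the gateway.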
@@ -46,7 +48,7 @@ The OpenAI SDK works with any provider through Arch's OpenAI-compatible endpoint
    from openai import OpenAI

-   # Point to Arch's LLM Gateway
+   # Point to Plano's LLM Gateway
    client = OpenAI(
        api_key="test-key",  # Can be any value for local testing
        base_url="http://127.0.0.1:12000/v1"
@@ -96,7 +98,7 @@ The OpenAI SDK works with any provider through Arch's OpenAI-compatible endpoint
 **Using with Non-OpenAI Models:**

-The OpenAI SDK can be used with any provider configured in Arch:
+The OpenAI SDK can be used with any provider configured in Plano:

 .. code-block:: python
@@ -124,10 +126,92 @@ The OpenAI SDK can be used with any provider configured in Arch:
         ]
     )

+OpenAI Responses API (Conversational State)
+-------------------------------------------
+
+The OpenAI Responses API (``v1/responses``) enables multi-turn conversations with automatic state management. Plano handles conversation history for you, so you don't need to manually include previous messages in each request.
+
+See :ref:`managing_conversational_state` for detailed configuration and storage backend options.
+
+**Installation:**
+
+.. code-block:: bash
+
+   pip install openai
+
+**Basic Multi-Turn Conversation:**
+
+.. code-block:: python
+
+   from openai import OpenAI
+
+   # Point to Plano's LLM Gateway
+   client = OpenAI(
+       api_key="test-key",
+       base_url="http://127.0.0.1:12000/v1"
+   )
+
+   # First turn - creates a new conversation
+   response = client.chat.completions.create(
+       model="gpt-4o-mini",
+       messages=[
+           {"role": "user", "content": "My name is Alice"}
+       ]
+   )
+
+   # Extract response_id for conversation continuity
+   response_id = response.id
+   print(f"Assistant: {response.choices[0].message.content}")
+
+   # Second turn - continues the conversation
+   # Plano automatically retrieves and merges previous context
+   response = client.chat.completions.create(
+       model="gpt-4o-mini",
+       messages=[
+           {"role": "user", "content": "What's my name?"}
+       ],
+       metadata={"response_id": response_id}  # Reference previous conversation
+   )
+
+   print(f"Assistant: {response.choices[0].message.content}")
+   # Output: "Your name is Alice"
+
+**Using with Any Provider:**
+
+The Responses API works with any LLM provider configured in Plano:
+
+.. code-block:: python
+
+   # Multi-turn conversation with Claude
+   response = client.chat.completions.create(
+       model="claude-3-5-sonnet-20241022",
+       messages=[
+           {"role": "user", "content": "Let's discuss quantum physics"}
+       ]
+   )
+
+   response_id = response.id
+
+   # Continue conversation - Plano manages state regardless of provider
+   response = client.chat.completions.create(
+       model="claude-3-5-sonnet-20241022",
+       messages=[
+           {"role": "user", "content": "Tell me more about entanglement"}
+       ],
+       metadata={"response_id": response_id}
+   )
+
+**Key Benefits:**
+
+* **Reduced payload size**: No need to send full conversation history in each request
+* **Provider flexibility**: Use any configured LLM provider with state management
+* **Automatic context merging**: Plano handles conversation continuity behind the scenes
+* **Production-ready storage**: Configure :ref:`PostgreSQL or memory storage <managing_conversational_state>` based on your needs
+
 Anthropic (Python) SDK
 ----------------------

-The Anthropic SDK works with any provider through Arch's Anthropic-compatible endpoint.
+The Anthropic SDK works with any provider through Plano's Anthropic-compatible endpoint.

 **Installation:**
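For clients that do not use the OpenAI SDK, the second-turn request in the multi-turn example above reduces to a small JSON body carrying the `metadata.response_id` reference. A sketch of building that body (the function name and the `resp_123` value are illustrative, not part of Plano):

```python
import json

def continuation_payload(model: str, user_text: str, response_id: str) -> str:
    """Build a chat-completions body that references a prior conversation,
    matching the metadata.response_id convention shown in the docs above."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "metadata": {"response_id": response_id},
    }
    return json.dumps(body)

# Illustrative values only; response_id would come from the previous turn
payload = continuation_payload("gpt-4o-mini", "What's my name?", "resp_123")
print(payload)
```

Any HTTP client can POST this body to the gateway's chat-completions URL; only the `metadata` field distinguishes it from a stateless request.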
@@ -141,7 +225,7 @@ The Anthropic SDK works with any provider through Arch's Anthropic-compatible en
    import anthropic

-   # Point to Arch's LLM Gateway
+   # Point to Plano's LLM Gateway
    client = anthropic.Anthropic(
        api_key="test-key",  # Can be any value for local testing
        base_url="http://127.0.0.1:12000"
@@ -192,7 +276,7 @@ The Anthropic SDK works with any provider through Arch's Anthropic-compatible en
 **Using with Non-Anthropic Models:**

-The Anthropic SDK can be used with any provider configured in Arch:
+The Anthropic SDK can be used with any provider configured in Plano:

 .. code-block:: python
@@ -284,7 +368,7 @@ For direct HTTP requests or integration with any programming language:
 Cross-Client Compatibility
 --------------------------

-One of Arch's key features is cross-client compatibility. You can:
+One of Plano's key features is cross-client compatibility. You can:

 **Use OpenAI SDK with Claude Models:**
@@ -1,16 +1,16 @@
 .. _llm_providers:

-LLM Providers
-=============
-**LLM Providers** are a top-level primitive in Arch, helping developers centrally define, secure, observe,
-and manage the usage of their LLMs. Arch builds on Envoy's reliable `cluster subsystem <https://www.envoyproxy.io/docs/envoy/v1.31.2/intro/arch_overview/upstream/cluster_manager>`_
-to manage egress traffic to LLMs, which includes intelligent routing, retry and fail-over mechanisms,
-ensuring high availability and fault tolerance. This abstraction also enables developers to seamlessly
-switch between LLM providers or upgrade LLM versions, simplifying the integration and scaling of LLMs
-across applications.
+Model (LLM) Providers
+=====================
+**Model Providers** are a top-level primitive in Plano, helping developers centrally define, secure, observe,
+and manage the usage of their models. Plano builds on Envoy's reliable `cluster subsystem <https://www.envoyproxy.io/docs/envoy/v1.31.2/intro/arch_overview/upstream/cluster_manager>`_ to manage egress traffic to models, which includes intelligent routing, retry and fail-over mechanisms,
+ensuring high availability and fault tolerance. This abstraction also enables developers to seamlessly switch between model providers or upgrade model versions, simplifying the integration and scaling of models across applications.

-Today, we are enabling you to connect to 11+ different AI providers through a unified interface with advanced routing and management capabilities.
-Whether you're using OpenAI, Anthropic, Azure OpenAI, local Ollama models, or any OpenAI-compatible provider, Arch provides seamless integration with enterprise-grade features.
+Today, we are enabling you to connect to 15+ different AI providers through a unified interface with advanced routing and management capabilities.
+Whether you're using OpenAI, Anthropic, Azure OpenAI, local Ollama models, or any OpenAI-compatible provider, Plano provides seamless integration with enterprise-grade features.

 .. note::
    Please refer to the quickstart guide :ref:`here <llm_routing_quickstart>` to configure and use LLM providers via common client libraries like OpenAI and Anthropic Python SDKs, or via direct HTTP/cURL requests.

 Core Capabilities
 -----------------
@@ -18,29 +18,29 @@ Core Capabilities
 **Multi-Provider Support**
 Connect to any combination of providers simultaneously (see :ref:`supported_providers` for full details):

-- **First-Class Providers**: Native integrations with OpenAI, Anthropic, DeepSeek, Mistral, Groq, Google Gemini, Together AI, xAI, Azure OpenAI, and Ollama
-- **OpenAI-Compatible Providers**: Any provider implementing the OpenAI Chat Completions API standard
+- First-Class Providers: Native integrations with OpenAI, Anthropic, DeepSeek, Mistral, Groq, Google Gemini, Together AI, xAI, Azure OpenAI, and Ollama
+- OpenAI-Compatible Providers: Any provider implementing the OpenAI Chat Completions API standard

 **Intelligent Routing**
 Three powerful routing approaches to optimize model selection:

-- **Model-based Routing**: Direct routing to specific models using provider/model names (see :ref:`supported_providers`)
-- **Alias-based Routing**: Semantic routing using custom aliases (see :ref:`model_aliases`)
-- **Preference-aligned Routing**: Intelligent routing using the Arch-Router model (see :ref:`preference_aligned_routing`)
+- Model-based Routing: Direct routing to specific models using provider/model names (see :ref:`supported_providers`)
+- Alias-based Routing: Semantic routing using custom aliases (see :ref:`model_aliases`)
+- Preference-aligned Routing: Intelligent routing using the Plano-Router model (see :ref:`preference_aligned_routing`)

 **Unified Client Interface**
 Use your preferred client library without changing existing code (see :ref:`client_libraries` for details):

-- **OpenAI Python SDK**: Full compatibility with all providers
-- **Anthropic Python SDK**: Native support with cross-provider capabilities
-- **cURL & HTTP Clients**: Direct REST API access for any programming language
-- **Custom Integrations**: Standard HTTP interfaces for seamless integration
+- OpenAI Python SDK: Full compatibility with all providers
+- Anthropic Python SDK: Native support with cross-provider capabilities
+- cURL & HTTP Clients: Direct REST API access for any programming language
+- Custom Integrations: Standard HTTP interfaces for seamless integration

 Key Benefits
 ------------

 - **Provider Flexibility**: Switch between providers without changing client code
-- **Three Routing Methods**: Choose from model-based, alias-based, or preference-aligned routing (using `Arch-Router-1.5B <https://huggingface.co/katanemo/Arch-Router-1.5B>`_) strategies
+- **Three Routing Methods**: Choose from model-based, alias-based, or preference-aligned routing (using `Plano-Router-1.5B <https://huggingface.co/katanemo/Plano-Router-1.5B>`_) strategies
 - **Cost Optimization**: Route requests to cost-effective models based on complexity
 - **Performance Optimization**: Use fast models for simple tasks, powerful models for complex reasoning
 - **Environment Management**: Configure different models for different environments
@@ -3,27 +3,21 @@
 Supported Providers & Configuration
 ===================================

-Arch provides first-class support for multiple LLM providers through native integrations and OpenAI-compatible interfaces. This comprehensive guide covers all supported providers, their available chat models, and detailed configuration instructions.
+Plano provides first-class support for multiple LLM providers through native integrations and OpenAI-compatible interfaces. This comprehensive guide covers all supported providers, their available chat models, and detailed configuration instructions.

 .. note::
-   **Model Support:** Arch supports all chat models from each provider, not just the examples shown in this guide. The configurations below demonstrate common models for reference, but you can use any chat model available from your chosen provider.
+   **Model Support:** Plano supports all chat models from each provider, not just the examples shown in this guide. The configurations below demonstrate common models for reference, but you can use any chat model available from your chosen provider.

-Please refer to the quickstart guide :ref:`here <llm_routing_quickstart>` to configure and use LLM providers via common client libraries like OpenAI and Anthropic Python SDKs, or via direct HTTP/cURL requests.
-
-
 Configuration Structure
 -----------------------

-All providers are configured in the ``llm_providers`` section of your ``arch_config.yaml`` file:
+All providers are configured in the ``llm_providers`` section of your ``plano_config.yaml`` file:

 .. code-block:: yaml

    version: v0.1

    listeners:
      egress_traffic:
        address: 0.0.0.0
        port: 12000
        message_format: openai
        timeout: 30s

    llm_providers:
      # Provider configurations go here
      - model: provider/model-name
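As a quick standalone check (not part of the Plano codebase), the configuration structure above can be parsed and inspected with PyYAML; the `openai/gpt-4o` provider entry here is only an example value:

```python
import yaml

# A plano_config.yaml fragment matching the structure documented above;
# the provider entry is illustrative.
config_text = """
version: v0.1

listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
"""

config = yaml.safe_load(config_text)
egress = config["listeners"]["egress_traffic"]
assert egress["port"] == 12000 and egress["message_format"] == "openai"

# Every provider entry uses the provider/model-name addressing scheme
for p in config["llm_providers"]:
    provider, model_name = p["model"].split("/", 1)
    print(provider, model_name)  # openai gpt-4o
```

Note that `$OPENAI_API_KEY` survives as a literal string at parse time; resolving it from the environment is the gateway's job, not the YAML parser's.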
@@ -50,7 +44,7 @@ Any provider that implements the OpenAI API interface can be configured using cu
 Supported API Endpoints
 ------------------------

-Arch supports the following standardized endpoints across providers:
+Plano supports the following standardized endpoints across providers:

 .. list-table::
    :header-rows: 1
@@ -65,6 +59,9 @@ Arch supports the following standardized endpoints across providers:
    * - ``/v1/messages``
      - Anthropic-style messages
      - Anthropic SDK, cURL, custom clients
+   * - ``/v1/responses``
+     - Unified response endpoint for agentic apps
+     - All SDKs, cURL, custom clients

 First-Class Providers
 ---------------------
@@ -78,7 +75,7 @@ OpenAI
 **Authentication:** API Key - Get your OpenAI API key from `OpenAI Platform <https://platform.openai.com/api-keys>`_.

-**Supported Chat Models:** All OpenAI chat models including GPT-5, GPT-4o, GPT-4, GPT-3.5-turbo, and all future releases.
+**Supported Chat Models:** All OpenAI chat models including GPT-5.2, GPT-5, GPT-4o, and all future releases.

 .. list-table::
    :header-rows: 1
@@ -87,21 +84,18 @@ OpenAI
    * - Model Name
      - Model ID for Config
      - Description
+   * - GPT-5.2
+     - ``openai/gpt-5.2``
+     - Next-generation model (use any model name from OpenAI's API)
    * - GPT-5
      - ``openai/gpt-5``
      - Next-generation model (use any model name from OpenAI's API)
-   * - GPT-4o
-     - ``openai/gpt-4o``
-     - Latest multimodal model
-   * - GPT-4o mini
-     - ``openai/gpt-4o-mini``
-     - Fast, cost-effective model
-   * - GPT-4
-     - ``openai/gpt-4``
+   * - GPT-4o
+     - ``openai/gpt-4o``
      - High-capability reasoning model
-   * - GPT-3.5 Turbo
-     - ``openai/gpt-3.5-turbo``
-     - Balanced performance and cost
-   * - o3-mini
-     - ``openai/o3-mini``
-     - Reasoning-focused model (preview)
@@ -115,15 +109,15 @@ OpenAI
    llm_providers:
      # Latest models (examples - use any OpenAI chat model)
-     - model: openai/gpt-4o-mini
+     - model: openai/gpt-5.2
        access_key: $OPENAI_API_KEY
        default: true

-     - model: openai/gpt-4o
+     - model: openai/gpt-5
        access_key: $OPENAI_API_KEY

      # Use any model name from OpenAI's API
-     - model: openai/gpt-5
+     - model: openai/gpt-4o
        access_key: $OPENAI_API_KEY

 Anthropic
@@ -135,7 +129,7 @@ Anthropic
 **Authentication:** API Key - Get your Anthropic API key from `Anthropic Console <https://console.anthropic.com/settings/keys>`_.

-**Supported Chat Models:** All Anthropic Claude models including Claude Sonnet 4, Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus, and all future releases.
+**Supported Chat Models:** All Anthropic Claude models including Claude Sonnet 4.5, Claude Opus 4.5, Claude Haiku 4.5, and all future releases.

 .. list-table::
    :header-rows: 1
@@ -144,24 +138,18 @@ Anthropic
    * - Model Name
      - Model ID for Config
      - Description
-   * - Claude Sonnet 4
-     - ``anthropic/claude-sonnet-4``
-     - Next-generation model (use any model name from Anthropic's API)
-   * - Claude 3.5 Sonnet
-     - ``anthropic/claude-3-5-sonnet-20241022``
-     - Latest high-performance model
-   * - Claude 3.5 Haiku
-     - ``anthropic/claude-3-5-haiku-20241022``
-     - Fast and efficient model
-   * - Claude 3 Opus
-     - ``anthropic/claude-3-opus-20240229``
+   * - Claude Opus 4.5
+     - ``anthropic/claude-opus-4-5``
      - Most capable model for complex tasks
-   * - Claude 3 Sonnet
-     - ``anthropic/claude-3-sonnet-20240229``
+   * - Claude Sonnet 4.5
+     - ``anthropic/claude-sonnet-4-5``
      - Balanced performance model
-   * - Claude 3 Haiku
-     - ``anthropic/claude-3-haiku-20240307``
-     - Fastest model
+   * - Claude Haiku 4.5
+     - ``anthropic/claude-haiku-4-5``
+     - Fast and efficient model
+   * - Claude Sonnet 3.5
+     - ``anthropic/claude-sonnet-3-5``
+     - Complex agents and coding

 **Configuration Examples:**
@@ -169,14 +157,14 @@ Anthropic
    llm_providers:
      # Latest models (examples - use any Anthropic chat model)
-     - model: anthropic/claude-3-5-sonnet-20241022
+     - model: anthropic/claude-opus-4-5
        access_key: $ANTHROPIC_API_KEY

-     - model: anthropic/claude-3-5-haiku-20241022
+     - model: anthropic/claude-sonnet-4-5
        access_key: $ANTHROPIC_API_KEY

      # Use any model name from Anthropic's API
-     - model: anthropic/claude-sonnet-4
+     - model: anthropic/claude-haiku-4-5
        access_key: $ANTHROPIC_API_KEY

 DeepSeek
@@ -267,7 +255,7 @@ Groq
 **Authentication:** API Key - Get your Groq API key from `Groq Console <https://console.groq.com/keys>`_.

-**Supported Chat Models:** All Groq chat models including Llama 3, Mixtral, Gemma, and all future releases.
+**Supported Chat Models:** All Groq chat models including Llama 4, GPT OSS, Mixtral, Gemma, and all future releases.

 .. list-table::
    :header-rows: 1
@@ -276,25 +264,28 @@ Groq
    * - Model Name
      - Model ID for Config
      - Description
-   * - Llama 3.1 8B
-     - ``groq/llama3-8b-8192``
+   * - Llama 4 Maverick 17B
+     - ``groq/llama-4-maverick-17b-128e-instruct``
      - Fast inference Llama model
-   * - Llama 3.1 70B
-     - ``groq/llama3-70b-8192``
-     - Larger Llama model
-   * - Mixtral 8x7B
-     - ``groq/mixtral-8x7b-32768``
-     - Mixture of experts model
+   * - Llama 4 Scout 8B
+     - ``groq/llama-4-scout-8b-128e-instruct``
+     - Smaller Llama model
+   * - GPT OSS 20B
+     - ``groq/gpt-oss-20b``
+     - Open source GPT model

 **Configuration Examples:**

 .. code-block:: yaml

    llm_providers:
-     - model: groq/llama3-8b-8192
+     - model: groq/llama-4-maverick-17b-128e-instruct
        access_key: $GROQ_API_KEY

-     - model: groq/mixtral-8x7b-32768
+     - model: groq/llama-4-scout-8b-128e-instruct
        access_key: $GROQ_API_KEY

+     - model: groq/gpt-oss-20b
+       access_key: $GROQ_API_KEY
+
 Google Gemini
@@ -306,7 +297,7 @@ Google Gemini
 **Authentication:** API Key - Get your Google AI API key from `Google AI Studio <https://aistudio.google.com/app/apikey>`_.

-**Supported Chat Models:** All Google Gemini chat models including Gemini 1.5 Pro, Gemini 1.5 Flash, and all future releases.
+**Supported Chat Models:** All Google Gemini chat models including Gemini 3 Pro, Gemini 3 Flash, and all future releases.

 .. list-table::
    :header-rows: 1
@@ -315,11 +306,11 @@ Google Gemini
    * - Model Name
      - Model ID for Config
      - Description
-   * - Gemini 1.5 Pro
-     - ``gemini/gemini-1.5-pro``
+   * - Gemini 3 Pro
+     - ``gemini/gemini-3-pro``
      - Advanced reasoning and creativity
-   * - Gemini 1.5 Flash
-     - ``gemini/gemini-1.5-flash``
+   * - Gemini 3 Flash
+     - ``gemini/gemini-3-flash``
      - Fast and efficient model

 **Configuration Examples:**
@@ -327,10 +318,10 @@ Google Gemini
 .. code-block:: yaml

    llm_providers:
-     - model: gemini/gemini-1.5-pro
+     - model: gemini/gemini-3-pro
        access_key: $GOOGLE_API_KEY

-     - model: gemini/gemini-1.5-flash
+     - model: gemini/gemini-3-flash
        access_key: $GOOGLE_API_KEY

 Together AI
@@ -524,7 +515,7 @@ Amazon Bedrock
 **Provider Prefix:** ``amazon_bedrock/``

-**API Endpoint:** Arch automatically constructs the endpoint as:
+**API Endpoint:** Plano automatically constructs the endpoint as:
 - Non-streaming: ``/model/{model-id}/converse``
 - Streaming: ``/model/{model-id}/converse-stream``
@@ -723,7 +714,7 @@ Configure routing preferences for dynamic model selection:
 .. code-block:: yaml

    llm_providers:
-     - model: openai/gpt-4o
+     - model: openai/gpt-5.2
        access_key: $OPENAI_API_KEY
        routing_preferences:
          - name: complex_reasoning
@@ -731,7 +722,7 @@ Configure routing preferences for dynamic model selection:
          - name: code_review
            description: reviewing and analyzing existing code for bugs and improvements

-     - model: anthropic/claude-3-5-sonnet-20241022
+     - model: anthropic/claude-sonnet-4-5
        access_key: $ANTHROPIC_API_KEY
        routing_preferences:
          - name: creative_writing
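Reassembled from the two hunks above, a `routing_preferences` block maps named preferences to models. A standalone sketch (outside the Plano codebase) parsing such a block with PyYAML; the `complex_reasoning` and `creative_writing` descriptions are illustrative, since the diff truncates them:

```python
import yaml

# routing_preferences fragment matching the config hunks above; two of the
# three description strings are assumed for illustration.
routing_config = yaml.safe_load("""
llm_providers:
  - model: openai/gpt-5.2
    access_key: $OPENAI_API_KEY
    routing_preferences:
      - name: complex_reasoning
        description: deep multi-step reasoning tasks
      - name: code_review
        description: reviewing and analyzing existing code for bugs and improvements
  - model: anthropic/claude-sonnet-4-5
    access_key: $ANTHROPIC_API_KEY
    routing_preferences:
      - name: creative_writing
        description: drafting and editing prose
""")

# Which model serves each named preference
route_map = {
    pref["name"]: provider["model"]
    for provider in routing_config["llm_providers"]
    for pref in provider.get("routing_preferences", [])
}
print(route_map["code_review"])  # openai/gpt-5.2
```

The preference names act as routing keys: a request classified as `code_review` lands on `openai/gpt-5.2`, while `creative_writing` lands on the Anthropic model.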
@@ -741,15 +732,15 @@ Model Selection Guidelines
 --------------------------

 **For Production Applications:**
-- **High Performance**: OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet
-- **Cost-Effective**: OpenAI GPT-4o mini, Anthropic Claude 3.5 Haiku
+- **High Performance**: OpenAI GPT-5.2, Anthropic Claude Sonnet 4.5
+- **Cost-Effective**: OpenAI GPT-5, Anthropic Claude Haiku 4.5
 - **Code Tasks**: DeepSeek Coder, Together AI Code Llama
 - **Local Deployment**: Ollama with Llama 3.1 or Code Llama

 **For Development/Testing:**
 - **Fast Iteration**: Groq models (optimized inference)
 - **Local Testing**: Ollama models
-- **Cost Control**: Smaller models like GPT-4o mini or Mistral Small
+- **Cost Control**: Smaller models like GPT-4o or Mistral Small

 See Also
 --------