LLM providers are a top-level primitive in Arch, helping developers centrally define, secure, observe,
and manage the usage of their LLMs. Arch builds on Envoy's reliable cluster subsystem
to manage egress traffic to LLMs, including intelligent routing, retry, and failover mechanisms
that ensure high availability and fault tolerance. This abstraction also enables developers to seamlessly
switch between LLM providers or upgrade LLM versions, simplifying the integration and scaling of LLMs
across applications.

Below is an example of how you can configure llm_providers for an instance of the Arch gateway.

Example Configuration

version: v0.1.0

listeners:
  ingress_traffic:
    address: 0.0.0.0
    port: 10000
    message_format: openai
    timeout: 30s

# Centralized way to manage LLMs: keys, retry logic, failover, and limits
llm_providers:
  - access_key: $OPENAI_API_KEY
    model: openai/gpt-4o
    default: true

# Default system prompt used by all prompt targets
system_prompt: You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.

prompt_guards:
  input_guards:

Note

When you start Arch, it creates a listener port for egress traffic based on the presence of the llm_providers
configuration section in the arch_config.yaml file. Arch binds itself to a local address such as
127.0.0.1:12000.

Arch also offers vendor-agnostic SDKs and libraries to make LLM calls to API-based LLM providers (such as OpenAI,
Anthropic, Mistral, and Cohere) and supports calls to OSS LLMs hosted on your own infrastructure. Arch
abstracts the complexities of integrating with different LLM providers, providing a unified interface for making
calls, handling retries, managing rate limits, and ensuring seamless integration with cloud-based and on-premise
LLMs. Simply configure the details of the LLMs your application will use, and Arch offers a unified interface to
make outbound LLM calls.

Adding a Custom LLM Provider

Arch supports any OpenAI-compatible LLM (for example, Mistral, OpenAI, or Ollama), and offers first-class support for OpenAI, Anthropic, DeepSeek, Mistral, Groq, and Ollama based models. You can easily configure an LLM that communicates over the OpenAI API interface by following the guide below.

For example, the following block shows how you could add an Ollama-hosted LLM to the arch_config.yaml file.
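
A minimal sketch; the ollama/ prefix and the host.docker.internal base_url follow the Ollama examples shown later in this guide, so adjust the model name and URL to your deployment:

llm_providers:
  # Ollama serves an OpenAI-compatible API; no access key is needed locally
  - model: ollama/llama3.2
    base_url: http://host.docker.internal:11434

Once configured, you can call the model through Arch using the OpenAI client: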
from openai import OpenAI

# Initialize the Arch client
client = OpenAI(
    api_key="test-key",  # Can be any value for local testing
    base_url="http://127.0.0.1:12000/v1"
)

# Define your model and messages
model = "llama3.2"
messages = [{"role": "user", "content": "What is the capital of France?"}]

# Send the messages to the LLM through Arch
response = client.chat.completions.create(model=model, messages=messages)

# Print the response
print("LLM Response:", response.choices[0].message.content)

Arch provides a unified interface that works seamlessly with multiple client libraries and tools. You can use your preferred client library without changing your existing code; just point it to Arch's gateway endpoints.

Supported Clients

- OpenAI SDK - Full compatibility with OpenAI's official client
- Anthropic SDK - Native support for Anthropic's client library
- cURL - Direct HTTP requests for any programming language
- Custom HTTP Clients - Any HTTP client that supports REST APIs

Gateway Endpoints

Arch exposes two main endpoints:

Endpoint                                      Purpose
--------------------------------------------  ------------------------------------------------
http://127.0.0.1:12000/v1/chat/completions    OpenAI-compatible chat completions (LLM Gateway)
http://127.0.0.1:12000/v1/messages            Anthropic-compatible messages (LLM Gateway)

OpenAI (Python) SDK

The OpenAI SDK works with any provider through Arch's OpenAI-compatible endpoint.

Installation:

pip install openai

Basic Usage:

from openai import OpenAI

# Point to Arch's LLM Gateway
client = OpenAI(
    api_key="test-key",  # Can be any value for local testing
    base_url="http://127.0.0.1:12000/v1"
)

# Use any model configured in your arch_config.yaml
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # Or use a model alias like "fast-model"
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Hello, how are you?"
        }
    ]
)

print(completion.choices[0].message.content)

The OpenAI SDK can be used with any provider configured in Arch:

# Using a Claude model through the OpenAI SDK
completion = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing briefly"
        }
    ]
)

# Using an Ollama model through the OpenAI SDK
completion = client.chat.completions.create(
    model="llama3.1",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "What's the capital of France?"
        }
    ]
)

Anthropic (Python) SDK

The Anthropic SDK works with any provider through Arch's Anthropic-compatible endpoint.

Installation:

pip install anthropic

Basic Usage:

import anthropic

# Point to Arch's LLM Gateway
client = anthropic.Anthropic(
    api_key="test-key",  # Can be any value for local testing
    base_url="http://127.0.0.1:12000"
)

# Use any model configured in your arch_config.yaml
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Hello, please respond briefly!"
        }
    ]
)

print(message.content[0].text)

Streaming Responses:

import anthropic

client = anthropic.Anthropic(
    api_key="test-key",
    base_url="http://127.0.0.1:12000"
)

with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Tell me about artificial intelligence"
        }
    ]
) as stream:
    # Collect text deltas as they arrive
    for text in stream.text_stream:
        print(text, end="")

    # Get the final assembled message
    final_message = stream.get_final_message()
    final_text = "".join(
        block.text for block in final_message.content if block.type == "text"
    )

Using with Non-Anthropic Models:

The Anthropic SDK can be used with any provider configured in Arch:

# Using an OpenAI model through the Anthropic SDK
message = client.messages.create(
    model="gpt-4o-mini",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "Explain machine learning in simple terms"
        }
    ]
)

# Using an Ollama model through the Anthropic SDK
message = client.messages.create(
    model="llama3.1",
    max_tokens=50,
    messages=[
        {
            "role": "user",
            "content": "What is Python programming?"
        }
    ]
)

cURL Examples

For direct HTTP requests or integration from any programming language, call the gateway endpoints directly:
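
A minimal sketch against the two endpoints documented above; depending on your configuration, an Authorization header may also be required:

# OpenAI-compatible chat completions
curl http://127.0.0.1:12000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'

# Anthropic-compatible messages
curl http://127.0.0.1:12000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Hello, please respond briefly!"}]
  }'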

Cross-Client Compatibility

One of Arch's key features is cross-client compatibility. You can:

Use the OpenAI SDK with Claude models:

# OpenAI client calling a Claude model
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")

response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",  # Claude model
    messages=[{"role": "user", "content": "Hello"}]
)

Use the Anthropic SDK with OpenAI models:

# Anthropic client calling an OpenAI model
import anthropic

client = anthropic.Anthropic(base_url="http://127.0.0.1:12000", api_key="test")

response = client.messages.create(
    model="gpt-4o-mini",  # OpenAI model
    max_tokens=50,
    messages=[{"role": "user", "content": "Hello"}]
)

# The same code works with different underlying models
def ask_question(client, question):
    return client.chat.completions.create(
        model="reasoning-model",  # Alias could point to any provider
        messages=[{"role": "user", "content": question}]
    )

# Works regardless of what "reasoning-model" actually points to
openai_client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")
response = ask_question(openai_client, "Solve this math problem...")

Arch lets you connect to 11+ different AI providers through a unified interface with advanced routing and management capabilities. Whether you're using OpenAI, Anthropic, Azure OpenAI, local Ollama models, or any OpenAI-compatible provider, Arch provides seamless integration with enterprise-grade features.

Unified Client Interface

Use your preferred client library without changing existing code (see Client Libraries for details):

- OpenAI Python SDK: Full compatibility with all providers
- Anthropic Python SDK: Native support with cross-provider capabilities
- cURL & HTTP Clients: Direct REST API access for any programming language
- Custom Integrations: Standard HTTP interfaces for seamless integration

Key Benefits

- Provider Flexibility: Switch between providers without changing client code
- Three Routing Methods: Choose model-based, alias-based, or preference-aligned routing (using Arch-Router-1.5B)
- Cost Optimization: Route requests to cost-effective models based on complexity
- Performance Optimization: Use fast models for simple tasks and powerful models for complex reasoning
- Environment Management: Configure different models for different environments
- Future-Proof: Easy to add new providers and upgrade models

Common Use Cases

Development Teams
- Use aliases like dev.chat.v1 and prod.chat.v1 for environment-specific models
- Route simple queries to fast/cheap models and complex tasks to powerful models
- Test new models safely using canary deployments (coming soon)

Production Applications
- Implement fallback strategies across multiple providers for reliability
- Use intelligent routing to optimize cost and performance automatically
- Monitor usage patterns and model performance across providers

Enterprise Deployments
- Connect to both cloud providers and on-premises models (Ollama, custom deployments)
- Apply consistent security and governance policies across all providers
- Scale across regions using different provider endpoints

Model Aliases

Model aliases provide semantic, version-controlled names for your models, enabling cleaner client code, easier model management, and advanced routing capabilities. Instead of using provider-specific model names like gpt-4o-mini or claude-3-5-sonnet-20241022, you can create meaningful aliases like fast-model or arch.summarize.v1.

Benefits of Model Aliases:

- Semantic Naming: Use descriptive names that reflect the model's purpose
- Version Control: Implement versioning schemes (e.g., v1, v2) for model upgrades
- Environment Management: Different aliases can point to different models across environments
- Client Simplification: Clients use consistent, meaningful names regardless of underlying provider
- Advanced Routing (Coming Soon): Enable guardrails, fallbacks, and traffic splitting at the alias level

Basic Configuration

Simple Alias Mapping

Basic Model Aliases:

llm_providers:
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY

  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY

  - model: anthropic/claude-3-5-sonnet-20241022
    access_key: $ANTHROPIC_API_KEY

  - model: ollama/llama3.1
    base_url: http://host.docker.internal:11434

# Define aliases that map to the models above
model_aliases:
  # Semantic versioning approach
  arch.summarize.v1:
    target: gpt-4o-mini

  arch.reasoning.v1:
    target: gpt-4o

  arch.creative.v1:
    target: claude-3-5-sonnet-20241022

  # Functional aliases
  fast-model:
    target: gpt-4o-mini

  smart-model:
    target: gpt-4o

  creative-model:
    target: claude-3-5-sonnet-20241022

  # Local model alias
  local-chat:
    target: llama3.1

Using Aliases

Client Code Examples

Once aliases are configured, clients can use semantic names instead of provider-specific model names.

Python Client Usage:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test-key")

# Use a semantic alias instead of a provider model name
response = client.chat.completions.create(
    model="arch.summarize.v1",  # Points to gpt-4o-mini
    messages=[{"role": "user", "content": "Summarize this document..."}]
)

# Switch to a different capability
response = client.chat.completions.create(
    model="arch.reasoning.v1",  # Points to gpt-4o
    messages=[{"role": "user", "content": "Solve this complex problem..."}]
)

Versioned Aliases

Use version numbers for backward compatibility and gradual model upgrades:

model_aliases:
  # Current production version
  arch.summarize.v1:
    target: gpt-4o-mini

  # Beta version for testing
  arch.summarize.v2:
    target: gpt-4o

  # Stable alias that always points to the latest
  arch.summarize.latest:
    target: gpt-4o-mini

Purpose-Based Naming

Create aliases that reflect the intended use case, as in the sketch below:
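
The functional aliases from the basic configuration above follow this pattern:

model_aliases:
  # Names describe the capability rather than the provider
  fast-model:
    target: gpt-4o-mini

  smart-model:
    target: gpt-4o

  creative-model:
    target: claude-3-5-sonnet-20241022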

Environment-Specific Aliases

Different environments can use different underlying models:

model_aliases:
  # Development environment - use faster/cheaper models
  dev.chat.v1:
    target: gpt-4o-mini

  # Production environment - use more capable models
  prod.chat.v1:
    target: gpt-4o

  # Staging environment - test new models
  staging.chat.v1:
    target: claude-3-5-sonnet-20241022

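Client code can then select the alias for its environment at runtime. A minimal sketch, where APP_ENV is an application-level convention rather than an Arch feature:

import os
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test-key")

# APP_ENV is assumed to be "dev", "staging", or "prod"
env = os.environ.get("APP_ENV", "dev")

response = client.chat.completions.create(
    model=f"{env}.chat.v1",  # Resolves to the alias configured for this environment
    messages=[{"role": "user", "content": "Hello"}]
)
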
Advanced Features (Coming Soon)

The following features are planned for future releases of model aliases:

Guardrails Integration

Apply safety, cost, or latency rules at the alias level.

Supported Providers

Arch provides first-class support for multiple LLM providers through native integrations and OpenAI-compatible interfaces. This guide covers the supported providers, their available chat models, and detailed configuration instructions.

Note

Model Support: Arch supports all chat models from each provider, not just the examples shown in this guide. The configurations below demonstrate common models for reference, but you can use any chat model available from your chosen provider.

Configuration Structure

All providers are configured in the llm_providers section of your arch_config.yaml file. Each provider entry supports the following fields (a minimal example follows the list):

- model: Provider prefix and model name (format: provider/model-name)
- access_key: API key for authentication (supports environment variables)
- default: Mark a model as the default (optional, boolean)
- name: Custom name for the provider instance (optional)
- base_url: Custom endpoint URL (required for some providers)

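Putting the fields together, a minimal sketch (the name value here is illustrative):

llm_providers:
  - name: openai-primary        # Optional custom name for this instance
    model: openai/gpt-4o-mini   # Format: provider/model-name
    access_key: $OPENAI_API_KEY # Environment variables are supported
    default: true               # Optional; marks the fallback model
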
Provider Categories

First-Class Providers
Native integrations with built-in support for provider-specific features and authentication.

OpenAI-Compatible Providers
Any provider that implements the OpenAI API interface can be configured using custom endpoints, as sketched below.

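For example, Ollama (documented later in this guide) is one such provider; the same pattern of pointing base_url at a compatible endpoint generalizes to other OpenAI-compatible services:

llm_providers:
  # Any endpoint that speaks the OpenAI chat completions API
  - model: ollama/llama3.1
    base_url: http://host.docker.internal:11434
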
Supported API Endpoints

Arch supports the following standardized endpoints across providers:

Endpoint              Purpose                        Supported Clients
--------------------  -----------------------------  -----------------------------------
/v1/chat/completions  OpenAI-style chat completions  OpenAI SDK, cURL, custom clients
/v1/messages          Anthropic-style messages       Anthropic SDK, cURL, custom clients

First-Class Providers

OpenAI

Provider Prefix: openai/

API Endpoint: /v1/chat/completions

Authentication: API Key - Get your OpenAI API key from the OpenAI Platform.

Supported Chat Models: All OpenAI chat models including GPT-5, GPT-4o, GPT-4, GPT-3.5-turbo, and all future releases.

Model Name     Model ID for Config   Description
-------------  --------------------  -------------------------------------------------------------
GPT-5          openai/gpt-5          Next-generation model (use any model name from OpenAI's API)
GPT-4o         openai/gpt-4o         Latest multimodal model
GPT-4o mini    openai/gpt-4o-mini    Fast, cost-effective model
GPT-4          openai/gpt-4          High-capability reasoning model
GPT-3.5 Turbo  openai/gpt-3.5-turbo  Balanced performance and cost
o3-mini        openai/o3-mini        Reasoning-focused model (preview)
o3             openai/o3             Advanced reasoning model (preview)

Configuration Examples:

llm_providers:
  # Latest models (examples - use any OpenAI chat model)
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    default: true

  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY

  # Use any model name from OpenAI's API
  - model: openai/gpt-5
    access_key: $OPENAI_API_KEY

Anthropic

Provider Prefix: anthropic/

API Endpoint: /v1/messages

Authentication: API Key - Get your Anthropic API key from the Anthropic Console.

Supported Chat Models: All Anthropic Claude models including Claude Sonnet 4, Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus, and all future releases.

Model Name         Model ID for Config                   Description
-----------------  ------------------------------------  ----------------------------------------------------------------
Claude Sonnet 4    anthropic/claude-sonnet-4             Next-generation model (use any model name from Anthropic's API)
Claude 3.5 Sonnet  anthropic/claude-3-5-sonnet-20241022  Latest high-performance model
Claude 3.5 Haiku   anthropic/claude-3-5-haiku-20241022   Fast and efficient model
Claude 3 Opus      anthropic/claude-3-opus-20240229      Most capable model for complex tasks
Claude 3 Sonnet    anthropic/claude-3-sonnet-20240229    Balanced performance model
Claude 3 Haiku     anthropic/claude-3-haiku-20240307     Fastest model

Configuration Examples:

llm_providers:
  # Latest models (examples - use any Anthropic chat model)
  - model: anthropic/claude-3-5-sonnet-20241022
    access_key: $ANTHROPIC_API_KEY

  - model: anthropic/claude-3-5-haiku-20241022
    access_key: $ANTHROPIC_API_KEY

  # Use any model name from Anthropic's API
  - model: anthropic/claude-sonnet-4
    access_key: $ANTHROPIC_API_KEY

DeepSeek

Provider Prefix: deepseek/

API Endpoint: /v1/chat/completions

Authentication: API Key - Get your DeepSeek API key from the DeepSeek Platform.

Supported Chat Models: All DeepSeek chat models including DeepSeek-Chat, DeepSeek-Coder, and all future releases.
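
Configuration example (a minimal sketch following the same pattern as the providers above; deepseek-chat is one of the chat models named above):

llm_providers:
  - model: deepseek/deepseek-chat
    access_key: $DEEPSEEK_API_KEY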

Multiple Provider Instances

Configure multiple instances of the same provider:

llm_providers:
  # Production OpenAI
  - model: openai/gpt-4o
    access_key: $OPENAI_PROD_KEY
    name: openai-prod

  # Development OpenAI (different key/quota)
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_DEV_KEY
    name: openai-dev

Default Model Configuration

Mark one model as the default for fallback scenarios:

llm_providers:
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    default: true  # Used when no specific model is requested

Routing Preferences

Configure routing preferences for dynamic model selection:

llm_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    routing_preferences:
      - name: complex_reasoning
        description: deep analysis, mathematical problem solving, and logical reasoning
      - name: code_review
        description: reviewing and analyzing existing code for bugs and improvements

  - model: anthropic/claude-3-5-sonnet-20241022
    access_key: $ANTHROPIC_API_KEY
    routing_preferences:
      - name: creative_writing
        description: creative content generation, storytelling, and writing assistance

Model Selection Guidelines

For Production Applications:
- High Performance: OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet
- Cost-Effective: OpenAI GPT-4o mini, Anthropic Claude 3.5 Haiku
- Code Tasks: DeepSeek Coder, Together AI Code Llama
- Local Deployment: Ollama with Llama 3.1 or Code Llama

For Development/Testing:
- Fast Iteration: Groq models (optimized inference)
- Local Testing: Ollama models
- Cost Control: Smaller models like GPT-4o mini or Mistral Small

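Putting the production guidance together, a minimal sketch using only fields shown earlier in this guide:

llm_providers:
  # High performance for complex production tasks
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY

  # Cost-effective default for everyday requests
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    default: true

  # Local model for development and testing
  - model: ollama/llama3.1
    base_url: http://host.docker.internal:11434
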
See Also

- Client Libraries - Using different client libraries with providers