diff --git a/build_with_arch/agent.html b/build_with_arch/agent.html index f6b9a271..dbafbf38 100755 --- a/build_with_arch/agent.html +++ b/build_with_arch/agent.html @@ -108,7 +108,12 @@
  • Error Target
  • -
  • LLM Provider
  • +
  • +
  • Prompt Target
  • Guides

    diff --git a/build_with_arch/multi_turn.html b/build_with_arch/multi_turn.html index 88da6bcc..6d507b73 100755 --- a/build_with_arch/multi_turn.html +++ b/build_with_arch/multi_turn.html @@ -108,7 +108,12 @@
  • Error Target
  • -
  • LLM Provider
  • +
  • +
  • Prompt Target
  • Guides

    diff --git a/build_with_arch/rag.html b/build_with_arch/rag.html index 2da95451..ad679a1a 100755 --- a/build_with_arch/rag.html +++ b/build_with_arch/rag.html @@ -108,7 +108,12 @@
  • Error Target
  • -
  • LLM Provider
  • +
  • +
  • Prompt Target
  • Guides

    diff --git a/concepts/llm_provider.html b/concepts/llm_provider.html deleted file mode 100755 index c329f90e..00000000 --- a/concepts/llm_provider.html +++ /dev/null @@ -1,283 +0,0 @@ - - - - - - - - - -LLM Provider | Arch Docs v0.3.12 - - - - - - - - - - - - - - - -
    - -
    -
    -
    - -
    -
    -
    -
    -
    -
    - -
    -
    -

    LLM Provider

    -

LLM provider is a top-level primitive in Arch, helping developers centrally define, secure, observe, -and manage the usage of their LLMs. Arch builds on Envoy’s reliable cluster subsystem -to manage egress traffic to LLMs, which includes intelligent routing, retry and fail-over mechanisms, -ensuring high availability and fault tolerance. This abstraction also enables developers to seamlessly -switch between LLM providers or upgrade LLM versions, simplifying the integration and scaling of LLMs -across applications.

    -

    Below is an example of how you can configure llm_providers with an instance of an Arch gateway.

    -
    -
    Example Configuration
    -
     1version: v0.1.0
    - 2
    - 3listeners:
    - 4  ingress_traffic:
    - 5    address: 0.0.0.0
    - 6    port: 10000
    - 7    message_format: openai
    - 8    timeout: 30s
    - 9
    -10# Centralized way to manage LLMs, manage keys, retry logic, failover and limits in a central way
    -11llm_providers:
    -12  - access_key: $OPENAI_API_KEY
    -13    model: openai/gpt-4o
    -14    default: true
    -15
    -16# default system prompt used by all prompt targets
    -17system_prompt: You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.
    -18
    -19prompt_guards:
    -20  input_guards:
    -
    -
    -
    -
    -

    Note

    -

When you start Arch, it creates a listener port for egress traffic based on the presence of the llm_providers -configuration section in the arch_config.yml file. Arch binds itself to a local address such as -127.0.0.1:12000.

    -
    -

    Arch also offers vendor-agnostic SDKs and libraries to make LLM calls to API-based LLM providers (like OpenAI, -Anthropic, Mistral, Cohere, etc.) and supports calls to OSS LLMs that are hosted on your infrastructure. Arch -abstracts the complexities of integrating with different LLM providers, providing a unified interface for making -calls, handling retries, managing rate limits, and ensuring seamless integration with cloud-based and on-premise -LLMs. Simply configure the details of the LLMs your application will use, and Arch offers a unified interface to -make outbound LLM calls.

    -
    -

    Adding custom LLM Provider

    -

We support any OpenAI-compatible LLM (for example, Mistral, OpenAI, Ollama), and we offer first-class support for OpenAI, Anthropic, DeepSeek, Mistral, Groq, and Ollama-based models. -You can easily configure an LLM that communicates over the OpenAI API interface by following the guide below.

    -

For example, the following code block shows how to add an Ollama-hosted LLM to the arch_config.yaml file.

    -
    llm_providers:
    -  - model: some_custom_llm_provider/llama3.2
    -    provider_interface: openai
    -    base_url: http://host.docker.internal:11434
    -
    -
    -

The following code block shows how to add a Mistral LLM provider to the arch_config.yaml file.

    -
    llm_providers:
    -  - name: mistral/ministral-3b-latest
    -    access_key: $MISTRAL_API_KEY
    -
    -
    -
    -
    -

    Example: Using the OpenAI Python SDK

    -
    from openai import OpenAI
    -
    -# Initialize the Arch client
    -client = OpenAI(base_url="http://127.0.0.1:2000/")
    -
    -# Define your model and messages
    -model = "llama3.2"
    -messages = [{"role": "user", "content": "What is the capital of France?"}]
    -
    -# Send the messages to the LLM through Arch
    -response = client.chat.completions.create(model=model, messages=messages)
    -
    -# Print the response
    -print("LLM Response:", response.choices[0].message.content)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - - - - - - \ No newline at end of file diff --git a/concepts/llm_providers/client_libraries.html b/concepts/llm_providers/client_libraries.html new file mode 100755 index 00000000..5353a50f --- /dev/null +++ b/concepts/llm_providers/client_libraries.html @@ -0,0 +1,597 @@ + + + + + + + + + +Client Libraries | Arch Docs v0.3.12 + + + + + + + + + + + + + + + +
    + +
    +
    +
    + +
    +
    +
    +
    +
    +
    + +
    +
    +

    Client Libraries

    +

Arch provides a unified interface that works seamlessly with multiple client libraries and tools. You can use your preferred client library without changing your existing code: just point it to Arch’s gateway endpoints.

    +
    +

    Supported Clients

    +
      +
    • OpenAI SDK - Full compatibility with OpenAI’s official client

    • +
    • Anthropic SDK - Native support for Anthropic’s client library

    • +
    • cURL - Direct HTTP requests for any programming language

    • +
    • Custom HTTP Clients - Any HTTP client that supports REST APIs

    • +
    +
    +
    +

    Gateway Endpoints

    +

    Arch exposes two main endpoints:

Endpoint                                     Purpose
http://127.0.0.1:12000/v1/chat/completions   OpenAI-compatible chat completions (LLM Gateway)
http://127.0.0.1:12000/v1/messages           Anthropic-compatible messages (LLM Gateway)

    +
    +
    +

    OpenAI (Python) SDK

    +

    The OpenAI SDK works with any provider through Arch’s OpenAI-compatible endpoint.

    +

    Installation:

    +
    pip install openai
    +
    +
    +

    Basic Usage:

    +
    from openai import OpenAI
    +
    +# Point to Arch's LLM Gateway
    +client = OpenAI(
    +    api_key="test-key",  # Can be any value for local testing
    +    base_url="http://127.0.0.1:12000/v1"
    +)
    +
    +# Use any model configured in your arch_config.yaml
    +completion = client.chat.completions.create(
    +    model="gpt-4o-mini",  # Or use :ref:`model aliases <model_aliases>` like "fast-model"
    +    max_tokens=50,
    +    messages=[
    +        {
    +            "role": "user",
    +            "content": "Hello, how are you?"
    +        }
    +    ]
    +)
    +
    +print(completion.choices[0].message.content)
    +
    +
    +

    Streaming Responses:

    +
    from openai import OpenAI
    +
    +client = OpenAI(
    +    api_key="test-key",
    +    base_url="http://127.0.0.1:12000/v1"
    +)
    +
    +stream = client.chat.completions.create(
    +    model="gpt-4o-mini",
    +    max_tokens=50,
    +    messages=[
    +        {
    +            "role": "user",
    +            "content": "Tell me a short story"
    +        }
    +    ],
    +    stream=True
    +)
    +
    +# Collect streaming chunks
    +for chunk in stream:
    +    if chunk.choices[0].delta.content:
    +        print(chunk.choices[0].delta.content, end="")
    +
    +
    +

    Using with Non-OpenAI Models:

    +

    The OpenAI SDK can be used with any provider configured in Arch:

    +
    # Using Claude model through OpenAI SDK
    +completion = client.chat.completions.create(
    +    model="claude-3-5-sonnet-20241022",
    +    max_tokens=50,
    +    messages=[
    +        {
    +            "role": "user",
    +            "content": "Explain quantum computing briefly"
    +        }
    +    ]
    +)
    +
    +# Using Ollama model through OpenAI SDK
    +completion = client.chat.completions.create(
    +    model="llama3.1",
    +    max_tokens=50,
    +    messages=[
    +        {
    +            "role": "user",
    +            "content": "What's the capital of France?"
    +        }
    +    ]
    +)
    +
    +
    +
    +
    +

    Anthropic (Python) SDK

    +

    The Anthropic SDK works with any provider through Arch’s Anthropic-compatible endpoint.

    +

    Installation:

    +
    pip install anthropic
    +
    +
    +

    Basic Usage:

    +
    import anthropic
    +
    +# Point to Arch's LLM Gateway
    +client = anthropic.Anthropic(
    +    api_key="test-key",  # Can be any value for local testing
    +    base_url="http://127.0.0.1:12000"
    +)
    +
    +# Use any model configured in your arch_config.yaml
    +message = client.messages.create(
    +    model="claude-3-5-sonnet-20241022",
    +    max_tokens=50,
    +    messages=[
    +        {
    +            "role": "user",
    +            "content": "Hello, please respond briefly!"
    +        }
    +    ]
    +)
    +
    +print(message.content[0].text)
    +
    +
    +

    Streaming Responses:

    +
    import anthropic
    +
    +client = anthropic.Anthropic(
    +    api_key="test-key",
    +    base_url="http://127.0.0.1:12000"
    +)
    +
    +with client.messages.stream(
    +    model="claude-3-5-sonnet-20241022",
    +    max_tokens=50,
    +    messages=[
    +        {
    +            "role": "user",
    +            "content": "Tell me about artificial intelligence"
    +        }
    +    ]
    +) as stream:
    +    # Collect text deltas
    +    for text in stream.text_stream:
    +        print(text, end="")
    +
    +    # Get final assembled message
    +    final_message = stream.get_final_message()
    +    final_text = "".join(block.text for block in final_message.content if block.type == "text")
    +
    +
    +

    Using with Non-Anthropic Models:

    +

    The Anthropic SDK can be used with any provider configured in Arch:

    +
    # Using OpenAI model through Anthropic SDK
    +message = client.messages.create(
    +    model="gpt-4o-mini",
    +    max_tokens=50,
    +    messages=[
    +        {
    +            "role": "user",
    +            "content": "Explain machine learning in simple terms"
    +        }
    +    ]
    +)
    +
    +# Using Ollama model through Anthropic SDK
    +message = client.messages.create(
    +    model="llama3.1",
    +    max_tokens=50,
    +    messages=[
    +        {
    +            "role": "user",
    +            "content": "What is Python programming?"
    +        }
    +    ]
    +)
    +
    +
    +
    +
    +

    cURL Examples

    +

    For direct HTTP requests or integration with any programming language:

    +

    OpenAI-Compatible Endpoint:

    +
    # Basic request
    +curl -X POST http://127.0.0.1:12000/v1/chat/completions \
    +  -H "Content-Type: application/json" \
    +  -H "Authorization: Bearer test-key" \
    +  -d '{
    +    "model": "gpt-4o-mini",
    +    "messages": [
    +      {"role": "user", "content": "Hello!"}
    +    ],
    +    "max_tokens": 50
    +  }'
    +
+# Using model aliases
    +curl -X POST http://127.0.0.1:12000/v1/chat/completions \
    +  -H "Content-Type: application/json" \
    +  -d '{
    +    "model": "fast-model",
    +    "messages": [
    +      {"role": "user", "content": "Summarize this text..."}
    +    ],
    +    "max_tokens": 100
    +  }'
    +
    +# Streaming request
    +curl -X POST http://127.0.0.1:12000/v1/chat/completions \
    +  -H "Content-Type: application/json" \
    +  -d '{
    +    "model": "gpt-4o-mini",
    +    "messages": [
    +      {"role": "user", "content": "Tell me a story"}
    +    ],
    +    "stream": true,
    +    "max_tokens": 200
    +  }'
    +
    +
    +

    Anthropic-Compatible Endpoint:

    +
    # Basic request
    +curl -X POST http://127.0.0.1:12000/v1/messages \
    +  -H "Content-Type: application/json" \
    +  -H "x-api-key: test-key" \
    +  -H "anthropic-version: 2023-06-01" \
    +  -d '{
    +    "model": "claude-3-5-sonnet-20241022",
    +    "max_tokens": 50,
    +    "messages": [
    +      {"role": "user", "content": "Hello Claude!"}
    +    ]
    +  }'
    +
    +
    +
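Custom HTTP Clients: any HTTP library can hit the same endpoints. As a minimal sketch using Python's requests package, mirroring the first cURL request above:

+import requests
+
+resp = requests.post(
+    "http://127.0.0.1:12000/v1/chat/completions",
+    headers={"Content-Type": "application/json"},
+    json={
+        "model": "gpt-4o-mini",
+        "messages": [{"role": "user", "content": "Hello!"}],
+        "max_tokens": 50,
+    },
+)
+
+# Response shape matches the OpenAI chat completions format
+print(resp.json()["choices"][0]["message"]["content"])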
    +
    +

    Cross-Client Compatibility

    +

    One of Arch’s key features is cross-client compatibility. You can:

    +

    Use OpenAI SDK with Claude Models:

    +
    # OpenAI client calling Claude model
    +from openai import OpenAI
    +
    +client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")
    +
    +response = client.chat.completions.create(
    +    model="claude-3-5-sonnet-20241022",  # Claude model
    +    messages=[{"role": "user", "content": "Hello"}]
    +)
    +
    +
    +

    Use Anthropic SDK with OpenAI Models:

    +
    # Anthropic client calling OpenAI model
    +import anthropic
    +
    +client = anthropic.Anthropic(base_url="http://127.0.0.1:12000", api_key="test")
    +
    +response = client.messages.create(
    +    model="gpt-4o-mini",  # OpenAI model
    +    max_tokens=50,
    +    messages=[{"role": "user", "content": "Hello"}]
    +)
    +
    +
    +

    Mix and Match with Model Aliases:

    +
    # Same code works with different underlying models
    +def ask_question(client, question):
    +    return client.chat.completions.create(
    +        model="reasoning-model",  # Alias could point to any provider
    +        messages=[{"role": "user", "content": question}]
    +    )
    +
    +# Works regardless of what "reasoning-model" actually points to
    +openai_client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")
    +response = ask_question(openai_client, "Solve this math problem...")
    +
    +
    +
    +
    +

    Error Handling

    +

    OpenAI SDK Error Handling:

    +
    from openai import OpenAI
    +import openai
    +
    +client = OpenAI(base_url="http://127.0.0.1:12000/v1", api_key="test")
    +
    +try:
    +    completion = client.chat.completions.create(
    +        model="nonexistent-model",
    +        messages=[{"role": "user", "content": "Hello"}]
    +    )
    +except openai.NotFoundError as e:
    +    print(f"Model not found: {e}")
    +except openai.APIError as e:
    +    print(f"API error: {e}")
    +
    +
    +

    Anthropic SDK Error Handling:

    +
    import anthropic
    +
    +client = anthropic.Anthropic(base_url="http://127.0.0.1:12000", api_key="test")
    +
    +try:
    +    message = client.messages.create(
    +        model="nonexistent-model",
    +        max_tokens=50,
    +        messages=[{"role": "user", "content": "Hello"}]
    +    )
    +except anthropic.NotFoundError as e:
    +    print(f"Model not found: {e}")
    +except anthropic.APIError as e:
    +    print(f"API error: {e}")
    +
    +
    +
    +
    +

    Best Practices

    +

    Use Model Aliases: +Instead of hardcoding provider-specific model names, use semantic aliases:

    +
    # Good - uses semantic alias
    +model = "fast-model"
    +
    +# Less ideal - hardcoded provider model
    +model = "openai/gpt-4o-mini"
    +
    +
    +

    Environment-Based Configuration: +Use different model aliases for different environments:

    +
    import os
    +
    +# Development uses cheaper/faster models
    +model = os.getenv("MODEL_ALIAS", "dev.chat.v1")
    +
    +response = client.chat.completions.create(
    +    model=model,
    +    messages=[{"role": "user", "content": "Hello"}]
    +)
    +
    +
    +

    Graceful Fallbacks: +Implement fallback logic for better reliability:

    +
    def chat_with_fallback(client, messages, primary_model="smart-model", fallback_model="fast-model"):
    +    try:
    +        return client.chat.completions.create(model=primary_model, messages=messages)
    +    except Exception as e:
    +        print(f"Primary model failed, trying fallback: {e}")
    +        return client.chat.completions.create(model=fallback_model, messages=messages)
    +
    +
    +
    +
    +

    See Also

    + +
    +
    +
    +
    +
    +
    +
    + + + + + + + \ No newline at end of file diff --git a/concepts/llm_providers/llm_providers.html b/concepts/llm_providers/llm_providers.html new file mode 100755 index 00000000..621537ae --- /dev/null +++ b/concepts/llm_providers/llm_providers.html @@ -0,0 +1,314 @@ + + + + + + + + + +LLM Providers | Arch Docs v0.3.12 + + + + + + + + + + + + + + + +
    + +
    +
    +
    + +
    +
    +
    +
    +
    +
    + +
    +
    +

    LLM Providers

    +

    LLM Providers are a top-level primitive in Arch, helping developers centrally define, secure, observe, +and manage the usage of their LLMs. Arch builds on Envoy’s reliable cluster subsystem +to manage egress traffic to LLMs, which includes intelligent routing, retry and fail-over mechanisms, +ensuring high availability and fault tolerance. This abstraction also enables developers to seamlessly +switch between LLM providers or upgrade LLM versions, simplifying the integration and scaling of LLMs +across applications.

    +

Arch connects you to 11+ AI providers through a unified interface with advanced routing and management capabilities. +Whether you’re using OpenAI, Anthropic, Azure OpenAI, local Ollama models, or any OpenAI-compatible provider, Arch provides seamless integration with enterprise-grade features.

    +
    +

    Core Capabilities

    +

    Multi-Provider Support +Connect to any combination of providers simultaneously (see Supported Providers & Configuration for full details):

    +
      +
    • First-Class Providers: Native integrations with OpenAI, Anthropic, DeepSeek, Mistral, Groq, Google Gemini, Together AI, xAI, Azure OpenAI, and Ollama

    • +
    • OpenAI-Compatible Providers: Any provider implementing the OpenAI Chat Completions API standard

    • +
    +

    Intelligent Routing +Three powerful routing approaches to optimize model selection:

• Model-based routing - request a provider model directly by name

• Alias-based routing - request a semantic alias defined in model_aliases

• Preference-aligned routing - let Arch select the model from configured routing preferences (powered by Arch-Router-1.5B)
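As a rough client-side sketch of the first two approaches (the model and alias names are the examples used elsewhere in these docs; preference-aligned routing is configured server-side, so it implies no client change):

+from openai import OpenAI
+
+client = OpenAI(api_key="test-key", base_url="http://127.0.0.1:12000/v1")
+
+# Model-based routing: name the provider model explicitly
+response = client.chat.completions.create(
+    model="gpt-4o-mini",
+    messages=[{"role": "user", "content": "Summarize this text..."}],
+)
+
+# Alias-based routing: name a semantic alias from model_aliases
+response = client.chat.completions.create(
+    model="fast-model",
+    messages=[{"role": "user", "content": "Summarize this text..."}],
+)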

    Unified Client Interface +Use your preferred client library without changing existing code (see Client Libraries for details):

    +
      +
    • OpenAI Python SDK: Full compatibility with all providers

    • +
    • Anthropic Python SDK: Native support with cross-provider capabilities

    • +
    • cURL & HTTP Clients: Direct REST API access for any programming language

    • +
    • Custom Integrations: Standard HTTP interfaces for seamless integration

    • +
    +
    +
    +

    Key Benefits

    +
      +
    • Provider Flexibility: Switch between providers without changing client code

    • +
• Three Routing Methods: Choose from model-based, alias-based, or preference-aligned (Arch-Router-1.5B) routing strategies

    • +
    • Cost Optimization: Route requests to cost-effective models based on complexity

    • +
    • Performance Optimization: Use fast models for simple tasks, powerful models for complex reasoning

    • +
    • Environment Management: Configure different models for different environments

    • +
    • Future-Proof: Easy to add new providers and upgrade models

    • +
    +
    +
    +

    Common Use Cases

    +

    Development Teams +- Use aliases like dev.chat.v1 and prod.chat.v1 for environment-specific models +- Route simple queries to fast/cheap models, complex tasks to powerful models +- Test new models safely using canary deployments (coming soon)

    +
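A minimal sketch of the first pattern, assuming dev.chat.v1 and prod.chat.v1 are defined in model_aliases as shown in the Model Aliases guide (APP_ENV is a hypothetical environment variable for this sketch):

+import os
+from openai import OpenAI
+
+client = OpenAI(api_key="test-key", base_url="http://127.0.0.1:12000/v1")
+
+# Pick the alias for the current environment; the request code never changes
+alias = "prod.chat.v1" if os.getenv("APP_ENV") == "production" else "dev.chat.v1"
+
+response = client.chat.completions.create(
+    model=alias,
+    messages=[{"role": "user", "content": "Hello!"}],
+)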

    Production Applications +- Implement fallback strategies across multiple providers for reliability +- Use intelligent routing to optimize cost and performance automatically +- Monitor usage patterns and model performance across providers

    +

    Enterprise Deployments +- Connect to both cloud providers and on-premises models (Ollama, custom deployments) +- Apply consistent security and governance policies across all providers +- Scale across regions using different provider endpoints

    +
    +
    +

    Advanced Features

    + +
    +
    +

    Getting Started

    +

    Dive into specific areas based on your needs:

    + +
    +
    +
    +
    +
    +
    +
    + + + + + + + \ No newline at end of file diff --git a/concepts/llm_providers/model_aliases.html b/concepts/llm_providers/model_aliases.html new file mode 100755 index 00000000..9643b799 --- /dev/null +++ b/concepts/llm_providers/model_aliases.html @@ -0,0 +1,448 @@ + + + + + + + + + +Model Aliases | Arch Docs v0.3.12 + + + + + + + + + + + + + + + +
    + +
    +
    +
    + +
    +
    +
    +
    +
    +
    + +
    +
    +

    Model Aliases

    +

    Model aliases provide semantic, version-controlled names for your models, enabling cleaner client code, easier model management, and advanced routing capabilities. Instead of using provider-specific model names like gpt-4o-mini or claude-3-5-sonnet-20241022, you can create meaningful aliases like fast-model or arch.summarize.v1.

    +

    Benefits of Model Aliases:

    +
      +
    • Semantic Naming: Use descriptive names that reflect the model’s purpose

    • +
    • Version Control: Implement versioning schemes (e.g., v1, v2) for model upgrades

    • +
    • Environment Management: Different aliases can point to different models across environments

    • +
    • Client Simplification: Clients use consistent, meaningful names regardless of underlying provider

    • +
    • Advanced Routing (Coming Soon): Enable guardrails, fallbacks, and traffic splitting at the alias level

    • +
    +
    +

    Basic Configuration

    +

    Simple Alias Mapping

    +
    +
    Basic Model Aliases
    +
    llm_providers:
    +  - model: openai/gpt-4o-mini
    +    access_key: $OPENAI_API_KEY
    +
    +  - model: openai/gpt-4o
    +    access_key: $OPENAI_API_KEY
    +
    +  - model: anthropic/claude-3-5-sonnet-20241022
    +    access_key: $ANTHROPIC_API_KEY
    +
    +  - model: ollama/llama3.1
    +    base_url: http://host.docker.internal:11434
    +
    +# Define aliases that map to the models above
    +model_aliases:
    +  # Semantic versioning approach
    +  arch.summarize.v1:
    +    target: gpt-4o-mini
    +
    +  arch.reasoning.v1:
    +    target: gpt-4o
    +
    +  arch.creative.v1:
    +    target: claude-3-5-sonnet-20241022
    +
    +  # Functional aliases
    +  fast-model:
    +    target: gpt-4o-mini
    +
    +  smart-model:
    +    target: gpt-4o
    +
    +  creative-model:
    +    target: claude-3-5-sonnet-20241022
    +
    +  # Local model alias
    +  local-chat:
    +    target: llama3.1
    +
    +
    +
    +
    +
    +

    Using Aliases

    +

    Client Code Examples

    +

    Once aliases are configured, clients can use semantic names instead of provider-specific model names:

    +
    +
    Python Client Usage
    +
    from openai import OpenAI
    +
+client = OpenAI(api_key="test-key", base_url="http://127.0.0.1:12000/v1")
    +
    +# Use semantic alias instead of provider model name
    +response = client.chat.completions.create(
    +    model="arch.summarize.v1",  # Points to gpt-4o-mini
    +    messages=[{"role": "user", "content": "Summarize this document..."}]
    +)
    +
    +# Switch to a different capability
    +response = client.chat.completions.create(
    +    model="arch.reasoning.v1",  # Points to gpt-4o
    +    messages=[{"role": "user", "content": "Solve this complex problem..."}]
    +)
    +
    +
    +
    +
    +
    cURL Example
    +
    curl -X POST http://127.0.0.1:12000/v1/chat/completions \
    +  -H "Content-Type: application/json" \
    +  -d '{
    +    "model": "fast-model",
    +    "messages": [{"role": "user", "content": "Hello!"}]
    +  }'
    +
    +
    +
    +
    +
    +

    Naming Best Practices

    +

    Semantic Versioning

    +

    Use version numbers for backward compatibility and gradual model upgrades:

    +
    model_aliases:
    +  # Current production version
    +  arch.summarize.v1:
    +    target: gpt-4o-mini
    +
    +  # Beta version for testing
    +  arch.summarize.v2:
    +    target: gpt-4o
    +
    +  # Stable alias that always points to latest
    +  arch.summarize.latest:
    +    target: gpt-4o-mini
    +
    +
    +

    Purpose-Based Naming

    +

    Create aliases that reflect the intended use case:

    +
    model_aliases:
    +  # Task-specific
    +  code-reviewer:
    +    target: gpt-4o
    +
    +  document-summarizer:
    +    target: gpt-4o-mini
    +
    +  creative-writer:
    +    target: claude-3-5-sonnet-20241022
    +
    +  data-analyst:
    +    target: gpt-4o
    +
    +
    +

    Environment-Specific Aliases

    +

    Different environments can use different underlying models:

    +
    model_aliases:
    +  # Development environment - use faster/cheaper models
    +  dev.chat.v1:
    +    target: gpt-4o-mini
    +
    +  # Production environment - use more capable models
    +  prod.chat.v1:
    +    target: gpt-4o
    +
    +  # Staging environment - test new models
    +  staging.chat.v1:
    +    target: claude-3-5-sonnet-20241022
    +
    +
    +
    +
    +

    Advanced Features (Coming Soon)

    +

    The following features are planned for future releases of model aliases:

    +

    Guardrails Integration

    +

    Apply safety, cost, or latency rules at the alias level:

    +
    +
    Future Feature - Guardrails
    +
    model_aliases:
    +  arch.reasoning.v1:
    +    target: gpt-oss-120b
    +    guardrails:
    +      max_latency: 5s
    +      max_cost_per_request: 0.10
    +      block_categories: ["jailbreak", "PII"]
    +      content_filters:
    +        - type: "profanity"
    +        - type: "sensitive_data"
    +
    +
    +
    +

    Fallback Chains

    +

    Provide a chain of models if the primary target fails or hits quota limits:

    +
    +
    Future Feature - Fallbacks
    +
    model_aliases:
    +  arch.summarize.v1:
    +    target: gpt-4o-mini
    +    fallbacks:
    +      - target: llama3.1
    +        conditions: ["quota_exceeded", "timeout"]
    +      - target: claude-3-haiku-20240307
    +        conditions: ["primary_and_first_fallback_failed"]
    +
    +
    +
    +

    Traffic Splitting & Canary Deployments

    +

    Distribute traffic across multiple models for A/B testing or gradual rollouts:

    +
    +
    Future Feature - Traffic Splitting
    +
    model_aliases:
    +  arch.v1:
    +    targets:
    +      - model: llama3.1
    +        weight: 80
    +      - model: gpt-4o-mini
    +        weight: 20
    +
    +  # Canary deployment
    +  arch.experimental.v1:
    +    targets:
    +      - model: gpt-4o      # Current stable
    +        weight: 95
    +      - model: o1-preview  # New model being tested
    +        weight: 5
    +
    +
    +
    +

    Load Balancing

    +

    Distribute requests across multiple instances of the same model:

    +
    +
    Future Feature - Load Balancing
    +
    model_aliases:
    +  high-throughput-chat:
    +    load_balance:
    +      algorithm: "round_robin"  # or "least_connections", "weighted"
    +    targets:
    +      - model: gpt-4o-mini
    +        endpoint: "https://api-1.example.com"
    +      - model: gpt-4o-mini
    +        endpoint: "https://api-2.example.com"
    +      - model: gpt-4o-mini
    +        endpoint: "https://api-3.example.com"
    +
    +
    +
    +
    +
    +

    Validation Rules

    +
      +
    • Alias names must be valid identifiers (alphanumeric, dots, hyphens, underscores)

    • +
    • Target models must be defined in the llm_providers section

    • +
    • Circular references between aliases are not allowed

    • +
    • Weights in traffic splitting must sum to 100

    • +
    +
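These rules are straightforward to check mechanically. The sketch below is illustrative only, not Arch's actual validator; it assumes the simple alias-to-target form, and that aliases may point at other aliases (which is what makes the circular-reference rule necessary). Weighted traffic splitting is omitted since it is a coming-soon feature.

+import re
+
+ALIAS_RE = re.compile(r"^[A-Za-z0-9._-]+$")  # alphanumeric, dots, hyphens, underscores
+
+def validate_aliases(aliases: dict, provider_models: set) -> list:
+    """Check an {alias: target} mapping against the rules above."""
+    errors = []
+    for name, target in aliases.items():
+        if not ALIAS_RE.match(name):
+            errors.append(f"{name}: not a valid identifier")
+        if target not in provider_models and target not in aliases:
+            errors.append(f"{name}: target {target} is not defined in llm_providers")
+        # Follow alias-to-alias chains to catch circular references
+        seen, cur = {name}, target
+        while cur in aliases:
+            if cur in seen:
+                errors.append(f"{name}: circular reference involving {cur}")
+                break
+            seen.add(cur)
+            cur = aliases[cur]
+    return errors
+
+print(validate_aliases({"fast-model": "gpt-4o-mini", "loop": "loop"}, {"gpt-4o-mini"}))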
    +
    +

    See Also

    +
      +
    • LLM Providers - Learn about configuring LLM providers

    • +
    • LLM Routing - Understand how aliases work with intelligent routing

    • +
    +
    +
    +
    +
    +
    +
    +
    + + + + + + + \ No newline at end of file diff --git a/concepts/llm_providers/supported_providers.html b/concepts/llm_providers/supported_providers.html new file mode 100755 index 00000000..67d946c5 --- /dev/null +++ b/concepts/llm_providers/supported_providers.html @@ -0,0 +1,801 @@ + + + + + + + + + +Supported Providers & Configuration | Arch Docs v0.3.12 + + + + + + + + + + + + + + + +
    + +
    +
    +
    + +
    +
    +
    +
    +
    +
    + +
    +
    +

    Supported Providers & Configuration

    +

    Arch provides first-class support for multiple LLM providers through native integrations and OpenAI-compatible interfaces. This comprehensive guide covers all supported providers, their available chat models, and detailed configuration instructions.

    +
    +

    Note

    +

    Model Support: Arch supports all chat models from each provider, not just the examples shown in this guide. The configurations below demonstrate common models for reference, but you can use any chat model available from your chosen provider.

    +
    +
    +

    Configuration Structure

    +

    All providers are configured in the llm_providers section of your arch_config.yaml file:

    +
    version: v0.1
    +
    +listeners:
    +  egress_traffic:
    +    address: 0.0.0.0
    +    port: 12000
    +    message_format: openai
    +    timeout: 30s
    +
    +llm_providers:
    +  # Provider configurations go here
    +  - model: provider/model-name
    +    access_key: $API_KEY
    +    # Additional provider-specific options
    +
    +
    +

    Common Configuration Fields:

    +
      +
    • model: Provider prefix and model name (format: provider/model-name)

    • +
    • access_key: API key for authentication (supports environment variables)

    • +
    • default: Mark a model as the default (optional, boolean)

    • +
    • name: Custom name for the provider instance (optional)

    • +
    • base_url: Custom endpoint URL (required for some providers)

    • +
    +
    +
    +

    Provider Categories

    +

    First-Class Providers +Native integrations with built-in support for provider-specific features and authentication.

    +

    OpenAI-Compatible Providers +Any provider that implements the OpenAI API interface can be configured using custom endpoints.

    +
    +
    +

    Supported API Endpoints

    +

    Arch supports the following standardized endpoints across providers:

Endpoint               Purpose                         Supported Clients
/v1/chat/completions   OpenAI-style chat completions   OpenAI SDK, cURL, custom clients
/v1/messages           Anthropic-style messages        Anthropic SDK, cURL, custom clients

    +
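As a quick illustration of how the two endpoints pair with the two SDKs (client setup mirrors the Client Libraries guide):

+from openai import OpenAI
+import anthropic
+
+# /v1/chat/completions serves OpenAI-style clients
+openai_client = OpenAI(api_key="test-key", base_url="http://127.0.0.1:12000/v1")
+
+# /v1/messages serves Anthropic-style clients
+anthropic_client = anthropic.Anthropic(api_key="test-key", base_url="http://127.0.0.1:12000")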
    +
    +

    First-Class Providers

    +
    +

    OpenAI

    +

    Provider Prefix: openai/

    +

    API Endpoint: /v1/chat/completions

    +

    Authentication: API Key - Get your OpenAI API key from OpenAI Platform.

    +

    Supported Chat Models: All OpenAI chat models including GPT-5, GPT-4o, GPT-4, GPT-3.5-turbo, and all future releases.

Model Name      Model ID for Config    Description
GPT-5           openai/gpt-5           Next-generation model (use any model name from OpenAI’s API)
GPT-4o          openai/gpt-4o          Latest multimodal model
GPT-4o mini     openai/gpt-4o-mini     Fast, cost-effective model
GPT-4           openai/gpt-4           High-capability reasoning model
GPT-3.5 Turbo   openai/gpt-3.5-turbo   Balanced performance and cost
o3-mini         openai/o3-mini         Reasoning-focused model (preview)
o3              openai/o3              Advanced reasoning model (preview)

    +

    Configuration Examples:

    +
    llm_providers:
    +  # Latest models (examples - use any OpenAI chat model)
    +  - model: openai/gpt-4o-mini
    +    access_key: $OPENAI_API_KEY
    +    default: true
    +
    +  - model: openai/gpt-4o
    +    access_key: $OPENAI_API_KEY
    +
    +  # Use any model name from OpenAI's API
    +  - model: openai/gpt-5
    +    access_key: $OPENAI_API_KEY
    +
    +
    +
    +
    +

    Anthropic

    +

    Provider Prefix: anthropic/

    +

    API Endpoint: /v1/messages

    +

    Authentication: API Key - Get your Anthropic API key from Anthropic Console.

    +

    Supported Chat Models: All Anthropic Claude models including Claude Sonnet 4, Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus, and all future releases.

Model Name          Model ID for Config                    Description
Claude Sonnet 4     anthropic/claude-sonnet-4              Next-generation model (use any model name from Anthropic’s API)
Claude 3.5 Sonnet   anthropic/claude-3-5-sonnet-20241022   Latest high-performance model
Claude 3.5 Haiku    anthropic/claude-3-5-haiku-20241022    Fast and efficient model
Claude 3 Opus       anthropic/claude-3-opus-20240229       Most capable model for complex tasks
Claude 3 Sonnet     anthropic/claude-3-sonnet-20240229     Balanced performance model
Claude 3 Haiku      anthropic/claude-3-haiku-20240307      Fastest model

    +

    Configuration Examples:

    +
    llm_providers:
    +  # Latest models (examples - use any Anthropic chat model)
    +  - model: anthropic/claude-3-5-sonnet-20241022
    +    access_key: $ANTHROPIC_API_KEY
    +
    +  - model: anthropic/claude-3-5-haiku-20241022
    +    access_key: $ANTHROPIC_API_KEY
    +
    +  # Use any model name from Anthropic's API
    +  - model: anthropic/claude-sonnet-4
    +    access_key: $ANTHROPIC_API_KEY
    +
    +
    +
    +
    +

    DeepSeek

    +

    Provider Prefix: deepseek/

    +

    API Endpoint: /v1/chat/completions

    +

    Authentication: API Key - Get your DeepSeek API key from DeepSeek Platform.

    +

    Supported Chat Models: All DeepSeek chat models including DeepSeek-Chat, DeepSeek-Coder, and all future releases.

Model Name       Model ID for Config       Description
DeepSeek Chat    deepseek/deepseek-chat    General purpose chat model
DeepSeek Coder   deepseek/deepseek-coder   Code-specialized model

    +

    Configuration Examples:

    +
    llm_providers:
    +  - model: deepseek/deepseek-chat
    +    access_key: $DEEPSEEK_API_KEY
    +
    +  - model: deepseek/deepseek-coder
    +    access_key: $DEEPSEEK_API_KEY
    +
    +
    +
    +
    +

    Mistral AI

    +

    Provider Prefix: mistral/

    +

    API Endpoint: /v1/chat/completions

    +

    Authentication: API Key - Get your Mistral API key from Mistral AI Console.

    +

    Supported Chat Models: All Mistral chat models including Mistral Large, Mistral Small, Ministral, and all future releases.

Model Name       Model ID for Config             Description
Mistral Large    mistral/mistral-large-latest    Most capable model
Mistral Medium   mistral/mistral-medium-latest   Balanced performance
Mistral Small    mistral/mistral-small-latest    Fast and efficient
Ministral 3B     mistral/ministral-3b-latest     Compact model

    +

Configuration Examples:

    +
    llm_providers:
    +  - model: mistral/mistral-large-latest
    +    access_key: $MISTRAL_API_KEY
    +
    +  - model: mistral/mistral-small-latest
    +    access_key: $MISTRAL_API_KEY
    +
    +
    +
    +
    +

    Groq

    +

    Provider Prefix: groq/

    +

    API Endpoint: /openai/v1/chat/completions (transformed internally)

    +

    Authentication: API Key - Get your Groq API key from Groq Console.

    +

    Supported Chat Models: All Groq chat models including Llama 3, Mixtral, Gemma, and all future releases.

Model Name     Model ID for Config       Description
Llama 3 8B     groq/llama3-8b-8192       Fast inference Llama model
Llama 3 70B    groq/llama3-70b-8192      Larger Llama model
Mixtral 8x7B   groq/mixtral-8x7b-32768   Mixture of experts model

    +

    Configuration Examples:

    +
    llm_providers:
    +  - model: groq/llama3-8b-8192
    +    access_key: $GROQ_API_KEY
    +
    +  - model: groq/mixtral-8x7b-32768
    +    access_key: $GROQ_API_KEY
    +
    +
    +
    +
    +

    Google Gemini

    +

    Provider Prefix: gemini/

    +

    API Endpoint: /v1beta/openai/chat/completions (transformed internally)

    +

    Authentication: API Key - Get your Google AI API key from Google AI Studio.

    +

    Supported Chat Models: All Google Gemini chat models including Gemini 1.5 Pro, Gemini 1.5 Flash, and all future releases.

Model Name         Model ID for Config       Description
Gemini 1.5 Pro     gemini/gemini-1.5-pro     Advanced reasoning and creativity
Gemini 1.5 Flash   gemini/gemini-1.5-flash   Fast and efficient model

    +

    Configuration Examples:

    +
    llm_providers:
    +  - model: gemini/gemini-1.5-pro
    +    access_key: $GOOGLE_API_KEY
    +
    +  - model: gemini/gemini-1.5-flash
    +    access_key: $GOOGLE_API_KEY
    +
    +
    +
    +
    +

    Together AI

    +

    Provider Prefix: together_ai/

    +

    API Endpoint: /v1/chat/completions

    +

    Authentication: API Key - Get your Together AI API key from Together AI Settings.

    +

    Supported Chat Models: All Together AI chat models including Llama, CodeLlama, Mixtral, Qwen, and hundreds of other open-source models.

Model Name         Model ID for Config                               Description
Meta Llama 2 7B    together_ai/meta-llama/Llama-2-7b-chat-hf         Open source chat model
Meta Llama 2 13B   together_ai/meta-llama/Llama-2-13b-chat-hf        Larger open source model
Code Llama 34B     together_ai/codellama/CodeLlama-34b-Instruct-hf   Code-specialized model

    +

    Configuration Examples:

    +
    llm_providers:
    +  - model: together_ai/meta-llama/Llama-2-7b-chat-hf
    +    access_key: $TOGETHER_API_KEY
    +
    +  - model: together_ai/codellama/CodeLlama-34b-Instruct-hf
    +    access_key: $TOGETHER_API_KEY
    +
    +
    +
    +
    +

    xAI

    +

    Provider Prefix: xai/

    +

    API Endpoint: /v1/chat/completions

    +

    Authentication: API Key - Get your xAI API key from xAI Console.

    +

    Supported Chat Models: All xAI chat models including Grok Beta and all future releases.

Model Name   Model ID for Config   Description
Grok Beta    xai/grok-beta         Conversational AI model

    +

    Configuration Examples:

    +
    llm_providers:
    +  - model: xai/grok-beta
    +    access_key: $XAI_API_KEY
    +
    +
    +
    +
    +
    +

    Providers Requiring Base URL

    +
    +

    Azure OpenAI

    +

    Provider Prefix: azure_openai/

    +

    API Endpoint: /openai/deployments/{deployment-name}/chat/completions (constructed automatically)

    +

    Authentication: API Key + Base URL - Get your Azure OpenAI API key from Azure Portal → Your OpenAI Resource → Keys and Endpoint.

    +

    Supported Chat Models: All Azure OpenAI chat models including GPT-4o, GPT-4, GPT-3.5-turbo deployed in your Azure subscription.

    +
    llm_providers:
    +  # Single deployment
    +  - model: azure_openai/gpt-4o
    +    access_key: $AZURE_OPENAI_API_KEY
    +    base_url: https://your-resource.openai.azure.com
    +
    +  # Multiple deployments
    +  - model: azure_openai/gpt-4o-mini
    +    access_key: $AZURE_OPENAI_API_KEY
    +    base_url: https://your-resource.openai.azure.com
    +
    +
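For reference, Arch constructs the deployment endpoint automatically from the configuration above. Assuming the Azure deployment is named after the model (an assumption for this illustration; Azure deployment names are user-chosen), a request for gpt-4o would be forwarded to:

+https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions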
    +
    +
    +

    Ollama

    +

    Provider Prefix: ollama/

    +

    API Endpoint: /v1/chat/completions (Ollama’s OpenAI-compatible endpoint)

    +

    Authentication: None (Base URL only) - Install Ollama from Ollama.com and pull your desired models.

    +

    Supported Chat Models: All chat models available in your local Ollama installation. Use ollama list to see installed models.

    +
    llm_providers:
    +  # Local Ollama installation
    +  - model: ollama/llama3.1
    +    base_url: http://localhost:11434
    +
    +  # Ollama in Docker (from host)
    +  - model: ollama/codellama
    +    base_url: http://host.docker.internal:11434
    +
    +
    +
    +
    +

    OpenAI-Compatible Providers

    +

    Supported Models: Any chat models from providers that implement the OpenAI Chat Completions API standard.

    +

    For providers that implement the OpenAI API but aren’t natively supported:

    +
    llm_providers:
    +  # Generic OpenAI-compatible provider
    +  - model: custom-provider/custom-model
    +    base_url: https://api.customprovider.com
    +    provider_interface: openai
    +    access_key: $CUSTOM_API_KEY
    +
    +  # Local deployment
    +  - model: local/llama2-7b
    +    base_url: http://localhost:8000
    +    provider_interface: openai
    +
    +
    +
    +
    +
    +

    Advanced Configuration

    +
    +

    Multiple Provider Instances

    +

    Configure multiple instances of the same provider:

    +
    llm_providers:
    +  # Production OpenAI
    +  - model: openai/gpt-4o
    +    access_key: $OPENAI_PROD_KEY
    +    name: openai-prod
    +
    +  # Development OpenAI (different key/quota)
    +  - model: openai/gpt-4o-mini
    +    access_key: $OPENAI_DEV_KEY
    +    name: openai-dev
    +
    +
    +
    +
    +

    Default Model Configuration

    +

    Mark one model as the default for fallback scenarios:

    +
    llm_providers:
    +  - model: openai/gpt-4o-mini
    +    access_key: $OPENAI_API_KEY
    +    default: true  # Used when no specific model is requested
    +
    +
    +
    +
    +

    Routing Preferences

    +

    Configure routing preferences for dynamic model selection:

    +
    llm_providers:
    +  - model: openai/gpt-4o
    +    access_key: $OPENAI_API_KEY
    +    routing_preferences:
    +      - name: complex_reasoning
    +        description: deep analysis, mathematical problem solving, and logical reasoning
    +      - name: code_review
    +        description: reviewing and analyzing existing code for bugs and improvements
    +
    +  - model: anthropic/claude-3-5-sonnet-20241022
    +    access_key: $ANTHROPIC_API_KEY
    +    routing_preferences:
    +      - name: creative_writing
    +        description: creative content generation, storytelling, and writing assistance
    +
    +
    +
    +
    +
    +

    Model Selection Guidelines

    +

    For Production Applications: +- High Performance: OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet +- Cost-Effective: OpenAI GPT-4o mini, Anthropic Claude 3.5 Haiku +- Code Tasks: DeepSeek Coder, Together AI Code Llama +- Local Deployment: Ollama with Llama 3.1 or Code Llama

    +

    For Development/Testing: +- Fast Iteration: Groq models (optimized inference) +- Local Testing: Ollama models +- Cost Control: Smaller models like GPT-4o mini or Mistral Small

    +
    +
    +

    See Also

    + +
    +
    +
    +
    +
    +
    +
    + + + + + + + \ No newline at end of file diff --git a/concepts/prompt_target.html b/concepts/prompt_target.html index ce6f6968..2b92e937 100755 --- a/concepts/prompt_target.html +++ b/concepts/prompt_target.html @@ -19,7 +19,7 @@ - +