.. _llm_provider:

LLM Provider
============
**LLM provider** is a top-level primitive in Arch, helping developers centrally define, secure, observe,
and manage the usage of their LLMs. Arch builds on Envoy's reliable `cluster subsystem <https://www.envoyproxy.io/docs/envoy/v1.31.2/intro/arch_overview/upstream/cluster_manager>`_
to manage egress traffic to LLMs, including intelligent routing, retry, and fail-over mechanisms that
ensure high availability and fault tolerance. This abstraction also enables developers to seamlessly
switch between LLM providers or upgrade LLM versions, simplifying the integration and scaling of LLMs
across applications.
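
Although Arch performs retries and fail-over inside Envoy's cluster subsystem, the idea can be sketched
in plain Python. The helper below is purely illustrative (it is not part of Arch or Envoy): it tries each
provider in order, retrying a few times before failing over to the next.

.. code-block:: python

    def call_with_failover(providers, make_call, retries_per_provider=2):
        """Try each provider in order, retrying before failing over."""
        last_error = None
        for provider in providers:
            for _ in range(retries_per_provider):
                try:
                    return make_call(provider)
                except ConnectionError as exc:
                    last_error = exc  # remember the failure, then retry
        raise RuntimeError("all providers failed") from last_error

    # Usage with a fake call: the primary always fails, the fallback succeeds.
    def fake_call(provider):
        if provider == "primary":
            raise ConnectionError("primary unreachable")
        return f"response from {provider}"

    print(call_with_failover(["primary", "fallback"], fake_call))
    # prints: response from fallback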

Below is an example of how you can configure ``llm_providers`` with an instance of an Arch gateway.

.. literalinclude:: includes/arch_config.yaml
    :language: yaml
    :linenos:
    :lines: 1-20
    :emphasize-lines: 10-16
    :caption: Example Configuration
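
For reference, an ``llm_providers`` section follows the shape used later on this page (``name``,
``provider_interface``, ``model``, and optionally ``endpoint``). The snippet below is a hypothetical
sketch, not the actual contents of ``includes/arch_config.yaml``:

.. code-block:: yaml

    # Hypothetical sketch -- see includes/arch_config.yaml for the real file.
    llm_providers:
      - name: openai-gpt
        provider_interface: openai
        model: gpt-4o-mini
      - name: local-llama
        provider_interface: openai
        model: llama3.2
        endpoint: host.docker.internal:11434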
2024-09-27 15:37:49 -07:00
.. Note ::
2024-09-30 14:54:01 -07:00
When you start Arch, it creates a listener port for egress traffic based on the presence of `` llm_providers ``
2024-10-09 15:53:12 -07:00
configuration section in the `` arch_config.yml `` file. Arch binds itself to a local address such as
2024-10-10 22:30:54 -07:00
`` 127.0.0.1:12000 `` .
2024-09-27 15:37:49 -07:00
2024-09-30 14:54:01 -07:00
Arch also offers vendor-agnostic SDKs and libraries to make LLM calls to API-based LLM providers (like OpenAI,
Anthropic, Mistral, Cohere, etc.) and supports calls to OSS LLMs hosted on your own infrastructure. Arch
abstracts the complexities of integrating with different LLM providers, handling retries and rate limits
behind a unified interface for both cloud-based and on-premise LLMs. Simply configure the details of the
LLMs your application will use, and Arch routes your outbound LLM calls.

Adding a Custom LLM Provider
----------------------------

Arch supports any OpenAI-compliant LLM provider (for example Mistral, OpenAI, or Ollama), with first-class
support for OpenAI and Ollama. You can easily configure an LLM that communicates over the OpenAI API
interface by following the guide below.

For example, the following code block shows you how to add an Ollama-hosted LLM in the ``arch_config.yaml`` file.

.. code-block:: yaml

    - name: local-llama
      provider_interface: openai
      model: llama3.2
      endpoint: host.docker.internal:11434

Similarly, the following code block shows you how to add the Mistral LLM provider in the ``arch_config.yaml`` file.

.. code-block:: yaml

    - name: mistral-ai
      provider_interface: openai
      model: ministral-3b-latest
      endpoint: api.mistral.ai:443
      protocol: https
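
Because both providers speak the OpenAI wire format, switching between them is just a change of model
name in the request body. The helper below is a hypothetical sketch (it is not part of Arch) showing that
the same OpenAI-style chat request shape serves both providers.

.. code-block:: python

    import json

    # Hypothetical helper: build the OpenAI-style chat request body that a
    # client would send through Arch; only the model name changes per provider.
    def build_chat_request(model, user_message):
        return {
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
        }

    # Same request shape for the Ollama-backed and Mistral-backed providers.
    for model in ("llama3.2", "ministral-3b-latest"):
        print(json.dumps(build_chat_request(model, "What is the capital of France?")))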

Example: Using the OpenAI Python SDK
------------------------------------

.. code-block:: python

    from openai import OpenAI

    # Initialize the Arch client, pointing at the local egress listener
    client = OpenAI(base_url="http://127.0.0.1:12000/")

    # Define your LLM provider and prompt
    llm_provider = "openai"
    prompt = "What is the capital of France?"

    # Send the prompt to the LLM through Arch
    response = client.completions.create(llm_provider=llm_provider, prompt=prompt)

    # Print the response
    print("LLM Response:", response)