add precommit check (#97)

* add precommit check
* remove check
* Revert "remove check"
  This reverts commit 9987b62b9b.
* fix checks
* fix whitespace errors
2026-05-06 14:22:51 +02:00 · 2024-09-30 14:54:01 -07:00 · 2024-09-30 14:54:01 -07:00 · 4182879717
commit 4182879717
parent 1e61452310
26 changed files with 292 additions and 312 deletions
--- a/docs/source/intro/architecture/model_serving/model_serving.rst
+++ b/docs/source/intro/architecture/model_serving/model_serving.rst
@@ -3,10 +3,10 @@
 Model Serving
 -------------

-Arch is a set of **two** self-contained processes that are designed to run alongside your application 
-servers (or on a separate host connected via a network). The first process is designated to manage low-level 
-networking and HTTP-related concerns, and the other process is for **model serving**, which helps Arch make 
-intelligent decisions about the incoming prompts. The model server is designed to call the purpose-built 
+Arch is a set of **two** self-contained processes that are designed to run alongside your application
+servers (or on a separate host connected via a network). The first process is designated to manage low-level
+networking and HTTP-related concerns, and the other process is for **model serving**, which helps Arch make
+intelligent decisions about the incoming prompts. The model server is designed to call the purpose-built
 :ref:`LLMs <llms_in_arch>` in Arch.

 .. image:: /_static/img/arch-system-architecture.jpg
@@ -15,16 +15,16 @@ intelligent decisions about the incoming prompts. The model server is designed t

 _____________________________________________________________________________________________________________

-Arch is designed to be deployed in your cloud VPC or on an on-premises host, and can work on devices that don't 
-have a GPU. Note that GPU devices are needed for fast and cost-efficient use, so that Arch (model server, specifically) 
-can process prompts quickly and forward control back to the application host. There are three modes in which Arch 
+Arch is designed to be deployed in your cloud VPC or on an on-premises host, and can work on devices that don't
+have a GPU. Note that GPU devices are needed for fast and cost-efficient use, so that Arch (model server, specifically)
+can process prompts quickly and forward control back to the application host. There are three modes in which Arch
 can be configured to run its **model server** subsystem:

 Local Serving (CPU - Moderate)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The following bash commands enable you to configure the model server subsystem in Arch to run locally on the device 
-and only use CPU devices. This will be the slowest option but can be useful in dev/test scenarios where GPUs 
-might not be available. 
+The following bash commands enable you to configure the model server subsystem in Arch to run locally on the device
+and only use CPU devices. This will be the slowest option but can be useful in dev/test scenarios where GPUs
+might not be available.

 .. code-block:: bash

@@ -32,25 +32,25 @@ might not be available.

 Local Serving (GPU- Fast)
 ^^^^^^^^^^^^^^^^^^^^^^^^^
-The following bash commands enable you to configure the model server subsystem in Arch to run locally on the 
+The following bash commands enable you to configure the model server subsystem in Arch to run locally on the
 machine and utilize the GPU available for fast inference across all model use cases, including function calling,
 guardrails, etc.

 .. code-block:: bash

-    archgw up --local 
+    archgw up --local

 Cloud Serving (GPU - Blazing Fast)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to 
-cloud serving for function calling and guardrails scenarios to dramatically improve the speed and overall performance 
-of your applications. 
+The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to
+cloud serving for function calling and guardrails scenarios to dramatically improve the speed and overall performance
+of your applications.

 .. code-block:: bash

-    archgw up 
+    archgw up

 .. Note::
-    Arch's model serving in the cloud is priced at $0.05 per 1M tokens (156x cheaper than GPT-4o) with average latency 
-    of 200ms (10x faster than GPT-4o). Please refer to our :ref:`getting started guide <getting_started>` to learn 
-    how to generate API keys for model serving.
+    Arch's model serving in the cloud is priced at $0.05 per 1M tokens (156x cheaper than GPT-4o) with average latency
+    of 200ms (10x faster than GPT-4o). Please refer to our :ref:`getting started guide <getting_started>` to learn
+    how to generate API keys for model serving.
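
For reference, the two serving commands visible in this diff can be tied to their modes with a small wrapper. This is a minimal sketch, not part of ``archgw``: the function name ``archgw_command_for_mode`` and the mode labels ``local-gpu`` and ``cloud`` are hypothetical, chosen only for illustration. The CPU-only command is not shown in this diff, so it is deliberately left out rather than guessed. The function prints the command instead of running it, so it is safe to try without Arch installed.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of archgw): print the command that the
# document lists for a given model-serving mode. The mode labels are
# made up for this sketch; the commands are copied from the doc.
archgw_command_for_mode() {
    case "$1" in
        local-gpu)
            # "Local Serving (GPU - Fast)": run the model server on the local GPU.
            echo "archgw up --local"
            ;;
        cloud)
            # "Cloud Serving (GPU - Blazing Fast)": local intent detection,
            # cloud serving for function calling and guardrails.
            echo "archgw up"
            ;;
        *)
            # The CPU-only command is elided in this diff, so it is not mapped here.
            echo "unknown or undocumented mode: $1" >&2
            return 1
            ;;
    esac
}

# Example (commented out so that sourcing this file has no side effects):
# archgw_command_for_mode cloud
```

Printing rather than executing keeps the mapping easy to review. To actually start Arch, run the printed command yourself.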