Fix errors and improve Doc (#143)

* Fix link issues and add icons
* Improve Doc
* fix test
* making minor modifications to shuguangs' doc changes

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
Co-authored-by: Adil Hafeez <adil@katanemo.com>
commit b30ad791f7
parent 3ed50e61d2
Date: 2024-10-08 13:18:34 -07:00
27 changed files with 396 additions and 329 deletions
--- a/docs/source/concepts/tech_overview/model_serving.rst
+++ b/docs/source/concepts/tech_overview/model_serving.rst
@@ -1,19 +1,18 @@
-.. _arch_model_serving:
+.. _model_serving:

 Model Serving
-------------
+=============

-Arch is a set of **two** self-contained processes that are designed to run alongside your application
+Arch is a set of `two` self-contained processes that are designed to run alongside your application
 servers (or on a separate host connected via a network). The first process is designated to manage low-level
-networking and HTTP-related concerns, and the other process is for **model serving**, which helps Arch make
+networking and HTTP-related concerns, and the other process is for model serving, which helps Arch make
 intelligent decisions about the incoming prompts. The model server is designed to call the purpose-built
 LLMs in Arch.

 .. image:: /_static/img/arch-system-architecture.jpg
   :align: center
-   :width: 50%
+   :width: 40%

-_____________________________________________________________________________________________________________

 Arch is designed to be deployed in your cloud VPC or on an on-premises host, and it can work on devices that don't
 have a GPU. Note that GPU devices are needed for fast and cost-efficient use, so that Arch (model server, specifically)
@@ -21,7 +20,7 @@ can process prompts quickly and forward control back to the application host. The
 can be configured to run its **model server** subsystem:

 Local Serving (CPU - Moderate)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+------------------------------
 The following bash command configures the model server subsystem in Arch to run locally on the device
 and use only the CPU. This is the slowest option, but it can be useful in dev/test scenarios where GPUs
 might not be available.
@@ -30,18 +29,18 @@ might not be available.

    $ archgw up --local-cpu

-Local Serving (GPU- Fast)
-^^^^^^^^^^^^^^^^^^^^^^^^^
+Local Serving (GPU - Fast)
+--------------------------
 The following bash command configures the model server subsystem in Arch to run locally on the
 machine and use the available GPU for fast inference across all model use cases, including function calling
 and guardrails.

 .. code-block:: console

-    $ archgw up --local
+    $ archgw up --local-gpu

 Cloud Serving (GPU - Blazing Fast)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+----------------------------------
 The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to
 cloud serving for function calling and guardrails scenarios to dramatically improve the speed and overall performance
 of your applications.