mirror of
https://github.com/katanemo/plano.git
synced 2026-05-03 12:52:56 +02:00
Fix errors and improve Doc (#143)
* Fix link issues and add icons * Improve Doc * fix test * making minor modifications to shuguangs' doc changes --------- Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local> Co-authored-by: Adil Hafeez <adil@katanemo.com>
This commit is contained in:
parent
3ed50e61d2
commit
b30ad791f7
27 changed files with 396 additions and 329 deletions
|
|
@ -1,19 +1,18 @@
|
|||
.. _arch_model_serving:
|
||||
.. _model_serving:
|
||||
|
||||
Model Serving
|
||||
-------------
|
||||
=============
|
||||
|
||||
Arch is a set of **two** self-contained processes that are designed to run alongside your application
|
||||
Arch is a set of `two` self-contained processes that are designed to run alongside your application
|
||||
servers (or on a separate host connected via a network). The first process is designated to manage low-level
|
||||
networking and HTTP related comcerns, and the other process is for **model serving**, which helps Arch make
|
||||
networking and HTTP related comcerns, and the other process is for model serving, which helps Arch make
|
||||
intelligent decisions about the incoming prompts. The model server is designed to call the purpose-built
|
||||
LLMs in Arch.
|
||||
|
||||
.. image:: /_static/img/arch-system-architecture.jpg
|
||||
:align: center
|
||||
:width: 50%
|
||||
:width: 40%
|
||||
|
||||
_____________________________________________________________________________________________________________
|
||||
|
||||
Arch' is designed to be deployed in your cloud VPC, on a on-premises host, and can work on devices that don't
|
||||
have a GPU. Note, GPU devices are need for fast and cost-efficient use, so that Arch (model server, specifically)
|
||||
|
|
@ -21,7 +20,7 @@ can process prompts quickly and forward control back to the applicaton host. The
|
|||
can be configured to run its **model server** subsystem:
|
||||
|
||||
Local Serving (CPU - Moderate)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
------------------------------
|
||||
The following bash commands enable you to configure the model server subsystem in Arch to run local on device
|
||||
and only use CPU devices. This will be the slowest option but can be useful in dev/test scenarios where GPUs
|
||||
might not be available.
|
||||
|
|
@ -30,18 +29,18 @@ might not be available.
|
|||
|
||||
$ archgw up --local-cpu
|
||||
|
||||
Local Serving (GPU- Fast)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
Local Serving (GPU - Fast)
|
||||
--------------------------
|
||||
The following bash commands enable you to configure the model server subsystem in Arch to run locally on the
|
||||
machine and utilize the GPU available for fast inference across all model use cases, including function calling
|
||||
guardails, etc.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ archgw up --local
|
||||
$ archgw up --local-gpu
|
||||
|
||||
Cloud Serving (GPU - Blazing Fast)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
----------------------------------
|
||||
The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to
|
||||
cloud serving for function calling and guardails scenarios to dramatically improve the speed and overall performance
|
||||
of your applications.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue