Mirror of https://github.com/katanemo/plano.git, synced 2026-04-27 09:46:28 +02:00
Tweak readme docs for minor nits (#461)
Co-authored-by: darkdatter <msylvia@tradestax.io>
This commit is contained in: parent 4d2d8bd7a1, commit e7b0de2a72
13 changed files with 38 additions and 38 deletions
@@ -5,7 +5,7 @@ Model Serving

 Arch is a set of `two` self-contained processes that are designed to run alongside your application
 servers (or on a separate host connected via a network). The first process is designated to manage low-level
-networking and HTTP related comcerns, and the other process is for model serving, which helps Arch make
+networking and HTTP related concerns, and the other process is for model serving, which helps Arch make
 intelligent decisions about the incoming prompts. The model server is designed to call the purpose-built
 LLMs in Arch.

@@ -16,7 +16,7 @@ LLMs in Arch.

 Arch' is designed to be deployed in your cloud VPC, on a on-premises host, and can work on devices that don't
 have a GPU. Note, GPU devices are need for fast and cost-efficient use, so that Arch (model server, specifically)
-can process prompts quickly and forward control back to the applicaton host. There are three modes in which Arch
+can process prompts quickly and forward control back to the application host. There are three modes in which Arch
 can be configured to run its **model server** subsystem:

 Local Serving (CPU - Moderate)

@@ -32,7 +32,7 @@ might not be available.

 Cloud Serving (GPU - Blazing Fast)
 ----------------------------------
 The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to
-cloud serving for function calling and guardails scenarios to dramatically improve the speed and overall performance
+cloud serving for function calling and guardrails scenarios to dramatically improve the speed and overall performance
 of your applications.

 .. code-block:: console

@@ -40,6 +40,6 @@ of your applications.

    $ archgw up

 .. Note::
-   Arch's model serving in the cloud is priced at $0.05M/token (156x cheaper than GPT-4o) with averlage latency
+   Arch's model serving in the cloud is priced at $0.05M/token (156x cheaper than GPT-4o) with average latency
    of 200ms (10x faster than GPT-4o). Please refer to our :ref:`Get Started <quickstart>` to know
    how to generate API keys for model serving