mirror of
https://github.com/katanemo/plano.git
synced 2026-06-05 14:45:15 +02:00
fixed cli to use poetry as well. this way we make it easy to have the… (#160)
This commit is contained in:
parent
e81ca8d5cf
commit
1acf43ff7a
26 changed files with 771 additions and 116 deletions
|
|
@ -29,16 +29,6 @@ might not be available.
|
|||
|
||||
$ archgw up --local-cpu
|
||||
|
||||
Local Serving (GPU - Fast)
|
||||
--------------------------
|
||||
The following bash commands enable you to configure the model server subsystem in Arch to run locally on the
|
||||
machine and utilize the GPU available for fast inference across all model use cases, including function calling
|
||||
guardails, etc.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ archgw up --local-gpu
|
||||
|
||||
Cloud Serving (GPU - Blazing Fast)
|
||||
----------------------------------
|
||||
The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to
|
||||
|
|
|
|||
|
|
@ -107,7 +107,7 @@ traffic, apply rate limits, and utilize a large set of traffic management capabi
|
|||
|
||||
.. Attention::
|
||||
When you start Arch, it automatically creates a listener port for egress calls to upstream LLMs. This is based on the
|
||||
``llm_providers`` configuration section in the ``prompt_config.yml`` file. Arch binds itself to a local address such as
|
||||
``llm_providers`` configuration section in the ``arch_config.yml`` file. Arch binds itself to a local address such as
|
||||
127.0.0.1:12000/v1.
|
||||
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue