fixed cli to use poetry as well. this way we make it easy to have the… (#160)

2026-06-05 14:45:15 +02:00 · 2024-10-09 15:53:12 -07:00 · 2024-10-09 15:53:12 -07:00 · 1acf43ff7a
commit 1acf43ff7a
parent e81ca8d5cf
26 changed files with 771 additions and 116 deletions
--- a/docs/source/concepts/tech_overview/model_serving.rst
+++ b/docs/source/concepts/tech_overview/model_serving.rst
@ -29,16 +29,6 @@ might not be available.

    $ archgw up --local-cpu

-Local Serving (GPU - Fast)
--------------------------
-The following bash commands enable you to configure the model server subsystem in Arch to run locally on the
-machine and utilize the GPU available for fast inference across all model use cases, including function calling
-guardails, etc.
-
-.. code-block:: console
-
-    $ archgw up --local-gpu
-
 Cloud Serving (GPU - Blazing Fast)
 ----------------------------------
 The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to
--- a/docs/source/concepts/tech_overview/prompt.rst
+++ b/docs/source/concepts/tech_overview/prompt.rst
@ -107,7 +107,7 @@ traffic, apply rate limits, and utilize a large set of traffic management capabi

 .. Attention::
   When you start Arch, it automatically creates a listener port for egress calls to upstream LLMs. This is based on the
-   ``llm_providers`` configuration section in the ``prompt_config.yml`` file. Arch binds itself to a local address such as
+   ``llm_providers`` configuration section in the ``arch_config.yml`` file. Arch binds itself to a local address such as
   127.0.0.1:12000/v1.