Salmanap/docs v1 push (#92)

* Updated model serving and config references, revised the architecture docs, and added the llm_provider section

* Several documentation changes to improve sections such as life_of_a_request and the model serving subsystem

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local>
This commit is contained in:
Salman Paracha 2024-09-27 15:37:49 -07:00 committed by GitHub
parent 8a4e11077c
commit 7168b14ed3
19 changed files with 375 additions and 119 deletions


@ -1,23 +1,37 @@
.. _arch_overview_prompt_handling:
Prompts
-------
Arch's primary design goal is to securely accept, process, and handle prompts. To do that effectively,
Arch relies on Envoy's HTTP `connection management <https://www.envoyproxy.io/docs/envoy/v1.31.2/intro/arch_overview/http/http_connection_management>`_
subsystem and its **prompt handler** subsystem engineered with purpose-built :ref:`LLMs <llms_in_arch>` to
implement critical functionality on behalf of developers so that you can stay focused on business logic.
.. Note::
   Arch's **prompt handler** subsystem interacts with the **model** subsystem through Envoy's cluster manager
   to ensure a robust, resilient, and fault-tolerant experience in managing incoming prompts. Read more
   about the :ref:`model subsystem <arch_model_serving>` and how LLMs are hosted in Arch.
Messages
--------
Arch accepts messages directly from the body of the HTTP request in a format that follows the `Hugging Face Messages API <https://huggingface.co/docs/text-generation-inference/en/messages_api>`_.
This design allows developers to pass a list of messages, where each message is represented as a dictionary
containing two key-value pairs:
- **Role**: Defines the role of the message sender, such as "user" or "assistant".
- **Content**: Contains the actual text of the message.
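A request body in this format can be sketched as follows (the model name and message text are illustrative, not taken from Arch's documentation):

```python
import json

# Messages in the Hugging Face Messages API format: a list of
# dictionaries, each with a "role" and a "content" key.
messages = [
    {"role": "user", "content": "What is the weather in Seattle?"},
    {"role": "assistant", "content": "Let me check that for you."},
    {"role": "user", "content": "Thanks!"},
]

# The HTTP request body Arch would receive (model name is illustrative).
body = json.dumps({"model": "my-model", "messages": messages})
print(body)
```

Each entry carries exactly the two keys described above, so downstream handlers can rely on a uniform shape regardless of who authored the turn.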
Prompt Guardrails
-----------------
Arch is engineered with :ref:`Arch-Guard <llms_in_arch>`, an industry-leading safety layer powered by a
compact, high-performing LLM that monitors incoming prompts to detect and reject jailbreak attempts,
ensuring that unauthorized or harmful behaviors are intercepted early in the process.
To add jailbreak guardrails, see the example below:
.. literalinclude:: /_config/getting-started.yml
:language: yaml
@ -26,9 +40,9 @@ To add prompt guardrails, see example below:
:caption: :download:`arch-getting-started.yml </_config/getting-started.yml>`
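For orientation, such a configuration might look like the sketch below. The field names (``prompt_guards``, ``input_guards``, ``jailbreak``) and the message text are illustrative assumptions, not the authoritative schema; the downloadable getting-started.yml above is the source of truth.

```yaml
# Illustrative sketch only -- field names are assumptions,
# not the shipped schema. See getting-started.yml above.
prompt_guards:
  input_guards:
    jailbreak:
      on_exception:
        message: "Sorry, I can't help with that request."
```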
.. Note::
   As a roadmap item, Arch will expose the ability for developers to define custom guardrails via Arch-Guard-v2,
   adding support for developer-defined safety checks and hazardous categories such as violent crimes,
   privacy violations, and hate speech. To offer feedback on our roadmap, please visit our
   `GitHub page <https://github.com/orgs/katanemo/projects/1>`_.
Prompt Targets
@ -132,7 +146,6 @@ Example: Using OpenAI Client with Arch as an Egress Gateway
print("OpenAI Response:", response.choices[0].text.strip())
In these examples:

- The ArchClient is used to send traffic directly through the Arch egress proxy to the LLM of your choice, such as OpenAI.
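Because the egress proxy speaks an OpenAI-compatible API, the request itself is an ordinary chat completion call aimed at Arch's listener. A minimal stdlib-only sketch is below; the listener address, port, and model name are illustrative assumptions (use the values from your own Arch configuration), and the request is constructed but deliberately not sent:

```python
import json
import urllib.request

# Illustrative egress listener address -- take the real port from
# your Arch configuration.
ARCH_EGRESS = "http://127.0.0.1:12000/v1"

payload = {
    "model": "gpt-4o-mini",  # illustrative; forwarded to the upstream provider
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Build (but do not send) an OpenAI-compatible chat completion
# request routed through the Arch egress proxy.
req = urllib.request.Request(
    url=f"{ARCH_EGRESS}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.method, req.full_url)
```

Sending `req` with `urllib.request.urlopen` (or pointing an OpenAI SDK client's `base_url` at the same listener) would return the provider's response through Arch.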