mirror of
https://github.com/katanemo/plano.git
synced 2026-04-30 19:36:34 +02:00
Fix errors and improve Doc (#143)
* Fix link issues and add icons * Improve Doc * fix test * making minor modifications to shuguangs' doc changes --------- Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-261.local> Co-authored-by: Adil Hafeez <adil@katanemo.com>
parent
3ed50e61d2
commit
b30ad791f7
27 changed files with 396 additions and 329 deletions
55
docs/source/concepts/tech_overview/error_target.rst
Normal file
|
|
@ -0,0 +1,55 @@
|
|||
.. _error_target:
|
||||
|
||||
Error Target
|
||||
=============
|
||||
|
||||
**Error targets** are endpoints designed to capture and manage specific issues or exceptions that occur during function calls or other processing in Arch.
|
||||
|
||||
These endpoints receive errors forwarded from Arch when issues arise, such as improper function/API calls, guardrail violations, or other processing errors.
|
||||
The errors are communicated to the application via headers like ``X-Arch-[ERROR-TYPE]``, enabling you to respond appropriately and handle errors gracefully.
|
||||
|
||||
|
||||
Key Concepts
|
||||
------------
|
||||
|
||||
- **Error Type**: Categorizes the nature of the error, such as "ValidationError" or "RuntimeError." These error types help in identifying what kind of issue occurred and provide context for troubleshooting.
|
||||
|
||||
- **Error Message**: A clear, human-readable message describing the error. This should provide enough detail to inform users or developers of the root cause or required action.
|
||||
|
||||
- **Target Prompt**: The specific prompt or operation where the error occurred. Understanding where the error happened helps with debugging and pinpointing the source of the problem.
|
||||
|
||||
- **Parameter-Specific Errors**: Errors that arise due to invalid or missing parameters when invoking a function. These errors are critical for ensuring the correctness of inputs.
|
||||
|
||||
|
||||
Error Header Example
|
||||
--------------------
|
||||
|
||||
.. code-block:: http
    :caption: Error Header Example

    HTTP/1.1 400 Bad Request
    X-Arch-Error-Type: FunctionValidationError
    X-Arch-Error-Message: Tools call parsing failure
    X-Arch-Target-Prompt: createUser
    Content-Type: application/json

    {
      "messages": [
        {
          "role": "user",
          "content": "Please create a user with the following ID: 1234"
        },
        {
          "role": "system",
          "content": "Expected a string for 'user_id', but got an integer."
        }
      ]
    }
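
The header example above could be consumed by an error target roughly as follows. This is a minimal sketch, not Arch's own API: ``ArchError`` and ``parse_arch_error`` are hypothetical names introduced for illustration, and the headers are assumed to arrive as a plain mapping of names to values.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ArchError:
    """Hypothetical container for the X-Arch-* error headers."""
    error_type: str
    message: str
    target_prompt: Optional[str] = None


def parse_arch_error(headers: Mapping[str, str]) -> Optional[ArchError]:
    """Return an ArchError if an X-Arch-Error-Type header is present, else None."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    error_type = normalized.get("x-arch-error-type")
    if error_type is None:
        return None
    return ArchError(
        error_type=error_type,
        message=normalized.get("x-arch-error-message", ""),
        target_prompt=normalized.get("x-arch-target-prompt"),
    )


# Headers copied from the example response above.
example_headers = {
    "X-Arch-Error-Type": "FunctionValidationError",
    "X-Arch-Error-Message": "Tools call parsing failure",
    "X-Arch-Target-Prompt": "createUser",
    "Content-Type": "application/json",
}

error = parse_arch_error(example_headers)
if error is not None:
    print(f"{error.error_type} in {error.target_prompt}: {error.message}")
```

Normalizing header names first keeps the lookup independent of how a given HTTP framework capitalizes them.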
|
||||
|
||||
|
||||
Best Practices and Tips
|
||||
-----------------------
|
||||
|
||||
- **Graceful Degradation**: If an error occurs, fail gracefully by providing fallback logic or alternative flows when possible.
|
||||
|
||||
- **Log Errors**: Always log errors on the server side for later analysis.
|
||||
|
||||
- **Client-Side Handling**: Make sure the client can interpret error responses and provide meaningful feedback to the user. Clients should not display raw error codes or stack traces but rather handle them gracefully.
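
The practices above might be combined along these lines. This is only a sketch under stated assumptions: ``handle_arch_error``, ``USER_MESSAGES``, and ``FALLBACK_MESSAGE`` are hypothetical names, and the error type string is taken from the header example rather than from a confirmed list of Arch error types.

```python
import logging

logger = logging.getLogger("arch.error_target")

# Hypothetical mapping from error type to user-facing text.
USER_MESSAGES = {
    "FunctionValidationError": (
        "Some details in your request look invalid. "
        "Please check them and try again."
    ),
}
FALLBACK_MESSAGE = "Something went wrong while handling your request. Please try again."


def handle_arch_error(error_type: str, error_message: str, target_prompt: str = "") -> str:
    """Log the full error server-side and return a safe message for the user."""
    logger.error(
        "Arch error type=%s target_prompt=%s message=%s",
        error_type, target_prompt, error_message,
    )
    # Unknown error types degrade to a generic message instead of raw details.
    return USER_MESSAGES.get(error_type, FALLBACK_MESSAGE)


print(handle_arch_error("FunctionValidationError", "Tools call parsing failure", "createUser"))
print(handle_arch_error("SomeUnknownError", "unexpected failure"))
```

The raw error message goes only to the server-side log, while the caller receives text that is safe to show to end users.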
|
||||
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
Listener
|
||||
---------
|
||||
Listener is a top-level primitive in Arch, which simplifies the configuration required to bind incoming
|
||||
**Listener** is a top-level primitive in Arch, which simplifies the configuration required to bind incoming
|
||||
connections from downstream clients, and for egress connections to LLMs (hosted or API).
|
||||
|
||||
Arch builds on Envoy's Listener subsystem to streamline connection management for developers. Arch minimizes
|
||||
|
|
@ -15,23 +15,23 @@ Downstream (Ingress)
|
|||
Developers can configure Arch to accept connections from downstream clients. A downstream listener acts as the
|
||||
primary entry point for incoming traffic, handling initial connection setup, including network filtering, guardrails,
|
||||
and additional network security checks. For more details on prompt security and safety,
|
||||
see :ref:`here <arch_overview_prompt_handling>`
|
||||
see :ref:`here <arch_overview_prompt_handling>`.
|
||||
|
||||
Upstream (Egress)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
Arch automatically configures a listener to route requests from your application to upstream LLM API providers (or hosts).
|
||||
When you start Arch, it creates a listener for egress traffic based on the presence of the ``llm_providers`` configuration
|
||||
section in the ``prompt_config.yml`` file. Arch binds itself to a local address such as ``127.0.0.1:9000/v1`` or a DNS-based
|
||||
address like ``arch.local:9000/v1`` for outgoing traffic. For more details on LLM providers, read :ref:`here <llm_provider>`
|
||||
When you start Arch, it creates a listener for egress traffic based on the presence of the ``listener`` configuration
|
||||
section in the configuration file. Arch binds itself to a local address such as ``127.0.0.1:9000/v1`` or a DNS-based
|
||||
address like ``arch.local:9000/v1`` for outgoing traffic. For more details on LLM providers, read :ref:`here <llm_provider>`.
|
||||
|
||||
Configure Listener
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
To configure a Downstream (Ingress) Listener, simply add the ``listener`` directive to your ``prompt_config.yml`` file:
|
||||
To configure a Downstream (Ingress) Listener, simply add the ``listener`` directive to your configuration file:
|
||||
|
||||
.. literalinclude:: ../includes/arch_config.yaml
|
||||
:language: yaml
|
||||
:linenos:
|
||||
:lines: 1-18
|
||||
:emphasize-lines: 2-5
|
||||
:emphasize-lines: 3-7
|
||||
:caption: Example Configuration
|
||||
|
|
|
|||
|
|
@ -1,19 +1,18 @@
|
|||
.. _arch_model_serving:
|
||||
.. _model_serving:
|
||||
|
||||
Model Serving
|
||||
-------------
|
||||
=============
|
||||
|
||||
Arch is a set of **two** self-contained processes that are designed to run alongside your application
|
||||
Arch is a set of `two` self-contained processes that are designed to run alongside your application
|
||||
servers (or on a separate host connected via a network). The first process is designated to manage low-level
|
||||
networking and HTTP-related concerns, and the other process is for **model serving**, which helps Arch make
|
||||
networking and HTTP-related concerns, and the other process is for model serving, which helps Arch make
|
||||
intelligent decisions about the incoming prompts. The model server is designed to call the purpose-built
|
||||
LLMs in Arch.
|
||||
|
||||
.. image:: /_static/img/arch-system-architecture.jpg
|
||||
:align: center
|
||||
:width: 50%
|
||||
:width: 40%
|
||||
|
||||
_____________________________________________________________________________________________________________
|
||||
|
||||
Arch is designed to be deployed in your cloud VPC or on an on-premises host, and can work on devices that don't
|
||||
have a GPU. Note that GPU devices are needed for fast and cost-efficient use, so that Arch (model server, specifically)
|
||||
|
|
@ -21,7 +20,7 @@ can process prompts quickly and forward control back to the application host. The
|
|||
can be configured to run its **model server** subsystem:
|
||||
|
||||
Local Serving (CPU - Moderate)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
------------------------------
|
||||
The following bash commands enable you to configure the model server subsystem in Arch to run locally on the device
|
||||
and only use CPU devices. This will be the slowest option but can be useful in dev/test scenarios where GPUs
|
||||
might not be available.
|
||||
|
|
@ -30,18 +29,18 @@ might not be available.
|
|||
|
||||
$ archgw up --local-cpu
|
||||
|
||||
Local Serving (GPU- Fast)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
Local Serving (GPU - Fast)
|
||||
--------------------------
|
||||
The following bash commands enable you to configure the model server subsystem in Arch to run locally on the
|
||||
machine and utilize the GPU available for fast inference across all model use cases, including function calling,
|
||||
guardrails, etc.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ archgw up --local
|
||||
$ archgw up --local-gpu
|
||||
|
||||
Cloud Serving (GPU - Blazing Fast)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
----------------------------------
|
||||
The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to
|
||||
cloud serving for function calling and guardrails scenarios to dramatically improve the speed and overall performance
|
||||
of your applications.
|
||||
|
|
|
|||
|
|
@ -1,17 +1,17 @@
|
|||
.. _arch_overview_prompt_handling:
|
||||
|
||||
Prompt
|
||||
=================
|
||||
Prompts
|
||||
=======
|
||||
|
||||
Arch's primary design point is to securely accept, process and handle prompts. To do that effectively,
|
||||
Arch relies on Envoy's HTTP `connection management <https://www.envoyproxy.io/docs/envoy/v1.31.2/intro/arch_overview/http/http_connection_management>`_
|
||||
subsystem and its **prompt handler** subsystem engineered with purpose-built LLMs to
|
||||
implement critical functionality on behalf of developers so that you can stay focused on business logic.
|
||||
|
||||
.. Note::
|
||||
Arch's **prompt handler** subsystem interacts with the **model** subsystem through Envoy's cluster manager
|
||||
system to ensure a robust, resilient and fault-tolerant experience in managing incoming prompts. Read more
|
||||
about the :ref:`model subsystem <arch_model_serving>` and how the LLMs are hosted in Arch.
|
||||
Arch's **prompt handler** subsystem interacts with the **model subsystem** through Envoy's cluster manager system to ensure a robust, resilient and fault-tolerant experience in managing incoming prompts.
|
||||
|
||||
.. seealso::
|
||||
Read more about the :ref:`model subsystem <model_serving>` and how the LLMs are hosted in Arch.
|
||||
|
||||
Messages
|
||||
--------
|
||||
|
|
@ -24,7 +24,7 @@ containing two key-value pairs:
|
|||
- **Content**: Contains the actual text of the message.
|
||||
|
||||
|
||||
Prompt Guardrails
|
||||
Prompt Guard
|
||||
-----------------
|
||||
|
||||
Arch is engineered with :ref:`Arch-Guard <prompt_guard>`, an industry leading safety layer, powered by a
|
||||
|
|
@ -36,12 +36,12 @@ To add jailbreak guardrails, see example below:
|
|||
.. literalinclude:: ../includes/arch_config.yaml
|
||||
:language: yaml
|
||||
:linenos:
|
||||
:lines: 1-45
|
||||
:emphasize-lines: 22-26
|
||||
:lines: 1-25
|
||||
:emphasize-lines: 21-25
|
||||
:caption: Example Configuration
|
||||
|
||||
.. Note::
|
||||
As a roadmap item, Arch will expose the ability for developers to define custom guardrails via Arch-Guard-v2,
|
||||
As a roadmap item, Arch will expose the ability for developers to define custom guardrails via Arch-Guard,
|
||||
and add support for additional safety checks defined by developers and hazardous categories like violent crimes, privacy, hate,
|
||||
etc. To offer feedback on our roadmap, please visit our `GitHub page <https://github.com/orgs/katanemo/projects/1>`_.
|
||||
|
||||
|
|
@ -59,10 +59,14 @@ Configuring ``prompt_targets`` is simple. See example below:
|
|||
.. literalinclude:: ../includes/arch_config.yaml
|
||||
:language: yaml
|
||||
:linenos:
|
||||
:emphasize-lines: 29-38
|
||||
:emphasize-lines: 39-53
|
||||
:caption: Example Configuration
|
||||
|
||||
|
||||
.. seealso::
|
||||
|
||||
Check :ref:`Prompt Target <prompt_target>` for more details!
|
||||
|
||||
Intent Detection and Prompt Matching:
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
|
|
@ -127,10 +131,6 @@ Example: Using OpenAI Client with Arch as an Egress Gateway
|
|||
|
||||
print("OpenAI Response:", response.choices[0].text.strip())
|
||||
|
||||
In these examples:
|
||||
|
||||
The OpenAI client is used to send traffic directly through the Arch egress proxy to the LLM of your choice, such as OpenAI.
|
||||
The OpenAI client is configured to route traffic via Arch by setting the proxy to 127.0.0.1:51001, assuming Arch is
|
||||
running locally and bound to that address and port.
|
||||
|
||||
In these examples, the OpenAI client is used to send traffic directly through the Arch egress proxy to the LLM of your choice, such as OpenAI.
|
||||
The OpenAI client is configured to route traffic via Arch by setting the proxy to ``127.0.0.1:51001``, assuming Arch is running locally and bound to that address and port.
|
||||
This setup allows you to take advantage of Arch's advanced traffic management features while interacting with LLM APIs like OpenAI.
|
||||
|
|
|
|||
|
|
@ -61,7 +61,7 @@ The request processing path in Arch has three main parts:
|
|||
forwarding prompts to ``prompt_targets`` and establishes the lifecycle of any **upstream** connection to a
|
||||
hosted endpoint that implements domain-specific business logic for incoming prompts. This is where knowledge
|
||||
of targets and endpoint health, load balancing and connection pooling exists.
|
||||
* :ref:`Model serving subsystem <arch_model_serving>` which helps Arch make intelligent decisions about the
|
||||
* :ref:`Model serving subsystem <model_serving>` which helps Arch make intelligent decisions about the
|
||||
incoming prompts. The model server is designed to call the purpose-built LLMs in Arch.
|
||||
|
||||
The three subsystems are bridged with the HTTP router filter and the cluster manager subsystems of Envoy.
|
||||
|
|
|
|||
|
|
@ -9,6 +9,7 @@ Tech Overview
|
|||
terminology
|
||||
threading_model
|
||||
listener
|
||||
model_serving
|
||||
prompt
|
||||
model_serving
|
||||
request_lifecycle
|
||||
error_target
|
||||
|
|
|
|||
|
|
@ -14,7 +14,7 @@ to keep things consistent in logs, traces and in code.
|
|||
:width: 100%
|
||||
:align: center
|
||||
|
||||
**Listener**: A listener is a named network location (e.g., port, address, path etc.) that Arch listens on to process prompts
|
||||
**Listener**: A :ref:`listener <arch_overview_listeners>` is a named network location (e.g., port, address, path etc.) that Arch listens on to process prompts
|
||||
before forwarding them to your application server endpoints. Arch enables you to configure one listener for downstream connections
|
||||
(like port 80, 443) and creates a separate internal listener for calls that initiate from your application code to LLMs.
|
||||
|
||||
|
|
@ -22,25 +22,25 @@ before forwarding them to your application server endpoints. Arch enables you to
|
|||
|
||||
When you start Arch, you specify a listener address/port that you want to bind downstream. However, Arch uses a predefined port
|
||||
that you can use (``127.0.0.1:10000``) to proxy egress calls originating from your application to LLMs (API-based or hosted).
|
||||
For more details, check out :ref:`LLM providers <llm_provider>`
|
||||
For more details, check out :ref:`LLM provider <llm_provider>`.
|
||||
|
||||
**Instance**: An instance of the Arch gateway. When you start Arch it creates at most two processes. One to handle Layer 7
|
||||
networking operations (auth, TLS, observability, etc.) and the second process to serve models that enable it to make smart
|
||||
decisions on how to accept, handle and forward prompts. The second process is optional, as the model serving service could be
|
||||
hosted on a different network (an API call). But these two processes are considered a single instance of Arch.
|
||||
|
||||
**Prompt Targets**: Arch offers a primitive called ``prompt_targets`` to help separate business logic from undifferentiated
|
||||
**Prompt Target**: Arch offers a primitive called :ref:`prompt_target <prompt_target>` to help separate business logic from undifferentiated
|
||||
work in building generative AI apps. Prompt targets are endpoints that receive prompts that are processed by Arch.
|
||||
For example, Arch enriches incoming prompts with metadata like knowing when a request is a follow-up or clarifying prompt
|
||||
so that you can build faster, more accurate retrieval (RAG) apps. To support agentic apps, like scheduling travel plans or
|
||||
sharing comments on a document via prompts, Arch uses its function calling abilities to extract critical information from
|
||||
the incoming prompt (or a set of prompts) needed by a downstream backend API or function call before calling it directly.
|
||||
|
||||
**Error Targets**: Error targets are those endpoints that receive forwarded errors from Arch when issues arise,
|
||||
**Error Target**: :ref:`Error targets <error_target>` are those endpoints that receive forwarded errors from Arch when issues arise,
|
||||
such as failing to properly call a function/API, detecting violations of guardrails, or encountering other processing errors.
|
||||
These errors are communicated to the application via headers (X-Arch-[ERROR-TYPE]), allowing it to handle the errors gracefully
|
||||
These errors are communicated to the application via headers ``X-Arch-[ERROR-TYPE]``, allowing it to handle the errors gracefully
|
||||
and take appropriate actions.
|
||||
|
||||
**Model Serving**: Arch is a set of **two** self-contained processes that are designed to run alongside your application servers
|
||||
(or on a separate host connected via a network). The **model serving** process helps Arch make intelligent decisions about the
|
||||
**Model Serving**: Arch is a set of `two` self-contained processes that are designed to run alongside your application servers
|
||||
(or on a separate host connected via a network). The :ref:`model serving <model_serving>` process helps Arch make intelligent decisions about the
|
||||
incoming prompts. The model server is designed to call the (fast) purpose-built LLMs in Arch.
|
||||
|
|
|
|||
|
|
@ -13,7 +13,7 @@ thread. All the functionality around prompt handling from a downstream client is
|
|||
This allows the majority of Arch to be largely single threaded (embarrassingly parallel) with a small amount
|
||||
of more complex code handling coordination between the worker threads.
|
||||
|
||||
Generally Arch is written to be 100% non-blocking.
|
||||
Generally, Arch is written to be 100% non-blocking.
|
||||
|
||||
.. tip::
|
||||
|
||||
|
|
|
|||