Mirror of https://github.com/katanemo/plano.git, synced 2026-04-27 01:36:33 +02:00
Tweak readme docs for minor nits (#461)
Co-authored-by: darkdatter <msylvia@tradestax.io>
parent 4d2d8bd7a1
commit e7b0de2a72
13 changed files with 38 additions and 38 deletions

@@ -5,7 +5,7 @@ Listener
 **Listener** is a top level primitive in Arch, which simplifies the configuration required to bind incoming
 connections from downstream clients, and for egress connections to LLMs (hosted or API)
 
-Arch builds on Envoy's Listener subsystem to streamline connection managemet for developers. Arch minimizes
+Arch builds on Envoy's Listener subsystem to streamline connection management for developers. Arch minimizes
 the complexity of Envoy's listener setup by using best-practices and exposing only essential settings,
 making it easier for developers to bind connections without deep knowledge of Envoy’s configuration model. This
 simplification ensures that connections are secure, reliable, and optimized for performance.

@@ -13,7 +13,7 @@ simplification ensures that connections are secure, reliable, and optimized for
 Downstream (Ingress)
 ^^^^^^^^^^^^^^^^^^^^^^
 Developers can configure Arch to accept connections from downstream clients. A downstream listener acts as the
-primary entry point for incoming traffic, handling initial connection setup, including network filtering, gurdrails,
+primary entry point for incoming traffic, handling initial connection setup, including network filtering, guardrails,
 and additional network security checks. For more details on prompt security and safety,
 see :ref:`here <arch_overview_prompt_handling>`.
 

@@ -27,7 +27,7 @@ address like ``arch.local:12000/v1`` for outgoing traffic. For more details on L
 Configure Listener
 ^^^^^^^^^^^^^^^^^^
 
-To configure a Downstream (Ingress) Listner, simply add the ``listener`` directive to your configuration file:
+To configure a Downstream (Ingress) Listener, simply add the ``listener`` directive to your configuration file:
 
 .. literalinclude:: ../includes/arch_config.yaml
    :language: yaml

@@ -5,7 +5,7 @@ Model Serving
 
 Arch is a set of `two` self-contained processes that are designed to run alongside your application
 servers (or on a separate host connected via a network). The first process is designated to manage low-level
-networking and HTTP related comcerns, and the other process is for model serving, which helps Arch make
+networking and HTTP related concerns, and the other process is for model serving, which helps Arch make
 intelligent decisions about the incoming prompts. The model server is designed to call the purpose-built
 LLMs in Arch.
 

@@ -16,7 +16,7 @@ LLMs in Arch.
 
 Arch' is designed to be deployed in your cloud VPC, on a on-premises host, and can work on devices that don't
 have a GPU. Note, GPU devices are need for fast and cost-efficient use, so that Arch (model server, specifically)
-can process prompts quickly and forward control back to the applicaton host. There are three modes in which Arch
+can process prompts quickly and forward control back to the application host. There are three modes in which Arch
 can be configured to run its **model server** subsystem:
 
 Local Serving (CPU - Moderate)

@@ -32,7 +32,7 @@ might not be available.
 Cloud Serving (GPU - Blazing Fast)
 ----------------------------------
 The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to
-cloud serving for function calling and guardails scenarios to dramatically improve the speed and overall performance
+cloud serving for function calling and guardrails scenarios to dramatically improve the speed and overall performance
 of your applications.
 
 .. code-block:: console

@@ -40,6 +40,6 @@ of your applications.
    $ archgw up
 
 .. Note::
-   Arch's model serving in the cloud is priced at $0.05M/token (156x cheaper than GPT-4o) with averlage latency
+   Arch's model serving in the cloud is priced at $0.05M/token (156x cheaper than GPT-4o) with average latency
    of 200ms (10x faster than GPT-4o). Please refer to our :ref:`Get Started <quickstart>` to know
    how to generate API keys for model serving

@@ -8,7 +8,7 @@ Arch relies on Envoy's HTTP `connection management <https://www.envoyproxy.io/do
 subsystem and its **prompt handler** subsystem engineered with purpose-built LLMs to
 implement critical functionality on behalf of developers so that you can stay focused on business logic.
 
-Arch's **prompt handler** subsystem interacts with the **model subsytem** through Envoy's cluster manager system to ensure robust, resilient and fault-tolerant experience in managing incoming prompts.
+Arch's **prompt handler** subsystem interacts with the **model subsystem** through Envoy's cluster manager system to ensure robust, resilient and fault-tolerant experience in managing incoming prompts.
 
 .. seealso::
    Read more about the :ref:`model subsystem <model_serving>` and how the LLMs are hosted in Arch.

@@ -28,7 +28,7 @@ Prompt Guard
 -----------------
 
 Arch is engineered with `Arch-Guard <https://huggingface.co/collections/katanemo/arch-guard-6702bdc08b889e4bce8f446d>`_, an industry leading safety layer, powered by a
-compact and high-performimg LLM that monitors incoming prompts to detect and reject jailbreak attempts -
+compact and high-performing LLM that monitors incoming prompts to detect and reject jailbreak attempts -
 ensuring that unauthorized or harmful behaviors are intercepted early in the process.
 
 To add jailbreak guardrails, see example below:

@@ -50,7 +50,7 @@ Prompt Targets
 --------------
 
 Once a prompt passes any configured guardrail checks, Arch processes the contents of the incoming conversation
-and identifies where to forwad the conversation to via its ``prompt target`` primitve. Prompt targets are endpoints
+and identifies where to forward the conversation to via its ``prompt target`` primitive. Prompt targets are endpoints
 that receive prompts that are processed by Arch. For example, Arch enriches incoming prompts with metadata like knowing
 when a user's intent has changed so that you can build faster, more accurate RAG apps.
 
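The ordering described in the Prompt Targets hunk (guardrail checks first, then forwarding to a prompt target) can be illustrated with a small Python sketch. It is not Arch's implementation: handle, Guard and naive_jailbreak_guard are hypothetical names, and the keyword check only stands in for a model-based classifier such as Arch-Guard. What the sketch shows is the short-circuit: a rejected prompt never reaches the router.

```python
from typing import Callable, Optional

# A guard returns a rejection reason, or None if the prompt passes (hypothetical interface).
Guard = Callable[[str], Optional[str]]


def handle(prompt: str, guards: list[Guard], route: Callable[[str], str]) -> dict:
    """Run every guard in order; forward to a target only if all of them pass."""
    for guard in guards:
        reason = guard(prompt)
        if reason is not None:
            # Reject early: the router, and any upstream target, is never called.
            return {"status": "rejected", "reason": reason}
    return {"status": "forwarded", "target": route(prompt)}


def naive_jailbreak_guard(prompt: str) -> Optional[str]:
    # Hypothetical stand-in only: Arch-Guard is an LLM classifier, not a keyword match.
    if "ignore previous instructions" in prompt.lower():
        return "jailbreak"
    return None
```

Passing guards as a list keeps the pipeline open to further checks (for example a toxicity guard) without changing handle itself.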

@@ -72,7 +72,7 @@ Intent Matching
 
 Arch uses fast text embedding and intent recognition approaches to first detect the intent of each incoming prompt.
 This intent matching phase analyzes the prompt's content and matches it against predefined prompt targets, ensuring that each prompt is forwarded to the most appropriate endpoint.
-Arch’s intent matching framework considers both the name and description of each prompt target, and uses a composite matching score between embedding similarity and intent classification scores to enchance accuracy in forwarding decisions.
+Arch’s intent matching framework considers both the name and description of each prompt target, and uses a composite matching score between embedding similarity and intent classification scores to enhance accuracy in forwarding decisions.
 
 - **Intent Recognition**: NLI techniques further refine the matching process by evaluating the semantic alignment between the prompt and potential targets.
 
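The composite matching score mentioned in the Intent Matching hunk can be sketched as a weighted sum of embedding similarity and an intent-classification score, with the highest-scoring target winning. The weight alpha, the threshold, the PromptTarget fields and the origin of intent_scores (for example an NLI model) are all assumptions for illustration; the docs do not say how Arch combines the two scores.

```python
import math
from dataclasses import dataclass


@dataclass
class PromptTarget:
    name: str
    description: str
    embedding: list[float]  # hypothetical precomputed embedding of name + description


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_target(prompt_embedding, targets, intent_scores, alpha=0.5, threshold=0.6):
    """Return the best target name, or None if no composite score clears the threshold.

    intent_scores: hypothetical per-target classification probabilities.
    alpha: hypothetical weight given to embedding similarity versus intent score.
    """
    best_name, best_score = None, float("-inf")
    for t in targets:
        score = alpha * cosine(prompt_embedding, t.embedding) + (1 - alpha) * intent_scores.get(t.name, 0.0)
        if score > best_score:
            best_name, best_score = t.name, score
    return best_name if best_score >= threshold else None
```

Returning None when nothing clears the threshold is one possible design choice; it leaves the caller free to fall back to a default target instead of forwarding to a weak match.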

@@ -5,7 +5,7 @@ Request Lifecycle
 
 Below we describe the events in the lifecycle of a request passing through an Arch gateway instance. We first
 describe how Arch fits into the request path and then the internal events that take place following
-the arrival of a request at Arch from downtream clients. We follow the request until the corresponding
+the arrival of a request at Arch from downstream clients. We follow the request until the corresponding
 dispatch upstream and the response path.
 
 .. image:: /_static/img/network-topology-ingress-egress.jpg

@@ -59,7 +59,7 @@ The request processing path in Arch has three main parts:
   lifecycle. The downstream and upstream HTTP/2 codec lives here.
 * :ref:`Prompt handler subsystem <arch_overview_prompt_handling>` which is responsible for selecting and
   forwarding prompts ``prompt_targets`` and establishes the lifecycle of any **upstream** connection to a
-  hosted endpoint that implements domain-specific business logic for incoming promots. This is where knowledge
+  hosted endpoint that implements domain-specific business logic for incoming prompts. This is where knowledge
   of targets and endpoint health, load balancing and connection pooling exists.
 * :ref:`Model serving subsystem <model_serving>` which helps Arch make intelligent decisions about the
   incoming prompts. The model server is designed to call the purpose-built LLMs in Arch.
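The prompt handler bullet says this is where knowledge of endpoint health and load balancing lives. A minimal sketch of that idea, assuming plain round-robin over the endpoints currently marked healthy, follows. EndpointPool and its methods are hypothetical; Envoy supports several load-balancing policies, and round-robin is used here only because it is the simplest to follow.

```python
import itertools


class EndpointPool:
    """Round-robin over the currently healthy endpoints of one prompt target (hypothetical)."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._healthy = set(self._endpoints)
        self._cursor = itertools.cycle(self._endpoints)

    def mark_unhealthy(self, endpoint):
        # e.g. after failed health checks; the endpoint keeps its slot in the rotation but is skipped.
        self._healthy.discard(endpoint)

    def mark_healthy(self, endpoint):
        if endpoint in self._endpoints:
            self._healthy.add(endpoint)

    def pick(self):
        # Look at most one full cycle ahead so an all-unhealthy pool fails fast.
        for _ in range(len(self._endpoints)):
            endpoint = next(self._cursor)
            if endpoint in self._healthy:
                return endpoint
        raise RuntimeError("no healthy endpoint available")
```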

@@ -67,7 +67,7 @@ The request processing path in Arch has three main parts:
 The three subsystems are bridged with either the HTTP router filter, and the cluster manager subsystems of Envoy.
 
 Also, Arch utilizes `Envoy event-based thread model <https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310>`_.
-A main thread is responsible forthe server lifecycle, configuration processing, stats, etc. and some number of
+A main thread is responsible for the server lifecycle, configuration processing, stats, etc. and some number of
 :ref:`worker threads <arch_overview_threading>` process requests. All threads operate around an event loop (`libevent <https://libevent.org/>`_)
 and any given downstream TCP connection will be handled by exactly one worker thread for its lifetime. Each worker
 thread maintains its own pool of TCP connections to upstream endpoints.
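The threading rule in this hunk (one worker thread owns a downstream connection for its whole lifetime, and each worker keeps its own upstream connection pool) is sketched below. Dispatcher and Worker are hypothetical names, strings stand in for real sockets, and the round-robin first assignment is a simplification: in Envoy the worker that accepts a connection keeps it, so this models only the pinning, not how accepts are distributed.

```python
import threading


class Worker:
    def __init__(self, wid: int):
        self.wid = wid
        # Upstream connections owned by this worker only (strings stand in for real sockets).
        self.upstream_pool: dict = {}

    def upstream_conn(self, host: str) -> str:
        # Reuse this worker's connection to host, creating it on first use; never shared across workers.
        return self.upstream_pool.setdefault(host, f"conn-{self.wid}-{host}")


class Dispatcher:
    def __init__(self, n_workers: int):
        self.workers = [Worker(i) for i in range(n_workers)]
        self._owner: dict = {}  # downstream connection id -> owning Worker
        self._next = 0
        self._lock = threading.Lock()

    def worker_for(self, conn_id: str) -> Worker:
        """Return the worker that owns conn_id, pinning it on first sight (round-robin here)."""
        with self._lock:
            if conn_id not in self._owner:
                self._owner[conn_id] = self.workers[self._next % len(self.workers)]
                self._next += 1
            return self._owner[conn_id]

    def close(self, conn_id: str) -> None:
        # Ownership ends only when the downstream connection closes.
        with self._lock:
            self._owner.pop(conn_id, None)
```

Because a connection never migrates, each worker can use its own upstream pool without locking it against other workers.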

@@ -99,7 +99,7 @@ A brief outline of the lifecycle of a request and response using the example con
    that harmful or unwanted behaviors are detected early in the request processing pipeline.
 
 3. **Intent Matching**:
-   The decrypted data stream is deframed by the HTTP/2 codec in Arch's HTTP connection manager. Arch performs
+   The decrypted data stream is de-framed by the HTTP/2 codec in Arch's HTTP connection manager. Arch performs
    intent matching via is **prompt-handler** subsystem using the name and description of the defined prompt targets,
    determining which endpoint should handle the prompt.
 

@@ -162,7 +162,7 @@ Post-request processing
 Once a request completes, the stream is destroyed. The following also takes places:
 
 * The post-request :ref:`monitoring <monitoring>` are updated (e.g. timing, active requests, upgrades, health checks).
-  Some statistics are updated earlier however, during request processing. Stats are batchedand written by the main
+  Some statistics are updated earlier however, during request processing. Stats are batched and written by the main
   thread periodically.
 * :ref:`Access logs <arch_access_logging>` are written to the access log
 * :ref:`Trace <arch_overview_tracing>` spans are finalized. If our example request was traced, a
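The statement in the Post-request processing hunk that stats are batched and written periodically by the main thread can be sketched as worker-local counters that a periodic flush drains and merges into global totals. WorkerStats and StatsStore are hypothetical; the sketch omits the timer that would call flush and makes no claim about how Envoy's stats store is actually structured.

```python
import threading
from collections import Counter


class WorkerStats:
    """Per-worker counters; only the owning worker and the flush ever touch this lock."""

    def __init__(self):
        self._local = Counter()
        self._lock = threading.Lock()

    def inc(self, name: str, n: int = 1) -> None:
        with self._lock:
            self._local[name] += n

    def drain(self) -> Counter:
        # Swap in an empty batch and hand the filled one to the caller.
        with self._lock:
            batch, self._local = self._local, Counter()
        return batch


class StatsStore:
    def __init__(self, workers):
        self.workers = workers
        self.totals = Counter()

    def flush(self) -> dict:
        # Intended to run periodically on the main thread: merge every worker's batch into the totals.
        for w in self.workers:
            self.totals.update(w.drain())
        return dict(self.totals)
```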

@@ -7,12 +7,12 @@ A few definitions before we dive into the main architecture documentation. Also
 to keep things consistent in logs and traces, and introduces and clarifies concepts are is relates to LLM applications.
 
 **Agent**: An application that uses LLMs to handle wide-ranging tasks from users via prompts. This could be as simple
-as retrieving or summarizing data from an API, or being able to trigger compleix actions like adjusting ad campaigns, or
+as retrieving or summarizing data from an API, or being able to trigger complex actions like adjusting ad campaigns, or
 changing travel plans via prompts.
 
 **Arch Config**: Arch operates based on a configuration that controls the behavior of a single instance of the Arch gateway.
 This where you enable capabilities like LLM routing, fast function calling (via prompt_targets), applying guardrails, and enabling critical
-features like metrics and tracing. For the full configuration reference of `arch_config.yaml` see :ref:`here <configuration_refernce>`.
+features like metrics and tracing. For the full configuration reference of `arch_config.yaml` see :ref:`here <configuration_reference>`.
 
 **Downstream(Ingress)**: An downstream client (web application, etc.) connects to Arch, sends prompts, and receives responses.
 

@@ -37,11 +37,11 @@ code to LLMs.
 undifferentiated work in building generative AI apps. Prompt targets are endpoints that receive prompts that are processed by Arch.
 For example, Arch enriches incoming prompts with metadata like knowing when a request is a follow-up or clarifying prompt so that you
 can build faster, more accurate retrieval (RAG) apps. To support agentic apps, like scheduling travel plans or sharing comments on a
-document - via prompts, Arch uses its function calling abilities to extract critical information fromthe incoming prompt (or a set of
+document - via prompts, Arch uses its function calling abilities to extract critical information from the incoming prompt (or a set of
 prompts) needed by a downstream backend API or function call before calling it directly.
 
 **Model Serving**: Arch is a set of `two` self-contained processes that are designed to run alongside your application servers
-(or on a separate hostconnected via a network).The :ref:`model serving <model_serving>` process helps Arch make intelligent decisions
+(or on a separate host connected via a network).The :ref:`model serving <model_serving>` process helps Arch make intelligent decisions
 about the incoming prompts. The model server is designed to call the (fast) purpose-built LLMs in Arch.
 
 **Error Target**: :ref:`Error targets <error_target>` are those endpoints that receive forwarded errors from Arch when issues arise,
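The Prompt Targets definition in this hunk says Arch extracts the information a backend API needs from one prompt or a set of prompts before calling it. A hedged sketch of that gating step: values extracted per turn (by a function-calling model, not shown) are merged, and the backend is called only once every required parameter is present; otherwise the missing names are returned so a clarifying question can be asked. TargetSpec, gather_params and maybe_call are hypothetical names, not Arch's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TargetSpec:
    name: str
    required: list[str]          # parameter names the backend needs
    call: Callable[[dict], str]  # hypothetical wrapper around the backend API


def gather_params(turns: list[dict]) -> dict:
    """Merge per-turn extractions; later turns override earlier ones, None means 'not found'."""
    merged: dict = {}
    for extracted in turns:
        merged.update({k: v for k, v in extracted.items() if v is not None})
    return merged


def maybe_call(spec: TargetSpec, turns: list[dict]) -> dict:
    params = gather_params(turns)
    missing = [p for p in spec.required if p not in params]
    if missing:
        # Do not call the backend yet; the caller can ask a clarifying question instead.
        return {"status": "need_info", "missing": missing}
    return {"status": "ok", "result": spec.call(params)}
```

Merging across turns is what lets a follow-up prompt such as "for 3 days" complete a request that the first prompt left incomplete.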