mirror of
https://github.com/katanemo/plano.git
synced 2026-04-26 09:16:24 +02:00
parent
7b51cce2f7
commit
11fba23f1f
14 changed files with 82 additions and 118 deletions
|
|
@ -16,8 +16,6 @@ Key Concepts
|
|||
|
||||
- **Error Message**: A clear, human-readable message describing the error. This should provide enough detail to inform users or developers of the root cause or required action.
|
||||
|
||||
- **Target Prompt**: The specific prompt or operation where the error occurred. Understanding where the error happened helps with debugging and pinpointing the source of the problem.
|
||||
|
||||
- **Parameter-Specific Errors**: Errors that arise due to invalid or missing parameters when invoking a function. These errors are critical for ensuring the correctness of inputs.
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -27,7 +27,7 @@ containing two key-value pairs:
|
|||
Prompt Guard
|
||||
-----------------
|
||||
|
||||
Arch is engineered with :ref:`Arch-Guard <prompt_guard>`, an industry leading safety layer, powered by a
|
||||
Arch is engineered with `Arch-Guard <https://huggingface.co/collections/katanemo/arch-guard-6702bdc08b889e4bce8f446d>`_, an industry leading safety layer, powered by a
|
||||
compact and high-performimg LLM that monitors incoming prompts to detect and reject jailbreak attempts -
|
||||
ensuring that unauthorized or harmful behaviors are intercepted early in the process.
|
||||
|
||||
|
|
@ -50,7 +50,7 @@ Prompt Targets
|
|||
--------------
|
||||
|
||||
Once a prompt passes any configured guardrail checks, Arch processes the contents of the incoming conversation
|
||||
and identifies where to forwad the conversation to via its ``prompt_targets`` primitve. Prompt targets are endpoints
|
||||
and identifies where to forwad the conversation to via its ``prompt target`` primitve. Prompt targets are endpoints
|
||||
that receive prompts that are processed by Arch. For example, Arch enriches incoming prompts with metadata like knowing
|
||||
when a user's intent has changed so that you can build faster, more accurate RAG apps.
|
||||
|
||||
|
|
@ -67,48 +67,39 @@ Configuring ``prompt_targets`` is simple. See example below:
|
|||
|
||||
Check :ref:`Prompt Target <prompt_target>` for more details!
|
||||
|
||||
Intent Detection and Prompt Matching:
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
Intent Matching
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
Arch uses fast Natural Language Inference (NLI) and embedding approaches to first detect the intent of each
|
||||
incoming prompt. This intent detection phase analyzes the prompt's content and matches it against predefined
|
||||
prompt targets, ensuring that each prompt is forwarded to the most appropriate endpoint. Arch’s intent
|
||||
detection framework considers both the name and description of each prompt target, and uses a composite matching
|
||||
score between an NLI and cosine similarity to enchance accuracy in forwarding decisions.
|
||||
Arch uses fast text embedding and intent recognition approaches to first detect the intent of each incoming prompt.
|
||||
This intent matching phase analyzes the prompt's content and matches it against predefined prompt targets, ensuring that each prompt is forwarded to the most appropriate endpoint.
|
||||
Arch’s intent matching framework considers both the name and description of each prompt target, and uses a composite matching score between embedding similarity and intent classification scores to enchance accuracy in forwarding decisions.
|
||||
|
||||
- **Embeddings**: By embedding the prompt and comparing it to known target vectors, Arch effectively identifies
|
||||
the closest match, ensuring that the prompt is handled by the correct downstream service.
|
||||
- **Intent Recognition**: NLI techniques further refine the matching process by evaluating the semantic alignment between the prompt and potential targets.
|
||||
|
||||
- **NLI**: NLI techniques further refine the matching process by evaluating the semantic alignment between the
|
||||
prompt and potential targets.
|
||||
- **Text Embedding**: By embedding the prompt and comparing it to known target vectors, Arch effectively identifies the closest match, ensuring that the prompt is handled by the correct downstream service.
|
||||
|
||||
Agentic Apps via Prompt Targets
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
To support agentic apps, like scheduling travel plans or sharing comments on a document - via prompts, Arch uses
|
||||
its function calling abilities to extract critical information from the incoming prompt (or a set of prompts)
|
||||
needed by a downstream backend API or function call before calling it directly. For more details on how you can
|
||||
build agentic applications using Arch, see our full guide :ref:`here <arch_agent_guide>`:
|
||||
To support agentic apps, like scheduling travel plans or sharing comments on a document - via prompts, Arch uses its function calling abilities to extract critical information from the incoming prompt (or a set of prompts) needed by a downstream backend API or function call before calling it directly.
|
||||
For more details on how you can build agentic applications using Arch, see our full guide :ref:`here <arch_agent_guide>`:
|
||||
|
||||
.. Note::
|
||||
Arch :ref:`Arch-Function <function_calling>` is the dedicated agentic model engineered in Arch to extract information from
|
||||
a (set of) prompts and executes necessary backend API calls. This allows for efficient handling of agentic tasks,
|
||||
such as scheduling data retrieval, by dynamically interacting with backend services. Arch-Function is a flagship 1.3
|
||||
billion parameter model that matches performance with frontier models like Claude Sonnet 3.5 ang GPT-4, while
|
||||
being 100x cheaper ($0.05M/token hosted) and 10x faster (p50 latencies of 200ms).
|
||||
`Arch-Function <https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68>`_ is a collection of dedicated agentic models engineered in Arch to extract information from a (set of) prompts and executes necessary backend API calls.
|
||||
This allows for efficient handling of agentic tasks, such as scheduling data retrieval, by dynamically interacting with backend services.
|
||||
Arch-Function achieves state-of-the-art performance, comparable with frontier models like Claude Sonnet 3.5 ang GPT-4, while being 100x cheaper ($0.05M/token hosted) and 10x faster (p50 latencies of 200ms).
|
||||
|
||||
Prompting LLMs
|
||||
--------------
|
||||
Arch is a single piece of software that is designed to manage both ingress and egress prompt traffic, drawing its
|
||||
distributed proxy nature from the robust `Envoy <https://envoyproxy.io>`_. This makes it extremely efficient and capable
|
||||
of handling upstream connections to LLMs. If your application is originating code to an API-based LLM, simply use
|
||||
the OpenAI client and configure it with Arch. By sending traffic through Arch, you can propagate traces, manage and monitor
|
||||
traffic, apply rate limits, and utilize a large set of traffic management capabilities in a centralized way.
|
||||
Arch is a single piece of software that is designed to manage both ingress and egress prompt traffic, drawing its distributed proxy nature from the robust `Envoy <https://envoyproxy.io>`_.
|
||||
This makes it extremely efficient and capable of handling upstream connections to LLMs.
|
||||
If your application is originating code to an API-based LLM, simply use the OpenAI client and configure it with Arch.
|
||||
By sending traffic through Arch, you can propagate traces, manage and monitor traffic, apply rate limits, and utilize a large set of traffic management capabilities in a centralized way.
|
||||
|
||||
.. Attention::
|
||||
When you start Arch, it automatically creates a listener port for egress calls to upstream LLMs. This is based on the
|
||||
``llm_providers`` configuration section in the ``arch_config.yml`` file. Arch binds itself to a local address such as
|
||||
127.0.0.1:12000/v1.
|
||||
``127.0.0.1:12000``.
|
||||
|
||||
|
||||
Example: Using OpenAI Client with Arch as an Egress Gateway
|
||||
|
|
@ -119,7 +110,7 @@ Example: Using OpenAI Client with Arch as an Egress Gateway
|
|||
import openai
|
||||
|
||||
# Set the OpenAI API base URL to the Arch gateway endpoint
|
||||
openai.api_base = "http://127.0.0.1:12000/v1"
|
||||
openai.api_base = "http://127.0.0.1:12000"
|
||||
|
||||
# No need to set openai.api_key since it's configured in Arch's gateway
|
||||
|
||||
|
|
@ -132,5 +123,5 @@ Example: Using OpenAI Client with Arch as an Egress Gateway
|
|||
print("OpenAI Response:", response.choices[0].text.strip())
|
||||
|
||||
In these examples, the OpenAI client is used to send traffic directly through the Arch egress proxy to the LLM of your choice, such as OpenAI.
|
||||
The OpenAI client is configured to route traffic via Arch by setting the proxy to ``127.0.0.1:51001``, assuming Arch is running locally and bound to that address and port.
|
||||
The OpenAI client is configured to route traffic via Arch by setting the proxy to ``127.0.0.1:12000``, assuming Arch is running locally and bound to that address and port.
|
||||
This setup allows you to take advantage of Arch's advanced traffic management features while interacting with LLM APIs like OpenAI.
|
||||
|
|
|
|||
|
|
@ -87,8 +87,6 @@ Today, only support a static bootstrap configuration file for simplicity today:
|
|||
Request Flow (Ingress)
|
||||
----------------------
|
||||
|
||||
Overview
|
||||
^^^^^^^^
|
||||
A brief outline of the lifecycle of a request and response using the example configuration above:
|
||||
|
||||
1. **TCP Connection Establishment**:
|
||||
|
|
@ -105,7 +103,7 @@ A brief outline of the lifecycle of a request and response using the example con
|
|||
intent matching via is **prompt-handler** subsystem using the name and description of the defined prompt targets,
|
||||
determining which endpoint should handle the prompt.
|
||||
|
||||
4. **Parameter Gathering with Arch-FC**:
|
||||
4. **Parameter Gathering with Arch-Function**:
|
||||
If a prompt target requires specific parameters, Arch engages Arch-FC to extract the necessary details
|
||||
from the incoming prompt(s). This process gathers the critical information needed for downstream API calls.
|
||||
|
||||
|
|
@ -115,7 +113,7 @@ A brief outline of the lifecycle of a request and response using the example con
|
|||
|
||||
6. **Default Summarization by Upstream LLM**:
|
||||
By default, if no specific endpoint processing is needed, the prompt is sent to an upstream LLM for summarization.
|
||||
This ensures that responses are concise and relevant, enhancing user experience in RAG (Retrieval-Augmented Generation)
|
||||
This ensures that responses are concise and relevant, enhancing user experience in RAG (Retrieval Augmented Generation)
|
||||
and agentic applications.
|
||||
|
||||
7. **Error Handling and Forwarding**:
|
||||
|
|
@ -134,11 +132,7 @@ A brief outline of the lifecycle of a request and response using the example con
|
|||
Request Flow (Egress)
|
||||
---------------------
|
||||
|
||||
Overview
|
||||
--------
|
||||
|
||||
A brief outline of the lifecycle of a request and response in the context of egress traffic from an application
|
||||
to Large Language Models (LLMs) via Arch:
|
||||
A brief outline of the lifecycle of a request and response in the context of egress traffic from an application to Large Language Models (LLMs) via Arch:
|
||||
|
||||
1. **HTTP Connection Establishment to LLM**:
|
||||
Arch initiates an HTTP connection to the upstream LLM service. This connection is handled by Arch’s egress listener
|
||||
|
|
|
|||
|
|
@ -29,7 +29,7 @@ networking operations (auth, tls, observability, etc) and the second process to
|
|||
decisions on how to accept, handle and forward prompts. The second process is optional, as the model serving sevice could be
|
||||
hosted on a different network (an API call). But these two processes are considered a single instance of Arch.
|
||||
|
||||
**Prompt Target**: Arch offers a primitive called :ref:`prompt_target <prompt_target>` to help separate business logic from undifferentiated
|
||||
**Prompt Target**: Arch offers a primitive called :ref:`prompt target <prompt_target>` to help separate business logic from undifferentiated
|
||||
work in building generative AI apps. Prompt targets are endpoints that receive prompts that are processed by Arch.
|
||||
For example, Arch enriches incoming prompts with metadata like knowing when a request is a follow-up or clarifying prompt
|
||||
so that you can build faster, more accurate retrieval (RAG) apps. To support agentic apps, like scheduling travel plans or
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue