mirror of
https://github.com/katanemo/plano.git
synced 2026-06-29 15:49:40 +02:00
deploy: 11fba23f1f
This commit is contained in:
parent
1075c1f42c
commit
ebe1cbd1fd
14 changed files with 132 additions and 176 deletions
|
|
@ -188,7 +188,7 @@ across applications.</p>
|
|||
<p class="admonition-title">Note</p>
|
||||
<p>When you start Arch, it creates a listener port for egress traffic based on the presence of <code class="docutils literal notranslate"><span class="pre">llm_providers</span></code>
|
||||
configuration section in the <code class="docutils literal notranslate"><span class="pre">arch_config.yml</span></code> file. Arch binds itself to a local address such as
|
||||
<code class="docutils literal notranslate"><span class="pre">127.0.0.1:51001/v1</span></code>.</p>
|
||||
<code class="docutils literal notranslate"><span class="pre">127.0.0.1:12000</span></code>.</p>
|
||||
</div>
|
||||
<p>Arch also offers vendor-agnostic SDKs and libraries to make LLM calls to API-based LLM providers (like OpenAI,
|
||||
Anthropic, Mistral, Cohere, etc.) and supports calls to OSS LLMs that are hosted on your infrastructure. Arch
|
||||
|
|
@ -201,7 +201,7 @@ make outbound LLM calls.</p>
|
|||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="kn">from</span> <span class="nn">openai</span> <span class="kn">import</span> <span class="n">OpenAI</span>
|
||||
</span><span id="line-2">
|
||||
</span><span id="line-3"><span class="c1"># Initialize the Arch client</span>
|
||||
</span><span id="line-4"><span class="n">client</span> <span class="o">=</span> <span class="n">OpenAI</span><span class="p">(</span><span class="n">base_url</span><span class="o">=</span><span class="s2">"http://127.0.0.1:51001/v1"</span><span class="p">)</span>
|
||||
</span><span id="line-4"><span class="n">client</span> <span class="o">=</span> <span class="n">OpenAI</span><span class="p">(</span><span class="n">base_url</span><span class="o">=</span><span class="s2">"http://127.0.0.12000/"</span><span class="p">)</span>
|
||||
</span><span id="line-5">
|
||||
</span><span id="line-6"><span class="c1"># Define your LLM provider and prompt</span>
|
||||
</span><span id="line-7"><span class="n">llm_provider</span> <span class="o">=</span> <span class="s2">"openai"</span>
|
||||
|
|
|
|||
|
|
@ -254,22 +254,22 @@ Here is a full list of parameter attributes that Arch can support:</p>
|
|||
</section>
|
||||
<section id="example-configuration">
|
||||
<h3>Example Configuration<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#example-configuration" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#example-configuration'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="n">prompt_targets</span><span class="p">:</span>
|
||||
</span><span id="line-2"> <span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">get_weather</span>
|
||||
</span><span id="line-3"> <span class="n">description</span><span class="p">:</span> <span class="n">Get</span> <span class="n">the</span> <span class="n">current</span> <span class="n">weather</span> <span class="k">for</span> <span class="n">a</span> <span class="n">location</span>
|
||||
</span><span id="line-4"> <span class="n">parameters</span><span class="p">:</span>
|
||||
</span><span id="line-5"> <span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">location</span>
|
||||
</span><span id="line-6"> <span class="n">description</span><span class="p">:</span> <span class="n">The</span> <span class="n">city</span> <span class="ow">and</span> <span class="n">state</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">g</span><span class="o">.</span> <span class="n">San</span> <span class="n">Francisco</span><span class="p">,</span> <span class="n">New</span> <span class="n">York</span>
|
||||
</span><span id="line-7"> <span class="nb">type</span><span class="p">:</span> <span class="nb">str</span>
|
||||
</span><span id="line-8"> <span class="n">required</span><span class="p">:</span> <span class="n">true</span>
|
||||
</span><span id="line-9"> <span class="o">-</span> <span class="n">name</span><span class="p">:</span> <span class="n">unit</span>
|
||||
</span><span id="line-10"> <span class="n">description</span><span class="p">:</span> <span class="n">The</span> <span class="n">unit</span> <span class="n">of</span> <span class="n">temperature</span>
|
||||
</span><span id="line-11"> <span class="nb">type</span><span class="p">:</span> <span class="nb">str</span>
|
||||
</span><span id="line-12"> <span class="n">default</span><span class="p">:</span> <span class="n">fahrenheit</span>
|
||||
</span><span id="line-13"> <span class="n">enum</span><span class="p">:</span> <span class="p">[</span><span class="n">celsius</span><span class="p">,</span> <span class="n">fahrenheit</span><span class="p">]</span>
|
||||
</span><span id="line-14"> <span class="n">endpoint</span><span class="p">:</span>
|
||||
</span><span id="line-15"> <span class="n">name</span><span class="p">:</span> <span class="n">api_server</span>
|
||||
</span><span id="line-16"> <span class="n">path</span><span class="p">:</span> <span class="o">/</span><span class="n">weather</span>
|
||||
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">prompt_targets</span><span class="p">:</span>
|
||||
</span><span id="line-2"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">get_weather</span>
|
||||
</span><span id="line-3"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Get the current weather for a location</span>
|
||||
</span><span id="line-4"><span class="w"> </span><span class="nt">parameters</span><span class="p">:</span>
|
||||
</span><span id="line-5"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">location</span>
|
||||
</span><span id="line-6"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">The city and state, e.g. San Francisco, New York</span>
|
||||
</span><span id="line-7"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">str</span>
|
||||
</span><span id="line-8"><span class="w"> </span><span class="nt">required</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
|
||||
</span><span id="line-9"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">unit</span>
|
||||
</span><span id="line-10"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">The unit of temperature</span>
|
||||
</span><span id="line-11"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">str</span>
|
||||
</span><span id="line-12"><span class="w"> </span><span class="nt">default</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fahrenheit</span>
|
||||
</span><span id="line-13"><span class="w"> </span><span class="nt">enum</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">celsius</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">fahrenheit</span><span class="p p-Indicator">]</span>
|
||||
</span><span id="line-14"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
|
||||
</span><span id="line-15"><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">api_server</span>
|
||||
</span><span id="line-16"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/weather</span>
|
||||
</span></code></pre></div>
|
||||
</div>
|
||||
</section>
|
||||
|
|
|
|||
|
|
@ -162,7 +162,6 @@ The errors are communicated to the application via headers like <code class="doc
|
|||
<ul class="simple">
|
||||
<li><p><strong>Error Type</strong>: Categorizes the nature of the error, such as “ValidationError” or “RuntimeError.” These error types help in identifying what kind of issue occurred and provide context for troubleshooting.</p></li>
|
||||
<li><p><strong>Error Message</strong>: A clear, human-readable message describing the error. This should provide enough detail to inform users or developers of the root cause or required action.</p></li>
|
||||
<li><p><strong>Target Prompt</strong>: The specific prompt or operation where the error occurred. Understanding where the error happened helps with debugging and pinpointing the source of the problem.</p></li>
|
||||
<li><p><strong>Parameter-Specific Errors</strong>: Errors that arise due to invalid or missing parameters when invoking a function. These errors are critical for ensuring the correctness of inputs.</p></li>
|
||||
</ul>
|
||||
</section>
|
||||
|
|
|
|||
|
|
@ -177,7 +177,7 @@ containing two key-value pairs:</p>
|
|||
</section>
|
||||
<section id="prompt-guard">
|
||||
<h2>Prompt Guard<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#prompt-guard" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#prompt-guard'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>Arch is engineered with <a class="reference internal" href="../../guides/prompt_guard.html#prompt-guard"><span class="std std-ref">Arch-Guard</span></a>, an industry leading safety layer, powered by a
|
||||
<p>Arch is engineered with <a class="reference external" href="https://huggingface.co/collections/katanemo/arch-guard-6702bdc08b889e4bce8f446d" rel="nofollow noopener">Arch-Guard<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>, an industry leading safety layer, powered by a
|
||||
compact and high-performimg LLM that monitors incoming prompts to detect and reject jailbreak attempts -
|
||||
ensuring that unauthorized or harmful behaviors are intercepted early in the process.</p>
|
||||
<p>To add jailbreak guardrails, see example below:</p>
|
||||
|
|
@ -221,7 +221,7 @@ etc. To offer feedback on our roadmap, please visit our <a class="reference exte
|
|||
<section id="prompt-targets">
|
||||
<h2>Prompt Targets<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#prompt-targets" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#prompt-targets'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>Once a prompt passes any configured guardrail checks, Arch processes the contents of the incoming conversation
|
||||
and identifies where to forwad the conversation to via its <code class="docutils literal notranslate"><span class="pre">prompt_targets</span></code> primitve. Prompt targets are endpoints
|
||||
and identifies where to forwad the conversation to via its <code class="docutils literal notranslate"><span class="pre">prompt</span> <span class="pre">target</span></code> primitve. Prompt targets are endpoints
|
||||
that receive prompts that are processed by Arch. For example, Arch enriches incoming prompts with metadata like knowing
|
||||
when a user’s intent has changed so that you can build faster, more accurate RAG apps.</p>
|
||||
<p>Configuring <code class="docutils literal notranslate"><span class="pre">prompt_targets</span></code> is simple. See example below:</p>
|
||||
|
|
@ -302,55 +302,46 @@ when a user’s intent has changed so that you can build faster, more accurate R
|
|||
<p class="admonition-title">See also</p>
|
||||
<p>Check <a class="reference internal" href="../prompt_target.html#prompt-target"><span class="std std-ref">Prompt Target</span></a> for more details!</p>
|
||||
</div>
|
||||
<section id="intent-detection-and-prompt-matching">
|
||||
<h3>Intent Detection and Prompt Matching:<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#intent-detection-and-prompt-matching" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#intent-detection-and-prompt-matching'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
|
||||
<p>Arch uses fast Natural Language Inference (NLI) and embedding approaches to first detect the intent of each
|
||||
incoming prompt. This intent detection phase analyzes the prompt’s content and matches it against predefined
|
||||
prompt targets, ensuring that each prompt is forwarded to the most appropriate endpoint. Arch’s intent
|
||||
detection framework considers both the name and description of each prompt target, and uses a composite matching
|
||||
score between an NLI and cosine similarity to enchance accuracy in forwarding decisions.</p>
|
||||
<section id="intent-matching">
|
||||
<h3>Intent Matching<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#intent-matching" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#intent-matching'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
|
||||
<p>Arch uses fast text embedding and intent recognition approaches to first detect the intent of each incoming prompt.
|
||||
This intent matching phase analyzes the prompt’s content and matches it against predefined prompt targets, ensuring that each prompt is forwarded to the most appropriate endpoint.
|
||||
Arch’s intent matching framework considers both the name and description of each prompt target, and uses a composite matching score between embedding similarity and intent classification scores to enchance accuracy in forwarding decisions.</p>
|
||||
<ul class="simple">
|
||||
<li><p><strong>Embeddings</strong>: By embedding the prompt and comparing it to known target vectors, Arch effectively identifies
|
||||
the closest match, ensuring that the prompt is handled by the correct downstream service.</p></li>
|
||||
<li><p><strong>NLI</strong>: NLI techniques further refine the matching process by evaluating the semantic alignment between the
|
||||
prompt and potential targets.</p></li>
|
||||
<li><p><strong>Intent Recognition</strong>: NLI techniques further refine the matching process by evaluating the semantic alignment between the prompt and potential targets.</p></li>
|
||||
<li><p><strong>Text Embedding</strong>: By embedding the prompt and comparing it to known target vectors, Arch effectively identifies the closest match, ensuring that the prompt is handled by the correct downstream service.</p></li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="agentic-apps-via-prompt-targets">
|
||||
<h3>Agentic Apps via Prompt Targets<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#agentic-apps-via-prompt-targets" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#agentic-apps-via-prompt-targets'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
|
||||
<p>To support agentic apps, like scheduling travel plans or sharing comments on a document - via prompts, Arch uses
|
||||
its function calling abilities to extract critical information from the incoming prompt (or a set of prompts)
|
||||
needed by a downstream backend API or function call before calling it directly. For more details on how you can
|
||||
build agentic applications using Arch, see our full guide <a class="reference internal" href="../../build_with_arch/agent.html#arch-agent-guide"><span class="std std-ref">here</span></a>:</p>
|
||||
<p>To support agentic apps, like scheduling travel plans or sharing comments on a document - via prompts, Arch uses its function calling abilities to extract critical information from the incoming prompt (or a set of prompts) needed by a downstream backend API or function call before calling it directly.
|
||||
For more details on how you can build agentic applications using Arch, see our full guide <a class="reference internal" href="../../build_with_arch/agent.html#arch-agent-guide"><span class="std std-ref">here</span></a>:</p>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>Arch <a class="reference internal" href="../../guides/function_calling.html#function-calling"><span class="std std-ref">Arch-Function</span></a> is the dedicated agentic model engineered in Arch to extract information from
|
||||
a (set of) prompts and executes necessary backend API calls. This allows for efficient handling of agentic tasks,
|
||||
such as scheduling data retrieval, by dynamically interacting with backend services. Arch-Function is a flagship 1.3
|
||||
billion parameter model that matches performance with frontier models like Claude Sonnet 3.5 ang GPT-4, while
|
||||
being 100x cheaper ($0.05M/token hosted) and 10x faster (p50 latencies of 200ms).</p>
|
||||
<p><a class="reference external" href="https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68" rel="nofollow noopener">Arch-Function<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a> is a collection of dedicated agentic models engineered in Arch to extract information from a (set of) prompts and executes necessary backend API calls.
|
||||
This allows for efficient handling of agentic tasks, such as scheduling data retrieval, by dynamically interacting with backend services.
|
||||
Arch-Function achieves state-of-the-art performance, comparable with frontier models like Claude Sonnet 3.5 ang GPT-4, while being 100x cheaper ($0.05M/token hosted) and 10x faster (p50 latencies of 200ms).</p>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
<section id="prompting-llms">
|
||||
<h2>Prompting LLMs<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#prompting-llms" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#prompting-llms'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>Arch is a single piece of software that is designed to manage both ingress and egress prompt traffic, drawing its
|
||||
distributed proxy nature from the robust <a class="reference external" href="https://envoyproxy.io" rel="nofollow noopener">Envoy<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>. This makes it extremely efficient and capable
|
||||
of handling upstream connections to LLMs. If your application is originating code to an API-based LLM, simply use
|
||||
the OpenAI client and configure it with Arch. By sending traffic through Arch, you can propagate traces, manage and monitor
|
||||
traffic, apply rate limits, and utilize a large set of traffic management capabilities in a centralized way.</p>
|
||||
<p>Arch is a single piece of software that is designed to manage both ingress and egress prompt traffic, drawing its distributed proxy nature from the robust <a class="reference external" href="https://envoyproxy.io" rel="nofollow noopener">Envoy<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>.
|
||||
This makes it extremely efficient and capable of handling upstream connections to LLMs.
|
||||
If your application is originating code to an API-based LLM, simply use the OpenAI client and configure it with Arch.
|
||||
By sending traffic through Arch, you can propagate traces, manage and monitor traffic, apply rate limits, and utilize a large set of traffic management capabilities in a centralized way.</p>
|
||||
<div class="admonition attention">
|
||||
<p class="admonition-title">Attention</p>
|
||||
<p>When you start Arch, it automatically creates a listener port for egress calls to upstream LLMs. This is based on the
|
||||
<code class="docutils literal notranslate"><span class="pre">llm_providers</span></code> configuration section in the <code class="docutils literal notranslate"><span class="pre">arch_config.yml</span></code> file. Arch binds itself to a local address such as
|
||||
127.0.0.1:12000/v1.</p>
|
||||
<code class="docutils literal notranslate"><span class="pre">127.0.0.1:12000</span></code>.</p>
|
||||
</div>
|
||||
<section id="example-using-openai-client-with-arch-as-an-egress-gateway">
|
||||
<h3>Example: Using OpenAI Client with Arch as an Egress Gateway<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#example-using-openai-client-with-arch-as-an-egress-gateway" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#example-using-openai-client-with-arch-as-an-egress-gateway'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="kn">import</span> <span class="nn">openai</span>
|
||||
</span><span id="line-2">
|
||||
</span><span id="line-3"><span class="c1"># Set the OpenAI API base URL to the Arch gateway endpoint</span>
|
||||
</span><span id="line-4"><span class="n">openai</span><span class="o">.</span><span class="n">api_base</span> <span class="o">=</span> <span class="s2">"http://127.0.0.1:12000/v1"</span>
|
||||
</span><span id="line-4"><span class="n">openai</span><span class="o">.</span><span class="n">api_base</span> <span class="o">=</span> <span class="s2">"http://127.0.0.1:12000"</span>
|
||||
</span><span id="line-5">
|
||||
</span><span id="line-6"><span class="c1"># No need to set openai.api_key since it's configured in Arch's gateway</span>
|
||||
</span><span id="line-7">
|
||||
|
|
@ -364,7 +355,7 @@ traffic, apply rate limits, and utilize a large set of traffic management capabi
|
|||
</span></code></pre></div>
|
||||
</div>
|
||||
<p>In these examples, the OpenAI client is used to send traffic directly through the Arch egress proxy to the LLM of your choice, such as OpenAI.
|
||||
The OpenAI client is configured to route traffic via Arch by setting the proxy to <code class="docutils literal notranslate"><span class="pre">127.0.0.1:51001</span></code>, assuming Arch is running locally and bound to that address and port.
|
||||
The OpenAI client is configured to route traffic via Arch by setting the proxy to <code class="docutils literal notranslate"><span class="pre">127.0.0.1:12000</span></code>, assuming Arch is running locally and bound to that address and port.
|
||||
This setup allows you to take advantage of Arch’s advanced traffic management features while interacting with LLM APIs like OpenAI.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
|
@ -392,7 +383,7 @@ This setup allows you to take advantage of Arch’s advanced traffic management
|
|||
<li><a :data-current="activeSection === '#messages'" class="reference internal" href="#messages">Messages</a></li>
|
||||
<li><a :data-current="activeSection === '#prompt-guard'" class="reference internal" href="#prompt-guard">Prompt Guard</a></li>
|
||||
<li><a :data-current="activeSection === '#prompt-targets'" class="reference internal" href="#prompt-targets">Prompt Targets</a><ul>
|
||||
<li><a :data-current="activeSection === '#intent-detection-and-prompt-matching'" class="reference internal" href="#intent-detection-and-prompt-matching">Intent Detection and Prompt Matching:</a></li>
|
||||
<li><a :data-current="activeSection === '#intent-matching'" class="reference internal" href="#intent-matching">Intent Matching</a></li>
|
||||
<li><a :data-current="activeSection === '#agentic-apps-via-prompt-targets'" class="reference internal" href="#agentic-apps-via-prompt-targets">Agentic Apps via Prompt Targets</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
|
|
|
|||
|
|
@ -287,8 +287,6 @@ enables scaling to very high core count CPUs.</p>
|
|||
</section>
|
||||
<section id="request-flow-ingress">
|
||||
<h2>Request Flow (Ingress)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#request-flow-ingress" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#request-flow-ingress'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<section id="overview">
|
||||
<h3>Overview<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#overview" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#overview'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
|
||||
<p>A brief outline of the lifecycle of a request and response using the example configuration above:</p>
|
||||
<ol class="arabic simple">
|
||||
<li><p><strong>TCP Connection Establishment</strong>:
|
||||
|
|
@ -302,7 +300,7 @@ that harmful or unwanted behaviors are detected early in the request processing
|
|||
The decrypted data stream is deframed by the HTTP/2 codec in Arch’s HTTP connection manager. Arch performs
|
||||
intent matching via is <strong>prompt-handler</strong> subsystem using the name and description of the defined prompt targets,
|
||||
determining which endpoint should handle the prompt.</p></li>
|
||||
<li><p><strong>Parameter Gathering with Arch-FC</strong>:
|
||||
<li><p><strong>Parameter Gathering with Arch-Function</strong>:
|
||||
If a prompt target requires specific parameters, Arch engages Arch-FC to extract the necessary details
|
||||
from the incoming prompt(s). This process gathers the critical information needed for downstream API calls.</p></li>
|
||||
<li><p><strong>API Call Execution</strong>:
|
||||
|
|
@ -310,7 +308,7 @@ Arch routes the prompt to the appropriate backend API or function call. If an en
|
|||
load balancing is performed, circuit breakers are checked, and the request is proxied to the upstream endpoint.</p></li>
|
||||
<li><p><strong>Default Summarization by Upstream LLM</strong>:
|
||||
By default, if no specific endpoint processing is needed, the prompt is sent to an upstream LLM for summarization.
|
||||
This ensures that responses are concise and relevant, enhancing user experience in RAG (Retrieval-Augmented Generation)
|
||||
This ensures that responses are concise and relevant, enhancing user experience in RAG (Retrieval Augmented Generation)
|
||||
and agentic applications.</p></li>
|
||||
<li><p><strong>Error Handling and Forwarding</strong>:
|
||||
Errors encountered during processing, such as failed function calls or guardrail detections, are forwarded to
|
||||
|
|
@ -326,14 +324,9 @@ The upstream endpoint’s TLS transport socket encrypts the response, which is t
|
|||
Responses pass through HTTP filters in reverse order, ensuring any necessary processing or modification before final delivery.</p></li>
|
||||
</ol>
|
||||
</section>
|
||||
</section>
|
||||
<section id="request-flow-egress">
|
||||
<h2>Request Flow (Egress)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#request-flow-egress" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#request-flow-egress'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
</section>
|
||||
<section id="id1">
|
||||
<h2>Overview<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id1" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#id1'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>A brief outline of the lifecycle of a request and response in the context of egress traffic from an application
|
||||
to Large Language Models (LLMs) via Arch:</p>
|
||||
<p>A brief outline of the lifecycle of a request and response in the context of egress traffic from an application to Large Language Models (LLMs) via Arch:</p>
|
||||
<ol class="arabic simple">
|
||||
<li><p><strong>HTTP Connection Establishment to LLM</strong>:
|
||||
Arch initiates an HTTP connection to the upstream LLM service. This connection is handled by Arch’s egress listener
|
||||
|
|
@ -393,12 +386,8 @@ processing request headers and then finalized by the HCM during post-request pro
|
|||
<li><a :data-current="activeSection === '#network-topology'" class="reference internal" href="#network-topology">Network topology</a></li>
|
||||
<li><a :data-current="activeSection === '#high-level-architecture'" class="reference internal" href="#high-level-architecture">High level architecture</a></li>
|
||||
<li><a :data-current="activeSection === '#configuration'" class="reference internal" href="#configuration">Configuration</a></li>
|
||||
<li><a :data-current="activeSection === '#request-flow-ingress'" class="reference internal" href="#request-flow-ingress">Request Flow (Ingress)</a><ul>
|
||||
<li><a :data-current="activeSection === '#overview'" class="reference internal" href="#overview">Overview</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a :data-current="activeSection === '#request-flow-egress'" class="reference internal" href="#request-flow-egress">Request Flow (Egress)</a></li>
|
||||
<li><a :data-current="activeSection === '#id1'" class="reference internal" href="#id1">Overview</a><ul>
|
||||
<li><a :data-current="activeSection === '#request-flow-ingress'" class="reference internal" href="#request-flow-ingress">Request Flow (Ingress)</a></li>
|
||||
<li><a :data-current="activeSection === '#request-flow-egress'" class="reference internal" href="#request-flow-egress">Request Flow (Egress)</a><ul>
|
||||
<li><a :data-current="activeSection === '#post-request-processing'" class="reference internal" href="#post-request-processing">Post-request processing</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
|
|
|
|||
|
|
@ -182,7 +182,6 @@
|
|||
<li class="toctree-l2"><a class="reference internal" href="request_lifecycle.html#configuration">Configuration</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="request_lifecycle.html#request-flow-ingress">Request Flow (Ingress)</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="request_lifecycle.html#request-flow-egress">Request Flow (Egress)</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="request_lifecycle.html#id1">Overview</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="error_target.html">Error Target</a><ul>
|
||||
|
|
|
|||
|
|
@ -173,7 +173,7 @@ For more details, check out <a class="reference internal" href="../llm_provider.
|
|||
networking operations (auth, tls, observability, etc) and the second process to serve models that enable it to make smart
|
||||
decisions on how to accept, handle and forward prompts. The second process is optional, as the model serving sevice could be
|
||||
hosted on a different network (an API call). But these two processes are considered a single instance of Arch.</p>
|
||||
<p><strong>Prompt Target</strong>: Arch offers a primitive called <a class="reference internal" href="../prompt_target.html#prompt-target"><span class="std std-ref">prompt_target</span></a> to help separate business logic from undifferentiated
|
||||
<p><strong>Prompt Target</strong>: Arch offers a primitive called <a class="reference internal" href="../prompt_target.html#prompt-target"><span class="std std-ref">prompt target</span></a> to help separate business logic from undifferentiated
|
||||
work in building generative AI apps. Prompt targets are endpoints that receive prompts that are processed by Arch.
|
||||
For example, Arch enriches incoming prompts with metadata like knowing when a request is a follow-up or clarifying prompt
|
||||
so that you can build faster, more accurate retrieval (RAG) apps. To support agentic apps, like scheduling travel plans or
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue