mirror of
https://github.com/katanemo/plano.git
synced 2026-07-02 15:51:02 +02:00
deploy: e7b0de2a72
parent
f50f1bb4a6
commit
ed2124f773
29 changed files with 64 additions and 64 deletions
|
|
@ -232,7 +232,7 @@ The errors are communicated to the application via headers like <code class="doc
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 06, 2025. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 13, 2025. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -159,14 +159,14 @@
|
|||
<span id="arch-overview-listeners"></span><h1>Listener<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#listener"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
|
||||
<p><strong>Listener</strong> is a top level primitive in Arch, which simplifies the configuration required to bind incoming
|
||||
connections from downstream clients, and for egress connections to LLMs (hosted or API)</p>
|
||||
<p>Arch builds on Envoy’s Listener subsystem to streamline connection managemet for developers. Arch minimizes
|
||||
<p>Arch builds on Envoy’s Listener subsystem to streamline connection management for developers. Arch minimizes
|
||||
the complexity of Envoy’s listener setup by using best-practices and exposing only essential settings,
|
||||
making it easier for developers to bind connections without deep knowledge of Envoy’s configuration model. This
|
||||
simplification ensures that connections are secure, reliable, and optimized for performance.</p>
|
||||
<section id="downstream-ingress">
|
||||
<h2>Downstream (Ingress)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#downstream-ingress" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#downstream-ingress'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>Developers can configure Arch to accept connections from downstream clients. A downstream listener acts as the
|
||||
primary entry point for incoming traffic, handling initial connection setup, including network filtering, gurdrails,
|
||||
primary entry point for incoming traffic, handling initial connection setup, including network filtering, guardrails,
|
||||
and additional network security checks. For more details on prompt security and safety,
|
||||
see <a class="reference internal" href="prompt.html#arch-overview-prompt-handling"><span class="std std-ref">here</span></a>.</p>
|
||||
</section>
|
||||
|
|
@ -179,7 +179,7 @@ address like <code class="docutils literal notranslate"><span class="pre">arch.l
|
|||
</section>
|
||||
<section id="configure-listener">
|
||||
<h2>Configure Listener<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#configure-listener" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#configure-listener'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>To configure a Downstream (Ingress) Listner, simply add the <code class="docutils literal notranslate"><span class="pre">listener</span></code> directive to your configuration file:</p>
|
||||
<p>To configure a Downstream (Ingress) Listener, simply add the <code class="docutils literal notranslate"><span class="pre">listener</span></code> directive to your configuration file:</p>
|
||||
<div class="literal-block-wrapper docutils container" id="id1">
|
||||
<div class="code-block-caption"><span class="caption-text">Example Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id1"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
|
||||
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="linenos"> 1</span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">v0.1</span>
|
||||
|
|
@ -236,7 +236,7 @@ address like <code class="docutils literal notranslate"><span class="pre">arch.l
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 06, 2025. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 13, 2025. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -159,14 +159,14 @@
|
|||
<span id="id1"></span><h1>Model Serving<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#model-serving"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
|
||||
<p>Arch is a set of <cite>two</cite> self-contained processes that are designed to run alongside your application
|
||||
servers (or on a separate host connected via a network). The first process is designated to manage low-level
|
||||
networking and HTTP related comcerns, and the other process is for model serving, which helps Arch make
|
||||
networking and HTTP related concerns, and the other process is for model serving, which helps Arch make
|
||||
intelligent decisions about the incoming prompts. The model server is designed to call the purpose-built
|
||||
LLMs in Arch.</p>
|
||||
<a class="reference internal image-reference" href="../../_images/arch-system-architecture.jpg"><img alt="../../_images/arch-system-architecture.jpg" class="align-center" src="../../_images/arch-system-architecture.jpg" style="width: 40%;"/>
|
||||
</a>
|
||||
<p>Arch’ is designed to be deployed in your cloud VPC, on a on-premises host, and can work on devices that don’t
|
||||
have a GPU. Note, GPU devices are need for fast and cost-efficient use, so that Arch (model server, specifically)
|
||||
can process prompts quickly and forward control back to the applicaton host. There are three modes in which Arch
|
||||
can process prompts quickly and forward control back to the application host. There are three modes in which Arch
|
||||
can be configured to run its <strong>model server</strong> subsystem:</p>
|
||||
<section id="local-serving-cpu-moderate">
|
||||
<h2>Local Serving (CPU - Moderate)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#local-serving-cpu-moderate" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#local-serving-cpu-moderate'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
|
|
@ -180,14 +180,14 @@ might not be available.</p>
|
|||
<section id="cloud-serving-gpu-blazing-fast">
|
||||
<h2>Cloud Serving (GPU - Blazing Fast)<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#cloud-serving-gpu-blazing-fast" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#cloud-serving-gpu-blazing-fast'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>The command below instructs Arch to intelligently use GPUs locally for fast intent detection, but default to
|
||||
cloud serving for function calling and guardails scenarios to dramatically improve the speed and overall performance
|
||||
cloud serving for function calling and guardrails scenarios to dramatically improve the speed and overall performance
|
||||
of your applications.</p>
|
||||
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="gp">$ </span>archgw<span class="w"> </span>up
|
||||
</span></code></pre></div>
|
||||
</div>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>Arch’s model serving in the cloud is priced at $0.05M/token (156x cheaper than GPT-4o) with averlage latency
|
||||
<p>Arch’s model serving in the cloud is priced at $0.05M/token (156x cheaper than GPT-4o) with average latency
|
||||
of 200ms (10x faster than GPT-4o). Please refer to our <a class="reference internal" href="../../get_started/quickstart.html#quickstart"><span class="std std-ref">Get Started</span></a> to know
|
||||
how to generate API keys for model serving</p>
|
||||
</div>
|
||||
|
|
@ -223,7 +223,7 @@ how to generate API keys for model serving</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 06, 2025. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 13, 2025. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -161,7 +161,7 @@
|
|||
Arch relies on Envoy’s HTTP <a class="reference external" href="https://www.envoyproxy.io/docs/envoy/v1.31.2/intro/arch_overview/http/http_connection_management" rel="nofollow noopener">connection management<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>,
|
||||
subsystem and its <strong>prompt handler</strong> subsystem engineered with purpose-built LLMs to
|
||||
implement critical functionality on behalf of developers so that you can stay focused on business logic.</p>
|
||||
<p>Arch’s <strong>prompt handler</strong> subsystem interacts with the <strong>model subsytem</strong> through Envoy’s cluster manager system to ensure robust, resilient and fault-tolerant experience in managing incoming prompts.</p>
|
||||
<p>Arch’s <strong>prompt handler</strong> subsystem interacts with the <strong>model subsystem</strong> through Envoy’s cluster manager system to ensure robust, resilient and fault-tolerant experience in managing incoming prompts.</p>
|
||||
<div class="admonition seealso">
|
||||
<p class="admonition-title">See also</p>
|
||||
<p>Read more about the <a class="reference internal" href="model_serving.html#model-serving"><span class="std std-ref">model subsystem</span></a> and how the LLMs are hosted in Arch.</p>
|
||||
|
|
@ -181,7 +181,7 @@ containing two key-value pairs:</p>
|
|||
<section id="prompt-guard">
|
||||
<h2>Prompt Guard<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#prompt-guard" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#prompt-guard'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>Arch is engineered with <a class="reference external" href="https://huggingface.co/collections/katanemo/arch-guard-6702bdc08b889e4bce8f446d" rel="nofollow noopener">Arch-Guard<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>, an industry leading safety layer, powered by a
|
||||
compact and high-performimg LLM that monitors incoming prompts to detect and reject jailbreak attempts -
|
||||
compact and high-performing LLM that monitors incoming prompts to detect and reject jailbreak attempts -
|
||||
ensuring that unauthorized or harmful behaviors are intercepted early in the process.</p>
|
||||
<p>To add jailbreak guardrails, see example below:</p>
|
||||
<div class="literal-block-wrapper docutils container" id="id1">
|
||||
|
|
@ -224,7 +224,7 @@ etc. To offer feedback on our roadmap, please visit our <a class="reference exte
|
|||
<section id="prompt-targets">
|
||||
<h2>Prompt Targets<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#prompt-targets" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#prompt-targets'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p>Once a prompt passes any configured guardrail checks, Arch processes the contents of the incoming conversation
|
||||
and identifies where to forwad the conversation to via its <code class="docutils literal notranslate"><span class="pre">prompt</span> <span class="pre">target</span></code> primitve. Prompt targets are endpoints
|
||||
and identifies where to forward the conversation to via its <code class="docutils literal notranslate"><span class="pre">prompt</span> <span class="pre">target</span></code> primitive. Prompt targets are endpoints
|
||||
that receive prompts that are processed by Arch. For example, Arch enriches incoming prompts with metadata like knowing
|
||||
when a user’s intent has changed so that you can build faster, more accurate RAG apps.</p>
|
||||
<p>Configuring <code class="docutils literal notranslate"><span class="pre">prompt_targets</span></code> is simple. See example below:</p>
|
||||
|
|
@ -304,7 +304,7 @@ when a user’s intent has changed so that you can build faster, more accurate R
|
|||
<h3>Intent Matching<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#intent-matching" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#intent-matching'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
|
||||
<p>Arch uses fast text embedding and intent recognition approaches to first detect the intent of each incoming prompt.
|
||||
This intent matching phase analyzes the prompt’s content and matches it against predefined prompt targets, ensuring that each prompt is forwarded to the most appropriate endpoint.
|
||||
Arch’s intent matching framework considers both the name and description of each prompt target, and uses a composite matching score between embedding similarity and intent classification scores to enchance accuracy in forwarding decisions.</p>
|
||||
Arch’s intent matching framework considers both the name and description of each prompt target, and uses a composite matching score between embedding similarity and intent classification scores to enhance accuracy in forwarding decisions.</p>
|
||||
<ul class="simple">
|
||||
<li><p><strong>Intent Recognition</strong>: NLI techniques further refine the matching process by evaluating the semantic alignment between the prompt and potential targets.</p></li>
|
||||
<li><p><strong>Text Embedding</strong>: By embedding the prompt and comparing it to known target vectors, Arch effectively identifies the closest match, ensuring that the prompt is handled by the correct downstream service.</p></li>
|
||||
|
|
@ -397,7 +397,7 @@ This setup allows you to take advantage of Arch’s advanced traffic management
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 06, 2025. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 13, 2025. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -159,7 +159,7 @@
|
|||
<span id="lifecycle-of-a-request"></span><h1>Request Lifecycle<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#request-lifecycle"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
|
||||
<p>Below we describe the events in the lifecycle of a request passing through an Arch gateway instance. We first
|
||||
describe how Arch fits into the request path and then the internal events that take place following
|
||||
the arrival of a request at Arch from downtream clients. We follow the request until the corresponding
|
||||
the arrival of a request at Arch from downstream clients. We follow the request until the corresponding
|
||||
dispatch upstream and the response path.</p>
|
||||
<a class="reference internal image-reference" href="../../_images/network-topology-ingress-egress.jpg"><img alt="../../_images/network-topology-ingress-egress.jpg" class="align-center" src="../../_images/network-topology-ingress-egress.jpg" style="width: 100%;"/>
|
||||
</a>
|
||||
|
|
@ -200,14 +200,14 @@ processing. It is responsible for managing the downstream (ingress) and the upst
|
|||
lifecycle. The downstream and upstream HTTP/2 codec lives here.</p></li>
|
||||
<li><p><a class="reference internal" href="prompt.html#arch-overview-prompt-handling"><span class="std std-ref">Prompt handler subsystem</span></a> which is responsible for selecting and
|
||||
forwarding prompts <code class="docutils literal notranslate"><span class="pre">prompt_targets</span></code> and establishes the lifecycle of any <strong>upstream</strong> connection to a
|
||||
hosted endpoint that implements domain-specific business logic for incoming promots. This is where knowledge
|
||||
hosted endpoint that implements domain-specific business logic for incoming prompts. This is where knowledge
|
||||
of targets and endpoint health, load balancing and connection pooling exists.</p></li>
|
||||
<li><p><a class="reference internal" href="model_serving.html#model-serving"><span class="std std-ref">Model serving subsystem</span></a> which helps Arch make intelligent decisions about the
|
||||
incoming prompts. The model server is designed to call the purpose-built LLMs in Arch.</p></li>
|
||||
</ul>
|
||||
<p>The three subsystems are bridged with either the HTTP router filter, and the cluster manager subsystems of Envoy.</p>
|
||||
<p>Also, Arch utilizes <a class="reference external" href="https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310" rel="nofollow noopener">Envoy event-based thread model<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>.
|
||||
A main thread is responsible forthe server lifecycle, configuration processing, stats, etc. and some number of
|
||||
A main thread is responsible for the server lifecycle, configuration processing, stats, etc. and some number of
|
||||
<a class="reference internal" href="threading_model.html#arch-overview-threading"><span class="std std-ref">worker threads</span></a> process requests. All threads operate around an event loop (<a class="reference external" href="https://libevent.org/" rel="nofollow noopener">libevent<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>)
|
||||
and any given downstream TCP connection will be handled by exactly one worker thread for its lifetime. Each worker
|
||||
thread maintains its own pool of TCP connections to upstream endpoints.</p>
|
||||
|
|
@ -295,7 +295,7 @@ decrypts incoming data for processing.</p></li>
|
|||
Arch first checks the incoming prompts for guardrails such as jailbreak attempts. This ensures
|
||||
that harmful or unwanted behaviors are detected early in the request processing pipeline.</p></li>
|
||||
<li><p><strong>Intent Matching</strong>:
|
||||
The decrypted data stream is deframed by the HTTP/2 codec in Arch’s HTTP connection manager. Arch performs
|
||||
The decrypted data stream is de-framed by the HTTP/2 codec in Arch’s HTTP connection manager. Arch performs
|
||||
intent matching via is <strong>prompt-handler</strong> subsystem using the name and description of the defined prompt targets,
|
||||
determining which endpoint should handle the prompt.</p></li>
|
||||
<li><p><strong>Parameter Gathering with Arch-Function</strong>:
|
||||
|
|
@ -350,7 +350,7 @@ passing it through any egress processing pipeline defined by the application, su
|
|||
<p>Once a request completes, the stream is destroyed. The following also takes places:</p>
|
||||
<ul class="simple">
|
||||
<li><p>The post-request <a class="reference internal" href="../../guides/observability/monitoring.html#monitoring"><span class="std std-ref">monitoring</span></a> are updated (e.g. timing, active requests, upgrades, health checks).
|
||||
Some statistics are updated earlier however, during request processing. Stats are batchedand written by the main
|
||||
Some statistics are updated earlier however, during request processing. Stats are batched and written by the main
|
||||
thread periodically.</p></li>
|
||||
<li><p><a class="reference internal" href="../../guides/observability/access_logging.html#arch-access-logging"><span class="std std-ref">Access logs</span></a> are written to the access log</p></li>
|
||||
<li><p><a class="reference internal" href="../../guides/observability/tracing.html#arch-overview-tracing"><span class="std std-ref">Trace</span></a> spans are finalized. If our example request was traced, a
|
||||
|
|
@ -397,7 +397,7 @@ processing request headers and then finalized by the HCM during post-request pro
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 06, 2025. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 13, 2025. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -219,7 +219,7 @@
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 06, 2025. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 13, 2025. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -160,11 +160,11 @@
|
|||
<p>A few definitions before we dive into the main architecture documentation. Also note, Arch borrows from Envoy’s terminology
|
||||
to keep things consistent in logs and traces, and introduces and clarifies concepts are is relates to LLM applications.</p>
|
||||
<p><strong>Agent</strong>: An application that uses LLMs to handle wide-ranging tasks from users via prompts. This could be as simple
|
||||
as retrieving or summarizing data from an API, or being able to trigger compleix actions like adjusting ad campaigns, or
|
||||
as retrieving or summarizing data from an API, or being able to trigger complex actions like adjusting ad campaigns, or
|
||||
changing travel plans via prompts.</p>
|
||||
<p><strong>Arch Config</strong>: Arch operates based on a configuration that controls the behavior of a single instance of the Arch gateway.
|
||||
This where you enable capabilities like LLM routing, fast function calling (via prompt_targets), applying guardrails, and enabling critical
|
||||
features like metrics and tracing. For the full configuration reference of <cite>arch_config.yaml</cite> see <a class="reference internal" href="../../resources/configuration_reference.html#configuration-refernce"><span class="std std-ref">here</span></a>.</p>
|
||||
features like metrics and tracing. For the full configuration reference of <cite>arch_config.yaml</cite> see <a class="reference internal" href="../../resources/configuration_reference.html#configuration-reference"><span class="std std-ref">here</span></a>.</p>
|
||||
<p><strong>Downstream(Ingress)</strong>: An downstream client (web application, etc.) connects to Arch, sends prompts, and receives responses.</p>
|
||||
<p><strong>Upstream(Egress)</strong>: An upstream host that receives connections and prompts from Arch, and returns context or responses for a prompt</p>
|
||||
<a class="reference internal image-reference" href="../../_images/network-topology-ingress-egress.jpg"><img alt="../../_images/network-topology-ingress-egress.jpg" class="align-center" src="../../_images/network-topology-ingress-egress.jpg" style="width: 100%;"/>
|
||||
|
|
@ -183,10 +183,10 @@ For more details, check out <a class="reference internal" href="../llm_provider.
|
|||
undifferentiated work in building generative AI apps. Prompt targets are endpoints that receive prompts that are processed by Arch.
|
||||
For example, Arch enriches incoming prompts with metadata like knowing when a request is a follow-up or clarifying prompt so that you
|
||||
can build faster, more accurate retrieval (RAG) apps. To support agentic apps, like scheduling travel plans or sharing comments on a
|
||||
document - via prompts, Arch uses its function calling abilities to extract critical information fromthe incoming prompt (or a set of
|
||||
document - via prompts, Arch uses its function calling abilities to extract critical information from the incoming prompt (or a set of
|
||||
prompts) needed by a downstream backend API or function call before calling it directly.</p>
|
||||
<p><strong>Model Serving</strong>: Arch is a set of <cite>two</cite> self-contained processes that are designed to run alongside your application servers
|
||||
(or on a separate hostconnected via a network).The <a class="reference internal" href="model_serving.html#model-serving"><span class="std std-ref">model serving</span></a> process helps Arch make intelligent decisions
|
||||
(or on a separate host connected via a network).The <a class="reference internal" href="model_serving.html#model-serving"><span class="std std-ref">model serving</span></a> process helps Arch make intelligent decisions
|
||||
about the incoming prompts. The model server is designed to call the (fast) purpose-built LLMs in Arch.</p>
|
||||
<p><strong>Error Target</strong>: <a class="reference internal" href="error_target.html#error-target"><span class="std std-ref">Error targets</span></a> are those endpoints that receive forwarded errors from Arch when issues arise,
|
||||
such as failing to properly call a function/API, detecting violations of guardrails, or encountering other processing errors.
|
||||
|
|
@ -216,7 +216,7 @@ and take appropriate actions.</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 06, 2025. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 13, 2025. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||
|
|
@ -194,7 +194,7 @@ hardware threads on the machine.</p>
|
|||
</div><footer class="py-6 border-t border-border md:py-0">
|
||||
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
|
||||
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 06, 2025. </p>
|
||||
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 13, 2025. </p>
|
||||
</div>
|
||||
</div>
|
||||
</footer>
|
||||
|
|
|
|||