salmanap 2025-04-13 06:52:52 +00:00
parent f50f1bb4a6
commit ed2124f773
29 changed files with 64 additions and 64 deletions

@@ -159,7 +159,7 @@
<span id="lifecycle-of-a-request"></span><h1>Request Lifecycle<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#request-lifecycle"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
<p>Below we describe the events in the lifecycle of a request passing through an Arch gateway instance. We first
describe how Arch fits into the request path and then the internal events that take place following
-the arrival of a request at Arch from downtream clients. We follow the request until the corresponding
+the arrival of a request at Arch from downstream clients. We follow the request until the corresponding
dispatch upstream and the response path.</p>
<a class="reference internal image-reference" href="../../_images/network-topology-ingress-egress.jpg"><img alt="../../_images/network-topology-ingress-egress.jpg" class="align-center" src="../../_images/network-topology-ingress-egress.jpg" style="width: 100%;"/>
</a>
@@ -200,14 +200,14 @@ processing. It is responsible for managing the downstream (ingress) and the upst
lifecycle. The downstream and upstream HTTP/2 codec lives here.</p></li>
<li><p><a class="reference internal" href="prompt.html#arch-overview-prompt-handling"><span class="std std-ref">Prompt handler subsystem</span></a> which is responsible for selecting and
forwarding prompts <code class="docutils literal notranslate"><span class="pre">prompt_targets</span></code> and establishes the lifecycle of any <strong>upstream</strong> connection to a
-hosted endpoint that implements domain-specific business logic for incoming promots. This is where knowledge
+hosted endpoint that implements domain-specific business logic for incoming prompts. This is where knowledge
of targets and endpoint health, load balancing and connection pooling exists.</p></li>
<li><p><a class="reference internal" href="model_serving.html#model-serving"><span class="std std-ref">Model serving subsystem</span></a> which helps Arch make intelligent decisions about the
incoming prompts. The model server is designed to call the purpose-built LLMs in Arch.</p></li>
</ul>
<p>The three subsystems are bridged with either the HTTP router filter, and the cluster manager subsystems of Envoy.</p>
<p>Also, Arch utilizes <a class="reference external" href="https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310" rel="nofollow noopener">Envoy event-based thread model<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>.
-A main thread is responsible forthe server lifecycle, configuration processing, stats, etc. and some number of
+A main thread is responsible for the server lifecycle, configuration processing, stats, etc. and some number of
<a class="reference internal" href="threading_model.html#arch-overview-threading"><span class="std std-ref">worker threads</span></a> process requests. All threads operate around an event loop (<a class="reference external" href="https://libevent.org/" rel="nofollow noopener">libevent<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>)
and any given downstream TCP connection will be handled by exactly one worker thread for its lifetime. Each worker
thread maintains its own pool of TCP connections to upstream endpoints.</p>
@@ -295,7 +295,7 @@ decrypts incoming data for processing.</p></li>
Arch first checks the incoming prompts for guardrails such as jailbreak attempts. This ensures
that harmful or unwanted behaviors are detected early in the request processing pipeline.</p></li>
<li><p><strong>Intent Matching</strong>:
-The decrypted data stream is deframed by the HTTP/2 codec in Archs HTTP connection manager. Arch performs
+The decrypted data stream is de-framed by the HTTP/2 codec in Archs HTTP connection manager. Arch performs
intent matching via is <strong>prompt-handler</strong> subsystem using the name and description of the defined prompt targets,
determining which endpoint should handle the prompt.</p></li>
<li><p><strong>Parameter Gathering with Arch-Function</strong>:
@@ -350,7 +350,7 @@ passing it through any egress processing pipeline defined by the application, su
<p>Once a request completes, the stream is destroyed. The following also takes places:</p>
<ul class="simple">
<li><p>The post-request <a class="reference internal" href="../../guides/observability/monitoring.html#monitoring"><span class="std std-ref">monitoring</span></a> are updated (e.g. timing, active requests, upgrades, health checks).
-Some statistics are updated earlier however, during request processing. Stats are batchedand written by the main
+Some statistics are updated earlier however, during request processing. Stats are batched and written by the main
thread periodically.</p></li>
<li><p><a class="reference internal" href="../../guides/observability/access_logging.html#arch-access-logging"><span class="std std-ref">Access logs</span></a> are written to the access log</p></li>
<li><p><a class="reference internal" href="../../guides/observability/tracing.html#arch-overview-tracing"><span class="std std-ref">Trace</span></a> spans are finalized. If our example request was traced, a
@@ -397,7 +397,7 @@ processing request headers and then finalized by the HCM during post-request pro
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
-<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 06, 2025. </p>
+<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2025, Katanemo Labs, Inc Last updated: Apr 13, 2025. </p>
</div>
</div>
</footer>