salmanap 2024-10-08 20:19:06 +00:00
parent f4b686c7fc
commit 3e881c6eec
28 changed files with 819 additions and 820 deletions

@ -18,8 +18,8 @@
<link href="./docs/concepts/tech_overview/request_lifecycle.html" rel="canonical"/>
<link href="../../_static/favicon.ico" rel="icon"/>
<link href="../../search.html" rel="search" title="Search"/>
<link href="../llm_provider.html" rel="next" title="LLM Provider"/>
<link href="prompt.html" rel="prev" title="Prompt"/>
<link href="error_target.html" rel="next" title="Error Target"/>
<link href="model_serving.html" rel="prev" title="Model Serving"/>
<script>
<!-- Prevent Flash of wrong theme -->
const userPreference = localStorage.getItem('darkMode');
@ -101,9 +101,10 @@
<li class="toctree-l2"><a class="reference internal" href="terminology.html">Terminology</a></li>
<li class="toctree-l2"><a class="reference internal" href="threading_model.html">Threading Model</a></li>
<li class="toctree-l2"><a class="reference internal" href="listener.html">Listener</a></li>
<li class="toctree-l2"><a class="reference internal" href="prompt.html">Prompts</a></li>
<li class="toctree-l2"><a class="reference internal" href="model_serving.html">Model Serving</a></li>
<li class="toctree-l2"><a class="reference internal" href="prompt.html">Prompt</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Request Lifecycle</a></li>
<li class="toctree-l2"><a class="reference internal" href="error_target.html">Error Target</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../llm_provider.html">LLM Provider</a></li>
@ -128,7 +129,6 @@
<p class="caption" role="heading"><span class="caption-text">Resources</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../resources/configuration_reference.html">Configuration Reference</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../resources/error_target.html">Error Targets</a></li>
</ul>
</nav>
</div>
@ -199,7 +199,7 @@ lifecycle. The downstream and upstream HTTP/2 codec lives here.</p></li>
forwarding prompts to <code class="docutils literal notranslate"><span class="pre">prompt_targets</span></code> and establishes the lifecycle of any <strong>upstream</strong> connection to a
hosted endpoint that implements domain-specific business logic for incoming prompts. This is where knowledge
of targets and endpoint health, load balancing and connection pooling exists.</p></li>
<li><p><a class="reference internal" href="model_serving.html#arch-model-serving"><span class="std std-ref">Model serving subsystem</span></a> which helps Arch make intelligent decisions about the
<li><p><a class="reference internal" href="model_serving.html#model-serving"><span class="std std-ref">Model serving subsystem</span></a> which helps Arch make intelligent decisions about the
incoming prompts. The model server is designed to call the purpose-built LLMs in Arch.</p></li>
</ul>
<p>The three subsystems are bridged by the HTTP router filter and the cluster manager subsystems of Envoy.</p>
@ -214,7 +214,7 @@ enables scaling to very high core count CPUs.</p>
<section id="configuration">
<h2>Configuration<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#configuration" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#configuration'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>Today, Arch supports only a static bootstrap configuration file, for simplicity:</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">"0.1-beta"</span>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">v0.1</span>
</span><span id="line-2">
</span><span id="line-3"><span class="nt">listener</span><span class="p">:</span>
</span><span id="line-4"><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span><span class="w"> </span><span class="c1"># or 127.0.0.1</span>
@ -224,67 +224,64 @@ enables scaling to very high core count CPUs.</p>
</span><span id="line-8">
</span><span id="line-9"><span class="c1"># Centralized way to manage LLMs: keys, retry logic, failover, and limits</span>
</span><span id="line-10"><span class="nt">llm_providers</span><span class="p">:</span>
</span><span id="line-11"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s">"OpenAI"</span>
</span><span id="line-12"><span class="w"> </span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="s">"openai"</span>
</span><span id="line-13"><span class="w"> </span><span class="nt">access_key</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">$OPENAI_API_KEY</span>
</span><span id="line-11"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">OpenAI</span>
</span><span id="line-12"><span class="w"> </span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">openai</span>
</span><span id="line-13"><span class="w"> </span><span class="nt">access_key</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">OPENAI_API_KEY</span>
</span><span id="line-14"><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">gpt-4o</span>
</span><span id="line-15"><span class="w"> </span><span class="nt">default</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-16"><span class="w"> </span><span class="nt">stream</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-17">
</span><span id="line-18"><span class="c1"># default system prompt used by all prompt targets</span>
</span><span id="line-19"><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
</span><span id="line-20"><span class="w"> </span><span class="no">You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.</span>
</span><span id="line-21">
</span><span id="line-22"><span class="nt">prompt_guards</span><span class="p">:</span>
</span><span id="line-23"><span class="w"> </span><span class="nt">input_guards</span><span class="p">:</span>
</span><span id="line-24"><span class="w"> </span><span class="nt">jailbreak</span><span class="p">:</span>
</span><span id="line-25"><span class="w"> </span><span class="nt">on_exception</span><span class="p">:</span>
</span><span id="line-26"><span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s">"Looks</span><span class="nv"> </span><span class="s">like</span><span class="nv"> </span><span class="s">you're</span><span class="nv"> </span><span class="s">curious</span><span class="nv"> </span><span class="s">about</span><span class="nv"> </span><span class="s">my</span><span class="nv"> </span><span class="s">abilities,</span><span class="nv"> </span><span class="s">but</span><span class="nv"> </span><span class="s">I</span><span class="nv"> </span><span class="s">can</span><span class="nv"> </span><span class="s">only</span><span class="nv"> </span><span class="s">provide</span><span class="nv"> </span><span class="s">assistance</span><span class="nv"> </span><span class="s">within</span><span class="nv"> </span><span class="s">my</span><span class="nv"> </span><span class="s">programmed</span><span class="nv"> </span><span class="s">parameters."</span>
</span><span id="line-27">
</span><span id="line-28"><span class="nt">prompt_targets</span><span class="p">:</span>
</span><span id="line-29"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s">"reboot_network_device"</span>
</span><span id="line-30"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s">"Helps</span><span class="nv"> </span><span class="s">network</span><span class="nv"> </span><span class="s">operators</span><span class="nv"> </span><span class="s">perform</span><span class="nv"> </span><span class="s">device</span><span class="nv"> </span><span class="s">operations</span><span class="nv"> </span><span class="s">like</span><span class="nv"> </span><span class="s">rebooting</span><span class="nv"> </span><span class="s">a</span><span class="nv"> </span><span class="s">device."</span>
</span><span id="line-19"><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">You are a network assistant that just offers facts; not advice on manufacturers or purchasing decisions.</span>
</span><span id="line-20">
</span><span id="line-21"><span class="nt">prompt_guards</span><span class="p">:</span>
</span><span id="line-22"><span class="w"> </span><span class="nt">input_guards</span><span class="p">:</span>
</span><span id="line-23"><span class="w"> </span><span class="nt">jailbreak</span><span class="p">:</span>
</span><span id="line-24"><span class="w"> </span><span class="nt">on_exception</span><span class="p">:</span>
</span><span id="line-25"><span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Looks like you're curious about my abilities, but I can only provide assistance within my programmed parameters.</span>
</span><span id="line-26">
</span><span id="line-27"><span class="nt">prompt_targets</span><span class="p">:</span>
</span><span id="line-28"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">information_extraction</span>
</span><span id="line-29"><span class="w"> </span><span class="nt">default</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-30"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Handles all scenarios that are question and answer in nature, like summarization, information extraction, etc.</span>
</span><span id="line-31"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
</span><span id="line-32"><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">app_server</span>
</span><span id="line-33"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="s">"/agent/action"</span>
</span><span id="line-34"><span class="w"> </span><span class="nt">parameters</span><span class="p">:</span>
</span><span id="line-35"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s">"device_id"</span>
</span><span id="line-36"><span class="w"> </span><span class="c1"># additional type options include: int | float | bool | string | list | dict</span>
</span><span id="line-37"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="s">"string"</span>
</span><span id="line-38"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s">"Identifier</span><span class="nv"> </span><span class="s">of</span><span class="nv"> </span><span class="s">the</span><span class="nv"> </span><span class="s">network</span><span class="nv"> </span><span class="s">device</span><span class="nv"> </span><span class="s">to</span><span class="nv"> </span><span class="s">reboot."</span>
</span><span id="line-39"><span class="w"> </span><span class="nt">required</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-40"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s">"confirmation"</span>
</span><span id="line-41"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="s">"string"</span>
</span><span id="line-42"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s">"Confirmation</span><span class="nv"> </span><span class="s">flag</span><span class="nv"> </span><span class="s">to</span><span class="nv"> </span><span class="s">proceed</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">reboot."</span>
</span><span id="line-43"><span class="w"> </span><span class="nt">default</span><span class="p">:</span><span class="w"> </span><span class="s">"no"</span>
</span><span id="line-44"><span class="w"> </span><span class="nt">enum</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">yes</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">no</span><span class="p p-Indicator">]</span>
</span><span id="line-45">
</span><span id="line-46"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="s">"information_extraction"</span>
</span><span id="line-47"><span class="w"> </span><span class="nt">default</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-48"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="s">"This</span><span class="nv"> </span><span class="s">prompt</span><span class="nv"> </span><span class="s">handles</span><span class="nv"> </span><span class="s">all</span><span class="nv"> </span><span class="s">scenarios</span><span class="nv"> </span><span class="s">that</span><span class="nv"> </span><span class="s">are</span><span class="nv"> </span><span class="s">question</span><span class="nv"> </span><span class="s">and</span><span class="nv"> </span><span class="s">answer</span><span class="nv"> </span><span class="s">in</span><span class="nv"> </span><span class="s">nature.</span><span class="nv"> </span><span class="s">Like</span><span class="nv"> </span><span class="s">summarization,</span><span class="nv"> </span><span class="s">information</span><span class="nv"> </span><span class="s">extraction,</span><span class="nv"> </span><span class="s">etc."</span>
</span><span id="line-49"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
</span><span id="line-50"><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">app_server</span>
</span><span id="line-51"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="s">"/agent/summary"</span>
</span><span id="line-52"><span class="w"> </span><span class="c1"># Arch uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM</span>
</span><span id="line-53"><span class="w"> </span><span class="nt">auto_llm_dispatch_on_response</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-54"><span class="w"> </span><span class="c1"># override system prompt for this prompt target</span>
</span><span id="line-55"><span class="w"> </span><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
</span><span id="line-56"><span class="w"> </span><span class="no">You are a helpful information extraction assistant. Use the information that is provided to you.</span>
</span><span id="line-57">
</span><span id="line-58"><span class="nt">error_target</span><span class="p">:</span>
</span><span id="line-59"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
</span><span id="line-60"><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">error_target_1</span>
</span><span id="line-61"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/error</span>
</span><span id="line-62">
</span><span id="line-63"><span class="c1"># Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.</span>
</span><span id="line-64"><span class="nt">endpoints</span><span class="p">:</span>
</span><span id="line-65"><span class="w"> </span><span class="nt">app_server</span><span class="p">:</span>
</span><span id="line-66"><span class="w"> </span><span class="c1"># value could be ip address or a hostname with port</span>
</span><span id="line-67"><span class="w"> </span><span class="c1"># this could also be a list of endpoints for load balancing</span>
</span><span id="line-68"><span class="w"> </span><span class="c1"># for example endpoint: [ ip1:port, ip2:port ]</span>
</span><span id="line-69"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span><span class="w"> </span><span class="s">"127.0.0.1:80"</span>
</span><span id="line-70"><span class="w"> </span><span class="c1"># max time to wait for a connection to be established</span>
</span><span id="line-71"><span class="w"> </span><span class="nt">connect_timeout</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.005s</span>
</span><span id="line-33"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/agent/summary</span>
</span><span id="line-34"><span class="w"> </span><span class="c1"># Arch uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM</span>
</span><span id="line-35"><span class="w"> </span><span class="nt">auto_llm_dispatch_on_response</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-36"><span class="w"> </span><span class="c1"># override system prompt for this prompt target</span>
</span><span id="line-37"><span class="w"> </span><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">You are a helpful information extraction assistant. Use the information that is provided to you.</span>
</span><span id="line-38">
</span><span id="line-39"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">reboot_network_device</span>
</span><span id="line-40"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Reboot a specific network device</span>
</span><span id="line-41"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
</span><span id="line-42"><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">app_server</span>
</span><span id="line-43"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/agent/action</span>
</span><span id="line-44"><span class="w"> </span><span class="nt">parameters</span><span class="p">:</span>
</span><span id="line-45"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">device_id</span>
</span><span id="line-46"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">str</span>
</span><span id="line-47"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Identifier of the network device to reboot.</span>
</span><span id="line-48"><span class="w"> </span><span class="nt">required</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-49"><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">confirmation</span>
</span><span id="line-50"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">bool</span>
</span><span id="line-51"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Confirmation flag to proceed with reboot.</span>
</span><span id="line-52"><span class="w"> </span><span class="nt">default</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
</span><span id="line-53"><span class="w"> </span><span class="nt">enum</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">true</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">false</span><span class="p p-Indicator">]</span>
</span><span id="line-54">
</span><span id="line-55"><span class="nt">error_target</span><span class="p">:</span>
</span><span id="line-56"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
</span><span id="line-57"><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">error_target_1</span>
</span><span id="line-58"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/error</span>
</span><span id="line-59">
</span><span id="line-60"><span class="c1"># Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.</span>
</span><span id="line-61"><span class="nt">endpoints</span><span class="p">:</span>
</span><span id="line-62"><span class="w"> </span><span class="nt">app_server</span><span class="p">:</span>
</span><span id="line-63"><span class="w"> </span><span class="c1"># value could be ip address or a hostname with port</span>
</span><span id="line-64"><span class="w"> </span><span class="c1"># this could also be a list of endpoints for load balancing</span>
</span><span id="line-65"><span class="w"> </span><span class="c1"># for example endpoint: [ ip1:port, ip2:port ]</span>
</span><span id="line-66"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">127.0.0.1:80</span>
</span><span id="line-67"><span class="w"> </span><span class="c1"># max time to wait for a connection to be established</span>
</span><span id="line-68"><span class="w"> </span><span class="nt">connect_timeout</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.005s</span>
</span></code></pre></div>
</div>
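<p>The comments in the sample above mention that an endpoint may also be given as a list for load balancing. A minimal sketch of that form follows. It assumes the list syntax described in those comments is accepted by the version of Arch in use, and the two addresses are hypothetical placeholders, not values from the original configuration:</p>

```yaml
endpoints:
  app_server:
    # Hypothetical addresses for illustration only; replace with real hosts.
    # Per the sample's comments, a list is intended to be load balanced
    # round-robin via the cluster subsystem.
    endpoint: [ "10.0.0.1:80", "10.0.0.2:80" ]
    # max time to wait for a connection to be established
    connect_timeout: 0.005s
```

<p>The addresses are quoted so that the colon in each host:port pair is read as part of a string in any YAML parser.</p>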
</section>
@ -374,16 +371,16 @@ processing request headers and then finalized by the HCM during post-request pro
</section>
</div><div class="flex justify-between items-center pt-6 mt-12 border-t border-border gap-4">
<div class="mr-auto">
<a class="inline-flex items-center justify-center rounded-md text-sm font-medium transition-colors border border-input hover:bg-accent hover:text-accent-foreground py-2 px-4" href="prompt.html">
<a class="inline-flex items-center justify-center rounded-md text-sm font-medium transition-colors border border-input hover:bg-accent hover:text-accent-foreground py-2 px-4" href="model_serving.html">
<svg class="mr-2 h-4 w-4" fill="none" height="24" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="2" viewbox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg">
<polyline points="15 18 9 12 15 6"></polyline>
</svg>
Prompt
Model Serving
</a>
</div>
<div class="ml-auto">
<a class="inline-flex items-center justify-center rounded-md text-sm font-medium transition-colors border border-input hover:bg-accent hover:text-accent-foreground py-2 px-4" href="../llm_provider.html">
LLM Provider
<a class="inline-flex items-center justify-center rounded-md text-sm font-medium transition-colors border border-input hover:bg-accent hover:text-accent-foreground py-2 px-4" href="error_target.html">
Error Target
<svg class="ml-2 h-4 w-4" fill="none" height="24" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="2" viewbox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg">
<polyline points="9 18 15 12 9 6"></polyline>
</svg>