mirror of
https://github.com/katanemo/plano.git
synced 2026-04-26 01:06:25 +02:00
deploy: ba651aaf71
This commit is contained in:
parent
7f81de50d5
commit
89a8885328
12 changed files with 31 additions and 31 deletions
|
|
@ -191,7 +191,7 @@ processing. It is responsible for managing the inbound(edge) and outbound(egress
|
|||
<li><p><a class="reference internal" href="model_serving.html#bright-staff"><span class="std std-ref">Bright Staff controller subsystem</span></a> is Plano’s memory-efficient, lightweight controller for agentic traffic. It sits inside the Plano data plane and makes real-time decisions about how prompts are handled, forwarded, and processed.</p></li>
|
||||
</ul>
|
||||
<p>These two subsystems are bridged with either the HTTP router filter, and the cluster manager subsystems of Envoy.</p>
|
||||
<p>Also, Plano utilizes <a class="reference external" href="https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310" rel="nofollow noopener">Envoy event-based thread model<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>. A main thread is responsible for the server lifecycle, configuration processing, stats, etc. and some number of <a class="reference internal" href="threading_model.html#arch-overview-threading"><span class="std std-ref">worker threads</span></a> process requests. All threads operate around an event loop (<a class="reference external" href="https://libevent.org/" rel="nofollow noopener">libevent<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>) and any given downstream TCP connection will be handled by exactly one worker thread for its lifetime. Each worker thread maintains its own pool of TCP connections to upstream endpoints.</p>
|
||||
<p>Also, Plano utilizes <a class="reference external" href="https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310" rel="nofollow noopener">Envoy event-based thread model<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>. A main thread is responsible for the server lifecycle, configuration processing, stats, etc. and some number of <a class="reference internal" href="threading_model.html#plano-overview-threading"><span class="std std-ref">worker threads</span></a> process requests. All threads operate around an event loop (<a class="reference external" href="https://libevent.org/" rel="nofollow noopener">libevent<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>) and any given downstream TCP connection will be handled by exactly one worker thread for its lifetime. Each worker thread maintains its own pool of TCP connections to upstream endpoints.</p>
|
||||
<p>Worker threads rarely share state and operate in a trivially parallel fashion. This threading model
|
||||
enables scaling to very high core count CPUs.</p>
|
||||
</section>
|
||||
|
|
@ -275,8 +275,8 @@ Once the LLM processes the prompt, Plano receives the response from the LLM serv
|
|||
<li><p>The post-request <a class="reference internal" href="../../guides/observability/monitoring.html#monitoring"><span class="std std-ref">monitoring</span></a> are updated (e.g. timing, active requests, upgrades, health checks).
|
||||
Some statistics are updated earlier however, during request processing. Stats are batched and written by the main
|
||||
thread periodically.</p></li>
|
||||
<li><p><a class="reference internal" href="../../guides/observability/access_logging.html#arch-access-logging"><span class="std std-ref">Access logs</span></a> are written to the access log</p></li>
|
||||
<li><p><a class="reference internal" href="../../guides/observability/tracing.html#arch-overview-tracing"><span class="std std-ref">Trace</span></a> spans are finalized. If our example request was traced, a
|
||||
<li><p><a class="reference internal" href="../../guides/observability/access_logging.html#plano-access-logging"><span class="std std-ref">Access logs</span></a> are written to the access log</p></li>
|
||||
<li><p><a class="reference internal" href="../../guides/observability/tracing.html#plano-overview-tracing"><span class="std std-ref">Trace</span></a> spans are finalized. If our example request was traced, a
|
||||
trace span, describing the duration and details of the request would be created by the HCM when
|
||||
processing request headers and then finalized by the HCM during post-request processing.</p></li>
|
||||
</ul>
|
||||
|
|
@ -305,7 +305,7 @@ processing request headers and then finalized by the HCM during post-request pro
|
|||
</span><span id="line-18"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
|
||||
</span><span id="line-19"><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">app_server</span>
|
||||
</span><span id="line-20"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/agent/summary</span>
|
||||
</span><span id="line-21"><span class="w"> </span><span class="c1"># Arch uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM</span>
|
||||
</span><span id="line-21"><span class="w"> </span><span class="c1"># Plano uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM</span>
|
||||
</span><span id="line-22"><span class="w"> </span><span class="nt">auto_llm_dispatch_on_response</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
|
||||
</span><span id="line-23"><span class="w"> </span><span class="c1"># override system prompt for this prompt target</span>
|
||||
</span><span id="line-24"><span class="w"> </span><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">You are a helpful information extraction assistant. Use the information that is provided to you.</span>
|
||||
|
|
@ -326,7 +326,7 @@ processing request headers and then finalized by the HCM during post-request pro
|
|||
</span><span id="line-39"><span class="w"> </span><span class="nt">default</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
|
||||
</span><span id="line-40"><span class="w"> </span><span class="nt">enum</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">true</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">false</span><span class="p p-Indicator">]</span>
|
||||
</span><span id="line-41">
|
||||
</span><span id="line-42"><span class="c1"># Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.</span>
|
||||
</span><span id="line-42"><span class="c1"># Plano creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.</span>
|
||||
</span><span id="line-43"><span class="nt">endpoints</span><span class="p">:</span>
|
||||
</span><span id="line-44"><span class="w"> </span><span class="nt">app_server</span><span class="p">:</span>
|
||||
</span><span id="line-45"><span class="w"> </span><span class="c1"># value could be ip address or a hostname with port</span>
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue