salmanap 2026-02-13 23:17:25 +00:00
parent 7f81de50d5
commit 89a8885328
12 changed files with 31 additions and 31 deletions


@ -1,4 +1,4 @@
# Arch Gateway configuration version
# Plano Gateway configuration version
version: v0.3.0
# External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)


@ -221,7 +221,7 @@ and a context builder that prepares retrieval context before the agent runs.</p>
</span><span id="line-30"><span class="linenos">30</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">agent</span>
</span><span id="line-31"><span class="linenos">31</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">agent_1</span>
</span><span id="line-32"><span class="linenos">32</span><span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">8001</span>
</span><span id="line-33"><span class="linenos">33</span><span class="w"> </span><span class="nt">router</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">arch_agent_router</span>
</span><span id="line-33"><span class="linenos">33</span><span class="w"> </span><span class="nt">router</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano_agent_router</span>
</span><span id="line-34"><span class="linenos">34</span><span class="w"> </span><span class="nt">agents</span><span class="p">:</span>
</span><span id="line-35"><span class="linenos">35</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">rag_agent</span>
</span><span id="line-36"><span class="linenos">36</span><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">virtual assistant for retrieval augmented generation tasks</span>


@ -214,7 +214,7 @@
</span><span id="line-6"> <span class="n">base_url</span><span class="o">=</span><span class="s2">"http://127.0.0.1:12000/v1"</span>
</span><span id="line-7"><span class="p">)</span>
</span><span id="line-8">
</span><span id="line-9"><span class="c1"># Use any model configured in your arch_config.yaml</span>
</span><span id="line-9"><span class="c1"># Use any model configured in your plano_config.yaml</span>
</span><span id="line-10"><span class="n">completion</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">chat</span><span class="o">.</span><span class="n">completions</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span><span id="line-11"> <span class="n">model</span><span class="o">=</span><span class="s2">"gpt-4o-mini"</span><span class="p">,</span> <span class="c1"># Or use :ref:`model aliases &lt;model_aliases&gt;` like "fast-model"</span>
</span><span id="line-12"> <span class="n">max_tokens</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>
@ -372,7 +372,7 @@
</span><span id="line-6"> <span class="n">base_url</span><span class="o">=</span><span class="s2">"http://127.0.0.1:12000"</span>
</span><span id="line-7"><span class="p">)</span>
</span><span id="line-8">
</span><span id="line-9"><span class="c1"># Use any model configured in your arch_config.yaml</span>
</span><span id="line-9"><span class="c1"># Use any model configured in your plano_config.yaml</span>
</span><span id="line-10"><span class="n">message</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">messages</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span><span id="line-11"> <span class="n">model</span><span class="o">=</span><span class="s2">"claude-3-5-sonnet-20241022"</span><span class="p">,</span>
</span><span id="line-12"> <span class="n">max_tokens</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>


@ -289,7 +289,7 @@ processess conversational messages on your behalf.</p>
<p>Example 1: Adjusting Retrieval</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span><code><span id="line-1">User: What are the benefits of renewable energy?
</span><span id="line-2">**[Plano]**: Check if there is an available &lt;prompt_target&gt; that can handle this user query.
</span><span id="line-3">**[Plano]**: Found "get_info_for_energy_source" prompt_target in arch_config.yaml. Forward prompt to the endpoint configured in "get_info_for_energy_source"
</span><span id="line-3">**[Plano]**: Found "get_info_for_energy_source" prompt_target in plano_config.yaml. Forward prompt to the endpoint configured in "get_info_for_energy_source"
</span><span id="line-4">...
</span><span id="line-5">Assistant: Renewable energy reduces greenhouse gas emissions, lowers air pollution, and provides sustainable power sources like solar and wind.
</span><span id="line-6">
@ -303,13 +303,13 @@ processess conversational messages on your behalf.</p>
<h3>Example 2: Switching Intent<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#example-2-switching-intent" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#example-2-switching-intent'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span><code><span id="line-1">User: What are the symptoms of diabetes?
</span><span id="line-2">**[Plano]**: Check if there is an available &lt;prompt_target&gt; that can handle this user query.
</span><span id="line-3">**[Plano]**: Found "diseases_symptoms" prompt_target in arch_config.yaml. Forward disease=diabetes to "diseases_symptoms" prompt target
</span><span id="line-3">**[Plano]**: Found "diseases_symptoms" prompt_target in plano_config.yaml. Forward disease=diabetes to "diseases_symptoms" prompt target
</span><span id="line-4">...
</span><span id="line-5">Assistant: Common symptoms include frequent urination, excessive thirst, fatigue, and blurry vision.
</span><span id="line-6">
</span><span id="line-7">User: How is it diagnosed?
</span><span id="line-8">**[Plano]**: New intent detected.
</span><span id="line-9">**[Plano]**: Found "disease_diagnoses" prompt_target in arch_config.yaml. Forward disease=diabetes to "disease_diagnoses" prompt target
</span><span id="line-9">**[Plano]**: Found "disease_diagnoses" prompt_target in plano_config.yaml. Forward disease=diabetes to "disease_diagnoses" prompt target
</span><span id="line-10">...
</span><span id="line-11">Assistant: Diabetes is diagnosed through blood tests like fasting blood sugar, A1C, or an oral glucose tolerance test.
</span></code></pre></div>
@ -415,7 +415,7 @@ response from your APIs.</p>
</section>
<section id="demo-app">
<h3>Demo App<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#demo-app" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#demo-app'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>For your convenience, we've built a <a class="reference external" href="https://github.com/katanemo/archgw/tree/main/demos/samples_python/multi_turn_rag_agent" rel="nofollow noopener">demo app<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>
<p>For your convenience, we've built a <a class="reference external" href="https://github.com/katanemo/plano/tree/main/demos/samples_python/multi_turn_rag_agent" rel="nofollow noopener">demo app<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>
that you can test and modify locally for multi-turn RAG scenarios.</p>
<figure class="align-center" id="id6">
<a class="reference internal image-reference" href="../_images/mutli-turn-example.png"><img alt="../_images/mutli-turn-example.png" src="../_images/mutli-turn-example.png" style="width: 100%;"/>


@ -161,7 +161,7 @@
</nav>
<div id="content" role="main">
<section id="access-logging">
<span id="arch-access-logging"></span><h1>Access Logging<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#access-logging"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
<span id="plano-access-logging"></span><h1>Access Logging<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#access-logging"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
<p>Access logging in Plano refers to the logging of detailed information about each request and response that flows through Plano.
It provides visibility into the traffic passing through Plano, which is crucial for monitoring, debugging, and analyzing the
behavior of AI applications and their interactions.</p>


@ -161,7 +161,7 @@
</nav>
<div id="content" role="main">
<section id="tracing">
<span id="arch-overview-tracing"></span><h1>Tracing<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#tracing"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
<span id="plano-overview-tracing"></span><h1>Tracing<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#tracing"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
<section id="overview">
<h2>Overview<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#overview" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#overview'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p><a class="reference external" href="https://opentelemetry.io/" rel="nofollow noopener">OpenTelemetry<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a> is an open-source observability framework providing APIs


@ -1,6 +1,6 @@
Plano Docs v0.4.6
llms.txt (auto-generated)
Generated (UTC): 2026-02-13T23:08:39.285024+00:00
Generated (UTC): 2026-02-13T23:17:22.564025+00:00
Table of contents
- Agents (concepts/agents)
@ -199,7 +199,7 @@ listeners:
- type: agent
name: agent_1
port: 8001
router: arch_agent_router
router: plano_agent_router
agents:
- id: rag_agent
description: virtual assistant for retrieval augmented generation tasks
@ -396,7 +396,7 @@ client = OpenAI(
base_url="http://127.0.0.1:12000/v1"
)
# Use any model configured in your arch_config.yaml
# Use any model configured in your plano_config.yaml
completion = client.chat.completions.create(
model="gpt-4o-mini", # Or use :ref:`model aliases <model_aliases>` like "fast-model"
max_tokens=50,
@ -560,7 +560,7 @@ client = anthropic.Anthropic(
base_url="http://127.0.0.1:12000"
)
# Use any model configured in your arch_config.yaml
# Use any model configured in your plano_config.yaml
message = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=50,
@ -2319,7 +2319,7 @@ Example 1: Adjusting Retrieval
User: What are the benefits of renewable energy?
**[Plano]**: Check if there is an available <prompt_target> that can handle this user query.
**[Plano]**: Found "get_info_for_energy_source" prompt_target in arch_config.yaml. Forward prompt to the endpoint configured in "get_info_for_energy_source"
**[Plano]**: Found "get_info_for_energy_source" prompt_target in plano_config.yaml. Forward prompt to the endpoint configured in "get_info_for_energy_source"
...
Assistant: Renewable energy reduces greenhouse gas emissions, lowers air pollution, and provides sustainable power sources like solar and wind.
@ -2332,13 +2332,13 @@ Example 2: Switching Intent
User: What are the symptoms of diabetes?
**[Plano]**: Check if there is an available <prompt_target> that can handle this user query.
**[Plano]**: Found "diseases_symptoms" prompt_target in arch_config.yaml. Forward disease=diabetes to "diseases_symptoms" prompt target
**[Plano]**: Found "diseases_symptoms" prompt_target in plano_config.yaml. Forward disease=diabetes to "diseases_symptoms" prompt target
...
Assistant: Common symptoms include frequent urination, excessive thirst, fatigue, and blurry vision.
User: How is it diagnosed?
**[Plano]**: New intent detected.
**[Plano]**: Found "disease_diagnoses" prompt_target in arch_config.yaml. Forward disease=diabetes to "disease_diagnoses" prompt target
**[Plano]**: Found "disease_diagnoses" prompt_target in plano_config.yaml. Forward disease=diabetes to "disease_diagnoses" prompt target
...
Assistant: Diabetes is diagnosed through blood tests like fasting blood sugar, A1C, or an oral glucose tolerance test.
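The intent switch in Example 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword table, the `KNOWN_DISEASES` tuple, and the `Conversation` class are hypothetical stand-ins, not part of Plano, and the real gateway detects intent with a model rather than keyword matching. The point it shows is that a parameter extracted in an earlier turn is carried forward when a later turn selects a different prompt target.

```python
from dataclasses import dataclass, field

# Hypothetical keyword table standing in for the gateway's intent detection.
TARGET_KEYWORDS = {
    "diseases_symptoms": ("symptom",),
    "disease_diagnoses": ("diagnos",),
}

# Hypothetical list of entity values the sketch can recognize.
KNOWN_DISEASES = ("diabetes",)


@dataclass
class Conversation:
    """Keeps parameters extracted in earlier turns of one conversation."""

    params: dict = field(default_factory=dict)

    def route(self, message):
        """Return (prompt_target_name, parameters) for a user message."""
        text = message.lower()
        # Update the remembered parameter if this turn names a disease.
        for disease in KNOWN_DISEASES:
            if disease in text:
                self.params["disease"] = disease
        # Pick the first prompt target whose keyword appears in the message.
        for target, keywords in TARGET_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                return target, dict(self.params)
        return None, dict(self.params)


conversation = Conversation()
first = conversation.route("What are the symptoms of diabetes?")
second = conversation.route("How is it diagnosed?")
```

The second message never mentions the disease, yet it is routed with the same `disease` parameter because the conversation state remembers it from the first turn.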
@ -5723,12 +5723,12 @@ Doc: resources/configuration_reference
Configuration Reference
The following is a complete reference of the plano_config.yml that controls the behavior of a single instance of
the Arch gateway. This is where you enable capabilities like routing to upstream LLM providers, defining prompt_targets
the Plano gateway. This is where you enable capabilities like routing to upstream LLM providers, defining prompt_targets
where prompts get routed to, applying guardrails, and enabling critical agent observability features.
Plano Configuration - Full Reference
# Arch Gateway configuration version
# Plano Gateway configuration version
version: v0.3.0
# External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)
@ -6142,7 +6142,7 @@ prompt_targets:
endpoint:
name: app_server
path: /agent/summary
# Arch uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM
# Plano uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM
auto_llm_dispatch_on_response: true
# override system prompt for this prompt target
system_prompt: You are a helpful information extraction assistant. Use the information that is provided to you.
@ -6163,7 +6163,7 @@ prompt_targets:
default: false
enum: [true, false]
# Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.
# Plano creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.
endpoints:
app_server:
# value could be ip address or a hostname with port
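The round-robin behavior mentioned in the comment above can be pictured with a small Python sketch. It is a simplified illustration, not the gateway's implementation (the configuration comment attributes the actual balancing to the cluster subsystem); the `RoundRobinBalancer` class and the two endpoint addresses are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out endpoints in a fixed rotating order."""

    def __init__(self, endpoints):
        endpoints = list(endpoints)
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        # itertools.cycle repeats the sequence indefinitely.
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self):
        """Return the endpoint that should receive the next request."""
        return next(self._cycle)


# Hypothetical addresses, used only for illustration.
balancer = RoundRobinBalancer(["127.0.0.1:8080", "127.0.0.1:8081"])
picks = [balancer.next_endpoint() for _ in range(4)]
```

Each endpoint receives every second request here; with N endpoints, each would receive every Nth request.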

Binary file not shown.


@ -162,11 +162,11 @@
<section id="configuration-reference">
<span id="id1"></span><h1>Configuration Reference<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#configuration-reference"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
<p>The following is a complete reference of the <code class="docutils literal notranslate"><span class="pre">plano_config.yml</span></code> that controls the behavior of a single instance of
the Arch gateway. This is where you enable capabilities like routing to upstream LLM providers, defining prompt_targets
the Plano gateway. This is where you enable capabilities like routing to upstream LLM providers, defining prompt_targets
where prompts get routed to, applying guardrails, and enabling critical agent observability features.</p>
<div class="literal-block-wrapper docutils container" id="id2">
<div class="code-block-caption"><span class="caption-text"><a class="reference download internal" download="" href="../_downloads/ca9d3b7116524473d8adbde7cf15d167/arch_config_full_reference.yaml"><code class="xref download docutils literal notranslate"><span class="pre">Plano</span> <span class="pre">Configuration</span> <span class="pre">-</span> <span class="pre">Full</span> <span class="pre">Reference</span></code></a></span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id2"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="linenos"> 1</span><span class="c1"># Arch Gateway configuration version</span>
<div class="code-block-caption"><span class="caption-text"><a class="reference download internal" download="" href="../_downloads/c86f9e8fb1f2994b1ba4a0b98481410e/plano_config_full_reference.yaml"><code class="xref download docutils literal notranslate"><span class="pre">Plano</span> <span class="pre">Configuration</span> <span class="pre">-</span> <span class="pre">Full</span> <span class="pre">Reference</span></code></a></span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id2"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="linenos"> 1</span><span class="c1"># Plano Gateway configuration version</span>
</span><span id="line-2"><span class="linenos"> 2</span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">v0.3.0</span>
</span><span id="line-3"><span class="linenos"> 3</span>
</span><span id="line-4"><span class="linenos"> 4</span><span class="c1"># External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)</span>


@ -191,7 +191,7 @@ processing. It is responsible for managing the inbound(edge) and outbound(egress
<li><p><a class="reference internal" href="model_serving.html#bright-staff"><span class="std std-ref">Bright Staff controller subsystem</span></a> is Plano's memory-efficient, lightweight controller for agentic traffic. It sits inside the Plano data plane and makes real-time decisions about how prompts are handled, forwarded, and processed.</p></li>
</ul>
<p>These two subsystems are bridged by the HTTP router filter and the cluster manager subsystems of Envoy.</p>
<p>Also, Plano utilizes <a class="reference external" href="https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310" rel="nofollow noopener">Envoy event-based thread model<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>. A main thread is responsible for the server lifecycle, configuration processing, stats, etc. and some number of <a class="reference internal" href="threading_model.html#arch-overview-threading"><span class="std std-ref">worker threads</span></a> process requests. All threads operate around an event loop (<a class="reference external" href="https://libevent.org/" rel="nofollow noopener">libevent<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>) and any given downstream TCP connection will be handled by exactly one worker thread for its lifetime. Each worker thread maintains its own pool of TCP connections to upstream endpoints.</p>
<p>Also, Plano utilizes <a class="reference external" href="https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310" rel="nofollow noopener">Envoy event-based thread model<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>. A main thread is responsible for the server lifecycle, configuration processing, stats, etc. and some number of <a class="reference internal" href="threading_model.html#plano-overview-threading"><span class="std std-ref">worker threads</span></a> process requests. All threads operate around an event loop (<a class="reference external" href="https://libevent.org/" rel="nofollow noopener">libevent<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>) and any given downstream TCP connection will be handled by exactly one worker thread for its lifetime. Each worker thread maintains its own pool of TCP connections to upstream endpoints.</p>
<p>Worker threads rarely share state and operate in a trivially parallel fashion. This threading model
enables scaling to very high core count CPUs.</p>
</section>
@ -275,8 +275,8 @@ Once the LLM processes the prompt, Plano receives the response from the LLM serv
<li><p>The post-request <a class="reference internal" href="../../guides/observability/monitoring.html#monitoring"><span class="std std-ref">monitoring</span></a> statistics are updated (e.g. timing, active requests, upgrades, health checks).
Some statistics, however, are updated earlier, during request processing. Stats are batched and written by the main
thread periodically.</p></li>
<li><p><a class="reference internal" href="../../guides/observability/access_logging.html#arch-access-logging"><span class="std std-ref">Access logs</span></a> are written to the access log</p></li>
<li><p><a class="reference internal" href="../../guides/observability/tracing.html#arch-overview-tracing"><span class="std std-ref">Trace</span></a> spans are finalized. If our example request was traced, a
<li><p><a class="reference internal" href="../../guides/observability/access_logging.html#plano-access-logging"><span class="std std-ref">Access logs</span></a> are written to the access log</p></li>
<li><p><a class="reference internal" href="../../guides/observability/tracing.html#plano-overview-tracing"><span class="std std-ref">Trace</span></a> spans are finalized. If our example request was traced, a
trace span describing the duration and details of the request would be created by the HCM when
processing request headers and then finalized by the HCM during post-request processing.</p></li>
</ul>
@ -305,7 +305,7 @@ processing request headers and then finalized by the HCM during post-request pro
</span><span id="line-18"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
</span><span id="line-19"><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">app_server</span>
</span><span id="line-20"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/agent/summary</span>
</span><span id="line-21"><span class="w"> </span><span class="c1"># Arch uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM</span>
</span><span id="line-21"><span class="w"> </span><span class="c1"># Plano uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM</span>
</span><span id="line-22"><span class="w"> </span><span class="nt">auto_llm_dispatch_on_response</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-23"><span class="w"> </span><span class="c1"># override system prompt for this prompt target</span>
</span><span id="line-24"><span class="w"> </span><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">You are a helpful information extraction assistant. Use the information that is provided to you.</span>
@ -326,7 +326,7 @@ processing request headers and then finalized by the HCM during post-request pro
</span><span id="line-39"><span class="w"> </span><span class="nt">default</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
</span><span id="line-40"><span class="w"> </span><span class="nt">enum</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">true</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">false</span><span class="p p-Indicator">]</span>
</span><span id="line-41">
</span><span id="line-42"><span class="c1"># Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.</span>
</span><span id="line-42"><span class="c1"># Plano creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.</span>
</span><span id="line-43"><span class="nt">endpoints</span><span class="p">:</span>
</span><span id="line-44"><span class="w"> </span><span class="nt">app_server</span><span class="p">:</span>
</span><span id="line-45"><span class="w"> </span><span class="c1"># value could be ip address or a hostname with port</span>


@ -161,7 +161,7 @@
</nav>
<div id="content" role="main">
<section id="threading-model">
<span id="arch-overview-threading"></span><h1>Threading Model<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#threading-model"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
<span id="plano-overview-threading"></span><h1>Threading Model<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#threading-model"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
<p>Plano builds on top of Envoy's single-process, multi-threaded architecture.</p>
<p>A single <em>primary</em> thread controls various sporadic coordination tasks while some number of <em>worker</em>
threads perform filtering and forwarding.</p>
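The primary/worker split can be illustrated with a short Python sketch. It is a toy model under stated assumptions, not how Envoy or Plano is implemented: the `Worker` class, the `NUM_WORKERS` constant, and the modulo-based `pick_worker` assignment are hypothetical. What it demonstrates is the property that every event for a given connection is handled by the same worker, which keeps its state local instead of sharing it.

```python
import queue
import threading

NUM_WORKERS = 4  # hypothetical worker count, chosen for the demo


class Worker(threading.Thread):
    """A worker with its own event queue and worker-local state."""

    def __init__(self, index):
        super().__init__(daemon=True)
        self.index = index
        self.events = queue.Queue()
        self.handled = []  # only ever touched by this worker's thread

    def run(self):
        while True:
            event = self.events.get()
            if event is None:  # sentinel: shut down
                break
            self.handled.append(event)


def pick_worker(workers, connection_id):
    # Simplifying assumption: pin a connection to a worker by modulo.
    return workers[connection_id % len(workers)]


def run_demo():
    """Primary-thread role: start workers, dispatch events, shut down."""
    workers = [Worker(i) for i in range(NUM_WORKERS)]
    for worker in workers:
        worker.start()
    for connection_id in range(8):
        for chunk in ("headers", "body"):
            pick_worker(workers, connection_id).events.put((connection_id, chunk))
    for worker in workers:
        worker.events.put(None)
        worker.join()
    return {worker.index: worker.handled for worker in workers}


result = run_demo()
```

Because no two workers ever see the same connection, the workers need no locks around their own state, which is the sense in which such a design parallelizes trivially.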

File diff suppressed because one or more lines are too long