mirror of
https://github.com/katanemo/plano.git
synced 2026-06-08 14:55:14 +02:00
deploy: ba651aaf71
parent
7f81de50d5
commit
89a8885328
12 changed files with 31 additions and 31 deletions
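The hunks that follow amount to a mechanical rename of Arch identifiers to their Plano equivalents. As a rough illustration only, the substitutions visible in this diff can be sketched in Python. The mapping is read off the hunks shown here. The names `RENAMES` and `apply_renames` are hypothetical and do not come from the repository, and nothing here claims to be how the commit was actually produced.

```python
# Hypothetical sketch: old -> new strings collected from the hunks in this diff.
# This is NOT a script from the plano repository.
RENAMES = {
    "# Arch Gateway configuration version": "# Plano Gateway configuration version",
    "arch_agent_router": "plano_agent_router",
    "arch_config.yaml": "plano_config.yaml",
    "katanemo/archgw": "katanemo/plano",
    "arch-access-logging": "plano-access-logging",
    "arch-overview-tracing": "plano-overview-tracing",
    "arch-overview-threading": "plano-overview-threading",
}


def apply_renames(text: str) -> str:
    """Apply every old -> new substitution to the given text."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    print(apply_renames("router: arch_agent_router"))
```

Plain string replacement is enough for this sketch because each old identifier in the mapping is distinctive. A broader rename would likely need word-boundary matching so that unrelated words containing "arch" are left untouched.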
|
|
@ -1,4 +1,4 @@
|
|||
# Arch Gateway configuration version
|
||||
# Plano Gateway configuration version
|
||||
version: v0.3.0
|
||||
|
||||
# External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)
|
||||
|
|
@ -221,7 +221,7 @@ and a context builder that prepares retrieval context before the agent runs.</p>
|
|||
</span><span id="line-30"><span class="linenos">30</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">agent</span>
|
||||
</span><span id="line-31"><span class="linenos">31</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">agent_1</span>
|
||||
</span><span id="line-32"><span class="linenos">32</span><span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">8001</span>
|
||||
</span><span id="line-33"><span class="linenos">33</span><span class="w"> </span><span class="nt">router</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">arch_agent_router</span>
|
||||
</span><span id="line-33"><span class="linenos">33</span><span class="w"> </span><span class="nt">router</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano_agent_router</span>
|
||||
</span><span id="line-34"><span class="linenos">34</span><span class="w"> </span><span class="nt">agents</span><span class="p">:</span>
|
||||
</span><span id="line-35"><span class="linenos">35</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">rag_agent</span>
|
||||
</span><span id="line-36"><span class="linenos">36</span><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">virtual assistant for retrieval augmented generation tasks</span>
|
||||
|
|
|
|||
|
|
@ -214,7 +214,7 @@
|
|||
</span><span id="line-6"> <span class="n">base_url</span><span class="o">=</span><span class="s2">"http://127.0.0.1:12000/v1"</span>
|
||||
</span><span id="line-7"><span class="p">)</span>
|
||||
</span><span id="line-8">
|
||||
</span><span id="line-9"><span class="c1"># Use any model configured in your arch_config.yaml</span>
|
||||
</span><span id="line-9"><span class="c1"># Use any model configured in your plano_config.yaml</span>
|
||||
</span><span id="line-10"><span class="n">completion</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">chat</span><span class="o">.</span><span class="n">completions</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
|
||||
</span><span id="line-11"> <span class="n">model</span><span class="o">=</span><span class="s2">"gpt-4o-mini"</span><span class="p">,</span> <span class="c1"># Or use :ref:`model aliases <model_aliases>` like "fast-model"</span>
|
||||
</span><span id="line-12"> <span class="n">max_tokens</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>
|
||||
|
|
@ -372,7 +372,7 @@
|
|||
</span><span id="line-6"> <span class="n">base_url</span><span class="o">=</span><span class="s2">"http://127.0.0.1:12000"</span>
|
||||
</span><span id="line-7"><span class="p">)</span>
|
||||
</span><span id="line-8">
|
||||
</span><span id="line-9"><span class="c1"># Use any model configured in your arch_config.yaml</span>
|
||||
</span><span id="line-9"><span class="c1"># Use any model configured in your plano_config.yaml</span>
|
||||
</span><span id="line-10"><span class="n">message</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">messages</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
|
||||
</span><span id="line-11"> <span class="n">model</span><span class="o">=</span><span class="s2">"claude-3-5-sonnet-20241022"</span><span class="p">,</span>
|
||||
</span><span id="line-12"> <span class="n">max_tokens</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>
|
||||
|
|
|
|||
|
|
@ -289,7 +289,7 @@ processess conversational messages on your behalf.</p>
|
|||
<p>Example 1: Adjusting Retrieval</p>
|
||||
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span><code><span id="line-1">User: What are the benefits of renewable energy?
|
||||
</span><span id="line-2">**[Plano]**: Check if there is an available <prompt_target> that can handle this user query.
|
||||
</span><span id="line-3">**[Plano]**: Found "get_info_for_energy_source" prompt_target in arch_config.yaml. Forward prompt to the endpoint configured in "get_info_for_energy_source"
|
||||
</span><span id="line-3">**[Plano]**: Found "get_info_for_energy_source" prompt_target in plano_config.yaml. Forward prompt to the endpoint configured in "get_info_for_energy_source"
|
||||
</span><span id="line-4">...
|
||||
</span><span id="line-5">Assistant: Renewable energy reduces greenhouse gas emissions, lowers air pollution, and provides sustainable power sources like solar and wind.
|
||||
</span><span id="line-6">
|
||||
|
|
@ -303,13 +303,13 @@ processess conversational messages on your behalf.</p>
|
|||
<h3>Example 2: Switching Intent<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#example-2-switching-intent" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#example-2-switching-intent'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
|
||||
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span><code><span id="line-1">User: What are the symptoms of diabetes?
|
||||
</span><span id="line-2">**[Plano]**: Check if there is an available <prompt_target> that can handle this user query.
|
||||
</span><span id="line-3">**[Plano]**: Found "diseases_symptoms" prompt_target in arch_config.yaml. Forward disease=diabeteres to "diseases_symptoms" prompt target
|
||||
</span><span id="line-3">**[Plano]**: Found "diseases_symptoms" prompt_target in plano_config.yaml. Forward disease=diabeteres to "diseases_symptoms" prompt target
|
||||
</span><span id="line-4">...
|
||||
</span><span id="line-5">Assistant: Common symptoms include frequent urination, excessive thirst, fatigue, and blurry vision.
|
||||
</span><span id="line-6">
|
||||
</span><span id="line-7">User: How is it diagnosed?
|
||||
</span><span id="line-8">**[Plano]**: New intent detected.
|
||||
</span><span id="line-9">**[Plano]**: Found "disease_diagnoses" prompt_target in arch_config.yaml. Forward disease=diabeteres to "disease_diagnoses" prompt target
|
||||
</span><span id="line-9">**[Plano]**: Found "disease_diagnoses" prompt_target in plano_config.yaml. Forward disease=diabeteres to "disease_diagnoses" prompt target
|
||||
</span><span id="line-10">...
|
||||
</span><span id="line-11">Assistant: Diabetes is diagnosed through blood tests like fasting blood sugar, A1C, or an oral glucose tolerance test.
|
||||
</span></code></pre></div>
|
||||
|
|
@ -415,7 +415,7 @@ response from your APIs.</p>
|
|||
</section>
|
||||
<section id="demo-app">
|
||||
<h3>Demo App<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#demo-app" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#demo-app'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
|
||||
<p>For your convenience, we’ve built a <a class="reference external" href="https://github.com/katanemo/archgw/tree/main/demos/samples_python/multi_turn_rag_agent" rel="nofollow noopener">demo app<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>
|
||||
<p>For your convenience, we’ve built a <a class="reference external" href="https://github.com/katanemo/plano/tree/main/demos/samples_python/multi_turn_rag_agent" rel="nofollow noopener">demo app<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>
|
||||
that you can test and modify locally for multi-turn RAG scenarios.</p>
|
||||
<figure class="align-center" id="id6">
|
||||
<a class="reference internal image-reference" href="../_images/mutli-turn-example.png"><img alt="../_images/mutli-turn-example.png" src="../_images/mutli-turn-example.png" style="width: 100%;"/>
|
||||
|
|
|
|||
|
|
@ -161,7 +161,7 @@
|
|||
</nav>
|
||||
<div id="content" role="main">
|
||||
<section id="access-logging">
|
||||
<span id="arch-access-logging"></span><h1>Access Logging<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#access-logging"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
|
||||
<span id="plano-access-logging"></span><h1>Access Logging<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#access-logging"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
|
||||
<p>Access logging in Plano refers to the logging of detailed information about each request and response that flows through Plano.
|
||||
It provides visibility into the traffic passing through Plano, which is crucial for monitoring, debugging, and analyzing the
|
||||
behavior of AI applications and their interactions.</p>
|
||||
|
|
|
|||
|
|
@ -161,7 +161,7 @@
|
|||
</nav>
|
||||
<div id="content" role="main">
|
||||
<section id="tracing">
|
||||
<span id="arch-overview-tracing"></span><h1>Tracing<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#tracing"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
|
||||
<span id="plano-overview-tracing"></span><h1>Tracing<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#tracing"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
|
||||
<section id="overview">
|
||||
<h2>Overview<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#overview" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#overview'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
|
||||
<p><a class="reference external" href="https://opentelemetry.io/" rel="nofollow noopener">OpenTelemetry<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a> is an open-source observability framework providing APIs
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
Plano Docs v0.4.6
|
||||
llms.txt (auto-generated)
|
||||
Generated (UTC): 2026-02-13T23:08:39.285024+00:00
|
||||
Generated (UTC): 2026-02-13T23:17:22.564025+00:00
|
||||
|
||||
Table of contents
|
||||
- Agents (concepts/agents)
|
||||
|
|
@ -199,7 +199,7 @@ listeners:
|
|||
- type: agent
|
||||
name: agent_1
|
||||
port: 8001
|
||||
router: arch_agent_router
|
||||
router: plano_agent_router
|
||||
agents:
|
||||
- id: rag_agent
|
||||
description: virtual assistant for retrieval augmented generation tasks
|
||||
|
|
@ -396,7 +396,7 @@ client = OpenAI(
|
|||
base_url="http://127.0.0.1:12000/v1"
|
||||
)
|
||||
|
||||
# Use any model configured in your arch_config.yaml
|
||||
# Use any model configured in your plano_config.yaml
|
||||
completion = client.chat.completions.create(
|
||||
model="gpt-4o-mini", # Or use :ref:`model aliases <model_aliases>` like "fast-model"
|
||||
max_tokens=50,
|
||||
|
|
@ -560,7 +560,7 @@ client = anthropic.Anthropic(
|
|||
base_url="http://127.0.0.1:12000"
|
||||
)
|
||||
|
||||
# Use any model configured in your arch_config.yaml
|
||||
# Use any model configured in your plano_config.yaml
|
||||
message = client.messages.create(
|
||||
model="claude-3-5-sonnet-20241022",
|
||||
max_tokens=50,
|
||||
|
|
@ -2319,7 +2319,7 @@ Example 1: Adjusting Retrieval
|
|||
|
||||
User: What are the benefits of renewable energy?
|
||||
**[Plano]**: Check if there is an available <prompt_target> that can handle this user query.
|
||||
**[Plano]**: Found "get_info_for_energy_source" prompt_target in arch_config.yaml. Forward prompt to the endpoint configured in "get_info_for_energy_source"
|
||||
**[Plano]**: Found "get_info_for_energy_source" prompt_target in plano_config.yaml. Forward prompt to the endpoint configured in "get_info_for_energy_source"
|
||||
...
|
||||
Assistant: Renewable energy reduces greenhouse gas emissions, lowers air pollution, and provides sustainable power sources like solar and wind.
|
||||
|
||||
|
|
@ -2332,13 +2332,13 @@ Example 2: Switching Intent
|
|||
|
||||
User: What are the symptoms of diabetes?
|
||||
**[Plano]**: Check if there is an available <prompt_target> that can handle this user query.
|
||||
**[Plano]**: Found "diseases_symptoms" prompt_target in arch_config.yaml. Forward disease=diabeteres to "diseases_symptoms" prompt target
|
||||
**[Plano]**: Found "diseases_symptoms" prompt_target in plano_config.yaml. Forward disease=diabeteres to "diseases_symptoms" prompt target
|
||||
...
|
||||
Assistant: Common symptoms include frequent urination, excessive thirst, fatigue, and blurry vision.
|
||||
|
||||
User: How is it diagnosed?
|
||||
**[Plano]**: New intent detected.
|
||||
**[Plano]**: Found "disease_diagnoses" prompt_target in arch_config.yaml. Forward disease=diabeteres to "disease_diagnoses" prompt target
|
||||
**[Plano]**: Found "disease_diagnoses" prompt_target in plano_config.yaml. Forward disease=diabeteres to "disease_diagnoses" prompt target
|
||||
...
|
||||
Assistant: Diabetes is diagnosed through blood tests like fasting blood sugar, A1C, or an oral glucose tolerance test.
|
||||
|
||||
|
|
@ -5723,12 +5723,12 @@ Doc: resources/configuration_reference
|
|||
Configuration Reference
|
||||
|
||||
The following is a complete reference of the plano_config.yml that controls the behavior of a single instance of
|
||||
the Arch gateway. This where you enable capabilities like routing to upstream LLm providers, defining prompt_targets
|
||||
the Plano gateway. This where you enable capabilities like routing to upstream LLm providers, defining prompt_targets
|
||||
where prompts get routed to, apply guardrails, and enable critical agent observability features.
|
||||
|
||||
Plano Configuration - Full Reference
|
||||
|
||||
# Arch Gateway configuration version
|
||||
# Plano Gateway configuration version
|
||||
version: v0.3.0
|
||||
|
||||
# External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)
|
||||
|
|
@ -6142,7 +6142,7 @@ prompt_targets:
|
|||
endpoint:
|
||||
name: app_server
|
||||
path: /agent/summary
|
||||
# Arch uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM
|
||||
# Plano uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM
|
||||
auto_llm_dispatch_on_response: true
|
||||
# override system prompt for this prompt target
|
||||
system_prompt: You are a helpful information extraction assistant. Use the information that is provided to you.
|
||||
|
|
@ -6163,7 +6163,7 @@ prompt_targets:
|
|||
default: false
|
||||
enum: [true, false]
|
||||
|
||||
# Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.
|
||||
# Plano creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.
|
||||
endpoints:
|
||||
app_server:
|
||||
# value could be ip address or a hostname with port
|
||||
|
|
|
|||
BIN
objects.inv
Binary file not shown.
|
|
@ -162,11 +162,11 @@
|
|||
<section id="configuration-reference">
|
||||
<span id="id1"></span><h1>Configuration Reference<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#configuration-reference"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
|
||||
<p>The following is a complete reference of the <code class="docutils literal notranslate"><span class="pre">plano_config.yml</span></code> that controls the behavior of a single instance of
|
||||
the Arch gateway. This where you enable capabilities like routing to upstream LLm providers, defining prompt_targets
|
||||
the Plano gateway. This where you enable capabilities like routing to upstream LLm providers, defining prompt_targets
|
||||
where prompts get routed to, apply guardrails, and enable critical agent observability features.</p>
|
||||
<div class="literal-block-wrapper docutils container" id="id2">
|
||||
<div class="code-block-caption"><span class="caption-text"><a class="reference download internal" download="" href="../_downloads/ca9d3b7116524473d8adbde7cf15d167/arch_config_full_reference.yaml"><code class="xref download docutils literal notranslate"><span class="pre">Plano</span> <span class="pre">Configuration</span> <span class="pre">-</span> <span class="pre">Full</span> <span class="pre">Reference</span></code></a></span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id2"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
|
||||
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="linenos"> 1</span><span class="c1"># Arch Gateway configuration version</span>
|
||||
<div class="code-block-caption"><span class="caption-text"><a class="reference download internal" download="" href="../_downloads/c86f9e8fb1f2994b1ba4a0b98481410e/plano_config_full_reference.yaml"><code class="xref download docutils literal notranslate"><span class="pre">Plano</span> <span class="pre">Configuration</span> <span class="pre">-</span> <span class="pre">Full</span> <span class="pre">Reference</span></code></a></span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id2"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
|
||||
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="linenos"> 1</span><span class="c1"># Plano Gateway configuration version</span>
|
||||
</span><span id="line-2"><span class="linenos"> 2</span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">v0.3.0</span>
|
||||
</span><span id="line-3"><span class="linenos"> 3</span>
|
||||
</span><span id="line-4"><span class="linenos"> 4</span><span class="c1"># External HTTP agents - API type is controlled by request path (/v1/responses, /v1/messages, /v1/chat/completions)</span>
|
||||
|
|
|
|||
|
|
@ -191,7 +191,7 @@ processing. It is responsible for managing the inbound(edge) and outbound(egress
|
|||
<li><p><a class="reference internal" href="model_serving.html#bright-staff"><span class="std std-ref">Bright Staff controller subsystem</span></a> is Plano’s memory-efficient, lightweight controller for agentic traffic. It sits inside the Plano data plane and makes real-time decisions about how prompts are handled, forwarded, and processed.</p></li>
|
||||
</ul>
|
||||
<p>These two subsystems are bridged with either the HTTP router filter, and the cluster manager subsystems of Envoy.</p>
|
||||
<p>Also, Plano utilizes <a class="reference external" href="https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310" rel="nofollow noopener">Envoy event-based thread model<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>. A main thread is responsible for the server lifecycle, configuration processing, stats, etc. and some number of <a class="reference internal" href="threading_model.html#arch-overview-threading"><span class="std std-ref">worker threads</span></a> process requests. All threads operate around an event loop (<a class="reference external" href="https://libevent.org/" rel="nofollow noopener">libevent<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>) and any given downstream TCP connection will be handled by exactly one worker thread for its lifetime. Each worker thread maintains its own pool of TCP connections to upstream endpoints.</p>
|
||||
<p>Also, Plano utilizes <a class="reference external" href="https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310" rel="nofollow noopener">Envoy event-based thread model<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>. A main thread is responsible for the server lifecycle, configuration processing, stats, etc. and some number of <a class="reference internal" href="threading_model.html#plano-overview-threading"><span class="std std-ref">worker threads</span></a> process requests. All threads operate around an event loop (<a class="reference external" href="https://libevent.org/" rel="nofollow noopener">libevent<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>) and any given downstream TCP connection will be handled by exactly one worker thread for its lifetime. Each worker thread maintains its own pool of TCP connections to upstream endpoints.</p>
|
||||
<p>Worker threads rarely share state and operate in a trivially parallel fashion. This threading model
|
||||
enables scaling to very high core count CPUs.</p>
|
||||
</section>
|
||||
|
|
@ -275,8 +275,8 @@ Once the LLM processes the prompt, Plano receives the response from the LLM serv
|
|||
<li><p>The post-request <a class="reference internal" href="../../guides/observability/monitoring.html#monitoring"><span class="std std-ref">monitoring</span></a> are updated (e.g. timing, active requests, upgrades, health checks).
|
||||
Some statistics are updated earlier however, during request processing. Stats are batched and written by the main
|
||||
thread periodically.</p></li>
|
||||
<li><p><a class="reference internal" href="../../guides/observability/access_logging.html#arch-access-logging"><span class="std std-ref">Access logs</span></a> are written to the access log</p></li>
|
||||
<li><p><a class="reference internal" href="../../guides/observability/tracing.html#arch-overview-tracing"><span class="std std-ref">Trace</span></a> spans are finalized. If our example request was traced, a
|
||||
<li><p><a class="reference internal" href="../../guides/observability/access_logging.html#plano-access-logging"><span class="std std-ref">Access logs</span></a> are written to the access log</p></li>
|
||||
<li><p><a class="reference internal" href="../../guides/observability/tracing.html#plano-overview-tracing"><span class="std std-ref">Trace</span></a> spans are finalized. If our example request was traced, a
|
||||
trace span, describing the duration and details of the request would be created by the HCM when
|
||||
processing request headers and then finalized by the HCM during post-request processing.</p></li>
|
||||
</ul>
|
||||
|
|
@ -305,7 +305,7 @@ processing request headers and then finalized by the HCM during post-request pro
|
|||
</span><span id="line-18"><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
|
||||
</span><span id="line-19"><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">app_server</span>
|
||||
</span><span id="line-20"><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/agent/summary</span>
|
||||
</span><span id="line-21"><span class="w"> </span><span class="c1"># Arch uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM</span>
|
||||
</span><span id="line-21"><span class="w"> </span><span class="c1"># Plano uses the default LLM and treats the response from the endpoint as the prompt to send to the LLM</span>
|
||||
</span><span id="line-22"><span class="w"> </span><span class="nt">auto_llm_dispatch_on_response</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
|
||||
</span><span id="line-23"><span class="w"> </span><span class="c1"># override system prompt for this prompt target</span>
|
||||
</span><span id="line-24"><span class="w"> </span><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">You are a helpful information extraction assistant. Use the information that is provided to you.</span>
|
||||
|
|
@ -326,7 +326,7 @@ processing request headers and then finalized by the HCM during post-request pro
|
|||
</span><span id="line-39"><span class="w"> </span><span class="nt">default</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
|
||||
</span><span id="line-40"><span class="w"> </span><span class="nt">enum</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">true</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">false</span><span class="p p-Indicator">]</span>
|
||||
</span><span id="line-41">
|
||||
</span><span id="line-42"><span class="c1"># Arch creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.</span>
|
||||
</span><span id="line-42"><span class="c1"># Plano creates a round-robin load balancing between different endpoints, managed via the cluster subsystem.</span>
|
||||
</span><span id="line-43"><span class="nt">endpoints</span><span class="p">:</span>
|
||||
</span><span id="line-44"><span class="w"> </span><span class="nt">app_server</span><span class="p">:</span>
|
||||
</span><span id="line-45"><span class="w"> </span><span class="c1"># value could be ip address or a hostname with port</span>
|
||||
|
|
|
|||
|
|
@ -161,7 +161,7 @@
|
|||
</nav>
|
||||
<div id="content" role="main">
|
||||
<section id="threading-model">
|
||||
<span id="arch-overview-threading"></span><h1>Threading Model<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#threading-model"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
|
||||
<span id="plano-overview-threading"></span><h1>Threading Model<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() => $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#threading-model"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h1>
|
||||
<p>Plano builds on top of Envoy’s single process with multiple threads architecture.</p>
|
||||
<p>A single <em>primary</em> thread controls various sporadic coordination tasks while some number of <em>worker</em>
|
||||
threads perform filtering, and forwarding.</p>
|
||||
|
|
|
|||
File diff suppressed because one or more lines are too long