Spherrrical 2026-06-24 17:14:50 +00:00
parent 3f910c4943
commit e8c1f79969
6 changed files with 561 additions and 159 deletions


@@ -86,6 +86,24 @@ routing_preferences:
    selection_policy:
      prefer: cheapest
# model_metrics_sources: external catalogs the router reads to reorder candidate
# models for selection_policy.prefer. A `cost` source ranks `prefer: cheapest`;
# a `latency` source ranks `prefer: fastest`. Both are optional.
model_metrics_sources:
  # Cost catalog. provider: models.dev | digitalocean (default url per provider).
  - type: cost
    provider: models.dev
    url: https://models.dev/api.json # optional; omit to use the provider default
    refresh_interval: 3600 # optional, seconds
    model_aliases: # optional: catalog key -> Plano model name
      openai/gpt-oss-120b: openai/gpt-4o
  # Latency catalog (Prometheus). Used for selection_policy.prefer: fastest.
  - type: latency
    provider: prometheus
    url: http://prometheus:9090
    query: sum by (model_name) (rate(plano_llm_latency_seconds_sum[5m])) / sum by (model_name) (rate(plano_llm_latency_seconds_count[5m]))
    refresh_interval: 60
# HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access
listeners:
# Agent listener for routing requests to multiple agents


@@ -353,6 +353,174 @@
</section>
</section>
</section>
<section id="cost-and-latency-aware-selection">
<span id="cost-latency-aware-selection"></span><h2>Cost- and latency-aware selection<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#cost-and-latency-aware-selection" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#cost-and-latency-aware-selection'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>When a route lists more than one candidate model, you can let Plano reorder that
candidate pool using <strong>live cost or latency data</strong> instead of relying solely on the
order you wrote them in. This is controlled per route with <code class="docutils literal notranslate"><span class="pre">selection_policy</span></code> and
backed by one or more <code class="docutils literal notranslate"><span class="pre">model_metrics_sources</span></code>.</p>
<p>This is useful when several models are equally capable for a route and you want Plano
to always reach for the cheapest (or fastest) option first, with the others kept as
fallbacks.</p>
<section id="selection-policy">
<h3>Selection policy<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#selection-policy" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#selection-policy'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>Attach an optional <code class="docutils literal notranslate"><span class="pre">selection_policy</span></code> to any entry in <code class="docutils literal notranslate"><span class="pre">routing_preferences</span></code>:</p>
<div class="literal-block-wrapper docutils container" id="id12">
<div class="code-block-caption"><span class="caption-text">Per-route selection policy</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id12"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">routing_preferences</span><span class="p">:</span>
</span><span id="line-2"><span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">code review</span>
</span><span id="line-3"><span class="w">    </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">reviewing, analyzing, and suggesting improvements to existing code</span>
</span><span id="line-4"><span class="w">    </span><span class="nt">models</span><span class="p">:</span>
</span><span id="line-5"><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">anthropic/claude-sonnet-4-5</span>
</span><span id="line-6"><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">groq/llama-3.3-70b-versatile</span>
</span><span id="line-7"><span class="w">    </span><span class="nt">selection_policy</span><span class="p">:</span>
</span><span id="line-8"><span class="w">      </span><span class="nt">prefer</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">cheapest</span><span class="w"> </span><span class="c1"># cheapest | fastest | none</span>
</span></code></pre></div>
</div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">prefer</span></code> accepts:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">cheapest</span></code> — order candidates by total price (input + output rate) ascending, using a <code class="docutils literal notranslate"><span class="pre">cost</span></code> metrics source.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">fastest</span></code> — order candidates by observed latency ascending, using a <code class="docutils literal notranslate"><span class="pre">latency</span></code> metrics source.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">none</span></code> (default) — keep the order you declared; no reordering.</p></li>
</ul>
<p>Models that have no data in the selected source are ranked <strong>last</strong>, in their original
order, so routing always degrades gracefully rather than dropping a candidate.</p>
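<p>The ordering rule above amounts to a stable sort followed by a tail of unranked models. The Python sketch below illustrates it, assuming the chosen metrics source has already been reduced to a map from Plano model name to a single number where lower is better. It is not Plano's implementation; <code class="docutils literal notranslate"><span class="pre">order_candidates</span></code> is a hypothetical name and the prices are made up.</p>

```python
from typing import Dict, List, Optional

VALID_PREFERENCES = {"cheapest", "fastest", "none"}


def order_candidates(
    models: List[str],
    metric: Optional[Dict[str, float]],
    prefer: str = "none",
) -> List[str]:
    """Reorder a route's candidates by a per-model metric where lower is better.

    `metric` is a cost map for prefer="cheapest" or a latency map for
    prefer="fastest". Models missing from the map keep their declared
    relative order and are placed after every ranked model.
    """
    if prefer not in VALID_PREFERENCES:
        raise ValueError(f"unknown selection_policy.prefer: {prefer!r}")
    if prefer == "none" or not metric:
        return list(models)  # declared order, no reordering
    ranked = [m for m in models if m in metric]
    unranked = [m for m in models if m not in metric]
    # sorted() is stable, so models with equal metrics keep their declared order.
    return sorted(ranked, key=lambda m: metric[m]) + unranked


# Usage with illustrative numbers (not real prices):
candidates = [
    "anthropic/claude-sonnet-4-5",
    "groq/llama-3.3-70b-versatile",
    "example/unpriced-model",  # hypothetical model with no catalog entry
]
prices = {"anthropic/claude-sonnet-4-5": 10.0, "groq/llama-3.3-70b-versatile": 2.0}
print(order_candidates(candidates, prices, prefer="cheapest"))
# ['groq/llama-3.3-70b-versatile', 'anthropic/claude-sonnet-4-5', 'example/unpriced-model']
```

<p>Keeping unranked models as a tail, rather than dropping them, is what lets a route keep working when a catalog is missing an entry.</p>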
</section>
<section id="configuring-the-pricing-source">
<h3>Configuring the pricing source<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#configuring-the-pricing-source" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#configuring-the-pricing-source'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p><code class="docutils literal notranslate"><span class="pre">cheapest</span></code> routing needs a price catalog. Plano's <strong>default pricing provider is
DigitalOcean</strong>: its GenAI model catalog is public (no API key, no signup), so cost data
is available out of the box, and it is what <code class="docutils literal notranslate"><span class="pre">planoai</span> <span class="pre">obs</span></code> uses if you don't configure
anything. The pricing source is fully swappable: point Plano at <a class="reference external" href="https://models.dev/" rel="nofollow noopener">models.dev<svg fill="currentColor" height="1em" stroke="none" viewbox="0 96 960 960" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M188 868q-11-11-11-28t11-28l436-436H400q-17 0-28.5-11.5T360 336q0-17 11.5-28.5T400 296h320q17 0 28.5 11.5T760 336v320q0 17-11.5 28.5T720 696q-17 0-28.5-11.5T680 656V432L244 868q-11 11-28 11t-28-11Z"></path></svg></a>,
or at <strong>any endpoint that exposes a supported pricing structure</strong>.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">provider</span></code> field selects which response schema Plano expects (and therefore how it
parses the catalog); the optional <code class="docutils literal notranslate"><span class="pre">url</span></code> lets you override the endpoint — for example to
use a mirror, a cached copy, or an internal catalog service that returns the same shape.</p>
<table class="docutils align-default">
<colgroup>
<col style="width: 18.0%"/>
<col style="width: 34.0%"/>
<col style="width: 28.0%"/>
<col style="width: 20.0%"/>
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p><code class="docutils literal notranslate"><span class="pre">provider</span></code></p></th>
<th class="head"><p>Default catalog URL</p></th>
<th class="head"><p>Key format</p></th>
<th class="head"><p>Expected structure</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">digitalocean</span></code> <em>(default)</em></p></td>
<td><p>DigitalOcean GenAI model catalog</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">lowercase(creator)/model_id</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">{</span> <span class="pre">data:</span> <span class="pre">[</span> <span class="pre">{</span> <span class="pre">model_id,</span> <span class="pre">pricing:</span> <span class="pre">{</span> <span class="pre">input_price_per_million,</span> <span class="pre">output_price_per_million</span> <span class="pre">}</span> <span class="pre">}</span> <span class="pre">]</span> <span class="pre">}</span></code></p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">models.dev</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">https://models.dev/api.json</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">creator/model</span></code> (e.g. <code class="docutils literal notranslate"><span class="pre">anthropic/claude-sonnet-4-5</span></code>)</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">{</span> <span class="pre">&lt;provider&gt;:</span> <span class="pre">{</span> <span class="pre">models:</span> <span class="pre">{</span> <span class="pre">&lt;model&gt;:</span> <span class="pre">{</span> <span class="pre">cost:</span> <span class="pre">{</span> <span class="pre">input,</span> <span class="pre">output</span> <span class="pre">}</span> <span class="pre">}</span> <span class="pre">}</span> <span class="pre">}</span> <span class="pre">}</span></code></p></td>
</tr>
</tbody>
</table>
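<p>Both structures reduce to what the router actually ranks on: one number per model key, the input rate plus the output rate. The Python sketch below shows that flattening. The function names are hypothetical, and because the table above does not name the field that holds the creator in the DigitalOcean structure, <code class="docutils literal notranslate"><span class="pre">creator</span></code> is used as a placeholder.</p>

```python
from typing import Any, Dict


def parse_models_dev(catalog: Dict[str, Any]) -> Dict[str, float]:
    """Flatten {provider: {models: {model: {cost: {input, output}}}}} to key -> input + output."""
    prices: Dict[str, float] = {}
    for provider, entry in catalog.items():
        if not isinstance(entry, dict):
            continue
        for model, info in entry.get("models", {}).items():
            cost = info.get("cost") or {}
            if "input" not in cost or "output" not in cost:
                continue  # unpriced models are left out, so they rank last
            prices[f"{provider}/{model}"] = float(cost["input"]) + float(cost["output"])
    return prices


def parse_digitalocean(catalog: Dict[str, Any], creator_field: str = "creator") -> Dict[str, float]:
    """Flatten {data: [{model_id, pricing: {...}}]} to lowercase(creator)/model_id -> input + output.

    ASSUMPTION: the creator field name is not documented on this page;
    "creator" is a hypothetical placeholder.
    """
    prices: Dict[str, float] = {}
    for item in catalog.get("data", []):
        pricing = item.get("pricing") or {}
        model_id = item.get("model_id")
        if not model_id or "input_price_per_million" not in pricing or "output_price_per_million" not in pricing:
            continue  # incomplete entry: skip it rather than guess a price
        total = float(pricing["input_price_per_million"]) + float(pricing["output_price_per_million"])
        creator = str(item.get(creator_field, "")).lower()
        key = f"{creator}/{model_id}" if creator else model_id
        prices[key] = total
    return prices
```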
<p>Because the catalog is selected by the <code class="docutils literal notranslate"><span class="pre">provider</span></code> field, switching is a one-line change. If you only need
cost data in <code class="docutils literal notranslate"><span class="pre">planoai</span> <span class="pre">obs</span></code>, you can omit <code class="docutils literal notranslate"><span class="pre">model_metrics_sources</span></code> entirely and it uses the default
DigitalOcean catalog; for routing, declare the source explicitly:</p>
<div class="literal-block-wrapper docutils container" id="id13">
<div class="code-block-caption"><span class="caption-text">Default cost source (DigitalOcean)</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id13"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">model_metrics_sources</span><span class="p">:</span>
</span><span id="line-2"><span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">cost</span>
</span><span id="line-3"><span class="w">    </span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">digitalocean</span><span class="w"> </span><span class="c1"># default; uses the public DO GenAI catalog</span>
</span></code></pre></div>
</div>
</div>
<p>To switch to models.dev — an open, community-maintained catalog covering a broad range of
providers and models — change the <code class="docutils literal notranslate"><span class="pre">provider</span></code> (and optionally <code class="docutils literal notranslate"><span class="pre">url</span></code>):</p>
<div class="literal-block-wrapper docutils container" id="id14">
<div class="code-block-caption"><span class="caption-text">Cost source backed by models.dev</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id14"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">model_metrics_sources</span><span class="p">:</span>
</span><span id="line-2"><span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">cost</span>
</span><span id="line-3"><span class="w">    </span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">models.dev</span><span class="w"> </span><span class="c1"># models.dev | digitalocean</span>
</span><span id="line-4"><span class="w">    </span><span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">https://models.dev/api.json</span><span class="w"> </span><span class="c1"># optional; defaults per provider</span>
</span><span id="line-5"><span class="w">    </span><span class="nt">refresh_interval</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3600</span><span class="w"> </span><span class="c1"># optional, seconds; refetch on this interval</span>
</span><span id="line-6"><span class="w">    </span><span class="nt">model_aliases</span><span class="p">:</span><span class="w"> </span><span class="c1"># optional; see below</span>
</span><span id="line-7"><span class="w">      </span><span class="nt">openai/gpt-oss-120b</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">openai/gpt-4o</span>
</span></code></pre></div>
</div>
</div>
<p>To use your own endpoint, pick the <code class="docutils literal notranslate"><span class="pre">provider</span></code> whose structure your endpoint matches and
override <code class="docutils literal notranslate"><span class="pre">url</span></code>; Plano parses the response with that provider's schema:</p>
<div class="literal-block-wrapper docutils container" id="id15">
<div class="code-block-caption"><span class="caption-text">Custom endpoint exposing the DigitalOcean catalog structure</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id15"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">model_metrics_sources</span><span class="p">:</span>
</span><span id="line-2"><span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">cost</span>
</span><span id="line-3"><span class="w">    </span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">digitalocean</span><span class="w"> </span><span class="c1"># selects the DO response schema</span>
</span><span id="line-4"><span class="w">    </span><span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">https://catalog.internal.example.com/pricing</span>
</span></code></pre></div>
</div>
</div>
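<p>As an illustration of such an internal service, here is a minimal Python standard-library server that returns prices in the DigitalOcean structure from the table above. The path <code class="docutils literal notranslate"><span class="pre">/pricing</span></code>, the port, the placeholder prices, and the <code class="docutils literal notranslate"><span class="pre">creator</span></code> field name are all assumptions for this sketch, not part of Plano.</p>

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, Tuple

# Hypothetical internal price list: (creator, model_id) -> (input, output) USD per million tokens.
# Placeholder values, not real prices.
PRICES: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("example", "internal-model-small"): (0.5, 1.5),
    ("example", "internal-model-large"): (3.0, 9.0),
}


def build_catalog(prices: Dict[Tuple[str, str], Tuple[float, float]]) -> Dict[str, Any]:
    """Render prices in the {data: [{model_id, pricing: {...}}]} shape."""
    return {
        "data": [
            {
                "model_id": model_id,
                "creator": creator,  # ASSUMPTION: hypothetical name for the creator field
                "pricing": {
                    "input_price_per_million": inp,
                    "output_price_per_million": out,
                },
            }
            for (creator, model_id), (inp, out) in prices.items()
        ]
    }


class CatalogHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/pricing":
            self.send_error(404)
            return
        body = json.dumps(build_catalog(PRICES)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Block forever, serving the catalog at http://0.0.0.0:PORT/pricing."""
    HTTPServer(("0.0.0.0", port), CatalogHandler).serve_forever()
```

<p>Calling <code class="docutils literal notranslate"><span class="pre">serve()</span></code> starts the server; the <code class="docutils literal notranslate"><span class="pre">url</span></code> of a <code class="docutils literal notranslate"><span class="pre">provider:</span> <span class="pre">digitalocean</span></code> cost source would then point at that host's <code class="docutils literal notranslate"><span class="pre">/pricing</span></code> path.</p>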
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The cost metric used for ranking is the sum of the input and output per-million-token
rates — a relative signal for ordering candidates, not a per-request bill. For actual
per-request cost, see the observability console below.</p>
</div>
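<p>To make the difference concrete, the Python sketch below compares the ranking metric with the actual cost of a single request. The two models and their rates are hypothetical; the point is that a model with the lower rate sum can still cost more for a request dominated by input tokens.</p>

```python
def ranking_metric(input_per_million: float, output_per_million: float) -> float:
    """The value used for ordering: the sum of the two per-million-token rates."""
    return input_per_million + output_per_million


def request_cost_usd(input_per_million: float, output_per_million: float,
                     input_tokens: int, output_tokens: int) -> float:
    """What one request actually costs, given its token counts."""
    return (input_tokens * input_per_million + output_tokens * output_per_million) / 1_000_000


# Two hypothetical models (USD per million tokens, made up for illustration).
model_a = (1.0, 9.0)  # cheap input, expensive output -> metric 10.0
model_b = (4.0, 4.0)  # flat pricing -> metric 8.0

# B ranks first, yet for a prompt-heavy request (100k input, 500 output tokens) A is cheaper.
cost_a = request_cost_usd(*model_a, 100_000, 500)  # about 0.1045 USD
cost_b = request_cost_usd(*model_b, 100_000, 500)  # about 0.402 USD
print(ranking_metric(*model_a), ranking_metric(*model_b), round(cost_a, 4), round(cost_b, 4))
```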
</section>
<section id="matching-catalog-keys-to-your-models">
<h3>Matching catalog keys to your models<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#matching-catalog-keys-to-your-models" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#matching-catalog-keys-to-your-models'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>The router looks up each candidate model by the exact name you use in
<code class="docutils literal notranslate"><span class="pre">routing_preferences</span></code> (e.g. <code class="docutils literal notranslate"><span class="pre">anthropic/claude-sonnet-4-5</span></code>). models.dev keys models as
<code class="docutils literal notranslate"><span class="pre">creator/model</span></code>, which lines up with Plano's <code class="docutils literal notranslate"><span class="pre">provider/model</span></code> naming, so most models
match automatically.</p>
<p>When a catalog key does not match your model name — for example a version skew, or an
open-weight model you serve under a different provider — use <code class="docutils literal notranslate"><span class="pre">model_aliases</span></code> to map the
<strong>catalog key</strong> to the <strong>Plano model name</strong> used in your routing preferences:</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">model_metrics_sources</span><span class="p">:</span>
</span><span id="line-2"><span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">cost</span>
</span><span id="line-3"><span class="w">    </span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">models.dev</span>
</span><span id="line-4"><span class="w">    </span><span class="nt">model_aliases</span><span class="p">:</span>
</span><span id="line-5"><span class="w">      </span><span class="c1"># catalog key: Plano model name</span>
</span><span id="line-6"><span class="w">      </span><span class="nt">openai/gpt-oss-120b</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">openai/gpt-4o</span>
</span></code></pre></div>
</div>
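<p>Conceptually, an alias re-keys a catalog entry so that the router's exact-name lookup finds it. A minimal Python sketch of that step follows; the function name is hypothetical, and both keeping the original catalog key and letting an alias override an existing entry under the Plano name are assumptions of this sketch.</p>

```python
from typing import Dict


def apply_model_aliases(prices: Dict[str, float], aliases: Dict[str, str]) -> Dict[str, float]:
    """Expose catalog prices under Plano model names.

    `aliases` mirrors model_aliases: catalog key -> Plano model name.
    ASSUMPTIONS: the original catalog key stays available, and an alias
    overrides any catalog entry already stored under the Plano name.
    """
    resolved = dict(prices)
    for catalog_key, plano_name in aliases.items():
        if catalog_key in prices:
            resolved[plano_name] = prices[catalog_key]
    return resolved


catalog = {"openai/gpt-oss-120b": 0.9}  # illustrative value, not a real price
aliases = {"openai/gpt-oss-120b": "openai/gpt-4o"}
print(apply_model_aliases(catalog, aliases)["openai/gpt-4o"])  # 0.9
```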
</section>
<section id="latency-source">
<h3>Latency source<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#latency-source" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#latency-source'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p><code class="docutils literal notranslate"><span class="pre">fastest</span></code> routing reads observed latency from a Prometheus instance. Provide the query
that returns a per-model latency value (lower is faster), labelled by <code class="docutils literal notranslate"><span class="pre">model_name</span></code>:</p>
<div class="literal-block-wrapper docutils container" id="id16">
<div class="code-block-caption"><span class="caption-text">Latency source backed by Prometheus</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id16"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">model_metrics_sources</span><span class="p">:</span>
</span><span id="line-2"><span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">latency</span>
</span><span id="line-3"><span class="w">    </span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">prometheus</span>
</span><span id="line-4"><span class="w">    </span><span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://prometheus:9090</span>
</span><span id="line-5"><span class="w">    </span><span class="nt">query</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">sum by (model_name) (rate(plano_llm_latency_seconds_sum[5m])) / sum by (model_name) (rate(plano_llm_latency_seconds_count[5m]))</span>
</span><span id="line-6"><span class="w">    </span><span class="nt">refresh_interval</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">60</span>
</span></code></pre></div>
</div>
</div>
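<p>Prometheus answers such a query through its HTTP API (<code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/api/v1/query</span></code>) with an instant vector. For a histogram or summary, the value that behaves like a latency is the mean per request, the rate of <code class="docutils literal notranslate"><span class="pre">_sum</span></code> divided by the rate of <code class="docutils literal notranslate"><span class="pre">_count</span></code>; the rate of <code class="docutils literal notranslate"><span class="pre">_sum</span></code> alone grows with request volume. The standard-library Python sketch below shows how such a response maps to a per-model latency table. It illustrates the data flow only, is not Plano's client, and its function names and sample data are hypothetical.</p>

```python
import json
import math
import urllib.parse
import urllib.request
from typing import Any, Dict


def parse_latency_vector(payload: Dict[str, Any], label: str = "model_name") -> Dict[str, float]:
    """Turn a Prometheus instant-query response into {model_name: latency}."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    data = payload["data"]
    if data.get("resultType") != "vector":
        raise ValueError(f"expected an instant vector, got {data.get('resultType')!r}")
    latencies: Dict[str, float] = {}
    for sample in data["result"]:
        name = sample["metric"].get(label)
        value = float(sample["value"][1])  # value is [unix_timestamp, "number as string"]
        # Skip unlabelled series and NaN (e.g. 0/0 for a model with no recent traffic),
        # so those models end up unranked instead of corrupting the sort.
        if name is None or math.isnan(value):
            continue
        latencies[name] = value
    return latencies


def fetch_latencies(base_url: str, query: str, timeout: float = 5.0) -> Dict[str, float]:
    """Run `query` against Prometheus's /api/v1/query endpoint."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_latency_vector(json.load(resp))


# Offline usage with a hand-written response (made-up values):
sample = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {"model_name": "groq/llama-3.3-70b-versatile"}, "value": [1719249290.0, "0.42"]},
    {"metric": {"model_name": "anthropic/claude-sonnet-4-5"}, "value": [1719249290.0, "NaN"]},
]}}
print(parse_latency_vector(sample))  # {'groq/llama-3.3-70b-versatile': 0.42}
```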
<p>You can declare both a <code class="docutils literal notranslate"><span class="pre">cost</span></code> and a <code class="docutils literal notranslate"><span class="pre">latency</span></code> source at the same time; each route
picks whichever it needs based on its <code class="docutils literal notranslate"><span class="pre">selection_policy</span></code>.</p>
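<p>The mapping from policy to source type is small enough to write down directly. This Python sketch mirrors the rules in this section (<code class="docutils literal notranslate"><span class="pre">cheapest</span></code> reads a <code class="docutils literal notranslate"><span class="pre">cost</span></code> source, <code class="docutils literal notranslate"><span class="pre">fastest</span></code> a <code class="docutils literal notranslate"><span class="pre">latency</span></code> source, <code class="docutils literal notranslate"><span class="pre">none</span></code> neither). The function name is hypothetical, and picking the first source when several share a type is an assumption.</p>

```python
from typing import Any, Dict, List, Optional

# Which source type each selection_policy.prefer value reads.
SOURCE_TYPE_FOR_POLICY = {"cheapest": "cost", "fastest": "latency"}


def source_for_route(prefer: str, sources: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the metrics source a route needs, or None when it needs none."""
    wanted = SOURCE_TYPE_FOR_POLICY.get(prefer)
    if wanted is None:
        return None  # prefer: none keeps the declared order
    # ASSUMPTION: with several sources of the same type, the first one wins.
    return next((s for s in sources if s.get("type") == wanted), None)


sources = [
    {"type": "cost", "provider": "models.dev"},
    {"type": "latency", "provider": "prometheus", "url": "http://prometheus:9090"},
]
print(source_for_route("fastest", sources)["provider"])  # prometheus
```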
</section>
<section id="cost-in-the-observability-console">
<h3>Cost in the observability console<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#cost-in-the-observability-console" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#cost-in-the-observability-console'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p><code class="docutils literal notranslate"><span class="pre">planoai</span> <span class="pre">obs</span></code> displays a per-request USD cost column derived from the same pricing
catalog. By default it reads the <code class="docutils literal notranslate"><span class="pre">cost</span></code> source from your config (the first
<code class="docutils literal notranslate"><span class="pre">type:</span> <span class="pre">cost</span></code> entry under <code class="docutils literal notranslate"><span class="pre">model_metrics_sources</span></code>); you can also override it on the
command line:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="c1"># Use the cost source from ./config.yaml (default)</span>
</span><span id="line-2">planoai<span class="w"> </span>obs
</span><span id="line-3">
</span><span id="line-4"><span class="c1"># Or override the provider / endpoint explicitly</span>
</span><span id="line-5">planoai<span class="w"> </span>obs<span class="w"> </span>--pricing-provider<span class="w"> </span>models.dev
</span><span id="line-6">planoai<span class="w"> </span>obs<span class="w"> </span>--pricing-url<span class="w"> </span>https://models.dev/api.json
</span></code></pre></div>
</div>
<p>If no source is configured and no override is given, <code class="docutils literal notranslate"><span class="pre">planoai</span> <span class="pre">obs</span></code> falls back to the
DigitalOcean catalog so the cost column still populates out of the box.</p>
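<p>The precedence described in this subsection (command-line override, then the first <code class="docutils literal notranslate"><span class="pre">type:</span> <span class="pre">cost</span></code> entry, then DigitalOcean) can be sketched in Python as follows. The function name is hypothetical, and how <code class="docutils literal notranslate"><span class="pre">planoai</span> <span class="pre">obs</span></code> pairs a bare <code class="docutils literal notranslate"><span class="pre">--pricing-url</span></code> with a schema is not specified here; this sketch assumes it keeps the configured or default provider.</p>

```python
from typing import Any, Dict, Optional, Tuple

DEFAULT_PROVIDER = "digitalocean"


def resolve_pricing_source(
    config: Dict[str, Any],
    cli_provider: Optional[str] = None,
    cli_url: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (provider, url); a url of None means "use the provider's default catalog URL"."""
    first_cost = next(
        (s for s in (config.get("model_metrics_sources") or []) if s.get("type") == "cost"),
        None,
    )
    if cli_provider:
        # ASSUMPTION: an explicit --pricing-provider ignores any url from the config.
        return cli_provider, cli_url
    if first_cost:
        return first_cost.get("provider", DEFAULT_PROVIDER), cli_url or first_cost.get("url")
    # Nothing configured: fall back to the public DigitalOcean catalog.
    return DEFAULT_PROVIDER, cli_url
```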
</section>
</section>
<section id="id7">
<h2>Plano-Orchestrator<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id7" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#id7'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>Plano-Orchestrator is a <strong>preference-based routing model</strong> specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.</p>
@@ -587,8 +755,8 @@ instead of a file.</p></li>
<section id="combining-routing-methods">
<h2>Combining Routing Methods<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#combining-routing-methods" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#combining-routing-methods'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
<p>You can combine static model selection with dynamic routing preferences for maximum flexibility:</p>
<div class="literal-block-wrapper docutils container" id="id12">
<div class="code-block-caption"><span class="caption-text">Hybrid Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id12"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="literal-block-wrapper docutils container" id="id17">
<div class="code-block-caption"><span class="caption-text">Hybrid Routing Configuration</span><a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#id17"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></div>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">v0.4.0</span>
</span><span id="line-2">
</span><span id="line-3"><span class="nt">model_providers</span><span class="p">:</span>
@@ -723,6 +891,14 @@ instead of a file.</p></li>
</li>
</ul>
</li>
<li><a :data-current="activeSection === '#cost-and-latency-aware-selection'" class="reference internal" href="#cost-and-latency-aware-selection">Cost- and latency-aware selection</a><ul>
<li><a :data-current="activeSection === '#selection-policy'" class="reference internal" href="#selection-policy">Selection policy</a></li>
<li><a :data-current="activeSection === '#configuring-the-pricing-source'" class="reference internal" href="#configuring-the-pricing-source">Configuring the pricing source</a></li>
<li><a :data-current="activeSection === '#matching-catalog-keys-to-your-models'" class="reference internal" href="#matching-catalog-keys-to-your-models">Matching catalog keys to your models</a></li>
<li><a :data-current="activeSection === '#latency-source'" class="reference internal" href="#latency-source">Latency source</a></li>
<li><a :data-current="activeSection === '#cost-in-the-observability-console'" class="reference internal" href="#cost-in-the-observability-console">Cost in the observability console</a></li>
</ul>
</li>
<li><a :data-current="activeSection === '#id7'" class="reference internal" href="#id7">Plano-Orchestrator</a></li>
<li><a :data-current="activeSection === '#self-hosting-plano-orchestrator'" class="reference internal" href="#self-hosting-plano-orchestrator">Self-hosting Plano-Orchestrator</a><ul>
<li><a :data-current="activeSection === '#using-ollama-recommended-for-local-development'" class="reference internal" href="#using-ollama-recommended-for-local-development">Using Ollama (recommended for local development)</a></li>

@@ -1,6 +1,6 @@
Plano Docs v0.4.25
llms.txt (auto-generated)
Generated (UTC): 2026-06-24T17:14:10.499782+00:00
Generated (UTC): 2026-06-24T17:14:45.476309+00:00
Table of contents
- Agents (concepts/agents)
@@ -4287,6 +4287,178 @@ response = client.chat.completions.create(
# No model specified - router will analyze and choose claude-sonnet-4-5
)
Cost- and latency-aware selection
When a route lists more than one candidate model, you can let Plano reorder that
candidate pool using live cost or latency data instead of relying solely on the
order you wrote them in. This is controlled per route with selection_policy and
backed by one or more model_metrics_sources.
This is useful when several models are equally capable for a route and you want Plano
to always reach for the cheapest (or fastest) option first, with the others kept as
fallbacks.
Selection policy
Attach an optional selection_policy to any entry in routing_preferences:
Per-route selection policy
routing_preferences:
  - name: code review
    description: reviewing, analyzing, and suggesting improvements to existing code
    models:
      - anthropic/claude-sonnet-4-5
      - groq/llama-3.3-70b-versatile
    selection_policy:
      prefer: cheapest # cheapest | fastest | none
prefer accepts:
cheapest — order candidates by total price (input + output rate) ascending, using a cost metrics source.
fastest — order candidates by observed latency ascending, using a latency metrics source.
none (default) — keep the order you declared; no reordering.
Models that have no data in the selected source are ranked last, in their original
order, so routing always degrades gracefully rather than dropping a candidate.
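The ordering rules above can be sketched in a few lines of Python. This illustrates the described behavior and is not Plano's router code: `rank_candidates` and the model names are hypothetical, and `metric` stands for whatever per-model number the selected source provides (a price for cheapest, a latency for fastest). With prefer: none the router would simply skip this step.

```python
from typing import Mapping, Sequence


def rank_candidates(models: Sequence[str], metric: Mapping[str, float]) -> list[str]:
    """Return models ordered by ascending metric (cheaper or faster first).

    Models missing from `metric` are appended after the ranked ones, in the
    order they were declared, so no candidate is ever dropped.
    """
    # sorted() is stable, so models with equal metric values keep declared order.
    ranked = sorted((m for m in models if m in metric), key=lambda m: metric[m])
    unranked = [m for m in models if m not in metric]
    return ranked + unranked


if __name__ == "__main__":
    route = ["provider/model-a", "provider/model-b", "provider/model-c"]
    # Illustrative numbers only; model-b has no catalog entry.
    prices = {"provider/model-a": 18.0, "provider/model-c": 0.9}
    print(rank_candidates(route, prices))
```

Run as a script, this prints model-c, then model-a, then model-b: the two priced models are sorted, and the unpriced one falls to the end instead of disappearing.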
Configuring the pricing source
cheapest routing needs a price catalog. Plano's default pricing provider is
DigitalOcean: its GenAI model catalog is public (no API key, no signup), so cost data
is available out of the box, and it is what planoai obs uses if you don't configure
anything. The pricing source is swappable: point Plano at models.dev,
or at any endpoint that returns one of the supported pricing structures.
The provider field selects which response schema Plano expects (and therefore how it
parses the catalog); the optional url lets you override the endpoint — for example to
use a mirror, a cached copy, or an internal catalog service that returns the same shape.
| provider | Default catalog URL | Key format | Expected structure |
|---|---|---|---|
| digitalocean (default) | DigitalOcean GenAI model catalog | lowercase(creator)/model_id | { data: [ { model_id, pricing: { input_price_per_million, output_price_per_million } } ] } |
| models.dev | https://models.dev/api.json | creator/model (e.g. anthropic/claude-sonnet-4-5) | { <provider>: { models: { <model>: { cost: { input, output } } } } } |
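To see what the models.dev structure in the table turns into, here is a minimal Python sketch that flattens it into one ranking number per model (input + output). It is an assumption-labeled illustration, not Plano's parser; the function name `parse_models_dev` and the sample data are hypothetical.

```python
from typing import Any


def parse_models_dev(payload: dict[str, Any]) -> dict[str, float]:
    """Flatten {<provider>: {models: {<model>: {cost: {input, output}}}}}
    into {"<provider>/<model>": input + output}."""
    totals: dict[str, float] = {}
    for provider, entry in payload.items():
        if not isinstance(entry, dict):
            continue
        for model, info in entry.get("models", {}).items():
            cost = info.get("cost") or {}
            if "input" not in cost or "output" not in cost:
                continue  # no price data: such a model would be ranked last
            totals[f"{provider}/{model}"] = float(cost["input"]) + float(cost["output"])
    return totals


if __name__ == "__main__":
    sample = {  # same shape as the models.dev row; the numbers are made up
        "acme": {"models": {
            "model-a": {"cost": {"input": 1.0, "output": 2.5}},
            "model-b": {},  # no cost block, so it is skipped
        }},
    }
    print(parse_models_dev(sample))
```

To try it against the real catalog, you could pass in `json.load(urllib.request.urlopen("https://models.dev/api.json"))` instead of the sample.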
Because the source is selected per provider, switching is a one-line change. To stay
on the default DigitalOcean catalog you can omit model_metrics_sources entirely for
planoai obs, or declare it explicitly for routing:
Default cost source (DigitalOcean)
model_metrics_sources:
  - type: cost
    provider: digitalocean # default; uses the public DO GenAI catalog
To switch to models.dev — an open, community-maintained catalog covering a broad range of
providers and models — change the provider (and optionally url):
Cost source backed by models.dev
model_metrics_sources:
  - type: cost
    provider: models.dev # models.dev | digitalocean
    url: https://models.dev/api.json # optional; defaults per provider
    refresh_interval: 3600 # optional, seconds; refetch on this interval
    model_aliases: # optional; see below
      openai/gpt-oss-120b: openai/gpt-4o
To use your own endpoint, pick the provider whose structure your endpoint matches and
override url; Plano parses the response with that provider's schema:
Custom endpoint exposing the DigitalOcean catalog structure
model_metrics_sources:
  - type: cost
    provider: digitalocean # selects the DO response schema
    url: https://catalog.internal.example.com/pricing
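Before pointing Plano at an internal endpoint, it can help to check that its payload has the DigitalOcean shape listed in the table. The Python sketch below reads that shape and sums the two per-million rates. It is hypothetical (`parse_do_shape` is not a Plano API), and one simplification is an assumption worth stating: the table says this schema is keyed as lowercase(creator)/model_id, but the listed structure does not show where the creator comes from, so the sketch keys by the raw model_id.

```python
from typing import Any


def parse_do_shape(payload: dict[str, Any]) -> dict[str, float]:
    """Flatten {data: [{model_id, pricing: {input_price_per_million,
    output_price_per_million}}]} into {model_id: input + output}."""
    totals: dict[str, float] = {}
    for item in payload.get("data", []):
        model_id = item.get("model_id")
        pricing = item.get("pricing") or {}
        try:
            total = float(pricing["input_price_per_million"]) + float(
                pricing["output_price_per_million"]
            )
        except (KeyError, TypeError, ValueError):
            continue  # incomplete pricing: skip the entry instead of guessing
        if model_id:
            totals[model_id] = total
    return totals
```

An entry that comes back missing from the result is one that the router would have no price for, and therefore would rank last.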
The cost metric used for ranking is the sum of the input and output per-million-token
rates — a relative signal for ordering candidates, not a per-request bill. For actual
per-request cost, see the observability console below.
Matching catalog keys to your models
The router looks up each candidate model by the exact name you use in
routing_preferences (e.g. anthropic/claude-sonnet-4-5). models.dev keys models as
creator/model, which lines up with Plano's provider/model naming, so most models
match automatically.
When a catalog key does not match your model name — for example a version skew, or an
open-weight model you serve under a different provider — use model_aliases to map the
catalog key to the Plano model name used in your routing preferences:
model_metrics_sources:
  - type: cost
    provider: models.dev
    model_aliases:
      # catalog key : plano model name
      openai/gpt-oss-120b: openai/gpt-4o
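Conceptually, an alias makes a catalog entry visible under the Plano model name before candidates are looked up. A minimal Python sketch of that idea follows; `apply_aliases` is hypothetical, and the rule that an alias overwrites any price the catalog already lists under the Plano name is an assumption, not documented Plano behavior.

```python
from typing import Mapping


def apply_aliases(catalog: Mapping[str, float], aliases: Mapping[str, str]) -> dict[str, float]:
    """Expose catalog entries under Plano model names.

    `aliases` mirrors model_aliases: catalog key -> Plano model name.
    Assumption: an alias overwrites any entry the catalog already had
    under the Plano name.
    """
    resolved = dict(catalog)
    for catalog_key, plano_name in aliases.items():
        if catalog_key in catalog:  # aliases pointing at unknown keys are ignored
            resolved[plano_name] = catalog[catalog_key]
    return resolved
```

With the config above, the price found under openai/gpt-oss-120b would be what the router uses when it ranks openai/gpt-4o.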
Latency source
fastest routing reads observed latency from a Prometheus instance. Provide the query
that returns a per-model latency value (lower is faster), labelled by model_name:
Latency source backed by Prometheus
model_metrics_sources:
  - type: latency
    provider: prometheus
    url: http://prometheus:9090
    query: avg by (model_name) (rate(plano_llm_latency_seconds_sum[5m]))
    refresh_interval: 60
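To check what your query returns before relying on it, you can run it through the standard Prometheus HTTP API (`GET /api/v1/query`) and reduce the instant vector to one value per model_name. The Python sketch below does that with the standard library only. It is not how Plano fetches metrics internally, and the function names are hypothetical. One thing to verify in your own query: if plano_llm_latency_seconds is a histogram or summary (the _sum suffix suggests so), rate(..._sum[5m]) on its own is latency-seconds accumulated per second, which grows with request volume. A per-request mean divides by the matching _count rate, e.g. `sum by (model_name) (rate(plano_llm_latency_seconds_sum[5m])) / sum by (model_name) (rate(plano_llm_latency_seconds_count[5m]))`.

```python
import json
import math
import urllib.parse
import urllib.request
from typing import Any


def fetch_instant_query(base_url: str, query: str, timeout: float = 5.0) -> dict[str, Any]:
    """Run a PromQL instant query via Prometheus' HTTP API (/api/v1/query)."""
    url = f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def latency_by_model(response: dict[str, Any]) -> dict[str, float]:
    """Turn an instant-vector response into {model_name: value}; lower is faster."""
    if response.get("status") != "success":
        return {}
    out: dict[str, float] = {}
    for sample in response["data"]["result"]:
        name = sample["metric"].get("model_name")
        value = float(sample["value"][1])  # Prometheus encodes sample values as strings
        if name and not math.isnan(value):  # drop unlabelled series and NaN (e.g. 0/0)
            out[name] = value
    return out
```

For example, `latency_by_model(fetch_instant_query("http://prometheus:9090", query))` gives the per-model numbers that prefer: fastest would sort on. Any model missing from the result would, per the selection rules, be ranked last.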
You can declare both a cost and a latency source at the same time; each route
picks whichever it needs based on its selection_policy.
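A hypothetical Python sketch of how the two sources and their refresh_interval settings could fit together: each source is cached and refetched once its interval has elapsed, and a route's prefer value picks which source to read. `CachedSource` and `metrics_for` are illustrative names, not Plano APIs. The sketch also assumes that a missing source means keeping the declared order, which this page does not state explicitly.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# prefer value -> type of metrics source it reads (cheapest -> cost, fastest -> latency).
POLICY_TO_SOURCE: Dict[str, Optional[str]] = {"cheapest": "cost", "fastest": "latency", "none": None}


@dataclass
class CachedSource:
    """One metrics source that refetches once refresh_interval seconds have passed."""

    fetch: Callable[[], Dict[str, float]]
    refresh_interval: float
    data: Dict[str, float] = field(default_factory=dict, init=False)
    fetched_at: Optional[float] = field(default=None, init=False)

    def get(self, now: Optional[float] = None) -> Dict[str, float]:
        now = time.monotonic() if now is None else now
        if self.fetched_at is None or now - self.fetched_at >= self.refresh_interval:
            self.data = self.fetch()
            self.fetched_at = now
        return self.data


def metrics_for(prefer: str, sources: Dict[str, CachedSource]) -> Dict[str, float]:
    """Metric map a route should rank by; {} means keep the declared order."""
    source_type = POLICY_TO_SOURCE.get(prefer)
    if source_type is None or source_type not in sources:
        return {}
    return sources[source_type].get()
```

Keeping fetches behind a per-source cache means a cost catalog refreshed hourly and a latency query refreshed every minute can coexist without either one blocking routing decisions on every request.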
Cost in the observability console
planoai obs displays a per-request USD cost column derived from the same pricing
catalog. By default it reads the cost source from your config (the first
type: cost entry under model_metrics_sources); you can also override it on the
command line:
# Use the cost source from ./config.yaml (default)
planoai obs
# Or override the provider / endpoint explicitly
planoai obs --pricing-provider models.dev
planoai obs --pricing-url https://models.dev/api.json
If no source is configured and no override is given, planoai obs falls back to the
DigitalOcean catalog so the cost column still populates out of the box.
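The difference between the ranking signal and the per-request cost can be shown with a small worked example. The formula below (tokens times the per-million rate, for input and output separately) is the usual way to price a request from such a catalog; it is an assumption here, since this page does not spell out how planoai obs computes its column. The rates used in the check are illustrative.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated USD cost of one request from per-million-token catalog rates."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000
```

At illustrative rates of $3 input and $15 output per million tokens, a request with 1,000 input and 500 output tokens costs (3,000 + 7,500) / 1,000,000 = $0.0105, while the ranking metric for that model stays 3 + 15 = 18 no matter how many tokens a request uses.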
Plano-Orchestrator
Plano-Orchestrator is a preference-based routing model specifically designed to address the limitations of traditional LLM routing. It delivers production-ready performance with low latency and high accuracy while solving key routing challenges.
@@ -7072,6 +7244,24 @@ routing_preferences:
    selection_policy:
      prefer: cheapest

# model_metrics_sources: external catalogs the router reads to reorder candidate
# models for selection_policy.prefer. A `cost` source ranks `prefer: cheapest`;
# a `latency` source ranks `prefer: fastest`. Both are optional.
model_metrics_sources:
  # Cost catalog. provider: models.dev | digitalocean (default url per provider).
  - type: cost
    provider: models.dev
    url: https://models.dev/api.json # optional; omit to use the provider default
    refresh_interval: 3600 # optional, seconds
    model_aliases: # optional: catalog key -> Plano model name
      openai/gpt-oss-120b: openai/gpt-4o
  # Latency catalog (Prometheus). Used for selection_policy.prefer: fastest.
  - type: latency
    provider: prometheus
    url: http://prometheus:9090
    query: avg by (model_name) (rate(plano_llm_latency_seconds_sum[5m]))
    refresh_interval: 60

# HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access
listeners:
  # Agent listener for routing requests to multiple agents

Binary file not shown.

@@ -273,163 +273,181 @@ credentials.</p>
</span><span id="line-86"><span class="linenos"> 86</span><span class="w"> </span><span class="nt">selection_policy</span><span class="p">:</span>
</span><span id="line-87"><span class="linenos"> 87</span><span class="w"> </span><span class="nt">prefer</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">cheapest</span>
</span><span id="line-88"><span class="linenos"> 88</span>
</span><span id="line-89"><span class="linenos"> 89</span><span class="c1"># HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access</span>
</span><span id="line-90"><span class="linenos"> 90</span><span class="nt">listeners</span><span class="p">:</span>
</span><span id="line-91"><span class="linenos"> 91</span><span class="w"> </span><span class="c1"># Agent listener for routing requests to multiple agents</span>
</span><span id="line-92"><span class="linenos"> 92</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">agent</span>
</span><span id="line-93"><span class="linenos"> 93</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">travel_booking_service</span>
</span><span id="line-94"><span class="linenos"> 94</span><span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">8001</span>
</span><span id="line-95"><span class="linenos"> 95</span><span class="w"> </span><span class="nt">router</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano_orchestrator_v1</span>
</span><span id="line-96"><span class="linenos"> 96</span><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>
</span><span id="line-97"><span class="linenos"> 97</span><span class="w"> </span><span class="nt">agents</span><span class="p">:</span>
</span><span id="line-98"><span class="linenos"> 98</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">rag_agent</span>
</span><span id="line-99"><span class="linenos"> 99</span><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">virtual assistant for retrieval augmented generation tasks</span>
</span><span id="line-100"><span class="linenos">100</span><span class="w"> </span><span class="nt">input_filters</span><span class="p">:</span>
</span><span id="line-101"><span class="linenos">101</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">input_guards</span>
</span><span id="line-102"><span class="linenos">102</span>
</span><span id="line-103"><span class="linenos">103</span><span class="w"> </span><span class="c1"># Model listener for direct LLM access</span>
</span><span id="line-104"><span class="linenos">104</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">model</span>
</span><span id="line-105"><span class="linenos">105</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">model_1</span>
</span><span id="line-106"><span class="linenos">106</span><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>
</span><span id="line-107"><span class="linenos">107</span><span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">12000</span>
</span><span id="line-108"><span class="linenos">108</span><span class="w"> </span><span class="nt">timeout</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">30s</span><span class="w"> </span><span class="c1"># Request timeout (e.g. "30s", "60s")</span>
</span><span id="line-109"><span class="linenos">109</span><span class="w"> </span><span class="nt">max_retries</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3</span><span class="w"> </span><span class="c1"># Number of retries on upstream failure</span>
</span><span id="line-110"><span class="linenos">110</span><span class="w"> </span><span class="nt">input_filters</span><span class="p">:</span><span class="w"> </span><span class="c1"># Filters applied before forwarding to LLM</span>
</span><span id="line-111"><span class="linenos">111</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">input_guards</span>
</span><span id="line-112"><span class="linenos">112</span><span class="w"> </span><span class="nt">output_filters</span><span class="p">:</span><span class="w"> </span><span class="c1"># Filters applied to LLM responses before returning to client</span>
</span><span id="line-113"><span class="linenos">113</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">input_guards</span>
</span><span id="line-114"><span class="linenos">114</span>
</span><span id="line-115"><span class="linenos">115</span><span class="w"> </span><span class="c1"># Prompt listener for function calling (for prompt_targets)</span>
</span><span id="line-116"><span class="linenos">116</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">prompt</span>
</span><span id="line-117"><span class="linenos">117</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">prompt_function_listener</span>
</span><span id="line-118"><span class="linenos">118</span><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>
</span><span id="line-119"><span class="linenos">119</span><span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10000</span>
</span><span id="line-89"><span class="linenos"> 89</span><span class="c1"># model_metrics_sources: external catalogs the router reads to reorder candidate</span>
</span><span id="line-90"><span class="linenos"> 90</span><span class="c1"># models for selection_policy.prefer. A `cost` source ranks `prefer: cheapest`;</span>
</span><span id="line-91"><span class="linenos"> 91</span><span class="c1"># a `latency` source ranks `prefer: fastest`. Both are optional.</span>
</span><span id="line-92"><span class="linenos"> 92</span><span class="nt">model_metrics_sources</span><span class="p">:</span>
</span><span id="line-93"><span class="linenos"> 93</span><span class="w"> </span><span class="c1"># Cost catalog. provider: models.dev | digitalocean (default url per provider).</span>
</span><span id="line-94"><span class="linenos"> 94</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">cost</span>
</span><span id="line-95"><span class="linenos"> 95</span><span class="w"> </span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">models.dev</span>
</span><span id="line-96"><span class="linenos"> 96</span><span class="w"> </span><span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">https://models.dev/api.json</span><span class="w"> </span><span class="c1"># optional; omit to use the provider default</span>
</span><span id="line-97"><span class="linenos"> 97</span><span class="w"> </span><span class="nt">refresh_interval</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3600</span><span class="w"> </span><span class="c1"># optional, seconds</span>
</span><span id="line-98"><span class="linenos"> 98</span><span class="w"> </span><span class="nt">model_aliases</span><span class="p">:</span><span class="w"> </span><span class="c1"># optional: catalog key -&gt; Plano model name</span>
</span><span id="line-99"><span class="linenos"> 99</span><span class="w"> </span><span class="nt">openai/gpt-oss-120b</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">openai/gpt-4o</span>
</span><span id="line-100"><span class="linenos">100</span><span class="w"> </span><span class="c1"># Latency catalog (Prometheus). Used for selection_policy.prefer: fastest.</span>
</span><span id="line-101"><span class="linenos">101</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">latency</span>
</span><span id="line-102"><span class="linenos">102</span><span class="w"> </span><span class="nt">provider</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">prometheus</span>
</span><span id="line-103"><span class="linenos">103</span><span class="w"> </span><span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://prometheus:9090</span>
</span><span id="line-104"><span class="linenos">104</span><span class="w"> </span><span class="nt">query</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">avg by (model_name) (rate(plano_llm_latency_seconds_sum[5m]))</span>
</span><span id="line-105"><span class="linenos">105</span><span class="w"> </span><span class="nt">refresh_interval</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">60</span>
</span><span id="line-106"><span class="linenos">106</span>
</span><span id="line-107"><span class="linenos">107</span><span class="c1"># HTTP listeners - entry points for agent routing, prompt targets, and direct LLM access</span>
</span><span id="line-108"><span class="linenos">108</span><span class="nt">listeners</span><span class="p">:</span>
</span><span id="line-109"><span class="linenos">109</span><span class="w"> </span><span class="c1"># Agent listener for routing requests to multiple agents</span>
</span><span id="line-110"><span class="linenos">110</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">agent</span>
</span><span id="line-111"><span class="linenos">111</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">travel_booking_service</span>
</span><span id="line-112"><span class="linenos">112</span><span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">8001</span>
</span><span id="line-113"><span class="linenos">113</span><span class="w"> </span><span class="nt">router</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">plano_orchestrator_v1</span>
</span><span id="line-114"><span class="linenos">114</span><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>
</span><span id="line-115"><span class="linenos">115</span><span class="w"> </span><span class="nt">agents</span><span class="p">:</span>
</span><span id="line-116"><span class="linenos">116</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">rag_agent</span>
</span><span id="line-117"><span class="linenos">117</span><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">virtual assistant for retrieval augmented generation tasks</span>
</span><span id="line-118"><span class="linenos">118</span><span class="w"> </span><span class="nt">input_filters</span><span class="p">:</span>
</span><span id="line-119"><span class="linenos">119</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">input_guards</span>
</span><span id="line-120"><span class="linenos">120</span>
</span><span id="line-121"><span class="linenos">121</span><span class="c1"># Reusable service endpoints</span>
</span><span id="line-122"><span class="linenos">122</span><span class="nt">endpoints</span><span class="p">:</span>
</span><span id="line-123"><span class="linenos">123</span><span class="w"> </span><span class="nt">app_server</span><span class="p">:</span>
</span><span id="line-124"><span class="linenos">124</span><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">127.0.0.1:80</span>
</span><span id="line-125"><span class="linenos">125</span><span class="w"> </span><span class="nt">connect_timeout</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.005s</span>
</span><span id="line-126"><span class="linenos">126</span><span class="w"> </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http</span><span class="w"> </span><span class="c1"># http or https</span>
</span><span id="line-127"><span class="linenos">127</span>
</span><span id="line-128"><span class="linenos">128</span><span class="w"> </span><span class="nt">mistral_local</span><span class="p">:</span>
</span><span id="line-129"><span class="linenos">129</span><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">127.0.0.1:8001</span>
</span><span id="line-130"><span class="linenos">130</span>
</span><span id="line-131"><span class="linenos">131</span><span class="w"> </span><span class="nt">secure_service</span><span class="p">:</span>
</span><span id="line-132"><span class="linenos">132</span><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">api.example.com:443</span>
</span><span id="line-133"><span class="linenos">133</span><span class="w"> </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">https</span>
</span><span id="line-134"><span class="linenos">134</span><span class="w"> </span><span class="nt">http_host</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">api.example.com</span><span class="w"> </span><span class="c1"># Override the Host header sent upstream</span>
</span><span id="line-135"><span class="linenos">135</span>
</span><span id="line-136"><span class="linenos">136</span><span class="c1"># Optional top-level system prompt applied to all prompt_targets</span>
</span><span id="line-137"><span class="linenos">137</span><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
</span><span id="line-138"><span class="linenos">138</span><span class="w"> </span><span class="no">You are a helpful assistant. Always respond concisely and accurately.</span>
</span><span id="line-139"><span class="linenos">139</span>
</span><span id="line-140"><span class="linenos">140</span><span class="c1"># Prompt targets for function calling and API orchestration</span>
</span><span id="line-141"><span class="linenos">141</span><span class="nt">prompt_targets</span><span class="p">:</span>
</span><span id="line-142"><span class="linenos">142</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">get_current_weather</span>
</span><span id="line-143"><span class="linenos">143</span><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Get current weather at a location.</span>
</span><span id="line-144"><span class="linenos">144</span><span class="w"> </span><span class="nt">parameters</span><span class="p">:</span>
</span><span id="line-145"><span class="linenos">145</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">location</span>
</span><span id="line-146"><span class="linenos">146</span><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">The location to get the weather for</span>
</span><span id="line-147"><span class="linenos">147</span><span class="w"> </span><span class="nt">required</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-148"><span class="linenos">148</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">string</span>
</span><span id="line-149"><span class="linenos">149</span><span class="w"> </span><span class="nt">format</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">City, State</span>
</span><span id="line-150"><span class="linenos">150</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">days</span>
</span><span id="line-151"><span class="linenos">151</span><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">the number of days for the request</span>
</span><span id="line-152"><span class="linenos">152</span><span class="w"> </span><span class="nt">required</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-153"><span class="linenos">153</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">int</span>
</span><span id="line-154"><span class="linenos">154</span><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
</span><span id="line-155"><span class="linenos">155</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">app_server</span>
</span><span id="line-156"><span class="linenos">156</span><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/weather</span>
</span><span id="line-157"><span class="linenos">157</span><span class="w"> </span><span class="nt">http_method</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">POST</span>
</span><span id="line-158"><span class="linenos">158</span><span class="w"> </span><span class="c1"># Per-target system prompt (overrides top-level system_prompt for this target)</span>
</span><span id="line-159"><span class="linenos">159</span><span class="w"> </span><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">You are a weather expert. Provide accurate and concise weather information.</span>
</span><span id="line-160"><span class="linenos">160</span><span class="w"> </span><span class="c1"># auto_llm_dispatch_on_response: when true, the LLM is called again with the</span>
</span><span id="line-161"><span class="linenos">161</span><span class="w"> </span><span class="c1"># function response to produce a final natural-language answer for the user</span>
</span><span id="line-162"><span class="linenos">162</span><span class="w"> </span><span class="nt">auto_llm_dispatch_on_response</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-163"><span class="linenos">163</span>
</span><span id="line-164"><span class="linenos">164</span><span class="c1"># Rate limits - control token usage per model and request selector</span>
</span><span id="line-165"><span class="linenos">165</span><span class="nt">ratelimits</span><span class="p">:</span>
</span><span id="line-166"><span class="linenos">166</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">openai/gpt-4o</span>
</span><span id="line-167"><span class="linenos">167</span><span class="w"> </span><span class="nt">selector</span><span class="p">:</span>
</span><span id="line-168"><span class="linenos">168</span><span class="w"> </span><span class="nt">key</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">x-user-id</span><span class="w"> </span><span class="c1"># HTTP header key used to identify the rate-limit subject</span>
</span><span id="line-169"><span class="linenos">169</span><span class="w"> </span><span class="nt">value</span><span class="p">:</span><span class="w"> </span><span class="s">"*"</span><span class="w"> </span><span class="c1"># Wildcard matches any value; use a specific string to target one</span>
</span><span id="line-170"><span class="linenos">170</span><span class="w"> </span><span class="nt">limit</span><span class="p">:</span>
</span><span id="line-171"><span class="linenos">171</span><span class="w"> </span><span class="nt">tokens</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">100000</span><span class="w"> </span><span class="c1"># Maximum tokens allowed in the given time unit</span>
</span><span id="line-172"><span class="linenos">172</span><span class="w"> </span><span class="nt">unit</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">hour</span><span class="w"> </span><span class="c1"># Time unit: "minute", "hour", or "day"</span>
</span><span id="line-173"><span class="linenos">173</span>
</span><span id="line-174"><span class="linenos">174</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">openai/gpt-4o-mini</span>
</span><span id="line-175"><span class="linenos">175</span><span class="w"> </span><span class="nt">selector</span><span class="p">:</span>
</span><span id="line-176"><span class="linenos">176</span><span class="w"> </span><span class="nt">key</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">x-org-id</span>
</span><span id="line-177"><span class="linenos">177</span><span class="w"> </span><span class="nt">value</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">acme-corp</span>
</span><span id="line-178"><span class="linenos">178</span><span class="w"> </span><span class="nt">limit</span><span class="p">:</span>
</span><span id="line-179"><span class="linenos">179</span><span class="w"> </span><span class="nt">tokens</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">500000</span>
</span><span id="line-180"><span class="linenos">180</span><span class="w"> </span><span class="nt">unit</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">day</span>
</span><span id="line-121"><span class="linenos">121</span><span class="w"> </span><span class="c1"># Model listener for direct LLM access</span>
</span><span id="line-122"><span class="linenos">122</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">model</span>
</span><span id="line-123"><span class="linenos">123</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">model_1</span>
</span><span id="line-124"><span class="linenos">124</span><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>
</span><span id="line-125"><span class="linenos">125</span><span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">12000</span>
</span><span id="line-126"><span class="linenos">126</span><span class="w"> </span><span class="nt">timeout</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">30s</span><span class="w"> </span><span class="c1"># Request timeout (e.g. "30s", "60s")</span>
</span><span id="line-127"><span class="linenos">127</span><span class="w"> </span><span class="nt">max_retries</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3</span><span class="w"> </span><span class="c1"># Number of retries on upstream failure</span>
</span><span id="line-128"><span class="linenos">128</span><span class="w"> </span><span class="nt">input_filters</span><span class="p">:</span><span class="w"> </span><span class="c1"># Filters applied before forwarding to LLM</span>
</span><span id="line-129"><span class="linenos">129</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">input_guards</span>
</span><span id="line-130"><span class="linenos">130</span><span class="w"> </span><span class="nt">output_filters</span><span class="p">:</span><span class="w"> </span><span class="c1"># Filters applied to LLM responses before returning to client</span>
</span><span id="line-131"><span class="linenos">131</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">input_guards</span>
</span><span id="line-132"><span class="linenos">132</span>
</span><span id="line-133"><span class="linenos">133</span><span class="w"> </span><span class="c1"># Prompt listener for function calling (for prompt_targets)</span>
</span><span id="line-134"><span class="linenos">134</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">prompt</span>
</span><span id="line-135"><span class="linenos">135</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">prompt_function_listener</span>
</span><span id="line-136"><span class="linenos">136</span><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.0.0.0</span>
</span><span id="line-137"><span class="linenos">137</span><span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10000</span>
</span><span id="line-138"><span class="linenos">138</span>
</span><span id="line-139"><span class="linenos">139</span><span class="c1"># Reusable service endpoints</span>
</span><span id="line-140"><span class="linenos">140</span><span class="nt">endpoints</span><span class="p">:</span>
</span><span id="line-141"><span class="linenos">141</span><span class="w"> </span><span class="nt">app_server</span><span class="p">:</span>
</span><span id="line-142"><span class="linenos">142</span><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">127.0.0.1:80</span>
</span><span id="line-143"><span class="linenos">143</span><span class="w"> </span><span class="nt">connect_timeout</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.005s</span>
</span><span id="line-144"><span class="linenos">144</span><span class="w"> </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http</span><span class="w"> </span><span class="c1"># http or https</span>
</span><span id="line-145"><span class="linenos">145</span>
</span><span id="line-146"><span class="linenos">146</span><span class="w"> </span><span class="nt">mistral_local</span><span class="p">:</span>
</span><span id="line-147"><span class="linenos">147</span><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">127.0.0.1:8001</span>
</span><span id="line-148"><span class="linenos">148</span>
</span><span id="line-149"><span class="linenos">149</span><span class="w"> </span><span class="nt">secure_service</span><span class="p">:</span>
</span><span id="line-150"><span class="linenos">150</span><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">api.example.com:443</span>
</span><span id="line-151"><span class="linenos">151</span><span class="w"> </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">https</span>
</span><span id="line-152"><span class="linenos">152</span><span class="w"> </span><span class="nt">http_host</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">api.example.com</span><span class="w"> </span><span class="c1"># Override the Host header sent upstream</span>
</span><span id="line-153"><span class="linenos">153</span>
</span><span id="line-154"><span class="linenos">154</span><span class="c1"># Optional top-level system prompt applied to all prompt_targets</span>
</span><span id="line-155"><span class="linenos">155</span><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
</span><span id="line-156"><span class="linenos">156</span><span class="w"> </span><span class="no">You are a helpful assistant. Always respond concisely and accurately.</span>
</span><span id="line-157"><span class="linenos">157</span>
</span><span id="line-158"><span class="linenos">158</span><span class="c1"># Prompt targets for function calling and API orchestration</span>
</span><span id="line-159"><span class="linenos">159</span><span class="nt">prompt_targets</span><span class="p">:</span>
</span><span id="line-160"><span class="linenos">160</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">get_current_weather</span>
</span><span id="line-161"><span class="linenos">161</span><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Get current weather at a location.</span>
</span><span id="line-162"><span class="linenos">162</span><span class="w"> </span><span class="nt">parameters</span><span class="p">:</span>
</span><span id="line-163"><span class="linenos">163</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">location</span>
</span><span id="line-164"><span class="linenos">164</span><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">The location to get the weather for</span>
</span><span id="line-165"><span class="linenos">165</span><span class="w"> </span><span class="nt">required</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-166"><span class="linenos">166</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">string</span>
</span><span id="line-167"><span class="linenos">167</span><span class="w"> </span><span class="nt">format</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">City, State</span>
</span><span id="line-168"><span class="linenos">168</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">days</span>
</span><span id="line-169"><span class="linenos">169</span><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">The number of days for the request</span>
</span><span id="line-170"><span class="linenos">170</span><span class="w"> </span><span class="nt">required</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-171"><span class="linenos">171</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">int</span>
</span><span id="line-172"><span class="linenos">172</span><span class="w"> </span><span class="nt">endpoint</span><span class="p">:</span>
</span><span id="line-173"><span class="linenos">173</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">app_server</span>
</span><span id="line-174"><span class="linenos">174</span><span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/weather</span>
</span><span id="line-175"><span class="linenos">175</span><span class="w"> </span><span class="nt">http_method</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">POST</span>
</span><span id="line-176"><span class="linenos">176</span><span class="w"> </span><span class="c1"># Per-target system prompt (overrides top-level system_prompt for this target)</span>
</span><span id="line-177"><span class="linenos">177</span><span class="w"> </span><span class="nt">system_prompt</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">You are a weather expert. Provide accurate and concise weather information.</span>
</span><span id="line-178"><span class="linenos">178</span><span class="w"> </span><span class="c1"># auto_llm_dispatch_on_response: when true, the LLM is called again with the</span>
</span><span id="line-179"><span class="linenos">179</span><span class="w"> </span><span class="c1"># function response to produce a final natural-language answer for the user</span>
</span><span id="line-180"><span class="linenos">180</span><span class="w"> </span><span class="nt">auto_llm_dispatch_on_response</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-181"><span class="linenos">181</span>
</span><span id="line-182"><span class="linenos">182</span><span class="c1"># Global behavior overrides</span>
</span><span id="line-183"><span class="linenos">183</span><span class="nt">overrides</span><span class="p">:</span>
</span><span id="line-184"><span class="linenos">184</span><span class="w"> </span><span class="c1"># Threshold for routing a request to a prompt_target (0.0-1.0). Lower = more permissive.</span>
</span><span id="line-185"><span class="linenos">185</span><span class="w"> </span><span class="nt">prompt_target_intent_matching_threshold</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.7</span>
</span><span id="line-186"><span class="linenos">186</span><span class="w"> </span><span class="c1"># Trim conversation history to fit within the model's context window</span>
</span><span id="line-187"><span class="linenos">187</span><span class="w"> </span><span class="nt">optimize_context_window</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-188"><span class="linenos">188</span><span class="w"> </span><span class="c1"># Use Plano's agent orchestrator for multi-agent request routing</span>
</span><span id="line-189"><span class="linenos">189</span><span class="w"> </span><span class="nt">use_agent_orchestrator</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
</span><span id="line-190"><span class="linenos">190</span><span class="w"> </span><span class="c1"># Connect timeout for upstream provider clusters (e.g., "5s", "10s"). Default: "5s"</span>
</span><span id="line-191"><span class="linenos">191</span><span class="w"> </span><span class="nt">upstream_connect_timeout</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10s</span>
</span><span id="line-192"><span class="linenos">192</span><span class="w"> </span><span class="c1"># Path to the trusted CA bundle for upstream TLS verification</span>
</span><span id="line-193"><span class="linenos">193</span><span class="w"> </span><span class="nt">upstream_tls_ca_path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/etc/ssl/certs/ca-certificates.crt</span>
</span><span id="line-194"><span class="linenos">194</span><span class="w"> </span><span class="c1"># Model used for intent-based LLM routing (must be listed in model_providers)</span>
</span><span id="line-195"><span class="linenos">195</span><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Plano-Orchestrator</span>
</span><span id="line-196"><span class="linenos">196</span><span class="w"> </span><span class="c1"># Model used for agent orchestration (must be listed in model_providers)</span>
</span><span id="line-197"><span class="linenos">197</span><span class="w"> </span><span class="nt">agent_orchestration_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Plano-Orchestrator</span>
</span><span id="line-198"><span class="linenos">198</span><span class="w"> </span><span class="c1"># Disable agentic signal analysis (frustration, repetition, escalation, etc.)</span>
</span><span id="line-199"><span class="linenos">199</span><span class="w"> </span><span class="c1"># on LLM responses to save CPU. Default: false.</span>
</span><span id="line-200"><span class="linenos">200</span><span class="w"> </span><span class="nt">disable_signals</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
</span><span id="line-201"><span class="linenos">201</span>
</span><span id="line-202"><span class="linenos">202</span><span class="c1"># Model affinity — pin routing decisions for agentic loops</span>
</span><span id="line-203"><span class="linenos">203</span><span class="nt">routing</span><span class="p">:</span>
</span><span id="line-204"><span class="linenos">204</span><span class="w"> </span><span class="nt">session_ttl_seconds</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">600</span><span class="w"> </span><span class="c1"># How long a pinned session lasts (default: 600s / 10 min)</span>
</span><span id="line-205"><span class="linenos">205</span><span class="w"> </span><span class="nt">session_max_entries</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10000</span><span class="w"> </span><span class="c1"># Max cached sessions before eviction (upper limit: 10000)</span>
</span><span id="line-206"><span class="linenos">206</span><span class="w"> </span><span class="c1"># session_cache controls the backend used to store affinity state.</span>
</span><span id="line-207"><span class="linenos">207</span><span class="w"> </span><span class="c1"># "memory" (default) is in-process and works for single-instance deployments.</span>
</span><span id="line-208"><span class="linenos">208</span><span class="w"> </span><span class="c1"># "redis" shares state across replicas — required for multi-replica / Kubernetes setups.</span>
</span><span id="line-209"><span class="linenos">209</span><span class="w"> </span><span class="nt">session_cache</span><span class="p">:</span>
</span><span id="line-210"><span class="linenos">210</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">memory</span><span class="w"> </span><span class="c1"># "memory" (default) or "redis"</span>
</span><span id="line-211"><span class="linenos">211</span><span class="w"> </span><span class="c1"># url is required when type is "redis". Supports redis:// and rediss:// (TLS).</span>
</span><span id="line-212"><span class="linenos">212</span><span class="w"> </span><span class="c1"># url: redis://localhost:6379</span>
</span><span id="line-213"><span class="linenos">213</span><span class="w"> </span><span class="c1"># tenant_header: x-org-id # optional; when set, keys are scoped as plano:affinity:{tenant_id}:{session_id}</span>
</span><span id="line-214"><span class="linenos">214</span>
</span><span id="line-215"><span class="linenos">215</span><span class="c1"># State storage for multi-turn conversation history</span>
</span><span id="line-216"><span class="linenos">216</span><span class="nt">state_storage</span><span class="p">:</span>
</span><span id="line-217"><span class="linenos">217</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">memory</span><span class="w"> </span><span class="c1"># "memory" (in-process) or "postgres" (persistent)</span>
</span><span id="line-218"><span class="linenos">218</span><span class="w"> </span><span class="c1"># connection_string is required when type is postgres.</span>
</span><span id="line-219"><span class="linenos">219</span><span class="w"> </span><span class="c1"># Supports environment variable substitution: $VAR or ${VAR}</span>
</span><span id="line-220"><span class="linenos">220</span><span class="w"> </span><span class="c1"># connection_string: postgresql://user:$DB_PASS@localhost:5432/plano</span>
</span><span id="line-221"><span class="linenos">221</span>
</span><span id="line-222"><span class="linenos">222</span><span class="c1"># Input guardrails applied globally to all incoming requests</span>
</span><span id="line-223"><span class="linenos">223</span><span class="nt">prompt_guards</span><span class="p">:</span>
</span><span id="line-224"><span class="linenos">224</span><span class="w"> </span><span class="nt">input_guards</span><span class="p">:</span>
</span><span id="line-225"><span class="linenos">225</span><span class="w"> </span><span class="nt">jailbreak</span><span class="p">:</span>
</span><span id="line-226"><span class="linenos">226</span><span class="w"> </span><span class="nt">on_exception</span><span class="p">:</span>
</span><span id="line-227"><span class="linenos">227</span><span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s">"I'm</span><span class="nv"> </span><span class="s">sorry,</span><span class="nv"> </span><span class="s">I</span><span class="nv"> </span><span class="s">can't</span><span class="nv"> </span><span class="s">help</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">that</span><span class="nv"> </span><span class="s">request."</span>
</span><span id="line-228"><span class="linenos">228</span>
</span><span id="line-229"><span class="linenos">229</span><span class="c1"># OpenTelemetry tracing configuration</span>
</span><span id="line-230"><span class="linenos">230</span><span class="nt">tracing</span><span class="p">:</span>
</span><span id="line-231"><span class="linenos">231</span><span class="w"> </span><span class="c1"># Random sampling percentage (1-100)</span>
</span><span id="line-232"><span class="linenos">232</span><span class="w"> </span><span class="nt">random_sampling</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">100</span>
</span><span id="line-233"><span class="linenos">233</span><span class="w"> </span><span class="c1"># Include internal Plano spans in traces</span>
</span><span id="line-234"><span class="linenos">234</span><span class="w"> </span><span class="nt">trace_arch_internal</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
</span><span id="line-235"><span class="linenos">235</span><span class="w"> </span><span class="c1"># gRPC endpoint for OpenTelemetry collector (e.g., Jaeger, Tempo)</span>
</span><span id="line-236"><span class="linenos">236</span><span class="w"> </span><span class="nt">opentracing_grpc_endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://localhost:4317</span>
</span><span id="line-237"><span class="linenos">237</span><span class="w"> </span><span class="nt">span_attributes</span><span class="p">:</span>
</span><span id="line-238"><span class="linenos">238</span><span class="w"> </span><span class="c1"># Propagate request headers whose names start with these prefixes as span attributes</span>
</span><span id="line-239"><span class="linenos">239</span><span class="w"> </span><span class="nt">header_prefixes</span><span class="p">:</span>
</span><span id="line-240"><span class="linenos">240</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">x-user-</span>
</span><span id="line-241"><span class="linenos">241</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">x-org-</span>
</span><span id="line-242"><span class="linenos">242</span><span class="w"> </span><span class="c1"># Static key/value pairs added to every span</span>
</span><span id="line-243"><span class="linenos">243</span><span class="w"> </span><span class="nt">static</span><span class="p">:</span>
</span><span id="line-244"><span class="linenos">244</span><span class="w"> </span><span class="nt">environment</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">production</span>
</span><span id="line-245"><span class="linenos">245</span><span class="w"> </span><span class="nt">service.team</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">platform</span>
</span><span id="line-199"><span class="linenos">199</span>
</span><span id="line-200"><span class="linenos">200</span><span class="c1"># Global behavior overrides</span>
</span><span id="line-201"><span class="linenos">201</span><span class="nt">overrides</span><span class="p">:</span>
</span><span id="line-202"><span class="linenos">202</span><span class="w"> </span><span class="c1"># Threshold for routing a request to a prompt_target (0.0-1.0). Lower = more permissive.</span>
</span><span id="line-203"><span class="linenos">203</span><span class="w"> </span><span class="nt">prompt_target_intent_matching_threshold</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0.7</span>
</span><span id="line-204"><span class="linenos">204</span><span class="w"> </span><span class="c1"># Trim conversation history to fit within the model's context window</span>
</span><span id="line-205"><span class="linenos">205</span><span class="w"> </span><span class="nt">optimize_context_window</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
</span><span id="line-206"><span class="linenos">206</span><span class="w"> </span><span class="c1"># Use Plano's agent orchestrator for multi-agent request routing</span>
</span><span id="line-207"><span class="linenos">207</span><span class="w"> </span><span class="nt">use_agent_orchestrator</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
</span><span id="line-208"><span class="linenos">208</span><span class="w"> </span><span class="c1"># Connect timeout for upstream provider clusters (e.g., "5s", "10s"). Default: "5s"</span>
</span><span id="line-209"><span class="linenos">209</span><span class="w"> </span><span class="nt">upstream_connect_timeout</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10s</span>
</span><span id="line-210"><span class="linenos">210</span><span class="w"> </span><span class="c1"># Path to the trusted CA bundle for upstream TLS verification</span>
</span><span id="line-211"><span class="linenos">211</span><span class="w"> </span><span class="nt">upstream_tls_ca_path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/etc/ssl/certs/ca-certificates.crt</span>
</span><span id="line-212"><span class="linenos">212</span><span class="w"> </span><span class="c1"># Model used for intent-based LLM routing (must be listed in model_providers)</span>
</span><span id="line-213"><span class="linenos">213</span><span class="w"> </span><span class="nt">llm_routing_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Plano-Orchestrator</span>
</span><span id="line-214"><span class="linenos">214</span><span class="w"> </span><span class="c1"># Model used for agent orchestration (must be listed in model_providers)</span>
</span><span id="line-215"><span class="linenos">215</span><span class="w"> </span><span class="nt">agent_orchestration_model</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Plano-Orchestrator</span>
</span><span id="line-216"><span class="linenos">216</span><span class="w"> </span><span class="c1"># Disable agentic signal analysis (frustration, repetition, escalation, etc.)</span>
</span><span id="line-217"><span class="linenos">217</span><span class="w"> </span><span class="c1"># on LLM responses to save CPU. Default: false.</span>
</span><span id="line-218"><span class="linenos">218</span><span class="w"> </span><span class="nt">disable_signals</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
</span><span id="line-219"><span class="linenos">219</span>
</span><span id="line-220"><span class="linenos">220</span><span class="c1"># Model affinity — pin routing decisions for agentic loops</span>
</span><span id="line-221"><span class="linenos">221</span><span class="nt">routing</span><span class="p">:</span>
</span><span id="line-222"><span class="linenos">222</span><span class="w"> </span><span class="nt">session_ttl_seconds</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">600</span><span class="w"> </span><span class="c1"># How long a pinned session lasts (default: 600s / 10 min)</span>
</span><span id="line-223"><span class="linenos">223</span><span class="w"> </span><span class="nt">session_max_entries</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10000</span><span class="w"> </span><span class="c1"># Max cached sessions before eviction (upper limit: 10000)</span>
</span><span id="line-224"><span class="linenos">224</span><span class="w"> </span><span class="c1"># session_cache controls the backend used to store affinity state.</span>
</span><span id="line-225"><span class="linenos">225</span><span class="w"> </span><span class="c1"># "memory" (default) is in-process and works for single-instance deployments.</span>
</span><span id="line-226"><span class="linenos">226</span><span class="w"> </span><span class="c1"># "redis" shares state across replicas — required for multi-replica / Kubernetes setups.</span>
</span><span id="line-227"><span class="linenos">227</span><span class="w"> </span><span class="nt">session_cache</span><span class="p">:</span>
</span><span id="line-228"><span class="linenos">228</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">memory</span><span class="w"> </span><span class="c1"># "memory" (default) or "redis"</span>
</span><span id="line-229"><span class="linenos">229</span><span class="w"> </span><span class="c1"># url is required when type is "redis". Supports redis:// and rediss:// (TLS).</span>
</span><span id="line-230"><span class="linenos">230</span><span class="w"> </span><span class="c1"># url: redis://localhost:6379</span>
</span><span id="line-231"><span class="linenos">231</span><span class="w"> </span><span class="c1"># tenant_header: x-org-id # optional; when set, keys are scoped as plano:affinity:{tenant_id}:{session_id}</span>
</span><span id="line-232"><span class="linenos">232</span>
</span><span id="line-233"><span class="linenos">233</span><span class="c1"># State storage for multi-turn conversation history</span>
</span><span id="line-234"><span class="linenos">234</span><span class="nt">state_storage</span><span class="p">:</span>
</span><span id="line-235"><span class="linenos">235</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">memory</span><span class="w"> </span><span class="c1"># "memory" (in-process) or "postgres" (persistent)</span>
</span><span id="line-236"><span class="linenos">236</span><span class="w"> </span><span class="c1"># connection_string is required when type is postgres.</span>
</span><span id="line-237"><span class="linenos">237</span><span class="w"> </span><span class="c1"># Supports environment variable substitution: $VAR or ${VAR}</span>
</span><span id="line-238"><span class="linenos">238</span><span class="w"> </span><span class="c1"># connection_string: postgresql://user:$DB_PASS@localhost:5432/plano</span>
</span><span id="line-239"><span class="linenos">239</span>
</span><span id="line-240"><span class="linenos">240</span><span class="c1"># Input guardrails applied globally to all incoming requests</span>
</span><span id="line-241"><span class="linenos">241</span><span class="nt">prompt_guards</span><span class="p">:</span>
</span><span id="line-242"><span class="linenos">242</span><span class="w"> </span><span class="nt">input_guards</span><span class="p">:</span>
</span><span id="line-243"><span class="linenos">243</span><span class="w"> </span><span class="nt">jailbreak</span><span class="p">:</span>
</span><span id="line-244"><span class="linenos">244</span><span class="w"> </span><span class="nt">on_exception</span><span class="p">:</span>
</span><span id="line-245"><span class="linenos">245</span><span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s">"I'm</span><span class="nv"> </span><span class="s">sorry,</span><span class="nv"> </span><span class="s">I</span><span class="nv"> </span><span class="s">can't</span><span class="nv"> </span><span class="s">help</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">that</span><span class="nv"> </span><span class="s">request."</span>
</span><span id="line-246"><span class="linenos">246</span>
</span><span id="line-247"><span class="linenos">247</span><span class="c1"># OpenTelemetry tracing configuration</span>
</span><span id="line-248"><span class="linenos">248</span><span class="nt">tracing</span><span class="p">:</span>
</span><span id="line-249"><span class="linenos">249</span><span class="w"> </span><span class="c1"># Random sampling percentage (1-100)</span>
</span><span id="line-250"><span class="linenos">250</span><span class="w"> </span><span class="nt">random_sampling</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">100</span>
</span><span id="line-251"><span class="linenos">251</span><span class="w"> </span><span class="c1"># Include internal Plano spans in traces</span>
</span><span id="line-252"><span class="linenos">252</span><span class="w"> </span><span class="nt">trace_arch_internal</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
</span><span id="line-253"><span class="linenos">253</span><span class="w"> </span><span class="c1"># gRPC endpoint for OpenTelemetry collector (e.g., Jaeger, Tempo)</span>
</span><span id="line-254"><span class="linenos">254</span><span class="w"> </span><span class="nt">opentracing_grpc_endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://localhost:4317</span>
</span><span id="line-255"><span class="linenos">255</span><span class="w"> </span><span class="nt">span_attributes</span><span class="p">:</span>
</span><span id="line-256"><span class="linenos">256</span><span class="w"> </span><span class="c1"># Propagate request headers whose names start with these prefixes as span attributes</span>
</span><span id="line-257"><span class="linenos">257</span><span class="w"> </span><span class="nt">header_prefixes</span><span class="p">:</span>
</span><span id="line-258"><span class="linenos">258</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">x-user-</span>
</span><span id="line-259"><span class="linenos">259</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">x-org-</span>
</span><span id="line-260"><span class="linenos">260</span><span class="w"> </span><span class="c1"># Static key/value pairs added to every span</span>
</span><span id="line-261"><span class="linenos">261</span><span class="w"> </span><span class="nt">static</span><span class="p">:</span>
</span><span id="line-262"><span class="linenos">262</span><span class="w"> </span><span class="nt">environment</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">production</span>
</span><span id="line-263"><span class="linenos">263</span><span class="w"> </span><span class="nt">service.team</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">platform</span>
</span></code></pre></div>
</div>
</div>
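<p>For a multi-replica or Kubernetes deployment, the comments above indicate that affinity state should live in <code class="docutils literal notranslate"><span class="pre">redis</span></code> so replicas share it, and that conversation history can use <code class="docutils literal notranslate"><span class="pre">postgres</span></code> for persistence. The fragment below is a sketch that only combines options documented in this reference. The <code class="docutils literal notranslate"><span class="pre">redis</span></code> and <code class="docutils literal notranslate"><span class="pre">postgres</span></code> hostnames are placeholders, and <code class="docutils literal notranslate"><span class="pre">DB_PASS</span></code> is assumed to be set in the environment.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span># Hypothetical multi-replica overrides; hostnames are placeholders
routing:
  session_ttl_seconds: 600
  session_max_entries: 10000
  session_cache:
    type: redis                    # shared across replicas
    url: rediss://redis:6379       # rediss:// enables TLS; host is a placeholder
    tenant_header: x-org-id        # keys become plano:affinity:{tenant_id}:{session_id}

state_storage:
  type: postgres                   # persistent conversation history
  # ${DB_PASS} is substituted from the environment at load time
  connection_string: postgresql://user:${DB_PASS}@postgres:5432/plano
</pre></div>
</div>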
