Spherrrical 2026-04-14 02:31:18 +00:00
parent 26c5b13fd6
commit 0dd2552f91
34 changed files with 166 additions and 66 deletions

@@ -178,6 +178,14 @@ overrides:
routing:
  session_ttl_seconds: 600 # How long a pinned session lasts (default: 600s / 10 min)
  session_max_entries: 10000 # Max cached sessions before eviction (upper limit: 10000)
  # session_cache controls the backend used to store affinity state.
  # "memory" (default) is in-process and works for single-instance deployments.
  # "redis" shares state across replicas — required for multi-replica / Kubernetes setups.
  session_cache:
    type: memory # "memory" (default) or "redis"
    # url is required when type is "redis". Supports redis:// and rediss:// (TLS).
    # url: redis://localhost:6379
    # tenant_header: x-org-id # optional; when set, keys are scoped as plano:affinity:{tenant_id}:{session_id}

# State storage for multi-turn conversation history
state_storage:

@@ -267,7 +267,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -333,7 +333,7 @@ powerful abstraction for evolving your agent workflows over time.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -270,7 +270,7 @@ application to LLMs (API-based or hosted) via prompt targets.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -660,7 +660,7 @@ Implement fallback logic for better reliability:</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -304,7 +304,7 @@ Use your preferred client library without changing existing code (see <a class="
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -434,7 +434,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -1190,7 +1190,7 @@ Any provider that implements the OpenAI API interface can be configured using cu
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -473,7 +473,7 @@ that you can test and modify locally for multi-turn RAG scenarios.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -540,7 +540,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -226,7 +226,7 @@ This gives Plano several advantages:</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -337,7 +337,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -521,7 +521,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -372,7 +372,7 @@ on the stuff that matters most.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -524,6 +524,41 @@ instead of a file.</p></li>
</span></code></pre></div>
</div>
<p>To start a new routing decision (e.g., when the agent's task changes), generate a new affinity ID.</p>
<section id="session-cache-backends">
<h3>Session Cache Backends<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#session-cache-backends" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#session-cache-backends'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h3>
<p>By default, Plano stores session affinity state in an in-process LRU cache. This works well for single-instance deployments, but sessions are not shared across replicas — each instance has its own independent cache.</p>
<p>For deployments with multiple Plano replicas (Kubernetes, Docker Compose with <code class="docutils literal notranslate"><span class="pre">scale</span></code>, or any load-balanced setup), use Redis as the session cache backend. All replicas connect to the same Redis instance, so an affinity decision made by one replica is honoured by every other replica in the pool.</p>
<p><strong>In-memory (default)</strong></p>
<p>No configuration required. Sessions live only for the lifetime of the process and are lost on restart.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">routing</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">session_ttl_seconds</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">600</span><span class="w"> </span><span class="c1"># How long affinity lasts (default: 10 min)</span>
</span><span id="line-3"><span class="w"> </span><span class="nt">session_max_entries</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10000</span><span class="w"> </span><span class="c1"># LRU capacity (upper limit: 10000)</span>
</span></code></pre></div>
</div>
<p><strong>Redis</strong></p>
<p>Requires a reachable Redis instance. The <code class="docutils literal notranslate"><span class="pre">url</span></code> field supports standard Redis URI syntax, including authentication (<code class="docutils literal notranslate"><span class="pre">redis://:password@host:6379</span></code>) and TLS (<code class="docutils literal notranslate"><span class="pre">rediss://host:6380</span></code>). Redis handles TTL expiry natively, so no periodic cleanup is needed.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">routing</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">session_ttl_seconds</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">600</span>
</span><span id="line-3"><span class="w"> </span><span class="nt">session_cache</span><span class="p">:</span>
</span><span id="line-4"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">redis</span>
</span><span id="line-5"><span class="w"> </span><span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">redis://localhost:6379</span>
</span></code></pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>When using Redis in a multi-tenant environment, construct the <code class="docutils literal notranslate"><span class="pre">X-Model-Affinity</span></code> header value to include a tenant identifier, for example <code class="docutils literal notranslate"><span class="pre">{tenant_id}:{session_id}</span></code>. Plano stores each key under the internal namespace <code class="docutils literal notranslate"><span class="pre">plano:affinity:{key}</span></code>, so tenant-scoped values avoid cross-tenant collisions without any additional configuration.</p>
</div>
<p><strong>Example: Kubernetes multi-replica deployment</strong></p>
<p>Deploy a Redis instance alongside your Plano pods and point all replicas at it:</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><code><span id="line-1"><span class="nt">routing</span><span class="p">:</span>
</span><span id="line-2"><span class="w"> </span><span class="nt">session_ttl_seconds</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">600</span>
</span><span id="line-3"><span class="w"> </span><span class="nt">session_cache</span><span class="p">:</span>
</span><span id="line-4"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">redis</span>
</span><span id="line-5"><span class="w"> </span><span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">redis://redis.plano.svc.cluster.local:6379</span>
</span></code></pre></div>
</div>
<p>With this configuration, any replica that first receives a request for affinity ID <code class="docutils literal notranslate"><span class="pre">abc-123</span></code> caches the routing decision in Redis. Subsequent requests for <code class="docutils literal notranslate"><span class="pre">abc-123</span></code> — regardless of which replica they land on — retrieve the same pinned model.</p>
</section>
</section>
<section id="combining-routing-methods">
<h2>Combining Routing Methods<a @click.prevent="window.navigator.clipboard.writeText($el.href); $el.setAttribute('data-tooltip', 'Copied!'); setTimeout(() =&gt; $el.setAttribute('data-tooltip', 'Copy link to this element'), 2000)" aria-label="Copy link to this element" class="headerlink" data-tooltip="Copy link to this element" href="#combining-routing-methods" x-intersect.margin.0%.0%.-70%.0%="activeSection = '#combining-routing-methods'"><svg height="1em" viewbox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76 0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71 0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71 0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76 0 5-2.24 5-5s-2.24-5-5-5z"></path></svg></a></h2>
@@ -663,7 +698,10 @@ instead of a file.</p></li>
<li><a :data-current="activeSection === '#using-vllm-on-kubernetes-gpu-nodes'" class="reference internal" href="#using-vllm-on-kubernetes-gpu-nodes">Using vLLM on Kubernetes (GPU nodes)</a></li>
</ul>
</li>
<li><a :data-current="activeSection === '#model-affinity'" class="reference internal" href="#model-affinity">Model Affinity</a></li>
<li><a :data-current="activeSection === '#model-affinity'" class="reference internal" href="#model-affinity">Model Affinity</a><ul>
<li><a :data-current="activeSection === '#session-cache-backends'" class="reference internal" href="#session-cache-backends">Session Cache Backends</a></li>
</ul>
</li>
<li><a :data-current="activeSection === '#combining-routing-methods'" class="reference internal" href="#combining-routing-methods">Combining Routing Methods</a></li>
<li><a :data-current="activeSection === '#example-use-cases'" class="reference internal" href="#example-use-cases">Example Use Cases</a></li>
<li><a :data-current="activeSection === '#best-practices'" class="reference internal" href="#best-practices">Best practices</a></li>
@@ -676,7 +714,7 @@ instead of a file.</p></li>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -248,7 +248,7 @@ Access logs can be exported to centralized logging systems (e.g., ELK stack or F
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -260,7 +260,7 @@ are some sample configuration files for both, respectively.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -216,7 +216,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -792,7 +792,7 @@ tools like AWS X-Ray and Datadog, enhancing observability and facilitating faste
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -1003,7 +1003,7 @@ Plano makes it easy to build and scale these systems by managing the orchestrati
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -298,7 +298,7 @@ the agent. If validation fails (<code class="docutils literal notranslate"><span
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -453,7 +453,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -1,6 +1,6 @@
Plano Docs v0.4.18
llms.txt (auto-generated)
Generated (UTC): 2026-04-09T20:13:13.329129+00:00
Generated (UTC): 2026-04-14T02:31:14.825020+00:00
Table of contents
- Agents (concepts/agents)
@@ -4011,6 +4011,44 @@ routing:
To start a new routing decision (e.g., when the agent's task changes), generate a new affinity ID.
Session Cache Backends
By default, Plano stores session affinity state in an in-process LRU cache. This works well for single-instance deployments, but sessions are not shared across replicas — each instance has its own independent cache.
For deployments with multiple Plano replicas (Kubernetes, Docker Compose with scale, or any load-balanced setup), use Redis as the session cache backend. All replicas connect to the same Redis instance, so an affinity decision made by one replica is honoured by every other replica in the pool.
In-memory (default)
No configuration required. Sessions live only for the lifetime of the process and are lost on restart.
routing:
  session_ttl_seconds: 600 # How long affinity lasts (default: 10 min)
  session_max_entries: 10000 # LRU capacity (upper limit: 10000)
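The in-memory backend pairs LRU eviction with a per-entry TTL. The sketch below illustrates that behaviour under stated assumptions and is not Plano's implementation: the class name `AffinityLRU`, lazy expiry on read, and moving an entry to the most-recently-used position on each read are all illustrative choices. The two constructor arguments stand in for `session_ttl_seconds` and `session_max_entries`.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class AffinityLRU:
    """Hypothetical in-process affinity cache: LRU eviction plus a per-entry TTL."""

    def __init__(self, ttl_seconds: float = 600, max_entries: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds            # stands in for session_ttl_seconds
        self.max_entries = max_entries    # stands in for session_max_entries
        self.clock = clock                # injectable so expiry can be tested
        self._data: "OrderedDict[str, Tuple[str, float]]" = OrderedDict()

    def get(self, affinity_id: str) -> Optional[str]:
        entry = self._data.get(affinity_id)
        if entry is None:
            return None
        model, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry lazily when it is next read.
            del self._data[affinity_id]
            return None
        # Mark as most recently used so it is evicted last.
        self._data.move_to_end(affinity_id)
        return model

    def put(self, affinity_id: str, model: str) -> None:
        self._data[affinity_id] = (model, self.clock() + self.ttl)
        self._data.move_to_end(affinity_id)
        # Evict least recently used entries once capacity is exceeded.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)
```

Because the cache lives in a Python object, it disappears with the process. This matches the note above that in-memory sessions are lost on restart.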
Redis
Requires a reachable Redis instance. The url field supports standard Redis URI syntax, including authentication (redis://:password@host:6379) and TLS (rediss://host:6380). Redis handles TTL expiry natively, so no periodic cleanup is needed.
routing:
  session_ttl_seconds: 600
  session_cache:
    type: redis
    url: redis://localhost:6379
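The documented rules for `session_cache` are: type `memory` or `redis`, `url` required for Redis, and `redis://` or `rediss://` schemes. These can be checked before a deployment goes out. `validate_session_cache` below is a hypothetical pre-flight helper built on Python's `urllib.parse`, and Plano's own config validation may behave differently. The fallback port 6379 is the conventional Redis default and is an assumption here.

```python
from urllib.parse import urlsplit


def validate_session_cache(cfg: dict) -> dict:
    """Hypothetical check mirroring the documented session_cache rules."""
    cache_type = cfg.get("type", "memory")  # "memory" is the documented default
    if cache_type == "memory":
        return {"type": "memory"}
    if cache_type != "redis":
        raise ValueError(f"unsupported session_cache.type: {cache_type!r}")

    url = cfg.get("url")
    if not url:
        raise ValueError("session_cache.url is required when type is 'redis'")

    parts = urlsplit(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"url must use redis:// or rediss://, got {url!r}")

    return {
        "type": "redis",
        "host": parts.hostname,
        "port": parts.port or 6379,             # assumed conventional default
        "tls": parts.scheme == "rediss",        # rediss:// means TLS
        "has_password": parts.password is not None,
    }


print(validate_session_cache({"type": "redis", "url": "rediss://host:6380"}))
```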
When using Redis in a multi-tenant environment, construct the X-Model-Affinity header value to include a tenant identifier, for example {tenant_id}:{session_id}. Plano stores each key under the internal namespace plano:affinity:{key}, so tenant-scoped values avoid cross-tenant collisions without any additional configuration.
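A client can build the tenant-scoped affinity value described above. Both helpers are hypothetical. The `:` separator follows the `{tenant_id}:{session_id}` example, and `storage_key` only mirrors the documented `plano:affinity:{key}` namespace for illustration. Rejecting tenant IDs that contain `:` is an added assumption. Without that check, tenant `a:b` with session `c` would collide with tenant `a` with session `b:c`.

```python
import uuid
from typing import Dict


def affinity_header(tenant_id: str, session_id: str) -> Dict[str, str]:
    """Build an X-Model-Affinity header scoped to one tenant (hypothetical helper)."""
    if ":" in tenant_id:
        # A ':' inside the tenant ID would make "a:b" + "c" indistinguishable
        # from "a" + "b:c" once the two parts are joined.
        raise ValueError("tenant_id must not contain ':'")
    return {"X-Model-Affinity": f"{tenant_id}:{session_id}"}


def storage_key(affinity_value: str) -> str:
    """Mirror the documented internal namespace plano:affinity:{key}."""
    return f"plano:affinity:{affinity_value}"


# One affinity ID per agent task; generate a new one when the task changes.
session_id = str(uuid.uuid4())
headers = affinity_header("acme", session_id)
print(headers["X-Model-Affinity"])
print(storage_key(headers["X-Model-Affinity"]))
```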
Example: Kubernetes multi-replica deployment
Deploy a Redis instance alongside your Plano pods and point all replicas at it:
routing:
  session_ttl_seconds: 600
  session_cache:
    type: redis
    url: redis://redis.plano.svc.cluster.local:6379
With this configuration, any replica that first receives a request for affinity ID abc-123 caches the routing decision in Redis. Subsequent requests for abc-123 — regardless of which replica they land on — retrieve the same pinned model.
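The difference between per-replica and shared caches can be simulated in plain Python. `Replica` is a hypothetical stand-in. Its `preferred` argument plays the role of whatever routing decision a replica would make the first time it sees an affinity ID. A dict shared by all replicas stands in for Redis. `dict.setdefault` models an atomic set-if-absent. Against real Redis that would correspond to something like `SET key value NX EX <ttl>`, although whether Plano issues exactly that command is not stated here.

```python
from typing import Dict


class Replica:
    """Hypothetical Plano replica that pins a model per affinity ID on first use."""

    def __init__(self, preferred: str, store: Dict[str, str]) -> None:
        self.preferred = preferred  # stands in for this replica's routing decision
        self.store = store          # private dict (memory) or one dict shared by all (Redis stand-in)

    def route(self, affinity_id: str) -> str:
        # Keep an existing pin; store ours only if no pin exists yet.
        return self.store.setdefault(affinity_id, self.preferred)


# In-memory backend: each replica has its own cache, so pins can diverge.
r1, r2 = Replica("model-a", {}), Replica("model-b", {})
print(r1.route("abc-123"), r2.route("abc-123"))  # model-a model-b

# Shared backend: every replica reads and writes the same store.
shared: Dict[str, str] = {}
r1, r2 = Replica("model-a", shared), Replica("model-b", shared)
print(r1.route("abc-123"), r2.route("abc-123"))  # model-a model-a
```

Without a shared store, the same affinity ID can end up pinned to different models depending on which replica handles each request. The Redis backend exists to prevent that.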
Combining Routing Methods
You can combine static model selection with dynamic routing preferences for maximum flexibility:
@@ -6561,6 +6599,14 @@ overrides:
routing:
  session_ttl_seconds: 600 # How long a pinned session lasts (default: 600s / 10 min)
  session_max_entries: 10000 # Max cached sessions before eviction (upper limit: 10000)
  # session_cache controls the backend used to store affinity state.
  # "memory" (default) is in-process and works for single-instance deployments.
  # "redis" shares state across replicas — required for multi-replica / Kubernetes setups.
  session_cache:
    type: memory # "memory" (default) or "redis"
    # url is required when type is "redis". Supports redis:// and rediss:// (TLS).
    # url: redis://localhost:6379
    # tenant_header: x-org-id # optional; when set, keys are scoped as plano:affinity:{tenant_id}:{session_id}

# State storage for multi-turn conversation history
state_storage:

@@ -247,7 +247,7 @@ Resources</label><div class="sd-tab-content docutils">
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -437,7 +437,7 @@ Use this page as the canonical source for command syntax, options, and recommend
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@@ -347,38 +347,46 @@ where prompts get routed to, apply guardrails, and enable critical agent observa
</span><span id="line-178"><span class="linenos">178</span><span class="nt">routing</span><span class="p">:</span>
</span><span id="line-179"><span class="linenos">179</span><span class="w"> </span><span class="nt">session_ttl_seconds</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">600</span><span class="w"> </span><span class="c1"># How long a pinned session lasts (default: 600s / 10 min)</span>
</span><span id="line-180"><span class="linenos">180</span><span class="w"> </span><span class="nt">session_max_entries</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10000</span><span class="w"> </span><span class="c1"># Max cached sessions before eviction (upper limit: 10000)</span>
</span><span id="line-181"><span class="linenos">181</span>
</span><span id="line-182"><span class="linenos">182</span><span class="c1"># State storage for multi-turn conversation history</span>
</span><span id="line-183"><span class="linenos">183</span><span class="nt">state_storage</span><span class="p">:</span>
</span><span id="line-184"><span class="linenos">184</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">memory</span><span class="w"> </span><span class="c1"># "memory" (in-process) or "postgres" (persistent)</span>
</span><span id="line-185"><span class="linenos">185</span><span class="w"> </span><span class="c1"># connection_string is required when type is postgres.</span>
</span><span id="line-186"><span class="linenos">186</span><span class="w"> </span><span class="c1"># Supports environment variable substitution: $VAR or ${VAR}</span>
</span><span id="line-187"><span class="linenos">187</span><span class="w"> </span><span class="c1"># connection_string: postgresql://user:$DB_PASS@localhost:5432/plano</span>
</span><span id="line-188"><span class="linenos">188</span>
</span><span id="line-189"><span class="linenos">189</span><span class="c1"># Input guardrails applied globally to all incoming requests</span>
</span><span id="line-190"><span class="linenos">190</span><span class="nt">prompt_guards</span><span class="p">:</span>
</span><span id="line-191"><span class="linenos">191</span><span class="w"> </span><span class="nt">input_guards</span><span class="p">:</span>
</span><span id="line-192"><span class="linenos">192</span><span class="w"> </span><span class="nt">jailbreak</span><span class="p">:</span>
</span><span id="line-193"><span class="linenos">193</span><span class="w"> </span><span class="nt">on_exception</span><span class="p">:</span>
</span><span id="line-194"><span class="linenos">194</span><span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s">"I'm</span><span class="nv"> </span><span class="s">sorry,</span><span class="nv"> </span><span class="s">I</span><span class="nv"> </span><span class="s">can't</span><span class="nv"> </span><span class="s">help</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">that</span><span class="nv"> </span><span class="s">request."</span>
</span><span id="line-195"><span class="linenos">195</span>
</span><span id="line-196"><span class="linenos">196</span><span class="c1"># OpenTelemetry tracing configuration</span>
</span><span id="line-197"><span class="linenos">197</span><span class="nt">tracing</span><span class="p">:</span>
</span><span id="line-198"><span class="linenos">198</span><span class="w"> </span><span class="c1"># Random sampling percentage (1-100)</span>
</span><span id="line-199"><span class="linenos">199</span><span class="w"> </span><span class="nt">random_sampling</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">100</span>
</span><span id="line-200"><span class="linenos">200</span><span class="w"> </span><span class="c1"># Include internal Plano spans in traces</span>
</span><span id="line-201"><span class="linenos">201</span><span class="w"> </span><span class="nt">trace_arch_internal</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
</span><span id="line-202"><span class="linenos">202</span><span class="w"> </span><span class="c1"># gRPC endpoint for OpenTelemetry collector (e.g., Jaeger, Tempo)</span>
</span><span id="line-203"><span class="linenos">203</span><span class="w"> </span><span class="nt">opentracing_grpc_endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://localhost:4317</span>
</span><span id="line-204"><span class="linenos">204</span><span class="w"> </span><span class="nt">span_attributes</span><span class="p">:</span>
</span><span id="line-205"><span class="linenos">205</span><span class="w"> </span><span class="c1"># Propagate request headers whose names start with these prefixes as span attributes</span>
</span><span id="line-206"><span class="linenos">206</span><span class="w"> </span><span class="nt">header_prefixes</span><span class="p">:</span>
</span><span id="line-207"><span class="linenos">207</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">x-user-</span>
</span><span id="line-208"><span class="linenos">208</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">x-org-</span>
</span><span id="line-209"><span class="linenos">209</span><span class="w"> </span><span class="c1"># Static key/value pairs added to every span</span>
</span><span id="line-210"><span class="linenos">210</span><span class="w"> </span><span class="nt">static</span><span class="p">:</span>
</span><span id="line-211"><span class="linenos">211</span><span class="w"> </span><span class="nt">environment</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">production</span>
</span><span id="line-212"><span class="linenos">212</span><span class="w"> </span><span class="nt">service.team</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">platform</span>
</span><span id="line-181"><span class="linenos">181</span><span class="w"> </span><span class="c1"># session_cache controls the backend used to store affinity state.</span>
</span><span id="line-182"><span class="linenos">182</span><span class="w"> </span><span class="c1"># "memory" (default) is in-process and works for single-instance deployments.</span>
</span><span id="line-183"><span class="linenos">183</span><span class="w"> </span><span class="c1"># "redis" shares state across replicas — required for multi-replica / Kubernetes setups.</span>
</span><span id="line-184"><span class="linenos">184</span><span class="w"> </span><span class="nt">session_cache</span><span class="p">:</span>
</span><span id="line-185"><span class="linenos">185</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">memory</span><span class="w"> </span><span class="c1"># "memory" (default) or "redis"</span>
</span><span id="line-186"><span class="linenos">186</span><span class="w"> </span><span class="c1"># url is required when type is "redis". Supports redis:// and rediss:// (TLS).</span>
</span><span id="line-187"><span class="linenos">187</span><span class="w"> </span><span class="c1"># url: redis://localhost:6379</span>
</span><span id="line-188"><span class="linenos">188</span><span class="w"> </span><span class="c1"># tenant_header: x-org-id # optional; when set, keys are scoped as plano:affinity:{tenant_id}:{session_id}</span>
</span><span id="line-189"><span class="linenos">189</span>
</span><span id="line-190"><span class="linenos">190</span><span class="c1"># State storage for multi-turn conversation history</span>
</span><span id="line-191"><span class="linenos">191</span><span class="nt">state_storage</span><span class="p">:</span>
</span><span id="line-192"><span class="linenos">192</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">memory</span><span class="w"> </span><span class="c1"># "memory" (in-process) or "postgres" (persistent)</span>
</span><span id="line-193"><span class="linenos">193</span><span class="w"> </span><span class="c1"># connection_string is required when type is postgres.</span>
</span><span id="line-194"><span class="linenos">194</span><span class="w"> </span><span class="c1"># Supports environment variable substitution: $VAR or ${VAR}</span>
</span><span id="line-195"><span class="linenos">195</span><span class="w"> </span><span class="c1"># connection_string: postgresql://user:$DB_PASS@localhost:5432/plano</span>
</span><span id="line-196"><span class="linenos">196</span>
</span><span id="line-197"><span class="linenos">197</span><span class="c1"># Input guardrails applied globally to all incoming requests</span>
</span><span id="line-198"><span class="linenos">198</span><span class="nt">prompt_guards</span><span class="p">:</span>
</span><span id="line-199"><span class="linenos">199</span><span class="w"> </span><span class="nt">input_guards</span><span class="p">:</span>
</span><span id="line-200"><span class="linenos">200</span><span class="w"> </span><span class="nt">jailbreak</span><span class="p">:</span>
</span><span id="line-201"><span class="linenos">201</span><span class="w"> </span><span class="nt">on_exception</span><span class="p">:</span>
</span><span id="line-202"><span class="linenos">202</span><span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="s">"I'm</span><span class="nv"> </span><span class="s">sorry,</span><span class="nv"> </span><span class="s">I</span><span class="nv"> </span><span class="s">can't</span><span class="nv"> </span><span class="s">help</span><span class="nv"> </span><span class="s">with</span><span class="nv"> </span><span class="s">that</span><span class="nv"> </span><span class="s">request."</span>
</span><span id="line-203"><span class="linenos">203</span>
</span><span id="line-204"><span class="linenos">204</span><span class="c1"># OpenTelemetry tracing configuration</span>
</span><span id="line-205"><span class="linenos">205</span><span class="nt">tracing</span><span class="p">:</span>
</span><span id="line-206"><span class="linenos">206</span><span class="w"> </span><span class="c1"># Random sampling percentage (1-100)</span>
</span><span id="line-207"><span class="linenos">207</span><span class="w"> </span><span class="nt">random_sampling</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">100</span>
</span><span id="line-208"><span class="linenos">208</span><span class="w"> </span><span class="c1"># Include internal Plano spans in traces</span>
</span><span id="line-209"><span class="linenos">209</span><span class="w"> </span><span class="nt">trace_arch_internal</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span>
</span><span id="line-210"><span class="linenos">210</span><span class="w"> </span><span class="c1"># gRPC endpoint for OpenTelemetry collector (e.g., Jaeger, Tempo)</span>
</span><span id="line-211"><span class="linenos">211</span><span class="w"> </span><span class="nt">opentracing_grpc_endpoint</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://localhost:4317</span>
</span><span id="line-212"><span class="linenos">212</span><span class="w"> </span><span class="nt">span_attributes</span><span class="p">:</span>
</span><span id="line-213"><span class="linenos">213</span><span class="w"> </span><span class="c1"># Propagate request headers whose names start with these prefixes as span attributes</span>
</span><span id="line-214"><span class="linenos">214</span><span class="w"> </span><span class="nt">header_prefixes</span><span class="p">:</span>
</span><span id="line-215"><span class="linenos">215</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">x-user-</span>
</span><span id="line-216"><span class="linenos">216</span><span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">x-org-</span>
</span><span id="line-217"><span class="linenos">217</span><span class="w"> </span><span class="c1"># Static key/value pairs added to every span</span>
</span><span id="line-218"><span class="linenos">218</span><span class="w"> </span><span class="nt">static</span><span class="p">:</span>
</span><span id="line-219"><span class="linenos">219</span><span class="w"> </span><span class="nt">environment</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">production</span>
</span><span id="line-220"><span class="linenos">220</span><span class="w"> </span><span class="nt">service.team</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">platform</span>
</span></code></pre></div>
</div>
</div>
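The `tenant_header` comment in the `session_cache` block describes how affinity keys are namespaced when state is shared across replicas: `plano:affinity:{tenant_id}:{session_id}`. The Python sketch below illustrates that layout under stated assumptions. `affinity_key`, `key_for_request`, and the unscoped fallback format used when no tenant is present are hypothetical and are not taken from Plano's source.

```python
from typing import Mapping, Optional

# Hypothetical helpers illustrating the documented key layout
# plano:affinity:{tenant_id}:{session_id}. Not Plano's actual implementation.
PREFIX = "plano:affinity"


def affinity_key(session_id: str, tenant_id: Optional[str] = None) -> str:
    """Return the cache key for a pinned session."""
    if tenant_id:
        # Tenant-scoped form, as described for tenant_header.
        return f"{PREFIX}:{tenant_id}:{session_id}"
    # Assumed unscoped form when no tenant is known; the config does not specify it.
    return f"{PREFIX}:{session_id}"


def key_for_request(headers: Mapping[str, str], session_id: str,
                    tenant_header: Optional[str] = None) -> str:
    """Resolve the tenant ID from a request header, matched case-insensitively."""
    tenant_id = None
    if tenant_header:
        wanted = tenant_header.lower()
        for name, value in headers.items():
            if name.lower() == wanted:
                tenant_id = value
                break
    return affinity_key(session_id, tenant_id)


print(key_for_request({"X-Org-Id": "acme"}, "sess-42", "x-org-id"))
# prints: plano:affinity:acme:sess-42
```

Scoping keys by tenant means two organizations that happen to reuse the same session ID do not share a pinned route in the shared Redis cache.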
@ -406,7 +414,7 @@ where prompts get routed to, apply guardrails, and enable critical agent observa
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@ -542,7 +542,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@ -179,7 +179,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@ -199,7 +199,7 @@ own deployments), and Plano reaches them via HTTP.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@ -485,7 +485,7 @@ processing request headers and then finalized by the HCM during post-request pro
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@ -200,7 +200,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@ -200,7 +200,7 @@ hardware threads on the machine.</p>
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 09, 2026. </p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company Last updated: Apr 14, 2026. </p>
</div>
</div>
</footer>

@ -221,7 +221,7 @@
</div><footer class="py-6 border-t border-border md:py-0">
<div class="container flex flex-col items-center justify-between gap-4 md:h-24 md:flex-row">
<div class="flex flex-col items-center gap-4 px-8 md:flex-row md:gap-2 md:px-0">
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company&nbsp;Last updated: Apr 09, 2026.&nbsp;</p>
<p class="text-sm leading-loose text-center text-muted-foreground md:text-left">© 2026, Katanemo Labs, a DigitalOcean Company&nbsp;Last updated: Apr 14, 2026.&nbsp;</p>
</div>
</div>
</footer>

File diff suppressed because one or more lines are too long