diff --git a/docs/source/guides/llm_router.rst b/docs/source/guides/llm_router.rst index f294043a..25b78db5 100644 --- a/docs/source/guides/llm_router.rst +++ b/docs/source/guides/llm_router.rst @@ -413,6 +413,52 @@ Without the header, routing runs fresh on every request — no behavior change f To start a new routing decision (e.g., when the agent's task changes), generate a new affinity ID. +Session Cache Backends +~~~~~~~~~~~~~~~~~~~~~~ + +By default, Plano stores session affinity state in an in-process LRU cache. This works well for single-instance deployments, but sessions are not shared across replicas — each instance has its own independent cache. + +For deployments with multiple Plano replicas (Kubernetes, Docker Compose with ``scale``, or any load-balanced setup), use Redis as the session cache backend. All replicas connect to the same Redis instance, so an affinity decision made by one replica is honoured by every other replica in the pool. + +**In-memory (default)** + +No configuration required. Sessions live only for the lifetime of the process and are lost on restart. + +.. code-block:: yaml + + routing: + session_ttl_seconds: 600 # How long affinity lasts (default: 10 min) + session_max_entries: 10000 # LRU capacity (upper limit: 10000) + +**Redis** + +Requires a reachable Redis instance. The ``url`` field supports standard Redis URI syntax, including authentication (``redis://:password@host:6379``) and TLS (``rediss://host:6380``). Redis handles TTL expiry natively, so no periodic cleanup is needed. + +.. code-block:: yaml + + routing: + session_ttl_seconds: 600 + session_cache: + type: redis + url: redis://localhost:6379 + +.. note:: + When using Redis in a multi-tenant environment, construct the ``X-Model-Affinity`` header value to include a tenant identifier, for example ``{tenant_id}:{session_id}``. Plano stores each key under the internal namespace ``plano:affinity:{key}``, so tenant-scoped values avoid cross-tenant collisions without any additional configuration. + +**Example: Kubernetes multi-replica deployment** + +Deploy a Redis instance alongside your Plano pods and point all replicas at it: + +.. code-block:: yaml + + routing: + session_ttl_seconds: 600 + session_cache: + type: redis + url: redis://redis.plano.svc.cluster.local:6379 + +With this configuration, any replica that first receives a request for affinity ID ``abc-123`` caches the routing decision in Redis. Subsequent requests for ``abc-123`` — regardless of which replica they land on — retrieve the same pinned model. + Combining Routing Methods ------------------------- diff --git a/docs/source/resources/includes/plano_config_full_reference.yaml b/docs/source/resources/includes/plano_config_full_reference.yaml index 787b09d3..049dba67 100644 --- a/docs/source/resources/includes/plano_config_full_reference.yaml +++ b/docs/source/resources/includes/plano_config_full_reference.yaml @@ -178,6 +178,13 @@ overrides: routing: session_ttl_seconds: 600 # How long a pinned session lasts (default: 600s / 10 min) session_max_entries: 10000 # Max cached sessions before eviction (upper limit: 10000) + # session_cache controls the backend used to store affinity state. + # "memory" (default) is in-process and works for single-instance deployments. + # "redis" shares state across replicas — required for multi-replica / Kubernetes setups. + session_cache: + type: memory # "memory" (default) or "redis" + # url is required when type is "redis". Supports redis:// and rediss:// (TLS). + # url: redis://localhost:6379 # State storage for multi-turn conversation history state_storage: