10 KiB
Session Affinity — Multi-Replica Kubernetes Deployment
Production-style Kubernetes demo that proves Redis-backed session affinity
(X-Model-Affinity) works correctly when Plano runs as multiple replicas
behind a load balancer.
Architecture
┌─────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
Client ──────────►│ LoadBalancer Service (port 12000) │
│ │ │ │
│ ┌────▼────┐ ┌─────▼───┐ │
│ │ Plano │ │ Plano │ (replicas) │
│ │ Pod 0 │ │ Pod 1 │ │
│ └────┬────┘ └────┬────┘ │
│ └──────┬───────┘ │
│ ┌────▼────┐ │
│ │ Redis │ (StatefulSet) │
│ │ Pod │ shared session store │
│ └─────────┘ │
│ │
│ ┌──────────┐ │
│ │ Jaeger │ distributed tracing │
│ └──────────┘ │
└─────────────────────────────────────────┘
What makes this production-like:
| Feature | Detail |
|---|---|
| 2 Plano replicas | replicas: 2 with HPA (scales 2–5 on CPU) |
| Shared Redis | StatefulSet with PVC — sessions survive pod restarts |
| Session TTL | 600 s, enforced natively by Redis EX |
| Eviction policy | allkeys-lru — Redis auto-evicts oldest sessions under memory pressure |
| Distributed tracing | Jaeger collects spans from both pods |
| Health probes | Readiness + liveness gates traffic away from unhealthy pods |
Quick Start (local — no registry needed)
# 1. Install kind if needed
# https://kind.sigs.k8s.io/docs/user/quick-start/#installation
# brew install kind (macOS)
# 2. Set your API key
export OPENAI_API_KEY=sk-...
# or copy and edit:
cp .env.example .env
# 3. Build, deploy, and verify in one command
./run-local.sh
run-local.sh creates a kind cluster named plano-demo (if it doesn't exist),
builds the image locally, loads it into the cluster with kind load docker-image
— no registry, no push required.
Individual steps:
./run-local.sh --build-only # (re-)build and reload image into kind
./run-local.sh --deploy-only # (re-)apply k8s manifests
./run-local.sh --verify # run verify_affinity.py
./run-local.sh --down # delete k8s resources (keeps kind cluster)
./run-local.sh --delete-cluster # delete k8s resources + kind cluster
Prerequisites
| Tool | Notes |
|---|---|
kubectl |
Configured to reach a Kubernetes cluster |
docker |
To build and push the custom image |
| Container registry (optional) | Needed only when you are not using the local kind flow |
OPENAI_API_KEY |
For model inference |
| Python 3.11+ | Only for verify_affinity.py |
Cluster: run-local.sh creates and manages a kind cluster named plano-demo automatically. Install kind from https://kind.sigs.k8s.io or brew install kind.
Step 1 — Build the Image
Build a custom image from the repo root:
# From this demo directory:
./build-and-push.sh ghcr.io/yourorg/plano-redis:latest
# Or manually from the repo root:
docker build \
-f demos/llm_routing/session_affinity_redis_k8s/Dockerfile \
-t ghcr.io/yourorg/plano-redis:latest \
.
docker push ghcr.io/yourorg/plano-redis:latest
Then update the image reference in k8s/plano.yaml (skip this when using run-local.sh, which uses plano-redis:local automatically):
image: ghcr.io/yourorg/plano-redis:latest # ← replace YOUR_REGISTRY/plano-redis:latest
Step 2 — Deploy
./deploy.sh
The script:
- Creates the
plano-demonamespace - Prompts for
OPENAI_API_KEYand creates a Kubernetes Secret - Applies Redis, Jaeger, ConfigMap, and Plano manifests in order
- Waits for rollouts to complete
Expected output:
==> Applying namespace...
==> Creating API key secret...
OPENAI_API_KEY: [hidden]
==> Applying Redis (StatefulSet + Services)...
==> Applying Jaeger...
==> Applying Plano config (ConfigMap)...
==> Applying Plano deployment + HPA...
==> Waiting for Redis to be ready...
==> Waiting for Plano pods to be ready...
Deployment complete!
=== Pods ===
NAME READY STATUS NODE
redis-0 1/1 Running node-1
plano-6d8f9b-xk2pq 1/1 Running node-1
plano-6d8f9b-r7nlw 1/1 Running node-2
jaeger-5c7d8f-q9mnb 1/1 Running node-1
=== Services ===
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
plano LoadBalancer 10.96.12.50 203.0.113.42 12000:32000/TCP
redis ClusterIP None <none> 6379/TCP
jaeger ClusterIP 10.96.8.71 <none> 16686/TCP,...
Step 3 — Verify Session Affinity Across Replicas
python verify_affinity.py
The script opens a dedicated kubectl port-forward tunnel to each pod
individually. This is the definitive test: it routes requests to specific
pods rather than relying on random load-balancer assignment.
Mode: per-pod port-forward (full cross-replica proof)
Found 2 Plano pod(s): plano-6d8f9b-xk2pq, plano-6d8f9b-r7nlw
Opening per-pod port-forward tunnels...
plano-6d8f9b-xk2pq → localhost:19100
plano-6d8f9b-r7nlw → localhost:19101
==================================================================
Phase 1: Cross-replica session pinning
Pods under test : plano-6d8f9b-xk2pq, plano-6d8f9b-r7nlw
Sessions : 4
Rounds/session : 4
Each session is PINNED via one pod and VERIFIED via another.
If Redis is shared, every round must return the same model.
==================================================================
PASS k8s-session-001
model : gpt-4o-mini-2024-07-18
pod order : plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw
PASS k8s-session-002
model : gpt-5.2
pod order : plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq
PASS k8s-session-003
model : gpt-4o-mini-2024-07-18
pod order : plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw
PASS k8s-session-004
model : gpt-5.2
pod order : plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq
==================================================================
Phase 2: Redis key inspection
==================================================================
k8s-session-001
model_name : gpt-4o-mini-2024-07-18
route_name : fast_responses
TTL : 587s remaining
k8s-session-002
model_name : gpt-5.2
route_name : deep_reasoning
TTL : 581s remaining
...
==================================================================
Summary
==================================================================
All sessions were pinned consistently across replicas.
Redis session cache is working correctly in Kubernetes.
What to Look For
The cross-replica proof
Each session's pod order line shows it alternating between the two pods:
pod order: pod-A → pod-B → pod-A → pod-B
Round 1 sets the Redis key (via pod-A). Rounds 2, 3, 4 read from Redis on alternating pods. If the model stays the same across all rounds, Redis is the shared source of truth — not any in-process state.
Redis keys
kubectl exec -it redis-0 -n plano-demo -- redis-cli
127.0.0.1:6379> KEYS *
1) "k8s-session-001"
2) "k8s-session-002"
127.0.0.1:6379> GET k8s-session-001
{"model_name":"gpt-4o-mini-2024-07-18","route_name":"fast_responses"}
127.0.0.1:6379> TTL k8s-session-001
(integer) 587
Jaeger traces
kubectl port-forward svc/jaeger 16686:16686 -n plano-demo
Open http://localhost:16686, select service plano.
- Pinned requests — no span to the Arch-Router (decision served from Redis)
- First request per session — spans include the router call + a Redis
SET - Both Plano pods appear as separate instances in the trace list
Scaling up (HPA in action)
# Scale to 3 replicas manually
kubectl scale deployment/plano --replicas=3 -n plano-demo
# Run verification again — now 3 pods alternate
python verify_affinity.py --sessions 6
Existing sessions in Redis are unaffected by the scale event. New pods immediately participate in the shared session pool.
Teardown
./deploy.sh --destroy
# Then optionally:
kubectl delete namespace plano-demo
Notes
- The Redis StatefulSet uses a
PersistentVolumeClaim. Session data survives pod restarts within a TTL window but is not HA. For production, replace with Redis Sentinel, Redis Cluster, or a managed service (ElastiCache, MemoryStore). session_max_entriesis not enforced by this backend — Redis usesmaxmemory-policy: allkeys-lruinstead, which is a global limit across all keys rather than a per-application cap.- On minikube, run
minikube tunnelin a separate terminal to get an external IP for the LoadBalancer service. - On kind, switch to
NodePort(see the comment ink8s/plano.yaml).