add Redis session affinity demos (Docker Compose and Kubernetes)

2026-05-21 13:55:15 +02:00 · 2026-04-09 16:32:40 -07:00 · 2026-04-09 16:32:40 -07:00 · 90810078da
commit 90810078da
parent 50670f843d
20 changed files with 2080 additions and 0 deletions
--- a/demos/llm_routing/session_affinity_redis_k8s/README.md
+++ b/demos/llm_routing/session_affinity_redis_k8s/README.md
@ -0,0 +1,287 @@
+# Session Affinity — Multi-Replica Kubernetes Deployment
+
+Production-style Kubernetes demo that proves Redis-backed session affinity
+(`X-Model-Affinity`) works correctly when Plano runs as multiple replicas
+behind a load balancer.
+
+## Architecture
+
+```
+                     ┌─────────────────────────────────────────┐
+                     │           Kubernetes Cluster             │
+                     │                                          │
+   Client ──────────►│  LoadBalancer Service (port 12000)       │
+                     │       │               │                  │
+                     │  ┌────▼────┐    ┌─────▼───┐             │
+                     │  │ Plano   │    │ Plano   │  (replicas) │
+                     │  │ Pod 0   │    │ Pod 1   │             │
+                     │  └────┬────┘    └────┬────┘             │
+                     │       └──────┬───────┘                  │
+                     │         ┌────▼────┐                     │
+                     │         │  Redis  │ (StatefulSet)        │
+                     │         │ Pod     │ shared session store │
+                     │         └─────────┘                     │
+                     │                                          │
+                     │  ┌──────────┐                           │
+                     │  │  Jaeger  │  distributed tracing       │
+                     │  └──────────┘                           │
+                     └─────────────────────────────────────────┘
+```
+
+**What makes this production-like:**
+
+| Feature | Detail |
+|---------|--------|
+| 2 Plano replicas | `replicas: 2` with HPA (scales 2–5 on CPU) |
+| Shared Redis | StatefulSet with PVC — sessions survive pod restarts |
+| Session TTL | 600 s, enforced natively by Redis `EX` |
+| Eviction policy | `allkeys-lru` — Redis auto-evicts oldest sessions under memory pressure |
+| Distributed tracing | Jaeger collects spans from both pods |
+| Health probes | Readiness + liveness gates traffic away from unhealthy pods |
+
+## Quick Start (local — no registry needed)
+
+```bash
+# 1. Install kind if needed
+#    https://kind.sigs.k8s.io/docs/user/quick-start/#installation
+#    brew install kind   (macOS)
+
+# 2. Set your API key
+export OPENAI_API_KEY=sk-...
+# or copy and edit:
+cp .env.example .env
+
+# 3. Build, deploy, and verify in one command
+./run-local.sh
+```
+
+`run-local.sh` creates a kind cluster named `plano-demo` (if it doesn't exist),
+builds the image locally, loads it into the cluster with `kind load docker-image`
+— **no registry, no push required**.
+
+Individual steps:
+
+```bash
+./run-local.sh --build-only      # (re-)build and reload image into kind
+./run-local.sh --deploy-only     # (re-)apply k8s manifests
+./run-local.sh --verify          # run verify_affinity.py
+./run-local.sh --down            # delete k8s resources (keeps kind cluster)
+./run-local.sh --delete-cluster  # delete k8s resources + kind cluster
+```
+
+---
+
+## Prerequisites
+
+| Tool | Notes |
+|------|-------|
+| `kubectl` | Configured to reach a Kubernetes cluster |
+| `docker` | To build and push the custom image |
+| Container registry (optional) | Needed only when you are not using the local kind flow |
+| `OPENAI_API_KEY` | For model inference |
+| Python 3.11+ | Only for `verify_affinity.py` |
+
+**Cluster:** `run-local.sh` creates and manages a kind cluster named `plano-demo` automatically. Install kind from https://kind.sigs.k8s.io or `brew install kind`.
+
+## Step 1 — Build the Image
+
+Build a custom image from the repo root:
+
+```bash
+# From this demo directory:
+./build-and-push.sh ghcr.io/yourorg/plano-redis:latest
+
+# Or manually from the repo root:
+docker build \
+  -f demos/llm_routing/session_affinity_redis_k8s/Dockerfile \
+  -t ghcr.io/yourorg/plano-redis:latest \
+  .
+docker push ghcr.io/yourorg/plano-redis:latest
+```
+
+Then update the image reference in `k8s/plano.yaml` (skip this when using `run-local.sh`, which uses `plano-redis:local` automatically):
+
+```yaml
+image: ghcr.io/yourorg/plano-redis:latest  # ← replace YOUR_REGISTRY/plano-redis:latest
+```
+
+## Step 2 — Deploy
+
+```bash
+./deploy.sh
+```
+
+The script:
+1. Creates the `plano-demo` namespace
+2. Prompts for `OPENAI_API_KEY` and creates a Kubernetes Secret
+3. Applies Redis, Jaeger, ConfigMap, and Plano manifests in order
+4. Waits for rollouts to complete
+
+Expected output:
+
+```
+==> Applying namespace...
+==> Creating API key secret...
+  OPENAI_API_KEY: [hidden]
+==> Applying Redis (StatefulSet + Services)...
+==> Applying Jaeger...
+==> Applying Plano config (ConfigMap)...
+==> Applying Plano deployment + HPA...
+==> Waiting for Redis to be ready...
+==> Waiting for Plano pods to be ready...
+
+Deployment complete!
+
+=== Pods ===
+NAME                     READY   STATUS    NODE
+redis-0                  1/1     Running   node-1
+plano-6d8f9b-xk2pq       1/1     Running   node-1
+plano-6d8f9b-r7nlw       1/1     Running   node-2
+jaeger-5c7d8f-q9mnb      1/1     Running   node-1
+
+=== Services ===
+NAME      TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)
+plano     LoadBalancer   10.96.12.50    203.0.113.42   12000:32000/TCP
+redis     ClusterIP      None           <none>          6379/TCP
+jaeger    ClusterIP      10.96.8.71     <none>          16686/TCP,...
+```
+
+## Step 3 — Verify Session Affinity Across Replicas
+
+```bash
+python verify_affinity.py
+```
+
+The script opens a dedicated `kubectl port-forward` tunnel to **each pod
+individually**. This is the definitive test: it routes requests to specific
+pods rather than relying on random load-balancer assignment.
+
+```
+Mode: per-pod port-forward (full cross-replica proof)
+
+Found 2 Plano pod(s): plano-6d8f9b-xk2pq, plano-6d8f9b-r7nlw
+Opening per-pod port-forward tunnels...
+
+  plano-6d8f9b-xk2pq → localhost:19100
+  plano-6d8f9b-r7nlw → localhost:19101
+
+==================================================================
+Phase 1: Cross-replica session pinning
+  Pods under test : plano-6d8f9b-xk2pq, plano-6d8f9b-r7nlw
+  Sessions        : 4
+  Rounds/session  : 4
+
+  Each session is PINNED via one pod and VERIFIED via another.
+  If Redis is shared, every round must return the same model.
+==================================================================
+
+  PASS  k8s-session-001
+        model      : gpt-4o-mini-2024-07-18
+        pod order  : plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw
+
+  PASS  k8s-session-002
+        model      : gpt-5.2
+        pod order  : plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq
+
+  PASS  k8s-session-003
+        model      : gpt-4o-mini-2024-07-18
+        pod order  : plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw
+
+  PASS  k8s-session-004
+        model      : gpt-5.2
+        pod order  : plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq
+
+==================================================================
+Phase 2: Redis key inspection
+==================================================================
+  k8s-session-001
+    model_name : gpt-4o-mini-2024-07-18
+    route_name : fast_responses
+    TTL        : 587s remaining
+  k8s-session-002
+    model_name : gpt-5.2
+    route_name : deep_reasoning
+    TTL        : 581s remaining
+  ...
+
+==================================================================
+Summary
+==================================================================
+All sessions were pinned consistently across replicas.
+Redis session cache is working correctly in Kubernetes.
+```
+
+## What to Look For
+
+### The cross-replica proof
+
+Each session's `pod order` line shows it alternating between the two pods:
+
+```
+pod order: pod-A → pod-B → pod-A → pod-B
+```
+
+Round 1 sets the Redis key (via pod-A). Rounds 2, 3, 4 read from Redis on
+alternating pods. If the model stays the same across all rounds, Redis is the
+shared source of truth — **not** any in-process state.
+
+### Redis keys
+
+```bash
+kubectl exec -it redis-0 -n plano-demo -- redis-cli
+
+127.0.0.1:6379> KEYS *
+1) "k8s-session-001"
+2) "k8s-session-002"
+
+127.0.0.1:6379> GET k8s-session-001
+{"model_name":"gpt-4o-mini-2024-07-18","route_name":"fast_responses"}
+
+127.0.0.1:6379> TTL k8s-session-001
+(integer) 587
+```
+
+### Jaeger traces
+
+```bash
+kubectl port-forward svc/jaeger 16686:16686 -n plano-demo
+```
+
+Open **http://localhost:16686**, select service `plano`.
+
+- **Pinned requests** — no span to the Arch-Router (decision served from Redis)
+- **First request** per session — spans include the router call + a Redis `SET`
+- Both Plano pods appear as separate instances in the trace list
+
+### Scaling up (HPA in action)
+
+```bash
+# Scale to 3 replicas manually
+kubectl scale deployment/plano --replicas=3 -n plano-demo
+
+# Run verification again — now 3 pods alternate
+python verify_affinity.py --sessions 6
+```
+
+Existing sessions in Redis are unaffected by the scale event. New pods
+immediately participate in the shared session pool.
+
+## Teardown
+
+```bash
+./deploy.sh --destroy
+# Then optionally:
+kubectl delete namespace plano-demo
+```
+
+## Notes
+
+- The Redis StatefulSet uses a `PersistentVolumeClaim`. Session data survives
+  pod restarts within a TTL window but is not HA. For production, replace with
+  Redis Sentinel, Redis Cluster, or a managed service (ElastiCache, MemoryStore).
+- `session_max_entries` is not enforced by this backend — Redis uses
+  `maxmemory-policy: allkeys-lru` instead, which is a global limit across all
+  keys rather than a per-application cap.
+- On **minikube**, run `minikube tunnel` in a separate terminal to get an
+  external IP for the LoadBalancer service.
+- On **kind**, switch to `NodePort` (see the comment in `k8s/plano.yaml`).