remove session_affinity_redis and session_affinity_redis_k8s demos

This commit is contained in:
Spherrrical 2026-04-13 19:29:59 -07:00
parent 03cb09f47e
commit 05a85e9a85
20 changed files with 0 additions and 2080 deletions


@ -1 +0,0 @@
OPENAI_API_KEY=sk-replace-me


@ -1,247 +0,0 @@
# Session Affinity with Redis — Multi-Replica Model Pinning
This demo shows Plano's **session affinity** (`X-Model-Affinity` header) backed by a **Redis session cache** instead of the default in-memory store.
## The Problem
By default, model affinity stores routing decisions in a per-process `HashMap`.
This works for single-instance deployments, but breaks when you run multiple
Plano replicas behind a load balancer:
```
Client ──► Load Balancer ──► Replica A (session pinned here)
                        └──► Replica B (knows nothing about the session)
```
A request that was pinned to `gpt-4o` on Replica A will be re-routed from
scratch on Replica B, defeating the purpose of affinity.
## The Solution
Plano's `session_cache` config key accepts a `type: redis` backend that is
shared across all replicas:
```yaml
routing:
  session_ttl_seconds: 300
  session_cache:
    type: redis
    url: redis://localhost:6379
```
All replicas read and write the same Redis keyspace. A session pinned on any
replica is immediately visible to all others.
## What to Look For
| What | Expected behaviour |
|------|--------------------|
| First request with a session ID | Plano routes normally (via Arch-Router) and writes the result to Redis (`SET session-id ... EX 300`) |
| Subsequent requests with the **same** session ID | Plano reads from Redis and skips the router — same model every time |
| Requests with a **different** session ID | Routed independently; may land on a different model |
| After `session_ttl_seconds` elapses | Redis key expires; next request re-routes and sets a new pin |
| `x-plano-pinned: true` response header | Tells you the response was served from the session cache |
## Architecture
```
Client
  │  X-Model-Affinity: my-session-id
  ▼
Plano (brightstaff)
  ├── GET redis://localhost:6379/my-session-id
  │     hit?  → return pinned model immediately (no Arch-Router call)
  │     miss? → call Arch-Router → SET key EX 300 → return routed model
  ▼
Redis (shared across replicas)
```
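The hit/miss decision above can be sketched in a few lines. This is an illustrative reconstruction, not Plano's actual code: `FakeRedis` stands in for a real Redis client (mirroring `get`/`setex` semantics), and `resolve_model` is a hypothetical name for the lookup step.

```python
import json
import time

class FakeRedis:
    """Minimal stand-in for a Redis client: GET plus SET-with-expiry."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:   # TTL elapsed: key is gone
            del self._data[key]
            return None
        return value

    def setex(self, key, ttl, value):        # SET key value EX ttl
        self._data[key] = (value, time.monotonic() + ttl)

def resolve_model(cache, session_id, route_fn, ttl=300):
    """Return (model, pinned): route once on a miss, then serve the pin."""
    cached = cache.get(session_id)
    if cached is not None:
        return json.loads(cached)["model_name"], True   # hit: skip the router
    model = route_fn()                                  # miss: call the router
    cache.setex(session_id, ttl, json.dumps({"model_name": model}))
    return model, False
```

Because every replica talks to the same cache, a pin written by one replica is a hit on all the others.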
## Prerequisites
| Requirement | Notes |
|-------------|-------|
| `planoai` CLI | `pip install planoai` |
| Docker + Docker Compose | For Redis and Jaeger |
| `OPENAI_API_KEY` | Required for routing model (Arch-Router) and downstream LLMs |
| Python 3.11+ | Only needed to run `verify_affinity.py` |
## Quick Start
```bash
# 1. Set your API key
export OPENAI_API_KEY=sk-...
# or copy and edit:
cp .env.example .env
# 2. Start Redis, Jaeger, and Plano
./run_demo.sh up
# 3. Verify session pinning works
python verify_affinity.py
```
## Manual Verification with curl
### Step 1 — Pin a session (first request sets the affinity)
```bash
curl -s http://localhost:12000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-model-affinity: my-session-abc" \
  -d '{"model":"openai/gpt-4o-mini","messages":[{"role":"user","content":"Write a short poem about the ocean."}]}' \
  | jq '{model, pinned: .x_plano_pinned}'
```
Expected output (first request — not yet pinned, Arch-Router picks the model):
```json
{
  "model": "openai/gpt-5.2",
  "pinned": null
}
```
### Step 2 — Confirm the pin is held on subsequent requests
```bash
for i in 1 2 3 4; do
  curl -s http://localhost:12000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "x-model-affinity: my-session-abc" \
    -d "{\"model\":\"openai/gpt-4o-mini\",\"messages\":[{\"role\":\"user\",\"content\":\"Request $i\"}]}" \
    | jq -r '.model'
done
```
Expected output (same model for every request):
```
openai/gpt-5.2
openai/gpt-5.2
openai/gpt-5.2
openai/gpt-5.2
```
### Step 3 — Inspect the Redis key directly
```bash
docker exec plano-session-redis redis-cli \
  GET my-session-abc | python3 -m json.tool
```
Expected output:
```json
{
  "model_name": "openai/gpt-5.2",
  "route_name": "deep_reasoning"
}
```
```bash
# Check the TTL (seconds remaining)
docker exec plano-session-redis redis-cli TTL my-session-abc
# e.g. 287
```
### Step 4 — Different sessions may get different models
```bash
for session in session-A session-B session-C; do
  model=$(curl -s http://localhost:12000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "x-model-affinity: $session" \
    -d '{"model":"openai/gpt-4o-mini","messages":[{"role":"user","content":"Explain quantum entanglement in detail with equations."}]}' \
    | jq -r '.model')
  echo "$session -> $model"
done
```
Sessions with content matched to `deep_reasoning` will pin to `openai/gpt-5.2`;
sessions matched to `fast_responses` will pin to `openai/gpt-4o-mini`.
## Verification Script Output
Running `python verify_affinity.py` produces output like:
```
Plano endpoint : http://localhost:12000/v1/chat/completions
Sessions       : 3
Rounds/session : 4

============================================================
Phase 1: Requests WITHOUT X-Model-Affinity header
         (model may vary between requests — that is expected)
============================================================
  Request 1: model = openai/gpt-4o-mini
  Request 2: model = openai/gpt-5.2
  Request 3: model = openai/gpt-4o-mini
  Models seen across 3 requests: {'openai/gpt-4o-mini', 'openai/gpt-5.2'}

============================================================
Phase 2: Requests WITH X-Model-Affinity (session pinning)
         Each session should be pinned to exactly one model.
============================================================

  Session 'demo-session-001':
    Round 1: model = openai/gpt-4o-mini [FIRST — sets affinity]
    Round 2: model = openai/gpt-4o-mini [PINNED]
    Round 3: model = openai/gpt-4o-mini [PINNED]
    Round 4: model = openai/gpt-4o-mini [PINNED]

  Session 'demo-session-002':
    Round 1: model = openai/gpt-5.2 [FIRST — sets affinity]
    Round 2: model = openai/gpt-5.2 [PINNED]
    Round 3: model = openai/gpt-5.2 [PINNED]
    Round 4: model = openai/gpt-5.2 [PINNED]

  Session 'demo-session-003':
    Round 1: model = openai/gpt-4o-mini [FIRST — sets affinity]
    Round 2: model = openai/gpt-4o-mini [PINNED]
    Round 3: model = openai/gpt-4o-mini [PINNED]
    Round 4: model = openai/gpt-4o-mini [PINNED]

============================================================
Results
============================================================
  PASS  demo-session-001 -> always routed to 'openai/gpt-4o-mini'
  PASS  demo-session-002 -> always routed to 'openai/gpt-5.2'
  PASS  demo-session-003 -> always routed to 'openai/gpt-4o-mini'

All sessions were pinned consistently.
Redis session cache is working correctly.
```
## Observability
Open Jaeger at **http://localhost:16686** and select service `plano`.
- Requests **without** affinity: look for a span to the Arch-Router service
- Requests **with** affinity (pinned): the Arch-Router span will be absent —
the decision was served from Redis without calling the router at all
This is the clearest observable signal that the cache is working: pinned
requests are noticeably faster and produce fewer spans.
## Switching to the In-Memory Backend
To compare against the default in-memory backend, change `config.yaml`:
```yaml
routing:
  session_ttl_seconds: 300
  session_cache:
    type: memory   # ← change this
```
In-memory mode does **not** require Redis and works identically for a
single Plano process. The difference only becomes visible when you run
multiple replicas.
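A toy illustration of why the backends only diverge under multiple replicas (the `pin` helper is hypothetical; plain dicts stand in for the per-process map and the shared store):

```python
def pin(store, session_id, model):
    """Record the first model seen for a session; later calls return the pin."""
    store.setdefault(session_id, model)
    return store[session_id]

# In-memory backend: each replica keeps its OWN map.
replica_a, replica_b = {}, {}
pin(replica_a, "s1", "openai/gpt-5.2")     # pinned on replica A only
assert "s1" not in replica_b               # replica B would re-route from scratch

# Redis-like backend: one map shared by every replica.
shared = {}
pin(shared, "s1", "openai/gpt-5.2")        # pinned via replica A
assert pin(shared, "s1", "openai/gpt-4o-mini") == "openai/gpt-5.2"  # held on B
```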
## Teardown
```bash
./run_demo.sh down
```
This stops Plano, Redis, and Jaeger.


@ -1,36 +0,0 @@
version: v0.4.0

listeners:
  - type: model
    name: model_listener
    port: 12000

model_providers:
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    default: true
  - model: openai/gpt-5.2
    access_key: $OPENAI_API_KEY

routing_preferences:
  - name: fast_responses
    description: short factual questions, quick lookups, simple summarization, or greetings
    models:
      - openai/gpt-4o-mini
  - name: deep_reasoning
    description: multi-step reasoning, complex analysis, code review, or detailed explanations
    models:
      - openai/gpt-5.2
      - openai/gpt-4o-mini

routing:
  session_ttl_seconds: 300
  session_cache:
    type: redis
    url: redis://localhost:6379

tracing:
  random_sampling: 100
  trace_arch_internal: true


@ -1,23 +0,0 @@
services:
  redis:
    image: redis:7-alpine
    container_name: plano-session-redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    command: redis-server --save "" --appendonly no
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 1s
      timeout: 1s
      retries: 10

  jaeger:
    build:
      context: ../../shared/jaeger
    container_name: plano-session-jaeger
    restart: unless-stopped
    ports:
      - "16686:16686"
      - "4317:4317"
      - "4318:4318"


@ -1,94 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail

DEMO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

load_env() {
    if [ -f "$DEMO_DIR/.env" ]; then
        set -a
        # shellcheck disable=SC1091
        source "$DEMO_DIR/.env"
        set +a
    fi
}

check_prereqs() {
    local missing=()
    command -v docker >/dev/null 2>&1 || missing+=("docker")
    command -v planoai >/dev/null 2>&1 || missing+=("planoai (pip install planoai)")
    if [ ${#missing[@]} -gt 0 ]; then
        echo "ERROR: missing required tools: ${missing[*]}"
        exit 1
    fi
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "ERROR: OPENAI_API_KEY is not set."
        echo "       Create a .env file or export the variable before running."
        exit 1
    fi
}

start_demo() {
    echo "==> Starting Redis + Jaeger..."
    docker compose -f "$DEMO_DIR/docker-compose.yaml" up -d

    echo "==> Waiting for Redis to be ready..."
    local retries=0
    until docker exec plano-session-redis redis-cli ping 2>/dev/null | grep -q PONG; do
        retries=$((retries + 1))
        if [ $retries -ge 15 ]; then
            echo "ERROR: Redis did not become ready in time"
            exit 1
        fi
        sleep 1
    done
    echo "    Redis is ready."

    echo "==> Starting Plano..."
    planoai up "$DEMO_DIR/config.yaml"

    echo ""
    echo "Demo is running!"
    echo ""
    echo "  Model endpoint: http://localhost:12000/v1/chat/completions"
    echo "  Jaeger UI:      http://localhost:16686"
    echo "  Redis:          localhost:6379"
    echo ""
    echo "Run the verification script to confirm session pinning:"
    echo "  python $DEMO_DIR/verify_affinity.py"
    echo ""
    echo "Stop the demo with: $0 down"
}

stop_demo() {
    echo "==> Stopping Plano..."
    planoai down 2>/dev/null || true
    echo "==> Stopping Docker services..."
    docker compose -f "$DEMO_DIR/docker-compose.yaml" down
    echo "Demo stopped."
}

usage() {
    echo "Usage: $0 [up|down]"
    echo ""
    echo "  up    Start Redis, Jaeger, and Plano (default)"
    echo "  down  Stop all services"
}

load_env
case "${1:-up}" in
    up)
        check_prereqs
        start_demo
        ;;
    down)
        stop_demo
        ;;
    *)
        usage
        exit 1
        ;;
esac


@ -1,146 +0,0 @@
#!/usr/bin/env python3
"""
verify_affinity.py

Verify that model affinity (session pinning) works correctly.

Sends multiple requests with the same X-Model-Affinity session ID and asserts
that every response is served by the same model, demonstrating that Plano's
session cache is working as expected.

Usage:
    python verify_affinity.py [--url URL] [--rounds N] [--sessions N]
"""

import argparse
import json
import sys
import urllib.error
import urllib.request
from collections import defaultdict

PLANO_URL = "http://localhost:12000/v1/chat/completions"

PROMPTS = [
    "What is 2 + 2?",
    "Name the capital of France.",
    "How many days in a week?",
    "What color is the sky?",
    "Who wrote Romeo and Juliet?",
]

MESSAGES_PER_SESSION = [{"role": "user", "content": prompt} for prompt in PROMPTS]


def chat(url: str, session_id: str | None, message: str) -> dict:
    payload = json.dumps(
        {
            "model": "openai/gpt-4o-mini",
            "messages": [{"role": "user", "content": message}],
        }
    ).encode()
    headers = {"Content-Type": "application/json"}
    if session_id:
        headers["x-model-affinity"] = session_id
    req = urllib.request.Request(url, data=payload, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read())
    except urllib.error.URLError as e:
        print(f"  ERROR: could not reach Plano at {url}: {e}", file=sys.stderr)
        print("  Is the demo running? Start it with: ./run_demo.sh up", file=sys.stderr)
        sys.exit(1)


def extract_model(response: dict) -> str:
    return response.get("model", "<unknown>")


def run_verification(url: str, rounds: int, num_sessions: int) -> bool:
    print(f"Plano endpoint : {url}")
    print(f"Sessions       : {num_sessions}")
    print(f"Rounds/session : {rounds}")
    print()
    all_passed = True

    # --- Phase 1: Requests without session ID ---
    print("=" * 60)
    print("Phase 1: Requests WITHOUT X-Model-Affinity header")
    print("         (model may vary between requests — that is expected)")
    print("=" * 60)
    models_seen: set[str] = set()
    for i in range(min(rounds, 3)):
        resp = chat(url, None, PROMPTS[i % len(PROMPTS)])
        model = extract_model(resp)
        models_seen.add(model)
        print(f"  Request {i + 1}: model = {model}")
    print(f"  Models seen across {min(rounds, 3)} requests: {models_seen}")
    print()

    # --- Phase 2: Each session should always get the same model ---
    print("=" * 60)
    print("Phase 2: Requests WITH X-Model-Affinity (session pinning)")
    print("         Each session should be pinned to exactly one model.")
    print("=" * 60)
    session_results: dict[str, list[str]] = defaultdict(list)
    for s in range(num_sessions):
        session_id = f"demo-session-{s + 1:03d}"
        print(f"\n  Session '{session_id}':")
        for r in range(rounds):
            resp = chat(url, session_id, PROMPTS[r % len(PROMPTS)])
            model = extract_model(resp)
            session_results[session_id].append(model)
            pinned = " [PINNED]" if r > 0 else " [FIRST — sets affinity]"
            print(f"    Round {r + 1}: model = {model}{pinned}")
    print()

    print("=" * 60)
    print("Results")
    print("=" * 60)
    for session_id, models in session_results.items():
        unique_models = set(models)
        if len(unique_models) == 1:
            print(f"  PASS  {session_id} -> always routed to '{models[0]}'")
        else:
            print(
                f"  FAIL  {session_id} -> inconsistent models across rounds: {unique_models}"
            )
            all_passed = False
    print()
    if all_passed:
        print("All sessions were pinned consistently.")
        print("Redis session cache is working correctly.")
    else:
        print("One or more sessions were NOT pinned consistently.")
        print("Check that Redis is running and Plano is configured with:")
        print("  routing:")
        print("    session_cache:")
        print("      type: redis")
        print("      url: redis://localhost:6379")
    return all_passed


def main() -> None:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--url", default=PLANO_URL, help="Plano chat completions URL")
    parser.add_argument(
        "--rounds", type=int, default=4, help="Requests per session (default 4)"
    )
    parser.add_argument(
        "--sessions", type=int, default=3, help="Number of sessions to test (default 3)"
    )
    args = parser.parse_args()
    passed = run_verification(args.url, args.rounds, args.sessions)
    sys.exit(0 if passed else 1)


if __name__ == "__main__":
    main()


@ -1 +0,0 @@
OPENAI_API_KEY=sk-replace-me


@ -1,95 +0,0 @@
# Plano image for Redis-backed session affinity demo.
# Build context must be the repository root:
# docker build -f demos/llm_routing/session_affinity_redis_k8s/Dockerfile -t <your-image> .
# Envoy version — keep in sync with cli/planoai/consts.py ENVOY_VERSION
ARG ENVOY_VERSION=v1.37.0
# --- Dependency cache ---
FROM rust:1.93.0 AS deps
RUN rustup -v target add wasm32-wasip1
WORKDIR /arch
COPY crates/Cargo.toml crates/Cargo.lock ./
COPY crates/common/Cargo.toml common/Cargo.toml
COPY crates/hermesllm/Cargo.toml hermesllm/Cargo.toml
COPY crates/prompt_gateway/Cargo.toml prompt_gateway/Cargo.toml
COPY crates/llm_gateway/Cargo.toml llm_gateway/Cargo.toml
COPY crates/brightstaff/Cargo.toml brightstaff/Cargo.toml
RUN mkdir -p common/src && echo "" > common/src/lib.rs && \
    mkdir -p hermesllm/src && echo "" > hermesllm/src/lib.rs && \
    mkdir -p hermesllm/src/bin && echo "fn main() {}" > hermesllm/src/bin/fetch_models.rs && \
    mkdir -p prompt_gateway/src && echo "#[no_mangle] pub fn _start() {}" > prompt_gateway/src/lib.rs && \
    mkdir -p llm_gateway/src && echo "#[no_mangle] pub fn _start() {}" > llm_gateway/src/lib.rs && \
    mkdir -p brightstaff/src && echo "fn main() {}" > brightstaff/src/main.rs && echo "" > brightstaff/src/lib.rs
RUN cargo build --release --target wasm32-wasip1 -p prompt_gateway -p llm_gateway || true
RUN cargo build --release -p brightstaff || true
# --- WASM plugins ---
FROM deps AS wasm-builder
RUN rm -rf common/src hermesllm/src prompt_gateway/src llm_gateway/src
COPY crates/common/src common/src
COPY crates/hermesllm/src hermesllm/src
COPY crates/prompt_gateway/src prompt_gateway/src
COPY crates/llm_gateway/src llm_gateway/src
RUN find common hermesllm prompt_gateway llm_gateway -name "*.rs" -exec touch {} +
RUN cargo build --release --target wasm32-wasip1 -p prompt_gateway -p llm_gateway
# --- Brightstaff binary ---
FROM deps AS brightstaff-builder
RUN rm -rf common/src hermesllm/src brightstaff/src
COPY crates/common/src common/src
COPY crates/hermesllm/src hermesllm/src
COPY crates/brightstaff/src brightstaff/src
RUN find common hermesllm brightstaff -name "*.rs" -exec touch {} +
RUN cargo build --release -p brightstaff
FROM docker.io/envoyproxy/envoy:${ENVOY_VERSION} AS envoy
FROM python:3.14-slim AS arch
RUN set -eux; \
    apt-get update; \
    apt-get upgrade -y; \
    apt-get install -y --no-install-recommends gettext-base curl procps; \
    apt-get clean; rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir supervisor
RUN set -eux; \
    dpkg -r --force-depends libpam-modules libpam-modules-bin libpam-runtime libpam0g || true; \
    dpkg -P --force-all libpam-modules libpam-modules-bin libpam-runtime libpam0g || true; \
    rm -rf /etc/pam.d /lib/*/security /usr/lib/security || true
COPY --from=envoy /usr/local/bin/envoy /usr/local/bin/envoy
WORKDIR /app
RUN pip install --no-cache-dir uv
COPY cli/pyproject.toml ./
COPY cli/uv.lock ./
COPY cli/README.md ./
COPY config/plano_config_schema.yaml /config/plano_config_schema.yaml
COPY config/envoy.template.yaml /config/envoy.template.yaml
RUN pip install --no-cache-dir -e .
COPY cli/planoai planoai/
COPY config/envoy.template.yaml .
COPY config/plano_config_schema.yaml .
RUN mkdir -p /etc/supervisor/conf.d
COPY config/supervisord.conf /etc/supervisor/conf.d/supervisord.conf
COPY --from=wasm-builder /arch/target/wasm32-wasip1/release/prompt_gateway.wasm /etc/envoy/proxy-wasm-plugins/prompt_gateway.wasm
COPY --from=wasm-builder /arch/target/wasm32-wasip1/release/llm_gateway.wasm /etc/envoy/proxy-wasm-plugins/llm_gateway.wasm
COPY --from=brightstaff-builder /arch/target/release/brightstaff /app/brightstaff
RUN mkdir -p /var/log/supervisor && \
    touch /var/log/envoy.log /var/log/supervisor/supervisord.log \
          /var/log/access_ingress.log /var/log/access_ingress_prompt.log \
          /var/log/access_internal.log /var/log/access_llm.log /var/log/access_agent.log
ENTRYPOINT ["/usr/local/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]


@ -1,287 +0,0 @@
# Session Affinity — Multi-Replica Kubernetes Deployment
Production-style Kubernetes demo that proves Redis-backed session affinity
(`X-Model-Affinity`) works correctly when Plano runs as multiple replicas
behind a load balancer.
## Architecture
```
                  ┌────────────────────────────────────────────┐
                  │             Kubernetes Cluster             │
                  │                                            │
Client ──────────►│   LoadBalancer Service (port 12000)        │
                  │          │            │                    │
                  │     ┌────▼────┐  ┌────▼────┐               │
                  │     │  Plano  │  │  Plano  │  (replicas)   │
                  │     │  Pod 0  │  │  Pod 1  │               │
                  │     └────┬────┘  └────┬────┘               │
                  │          └─────┬──────┘                    │
                  │           ┌────▼────┐                      │
                  │           │  Redis  │  (StatefulSet)       │
                  │           │   Pod   │  shared session store│
                  │           └─────────┘                      │
                  │                                            │
                  │           ┌──────────┐                     │
                  │           │  Jaeger  │  distributed tracing│
                  │           └──────────┘                     │
                  └────────────────────────────────────────────┘
```
**What makes this production-like:**
| Feature | Detail |
|---------|--------|
| 2 Plano replicas | `replicas: 2` with HPA (scales 2→5 on CPU) |
| Shared Redis | StatefulSet with PVC — sessions survive pod restarts |
| Session TTL | 600 s, enforced natively by Redis `EX` |
| Eviction policy | `allkeys-lru` — Redis auto-evicts oldest sessions under memory pressure |
| Distributed tracing | Jaeger collects spans from both pods |
| Health probes | Readiness + liveness gates traffic away from unhealthy pods |
## Quick Start (local — no registry needed)
```bash
# 1. Install kind if needed
# https://kind.sigs.k8s.io/docs/user/quick-start/#installation
# brew install kind (macOS)
# 2. Set your API key
export OPENAI_API_KEY=sk-...
# or copy and edit:
cp .env.example .env
# 3. Build, deploy, and verify in one command
./run-local.sh
```
`run-local.sh` creates a kind cluster named `plano-demo` (if it doesn't exist),
builds the image locally, and loads it into the cluster with `kind load docker-image`:
**no registry, no push required**.
Individual steps:
```bash
./run-local.sh --build-only # (re-)build and reload image into kind
./run-local.sh --deploy-only # (re-)apply k8s manifests
./run-local.sh --verify # run verify_affinity.py
./run-local.sh --down # delete k8s resources (keeps kind cluster)
./run-local.sh --delete-cluster # delete k8s resources + kind cluster
```
---
## Prerequisites
| Tool | Notes |
|------|-------|
| `kubectl` | Configured to reach a Kubernetes cluster |
| `docker` | To build and push the custom image |
| Container registry (optional) | Needed only when you are not using the local kind flow |
| `OPENAI_API_KEY` | For model inference |
| Python 3.11+ | Only for `verify_affinity.py` |
**Cluster:** `run-local.sh` creates and manages a kind cluster named `plano-demo` automatically. Install kind from https://kind.sigs.k8s.io or `brew install kind`.
## Step 1 — Build the Image
Build a custom image from the repo root:
```bash
# From this demo directory:
./build-and-push.sh ghcr.io/yourorg/plano-redis:latest

# Or manually from the repo root:
docker build \
  -f demos/llm_routing/session_affinity_redis_k8s/Dockerfile \
  -t ghcr.io/yourorg/plano-redis:latest \
  .
docker push ghcr.io/yourorg/plano-redis:latest
```
Then update the image reference in `k8s/plano.yaml` (skip this when using `run-local.sh`, which uses `plano-redis:local` automatically):
```yaml
image: ghcr.io/yourorg/plano-redis:latest # ← replace YOUR_REGISTRY/plano-redis:latest
```
## Step 2 — Deploy
```bash
./deploy.sh
```
The script:
1. Creates the `plano-demo` namespace
2. Prompts for `OPENAI_API_KEY` and creates a Kubernetes Secret
3. Applies Redis, Jaeger, ConfigMap, and Plano manifests in order
4. Waits for rollouts to complete
Expected output:
```
==> Applying namespace...
==> Creating API key secret...
  OPENAI_API_KEY: [hidden]
==> Applying Redis (StatefulSet + Services)...
==> Applying Jaeger...
==> Applying Plano config (ConfigMap)...
==> Applying Plano deployment + HPA...
==> Waiting for Redis to be ready...
==> Waiting for Plano pods to be ready...

Deployment complete!

=== Pods ===
NAME                  READY   STATUS    NODE
redis-0               1/1     Running   node-1
plano-6d8f9b-xk2pq    1/1     Running   node-1
plano-6d8f9b-r7nlw    1/1     Running   node-2
jaeger-5c7d8f-q9mnb   1/1     Running   node-1

=== Services ===
NAME     TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)
plano    LoadBalancer   10.96.12.50   203.0.113.42   12000:32000/TCP
redis    ClusterIP      None          <none>         6379/TCP
jaeger   ClusterIP      10.96.8.71    <none>         16686/TCP,...
```
## Step 3 — Verify Session Affinity Across Replicas
```bash
python verify_affinity.py
```
The script opens a dedicated `kubectl port-forward` tunnel to **each pod
individually**. This is the definitive test: it routes requests to specific
pods rather than relying on random load-balancer assignment.
```
Mode: per-pod port-forward (full cross-replica proof)
Found 2 Plano pod(s): plano-6d8f9b-xk2pq, plano-6d8f9b-r7nlw
Opening per-pod port-forward tunnels...
  plano-6d8f9b-xk2pq → localhost:19100
  plano-6d8f9b-r7nlw → localhost:19101

==================================================================
Phase 1: Cross-replica session pinning
  Pods under test : plano-6d8f9b-xk2pq, plano-6d8f9b-r7nlw
  Sessions        : 4
  Rounds/session  : 4
  Each session is PINNED via one pod and VERIFIED via another.
  If Redis is shared, every round must return the same model.
==================================================================
  PASS k8s-session-001
       model     : gpt-4o-mini-2024-07-18
       pod order : plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw
  PASS k8s-session-002
       model     : gpt-5.2
       pod order : plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq
  PASS k8s-session-003
       model     : gpt-4o-mini-2024-07-18
       pod order : plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw
  PASS k8s-session-004
       model     : gpt-5.2
       pod order : plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq → plano-6d8f9b-r7nlw → plano-6d8f9b-xk2pq

==================================================================
Phase 2: Redis key inspection
==================================================================
  k8s-session-001
    model_name : gpt-4o-mini-2024-07-18
    route_name : fast_responses
    TTL        : 587s remaining
  k8s-session-002
    model_name : gpt-5.2
    route_name : deep_reasoning
    TTL        : 581s remaining
  ...

==================================================================
Summary
==================================================================
All sessions were pinned consistently across replicas.
Redis session cache is working correctly in Kubernetes.
```
## What to Look For
### The cross-replica proof
Each session's `pod order` line shows it alternating between the two pods:
```
pod order: pod-A → pod-B → pod-A → pod-B
```
Round 1 sets the Redis key (via pod-A). Rounds 2, 3, 4 read from Redis on
alternating pods. If the model stays the same across all rounds, Redis is the
shared source of truth — **not** any in-process state.
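The alternating check can be sketched as follows. This is an assumed simplification of what `verify_affinity.py` does (names are illustrative; the real script sends HTTP requests through per-pod `kubectl port-forward` tunnels):

```python
def verify_session(pods, session_id, rounds, send):
    """Send each round via a different pod; the pin holds iff one model is seen."""
    models = [send(pods[r % len(pods)], session_id) for r in range(rounds)]
    return len(set(models)) == 1

# Stub transport: a single shared dict plays the role of Redis, so the
# first round pins the session and every pod returns the same answer.
shared_redis = {}
def send(pod, session_id):
    return shared_redis.setdefault(session_id, "gpt-5.2")

assert verify_session(["pod-a", "pod-b"], "k8s-session-001", 4, send)
```

With per-pod in-memory state instead of the shared dict, round 2 (via pod-B) could return a different model and the check would fail.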
### Redis keys
```bash
kubectl exec -it redis-0 -n plano-demo -- redis-cli
127.0.0.1:6379> KEYS *
1) "k8s-session-001"
2) "k8s-session-002"
127.0.0.1:6379> GET k8s-session-001
{"model_name":"gpt-4o-mini-2024-07-18","route_name":"fast_responses"}
127.0.0.1:6379> TTL k8s-session-001
(integer) 587
```
### Jaeger traces
```bash
kubectl port-forward svc/jaeger 16686:16686 -n plano-demo
```
Open **http://localhost:16686**, select service `plano`.
- **Pinned requests** — no span to the Arch-Router (decision served from Redis)
- **First request** per session — spans include the router call + a Redis `SET`
- Both Plano pods appear as separate instances in the trace list
### Scaling up (HPA in action)
```bash
# Scale to 3 replicas manually
kubectl scale deployment/plano --replicas=3 -n plano-demo
# Run verification again — now 3 pods alternate
python verify_affinity.py --sessions 6
```
Existing sessions in Redis are unaffected by the scale event. New pods
immediately participate in the shared session pool.
## Teardown
```bash
./deploy.sh --destroy
# Then optionally:
kubectl delete namespace plano-demo
```
## Notes
- The Redis StatefulSet uses a `PersistentVolumeClaim`. Session data survives
pod restarts within a TTL window but is not HA. For production, replace with
Redis Sentinel, Redis Cluster, or a managed service (ElastiCache, MemoryStore).
- `session_max_entries` is not enforced by this backend — Redis uses
`maxmemory-policy: allkeys-lru` instead, which is a global limit across all
keys rather than a per-application cap.
- On **minikube**, run `minikube tunnel` in a separate terminal to get an
external IP for the LoadBalancer service.
- On **kind**, switch to `NodePort` (see the comment in `k8s/plano.yaml`).
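The `allkeys-lru` behaviour noted above can be modeled with a small cache. This is an assumed simplification of Redis' approximate LRU: a single global capacity, with the least-recently-used key evicted first (`LRUStore` is an illustrative name, not a Redis API).

```python
from collections import OrderedDict

class LRUStore:
    """Toy model of maxmemory-policy allkeys-lru: global cap, evict LRU key."""
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # touching a key makes it MRU
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # drop the least-recently-used key

store = LRUStore(max_entries=2)
store.set("session-A", "gpt-5.2")
store.set("session-B", "gpt-4o-mini")
store.get("session-A")                       # refresh A: B is now the LRU
store.set("session-C", "gpt-5.2")            # over capacity: evicts B
assert store.get("session-B") is None
assert store.get("session-A") == "gpt-5.2"
```

Unlike a per-application `session_max_entries` cap, this limit counts every key in the Redis instance, so any co-hosted data competes with session pins for the same budget.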

View file

@ -1,47 +0,0 @@
#!/usr/bin/env bash
# build-and-push.sh — Build the Plano demo image and push it to your registry.
#
# Usage:
# ./build-and-push.sh <registry/image:tag>
#
# Example:
# ./build-and-push.sh ghcr.io/yourorg/plano-redis:latest
# ./build-and-push.sh docker.io/youruser/plano-redis:0.4.17
#
# The build context is the repository root. Run this script from anywhere —
# it resolves the repo root automatically.
set -euo pipefail
IMAGE="${1:-}"
if [ -z "$IMAGE" ]; then
    echo "Usage: $0 <registry/image:tag>"
    echo ""
    echo "Example:"
    echo "  $0 ghcr.io/yourorg/plano-redis:latest"
    exit 1
fi

REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../.." && pwd)"
DOCKERFILE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/Dockerfile"

echo "Repository root : $REPO_ROOT"
echo "Dockerfile      : $DOCKERFILE"
echo "Image           : $IMAGE"
echo ""
echo "==> Building image (this takes a few minutes — Rust compile from scratch)..."
docker build \
    --file "$DOCKERFILE" \
    --tag "$IMAGE" \
    --progress=plain \
    "$REPO_ROOT"
echo "==> Pushing $IMAGE..."
docker push "$IMAGE"
echo ""
echo "Done. Update k8s/plano.yaml:"
echo " image: $IMAGE"
echo ""
echo "Then deploy with: ./deploy.sh"


@ -1,38 +0,0 @@
version: v0.4.0

listeners:
  - type: model
    name: model_listener
    port: 12000

model_providers:
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    default: true
  - model: openai/gpt-5.2
    access_key: $OPENAI_API_KEY

routing_preferences:
  - name: fast_responses
    description: short factual questions, quick lookups, simple summarization, or greetings
    models:
      - openai/gpt-4o-mini
  - name: deep_reasoning
    description: multi-step reasoning, complex analysis, code review, or detailed explanations
    models:
      - openai/gpt-5.2
      - openai/gpt-4o-mini

# Redis is reachable inside the cluster via the service name.
routing:
  session_ttl_seconds: 600
  session_cache:
    type: redis
    url: redis://redis.plano-demo.svc.cluster.local:6379

tracing:
  random_sampling: 100
  trace_arch_internal: true
  opentracing_grpc_endpoint: http://jaeger.plano-demo.svc.cluster.local:4317


@ -1,136 +0,0 @@
#!/usr/bin/env bash
# deploy.sh — Apply all Kubernetes manifests in the correct order.
#
# Usage:
# ./deploy.sh # deploy everything
# ./deploy.sh --destroy # tear down everything (keeps namespace for safety)
# ./deploy.sh --status # show pod and service status
set -euo pipefail
DEMO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
K8S_DIR="$DEMO_DIR/k8s"
NS="plano-demo"
check_prereqs() {
    local missing=()
    command -v kubectl >/dev/null 2>&1 || missing+=("kubectl")
    if [ ${#missing[@]} -gt 0 ]; then
        echo "ERROR: missing required tools: ${missing[*]}"
        exit 1
    fi
    if ! kubectl cluster-info &>/dev/null; then
        echo "ERROR: no Kubernetes cluster reachable. Start minikube/kind or configure kubeconfig."
        exit 1
    fi
}

create_secret() {
    if kubectl get secret plano-secrets -n "$NS" &>/dev/null; then
        echo "    Secret plano-secrets already exists, skipping."
        return
    fi
    local openai_api_key="${OPENAI_API_KEY:-}"
    if [ -z "$openai_api_key" ]; then
        echo ""
        echo "No 'plano-secrets' secret found in namespace '$NS'."
        echo "Enter API keys (input is hidden):"
        echo ""
        read -r -s -p "  OPENAI_API_KEY: " openai_api_key
        echo ""
    else
        echo "    Using OPENAI_API_KEY from environment."
    fi
    if [ -z "$openai_api_key" ]; then
        echo "ERROR: OPENAI_API_KEY cannot be empty."
        exit 1
    fi
    kubectl create secret generic plano-secrets \
        --from-literal=OPENAI_API_KEY="$openai_api_key" \
        -n "$NS"
    echo "    Secret created."
}

deploy() {
    echo "==> Applying namespace..."
    kubectl apply -f "$K8S_DIR/namespace.yaml"
    echo "==> Creating API key secret..."
    create_secret
    echo "==> Applying Redis (StatefulSet + Services)..."
    kubectl apply -f "$K8S_DIR/redis.yaml"
    echo "==> Applying Jaeger..."
    kubectl apply -f "$K8S_DIR/jaeger.yaml"
    echo "==> Applying Plano config (ConfigMap)..."
    kubectl apply -f "$K8S_DIR/plano-config.yaml"
    echo "==> Applying Plano deployment + HPA..."
    kubectl apply -f "$K8S_DIR/plano.yaml"
    echo ""
    echo "==> Waiting for Redis to be ready..."
    kubectl rollout status statefulset/redis -n "$NS" --timeout=120s
    echo "==> Waiting for Plano pods to be ready..."
    kubectl rollout status deployment/plano -n "$NS" --timeout=120s
    echo ""
    echo "Deployment complete!"
    show_status
    echo ""
    echo "Useful commands:"
    echo "  # Tail logs from all Plano pods:"
    echo "  kubectl logs -l app=plano -n $NS -f"
    echo ""
    echo "  # Open Jaeger UI:"
    echo "  kubectl port-forward svc/jaeger 16686:16686 -n $NS &"
    echo "  open http://localhost:16686"
    echo ""
    echo "  # Access Redis CLI:"
    echo "  kubectl exec -it redis-0 -n $NS -- redis-cli"
    echo ""
    echo "  # Run the verification script:"
    echo "  python $DEMO_DIR/verify_affinity.py"
}

destroy() {
    echo "==> Deleting Plano, Jaeger, and Redis resources..."
    kubectl delete -f "$K8S_DIR/plano.yaml" --ignore-not-found
    kubectl delete -f "$K8S_DIR/jaeger.yaml" --ignore-not-found
    kubectl delete -f "$K8S_DIR/redis.yaml" --ignore-not-found
    kubectl delete -f "$K8S_DIR/plano-config.yaml" --ignore-not-found
    kubectl delete secret plano-secrets -n "$NS" --ignore-not-found
    echo ""
    echo "Resources deleted."
    echo "Namespace '$NS' was kept. Remove it manually if desired:"
    echo "  kubectl delete namespace $NS"
}

show_status() {
    echo ""
    echo "=== Pods ==="
    kubectl get pods -n "$NS" -o wide
    echo ""
    echo "=== Services ==="
    kubectl get svc -n "$NS"
    echo ""
    echo "=== HPA ==="
    kubectl get hpa -n "$NS" 2>/dev/null || true
}

check_prereqs
case "${1:-}" in
    --destroy) destroy ;;
    --status)  show_status ;;
    *)         deploy ;;
esac

@@ -1,56 +0,0 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: jaeger
namespace: plano-demo
labels:
app: jaeger
spec:
replicas: 1
selector:
matchLabels:
app: jaeger
template:
metadata:
labels:
app: jaeger
spec:
containers:
- name: jaeger
image: jaegertracing/jaeger:2.3.0
ports:
- containerPort: 16686 # UI
- containerPort: 4317 # OTLP gRPC
- containerPort: 4318 # OTLP HTTP
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
name: jaeger
namespace: plano-demo
labels:
app: jaeger
spec:
selector:
app: jaeger
ports:
- name: ui
port: 16686
targetPort: 16686
- name: otlp-grpc
port: 4317
targetPort: 4317
- name: otlp-http
port: 4318
targetPort: 4318
---
# UI access from your laptop (no NodePort required):
#   kubectl port-forward svc/jaeger 16686:16686 -n plano-demo
# then open http://localhost:16686

@@ -1,6 +0,0 @@
apiVersion: v1
kind: Namespace
metadata:
name: plano-demo
labels:
app.kubernetes.io/part-of: plano-session-affinity-demo

@@ -1,50 +0,0 @@
---
# ConfigMap wrapping the Plano config file.
# Regenerate after editing config_k8s.yaml:
# kubectl create configmap plano-config \
# --from-file=plano_config.yaml=../config_k8s.yaml \
# -n plano-demo --dry-run=client -o yaml | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: plano-config
namespace: plano-demo
data:
plano_config.yaml: |
version: v0.4.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
default: true
- model: openai/gpt-5.2
access_key: $OPENAI_API_KEY
routing_preferences:
- name: fast_responses
description: short factual questions, quick lookups, simple summarization, or greetings
models:
- openai/gpt-4o-mini
- name: deep_reasoning
description: multi-step reasoning, complex analysis, code review, or detailed explanations
models:
- openai/gpt-5.2
- openai/gpt-4o-mini
routing:
session_ttl_seconds: 600
session_cache:
type: redis
url: redis://redis.plano-demo.svc.cluster.local:6379
tracing:
random_sampling: 100
trace_arch_internal: true
opentracing_grpc_endpoint: http://jaeger.plano-demo.svc.cluster.local:4317

@@ -1,19 +0,0 @@
# EXAMPLE — do NOT apply this file directly.
# Create the real secret with:
#
# kubectl create secret generic plano-secrets \
# --from-literal=OPENAI_API_KEY=sk-... \
# -n plano-demo
#
# Or use the deploy.sh script, which prompts for keys and creates the secret.
#
# If you use a secrets manager (AWS Secrets Manager, GCP Secret Manager, Vault)
# replace this with an ExternalSecret or a CSI driver volume mount instead.
apiVersion: v1
kind: Secret
metadata:
name: plano-secrets
namespace: plano-demo
type: Opaque
stringData:
OPENAI_API_KEY: "sk-replace-me"

@@ -1,130 +0,0 @@
---
# Plano Deployment — 2 replicas sharing one Redis instance.
# All replicas are stateless; routing state lives entirely in Redis.
apiVersion: apps/v1
kind: Deployment
metadata:
name: plano
namespace: plano-demo
labels:
app: plano
spec:
replicas: 2
selector:
matchLabels:
app: plano
template:
metadata:
labels:
app: plano
spec:
containers:
- name: plano
# Local dev: run-local.sh sets this to plano-redis:local and loads it
# into minikube/kind so no registry is needed.
# Production: replace with your registry image and use imagePullPolicy: Always.
image: plano-redis:local
imagePullPolicy: IfNotPresent
ports:
- containerPort: 12000
name: llm-gateway
envFrom:
- secretRef:
name: plano-secrets
env:
- name: LOG_LEVEL
value: "info"
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
volumeMounts:
- name: plano-config
mountPath: /app/plano_config.yaml
subPath: plano_config.yaml
readOnly: true
readinessProbe:
httpGet:
path: /healthz
port: 12000
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 3
livenessProbe:
httpGet:
path: /healthz
port: 12000
initialDelaySeconds: 15
periodSeconds: 30
failureThreshold: 3
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "1000m"
volumes:
- name: plano-config
configMap:
name: plano-config
---
# LoadBalancer Service — exposes Plano externally.
# On minikube, run: minikube tunnel
# On kind, use NodePort instead (see comment below).
# On cloud providers (GKE, EKS, AKS), an external IP is assigned automatically.
apiVersion: v1
kind: Service
metadata:
name: plano
namespace: plano-demo
labels:
app: plano
spec:
type: LoadBalancer
selector:
app: plano
ports:
- name: llm-gateway
port: 12000
targetPort: 12000
---
# Uncomment and use instead of the LoadBalancer above when running on kind/minikube
# without tunnel:
#
# apiVersion: v1
# kind: Service
# metadata:
# name: plano
# namespace: plano-demo
# spec:
# type: NodePort
# selector:
# app: plano
# ports:
# - name: llm-gateway
# port: 12000
# targetPort: 12000
# nodePort: 32000
---
# HorizontalPodAutoscaler — scales 2 to 5 replicas based on CPU.
# Demonstrates that new replicas join the existing session state seamlessly.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: plano
namespace: plano-demo
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: plano
minReplicas: 2
maxReplicas: 5
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70

@@ -1,96 +0,0 @@
---
# Redis StatefulSet — single-shard, persistence enabled.
# For production, replace with Redis Cluster or a managed service (ElastiCache, MemoryStore, etc.).
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: redis
namespace: plano-demo
labels:
app: redis
spec:
serviceName: redis
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7-alpine
ports:
- containerPort: 6379
name: redis
command:
- redis-server
- --appendonly
- "yes"
- --maxmemory
- "256mb"
- --maxmemory-policy
- allkeys-lru
readinessProbe:
exec:
command: ["redis-cli", "ping"]
initialDelaySeconds: 5
periodSeconds: 5
livenessProbe:
exec:
command: ["redis-cli", "ping"]
initialDelaySeconds: 15
periodSeconds: 20
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "320Mi"
cpu: "500m"
volumeMounts:
- name: redis-data
mountPath: /data
volumeClaimTemplates:
- metadata:
name: redis-data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 1Gi
---
# Stable DNS name: redis.plano-demo.svc.cluster.local:6379
apiVersion: v1
kind: Service
metadata:
name: redis
namespace: plano-demo
labels:
app: redis
spec:
selector:
app: redis
ports:
- name: redis
port: 6379
targetPort: 6379
clusterIP: None # headless — StatefulSet pods get stable DNS
---
# Regular ClusterIP for application code (redis://redis:6379)
apiVersion: v1
kind: Service
metadata:
name: redis-service
namespace: plano-demo
labels:
app: redis
spec:
selector:
app: redis
ports:
- name: redis
port: 6379
targetPort: 6379

@@ -1,154 +0,0 @@
#!/usr/bin/env bash
# run-local.sh — Build and run the k8s session affinity demo entirely locally with kind.
# No registry, no image push required.
#
# Usage:
# ./run-local.sh # create cluster (if needed), build, deploy, verify
# ./run-local.sh --build-only # build and load the image into kind
# ./run-local.sh --deploy-only # skip build, re-apply k8s manifests
# ./run-local.sh --verify # run verify_affinity.py against the running cluster
# ./run-local.sh --down # tear down k8s resources (keeps kind cluster)
# ./run-local.sh --delete-cluster # also delete the kind cluster
set -euo pipefail
DEMO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$DEMO_DIR/../../.." && pwd)"
IMAGE_NAME="plano-redis:local"
KIND_CLUSTER="plano-demo"
# ---------------------------------------------------------------------------
# Prereq check
# ---------------------------------------------------------------------------
check_prereqs() {
local missing=()
command -v docker >/dev/null 2>&1 || missing+=("docker")
command -v kubectl >/dev/null 2>&1 || missing+=("kubectl")
command -v kind >/dev/null 2>&1 || missing+=("kind (https://kind.sigs.k8s.io/docs/user/quick-start/#installation)")
command -v python3 >/dev/null 2>&1 || missing+=("python3")
if [ ${#missing[@]} -gt 0 ]; then
echo "ERROR: missing required tools:"
for t in "${missing[@]}"; do echo " - $t"; done
exit 1
fi
}
load_env() {
if [ -f "$DEMO_DIR/.env" ]; then
set -a
# shellcheck disable=SC1091
source "$DEMO_DIR/.env"
set +a
fi
}
# ---------------------------------------------------------------------------
# Cluster lifecycle
# ---------------------------------------------------------------------------
ensure_cluster() {
if kind get clusters 2>/dev/null | grep -q "^${KIND_CLUSTER}$"; then
echo "==> kind cluster '$KIND_CLUSTER' already exists, reusing."
else
echo "==> Creating kind cluster '$KIND_CLUSTER'..."
kind create cluster --name "$KIND_CLUSTER"
echo " Cluster created."
fi
# Point kubectl at this cluster
kubectl config use-context "kind-${KIND_CLUSTER}" >/dev/null
}
# ---------------------------------------------------------------------------
# Build and load
# ---------------------------------------------------------------------------
build() {
echo "==> Building image '$IMAGE_NAME' from repo root..."
docker build \
--file "$DEMO_DIR/Dockerfile" \
--tag "$IMAGE_NAME" \
--progress=plain \
"$REPO_ROOT"
echo "==> Loading '$IMAGE_NAME' into kind cluster '$KIND_CLUSTER'..."
kind load docker-image "$IMAGE_NAME" --name "$KIND_CLUSTER"
echo " Image loaded."
}
# ---------------------------------------------------------------------------
# Deploy / verify / teardown
# ---------------------------------------------------------------------------
deploy() {
echo ""
echo "==> Deploying to Kubernetes..."
"$DEMO_DIR/deploy.sh"
}
verify() {
echo ""
echo "==> Running cross-replica verification..."
python3 "$DEMO_DIR/verify_affinity.py"
}
down() {
"$DEMO_DIR/deploy.sh" --destroy
}
delete_cluster() {
echo "==> Deleting kind cluster '$KIND_CLUSTER'..."
kind delete cluster --name "$KIND_CLUSTER"
echo " Cluster deleted."
}
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
case "${1:-}" in
--build-only)
check_prereqs
load_env
ensure_cluster
build
;;
--deploy-only)
check_prereqs
load_env
ensure_cluster
deploy
;;
--verify)
check_prereqs
verify
;;
--down)
check_prereqs
down
;;
--delete-cluster)
check_prereqs
down
delete_cluster
;;
"")
check_prereqs
load_env
ensure_cluster
echo ""
build
deploy
echo ""
echo "==> Everything is up. Running verification in 5 seconds..."
echo " (Ctrl-C to skip — run manually with: ./run-local.sh --verify)"
sleep 5
verify
;;
*)
echo "Usage: $0 [--build-only | --deploy-only | --verify | --down | --delete-cluster]"
exit 1
;;
esac

@@ -1,418 +0,0 @@
#!/usr/bin/env python3
"""
verify_affinity.py — Prove that Redis-backed session affinity works across Plano replicas.
Strategy
--------
Kubernetes round-robin is non-deterministic, so simply hammering the LoadBalancer
service is not a reliable proof. Instead this script:
1. Discovers the two (or more) Plano pod names with kubectl.
2. Opens a kubectl port-forward tunnel to EACH pod on a separate local port.
3. Pins a session via Pod 0 (writes the Redis key).
4. Reads the same session via Pod 1 (must return the same model, since it reads the pin from Redis).
5. Repeats across N sessions, round-robining which pod sets vs. reads the pin.
If every round returns the same model, Redis is the shared source of truth and
multi-replica affinity is proven.
Usage
-----
# From inside the cluster network (e.g. CI job or jumpbox):
python verify_affinity.py --url http://<LoadBalancer-IP>:12000
# From your laptop (uses kubectl port-forward automatically):
python verify_affinity.py
# More sessions / rounds:
python verify_affinity.py --sessions 5 --rounds 6
Requirements
------------
- kubectl configured to reach the plano-demo namespace
- Python 3.11+
"""
import argparse
import http.client
import json
import signal
import subprocess
import sys
import time
import urllib.error
import urllib.request
from contextlib import contextmanager
NAMESPACE = "plano-demo"
BASE_LOCAL_PORT = 19100 # port-forward starts here, increments per pod
PROMPTS = [
"Explain the difference between TCP and UDP in detail.",
"Write a merge sort implementation in Python.",
"What is quantum entanglement?",
"Describe the CAP theorem with examples.",
"How does gradient descent work in neural networks?",
"What is the time complexity of Dijkstra's algorithm?",
]
# ---------------------------------------------------------------------------
# kubectl helpers
# ---------------------------------------------------------------------------
def get_pod_names() -> list[str]:
"""Return running Plano pod names in the plano-demo namespace."""
result = subprocess.run(
[
"kubectl",
"get",
"pods",
"-n",
NAMESPACE,
"-l",
"app=plano",
"--field-selector=status.phase=Running",
"-o",
"jsonpath={.items[*].metadata.name}",
],
capture_output=True,
text=True,
check=True,
)
pods = result.stdout.strip().split()
if not pods or pods == [""]:
raise RuntimeError(
f"No running Plano pods found in namespace '{NAMESPACE}'.\n"
"Is the cluster deployed? Run: ./deploy.sh"
)
return pods
@contextmanager
def port_forward(pod_name: str, local_port: int, remote_port: int = 12000):
"""Context manager that starts and stops a kubectl port-forward."""
proc = subprocess.Popen(
[
"kubectl",
"port-forward",
f"pod/{pod_name}",
f"{local_port}:{remote_port}",
"-n",
NAMESPACE,
],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
# Give the tunnel a moment to establish
time.sleep(1.5)
try:
yield f"http://localhost:{local_port}"
finally:
proc.send_signal(signal.SIGTERM)
try:
proc.wait(timeout=3)
except subprocess.TimeoutExpired:
proc.kill()
# ---------------------------------------------------------------------------
# HTTP helpers
# ---------------------------------------------------------------------------
def chat(
base_url: str,
session_id: str | None,
message: str,
model: str = "openai/gpt-4o-mini",
retries: int = 3,
retry_delay: float = 5.0,
) -> dict:
payload = json.dumps(
{
"model": model,
"messages": [{"role": "user", "content": message}],
}
).encode()
headers = {"Content-Type": "application/json"}
if session_id:
headers["x-model-affinity"] = session_id
req = urllib.request.Request(
f"{base_url}/v1/chat/completions",
data=payload,
headers=headers,
method="POST",
)
last_err: Exception | None = None
for attempt in range(retries):
try:
with urllib.request.urlopen(req, timeout=60) as resp:
body = resp.read()
if not body:
raise RuntimeError(f"Empty response body from {base_url}")
return json.loads(body)
except urllib.error.HTTPError as e:
if e.code in (503, 502, 429) and attempt < retries - 1:
time.sleep(retry_delay * (attempt + 1))
last_err = e
continue
raise RuntimeError(f"Request to {base_url} failed: {e}") from e
except (
urllib.error.URLError,
http.client.RemoteDisconnected,
RuntimeError,
) as e:
if attempt < retries - 1:
time.sleep(retry_delay * (attempt + 1))
last_err = e
continue
raise RuntimeError(f"Request to {base_url} failed: {e}") from e
except json.JSONDecodeError as e:
raise RuntimeError(f"Invalid JSON from {base_url}: {e}") from e
raise RuntimeError(
f"Request to {base_url} failed after {retries} attempts: {last_err}"
)
def extract_model(response: dict) -> str:
return response.get("model", "<unknown>")
# ---------------------------------------------------------------------------
# Verification phases
# ---------------------------------------------------------------------------
def phase_loadbalancer(url: str, rounds: int) -> None:
"""Phase 0: quick smoke test against the LoadBalancer / provided URL."""
print("=" * 66)
print(f"Phase 0: Smoke test against {url}")
print("=" * 66)
for i in range(rounds):
resp = chat(url, None, PROMPTS[i % len(PROMPTS)])
print(f" Request {i + 1}: model = {extract_model(resp)}")
print()
def phase_cross_replica(
pod_urls: dict[str, str], num_sessions: int, rounds: int
) -> bool:
"""
Phase 1: Cross-replica pinning.
For each session:
Round 1: send to pod_A (sets the Redis key)
Rounds 2+: alternate between pod_A and pod_B
Assert every round returns the same model.
"""
pod_names = list(pod_urls.keys())
all_passed = True
session_results: dict[str, dict] = {}
print("=" * 66)
print("Phase 1: Cross-replica session pinning")
print(f" Pods under test : {', '.join(pod_names)}")
print(f" Sessions : {num_sessions}")
print(f" Rounds/session : {rounds}")
print()
print(" Each session is PINNED via one pod and VERIFIED via another.")
print(" If Redis is shared, every round must return the same model.")
print("=" * 66)
for s in range(num_sessions):
session_id = f"k8s-session-{s + 1:03d}"
models_seen = []
pod_sequence = []
for r in range(rounds):
# Alternate which pod handles each round
pod_name = pod_names[r % len(pod_names)]
url = pod_urls[pod_name]
try:
resp = chat(url, session_id, PROMPTS[(s + r) % len(PROMPTS)])
model = extract_model(resp)
except RuntimeError as e:
print(f" ERROR on {pod_name} round {r + 1}: {e}")
all_passed = False
continue
models_seen.append(model)
pod_sequence.append(pod_name)
unique_models = set(models_seen)
passed = len(unique_models) == 1
session_results[session_id] = {
"passed": passed,
"model": models_seen[0] if models_seen else "<none>",
"unique_models": unique_models,
"pod_sequence": pod_sequence,
}
status = "PASS" if passed else "FAIL"
detail = models_seen[0] if passed else str(unique_models)
print(f"\n {status} {session_id}")
print(f" model : {detail}")
print(f"    pod order  : {' → '.join(pod_sequence)}")
if not passed:
all_passed = False
return all_passed
def phase_redis_inspect(num_sessions: int) -> None:
"""Phase 2: read keys directly from Redis to show what's stored."""
print()
print("=" * 66)
print("Phase 2: Redis key inspection")
print("=" * 66)
for s in range(num_sessions):
session_id = f"k8s-session-{s + 1:03d}"
result = subprocess.run(
[
"kubectl",
"exec",
"-n",
NAMESPACE,
"redis-0",
"--",
"redis-cli",
"GET",
session_id,
],
capture_output=True,
text=True,
)
raw = result.stdout.strip()
ttl_result = subprocess.run(
[
"kubectl",
"exec",
"-n",
NAMESPACE,
"redis-0",
"--",
"redis-cli",
"TTL",
session_id,
],
capture_output=True,
text=True,
)
ttl = ttl_result.stdout.strip()
if raw and raw != "(nil)":
try:
data = json.loads(raw)
print(f" {session_id}")
print(f" model_name : {data.get('model_name', '?')}")
print(f" route_name : {data.get('route_name', 'null')}")
print(f" TTL : {ttl}s remaining")
except json.JSONDecodeError:
print(f" {session_id}: (raw) {raw}")
else:
print(f" {session_id}: key not found or expired")
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main() -> None:
parser = argparse.ArgumentParser(
description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
)
parser.add_argument(
"--url",
default=None,
help="LoadBalancer URL to use instead of per-pod port-forwards. "
"When set, cross-replica proof is skipped (no pod targeting).",
)
parser.add_argument(
"--sessions", type=int, default=4, help="Number of sessions (default 4)"
)
parser.add_argument(
"--rounds", type=int, default=4, help="Rounds per session (default 4)"
)
parser.add_argument(
"--skip-redis-inspect", action="store_true", help="Skip Redis key inspection"
)
args = parser.parse_args()
if args.url:
# Simple mode: hit the LoadBalancer directly
print(f"Mode: LoadBalancer ({args.url})")
print()
phase_loadbalancer(args.url, args.rounds)
print("To get the full cross-replica proof, run without --url.")
sys.exit(0)
# Full mode: port-forward to each pod individually
print("Mode: per-pod port-forward (full cross-replica proof)")
print()
try:
pod_names = get_pod_names()
except (subprocess.CalledProcessError, RuntimeError) as e:
print(f"ERROR: {e}", file=sys.stderr)
sys.exit(1)
if len(pod_names) < 2:
print(f"WARNING: only {len(pod_names)} Plano pod(s) running.")
print(" For a true cross-replica test you need at least 2.")
print(" Scale up: kubectl scale deployment/plano --replicas=2 -n plano-demo")
print()
print(f"Found {len(pod_names)} Plano pod(s): {', '.join(pod_names)}")
print("Opening per-pod port-forward tunnels...")
print()
pod_urls: dict[str, str] = {}
contexts = []
for i, pod in enumerate(pod_names):
local_port = BASE_LOCAL_PORT + i
ctx = port_forward(pod, local_port)
url = ctx.__enter__()
pod_urls[pod] = url
contexts.append((ctx, url))
print(f" {pod} → localhost:{local_port}")
print()
try:
passed = phase_cross_replica(pod_urls, args.sessions, args.rounds)
if not args.skip_redis_inspect:
phase_redis_inspect(args.sessions)
print()
print("=" * 66)
print("Summary")
print("=" * 66)
if passed:
print("All sessions were pinned consistently across replicas.")
print("Redis session cache is working correctly in Kubernetes.")
else:
print("One or more sessions were NOT consistent across replicas.")
print("Check Plano logs: kubectl logs -l app=plano -n plano-demo")
finally:
for ctx, _ in contexts:
try:
ctx.__exit__(None, None, None)
except Exception:
pass
sys.exit(0 if passed else 1)
if __name__ == "__main__":
main()