4.8 KiB
Session Pinning Demo
Consistent model selection for agentic loops using
X-Session-Id.
Why Session Pinning?
When an agent runs in a loop — research → plan → implement → review → refine — each iteration hits Plano's router independently. Since the prompts vary in intent, the router may select different models for each step, breaking consistency mid-workflow.
Session pinning solves this: send an X-Session-Id header and the first request runs routing as usual, caching the decision. Every subsequent request with the same session ID returns the same model instantly ("pinned": true), without re-running the router.
Without pinning With pinning (X-Session-Id)
───────────────── ───────────────────────────
Step 1 → Claude (code_generation) Step 1 → Claude (code_generation) ← routed
Step 2 → GPT-4o (complex_reasoning) Step 2 → Claude (pinned ✓)
Step 3 → Claude (code_generation) Step 3 → Claude (pinned ✓)
Step 4 → GPT-4o (complex_reasoning) Step 4 → Claude (pinned ✓)
Step 5 → Claude (code_generation) Step 5 → Claude (pinned ✓)
↑ model switches every step ↑ one model, start to finish
Quick Start
# 1. Set API keys
export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
# 2. Start Plano
cd demos/llm_routing/session_pinning
planoai up config.yaml
# 3. Run the demo
./demo.sh # or: python3 demo.py
What the Demo Does
The script simulates an agent building a task management app in 5 iterative steps, deliberately mixing intents:
| Step | Prompt | Intent |
|---|---|---|
| 1 | Design a REST API schema for a task management app… | code generation |
| 2 | Analyze SQL vs NoSQL trade-offs for this system… | complex reasoning |
| 3 | Write the SQLAlchemy database models… | code generation |
| 4 | Review the API design for security vulnerabilities… | complex reasoning |
| 5 | Implement JWT authentication middleware… | code generation |
It runs this loop twice against the /routing/v1/chat/completions endpoint (routing decisions only — no actual LLM calls):
- Without pinning — no
X-Session-Idheader; models switch between steps - With pinning —
X-Session-Idheader included; the model selected in step 1 is reused for all 5 steps
Expected Output
══════════════════════════════════════════════════════════════════
Run 1: WITHOUT Session Pinning
──────────────────────────────────────────────────────────────────
Step 1: Design a REST API schema… → anthropic/claude-sonnet-4-20250514
Step 2: Analyze SQL vs NoSQL… → openai/gpt-4o
Step 3: Write SQLAlchemy models… → anthropic/claude-sonnet-4-20250514
Step 4: Review API for security… → openai/gpt-4o
Step 5: Implement JWT auth… → anthropic/claude-sonnet-4-20250514
✗ Models varied: anthropic/claude-sonnet-4-20250514, openai/gpt-4o
══════════════════════════════════════════════════════════════════
Run 2: WITH Session Pinning (X-Session-Id: a1b2c3d4-…)
──────────────────────────────────────────────────────────────────
Step 1: Design a REST API schema… → anthropic/claude-sonnet-4-20250514 (pinned=false)
Step 2: Analyze SQL vs NoSQL… → anthropic/claude-sonnet-4-20250514 (pinned=true)
Step 3: Write SQLAlchemy models… → anthropic/claude-sonnet-4-20250514 (pinned=true)
Step 4: Review API for security… → anthropic/claude-sonnet-4-20250514 (pinned=true)
Step 5: Implement JWT auth… → anthropic/claude-sonnet-4-20250514 (pinned=true)
✓ All 5 steps routed to anthropic/claude-sonnet-4-20250514
Configuration
Session pinning is configurable in config.yaml:
routing:
session_ttl_seconds: 600 # How long a pinned session lasts (default: 10 min)
session_max_entries: 10000 # Max cached sessions before LRU eviction
Without the X-Session-Id header, routing runs fresh every time — no breaking change to existing clients.
See Also
- Model Routing Service Demo — curl-based examples of the routing endpoint and session pinning