rename session pinning to model affinity with x-model-affinity header

This commit is contained in:
Adil Hafeez 2026-04-08 15:23:53 -07:00
parent 5789694d2f
commit da9792c2dd
14 changed files with 468 additions and 371 deletions


@@ -0,0 +1,135 @@
# Model Affinity Demo
> Consistent model selection for agentic loops using `X-Model-Affinity`.
## Why Model Affinity?
When an agent runs in a loop — calling tools, reasoning about results, calling more tools — each LLM request hits Plano's router independently. Because prompts vary in intent (tool selection looks like code generation, reasoning about results looks like complex analysis), the router may select **different models** for each turn, fragmenting context mid-session.
**Model affinity** solves this: send an `X-Model-Affinity` header and the first request runs routing as usual, caching the decision. Every subsequent request with the same affinity ID returns the **same model**, without re-running the router.
```
Without affinity With affinity (X-Model-Affinity)
──────────────── ───────────────────────────────
Turn 1 → claude-sonnet (tool calls) Turn 1 → claude-sonnet ← routed
Turn 2 → gpt-4o (reasoning) Turn 2 → claude-sonnet ← pinned ✓
Turn 3 → claude-sonnet (tool calls) Turn 3 → claude-sonnet ← pinned ✓
Turn 4 → gpt-4o (reasoning) Turn 4 → claude-sonnet ← pinned ✓
Turn 5 → claude-sonnet (final answer) Turn 5 → claude-sonnet ← pinned ✓
↑ model switches every turn ↑ one model, start to finish
```
---
## Quick Start
```bash
# 1. Set API keys
export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
# 2. Start Plano
cd demos/llm_routing/model_affinity
planoai up config.yaml
# 3. Run the demo (uv manages dependencies automatically)
./demo.sh # or: uv run demo.py
```
---
## What the Demo Does
A **database selection agent** investigates whether to use PostgreSQL or MongoDB
for an e-commerce platform. It runs a real tool-calling loop: the LLM decides
which tools to call, receives simulated results, and continues until it has
enough data to recommend a database.
Available tools:
- `get_db_benchmarks` — fetch performance data for a workload type
- `get_case_studies` — retrieve real-world e-commerce case studies
- `check_feature_support` — check if a database supports a specific feature
The demo runs the **same agent loop twice**:
1. **Without affinity** — no `X-Model-Affinity`; models may switch between turns
2. **With affinity** — `X-Model-Affinity` header included; the model is pinned from turn 1
Each turn is a separate `POST /v1/chat/completions` request to Plano using the
[OpenAI SDK](https://github.com/openai/openai-python). The demo prints the
model used on each turn so you can see the difference.
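Because each turn is an independent request, keeping the model stable comes down to sending the same header value on every call. A minimal sketch of that pattern (the `headers_for_turn` helper is illustrative, not part of the demo):

```python
import uuid

# One affinity ID for the whole session, generated before the loop starts
affinity_id = str(uuid.uuid4())

def headers_for_turn() -> dict[str, str]:
    # Every turn sends the same ID, so Plano keeps returning the pinned model
    return {"X-Model-Affinity": affinity_id}

# Four turns produce four identical header dicts
turn_headers = [headers_for_turn() for _ in range(4)]
assert all(h == {"X-Model-Affinity": affinity_id} for h in turn_headers)
```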
### Expected Output
```
Run 1: WITHOUT Model Affinity
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
turn 1 [claude-sonnet-4-20250514 ] get_db_benchmarks, get_db_benchmarks
turn 2 [gpt-4o ] get_case_studies, get_case_studies ← switched
turn 3 [claude-sonnet-4-20250514 ] check_feature_support ← switched
turn 4 [gpt-4o ] final answer ← switched
✗ Without affinity: model switched 3 time(s)
Run 2: WITH Model Affinity (X-Model-Affinity: a1b2c3d4…)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
turn 1 [claude-sonnet-4-20250514 ] get_db_benchmarks, get_db_benchmarks
turn 2 [claude-sonnet-4-20250514 ] get_case_studies, get_case_studies
turn 3 [claude-sonnet-4-20250514 ] check_feature_support
turn 4 [claude-sonnet-4-20250514 ] final answer
✓ With affinity: claude-sonnet-4-20250514 for all 4 turns
```
### How It Works
Model affinity is implemented in brightstaff. When `X-Model-Affinity` is present:
1. **First request** — routing runs normally, result is cached keyed by the affinity ID
2. **Subsequent requests** — cache hit skips routing and returns the cached model instantly
The `X-Model-Affinity` header is forwarded transparently; no changes to your OpenAI
SDK calls beyond adding the header.
```python
from openai import OpenAI
import uuid
client = OpenAI(base_url="http://localhost:12000/v1", api_key="EMPTY")
affinity_id = str(uuid.uuid4())
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": prompt}],
extra_headers={"X-Model-Affinity": affinity_id},
)
```
---
## Configuration
Model affinity is configurable in `config.yaml`:
```yaml
routing:
session_ttl_seconds: 600 # How long affinity lasts (default: 10 min)
session_max_entries: 10000 # Max cached sessions (upper limit: 10000)
```
Without the `X-Model-Affinity` header, routing runs fresh every time — no breaking
change to existing clients.
---
## Advanced: Agent Server Demo
The `agent.py` file is a FastAPI-based agent server that demonstrates a more
complex pattern: an external agent service that forwards `X-Model-Affinity`
on all outbound calls to Plano. Use `start_agents.sh` to run it.
## See Also
- [Model Routing Service Demo](../model_routing_service/) — curl-based examples of the routing endpoint


@@ -0,0 +1,429 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = ["fastapi>=0.115", "uvicorn>=0.30", "openai>=1.0.0"]
# ///
"""
Research Agent FastAPI service exposing /v1/chat/completions.
For each incoming request the agent runs 3 independent research tasks,
each with its own tool-calling loop. The tasks deliberately alternate between
code_generation and complex_reasoning intents so Plano's preference-based
router selects different models for each task.
If the client sends X-Model-Affinity, the agent forwards it on every outbound
call to Plano. The first task pins the model; all subsequent tasks skip the
router and reuse it, keeping the whole session on one consistent model.
Run standalone:
uv run agent.py
PLANO_URL=http://myhost:12000 AGENT_PORT=8000 uv run agent.py
"""
import json
import logging
import os
import uuid
import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from openai import AsyncOpenAI
from openai.types.chat import ChatCompletionMessageParam
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [AGENT] %(levelname)s %(message)s",
)
log = logging.getLogger(__name__)
PLANO_URL = os.environ.get("PLANO_URL", "http://localhost:12000")
PORT = int(os.environ.get("AGENT_PORT", "8000"))
# ---------------------------------------------------------------------------
# Tasks — each has its own conversation so Plano routes each independently.
# Intent alternates: code_generation → complex_reasoning → code_generation.
# ---------------------------------------------------------------------------
TASKS = [
{
"name": "generate_comparison",
# Triggers code_generation routing preference (write/generate output)
"prompt": (
"Use the tools to fetch benchmark data for PostgreSQL and MongoDB "
"under a mixed workload. Then generate a compact Markdown comparison "
"table with columns: metric, PostgreSQL, MongoDB. Cover read QPS, "
"write QPS, p99 latency ms, ACID support, and horizontal scaling."
),
},
{
"name": "analyse_tradeoffs",
# Triggers complex_reasoning routing preference (analyse/reason/evaluate)
"prompt": (
"Context from prior research:\n{context}\n\n"
"Perform a deep analysis: for a high-traffic e-commerce platform that "
"requires ACID guarantees for order processing but flexible schemas for "
"product attributes, carefully reason through and evaluate the long-term "
"architectural trade-offs of each database. Consider consistency "
"guarantees, operational complexity, and scalability risks."
),
},
{
"name": "write_schema",
# Triggers code_generation routing preference (write SQL / generate code)
"prompt": (
"Context from prior research:\n{context}\n\n"
"Write the CREATE TABLE SQL schema for the database you would recommend "
"from the analysis above. Include: orders, order_items, products, and "
"users tables with appropriate primary keys, foreign keys, and indexes."
),
},
]
SYSTEM_PROMPT = (
"You are a database selection analyst for an e-commerce platform. "
"Use the available tools when you need data. "
"Be concise — each response should be a compact table, code block, "
"or 3-5 clear sentences."
)
# ---------------------------------------------------------------------------
# Tool definitions
# ---------------------------------------------------------------------------
TOOLS = [
{
"type": "function",
"function": {
"name": "get_db_benchmarks",
"description": (
"Fetch performance benchmark data for a database. "
"Returns read/write throughput, latency, and scaling characteristics."
),
"parameters": {
"type": "object",
"properties": {
"database": {
"type": "string",
"enum": ["postgresql", "mongodb"],
},
"workload": {
"type": "string",
"enum": ["read_heavy", "write_heavy", "mixed"],
},
},
"required": ["database", "workload"],
},
},
},
{
"type": "function",
"function": {
"name": "get_case_studies",
"description": "Retrieve e-commerce case studies for a database.",
"parameters": {
"type": "object",
"properties": {
"database": {"type": "string", "enum": ["postgresql", "mongodb"]},
},
"required": ["database"],
},
},
},
{
"type": "function",
"function": {
"name": "check_feature_support",
"description": (
"Check whether a database supports a specific feature "
"(e.g. ACID transactions, horizontal sharding, JSON documents)."
),
"parameters": {
"type": "object",
"properties": {
"database": {"type": "string", "enum": ["postgresql", "mongodb"]},
"feature": {"type": "string"},
},
"required": ["database", "feature"],
},
},
},
]
# ---------------------------------------------------------------------------
# Tool implementations (simulated — no external calls)
# ---------------------------------------------------------------------------
_BENCHMARKS = {
("postgresql", "read_heavy"): {
"read_qps": 55_000,
"write_qps": 18_000,
"p99_ms": 4,
"notes": "Excellent for complex joins; connection pooling via pgBouncer recommended",
},
("postgresql", "write_heavy"): {
"read_qps": 30_000,
"write_qps": 24_000,
"p99_ms": 8,
"notes": "WAL overhead increases at very high write volume; partitioning helps",
},
("postgresql", "mixed"): {
"read_qps": 42_000,
"write_qps": 21_000,
"p99_ms": 6,
"notes": "Solid all-round; MVCC keeps reads non-blocking",
},
("mongodb", "read_heavy"): {
"read_qps": 85_000,
"write_qps": 30_000,
"p99_ms": 2,
"notes": "Atlas Search built-in; sharding distributes read load well",
},
("mongodb", "write_heavy"): {
"read_qps": 40_000,
"write_qps": 65_000,
"p99_ms": 3,
"notes": "WiredTiger compression reduces I/O; journal writes are async-safe",
},
("mongodb", "mixed"): {
"read_qps": 60_000,
"write_qps": 50_000,
"p99_ms": 3,
"notes": "Flexible schema accelerates feature iteration",
},
}
_CASE_STUDIES = {
"postgresql": [
{
"company": "Shopify",
"scale": "100 B+ req/day",
"notes": "Moved critical order tables back to Postgres for ACID guarantees",
},
{
"company": "Zalando",
"scale": "50 M customers",
"notes": "Uses Postgres + Citus for sharded order processing",
},
{
"company": "Instacart",
"scale": "10 M orders/mo",
"notes": "Postgres for inventory; strict consistency required for stock levels",
},
],
"mongodb": [
{
"company": "eBay",
"scale": "1.5 B listings",
"notes": "Product catalogue in MongoDB for flexible attribute schemas",
},
{
"company": "Alibaba",
"scale": "billions of docs",
"notes": "Session and cart data in MongoDB; high write throughput",
},
{
"company": "Foursquare",
"scale": "10 B+ check-ins",
"notes": "Geospatial queries and flexible location schemas",
},
],
}
_FEATURES = {
("postgresql", "acid transactions"): {
"supported": True,
"notes": "Full ACID with serialisable isolation",
},
("postgresql", "horizontal sharding"): {
"supported": True,
"notes": "Via Citus extension or manual partitioning; not native",
},
("postgresql", "json documents"): {
"supported": True,
"notes": "JSONB with indexing; flexible but slower than native doc store",
},
("postgresql", "full-text search"): {
"supported": True,
"notes": "Built-in tsvector/tsquery; Elasticsearch for advanced use cases",
},
("postgresql", "multi-document transactions"): {
"supported": True,
"notes": "Native cross-table ACID",
},
("mongodb", "acid transactions"): {
"supported": True,
"notes": "Multi-document ACID since v4.0; single-doc always atomic",
},
("mongodb", "horizontal sharding"): {
"supported": True,
"notes": "Native sharding; auto-balancing across shards",
},
("mongodb", "json documents"): {
"supported": True,
"notes": "Native BSON document model; schema-free by default",
},
("mongodb", "full-text search"): {
"supported": True,
"notes": "Atlas Search (Lucene-based) for advanced full-text",
},
("mongodb", "multi-document transactions"): {
"supported": True,
"notes": "Available but adds latency; best avoided on hot paths",
},
}
def _dispatch(name: str, args: dict) -> str:
if name == "get_db_benchmarks":
key = (args["database"].lower(), args["workload"].lower())
return json.dumps(_BENCHMARKS.get(key, {"error": f"no data for {key}"}))
if name == "get_case_studies":
db = args["database"].lower()
return json.dumps(_CASE_STUDIES.get(db, {"error": f"unknown db '{db}'"}))
if name == "check_feature_support":
key = (args["database"].lower(), args["feature"].lower())
for k, v in _FEATURES.items():
if k[0] == key[0] and k[1] in key[1]:
return json.dumps(v)
return json.dumps({"error": f"feature '{args['feature']}' not in dataset"})
return json.dumps({"error": f"unknown tool '{name}'"})
# ---------------------------------------------------------------------------
# Task runner — one independent conversation per task
# ---------------------------------------------------------------------------
async def run_task(
client: AsyncOpenAI,
task_name: str,
prompt: str,
session_id: str | None,
) -> tuple[str, str]:
"""
Run a single research task with its own tool-calling loop.
Each task is an independent conversation so the router sees only
this task's intent — not the accumulated context of previous tasks.
Model affinity via X-Model-Affinity pins the model from the first task
onward, so all tasks stay on the same model.
Returns (answer, first_model_used).
"""
headers = {"X-Model-Affinity": session_id} if session_id else {}
messages: list[ChatCompletionMessageParam] = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": prompt},
]
first_model: str | None = None
while True:
resp = await client.chat.completions.create(
model="gpt-4o-mini", # Plano's router overrides this via routing_preferences
messages=messages,
tools=TOOLS,
tool_choice="auto",
max_completion_tokens=600,
extra_headers=headers or None,
)
if first_model is None:
first_model = resp.model
log.info(
"task=%s model=%s finish=%s",
task_name,
resp.model,
resp.choices[0].finish_reason,
)
choice = resp.choices[0]
if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
messages.append(choice.message)
for tc in choice.message.tool_calls:
args = json.loads(tc.function.arguments or "{}")
result = _dispatch(tc.function.name, args)
log.info(" tool %s(%s)", tc.function.name, args)
messages.append(
{"role": "tool", "content": result, "tool_call_id": tc.id}
)
else:
return (choice.message.content or "").strip(), first_model or "unknown"
# ---------------------------------------------------------------------------
# Research loop — runs all tasks, threading context forward
# ---------------------------------------------------------------------------
async def run_research_loop(
client: AsyncOpenAI,
session_id: str | None,
) -> tuple[str, list[dict]]:
"""
Run all 3 research tasks in sequence, passing each task's output as
context to the next. Returns (final_answer, routing_trace).
"""
context = ""
trace: list[dict] = []
final_answer = ""
for task in TASKS:
prompt = task["prompt"].format(context=context)
answer, model = await run_task(client, task["name"], prompt, session_id)
trace.append({"task": task["name"], "model": model})
context += f"\n### {task['name']}\n{answer}\n"
final_answer = answer
return final_answer, trace
# ---------------------------------------------------------------------------
# FastAPI app
# ---------------------------------------------------------------------------
app = FastAPI(title="Research Agent", version="1.0.0")
@app.post("/v1/chat/completions")
async def chat(request: Request) -> JSONResponse:
body = await request.json()
session_id: str | None = request.headers.get("x-model-affinity")
log.info("request session_id=%s", session_id or "none")
client = AsyncOpenAI(base_url=f"{PLANO_URL}/v1", api_key="EMPTY")
answer, trace = await run_research_loop(client, session_id)
return JSONResponse(
{
"id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
"object": "chat.completion",
"choices": [
{
"index": 0,
"message": {"role": "assistant", "content": answer},
"finish_reason": "stop",
}
],
"routing_trace": trace,
"session_id": session_id,
}
)
@app.get("/health")
async def health() -> dict:
return {"status": "ok", "plano_url": PLANO_URL}
# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------
if __name__ == "__main__":
log.info("starting on port %d plano=%s", PORT, PLANO_URL)
uvicorn.run(app, host="0.0.0.0", port=PORT, log_level="warning")


@@ -0,0 +1,27 @@
version: v0.3.0
listeners:
- type: model
name: model_listener
port: 12000
model_providers:
- model: openai/gpt-4o-mini
access_key: $OPENAI_API_KEY
default: true
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
routing_preferences:
- name: complex_reasoning
description: complex reasoning tasks, multi-step analysis, or detailed explanations
- model: anthropic/claude-sonnet-4-20250514
access_key: $ANTHROPIC_API_KEY
routing_preferences:
- name: code_generation
description: generating new code, writing functions, or creating boilerplate
tracing:
random_sampling: 100


@@ -0,0 +1,307 @@
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = ["openai>=1.0.0"]
# ///
"""
Model Affinity Demo: Agentic Tool-Calling Loop
Runs the same agentic loop twice through Plano:
1. Without model affinity: the router may pick different models per turn
2. With model affinity: all turns use the model selected on turn 1
Each loop is a real tool-calling agent: the LLM decides which tools to call,
we provide simulated results, and the LLM continues until it has enough
information to produce a final answer. Each turn is a separate request to
Plano, so the router classifies intent independently every time.
Usage:
planoai up config.yaml # start Plano
uv run demo.py # run this demo
"""
import asyncio
import json
import os
import uuid
from openai import AsyncOpenAI
from openai.types.chat import ChatCompletionMessageParam
PLANO_URL = os.environ.get("PLANO_URL", "http://localhost:12000")
SYSTEM_PROMPT = (
"You are a database selection analyst. Use the provided tools to gather "
"benchmark data and case studies, then recommend PostgreSQL or MongoDB "
"for a high-traffic e-commerce backend. Be concise."
)
USER_QUERY = (
"Should we use PostgreSQL or MongoDB for our e-commerce platform? "
"We need strong consistency for orders but flexible schemas for products. "
"Use the tools to research both options, then give a recommendation."
)
TOOLS = [
{
"type": "function",
"function": {
"name": "get_db_benchmarks",
"description": "Fetch performance benchmarks for a database under a given workload.",
"parameters": {
"type": "object",
"properties": {
"database": {
"type": "string",
"enum": ["postgresql", "mongodb"],
},
"workload": {
"type": "string",
"enum": ["read_heavy", "write_heavy", "mixed"],
},
},
"required": ["database", "workload"],
},
},
},
{
"type": "function",
"function": {
"name": "get_case_studies",
"description": "Retrieve real-world e-commerce case studies for a database.",
"parameters": {
"type": "object",
"properties": {
"database": {
"type": "string",
"enum": ["postgresql", "mongodb"],
},
},
"required": ["database"],
},
},
},
{
"type": "function",
"function": {
"name": "check_feature_support",
"description": "Check if a database supports a specific feature.",
"parameters": {
"type": "object",
"properties": {
"database": {
"type": "string",
"enum": ["postgresql", "mongodb"],
},
"feature": {"type": "string"},
},
"required": ["database", "feature"],
},
},
},
]
# Simulated tool responses
_BENCHMARKS = {
("postgresql", "mixed"): {
"read_qps": 42000,
"write_qps": 21000,
"p99_ms": 6,
"notes": "Solid all-round; MVCC keeps reads non-blocking",
},
("mongodb", "mixed"): {
"read_qps": 60000,
"write_qps": 50000,
"p99_ms": 3,
"notes": "Flexible schema accelerates feature iteration",
},
}
_CASE_STUDIES = {
"postgresql": [
{"company": "Shopify", "notes": "Moved orders back to Postgres for ACID"},
{
"company": "Zalando",
"notes": "Postgres + Citus for sharded order processing",
},
],
"mongodb": [
{"company": "eBay", "notes": "Product catalogue — flexible attribute schemas"},
{"company": "Alibaba", "notes": "Session/cart data — high write throughput"},
],
}
_FEATURES = {
("postgresql", "acid transactions"): {"supported": True, "notes": "Full ACID"},
("mongodb", "acid transactions"): {
"supported": True,
"notes": "Multi-doc ACID since v4.0",
},
("postgresql", "horizontal sharding"): {
"supported": True,
"notes": "Via Citus extension",
},
("mongodb", "horizontal sharding"): {
"supported": True,
"notes": "Native auto-balancing",
},
}
def dispatch_tool(name: str, args: dict) -> str:
if name == "get_db_benchmarks":
key = (args["database"], args["workload"])
return json.dumps(_BENCHMARKS.get(key, {"error": f"no data for {key}"}))
if name == "get_case_studies":
return json.dumps(_CASE_STUDIES.get(args["database"], {"error": "unknown db"}))
if name == "check_feature_support":
key = (args["database"], args["feature"].lower())
for k, v in _FEATURES.items():
if k[0] == key[0] and k[1] in key[1]:
return json.dumps(v)
return json.dumps({"error": f"no data for {key}"})
return json.dumps({"error": f"unknown tool {name}"})
# ---------------------------------------------------------------------------
# Agentic loop — runs tool calls until the LLM produces a final answer
# ---------------------------------------------------------------------------
async def run_agent_loop(
affinity_id: str | None = None,
max_turns: int = 10,
) -> tuple[str, list[dict]]:
"""
Run a tool-calling agent loop against Plano.
Returns (final_answer, trace) where trace is a list of
{"turn": int, "model": str, "tool_calls": [...]} dicts.
"""
client = AsyncOpenAI(base_url=f"{PLANO_URL}/v1", api_key="EMPTY")
headers = {"X-Model-Affinity": affinity_id} if affinity_id else None
messages: list[ChatCompletionMessageParam] = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": USER_QUERY},
]
trace: list[dict] = []
for turn in range(1, max_turns + 1):
resp = await client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
tools=TOOLS,
tool_choice="auto",
max_completion_tokens=800,
extra_headers=headers,
)
choice = resp.choices[0]
turn_info: dict = {"turn": turn, "model": resp.model}
if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
tool_names = [tc.function.name for tc in choice.message.tool_calls]
turn_info["tool_calls"] = tool_names
trace.append(turn_info)
messages.append(choice.message)
for tc in choice.message.tool_calls:
args = json.loads(tc.function.arguments or "{}")
result = dispatch_tool(tc.function.name, args)
messages.append(
{"role": "tool", "content": result, "tool_call_id": tc.id}
)
else:
turn_info["tool_calls"] = []
trace.append(turn_info)
return (choice.message.content or "").strip(), trace
return "(max turns reached)", trace
# ---------------------------------------------------------------------------
# Display helpers
# ---------------------------------------------------------------------------
def short_model(model: str) -> str:
return model.split("/")[-1] if "/" in model else model
def print_trace(trace: list[dict]) -> None:
for t in trace:
model = short_model(t["model"])
tools = ", ".join(t["tool_calls"]) if t["tool_calls"] else "final answer"
print(f" turn {t['turn']} [{model:<30}] {tools}")
def print_summary(label: str, trace: list[dict]) -> None:
models = [t["model"] for t in trace]
unique = set(models)
if len(unique) == 1:
print(
f"{label}: {short_model(next(iter(unique)))} "
f"for all {len(models)} turns"
)
else:
switches = sum(1 for a, b in zip(models, models[1:]) if a != b)
names = ", ".join(sorted(short_model(m) for m in unique))
print(f"{label}: model switched {switches} time(s) — {names}")
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
async def main() -> None:
print()
print(" ╔══════════════════════════════════════════════════════════╗")
print(" ║ Model Affinity Demo — Agentic Loop ║")
print(" ╚══════════════════════════════════════════════════════════╝")
print()
print(f" Plano : {PLANO_URL}")
print(f' Query : "{USER_QUERY[:65]}"')
print()
print(" The agent calls tools (get_db_benchmarks, get_case_studies,")
print(" check_feature_support) across multiple turns. Each turn is")
print(" a separate request to Plano — the router classifies intent")
print(" independently, so different turns may get different models.")
print()
# --- Run 1: without affinity ---
print(" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
print(" Run 1: WITHOUT Model Affinity")
print(" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
print()
answer1, trace1 = await run_agent_loop(affinity_id=None)
print_trace(trace1)
print()
print_summary("Without affinity", trace1)
print()
# --- Run 2: with affinity ---
aid = str(uuid.uuid4())
print(" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
print(f" Run 2: WITH Model Affinity (X-Model-Affinity: {aid[:8]}…)")
print(" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
print()
answer2, trace2 = await run_agent_loop(affinity_id=aid)
print_trace(trace2)
print()
print_summary("With affinity ", trace2)
print()
# --- Final answer ---
print(" ══ Agent recommendation (affinity session) ════════════════")
print()
for line in answer2.splitlines():
print(f" {line}")
print()
print(" ═══════════════════════════════════════════════════════════")
print()
if __name__ == "__main__":
asyncio.run(main())


@@ -0,0 +1,7 @@
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
# Run the demo directly against Plano (no agent server needed)
uv run "$SCRIPT_DIR/demo.py"


@@ -0,0 +1,28 @@
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PIDS=()
log() { echo "$(date '+%F %T') - $*"; }
cleanup() {
log "Stopping agents..."
for PID in "${PIDS[@]}"; do
kill "$PID" 2>/dev/null && log "Stopped process $PID" || true  # tolerate already-exited PIDs under set -e
done
exit 0
}
trap cleanup EXIT INT TERM
export PLANO_URL="${PLANO_URL:-http://localhost:12000}"
export AGENT_PORT="${AGENT_PORT:-8000}"
log "Starting research_agent on port $AGENT_PORT..."
uv run "$SCRIPT_DIR/agent.py" &
PIDS+=($!)
for PID in "${PIDS[@]}"; do
wait "$PID"
done