diff --git a/demos/llm_routing/model_affinity/README.md b/demos/llm_routing/model_affinity/README.md
deleted file mode 100644
index 1a1524e9..00000000
--- a/demos/llm_routing/model_affinity/README.md
+++ /dev/null
@@ -1,135 +0,0 @@
-# Model Affinity Demo
-
-> Consistent model selection for agentic loops using `X-Model-Affinity`.
-
-## Why Model Affinity?
-
-When an agent runs in a loop — calling tools, reasoning about results, calling more tools — each LLM request hits Plano's router independently. Because prompts vary in intent (tool selection looks like code generation, reasoning about results looks like complex analysis), the router may select **different models** for each turn, fragmenting context mid-session.
-
-**Model affinity** solves this: send an `X-Model-Affinity` header and the first request runs routing as usual, caching the decision. Every subsequent request with the same affinity ID is served by the **same model** without re-running the router, for as long as the cached decision lives (`session_ttl_seconds`, 10 minutes by default).
-
-```
-Without affinity                         With affinity (X-Model-Affinity)
-────────────────                         ───────────────────────────────
-Turn 1 → claude-sonnet  (tool calls)     Turn 1 → claude-sonnet  ← routed
-Turn 2 → gpt-4o         (reasoning)      Turn 2 → claude-sonnet  ← pinned ✓
-Turn 3 → claude-sonnet  (tool calls)     Turn 3 → claude-sonnet  ← pinned ✓
-Turn 4 → gpt-4o         (reasoning)      Turn 4 → claude-sonnet  ← pinned ✓
-Turn 5 → claude-sonnet  (final answer)   Turn 5 → claude-sonnet  ← pinned ✓
-       ↑ model switches every turn                ↑ one model, start to finish
-```
-
----
-
-## Quick Start
-
-```bash
-# 1. Set API keys
-export OPENAI_API_KEY=<your-key>
-export ANTHROPIC_API_KEY=<your-key>
-
-# 2. Start Plano
-cd demos/llm_routing/model_affinity
-planoai up config.yaml
-
-# 3. Run the demo (uv manages dependencies automatically)
-./demo.sh          # or: uv run demo.py
-```
-
----
-
-## What the Demo Does
-
-A **database selection agent** investigates whether to use PostgreSQL or MongoDB
-for an e-commerce platform. It runs a real tool-calling loop: the LLM decides
-which tools to call, receives simulated results, and continues until it has
-enough data to recommend a database.
-
-Available tools:
-- `get_db_benchmarks` — fetch performance data for a workload type
-- `get_case_studies` — retrieve real-world e-commerce case studies
-- `check_feature_support` — check if a database supports a specific feature
-
-The demo runs the **same agent loop twice**:
-
-1. **Without affinity** — no `X-Model-Affinity`; models may switch between turns
-2. **With affinity** — `X-Model-Affinity` header included; model is pinned from turn 1
-
-Each turn is a separate `POST /v1/chat/completions` request to Plano using the
-[OpenAI SDK](https://github.com/openai/openai-python). The demo prints the
-model used on each turn so you can see the difference.
-
-### Expected Output
-
-```
-  Run 1: WITHOUT Model Affinity
-  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-    turn 1  [claude-sonnet-4-20250514     ]  get_db_benchmarks, get_db_benchmarks
-    turn 2  [gpt-4o                       ]  get_case_studies, get_case_studies     ← switched
-    turn 3  [claude-sonnet-4-20250514     ]  check_feature_support                 ← switched
-    turn 4  [gpt-4o                       ]  final answer                          ← switched
-
-  ✗  Without affinity: model switched 3 time(s)
-
-
-  Run 2: WITH Model Affinity  (X-Model-Affinity: a1b2c3d4…)
-  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-    turn 1  [claude-sonnet-4-20250514     ]  get_db_benchmarks, get_db_benchmarks
-    turn 2  [claude-sonnet-4-20250514     ]  get_case_studies, get_case_studies
-    turn 3  [claude-sonnet-4-20250514     ]  check_feature_support
-    turn 4  [claude-sonnet-4-20250514     ]  final answer
-
-  ✓  With affinity: claude-sonnet-4-20250514 for all 4 turns
-```
-
-### How It Works
-
-Model affinity is implemented in brightstaff. When `X-Model-Affinity` is present:
-
-1. **First request** — routing runs normally, result is cached keyed by the affinity ID
-2. **Subsequent requests** — cache hit skips routing and returns the cached model instantly
-
-The `X-Model-Affinity` header is forwarded transparently; your OpenAI SDK calls
-need no changes beyond adding the header.
-
-```python
-from openai import OpenAI
-import uuid
-
-client = OpenAI(base_url="http://localhost:12000/v1", api_key="EMPTY")
-
-affinity_id = str(uuid.uuid4())
-
-response = client.chat.completions.create(
-    model="gpt-4o-mini",  # placeholder; Plano's router picks the actual model
-    messages=[{"role": "user", "content": "Should we use PostgreSQL or MongoDB?"}],
-    extra_headers={"X-Model-Affinity": affinity_id},
-)
-```
-
----
-
-## Configuration
-
-Model affinity is configurable in `config.yaml`:
-
-```yaml
-routing:
-  session_ttl_seconds: 600      # How long affinity lasts (default: 10 min)
-  session_max_entries: 10000    # Max cached sessions (upper limit: 10000)
-```
-
-Without the `X-Model-Affinity` header, routing runs fresh every time — no breaking
-change to existing clients.
-
----
-
-## Advanced: Agent Server Demo
-
-The `agent.py` file is a FastAPI-based agent server that demonstrates a more
-complex pattern: an external agent service that forwards `X-Model-Affinity`
-on all outbound calls to Plano. Use `start_agents.sh` to run it.
-
-## See Also
-
-- [Model Routing Service Demo](../model_routing_service/) — curl-based examples of the routing endpoint
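A minimal sketch of the affinity behaviour the README above describes: the first request for an affinity ID runs routing and caches the result, and later requests reuse it until the entry expires. This is not brightstaff's implementation. `AffinityCache`, `resolve`, the `route` callback, and the oldest-first eviction policy are assumptions made for illustration; only the `session_ttl_seconds` and `session_max_entries` names come from the README.

```python
from __future__ import annotations

import time
from collections import OrderedDict
from typing import Callable


class AffinityCache:
    """Hypothetical TTL cache mapping an affinity ID to a routed model name."""

    def __init__(
        self,
        session_ttl_seconds: float = 600,
        session_max_entries: int = 10_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = session_ttl_seconds
        self.max_entries = session_max_entries
        self.clock = clock
        # affinity ID -> (model name, expiry timestamp)
        self._entries: OrderedDict[str, tuple[str, float]] = OrderedDict()

    def resolve(self, affinity_id: str | None, route: Callable[[], str]) -> str:
        """Return the pinned model for affinity_id, calling route() only on a miss."""
        if affinity_id is None:
            return route()  # no header: route fresh every time
        now = self.clock()
        hit = self._entries.get(affinity_id)
        if hit is not None and hit[1] > now:
            return hit[0]  # cache hit: skip the router
        model = route()
        self._entries[affinity_id] = (model, now + self.ttl)
        self._entries.move_to_end(affinity_id)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # assumed policy: evict the oldest entry
        return model


cache = AffinityCache()
models = iter(["claude-sonnet", "gpt-4o"])  # stand-in for a router that alternates
first = cache.resolve("session-1", lambda: next(models))
second = cache.resolve("session-1", lambda: next(models))
print(first, second)  # both turns report the model routed on turn 1
```

Injecting `clock` keeps the expiry logic testable without sleeping; a real router would also need locking if the cache is shared across threads.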
diff --git a/demos/llm_routing/model_affinity/agent.py b/demos/llm_routing/model_affinity/agent.py
deleted file mode 100644
index b51bd28a..00000000
--- a/demos/llm_routing/model_affinity/agent.py
+++ /dev/null
@@ -1,429 +0,0 @@
-#!/usr/bin/env -S uv run --script
-# /// script
-# requires-python = ">=3.12"
-# dependencies = ["fastapi>=0.115", "uvicorn>=0.30", "openai>=1.0.0"]
-# ///
-"""
-Research Agent — FastAPI service exposing /v1/chat/completions.
-
-For each incoming request the agent runs 3 independent research tasks,
-each with its own tool-calling loop. The tasks deliberately alternate between
-code_generation and complex_reasoning intents so Plano's preference-based
-router selects different models for each task.
-
-If the client sends X-Model-Affinity, the agent forwards it on every outbound
-call to Plano. The first task pins the model; all subsequent tasks skip the
-router and reuse it — keeping the whole session on one consistent model.
-
-Run standalone:
-    uv run agent.py
-    PLANO_URL=http://myhost:12000 AGENT_PORT=8000 uv run agent.py
-"""
-
-import json
-import logging
-import os
-import uuid
-
-import uvicorn
-from fastapi import FastAPI, Request
-from fastapi.responses import JSONResponse
-from openai import AsyncOpenAI
-from openai.types.chat import ChatCompletionMessageParam
-
-logging.basicConfig(
-    level=logging.INFO,
-    format="%(asctime)s [AGENT] %(levelname)s %(message)s",
-)
-log = logging.getLogger(__name__)
-
-PLANO_URL = os.environ.get("PLANO_URL", "http://localhost:12000")
-PORT = int(os.environ.get("AGENT_PORT", "8000"))
-
-# ---------------------------------------------------------------------------
-# Tasks — each has its own conversation so Plano routes each independently.
-# Intent alternates: code_generation → complex_reasoning → code_generation.
-# ---------------------------------------------------------------------------
-
-TASKS = [
-    {
-        "name": "generate_comparison",
-        # Triggers code_generation routing preference (write/generate output)
-        "prompt": (
-            "Use the tools to fetch benchmark data for PostgreSQL and MongoDB "
-            "under a mixed workload. Then generate a compact Markdown comparison "
-            "table with columns: metric, PostgreSQL, MongoDB. Cover read QPS, "
-            "write QPS, p99 latency ms, ACID support, and horizontal scaling."
-        ),
-    },
-    {
-        "name": "analyse_tradeoffs",
-        # Triggers complex_reasoning routing preference (analyse/reason/evaluate)
-        "prompt": (
-            "Context from prior research:\n{context}\n\n"
-            "Perform a deep analysis: for a high-traffic e-commerce platform that "
-            "requires ACID guarantees for order processing but flexible schemas for "
-            "product attributes, carefully reason through and evaluate the long-term "
-            "architectural trade-offs of each database. Consider consistency "
-            "guarantees, operational complexity, and scalability risks."
-        ),
-    },
-    {
-        "name": "write_schema",
-        # Triggers code_generation routing preference (write SQL / generate code)
-        "prompt": (
-            "Context from prior research:\n{context}\n\n"
-            "Write the CREATE TABLE SQL schema for the database you would recommend "
-            "from the analysis above. Include: orders, order_items, products, and "
-            "users tables with appropriate primary keys, foreign keys, and indexes."
-        ),
-    },
-]
-
-SYSTEM_PROMPT = (
-    "You are a database selection analyst for an e-commerce platform. "
-    "Use the available tools when you need data. "
-    "Be concise — each response should be a compact table, code block, "
-    "or 3–5 clear sentences."
-)
-
-# ---------------------------------------------------------------------------
-# Tool definitions
-# ---------------------------------------------------------------------------
-
-TOOLS = [
-    {
-        "type": "function",
-        "function": {
-            "name": "get_db_benchmarks",
-            "description": (
-                "Fetch performance benchmark data for a database. "
-                "Returns read/write throughput, latency, and scaling characteristics."
-            ),
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {
-                        "type": "string",
-                        "enum": ["postgresql", "mongodb"],
-                    },
-                    "workload": {
-                        "type": "string",
-                        "enum": ["read_heavy", "write_heavy", "mixed"],
-                    },
-                },
-                "required": ["database", "workload"],
-            },
-        },
-    },
-    {
-        "type": "function",
-        "function": {
-            "name": "get_case_studies",
-            "description": "Retrieve e-commerce case studies for a database.",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {"type": "string", "enum": ["postgresql", "mongodb"]},
-                },
-                "required": ["database"],
-            },
-        },
-    },
-    {
-        "type": "function",
-        "function": {
-            "name": "check_feature_support",
-            "description": (
-                "Check whether a database supports a specific feature "
-                "(e.g. ACID transactions, horizontal sharding, JSON documents)."
-            ),
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {"type": "string", "enum": ["postgresql", "mongodb"]},
-                    "feature": {"type": "string"},
-                },
-                "required": ["database", "feature"],
-            },
-        },
-    },
-]
-
-# ---------------------------------------------------------------------------
-# Tool implementations (simulated — no external calls)
-# ---------------------------------------------------------------------------
-
-_BENCHMARKS = {
-    ("postgresql", "read_heavy"): {
-        "read_qps": 55_000,
-        "write_qps": 18_000,
-        "p99_ms": 4,
-        "notes": "Excellent for complex joins; connection pooling via pgBouncer recommended",
-    },
-    ("postgresql", "write_heavy"): {
-        "read_qps": 30_000,
-        "write_qps": 24_000,
-        "p99_ms": 8,
-        "notes": "WAL overhead increases at very high write volume; partitioning helps",
-    },
-    ("postgresql", "mixed"): {
-        "read_qps": 42_000,
-        "write_qps": 21_000,
-        "p99_ms": 6,
-        "notes": "Solid all-round; MVCC keeps reads non-blocking",
-    },
-    ("mongodb", "read_heavy"): {
-        "read_qps": 85_000,
-        "write_qps": 30_000,
-        "p99_ms": 2,
-        "notes": "Atlas Search built-in; sharding distributes read load well",
-    },
-    ("mongodb", "write_heavy"): {
-        "read_qps": 40_000,
-        "write_qps": 65_000,
-        "p99_ms": 3,
-        "notes": "WiredTiger compression reduces I/O; journal writes are async-safe",
-    },
-    ("mongodb", "mixed"): {
-        "read_qps": 60_000,
-        "write_qps": 50_000,
-        "p99_ms": 3,
-        "notes": "Flexible schema accelerates feature iteration",
-    },
-}
-
-_CASE_STUDIES = {
-    "postgresql": [
-        {
-            "company": "Shopify",
-            "scale": "100 B+ req/day",
-            "notes": "Moved critical order tables back to Postgres for ACID guarantees",
-        },
-        {
-            "company": "Zalando",
-            "scale": "50 M customers",
-            "notes": "Uses Postgres + Citus for sharded order processing",
-        },
-        {
-            "company": "Instacart",
-            "scale": "10 M orders/mo",
-            "notes": "Postgres for inventory; strict consistency required for stock levels",
-        },
-    ],
-    "mongodb": [
-        {
-            "company": "eBay",
-            "scale": "1.5 B listings",
-            "notes": "Product catalogue in MongoDB for flexible attribute schemas",
-        },
-        {
-            "company": "Alibaba",
-            "scale": "billions of docs",
-            "notes": "Session and cart data in MongoDB; high write throughput",
-        },
-        {
-            "company": "Foursquare",
-            "scale": "10 B+ check-ins",
-            "notes": "Geospatial queries and flexible location schemas",
-        },
-    ],
-}
-
-_FEATURES = {
-    ("postgresql", "acid transactions"): {
-        "supported": True,
-        "notes": "Full ACID with serialisable isolation",
-    },
-    ("postgresql", "horizontal sharding"): {
-        "supported": True,
-        "notes": "Via Citus extension or manual partitioning; not native",
-    },
-    ("postgresql", "json documents"): {
-        "supported": True,
-        "notes": "JSONB with indexing; flexible but slower than native doc store",
-    },
-    ("postgresql", "full-text search"): {
-        "supported": True,
-        "notes": "Built-in tsvector/tsquery; Elasticsearch for advanced use cases",
-    },
-    ("postgresql", "multi-document transactions"): {
-        "supported": True,
-        "notes": "Native cross-table ACID",
-    },
-    ("mongodb", "acid transactions"): {
-        "supported": True,
-        "notes": "Multi-document ACID since v4.0; single-doc always atomic",
-    },
-    ("mongodb", "horizontal sharding"): {
-        "supported": True,
-        "notes": "Native sharding; auto-balancing across shards",
-    },
-    ("mongodb", "json documents"): {
-        "supported": True,
-        "notes": "Native BSON document model; schema-free by default",
-    },
-    ("mongodb", "full-text search"): {
-        "supported": True,
-        "notes": "Atlas Search (Lucene-based) for advanced full-text",
-    },
-    ("mongodb", "multi-document transactions"): {
-        "supported": True,
-        "notes": "Available but adds latency; best avoided on hot paths",
-    },
-}
-
-
-def _dispatch(name: str, args: dict) -> str:
-    if name == "get_db_benchmarks":
-        key = (args["database"].lower(), args["workload"].lower())
-        return json.dumps(_BENCHMARKS.get(key, {"error": f"no data for {key}"}))
-
-    if name == "get_case_studies":
-        db = args["database"].lower()
-        return json.dumps(_CASE_STUDIES.get(db, {"error": f"unknown db '{db}'"}))
-
-    if name == "check_feature_support":
-        key = (args["database"].lower(), args["feature"].lower())
-        for k, v in _FEATURES.items():
-            if k[0] == key[0] and k[1] in key[1]:
-                return json.dumps(v)
-        return json.dumps({"error": f"feature '{args['feature']}' not in dataset"})
-
-    return json.dumps({"error": f"unknown tool '{name}'"})
-
-
-# ---------------------------------------------------------------------------
-# Task runner — one independent conversation per task
-# ---------------------------------------------------------------------------
-
-
-async def run_task(
-    client: AsyncOpenAI,
-    task_name: str,
-    prompt: str,
-    session_id: str | None,
-) -> tuple[str, str]:
-    """
-    Run a single research task with its own tool-calling loop.
-
-    Each task is an independent conversation so the router sees only
-    this task's intent — not the accumulated context of previous tasks.
-    Model affinity via X-Model-Affinity pins the model from the first task
-    onward, so all tasks stay on the same model.
-
-    Returns (answer, first_model_used).
-    """
-    headers = {"X-Model-Affinity": session_id} if session_id else {}
-    messages: list[ChatCompletionMessageParam] = [
-        {"role": "system", "content": SYSTEM_PROMPT},
-        {"role": "user", "content": prompt},
-    ]
-    first_model: str | None = None
-
-    for _ in range(10):  # bounded like demo.py's max_turns, so the loop always ends
-        resp = await client.chat.completions.create(
-            model="gpt-4o-mini",  # Plano's router overrides this via routing_preferences
-            messages=messages,
-            tools=TOOLS,
-            tool_choice="auto",
-            max_completion_tokens=600,
-            extra_headers=headers or None,
-        )
-        first_model = first_model or resp.model
-
-        log.info(
-            "task=%s  model=%s  finish=%s",
-            task_name,
-            resp.model,
-            resp.choices[0].finish_reason,
-        )
-
-        choice = resp.choices[0]
-        if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
-            messages.append(choice.message)
-            for tc in choice.message.tool_calls:
-                args = json.loads(tc.function.arguments or "{}")
-                result = _dispatch(tc.function.name, args)
-                log.info("  tool %s(%s)", tc.function.name, args)
-                messages.append(
-                    {"role": "tool", "content": result, "tool_call_id": tc.id}
-                )
-        else:
-            return (choice.message.content or "").strip(), first_model or "unknown"
-    return "(max turns reached)", first_model or "unknown"
-
-
-# ---------------------------------------------------------------------------
-# Research loop — runs all tasks, threading context forward
-# ---------------------------------------------------------------------------
-
-
-async def run_research_loop(
-    client: AsyncOpenAI,
-    session_id: str | None,
-) -> tuple[str, list[dict]]:
-    """
-    Run all 3 research tasks in sequence, passing each task's output as
-    context to the next. Returns (final_answer, routing_trace).
-    """
-    context = ""
-    trace: list[dict] = []
-    final_answer = ""
-
-    for task in TASKS:
-        prompt = task["prompt"].format(context=context)
-        answer, model = await run_task(client, task["name"], prompt, session_id)
-        trace.append({"task": task["name"], "model": model})
-        context += f"\n### {task['name']}\n{answer}\n"
-        final_answer = answer
-
-    return final_answer, trace
-
-
-# ---------------------------------------------------------------------------
-# FastAPI app
-# ---------------------------------------------------------------------------
-
-app = FastAPI(title="Research Agent", version="1.0.0")
-
-
-@app.post("/v1/chat/completions")
-async def chat(request: Request) -> JSONResponse:
-    _ = await request.json()  # body is parsed but unused: the agent always runs TASKS
-    session_id: str | None = request.headers.get("x-model-affinity")
-
-    log.info("request  session_id=%s", session_id or "none")
-
-    client = AsyncOpenAI(base_url=f"{PLANO_URL}/v1", api_key="EMPTY")
-    answer, trace = await run_research_loop(client, session_id)
-
-    return JSONResponse(
-        {
-            "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
-            "object": "chat.completion",
-            "choices": [
-                {
-                    "index": 0,
-                    "message": {"role": "assistant", "content": answer},
-                    "finish_reason": "stop",
-                }
-            ],
-            "routing_trace": trace,
-            "session_id": session_id,
-        }
-    )
-
-
-@app.get("/health")
-async def health() -> dict:
-    return {"status": "ok", "plano_url": PLANO_URL}
-
-
-# ---------------------------------------------------------------------------
-# Entry point
-# ---------------------------------------------------------------------------
-
-if __name__ == "__main__":
-    log.info("starting on port %d  plano=%s", PORT, PLANO_URL)
-    uvicorn.run(app, host="0.0.0.0", port=PORT, log_level="warning")
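The `check_feature_support` branch of `_dispatch` above matches loosely: a dataset entry matches when its feature name is contained in the requested string, not the other way round. A reduced sketch of that rule, using a two-entry stand-in table (`FEATURES` and `lookup_feature` are names introduced here for illustration, not part of agent.py):

```python
FEATURES = {
    ("postgresql", "acid transactions"): {"supported": True},
    ("mongodb", "horizontal sharding"): {"supported": True},
}


def lookup_feature(database: str, feature: str) -> dict:
    """Mirror the matching rule in _dispatch: the known name must occur inside the request."""
    db, wanted = database.lower(), feature.lower()
    for (known_db, known_feature), info in FEATURES.items():
        if known_db == db and known_feature in wanted:
            return info
    return {"error": f"feature '{feature}' not in dataset"}


# A longer phrasing still matches, because it contains the known name.
print(lookup_feature("PostgreSQL", "Full ACID transactions support"))
# A shorter phrasing does not: "acid" alone does not contain "acid transactions".
print(lookup_feature("postgresql", "ACID"))
```

The practical consequence is that a model asking about a bare keyword gets the "not in dataset" error, while a verbose request succeeds.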
diff --git a/demos/llm_routing/model_affinity/config.yaml b/demos/llm_routing/model_affinity/config.yaml
deleted file mode 100644
index 7b98b25b..00000000
--- a/demos/llm_routing/model_affinity/config.yaml
+++ /dev/null
@@ -1,27 +0,0 @@
-version: v0.3.0
-
-listeners:
-  - type: model
-    name: model_listener
-    port: 12000
-
-model_providers:
-
-  - model: openai/gpt-4o-mini
-    access_key: $OPENAI_API_KEY
-    default: true
-
-  - model: openai/gpt-4o
-    access_key: $OPENAI_API_KEY
-    routing_preferences:
-      - name: complex_reasoning
-        description: complex reasoning tasks, multi-step analysis, or detailed explanations
-
-  - model: anthropic/claude-sonnet-4-20250514
-    access_key: $ANTHROPIC_API_KEY
-    routing_preferences:
-      - name: code_generation
-        description: generating new code, writing functions, or creating boilerplate
-
-tracing:
-  random_sampling: 100
diff --git a/demos/llm_routing/model_affinity/demo.py b/demos/llm_routing/model_affinity/demo.py
deleted file mode 100644
index f01a5a31..00000000
--- a/demos/llm_routing/model_affinity/demo.py
+++ /dev/null
@@ -1,307 +0,0 @@
-#!/usr/bin/env -S uv run --script
-# /// script
-# requires-python = ">=3.12"
-# dependencies = ["openai>=1.0.0"]
-# ///
-"""
-Model Affinity Demo — Agentic Tool-Calling Loop
-
-Runs the same agentic loop twice through Plano:
-  1. Without model affinity — the router may pick different models per turn
-  2. With model affinity  — all turns use the model selected on turn 1
-
-Each loop is a real tool-calling agent: the LLM decides which tools to call,
-we provide simulated results, and the LLM continues until it has enough
-information to produce a final answer. Each turn is a separate request to
-Plano, so the router classifies intent independently every time.
-
-Usage:
-    planoai up config.yaml          # start Plano
-    uv run demo.py                  # run this demo
-"""
-
-import asyncio
-import json
-import os
-import uuid
-
-from openai import AsyncOpenAI
-from openai.types.chat import ChatCompletionMessageParam
-
-PLANO_URL = os.environ.get("PLANO_URL", "http://localhost:12000")
-
-SYSTEM_PROMPT = (
-    "You are a database selection analyst. Use the provided tools to gather "
-    "benchmark data and case studies, then recommend PostgreSQL or MongoDB "
-    "for a high-traffic e-commerce backend. Be concise."
-)
-
-USER_QUERY = (
-    "Should we use PostgreSQL or MongoDB for our e-commerce platform? "
-    "We need strong consistency for orders but flexible schemas for products. "
-    "Use the tools to research both options, then give a recommendation."
-)
-
-TOOLS = [
-    {
-        "type": "function",
-        "function": {
-            "name": "get_db_benchmarks",
-            "description": "Fetch performance benchmarks for a database under a given workload.",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {
-                        "type": "string",
-                        "enum": ["postgresql", "mongodb"],
-                    },
-                    "workload": {
-                        "type": "string",
-                        "enum": ["read_heavy", "write_heavy", "mixed"],
-                    },
-                },
-                "required": ["database", "workload"],
-            },
-        },
-    },
-    {
-        "type": "function",
-        "function": {
-            "name": "get_case_studies",
-            "description": "Retrieve real-world e-commerce case studies for a database.",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {
-                        "type": "string",
-                        "enum": ["postgresql", "mongodb"],
-                    },
-                },
-                "required": ["database"],
-            },
-        },
-    },
-    {
-        "type": "function",
-        "function": {
-            "name": "check_feature_support",
-            "description": "Check if a database supports a specific feature.",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {
-                        "type": "string",
-                        "enum": ["postgresql", "mongodb"],
-                    },
-                    "feature": {"type": "string"},
-                },
-                "required": ["database", "feature"],
-            },
-        },
-    },
-]
-
-# Simulated tool responses
-_BENCHMARKS = {
-    ("postgresql", "mixed"): {
-        "read_qps": 42000,
-        "write_qps": 21000,
-        "p99_ms": 6,
-        "notes": "Solid all-round; MVCC keeps reads non-blocking",
-    },
-    ("mongodb", "mixed"): {
-        "read_qps": 60000,
-        "write_qps": 50000,
-        "p99_ms": 3,
-        "notes": "Flexible schema accelerates feature iteration",
-    },
-}
-
-_CASE_STUDIES = {
-    "postgresql": [
-        {"company": "Shopify", "notes": "Moved orders back to Postgres for ACID"},
-        {
-            "company": "Zalando",
-            "notes": "Postgres + Citus for sharded order processing",
-        },
-    ],
-    "mongodb": [
-        {"company": "eBay", "notes": "Product catalogue — flexible attribute schemas"},
-        {"company": "Alibaba", "notes": "Session/cart data — high write throughput"},
-    ],
-}
-
-_FEATURES = {
-    ("postgresql", "acid transactions"): {"supported": True, "notes": "Full ACID"},
-    ("mongodb", "acid transactions"): {
-        "supported": True,
-        "notes": "Multi-doc ACID since v4.0",
-    },
-    ("postgresql", "horizontal sharding"): {
-        "supported": True,
-        "notes": "Via Citus extension",
-    },
-    ("mongodb", "horizontal sharding"): {
-        "supported": True,
-        "notes": "Native auto-balancing",
-    },
-}
-
-
-def dispatch_tool(name: str, args: dict) -> str:
-    if name == "get_db_benchmarks":
-        key = (args["database"], args["workload"])
-        return json.dumps(_BENCHMARKS.get(key, {"error": f"no data for {key}"}))
-    if name == "get_case_studies":
-        return json.dumps(_CASE_STUDIES.get(args["database"], {"error": "unknown db"}))
-    if name == "check_feature_support":
-        key = (args["database"], args["feature"].lower())
-        for k, v in _FEATURES.items():
-            if k[0] == key[0] and k[1] in key[1]:
-                return json.dumps(v)
-        return json.dumps({"error": f"no data for {key}"})
-    return json.dumps({"error": f"unknown tool {name}"})
-
-
-# ---------------------------------------------------------------------------
-# Agentic loop — runs tool calls until the LLM produces a final answer
-# ---------------------------------------------------------------------------
-
-
-async def run_agent_loop(
-    affinity_id: str | None = None,
-    max_turns: int = 10,
-) -> tuple[str, list[dict]]:
-    """
-    Run a tool-calling agent loop against Plano.
-
-    Returns (final_answer, trace) where trace is a list of
-    {"turn": int, "model": str, "tool_calls": [...]} dicts.
-    """
-    client = AsyncOpenAI(base_url=f"{PLANO_URL}/v1", api_key="EMPTY")
-    headers = {"X-Model-Affinity": affinity_id} if affinity_id else None
-
-    messages: list[ChatCompletionMessageParam] = [
-        {"role": "system", "content": SYSTEM_PROMPT},
-        {"role": "user", "content": USER_QUERY},
-    ]
-    trace: list[dict] = []
-
-    for turn in range(1, max_turns + 1):
-        resp = await client.chat.completions.create(
-            model="gpt-4o-mini",
-            messages=messages,
-            tools=TOOLS,
-            tool_choice="auto",
-            max_completion_tokens=800,
-            extra_headers=headers,
-        )
-
-        choice = resp.choices[0]
-        turn_info: dict = {"turn": turn, "model": resp.model}
-
-        if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
-            tool_names = [tc.function.name for tc in choice.message.tool_calls]
-            turn_info["tool_calls"] = tool_names
-            trace.append(turn_info)
-
-            messages.append(choice.message)
-            for tc in choice.message.tool_calls:
-                args = json.loads(tc.function.arguments or "{}")
-                result = dispatch_tool(tc.function.name, args)
-                messages.append(
-                    {"role": "tool", "content": result, "tool_call_id": tc.id}
-                )
-        else:
-            turn_info["tool_calls"] = []
-            trace.append(turn_info)
-            return (choice.message.content or "").strip(), trace
-
-    return "(max turns reached)", trace
-
-
-# ---------------------------------------------------------------------------
-# Display helpers
-# ---------------------------------------------------------------------------
-
-
-def short_model(model: str) -> str:
-    return model.split("/")[-1]  # strips any "provider/" prefix; no-op without one
-
-
-def print_trace(trace: list[dict]) -> None:
-    for t in trace:
-        model = short_model(t["model"])
-        tools = ", ".join(t["tool_calls"]) if t["tool_calls"] else "final answer"
-        print(f"    turn {t['turn']}  [{model:<30}]  {tools}")
-
-
-def print_summary(label: str, trace: list[dict]) -> None:
-    models = [t["model"] for t in trace]
-    unique = set(models)
-    if len(unique) == 1:
-        print(
-            f"  ✓  {label}: {short_model(next(iter(unique)))} "
-            f"for all {len(models)} turns"
-        )
-    else:
-        switches = sum(1 for a, b in zip(models, models[1:]) if a != b)
-        names = ", ".join(sorted(short_model(m) for m in unique))
-        print(f"  ✗  {label}: model switched {switches} time(s) — {names}")
-
-
-# ---------------------------------------------------------------------------
-# Main
-# ---------------------------------------------------------------------------
-
-
-async def main() -> None:
-    print()
-    print("  ╔══════════════════════════════════════════════════════════╗")
-    print("  ║          Model Affinity Demo — Agentic Loop             ║")
-    print("  ╚══════════════════════════════════════════════════════════╝")
-    print()
-    print(f"  Plano : {PLANO_URL}")
-    print(f'  Query : "{USER_QUERY[:65]}…"')
-    print()
-    print("  The agent calls tools (get_db_benchmarks, get_case_studies,")
-    print("  check_feature_support) across multiple turns. Each turn is")
-    print("  a separate request to Plano — the router classifies intent")
-    print("  independently, so different turns may get different models.")
-    print()
-
-    # --- Run 1: without affinity ---
-    print("  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
-    print("  Run 1: WITHOUT Model Affinity")
-    print("  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
-    print()
-    answer1, trace1 = await run_agent_loop(affinity_id=None)
-    print_trace(trace1)
-    print()
-    print_summary("Without affinity", trace1)
-    print()
-
-    # --- Run 2: with affinity ---
-    aid = str(uuid.uuid4())
-    print("  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
-    print(f"  Run 2: WITH Model Affinity  (X-Model-Affinity: {aid[:8]}…)")
-    print("  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
-    print()
-    answer2, trace2 = await run_agent_loop(affinity_id=aid)
-    print_trace(trace2)
-    print()
-    print_summary("With affinity", trace2)
-    print()
-
-    # --- Final answer ---
-    print("  ══ Agent recommendation (affinity session) ════════════════")
-    print()
-    for line in answer2.splitlines():
-        print(f"  {line}")
-    print()
-    print("  ═══════════════════════════════════════════════════════════")
-    print()
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
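`print_summary` above counts a switch whenever two consecutive turns report different models. The same pairwise comparison in isolation (`count_switches` is a name introduced here for illustration; the model lists mirror the two runs in the README's expected output):

```python
def count_switches(models: list[str]) -> int:
    """Count adjacent turns whose model differs, as print_summary does."""
    return sum(1 for a, b in zip(models, models[1:]) if a != b)


run_1 = ["claude-sonnet-4-20250514", "gpt-4o", "claude-sonnet-4-20250514", "gpt-4o"]
run_2 = ["claude-sonnet-4-20250514"] * 4

print(count_switches(run_1))  # 3
print(count_switches(run_2))  # 0
```

Because only neighbours are compared, a run that alternates between two models reports one switch per turn boundary, even though only two distinct models were used.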
diff --git a/demos/llm_routing/model_affinity/demo.sh b/demos/llm_routing/model_affinity/demo.sh
deleted file mode 100755
index 3ce50b3c..00000000
--- a/demos/llm_routing/model_affinity/demo.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/bash
-set -e
-
-SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-
-# Run the demo directly against Plano (no agent server needed)
-uv run "$SCRIPT_DIR/demo.py"
diff --git a/demos/llm_routing/model_affinity/start_agents.sh b/demos/llm_routing/model_affinity/start_agents.sh
deleted file mode 100755
index 5baaa378..00000000
--- a/demos/llm_routing/model_affinity/start_agents.sh
+++ /dev/null
@@ -1,28 +0,0 @@
-#!/bin/bash
-set -e
-
-SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-PIDS=()
-
-log() { echo "$(date '+%F %T') - $*"; }
-
-cleanup() {
-    log "Stopping agents..."
-    for PID in "${PIDS[@]}"; do
-        kill "$PID" 2>/dev/null && log "Stopped process $PID"
-    done
-    exit 0
-}
-
-trap cleanup EXIT; trap 'exit 0' INT TERM  # INT/TERM exit, so cleanup runs once via EXIT
-
-export PLANO_URL="${PLANO_URL:-http://localhost:12000}"
-export AGENT_PORT="${AGENT_PORT:-8000}"
-
-log "Starting research_agent on port $AGENT_PORT..."
-uv run "$SCRIPT_DIR/agent.py" &
-PIDS+=($!)
-
-for PID in "${PIDS[@]}"; do
-    wait "$PID"
-done