diff --git a/demos/llm_routing/model_affinity/README.md b/demos/llm_routing/model_affinity/README.md
deleted file mode 100644
index 1a1524e9..00000000
--- a/demos/llm_routing/model_affinity/README.md
+++ /dev/null
@@ -1,135 +0,0 @@
-# Model Affinity Demo
-
-> Consistent model selection for agentic loops using `X-Model-Affinity`.
-
-## Why Model Affinity?
-
-When an agent runs in a loop — calling tools, reasoning about results, calling more tools — each LLM request hits Plano's router independently. Because prompts vary in intent (tool selection looks like code generation, reasoning about results looks like complex analysis), the router may select **different models** for each turn, fragmenting context mid-session.
-
-**Model affinity** solves this: send an `X-Model-Affinity` header and the first request runs routing as usual, caching the decision. Every subsequent request with the same affinity ID returns the **same model**, without re-running the router.
-
-```
-Without affinity                          With affinity (X-Model-Affinity)
-────────────────                          ───────────────────────────────
-Turn 1 → claude-sonnet  (tool calls)      Turn 1 → claude-sonnet  ← routed
-Turn 2 → gpt-4o         (reasoning)       Turn 2 → claude-sonnet  ← pinned ✓
-Turn 3 → claude-sonnet  (tool calls)      Turn 3 → claude-sonnet  ← pinned ✓
-Turn 4 → gpt-4o         (reasoning)       Turn 4 → claude-sonnet  ← pinned ✓
-Turn 5 → claude-sonnet  (final answer)    Turn 5 → claude-sonnet  ← pinned ✓
-         ↑ model switches every turn               ↑ one model, start to finish
-```
-
----
-
-## Quick Start
-
-```bash
-# 1. Set API keys
-export OPENAI_API_KEY=
-export ANTHROPIC_API_KEY=
-
-# 2. Start Plano
-cd demos/llm_routing/model_affinity
-planoai up config.yaml
-
-# 3. Run the demo (uv manages dependencies automatically)
-./demo.sh  # or: uv run demo.py
-```
-
----
-
-## What the Demo Does
-
-A **database selection agent** investigates whether to use PostgreSQL or MongoDB
-for an e-commerce platform. It runs a real tool-calling loop: the LLM decides
-which tools to call, receives simulated results, and continues until it has
-enough data to recommend a database.
-
-Available tools:
-- `get_db_benchmarks` — fetch performance data for a workload type
-- `get_case_studies` — retrieve real-world e-commerce case studies
-- `check_feature_support` — check if a database supports a specific feature
-
-The demo runs the **same agent loop twice**:
-
-1. **Without affinity** — no `X-Model-Affinity`; models may switch between turns
-2. **With affinity** — `X-Model-Affinity` header included; model is pinned from turn 1
-
-Each turn is a separate `POST /v1/chat/completions` request to Plano using the
-[OpenAI SDK](https://github.com/openai/openai-python). The demo prints the
-model used on each turn so you can see the difference.
-
-### Expected Output
-
-```
-  Run 1: WITHOUT Model Affinity
-  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-    turn 1 [claude-sonnet-4-20250514      ] get_db_benchmarks, get_db_benchmarks
-    turn 2 [gpt-4o                        ] get_case_studies, get_case_studies  ← switched
-    turn 3 [claude-sonnet-4-20250514      ] check_feature_support  ← switched
-    turn 4 [gpt-4o                        ] final answer  ← switched
-
-  ✗ Without affinity: model switched 3 time(s)
-
-
-  Run 2: WITH Model Affinity (X-Model-Affinity: a1b2c3d4…)
-  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-    turn 1 [claude-sonnet-4-20250514      ] get_db_benchmarks, get_db_benchmarks
-    turn 2 [claude-sonnet-4-20250514      ] get_case_studies, get_case_studies
-    turn 3 [claude-sonnet-4-20250514      ] check_feature_support
-    turn 4 [claude-sonnet-4-20250514      ] final answer
-
-  ✓ With affinity: claude-sonnet-4-20250514 for all 4 turns
-```
-
-### How It Works
-
-Model affinity is implemented in brightstaff. When `X-Model-Affinity` is present:
-
-1. **First request** — routing runs normally, result is cached keyed by the affinity ID
-2. **Subsequent requests** — cache hit skips routing and returns the cached model instantly
-
-The `X-Model-Affinity` header is forwarded transparently; no changes to your OpenAI
-SDK calls beyond adding the header.
-
-```python
-from openai import OpenAI
-import uuid
-
-client = OpenAI(base_url="http://localhost:12000/v1", api_key="EMPTY")
-
-affinity_id = str(uuid.uuid4())
-
-response = client.chat.completions.create(
-    model="gpt-4o-mini",
-    messages=[{"role": "user", "content": prompt}],
-    extra_headers={"X-Model-Affinity": affinity_id},
-)
-```
-
----
-
-## Configuration
-
-Model affinity is configurable in `config.yaml`:
-
-```yaml
-routing:
-  session_ttl_seconds: 600    # How long affinity lasts (default: 10 min)
-  session_max_entries: 10000  # Max cached sessions (upper limit: 10000)
-```
-
-Without the `X-Model-Affinity` header, routing runs fresh every time — no breaking
-change to existing clients.
-
----
-
-## Advanced: Agent Server Demo
-
-The `agent.py` file is a FastAPI-based agent server that demonstrates a more
-complex pattern: an external agent service that forwards `X-Model-Affinity`
-on all outbound calls to Plano. Use `start_agents.sh` to run it.
-
-## See Also
-
-- [Model Routing Service Demo](../model_routing_service/) — curl-based examples of the routing endpoint
diff --git a/demos/llm_routing/model_affinity/agent.py b/demos/llm_routing/model_affinity/agent.py
deleted file mode 100644
index b51bd28a..00000000
--- a/demos/llm_routing/model_affinity/agent.py
+++ /dev/null
@@ -1,429 +0,0 @@
-#!/usr/bin/env -S uv run --script
-# /// script
-# requires-python = ">=3.12"
-# dependencies = ["fastapi>=0.115", "uvicorn>=0.30", "openai>=1.0.0"]
-# ///
-"""
-Research Agent — FastAPI service exposing /v1/chat/completions.
-
-For each incoming request the agent runs 3 independent research tasks,
-each with its own tool-calling loop. The tasks deliberately alternate between
-code_generation and complex_reasoning intents so Plano's preference-based
-router selects different models for each task.
-
-If the client sends X-Model-Affinity, the agent forwards it on every outbound
-call to Plano. The first task pins the model; all subsequent tasks skip the
-router and reuse it — keeping the whole session on one consistent model.
-
-Run standalone:
-    uv run agent.py
-    PLANO_URL=http://myhost:12000 AGENT_PORT=8000 uv run agent.py
-"""
-
-import json
-import logging
-import os
-import uuid
-
-import uvicorn
-from fastapi import FastAPI, Request
-from fastapi.responses import JSONResponse
-from openai import AsyncOpenAI
-from openai.types.chat import ChatCompletionMessageParam
-
-logging.basicConfig(
-    level=logging.INFO,
-    format="%(asctime)s [AGENT] %(levelname)s %(message)s",
-)
-log = logging.getLogger(__name__)
-
-PLANO_URL = os.environ.get("PLANO_URL", "http://localhost:12000")
-PORT = int(os.environ.get("AGENT_PORT", "8000"))
-
-# ---------------------------------------------------------------------------
-# Tasks — each has its own conversation so Plano routes each independently.
-# Intent alternates: code_generation → complex_reasoning → code_generation.
-# ---------------------------------------------------------------------------
-
-TASKS = [
-    {
-        "name": "generate_comparison",
-        # Triggers code_generation routing preference (write/generate output)
-        "prompt": (
-            "Use the tools to fetch benchmark data for PostgreSQL and MongoDB "
-            "under a mixed workload. Then generate a compact Markdown comparison "
-            "table with columns: metric, PostgreSQL, MongoDB. Cover read QPS, "
-            "write QPS, p99 latency ms, ACID support, and horizontal scaling."
-        ),
-    },
-    {
-        "name": "analyse_tradeoffs",
-        # Triggers complex_reasoning routing preference (analyse/reason/evaluate)
-        "prompt": (
-            "Context from prior research:\n{context}\n\n"
-            "Perform a deep analysis: for a high-traffic e-commerce platform that "
-            "requires ACID guarantees for order processing but flexible schemas for "
-            "product attributes, carefully reason through and evaluate the long-term "
-            "architectural trade-offs of each database. Consider consistency "
-            "guarantees, operational complexity, and scalability risks."
-        ),
-    },
-    {
-        "name": "write_schema",
-        # Triggers code_generation routing preference (write SQL / generate code)
-        "prompt": (
-            "Context from prior research:\n{context}\n\n"
-            "Write the CREATE TABLE SQL schema for the database you would recommend "
-            "from the analysis above. Include: orders, order_items, products, and "
-            "users tables with appropriate primary keys, foreign keys, and indexes."
-        ),
-    },
-]
-
-SYSTEM_PROMPT = (
-    "You are a database selection analyst for an e-commerce platform. "
-    "Use the available tools when you need data. "
-    "Be concise — each response should be a compact table, code block, "
-    "or 3–5 clear sentences."
-)
-
-# ---------------------------------------------------------------------------
-# Tool definitions
-# ---------------------------------------------------------------------------
-
-TOOLS = [
-    {
-        "type": "function",
-        "function": {
-            "name": "get_db_benchmarks",
-            "description": (
-                "Fetch performance benchmark data for a database. "
-                "Returns read/write throughput, latency, and scaling characteristics."
-            ),
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {
-                        "type": "string",
-                        "enum": ["postgresql", "mongodb"],
-                    },
-                    "workload": {
-                        "type": "string",
-                        "enum": ["read_heavy", "write_heavy", "mixed"],
-                    },
-                },
-                "required": ["database", "workload"],
-            },
-        },
-    },
-    {
-        "type": "function",
-        "function": {
-            "name": "get_case_studies",
-            "description": "Retrieve e-commerce case studies for a database.",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {"type": "string", "enum": ["postgresql", "mongodb"]},
-                },
-                "required": ["database"],
-            },
-        },
-    },
-    {
-        "type": "function",
-        "function": {
-            "name": "check_feature_support",
-            "description": (
-                "Check whether a database supports a specific feature "
-                "(e.g. ACID transactions, horizontal sharding, JSON documents)."
-            ),
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {"type": "string", "enum": ["postgresql", "mongodb"]},
-                    "feature": {"type": "string"},
-                },
-                "required": ["database", "feature"],
-            },
-        },
-    },
-]
-
-# ---------------------------------------------------------------------------
-# Tool implementations (simulated — no external calls)
-# ---------------------------------------------------------------------------
-
-_BENCHMARKS = {
-    ("postgresql", "read_heavy"): {
-        "read_qps": 55_000,
-        "write_qps": 18_000,
-        "p99_ms": 4,
-        "notes": "Excellent for complex joins; connection pooling via pgBouncer recommended",
-    },
-    ("postgresql", "write_heavy"): {
-        "read_qps": 30_000,
-        "write_qps": 24_000,
-        "p99_ms": 8,
-        "notes": "WAL overhead increases at very high write volume; partitioning helps",
-    },
-    ("postgresql", "mixed"): {
-        "read_qps": 42_000,
-        "write_qps": 21_000,
-        "p99_ms": 6,
-        "notes": "Solid all-round; MVCC keeps reads non-blocking",
-    },
-    ("mongodb", "read_heavy"): {
-        "read_qps": 85_000,
-        "write_qps": 30_000,
-        "p99_ms": 2,
-        "notes": "Atlas Search built-in; sharding distributes read load well",
-    },
-    ("mongodb", "write_heavy"): {
-        "read_qps": 40_000,
-        "write_qps": 65_000,
-        "p99_ms": 3,
-        "notes": "WiredTiger compression reduces I/O; journal writes are async-safe",
-    },
-    ("mongodb", "mixed"): {
-        "read_qps": 60_000,
-        "write_qps": 50_000,
-        "p99_ms": 3,
-        "notes": "Flexible schema accelerates feature iteration",
-    },
-}
-
-_CASE_STUDIES = {
-    "postgresql": [
-        {
-            "company": "Shopify",
-            "scale": "100 B+ req/day",
-            "notes": "Moved critical order tables back to Postgres for ACID guarantees",
-        },
-        {
-            "company": "Zalando",
-            "scale": "50 M customers",
-            "notes": "Uses Postgres + Citus for sharded order processing",
-        },
-        {
-            "company": "Instacart",
-            "scale": "10 M orders/mo",
-            "notes": "Postgres for inventory; strict consistency required for stock levels",
-        },
-    ],
-    "mongodb": [
-        {
-            "company": "eBay",
-            "scale": "1.5 B listings",
-            "notes": "Product catalogue in MongoDB for flexible attribute schemas",
-        },
-        {
-            "company": "Alibaba",
-            "scale": "billions of docs",
-            "notes": "Session and cart data in MongoDB; high write throughput",
-        },
-        {
-            "company": "Foursquare",
-            "scale": "10 B+ check-ins",
-            "notes": "Geospatial queries and flexible location schemas",
-        },
-    ],
-}
-
-_FEATURES = {
-    ("postgresql", "acid transactions"): {
-        "supported": True,
-        "notes": "Full ACID with serialisable isolation",
-    },
-    ("postgresql", "horizontal sharding"): {
-        "supported": True,
-        "notes": "Via Citus extension or manual partitioning; not native",
-    },
-    ("postgresql", "json documents"): {
-        "supported": True,
-        "notes": "JSONB with indexing; flexible but slower than native doc store",
-    },
-    ("postgresql", "full-text search"): {
-        "supported": True,
-        "notes": "Built-in tsvector/tsquery; Elasticsearch for advanced use cases",
-    },
-    ("postgresql", "multi-document transactions"): {
-        "supported": True,
-        "notes": "Native cross-table ACID",
-    },
-    ("mongodb", "acid transactions"): {
-        "supported": True,
-        "notes": "Multi-document ACID since v4.0; single-doc always atomic",
-    },
-    ("mongodb", "horizontal sharding"): {
-        "supported": True,
-        "notes": "Native sharding; auto-balancing across shards",
-    },
-    ("mongodb", "json documents"): {
-        "supported": True,
-        "notes": "Native BSON document model; schema-free by default",
-    },
-    ("mongodb", "full-text search"): {
-        "supported": True,
-        "notes": "Atlas Search (Lucene-based) for advanced full-text",
-    },
-    ("mongodb", "multi-document transactions"): {
-        "supported": True,
-        "notes": "Available but adds latency; best avoided on hot paths",
-    },
-}
-
-
-def _dispatch(name: str, args: dict) -> str:
-    if name == "get_db_benchmarks":
-        key = (args["database"].lower(), args["workload"].lower())
-        return json.dumps(_BENCHMARKS.get(key, {"error": f"no data for {key}"}))
-
-    if name == "get_case_studies":
-        db = args["database"].lower()
-        return json.dumps(_CASE_STUDIES.get(db, {"error": f"unknown db '{db}'"}))
-
-    if name == "check_feature_support":
-        key = (args["database"].lower(), args["feature"].lower())
-        for k, v in _FEATURES.items():
-            if k[0] == key[0] and k[1] in key[1]:
-                return json.dumps(v)
-        return json.dumps({"error": f"feature '{args['feature']}' not in dataset"})
-
-    return json.dumps({"error": f"unknown tool '{name}'"})
-
-
-# ---------------------------------------------------------------------------
-# Task runner — one independent conversation per task
-# ---------------------------------------------------------------------------
-
-
-async def run_task(
-    client: AsyncOpenAI,
-    task_name: str,
-    prompt: str,
-    session_id: str | None,
-) -> tuple[str, str]:
-    """
-    Run a single research task with its own tool-calling loop.
-
-    Each task is an independent conversation so the router sees only
-    this task's intent — not the accumulated context of previous tasks.
-    Model affinity via X-Model-Affinity pins the model from the first task
-    onward, so all tasks stay on the same model.
-
-    Returns (answer, first_model_used).
-    """
-    headers = {"X-Model-Affinity": session_id} if session_id else {}
-    messages: list[ChatCompletionMessageParam] = [
-        {"role": "system", "content": SYSTEM_PROMPT},
-        {"role": "user", "content": prompt},
-    ]
-    first_model: str | None = None
-
-    while True:
-        resp = await client.chat.completions.create(
-            model="gpt-4o-mini",  # Plano's router overrides this via routing_preferences
-            messages=messages,
-            tools=TOOLS,
-            tool_choice="auto",
-            max_completion_tokens=600,
-            extra_headers=headers or None,
-        )
-        if first_model is None:
-            first_model = resp.model
-
-        log.info(
-            "task=%s model=%s finish=%s",
-            task_name,
-            resp.model,
-            resp.choices[0].finish_reason,
-        )
-
-        choice = resp.choices[0]
-        if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
-            messages.append(choice.message)
-            for tc in choice.message.tool_calls:
-                args = json.loads(tc.function.arguments or "{}")
-                result = _dispatch(tc.function.name, args)
-                log.info(" tool %s(%s)", tc.function.name, args)
-                messages.append(
-                    {"role": "tool", "content": result, "tool_call_id": tc.id}
-                )
-        else:
-            return (choice.message.content or "").strip(), first_model or "unknown"
-
-
-# ---------------------------------------------------------------------------
-# Research loop — runs all tasks, threading context forward
-# ---------------------------------------------------------------------------
-
-
-async def run_research_loop(
-    client: AsyncOpenAI,
-    session_id: str | None,
-) -> tuple[str, list[dict]]:
-    """
-    Run all 3 research tasks in sequence, passing each task's output as
-    context to the next. Returns (final_answer, routing_trace).
-    """
-    context = ""
-    trace: list[dict] = []
-    final_answer = ""
-
-    for task in TASKS:
-        prompt = task["prompt"].format(context=context)
-        answer, model = await run_task(client, task["name"], prompt, session_id)
-        trace.append({"task": task["name"], "model": model})
-        context += f"\n### {task['name']}\n{answer}\n"
-        final_answer = answer
-
-    return final_answer, trace
-
-
-# ---------------------------------------------------------------------------
-# FastAPI app
-# ---------------------------------------------------------------------------
-
-app = FastAPI(title="Research Agent", version="1.0.0")
-
-
-@app.post("/v1/chat/completions")
-async def chat(request: Request) -> JSONResponse:
-    body = await request.json()
-    session_id: str | None = request.headers.get("x-model-affinity")
-
-    log.info("request session_id=%s", session_id or "none")
-
-    client = AsyncOpenAI(base_url=f"{PLANO_URL}/v1", api_key="EMPTY")
-    answer, trace = await run_research_loop(client, session_id)
-
-    return JSONResponse(
-        {
-            "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
-            "object": "chat.completion",
-            "choices": [
-                {
-                    "index": 0,
-                    "message": {"role": "assistant", "content": answer},
-                    "finish_reason": "stop",
-                }
-            ],
-            "routing_trace": trace,
-            "session_id": session_id,
-        }
-    )
-
-
-@app.get("/health")
-async def health() -> dict:
-    return {"status": "ok", "plano_url": PLANO_URL}
-
-
-# ---------------------------------------------------------------------------
-# Entry point
-# ---------------------------------------------------------------------------
-
-if __name__ == "__main__":
-    log.info("starting on port %d plano=%s", PORT, PLANO_URL)
-    uvicorn.run(app, host="0.0.0.0", port=PORT, log_level="warning")
diff --git a/demos/llm_routing/model_affinity/config.yaml b/demos/llm_routing/model_affinity/config.yaml
deleted file mode 100644
index 7b98b25b..00000000
--- a/demos/llm_routing/model_affinity/config.yaml
+++ /dev/null
@@ -1,27 +0,0 @@
-version: v0.3.0
-
-listeners:
-  - type: model
-    name: model_listener
-    port: 12000
-
-model_providers:
-
-  - model: openai/gpt-4o-mini
-    access_key: $OPENAI_API_KEY
-    default: true
-
-  - model: openai/gpt-4o
-    access_key: $OPENAI_API_KEY
-    routing_preferences:
-      - name: complex_reasoning
-        description: complex reasoning tasks, multi-step analysis, or detailed explanations
-
-  - model: anthropic/claude-sonnet-4-20250514
-    access_key: $ANTHROPIC_API_KEY
-    routing_preferences:
-      - name: code_generation
-        description: generating new code, writing functions, or creating boilerplate
-
-tracing:
-  random_sampling: 100
diff --git a/demos/llm_routing/model_affinity/demo.py b/demos/llm_routing/model_affinity/demo.py
deleted file mode 100644
index f01a5a31..00000000
--- a/demos/llm_routing/model_affinity/demo.py
+++ /dev/null
@@ -1,307 +0,0 @@
-#!/usr/bin/env -S uv run --script
-# /// script
-# requires-python = ">=3.12"
-# dependencies = ["openai>=1.0.0"]
-# ///
-"""
-Model Affinity Demo — Agentic Tool-Calling Loop
-
-Runs the same agentic loop twice through Plano:
-  1. Without model affinity — the router may pick different models per turn
-  2. With model affinity — all turns use the model selected on turn 1
-
-Each loop is a real tool-calling agent: the LLM decides which tools to call,
-we provide simulated results, and the LLM continues until it has enough
-information to produce a final answer. Each turn is a separate request to
-Plano, so the router classifies intent independently every time.
-
-Usage:
-  planoai up config.yaml   # start Plano
-  uv run demo.py           # run this demo
-"""
-
-import asyncio
-import json
-import os
-import uuid
-
-from openai import AsyncOpenAI
-from openai.types.chat import ChatCompletionMessageParam
-
-PLANO_URL = os.environ.get("PLANO_URL", "http://localhost:12000")
-
-SYSTEM_PROMPT = (
-    "You are a database selection analyst. Use the provided tools to gather "
-    "benchmark data and case studies, then recommend PostgreSQL or MongoDB "
-    "for a high-traffic e-commerce backend. Be concise."
-)
-
-USER_QUERY = (
-    "Should we use PostgreSQL or MongoDB for our e-commerce platform? "
-    "We need strong consistency for orders but flexible schemas for products. "
-    "Use the tools to research both options, then give a recommendation."
-)
-
-TOOLS = [
-    {
-        "type": "function",
-        "function": {
-            "name": "get_db_benchmarks",
-            "description": "Fetch performance benchmarks for a database under a given workload.",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {
-                        "type": "string",
-                        "enum": ["postgresql", "mongodb"],
-                    },
-                    "workload": {
-                        "type": "string",
-                        "enum": ["read_heavy", "write_heavy", "mixed"],
-                    },
-                },
-                "required": ["database", "workload"],
-            },
-        },
-    },
-    {
-        "type": "function",
-        "function": {
-            "name": "get_case_studies",
-            "description": "Retrieve real-world e-commerce case studies for a database.",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {
-                        "type": "string",
-                        "enum": ["postgresql", "mongodb"],
-                    },
-                },
-                "required": ["database"],
-            },
-        },
-    },
-    {
-        "type": "function",
-        "function": {
-            "name": "check_feature_support",
-            "description": "Check if a database supports a specific feature.",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "database": {
-                        "type": "string",
-                        "enum": ["postgresql", "mongodb"],
-                    },
-                    "feature": {"type": "string"},
-                },
-                "required": ["database", "feature"],
-            },
-        },
-    },
-]
-
-# Simulated tool responses
-_BENCHMARKS = {
-    ("postgresql", "mixed"): {
-        "read_qps": 42000,
-        "write_qps": 21000,
-        "p99_ms": 6,
-        "notes": "Solid all-round; MVCC keeps reads non-blocking",
-    },
-    ("mongodb", "mixed"): {
-        "read_qps": 60000,
-        "write_qps": 50000,
-        "p99_ms": 3,
-        "notes": "Flexible schema accelerates feature iteration",
-    },
-}
-
-_CASE_STUDIES = {
-    "postgresql": [
-        {"company": "Shopify", "notes": "Moved orders back to Postgres for ACID"},
-        {
-            "company": "Zalando",
-            "notes": "Postgres + Citus for sharded order processing",
-        },
-    ],
-    "mongodb": [
-        {"company": "eBay", "notes": "Product catalogue — flexible attribute schemas"},
-        {"company": "Alibaba", "notes": "Session/cart data — high write throughput"},
-    ],
-}
-
-_FEATURES = {
-    ("postgresql", "acid transactions"): {"supported": True, "notes": "Full ACID"},
-    ("mongodb", "acid transactions"): {
-        "supported": True,
-        "notes": "Multi-doc ACID since v4.0",
-    },
-    ("postgresql", "horizontal sharding"): {
-        "supported": True,
-        "notes": "Via Citus extension",
-    },
-    ("mongodb", "horizontal sharding"): {
-        "supported": True,
-        "notes": "Native auto-balancing",
-    },
-}
-
-
-def dispatch_tool(name: str, args: dict) -> str:
-    if name == "get_db_benchmarks":
-        key = (args["database"], args["workload"])
-        return json.dumps(_BENCHMARKS.get(key, {"error": f"no data for {key}"}))
-    if name == "get_case_studies":
-        return json.dumps(_CASE_STUDIES.get(args["database"], {"error": "unknown db"}))
-    if name == "check_feature_support":
-        key = (args["database"], args["feature"].lower())
-        for k, v in _FEATURES.items():
-            if k[0] == key[0] and k[1] in key[1]:
-                return json.dumps(v)
-        return json.dumps({"error": f"no data for {key}"})
-    return json.dumps({"error": f"unknown tool {name}"})
-
-
-# ---------------------------------------------------------------------------
-# Agentic loop — runs tool calls until the LLM produces a final answer
-# ---------------------------------------------------------------------------
-
-
-async def run_agent_loop(
-    affinity_id: str | None = None,
-    max_turns: int = 10,
-) -> tuple[str, list[dict]]:
-    """
-    Run a tool-calling agent loop against Plano.
-
-    Returns (final_answer, trace) where trace is a list of
-    {"turn": int, "model": str, "tool_calls": [...]} dicts.
-    """
-    client = AsyncOpenAI(base_url=f"{PLANO_URL}/v1", api_key="EMPTY")
-    headers = {"X-Model-Affinity": affinity_id} if affinity_id else None
-
-    messages: list[ChatCompletionMessageParam] = [
-        {"role": "system", "content": SYSTEM_PROMPT},
-        {"role": "user", "content": USER_QUERY},
-    ]
-    trace: list[dict] = []
-
-    for turn in range(1, max_turns + 1):
-        resp = await client.chat.completions.create(
-            model="gpt-4o-mini",
-            messages=messages,
-            tools=TOOLS,
-            tool_choice="auto",
-            max_completion_tokens=800,
-            extra_headers=headers,
-        )
-
-        choice = resp.choices[0]
-        turn_info: dict = {"turn": turn, "model": resp.model}
-
-        if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
-            tool_names = [tc.function.name for tc in choice.message.tool_calls]
-            turn_info["tool_calls"] = tool_names
-            trace.append(turn_info)
-
-            messages.append(choice.message)
-            for tc in choice.message.tool_calls:
-                args = json.loads(tc.function.arguments or "{}")
-                result = dispatch_tool(tc.function.name, args)
-                messages.append(
-                    {"role": "tool", "content": result, "tool_call_id": tc.id}
-                )
-        else:
-            turn_info["tool_calls"] = []
-            trace.append(turn_info)
-            return (choice.message.content or "").strip(), trace
-
-    return "(max turns reached)", trace
-
-
-# ---------------------------------------------------------------------------
-# Display helpers
-# ---------------------------------------------------------------------------
-
-
-def short_model(model: str) -> str:
-    return model.split("/")[-1] if "/" in model else model
-
-
-def print_trace(trace: list[dict]) -> None:
-    for t in trace:
-        model = short_model(t["model"])
-        tools = ", ".join(t["tool_calls"]) if t["tool_calls"] else "final answer"
-        print(f"    turn {t['turn']} [{model:<30}] {tools}")
-
-
-def print_summary(label: str, trace: list[dict]) -> None:
-    models = [t["model"] for t in trace]
-    unique = set(models)
-    if len(unique) == 1:
-        print(
-            f"  ✓ {label}: {short_model(next(iter(unique)))} "
-            f"for all {len(models)} turns"
-        )
-    else:
-        switches = sum(1 for a, b in zip(models, models[1:]) if a != b)
-        names = ", ".join(sorted(short_model(m) for m in unique))
-        print(f"  ✗ {label}: model switched {switches} time(s) — {names}")
-
-
-# ---------------------------------------------------------------------------
-# Main
-# ---------------------------------------------------------------------------
-
-
-async def main() -> None:
-    print()
-    print("  ╔══════════════════════════════════════════════════════════╗")
-    print("  ║            Model Affinity Demo — Agentic Loop            ║")
-    print("  ╚══════════════════════════════════════════════════════════╝")
-    print()
-    print(f"  Plano : {PLANO_URL}")
-    print(f'  Query : "{USER_QUERY[:65]}…"')
-    print()
-    print("  The agent calls tools (get_db_benchmarks, get_case_studies,")
-    print("  check_feature_support) across multiple turns. Each turn is")
-    print("  a separate request to Plano — the router classifies intent")
-    print("  independently, so different turns may get different models.")
-    print()
-
-    # --- Run 1: without affinity ---
-    print("  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
-    print("  Run 1: WITHOUT Model Affinity")
-    print("  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
-    print()
-    answer1, trace1 = await run_agent_loop(affinity_id=None)
-    print_trace(trace1)
-    print()
-    print_summary("Without affinity", trace1)
-    print()
-
-    # --- Run 2: with affinity ---
-    aid = str(uuid.uuid4())
-    print("  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
-    print(f"  Run 2: WITH Model Affinity (X-Model-Affinity: {aid[:8]}…)")
-    print("  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
-    print()
-    answer2, trace2 = await run_agent_loop(affinity_id=aid)
-    print_trace(trace2)
-    print()
-    print_summary("With affinity ", trace2)
-    print()
-
-    # --- Final answer ---
-    print("  ══ Agent recommendation (affinity session) ════════════════")
-    print()
-    for line in answer2.splitlines():
-        print(f"    {line}")
-    print()
-    print("  ═══════════════════════════════════════════════════════════")
-    print()
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
diff --git a/demos/llm_routing/model_affinity/demo.sh b/demos/llm_routing/model_affinity/demo.sh
deleted file mode 100755
index 3ce50b3c..00000000
--- a/demos/llm_routing/model_affinity/demo.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/bash
-set -e
-
-SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-
-# Run the demo directly against Plano (no agent server needed)
-uv run "$SCRIPT_DIR/demo.py"
diff --git a/demos/llm_routing/model_affinity/start_agents.sh b/demos/llm_routing/model_affinity/start_agents.sh
deleted file mode 100755
index 5baaa378..00000000
--- a/demos/llm_routing/model_affinity/start_agents.sh
+++ /dev/null
@@ -1,28 +0,0 @@
-#!/bin/bash
-set -e
-
-SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-PIDS=()
-
-log() { echo "$(date '+%F %T') - $*"; }
-
-cleanup() {
-    log "Stopping agents..."
-    for PID in "${PIDS[@]}"; do
-        kill "$PID" 2>/dev/null && log "Stopped process $PID"
-    done
-    exit 0
-}
-
-trap cleanup EXIT INT TERM
-
-export PLANO_URL="${PLANO_URL:-http://localhost:12000}"
-export AGENT_PORT="${AGENT_PORT:-8000}"
-
-log "Starting research_agent on port $AGENT_PORT..."
-uv run "$SCRIPT_DIR/agent.py" &
-PIDS+=($!)
-
-for PID in "${PIDS[@]}"; do
-    wait "$PID"
-done
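The deleted README describes affinity as follows. The first request with a given `X-Model-Affinity` ID is routed normally and the decision is cached under that ID. Later requests with the same ID get the cached model, bounded by `session_ttl_seconds` and `session_max_entries`. Brightstaff's actual implementation is not part of this diff, so what follows is only a minimal Python sketch of that behaviour under stated assumptions. `AffinityCache`, `resolve`, and the injectable `clock` are hypothetical names. Evicting the least recently used entry once the cap is exceeded is also an assumption, since the README states the limit but not the eviction policy.

```python
from __future__ import annotations

import time
from collections import OrderedDict
from typing import Callable


class AffinityCache:
    """Hypothetical sketch: pin one routing decision per affinity ID.

    Assumptions (not from Plano's source): an entry expires ttl_seconds after
    it was routed, and once max_entries is exceeded the least recently used
    entry is evicted.
    """

    def __init__(
        self,
        ttl_seconds: float = 600,
        max_entries: int = 10_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self._clock = clock
        # affinity_id -> (model, expiry timestamp); least recently used first
        self._entries: OrderedDict[str, tuple[str, float]] = OrderedDict()

    def resolve(self, affinity_id: str | None, route: Callable[[], str]) -> str:
        """Return the pinned model for affinity_id, calling route() only on a miss."""
        if affinity_id is None:
            return route()  # no header: run the router fresh every time

        now = self._clock()
        cached = self._entries.get(affinity_id)
        if cached is not None and cached[1] > now:
            self._entries.move_to_end(affinity_id)  # mark as recently used
            return cached[0]

        # Miss or expired entry: run the router once and pin its choice.
        model = route()
        self._entries[affinity_id] = (model, now + self.ttl_seconds)
        self._entries.move_to_end(affinity_id)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop least recently used entry
        return model


if __name__ == "__main__":
    # Stand-in router that alternates models, like the unpinned demo run.
    picks = iter(["claude-sonnet", "gpt-4o", "claude-sonnet"])
    cache = AffinityCache(ttl_seconds=600)
    turns = [cache.resolve("session-1", lambda: next(picks)) for _ in range(3)]
    print(turns)  # ['claude-sonnet', 'claude-sonnet', 'claude-sonnet']
```

Injecting `clock` lets expiry be exercised without sleeping. A call with `affinity_id=None` always re-runs the router, which mirrors the README's point that clients that omit the header see no change.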