Merge branch 'main' into adil/run-demos-without-docker

2026-05-12 01:02:56 +02:00 · 2026-03-11 11:54:34 -07:00 · 2026-03-11 11:54:34 -07:00 · 2bb0826a51
commit 2bb0826a51
parent e336150c38 6610097659
26 changed files with 2142 additions and 217 deletions
--- a/demos/README.md
+++ b/demos/README.md
@@ -16,6 +16,7 @@ This directory contains demos showcasing Plano's capabilities as an AI-native pr
 | [Preference-Based Routing](llm_routing/preference_based_routing/) | Routes prompts to LLMs based on user-defined preferences and task type (e.g. code generation vs. understanding) |
 | [Model Alias Routing](llm_routing/model_alias_routing/) | Maps semantic aliases (`arch.summarize.v1`) to provider-specific models for centralized governance |
 | [Claude Code Router](llm_routing/claude_code_router/) | Extends Claude Code with multi-provider access and preference-aligned routing for coding tasks |
+| [Codex Router](llm_routing/codex_router/) | Extends Codex CLI with multi-provider access and preference-aligned routing for coding tasks |

 ## Agent Orchestration

--- a/demos/llm_routing/codex_router/README.md
+++ b/demos/llm_routing/codex_router/README.md
@@ -0,0 +1,92 @@
+# Codex Router - Multi-Model Access with Intelligent Routing
+
+Plano extends Codex CLI to access multiple LLM providers through a single interface. This gives you:
+
+1. **Access to Models**: Connect to OpenAI, Anthropic, xAI, Gemini, and local models via Ollama
+2. **Intelligent Routing via Preferences for Coding Tasks**: Configure which models handle specific development tasks:
+   - Code generation and implementation
+   - Code understanding and analysis
+   - Debugging and optimization
+   - Architecture and system design
+
+Plano uses a [1.5B preference-aligned router LLM](https://arxiv.org/abs/2506.16655) to automatically select the best model for each request type.
+
+## Benefits
+
+- **Single Interface**: Access multiple LLM providers through the same Codex CLI
+- **Task-Aware Routing**: Requests are analyzed and routed to models based on task type (code generation vs code understanding)
+- **Provider Flexibility**: Add or remove providers without changing your workflow
+- **Routing Transparency**: See which model handles each request and why
+
+## Quick Start
+
+### Prerequisites
+
+```bash
+# Install Codex CLI
+npm install -g @openai/codex
+
+# Install Plano CLI
+pip install planoai
+```
+
+### Step 1: Open the Demo
+
+```bash
+git clone https://github.com/katanemo/arch.git
+cd arch/demos/llm_routing/codex_router
+```
+
+### Step 2: Set API Keys
+
+```bash
+export OPENAI_API_KEY="your-openai-key-here"
+export ANTHROPIC_API_KEY="your-anthropic-key-here"
+export GROK_API_KEY="your-xai-key-here"
+export GEMINI_API_KEY="your-gemini-key-here"
+```
+
+### Step 3: Start Plano
+
+```bash
+planoai up
+# or: uvx planoai up
+```
+
+### Step 4: Launch Codex Through Plano
+
+```bash
+planoai cli-agent codex
+# or: uvx planoai cli-agent codex
+```
+
+By default, `planoai cli-agent codex` starts Codex with `gpt-5.3-codex`. With this demo config:
+
+- `project understanding` prompts are routed to `xai/grok-4-1-fast-non-reasoning`
+- `code generation` prompts are routed to `openai/gpt-5.3-codex`
+
+## Monitor Routing Decisions
+
+In a second terminal:
+
+```bash
+sh pretty_model_resolution.sh
+```
+
+This shows the model each request asked for and the final model selected by Plano's router.
+
+## Configuration Highlights
+
+`config.yaml` demonstrates:
+
+- OpenAI default model for Codex sessions (`gpt-5.3-codex`)
+- Routing preference for project understanding (`xai/grok-4-1-fast-non-reasoning`)
+- Additional providers (a local Ollama model, plus a commented-out Anthropic entry) to show cross-provider routing support
+
+## Optional Overrides
+
+Set a different Codex session model:
+
+```bash
+planoai cli-agent codex --settings='{"CODEX_MODEL":"gpt-5-2025-08-07"}'
+```
--- a/demos/llm_routing/codex_router/config.yaml
+++ b/demos/llm_routing/codex_router/config.yaml
@@ -0,0 +1,38 @@
+version: v0.3.0
+
+listeners:
+  - type: model
+    name: model_listener
+    port: 12000
+
+model_providers:
+  # OpenAI models used by Codex defaults and preference routing
+  - model: openai/gpt-5.3-codex
+    default: true
+    access_key: $OPENAI_API_KEY
+    routing_preferences:
+      - name: code generation
+        description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
+
+  - model: xai/grok-4-1-fast-non-reasoning
+    access_key: $GROK_API_KEY
+    routing_preferences:
+      - name: project understanding
+        description: understand repository structure, codebase, and code files, readmes, and other documentation
+
+  # Additional providers (optional): Codex can route to any configured model
+  # - model: anthropic/claude-sonnet-4-5
+  #   access_key: $ANTHROPIC_API_KEY
+
+  # - model: xai/grok-4-1-fast-non-reasoning
+  #   access_key: $GROK_API_KEY
+
+  - model: ollama/llama3.1
+    base_url: http://localhost:11434
+
+model_aliases:
+  arch.codex.default:
+    target: gpt-5.3-codex
+
+tracing:
+  random_sampling: 100
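
Review note: this config reads provider keys from environment variables through `access_key: $NAME` entries. The sketch below lists which referenced variables are unset before running `planoai up`. It is a minimal illustration, not part of the Plano CLI: `referenced_env_vars` and `missing_env_vars` are hypothetical helpers, and the regex assumes the simple one-line `access_key: $NAME` form used in this file.

```python
import os
import re

# Matches active (uncommented) lines such as "    access_key: $OPENAI_API_KEY".
# Lines starting with "#" do not match because only whitespace may precede the key.
ACCESS_KEY_RE = re.compile(r"^\s*access_key:\s*\$(\w+)\s*$")


def referenced_env_vars(config_text: str) -> list:
    """Return env var names used by active access_key lines, in order, without duplicates."""
    names = []
    for line in config_text.splitlines():
        match = ACCESS_KEY_RE.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def missing_env_vars(config_text: str, env=None) -> list:
    """Return the referenced variables that are unset or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in referenced_env_vars(config_text) if not env.get(name)]
```

Calling `missing_env_vars(open("config.yaml").read())` from the demo directory would report any key the config expects that the shell has not exported.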
--- a/demos/llm_routing/codex_router/pretty_model_resolution.sh
+++ b/demos/llm_routing/codex_router/pretty_model_resolution.sh
@@ -0,0 +1,33 @@
+#!/usr/bin/env bash
+# Pretty-print Plano MODEL_RESOLUTION lines from docker logs
+# - hides Arch-Router
+# - prints timestamp
+# - colors MODEL_RESOLUTION red
+# - colors req_model cyan
+# - colors resolved_model magenta
+# - removes provider and streaming
+
+docker logs -f plano 2>&1 \
+| awk '
+/MODEL_RESOLUTION:/ && $0 !~ /Arch-Router/ {
+  # extract timestamp between first [ and ]
+  ts=""
+  if (match($0, /\[[0-9-]+ [0-9:.]+\]/)) {
+    ts=substr($0, RSTART+1, RLENGTH-2)
+  }
+
+  # split out after MODEL_RESOLUTION:
+  n = split($0, parts, /MODEL_RESOLUTION: */)
+  line = parts[2]
+
+  # remove provider and streaming fields
+  sub(/ *provider='\''[^'\'']+'\''/, "", line)
+  sub(/ *streaming=(true|false)/, "", line)
+
+  # highlight fields
+  gsub(/req_model='\''[^'\'']+'\''/, "\033[36m&\033[0m", line)
+  gsub(/resolved_model='\''[^'\'']+'\''/, "\033[35m&\033[0m", line)
+
+  # print timestamp + MODEL_RESOLUTION
+  printf "\033[90m[%s]\033[0m \033[31mMODEL_RESOLUTION\033[0m: %s\n", ts, line
+}'
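
Review note: the awk filter above can be hard to verify by eye, so here is the same extraction logic as a small Python function that can be unit-tested. This is a sketch under assumptions: the log line layout (`[timestamp] ... MODEL_RESOLUTION: key='value' ...`) is inferred from the awk patterns rather than from Plano's documentation, and `parse_model_resolution` is a hypothetical name.

```python
import re

# Same timestamp shape the awk script matches: "[YYYY-MM-DD HH:MM:SS.fff]".
TIMESTAMP_RE = re.compile(r"\[([0-9-]+ [0-9:.]+)\]")
# key='value' pairs, e.g. req_model='...' or resolved_model='...'.
FIELD_RE = re.compile(r"(\w+)='([^']+)'")


def parse_model_resolution(line: str):
    """Return (timestamp, req_model, resolved_model), or None for lines the script skips."""
    if "MODEL_RESOLUTION:" not in line or "Arch-Router" in line:
        return None
    ts_match = TIMESTAMP_RE.search(line)
    timestamp = ts_match.group(1) if ts_match else ""
    # Only look at the text after the marker, as the awk split() does.
    payload = line.split("MODEL_RESOLUTION:", 1)[1]
    fields = dict(FIELD_RE.findall(payload))
    return timestamp, fields.get("req_model"), fields.get("resolved_model")
```

Unlike the shell script, this version drops the `provider` and `streaming` fields by simply not returning them, and it leaves coloring to the caller.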
--- a/demos/llm_routing/model_routing_service/README.md
+++ b/demos/llm_routing/model_routing_service/README.md
@@ -0,0 +1,92 @@
+# Model Routing Service Demo
+
+This demo shows how to use the `/routing/v1/*` endpoints to get routing decisions without proxying requests to an LLM. The endpoint accepts standard LLM request formats and returns which model Plano's router would select.
+
+## Setup
+
+Make sure you have Plano CLI installed (`pip install planoai` or `uv tool install planoai`).
+
+```bash
+export OPENAI_API_KEY=<your-key>
+export ANTHROPIC_API_KEY=<your-key>
+```
+
+Start Plano:
+```bash
+cd demos/llm_routing/model_routing_service
+planoai up config.yaml
+```
+
+## Run the demo
+
+```bash
+./demo.sh
+```
+
+## Endpoints
+
+All three LLM API formats are supported:
+
+| Endpoint | Format |
+|---|---|
+| `POST /routing/v1/chat/completions` | OpenAI Chat Completions |
+| `POST /routing/v1/messages` | Anthropic Messages |
+| `POST /routing/v1/responses` | OpenAI Responses API |
+
+## Example
+
+```bash
+curl http://localhost:12000/routing/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o-mini",
+    "messages": [{"role": "user", "content": "Write a Python function for binary search"}]
+  }'
+```
+
+Response:
+```json
+{
+    "model": "anthropic/claude-sonnet-4-20250514",
+    "route": "code_generation",
+    "trace_id": "c16d1096c1af4a17abb48fb182918a88"
+}
+```
+
+The response tells you which model would handle this request and which route was matched, without actually making the LLM call.
+
+## Demo Output
+
+```
+=== Model Routing Service Demo ===
+
+--- 1. Code generation query (OpenAI format) ---
+{
+    "model": "anthropic/claude-sonnet-4-20250514",
+    "route": "code_generation",
+    "trace_id": "c16d1096c1af4a17abb48fb182918a88"
+}
+
+--- 2. Complex reasoning query (OpenAI format) ---
+{
+    "model": "openai/gpt-4o",
+    "route": "complex_reasoning",
+    "trace_id": "30795e228aff4d7696f082ed01b75ad4"
+}
+
+--- 3. Simple query - no routing match (OpenAI format) ---
+{
+    "model": "none",
+    "route": null,
+    "trace_id": "ae0b6c3b220d499fb5298ac63f4eac0e"
+}
+
+--- 4. Code generation query (Anthropic format) ---
+{
+    "model": "anthropic/claude-sonnet-4-20250514",
+    "route": "code_generation",
+    "trace_id": "26be822bbdf14a3ba19fe198e55ea4a9"
+}
+
+=== Demo Complete ===
+```
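
Review note: for callers that want the routing decision programmatically rather than through `curl`, a minimal Python sketch follows. It uses only the standard library and the endpoint, port, and response fields shown in this README. The helper names are hypothetical, and mapping the `"none"` model value to `None` is an assumption about how a client might want to represent "no routing match".

```python
import json
import urllib.request

PLANO_URL = "http://localhost:12000"  # listener port from this demo's config.yaml


def build_request(prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Build an OpenAI Chat Completions style body for the routing endpoint."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def parse_decision(body: str):
    """Return (model, route); model is None when the response reports "none"."""
    data = json.loads(body)
    model = data.get("model")
    if model == "none":
        model = None
    return model, data.get("route")


def get_routing_decision(prompt: str, base_url: str = PLANO_URL):
    """POST the prompt to /routing/v1/chat/completions and parse the decision."""
    request = urllib.request.Request(
        f"{base_url}/routing/v1/chat/completions",
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_decision(response.read().decode("utf-8"))
```

With Plano running, `get_routing_decision("Write a Python function for binary search")` issues the same request as the curl example in this README.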
--- a/demos/llm_routing/model_routing_service/config.yaml
+++ b/demos/llm_routing/model_routing_service/config.yaml
@@ -0,0 +1,27 @@
+version: v0.3.0
+
+listeners:
+  - type: model
+    name: model_listener
+    port: 12000
+
+model_providers:
+
+  - model: openai/gpt-4o-mini
+    access_key: $OPENAI_API_KEY
+    default: true
+
+  - model: openai/gpt-4o
+    access_key: $OPENAI_API_KEY
+    routing_preferences:
+      - name: complex_reasoning
+        description: complex reasoning tasks, multi-step analysis, or detailed explanations
+
+  - model: anthropic/claude-sonnet-4-20250514
+    access_key: $ANTHROPIC_API_KEY
+    routing_preferences:
+      - name: code_generation
+        description: generating new code, writing functions, or creating boilerplate
+
+tracing:
+  random_sampling: 100
--- a/demos/llm_routing/model_routing_service/demo.sh
+++ b/demos/llm_routing/model_routing_service/demo.sh
@@ -0,0 +1,120 @@
+#!/bin/bash
+set -euo pipefail
+
+PLANO_URL="${PLANO_URL:-http://localhost:12000}"
+
+echo "=== Model Routing Service Demo ==="
+echo ""
+echo "This demo shows how to use the /routing/v1/* endpoints to get"
+echo "routing decisions without actually proxying the request to an LLM."
+echo ""
+
+# --- Example 1: OpenAI Chat Completions format ---
+echo "--- 1. Code generation query (OpenAI format) ---"
+echo ""
+curl -s "$PLANO_URL/routing/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o-mini",
+    "messages": [
+      {"role": "user", "content": "Write a Python function that implements binary search on a sorted array"}
+    ]
+  }' | python3 -m json.tool
+echo ""
+
+# --- Example 2: Complex reasoning query ---
+echo "--- 2. Complex reasoning query (OpenAI format) ---"
+echo ""
+curl -s "$PLANO_URL/routing/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o-mini",
+    "messages": [
+      {"role": "user", "content": "Explain the trade-offs between microservices and monolithic architectures, considering scalability, team structure, and operational complexity"}
+    ]
+  }' | python3 -m json.tool
+echo ""
+
+# --- Example 3: Simple query (no routing match) ---
+echo "--- 3. Simple query - no routing match (OpenAI format) ---"
+echo ""
+curl -s "$PLANO_URL/routing/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o-mini",
+    "messages": [
+      {"role": "user", "content": "What is the capital of France?"}
+    ]
+  }' | python3 -m json.tool
+echo ""
+
+# --- Example 4: Anthropic Messages format ---
+echo "--- 4. Code generation query (Anthropic format) ---"
+echo ""
+curl -s "$PLANO_URL/routing/v1/messages" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o-mini",
+    "max_tokens": 1024,
+    "messages": [
+      {"role": "user", "content": "Create a REST API endpoint in Rust using actix-web that handles user registration"}
+    ]
+  }' | python3 -m json.tool
+echo ""
+
+# --- Example 5: Inline routing policy in request body ---
+echo "--- 5. Inline routing_policy (no config needed) ---"
+echo ""
+curl -s "$PLANO_URL/routing/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o-mini",
+    "messages": [
+      {"role": "user", "content": "Write a quicksort implementation in Go"}
+    ],
+    "routing_policy": [
+      {
+        "model": "openai/gpt-4o",
+        "routing_preferences": [
+          {"name": "coding", "description": "code generation, writing functions, debugging"}
+        ]
+      },
+      {
+        "model": "openai/gpt-4o-mini",
+        "routing_preferences": [
+          {"name": "general", "description": "general questions, simple lookups, casual conversation"}
+        ]
+      }
+    ]
+  }' | python3 -m json.tool
+echo ""
+
+# --- Example 6: Inline routing policy with Anthropic format ---
+echo "--- 6. Inline routing_policy (Anthropic format) ---"
+echo ""
+curl -s "$PLANO_URL/routing/v1/messages" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o-mini",
+    "max_tokens": 1024,
+    "messages": [
+      {"role": "user", "content": "What is the weather like today?"}
+    ],
+    "routing_policy": [
+      {
+        "model": "openai/gpt-4o",
+        "routing_preferences": [
+          {"name": "coding", "description": "code generation, writing functions, debugging"}
+        ]
+      },
+      {
+        "model": "openai/gpt-4o-mini",
+        "routing_preferences": [
+          {"name": "general", "description": "general questions, simple lookups, casual conversation"}
+        ]
+      }
+    ]
+  }' | python3 -m json.tool
+echo ""
+
+echo "=== Demo Complete ==="
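
Review note: examples 5 and 6 repeat the same inline `routing_policy` JSON by hand. The sketch below shows one way to generate that structure from a compact mapping. The field names (`routing_policy`, `model`, `routing_preferences`, `name`, `description`) are copied from the script above; the helper names and the input mapping shape are hypothetical.

```python
def build_routing_policy(preferences: dict) -> list:
    """Turn {model: [(name, description), ...]} into the routing_policy list."""
    return [
        {
            "model": model,
            "routing_preferences": [
                {"name": name, "description": description}
                for name, description in prefs
            ],
        }
        for model, prefs in preferences.items()
    ]


def build_request_with_policy(prompt: str, preferences: dict, model: str = "gpt-4o-mini") -> dict:
    """Build a Chat Completions style body carrying an inline routing_policy."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "routing_policy": build_routing_policy(preferences),
    }


# The policy used in examples 5 and 6 of demo.sh.
DEMO_PREFERENCES = {
    "openai/gpt-4o": [("coding", "code generation, writing functions, debugging")],
    "openai/gpt-4o-mini": [("general", "general questions, simple lookups, casual conversation")],
}
```

Serializing `build_request_with_policy("Write a quicksort implementation in Go", DEMO_PREFERENCES)` with `json.dumps` yields a body equivalent to the one sent in example 5.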