Merge branch 'main' into adil/run-demos-without-docker

commit 2bb0826a51
26 changed files with 2142 additions and 217 deletions
@@ -16,6 +16,7 @@ This directory contains demos showcasing Plano's capabilities as an AI-native pr

| [Preference-Based Routing](llm_routing/preference_based_routing/) | Routes prompts to LLMs based on user-defined preferences and task type (e.g. code generation vs. understanding) |
| [Model Alias Routing](llm_routing/model_alias_routing/) | Maps semantic aliases (`arch.summarize.v1`) to provider-specific models for centralized governance |
| [Claude Code Router](llm_routing/claude_code_router/) | Extends Claude Code with multi-provider access and preference-aligned routing for coding tasks |
| [Codex Router](llm_routing/codex_router/) | Extends Codex CLI with multi-provider access and preference-aligned routing for coding tasks |

## Agent Orchestration
demos/llm_routing/codex_router/README.md (new file, 92 lines)
# Codex Router - Multi-Model Access with Intelligent Routing

Plano extends Codex CLI to access multiple LLM providers through a single interface. This gives you:

1. **Multi-Model Access**: Connect to OpenAI, Anthropic, xAI, Gemini, and local models via Ollama
2. **Preference-Based Routing for Coding Tasks**: Configure which models handle specific development tasks:
   - Code generation and implementation
   - Code understanding and analysis
   - Debugging and optimization
   - Architecture and system design

Plano uses a [1.5B preference-aligned router LLM](https://arxiv.org/abs/2506.16655) to automatically select the best model based on the type of request.

## Benefits

- **Single Interface**: Access multiple LLM providers through the same Codex CLI
- **Task-Aware Routing**: Requests are analyzed and routed to models based on task type (e.g. code generation vs. code understanding)
- **Provider Flexibility**: Add or remove providers without changing your workflow
- **Routing Transparency**: See which model handles each request, and why

## Quick Start

### Prerequisites

```bash
# Install Codex CLI
npm install -g @openai/codex

# Install Plano CLI
pip install planoai
```

### Step 1: Open the Demo

```bash
git clone https://github.com/katanemo/arch.git
cd arch/demos/llm_routing/codex_router
```

### Step 2: Set API Keys

```bash
export OPENAI_API_KEY="your-openai-key-here"
export ANTHROPIC_API_KEY="your-anthropic-key-here"
export XAI_API_KEY="your-xai-key-here"
export GEMINI_API_KEY="your-gemini-key-here"
```

### Step 3: Start Plano

```bash
planoai up
# or: uvx planoai up
```

### Step 4: Launch Codex Through Plano

```bash
planoai cli-agent codex
# or: uvx planoai cli-agent codex
```

By default, `planoai cli-agent codex` starts Codex with `gpt-5.3-codex`. With this demo config:

- `code understanding` prompts are routed to `gpt-5-2025-08-07`
- `code generation` prompts are routed to `gpt-5.3-codex`

## Monitor Routing Decisions

In a second terminal:

```bash
sh pretty_model_resolution.sh
```

This shows the requested model for each request alongside the final model selected by Plano's router.
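The exact log format is whatever Plano emits; going by the fields that `pretty_model_resolution.sh` extracts, a filtered line should look roughly like this (illustrative, not captured output):

```
[2026-05-11 14:02:33.118] MODEL_RESOLUTION: req_model='gpt-5.3-codex' resolved_model='gpt-5-2025-08-07'
```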

## Configuration Highlights

`config.yaml` demonstrates:

- OpenAI default model for Codex sessions (`gpt-5.3-codex`)
- Routing preference override for code understanding (`gpt-5-2025-08-07`)
- Additional providers (Anthropic, xAI, Gemini, Ollama local) to show cross-provider routing support

## Optional Overrides

Set a different Codex session model:

```bash
planoai cli-agent codex --settings='{"CODEX_MODEL":"gpt-5-2025-08-07"}'
```
demos/llm_routing/codex_router/config.yaml (new file, 38 lines)
version: v0.3.0

listeners:
  - type: model
    name: model_listener
    port: 12000

model_providers:
  # OpenAI models used by Codex defaults and preference routing
  - model: openai/gpt-5.3-codex
    default: true
    access_key: $OPENAI_API_KEY
    routing_preferences:
      - name: code generation
        description: generating new code snippets, functions, or boilerplate based on user prompts or requirements

  - model: xai/grok-4-1-fast-non-reasoning
    access_key: $GROK_API_KEY
    routing_preferences:
      - name: project understanding
        description: understand repository structure, codebase, code files, readmes, and other documentation

  # Additional providers (optional): Codex can route to any configured model
  # - model: anthropic/claude-sonnet-4-5
  #   access_key: $ANTHROPIC_API_KEY

  - model: ollama/llama3.1
    base_url: http://localhost:11434

model_aliases:
  arch.codex.default:
    target: gpt-5.3-codex

tracing:
  random_sampling: 100
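As a quick sanity check of the `arch.codex.default` alias defined above, a minimal sketch; it assumes the model listener also exposes an OpenAI-compatible `/v1/chat/completions` route, which these demo files do not document:

```bash
# Assumed endpoint: only the /routing/v1/* paths are documented in these demos.
curl http://localhost:12000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "arch.codex.default",
    "messages": [{"role": "user", "content": "Write a hello-world function"}]
  }'
```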
demos/llm_routing/codex_router/pretty_model_resolution.sh (new file, 33 lines)
#!/usr/bin/env bash
# Pretty-print Plano MODEL_RESOLUTION lines from docker logs
# - hides Arch-Router
# - prints timestamp
# - colors MODEL_RESOLUTION red
# - colors req_model cyan
# - colors resolved_model magenta
# - removes provider and streaming

docker logs -f plano 2>&1 \
  | awk '
    /MODEL_RESOLUTION:/ && $0 !~ /Arch-Router/ {
      # extract timestamp between first [ and ]
      ts=""
      if (match($0, /\[[0-9-]+ [0-9:.]+\]/)) {
        ts=substr($0, RSTART+1, RLENGTH-2)
      }

      # split out the text after MODEL_RESOLUTION:
      n = split($0, parts, /MODEL_RESOLUTION: */)
      line = parts[2]

      # remove provider and streaming fields
      sub(/ *provider='\''[^'\'']+'\''/, "", line)
      sub(/ *streaming=(true|false)/, "", line)

      # highlight fields
      gsub(/req_model='\''[^'\'']+'\''/, "\033[36m&\033[0m", line)
      gsub(/resolved_model='\''[^'\'']+'\''/, "\033[35m&\033[0m", line)

      # print timestamp + MODEL_RESOLUTION
      printf "\033[90m[%s]\033[0m \033[31mMODEL_RESOLUTION\033[0m: %s\n", ts, line
    }'
demos/llm_routing/model_routing_service/README.md (new file, 92 lines)
# Model Routing Service Demo

This demo shows how to use the `/routing/v1/*` endpoints to get routing decisions without proxying requests to an LLM. Each endpoint accepts a standard LLM request format and returns the model Plano's router would select.

## Setup

Make sure you have the Plano CLI installed (`pip install planoai` or `uv tool install planoai`).

```bash
export OPENAI_API_KEY=<your-key>
export ANTHROPIC_API_KEY=<your-key>
```

Start Plano:

```bash
cd demos/llm_routing/model_routing_service
planoai up config.yaml
```

## Run the demo

```bash
./demo.sh
```

## Endpoints

All three LLM API formats are supported:

| Endpoint | Format |
|---|---|
| `POST /routing/v1/chat/completions` | OpenAI Chat Completions |
| `POST /routing/v1/messages` | Anthropic Messages |
| `POST /routing/v1/responses` | OpenAI Responses API |
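`demo.sh` exercises the first two formats. For `/routing/v1/responses`, here is a minimal sketch, assuming the endpoint accepts a standard OpenAI Responses request body (the body shape is an assumption, not taken from `demo.sh`):

```bash
# Sketch only: routing decision for an OpenAI Responses-format request.
curl -s http://localhost:12000/routing/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "input": "Summarize the trade-offs between quicksort and mergesort"
  }' | python3 -m json.tool
```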

## Example

```bash
curl http://localhost:12000/routing/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Write a Python function for binary search"}]
  }'
```

Response:

```json
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "route": "code_generation",
  "trace_id": "c16d1096c1af4a17abb48fb182918a88"
}
```

The response tells you which model would handle the request and which route was matched, without actually making the LLM call.
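Since only a decision comes back, a client can script the dispatch itself. A minimal sketch (assumes `jq` is installed; the follow-up call to the selected model is omitted because it is not part of this demo):

```bash
# Ask the router which model it would pick, then read it out of the response.
BODY='{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Write a Python function for binary search"}]}'
ROUTED=$(curl -s http://localhost:12000/routing/v1/chat/completions \
  -H "Content-Type: application/json" -d "$BODY" | jq -r '.model')
echo "router selected: $ROUTED"
# A real client would now send $BODY to the model named in $ROUTED.
```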

## Demo Output

```
=== Model Routing Service Demo ===

--- 1. Code generation query (OpenAI format) ---
{
    "model": "anthropic/claude-sonnet-4-20250514",
    "route": "code_generation",
    "trace_id": "c16d1096c1af4a17abb48fb182918a88"
}

--- 2. Complex reasoning query (OpenAI format) ---
{
    "model": "openai/gpt-4o",
    "route": "complex_reasoning",
    "trace_id": "30795e228aff4d7696f082ed01b75ad4"
}

--- 3. Simple query - no routing match (OpenAI format) ---
{
    "model": "none",
    "route": null,
    "trace_id": "ae0b6c3b220d499fb5298ac63f4eac0e"
}

--- 4. Code generation query (Anthropic format) ---
{
    "model": "anthropic/claude-sonnet-4-20250514",
    "route": "code_generation",
    "trace_id": "26be822bbdf14a3ba19fe198e55ea4a9"
}

=== Demo Complete ===
```
demos/llm_routing/model_routing_service/config.yaml (new file, 27 lines)
version: v0.3.0

listeners:
  - type: model
    name: model_listener
    port: 12000

model_providers:
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    default: true

  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    routing_preferences:
      - name: complex_reasoning
        description: complex reasoning tasks, multi-step analysis, or detailed explanations

  - model: anthropic/claude-sonnet-4-20250514
    access_key: $ANTHROPIC_API_KEY
    routing_preferences:
      - name: code_generation
        description: generating new code, writing functions, or creating boilerplate

tracing:
  random_sampling: 100
demos/llm_routing/model_routing_service/demo.sh (new executable file, 120 lines)
#!/bin/bash
set -e

PLANO_URL="${PLANO_URL:-http://localhost:12000}"

echo "=== Model Routing Service Demo ==="
echo ""
echo "This demo shows how to use the /routing/v1/* endpoints to get"
echo "routing decisions without actually proxying the request to an LLM."
echo ""

# --- Example 1: OpenAI Chat Completions format ---
echo "--- 1. Code generation query (OpenAI format) ---"
echo ""
curl -s "$PLANO_URL/routing/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Write a Python function that implements binary search on a sorted array"}
    ]
  }' | python3 -m json.tool
echo ""

# --- Example 2: Complex reasoning query ---
echo "--- 2. Complex reasoning query (OpenAI format) ---"
echo ""
curl -s "$PLANO_URL/routing/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Explain the trade-offs between microservices and monolithic architectures, considering scalability, team structure, and operational complexity"}
    ]
  }' | python3 -m json.tool
echo ""

# --- Example 3: Simple query (no routing match) ---
echo "--- 3. Simple query - no routing match (OpenAI format) ---"
echo ""
curl -s "$PLANO_URL/routing/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }' | python3 -m json.tool
echo ""

# --- Example 4: Anthropic Messages format ---
echo "--- 4. Code generation query (Anthropic format) ---"
echo ""
curl -s "$PLANO_URL/routing/v1/messages" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Create a REST API endpoint in Rust using actix-web that handles user registration"}
    ]
  }' | python3 -m json.tool
echo ""

# --- Example 5: Inline routing policy in request body ---
echo "--- 5. Inline routing_policy (no config needed) ---"
echo ""
curl -s "$PLANO_URL/routing/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Write a quicksort implementation in Go"}
    ],
    "routing_policy": [
      {
        "model": "openai/gpt-4o",
        "routing_preferences": [
          {"name": "coding", "description": "code generation, writing functions, debugging"}
        ]
      },
      {
        "model": "openai/gpt-4o-mini",
        "routing_preferences": [
          {"name": "general", "description": "general questions, simple lookups, casual conversation"}
        ]
      }
    ]
  }' | python3 -m json.tool
echo ""

# --- Example 6: Inline routing policy with Anthropic format ---
echo "--- 6. Inline routing_policy (Anthropic format) ---"
echo ""
curl -s "$PLANO_URL/routing/v1/messages" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the weather like today?"}
    ],
    "routing_policy": [
      {
        "model": "openai/gpt-4o",
        "routing_preferences": [
          {"name": "coding", "description": "code generation, writing functions, debugging"}
        ]
      },
      {
        "model": "openai/gpt-4o-mini",
        "routing_preferences": [
          {"name": "general", "description": "general questions, simple lookups, casual conversation"}
        ]
      }
    ]
  }' | python3 -m json.tool
echo ""

echo "=== Demo Complete ==="