diff --git a/demos/llm_routing/openclaw_routing/README.md b/demos/llm_routing/openclaw_routing/README.md
new file mode 100644
index 00000000..93cc5822
--- /dev/null
+++ b/demos/llm_routing/openclaw_routing/README.md
@@ -0,0 +1,122 @@
+# OpenClaw + Plano: Smart Model Routing for Personal AI Assistants
+
+OpenClaw is an open-source personal AI assistant that connects to WhatsApp, Telegram, Slack, and Discord. By pointing it at Plano instead of a single LLM provider, every message is automatically routed to the best model — conversational requests go to Kimi K2.5 (cost-effective), while code generation, testing, and complex reasoning go to Claude (most capable) — with zero application code changes.
+
+## Architecture
+
+```
+[WhatsApp / Telegram / Slack / Discord]
+                |
+        [OpenClaw Gateway]
+       ws://127.0.0.1:18789
+                |
+         [Plano :12000] ──────────────> Kimi K2.5 (conversation, agentic tasks)
+                |                       $0.60/M input tokens
+                |──────────────────────> Claude (code, tests, reasoning)
+                |
+       [Arch-Router 1.5B]
+   (local via Ollama, ~200ms)
+```
+
+Plano's 1.5B [Arch-Router](https://arxiv.org/abs/2506.16655) model analyzes each prompt locally and selects the best backend based on configured routing preferences.
+
+## Prerequisites
+
+- **Docker** running
+- **Ollama** installed ([ollama.com](https://ollama.com))
+- **Plano CLI**: `uv tool install planoai` or `pip install planoai`
+- **OpenClaw**: `npm install -g openclaw@latest`
+- **API keys**:
+  - `MOONSHOT_API_KEY` — from [Moonshot AI](https://platform.moonshot.cn/)
+  - `ANTHROPIC_API_KEY` — from [Anthropic](https://console.anthropic.com/)
+
+## Quick Start
+
+### 1. Set Environment Variables
+
+```bash
+export MOONSHOT_API_KEY="your-moonshot-key"
+export ANTHROPIC_API_KEY="your-anthropic-key"
+```
+
+### 2. Start the Demo
+
+```bash
+cd demos/llm_routing/openclaw_routing
+bash run_demo.sh
+```
+
+This will:
+- Pull the Arch-Router model into Ollama
+- Start Jaeger for tracing
+- Start Plano on port 12000
+
+### 3. Configure OpenClaw
+
+In `~/.openclaw/openclaw.json`, set:
+
+```json
+{
+  "agent": {
+    "model": "kimi-k2.5",
+    "baseURL": "http://127.0.0.1:12000/v1"
+  }
+}
+```
+
+Then run:
+
+```bash
+openclaw onboard --install-daemon
+```
+
+### 4. Test Routing
+
+Run the test script to verify routing decisions:
+
+```bash
+bash test_routing.sh
+```
+
+## Demo Scenarios
+
+| # | Message | Expected Route | Why |
+|---|---------|---------------|-----|
+| 1 | "Hey, what's up? Tell me something interesting." | **Kimi K2.5** | General conversation — cheap and fast |
+| 2 | "Remind me tomorrow at 9am and ping Slack about the deploy" | **Kimi K2.5** | Agentic multi-step task orchestration |
+| 3 | "Write a Python rate limiter with the token bucket algorithm" | **Claude** | Code generation — needs precision |
+| 4 | "Write unit tests for the auth middleware, cover edge cases" | **Claude** | Testing & evaluation — needs thoroughness |
+| 5 | "Compare WebSockets vs SSE vs polling for 10K concurrent users" | **Claude** | Complex reasoning — needs deep analysis |
+
+OpenClaw's code doesn't change at all. It points at `http://127.0.0.1:12000/v1` instead of a direct provider URL. Plano's Arch-Router analyzes each prompt in ~200ms and picks the right backend.
+
+## Monitoring
+
+### Routing Decisions
+
+Watch Plano logs for model selection:
+
+```bash
+docker logs plano 2>&1 | grep MODEL_RESOLUTION
+```
+
+### Jaeger Tracing
+
+Open [http://localhost:16686](http://localhost:16686) to see full traces of each request, including which model was selected and the routing latency.
+
+## Cost Impact
+
+For a personal assistant handling ~1000 requests/day with a 60/40 conversation-to-code split:
+
+| Without Plano (all Claude) | With Plano (routed) |
+|---|---|
+| 1000 req x Claude pricing | 600 req x Kimi K2.5 + 400 req x Claude |
+| ~$3.00/day input tokens | ~$0.36 + $1.20 = **$1.56/day** (~48% savings) |
+
+Same quality where it matters (code, tests), lower cost where it doesn't (chat).
+
+## Stopping the Demo
+
+```bash
+bash run_demo.sh down
+```
diff --git a/demos/llm_routing/openclaw_routing/config.yaml b/demos/llm_routing/openclaw_routing/config.yaml
new file mode 100644
index 00000000..8cbdde08
--- /dev/null
+++ b/demos/llm_routing/openclaw_routing/config.yaml
@@ -0,0 +1,48 @@
+version: v0.1.0
+
+routing:
+  model: Arch-Router
+  llm_provider: arch-router
+
+listeners:
+  egress_traffic:
+    address: 0.0.0.0
+    port: 12000
+    message_format: openai
+    timeout: 30s
+
+llm_providers:
+
+  # Arch Router - the 1.5B preference-aligned routing model (runs locally via Ollama)
+  - name: arch-router
+    model: arch/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
+    base_url: http://host.docker.internal:11434
+
+  # Kimi K2.5 — Moonshot AI's open model (1T MoE, 32B active params)
+  # Great for general conversation, agentic tasks, and multimodal work
+  # OpenAI-compatible API at $0.60/M input, $2.50/M output tokens
+  - model: openai/kimi-k2.5
+    access_key: $MOONSHOT_API_KEY
+    base_url: https://api.moonshot.ai/v1
+    provider_interface: openai
+    default: true
+    routing_preferences:
+      - name: general conversation
+        description: general chat, greetings, casual conversation, Q&A, and everyday questions
+      - name: agentic tasks
+        description: coordinating multi-step workflows, device automation, scheduling, and task orchestration across channels
+
+  # Claude — Anthropic's most capable model
+  # Best for complex reasoning, code, tool use, and evaluation
+  - model: anthropic/claude-sonnet-4-5
+    access_key: $ANTHROPIC_API_KEY
+    routing_preferences:
+      - name: testing and evaluation
+        description: writing tests, running evaluations, QA checks, verifying correctness, and debugging failures
+      - name: code generation
+        description: generating code, writing scripts, implementing functions, and building tool integrations
+      - name: complex reasoning
+        description: multi-step analysis, planning, architectural decisions, and deep problem-solving
+
+tracing:
+  random_sampling: 100
diff --git a/demos/llm_routing/openclaw_routing/docker-compose.yaml b/demos/llm_routing/openclaw_routing/docker-compose.yaml
new file mode 100644
index 00000000..9828cd17
--- /dev/null
+++ b/demos/llm_routing/openclaw_routing/docker-compose.yaml
@@ -0,0 +1,8 @@
+services:
+  jaeger:
+    build:
+      context: ../../shared/jaeger
+    ports:
+      - "16686:16686"
+      - "4317:4317"
+      - "4318:4318"
diff --git a/demos/llm_routing/openclaw_routing/run_demo.sh b/demos/llm_routing/openclaw_routing/run_demo.sh
new file mode 100755
index 00000000..6654d621
--- /dev/null
+++ b/demos/llm_routing/openclaw_routing/run_demo.sh
@@ -0,0 +1,59 @@
+#!/bin/bash
+set -e
+
+echo "=== OpenClaw + Plano Routing Demo ==="
+
+# Check prerequisites
+command -v docker >/dev/null || { echo "Error: Docker not found"; exit 1; }
+command -v ollama >/dev/null || { echo "Error: Ollama not found. Install from https://ollama.com"; exit 1; }
+
+# Check/create .env file
+if [ -f ".env" ]; then
+  echo ".env file already exists"
+else
+  if [ -z "${MOONSHOT_API_KEY:-}" ]; then
+    echo "Error: MOONSHOT_API_KEY not set"
+    exit 1
+  fi
+  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
+    echo "Error: ANTHROPIC_API_KEY not set"
+    exit 1
+  fi
+  echo "Creating .env file..."
+  echo "MOONSHOT_API_KEY=$MOONSHOT_API_KEY" > .env
+  echo "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY" >> .env
+fi
+
+# Pull Arch-Router model if needed
+echo "Pulling Arch-Router model..."
+ollama pull hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
+
+start_demo() {
+  # Start Jaeger for tracing
+  echo "Starting Jaeger..."
+  docker compose up -d
+
+  # Start Plano gateway
+  echo "Starting Plano..."
+  planoai up --service plano --foreground
+}
+
+stop_demo() {
+  docker compose down
+  planoai down
+}
+
+if [ "${1:-}" == "down" ]; then
+  stop_demo
+else
+  start_demo
+  echo ""
+  echo "=== Plano is running on http://localhost:12000 ==="
+  echo "=== Jaeger UI at http://localhost:16686 ==="
+  echo ""
+  echo "Configure OpenClaw to use Plano as its LLM endpoint:"
+  echo ' In ~/.openclaw/openclaw.json, set:'
+  echo ' { "agent": { "model": "kimi-k2.5", "baseURL": "http://127.0.0.1:12000/v1" } }'
+  echo ""
+  echo "Then run: openclaw onboard --install-daemon"
+fi
diff --git a/demos/llm_routing/openclaw_routing/test_routing.sh b/demos/llm_routing/openclaw_routing/test_routing.sh
new file mode 100755
index 00000000..d630f920
--- /dev/null
+++ b/demos/llm_routing/openclaw_routing/test_routing.sh
@@ -0,0 +1,60 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+PLANO_URL="http://localhost:12000/v1/chat/completions"
+
+echo "=== Testing Plano Routing Decisions ==="
+echo ""
+
+# Scenario 1: General conversation -> should route to Kimi K2.5
+echo "--- Scenario 1: General Conversation (expect: Kimi K2.5) ---"
+curl -s "$PLANO_URL" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "kimi-k2.5",
+    "messages": [{"role": "user", "content": "Hey! What is the weather like today? Can you tell me a fun fact?"}]
+  }' | jq '{model: .model, content: .choices[0].message.content[:100]}'
+echo ""
+
+# Scenario 2: Agentic task -> should route to Kimi K2.5
+echo "--- Scenario 2: Agentic Task (expect: Kimi K2.5) ---"
+curl -s "$PLANO_URL" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "kimi-k2.5",
+    "messages": [{"role": "user", "content": "Schedule a reminder for tomorrow at 9am to review the pull request, then send a message to the team Slack channel about the deployment."}]
+  }' | jq '{model: .model, content: .choices[0].message.content[:100]}'
+echo ""
+
+# Scenario 3: Code generation -> should route to Claude
+echo "--- Scenario 3: Code Generation (expect: Claude) ---"
+curl -s "$PLANO_URL" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "kimi-k2.5",
+    "messages": [{"role": "user", "content": "Write a Python function that implements a rate limiter using the token bucket algorithm with async support."}]
+  }' | jq '{model: .model, content: .choices[0].message.content[:100]}'
+echo ""
+
+# Scenario 4: Testing/evaluation -> should route to Claude
+echo "--- Scenario 4: Testing & Evaluation (expect: Claude) ---"
+curl -s "$PLANO_URL" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "kimi-k2.5",
+    "messages": [{"role": "user", "content": "Write unit tests for this authentication middleware. Test edge cases: expired tokens, malformed headers, missing credentials, and concurrent requests."}]
+  }' | jq '{model: .model, content: .choices[0].message.content[:100]}'
+echo ""
+
+# Scenario 5: Complex reasoning -> should route to Claude
+echo "--- Scenario 5: Complex Reasoning (expect: Claude) ---"
+curl -s "$PLANO_URL" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "kimi-k2.5",
+    "messages": [{"role": "user", "content": "Analyze the trade-offs between using WebSockets vs SSE vs long-polling for real-time notifications in a distributed messaging system with 10K concurrent users."}]
+  }' | jq '{model: .model, content: .choices[0].message.content[:100]}'
+echo ""
+
+echo "=== Check Plano logs for MODEL_RESOLUTION details ==="
+echo "Run: docker logs plano 2>&1 | grep MODEL_RESOLUTION"
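
The scenarios in `test_routing.sh` print the `model` field of each response but leave the comparison against the expected route to the reader. A minimal sketch of a pass/fail check follows, assuming the `model` field names the backend that actually served the request (the diff does not confirm this). `check_route` is a hypothetical helper that is not part of this change, and the sample model strings are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: compare the model reported in a
# response with the backend a scenario is expected to reach.
# Assumption: the response's "model" field identifies the serving backend.

check_route() {
  local expected="$1" actual="$2"
  local want got
  # Lowercase both sides so "Claude" also matches "claude-sonnet-4-5".
  want=$(printf '%s' "$expected" | tr '[:upper:]' '[:lower:]')
  got=$(printf '%s' "$actual" | tr '[:upper:]' '[:lower:]')
  if [[ "$got" == *"$want"* ]]; then
    echo "PASS: expected *${expected}*, got ${actual}"
  else
    echo "FAIL: expected *${expected}*, got ${actual}"
    return 1
  fi
}

# Offline demonstration with made-up model strings (no request is sent).
check_route "kimi" "kimi-k2.5"
check_route "claude" "kimi-k2.5" || echo "(scenario would be reported as misrouted)"
```

Under the same assumption, a scenario could pass the output of the script's `curl` call, piped through `jq -r '.model'`, as the second argument. Because the function returns non-zero on a mismatch, a script running under `set -e` stops at the first misrouted scenario; appending `|| true` to the call keeps the remaining scenarios running.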