diff --git a/demos/llm_routing/openclaw_routing/README.md b/demos/llm_routing/openclaw_routing/README.md
deleted file mode 100644
index 93cc5822..00000000
--- a/demos/llm_routing/openclaw_routing/README.md
+++ /dev/null
@@ -1,122 +0,0 @@
-# OpenClaw + Plano: Smart Model Routing for Personal AI Assistants
-
-OpenClaw is an open-source personal AI assistant that connects to WhatsApp, Telegram, Slack, and Discord. By pointing it at Plano instead of a single LLM provider, every message is automatically routed to the best model — conversational requests go to Kimi K2.5 (cost-effective), while code generation, testing, and complex reasoning go to Claude (most capable) — with zero application code changes.
-
-## Architecture
-
-```
-[WhatsApp / Telegram / Slack / Discord]
-               |
-       [OpenClaw Gateway]
-       ws://127.0.0.1:18789
-               |
-       [Plano :12000] ──────────────> Kimi K2.5 (conversation, agentic tasks)
-               |                       $0.60/M input tokens
-               |──────────────────────> Claude (code, tests, reasoning)
-               |
-       [Arch-Router 1.5B]
-       (local via Ollama, ~200ms)
-```
-
-Plano's 1.5B [Arch-Router](https://arxiv.org/abs/2506.16655) model analyzes each prompt locally and selects the best backend based on configured routing preferences.
-
-## Prerequisites
-
-- **Docker** running
-- **Ollama** installed ([ollama.com](https://ollama.com))
-- **Plano CLI**: `uv tool install planoai` or `pip install planoai`
-- **OpenClaw**: `npm install -g openclaw@latest`
-- **API keys**:
-  - `MOONSHOT_API_KEY` — from [Moonshot AI](https://platform.moonshot.cn/)
-  - `ANTHROPIC_API_KEY` — from [Anthropic](https://console.anthropic.com/)
-
-## Quick Start
-
-### 1. Set Environment Variables
-
-```bash
-export MOONSHOT_API_KEY="your-moonshot-key"
-export ANTHROPIC_API_KEY="your-anthropic-key"
-```
-
-### 2. Start the Demo
-
-```bash
-cd demos/llm_routing/openclaw_routing
-bash run_demo.sh
-```
-
-This will:
-- Pull the Arch-Router model into Ollama
-- Start Jaeger for tracing
-- Start Plano on port 12000
-
-### 3. Configure OpenClaw
-
-In `~/.openclaw/openclaw.json`, set:
-
-```json
-{
-  "agent": {
-    "model": "kimi-k2.5",
-    "baseURL": "http://127.0.0.1:12000/v1"
-  }
-}
-```
-
-Then run:
-
-```bash
-openclaw onboard --install-daemon
-```
-
-### 4. Test Routing
-
-Run the test script to verify routing decisions:
-
-```bash
-bash test_routing.sh
-```
-
-## Demo Scenarios
-
-| # | Message | Expected Route | Why |
-|---|---------|---------------|-----|
-| 1 | "Hey, what's up? Tell me something interesting." | **Kimi K2.5** | General conversation — cheap and fast |
-| 2 | "Remind me tomorrow at 9am and ping Slack about the deploy" | **Kimi K2.5** | Agentic multi-step task orchestration |
-| 3 | "Write a Python rate limiter with the token bucket algorithm" | **Claude** | Code generation — needs precision |
-| 4 | "Write unit tests for the auth middleware, cover edge cases" | **Claude** | Testing & evaluation — needs thoroughness |
-| 5 | "Compare WebSockets vs SSE vs polling for 10K concurrent users" | **Claude** | Complex reasoning — needs deep analysis |
-
-OpenClaw's code doesn't change at all. It points at `http://127.0.0.1:12000/v1` instead of a direct provider URL. Plano's Arch-Router analyzes each prompt in ~200ms and picks the right backend.
-
-## Monitoring
-
-### Routing Decisions
-
-Watch Plano logs for model selection:
-
-```bash
-docker logs plano 2>&1 | grep MODEL_RESOLUTION
-```
-
-### Jaeger Tracing
-
-Open [http://localhost:16686](http://localhost:16686) to see full traces of each request, including which model was selected and the routing latency.
-
-## Cost Impact
-
-For a personal assistant handling ~1000 requests/day with a 60/40 conversation-to-code split:
-
-| Without Plano (all Claude) | With Plano (routed) |
-|---|---|
-| 1000 req x Claude pricing | 600 req x Kimi K2.5 + 400 req x Claude |
-| ~$3.00/day input tokens | ~$0.36 + $1.20 = **$1.56/day** (~48% savings) |
-
-Same quality where it matters (code, tests), lower cost where it doesn't (chat).
-
-## Stopping the Demo
-
-```bash
-bash run_demo.sh down
-```
diff --git a/demos/llm_routing/openclaw_routing/config.yaml b/demos/llm_routing/openclaw_routing/config.yaml
deleted file mode 100644
index 8cbdde08..00000000
--- a/demos/llm_routing/openclaw_routing/config.yaml
+++ /dev/null
@@ -1,48 +0,0 @@
-version: v0.1.0
-
-routing:
-  model: Arch-Router
-  llm_provider: arch-router
-
-listeners:
-  egress_traffic:
-    address: 0.0.0.0
-    port: 12000
-    message_format: openai
-    timeout: 30s
-
-llm_providers:
-
-  # Arch Router - the 1.5B preference-aligned routing model (runs locally via Ollama)
-  - name: arch-router
-    model: arch/hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
-    base_url: http://host.docker.internal:11434
-
-  # Kimi K2.5 — Moonshot AI's open model (1T MoE, 32B active params)
-  # Great for general conversation, agentic tasks, and multimodal work
-  # OpenAI-compatible API at $0.60/M input, $2.50/M output tokens
-  - model: openai/kimi-k2.5
-    access_key: $MOONSHOT_API_KEY
-    base_url: https://api.moonshot.ai/v1
-    provider_interface: openai
-    default: true
-    routing_preferences:
-      - name: general conversation
-        description: general chat, greetings, casual conversation, Q&A, and everyday questions
-      - name: agentic tasks
-        description: coordinating multi-step workflows, device automation, scheduling, and task orchestration across channels
-
-  # Claude — Anthropic's most capable model
-  # Best for complex reasoning, code, tool use, and evaluation
-  - model: anthropic/claude-sonnet-4-5
-    access_key: $ANTHROPIC_API_KEY
-    routing_preferences:
-      - name: testing and evaluation
-        description: writing tests, running evaluations, QA checks, verifying correctness, and debugging failures
-      - name: code generation
-        description: generating code, writing scripts, implementing functions, and building tool integrations
-      - name: complex reasoning
-        description: multi-step analysis, planning, architectural decisions, and deep problem-solving
-
-tracing:
-  random_sampling: 100
diff --git a/demos/llm_routing/openclaw_routing/docker-compose.yaml b/demos/llm_routing/openclaw_routing/docker-compose.yaml
deleted file mode 100644
index 9828cd17..00000000
--- a/demos/llm_routing/openclaw_routing/docker-compose.yaml
+++ /dev/null
@@ -1,8 +0,0 @@
-services:
-  jaeger:
-    build:
-      context: ../../shared/jaeger
-    ports:
-      - "16686:16686"
-      - "4317:4317"
-      - "4318:4318"
diff --git a/demos/llm_routing/openclaw_routing/run_demo.sh b/demos/llm_routing/openclaw_routing/run_demo.sh
deleted file mode 100755
index 6654d621..00000000
--- a/demos/llm_routing/openclaw_routing/run_demo.sh
+++ /dev/null
@@ -1,59 +0,0 @@
-#!/bin/bash
-set -e
-
-echo "=== OpenClaw + Plano Routing Demo ==="
-
-# Check prerequisites
-command -v docker >/dev/null || { echo "Error: Docker not found"; exit 1; }
-command -v ollama >/dev/null || { echo "Error: Ollama not found. Install from https://ollama.com"; exit 1; }
-
-# Check/create .env file
-if [ -f ".env" ]; then
-    echo ".env file already exists"
-else
-    if [ -z "${MOONSHOT_API_KEY:-}" ]; then
-        echo "Error: MOONSHOT_API_KEY not set"
-        exit 1
-    fi
-    if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
-        echo "Error: ANTHROPIC_API_KEY not set"
-        exit 1
-    fi
-    echo "Creating .env file..."
-    echo "MOONSHOT_API_KEY=$MOONSHOT_API_KEY" > .env
-    echo "ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY" >> .env
-fi
-
-# Pull Arch-Router model if needed
-echo "Pulling Arch-Router model..."
-ollama pull hf.co/katanemo/Arch-Router-1.5B.gguf:Q4_K_M
-
-start_demo() {
-    # Start Jaeger for tracing
-    echo "Starting Jaeger..."
-    docker compose up -d
-
-    # Start Plano gateway
-    echo "Starting Plano..."
-    planoai up --service plano --foreground
-}
-
-stop_demo() {
-    docker compose down
-    planoai down
-}
-
-if [ "${1:-}" == "down" ]; then
-    stop_demo
-else
-    start_demo
-    echo ""
-    echo "=== Plano is running on http://localhost:12000 ==="
-    echo "=== Jaeger UI at http://localhost:16686 ==="
-    echo ""
-    echo "Configure OpenClaw to use Plano as its LLM endpoint:"
-    echo ' In ~/.openclaw/openclaw.json, set:'
-    echo ' { "agent": { "model": "kimi-k2.5", "baseURL": "http://127.0.0.1:12000/v1" } }'
-    echo ""
-    echo "Then run: openclaw onboard --install-daemon"
-fi
diff --git a/demos/llm_routing/openclaw_routing/test_routing.sh b/demos/llm_routing/openclaw_routing/test_routing.sh
deleted file mode 100755
index d630f920..00000000
--- a/demos/llm_routing/openclaw_routing/test_routing.sh
+++ /dev/null
@@ -1,60 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-PLANO_URL="http://localhost:12000/v1/chat/completions"
-
-echo "=== Testing Plano Routing Decisions ==="
-echo ""
-
-# Scenario 1: General conversation -> should route to Kimi K2.5
-echo "--- Scenario 1: General Conversation (expect: Kimi K2.5) ---"
-curl -s "$PLANO_URL" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "kimi-k2.5",
-    "messages": [{"role": "user", "content": "Hey! What is the weather like today? Can you tell me a fun fact?"}]
-  }' | jq '{model: .model, content: .choices[0].message.content[:100]}'
-echo ""
-
-# Scenario 2: Agentic task -> should route to Kimi K2.5
-echo "--- Scenario 2: Agentic Task (expect: Kimi K2.5) ---"
-curl -s "$PLANO_URL" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "kimi-k2.5",
-    "messages": [{"role": "user", "content": "Schedule a reminder for tomorrow at 9am to review the pull request, then send a message to the team Slack channel about the deployment."}]
-  }' | jq '{model: .model, content: .choices[0].message.content[:100]}'
-echo ""
-
-# Scenario 3: Code generation -> should route to Claude
-echo "--- Scenario 3: Code Generation (expect: Claude) ---"
-curl -s "$PLANO_URL" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "kimi-k2.5",
-    "messages": [{"role": "user", "content": "Write a Python function that implements a rate limiter using the token bucket algorithm with async support."}]
-  }' | jq '{model: .model, content: .choices[0].message.content[:100]}'
-echo ""
-
-# Scenario 4: Testing/evaluation -> should route to Claude
-echo "--- Scenario 4: Testing & Evaluation (expect: Claude) ---"
-curl -s "$PLANO_URL" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "kimi-k2.5",
-    "messages": [{"role": "user", "content": "Write unit tests for this authentication middleware. Test edge cases: expired tokens, malformed headers, missing credentials, and concurrent requests."}]
-  }' | jq '{model: .model, content: .choices[0].message.content[:100]}'
-echo ""
-
-# Scenario 5: Complex reasoning -> should route to Claude
-echo "--- Scenario 5: Complex Reasoning (expect: Claude) ---"
-curl -s "$PLANO_URL" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "kimi-k2.5",
-    "messages": [{"role": "user", "content": "Analyze the trade-offs between using WebSockets vs SSE vs long-polling for real-time notifications in a distributed messaging system with 10K concurrent users."}]
-  }' | jq '{model: .model, content: .choices[0].message.content[:100]}'
-echo ""
-
-echo "=== Check Plano logs for MODEL_RESOLUTION details ==="
-echo "Run: docker logs plano 2>&1 | grep MODEL_RESOLUTION"
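
Note on `test_routing.sh`: the removed script prints each response's `model` field but leaves the comparison against the expected route to the reader. If the demo is revived, a check along the following lines might make the five scenarios self-verifying. This is a minimal sketch, not code from the repository. `build_payload`, `classify_model`, and `check_route` are hypothetical helper names, and the sketch assumes that the `model` string in the response contains `kimi` or `claude` in lowercase, which the diff does not confirm.

```bash
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical and were not part of the removed demo.
set -euo pipefail

PLANO_URL="${PLANO_URL:-http://localhost:12000/v1/chat/completions}"

# Build an OpenAI-style chat payload; jq takes care of JSON-escaping the prompt.
build_payload() {
  jq -cn --arg model "kimi-k2.5" --arg content "$1" \
    '{model: $model, messages: [{role: "user", content: $content}]}'
}

# Map a response model string to a route label.
# Assumption: the string contains "kimi" or "claude" in lowercase.
classify_model() {
  case "$1" in
    *kimi*)   echo "kimi" ;;
    *claude*) echo "claude" ;;
    *)        echo "unknown" ;;
  esac
}

# Send one prompt to Plano and compare the observed route with the expected one.
check_route() {
  local expected="$1" prompt="$2" model
  model=$(build_payload "$prompt" \
    | curl -s "$PLANO_URL" -H "Content-Type: application/json" -d @- \
    | jq -r '.model')
  if [ "$(classify_model "$model")" = "$expected" ]; then
    echo "PASS: expected $expected, got $model"
  else
    echo "FAIL: expected $expected, got $model"
    return 1
  fi
}
```

With Plano running and `curl` and `jq` installed, a call such as `check_route claude "Write a Python function that implements a rate limiter using the token bucket algorithm with async support."` would print a PASS or FAIL line for the third scenario. Keeping the classification in its own function means the matching rule can be adjusted in one place once the actual `model` strings are known.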