Merge branch 'main' into adil/agent_format

2026-06-23 15:38:07 +02:00 · 2025-09-30 11:39:34 -07:00 · 2025-09-30 11:39:34 -07:00 · 2cebc0c85f
commit 2cebc0c85f
parent 0acd2a9a5e 7df1b8cdb0
33 changed files with 1369 additions and 421 deletions
--- a/demos/use_cases/claude_code/README.md
+++ b/demos/use_cases/claude_code/README.md
@@ -0,0 +1,146 @@
+# Claude Code Router - Multi-Model Access with Intelligent Routing
+
+Arch Gateway extends Claude Code to access multiple LLM providers through a single interface, offering two key benefits:
+
+1. **Access to Models**: Connect to Grok, Mistral, Gemini, DeepSeek, GPT models, Claude, and local models via Ollama
+2. **Intelligent Routing via Preferences for Coding Tasks**: Configure which models handle specific development tasks:
+   - Code generation and implementation
+   - Code reviews and analysis
+   - Architecture and system design
+   - Debugging and optimization
+   - Documentation and explanations
+
+Arch uses a [1.5B-parameter preference-aligned router LLM](https://arxiv.org/abs/2506.16655) to automatically select the best model for each request based on its task type.
+
+## Benefits
+
+- **Single Interface**: Access multiple LLM providers through the same Claude Code CLI
+- **Task-Aware Routing**: Requests are analyzed and routed to models based on task type (code generation, debugging, architecture, documentation)
+- **Provider Flexibility**: Add or remove LLM providers without changing your workflow
+- **Routing Transparency**: See which model handles each request and why
+
+## How It Works
+
+Arch Gateway sits between Claude Code and multiple LLM providers, analyzing each request to route it to the most suitable model:
+
+```
+Your Request → Arch Gateway → Suitable Model → Response
+             ↓
+    [Task Analysis & Model Selection]
+```
+
+**Supported Providers**: OpenAI-compatible, Anthropic, DeepSeek, Grok, Gemini, Llama, Mistral, local models via Ollama. See [full list of supported providers](https://docs.archgw.com/concepts/llm_providers/supported_providers.html).
+
+
+## Quick Start (5 minutes)
+
+### Prerequisites
+```bash
+# Install Claude Code if you haven't already
+npm install -g @anthropic-ai/claude-code
+
+# Ensure Docker is running
+docker --version
+```
+
+### Step 1: Get Configuration
+```bash
+# Clone and navigate to demo
+git clone https://github.com/katanemo/arch.git
+cd arch/demos/use_cases/claude_code
+```
+
+### Step 2: Set API Keys
+```bash
+# Copy the sample environment file
+cp .env .env.local
+
+# Edit with your actual API keys
+export OPENAI_API_KEY="your-openai-key-here"
+export ANTHROPIC_API_KEY="your-anthropic-key-here"
+# Add other providers as needed
+```
+
+### Step 3: Start Arch Gateway
+```bash
+# Install and start the gateway
+pip install archgw
+archgw up
+```
+
+### Step 4: Launch Enhanced Claude Code
+```bash
+# This will launch Claude Code with multi-model routing
+archgw cli-agent claude
+```
+![claude code](claude_code.png)
+
+### Monitor Model Selection in Real-Time
+
+While using Claude Code, open a **second terminal** and run this helper script to watch routing decisions. This script shows you:
+- **Which model** was selected for each request
+- **Real-time routing decisions** as you work
+
+```bash
+# In a new terminal window (from the same directory)
+bash pretty_model_resolution.sh
+```
+![model_selection](model_selection.png)
+
+## Understanding the Configuration
+
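+The `config.yaml` file defines your multi-model setup. Each provider entry can declare `routing_preferences` that describe the tasks it should handle. A simplified example: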
+
+```yaml
+llm_providers:
+  - model: openai/gpt-4.1-2025-04-14
+    access_key: $OPENAI_API_KEY
+    routing_preferences:
+      - name: code generation
+        description: generating new code snippets and functions
+
+  - model: anthropic/claude-3-5-sonnet-20241022
+    access_key: $ANTHROPIC_API_KEY
+    routing_preferences:
+      - name: code understanding
+        description: explaining and analyzing existing code
+```
+
+## Advanced Usage
+
+### Override Model Selection
+```bash
+# Override the small/fast model that Claude Code uses for this session
+archgw cli-agent claude --settings='{"ANTHROPIC_SMALL_FAST_MODEL": "deepseek-coder-v2"}'
+```
+
+### Environment Variables
+`archgw cli-agent claude` sets these variables for Claude Code automatically:
+```bash
+ANTHROPIC_BASE_URL=http://127.0.0.1:12000  # Routes through Arch Gateway
+ANTHROPIC_SMALL_FAST_MODEL=arch.claude.code.small.fast    # Alias defined under model_aliases in config.yaml
+```
+
+### Custom Routing Configuration
+Edit `config.yaml` to define custom task→model mappings:
+
+```yaml
+llm_providers:
+  # OpenAI Models
+  - model: openai/gpt-5-2025-08-07
+    access_key: $OPENAI_API_KEY
+    routing_preferences:
+      - name: code generation
+        description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
+
+  - model: openai/gpt-4.1-2025-04-14
+    access_key: $OPENAI_API_KEY
+    routing_preferences:
+      - name: code understanding
+        description: understand and explain existing code snippets, functions, or libraries
+```
+
+## Technical Details
+
+- **How routing works**: Arch intercepts Claude Code requests, analyzes their content with preference-aligned routing, and forwards each request to the model configured for the matching routing preference.
+- **Research foundation**: Built on our research in [Preference-Aligned LLM Routing](https://arxiv.org/abs/2506.16655).
+- **Documentation**: See [docs.archgw.com](https://docs.archgw.com) for advanced configuration and API details.
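To check which model each routing preference in `config.yaml` points to, the `- model:` and `- name:` entries can be paired up with awk. This is a minimal sketch with a hypothetical helper name (`list_routing_preferences`), not an archgw command. It assumes the block-style layout used in this demo and is not a YAML parser, so flow-style YAML or anchors would need a real parser.

```bash
# Hypothetical helper (not part of archgw): print "<model> -> <preference name>"
# for every routing preference in an arch config file.
# Assumes block-style YAML as in this demo: "- model: ..." entries under
# llm_providers, each optionally followed by "- name: ..." routing preferences.
list_routing_preferences() {
  awk '
    # Any top-level key starts a new section, so forget the current model.
    /^[^ #]/ { model = ""; next }

    # Remember the most recent provider model.
    $1 == "-" && $2 == "model:" { model = $3; next }

    # Print each preference name under that model (names may contain spaces).
    $1 == "-" && $2 == "name:" && model != "" {
      name = $0
      sub(/^[^:]*name:[ ]*/, "", name)
      print model " -> " name
    }
  ' "$1"
}
```

Against the demo's `config.yaml`, `list_routing_preferences config.yaml` prints `openai/gpt-5-2025-08-07 -> code generation` and `openai/gpt-4.1-2025-04-14 -> code understanding`. The Anthropic and Ollama entries declare no routing preferences, so they print nothing.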
--- a/demos/use_cases/claude_code/claude_code.png
+++ b/demos/use_cases/claude_code/claude_code.png
--- a/demos/use_cases/claude_code/config.yaml
+++ b/demos/use_cases/claude_code/config.yaml
@@ -0,0 +1,41 @@
+version: v0.1
+
+listeners:
+  egress_traffic:
+    address: 0.0.0.0
+    port: 12000
+    message_format: openai
+    timeout: 30s
+
+llm_providers:
+  # OpenAI Models
+  - model: openai/gpt-5-2025-08-07
+    access_key: $OPENAI_API_KEY
+    routing_preferences:
+      - name: code generation
+        description: generating new code snippets, functions, or boilerplate based on user prompts or requirements
+
+  - model: openai/gpt-4.1-2025-04-14
+    access_key: $OPENAI_API_KEY
+    routing_preferences:
+      - name: code understanding
+        description: understand and explain existing code snippets, functions, or libraries
+
+  # Anthropic Models
+  - model: anthropic/claude-sonnet-4-5
+    default: true
+    access_key: $ANTHROPIC_API_KEY
+
+  - model: anthropic/claude-3-haiku-20240307
+    access_key: $ANTHROPIC_API_KEY
+
+  # Ollama Models
+  - model: ollama/llama3.1
+    base_url: http://host.docker.internal:11434
+
+
+# Model aliases - friendly names that map to actual model names
+model_aliases:
+  # Alias for a smaller, faster Claude model
+  arch.claude.code.small.fast:
+    target: claude-3-haiku-20240307
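This `config.yaml` reads provider keys from `$OPENAI_API_KEY` and `$ANTHROPIC_API_KEY`. Before running `archgw up`, a pre-flight check can confirm that every `$VARIABLE` the file references is exported. The helper names below are hypothetical, not part of the archgw CLI. The check only inspects the current shell's exported environment, so keys supplied some other way (for example through an env file) are not visible to it. It uses `grep -o`, which GNU and BSD grep support.

```bash
# Hypothetical helpers (not part of archgw): list the $VARIABLES an arch config
# references and confirm each one is exported in the current shell.

# Print each referenced variable name once, e.g. OPENAI_API_KEY.
config_env_vars() {
  grep -o '\$[A-Za-z_][A-Za-z0-9_]*' "$1" | tr -d '$' | sort -u
}

# Return non-zero (and report on stderr) if any referenced variable is unset or empty.
check_config_env() {
  missing=0
  for var in $(config_env_vars "$1"); do
    # printenv only sees exported variables, which is what a child process inherits.
    if [ -z "$(printenv "$var")" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From the demo directory, `check_config_env config.yaml && archgw up` starts the gateway only when both `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are exported.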
--- a/demos/use_cases/claude_code/model_selection.png
+++ b/demos/use_cases/claude_code/model_selection.png
--- a/demos/use_cases/claude_code/pretty_model_resolution.sh
+++ b/demos/use_cases/claude_code/pretty_model_resolution.sh
@@ -0,0 +1,33 @@
+#!/usr/bin/env bash
+# Pretty-print ArchGW MODEL_RESOLUTION lines from docker logs
+# - hides Arch-Router
+# - prints timestamp
+# - colors MODEL_RESOLUTION red
+# - colors req_model cyan
+# - colors resolved_model magenta
+# - removes provider and streaming
+
+docker logs -f archgw 2>&1 \
+| awk '
+/MODEL_RESOLUTION:/ && $0 !~ /Arch-Router/ {
+  # extract timestamp between first [ and ]
+  ts=""
+  if (match($0, /\[[0-9-]+ [0-9:.]+\]/)) {
+    ts=substr($0, RSTART+1, RLENGTH-2)
+  }
+
+  # split out after MODEL_RESOLUTION:
+  n = split($0, parts, /MODEL_RESOLUTION: */)
+  line = parts[2]
+
+  # remove provider and streaming fields
+  sub(/ *provider='\''[^'\'']+'\''/, "", line)
+  sub(/ *streaming=(true|false)/, "", line)
+
+  # highlight fields
+  gsub(/req_model='\''[^'\'']+'\''/, "\033[36m&\033[0m", line)
+  gsub(/resolved_model='\''[^'\'']+'\''/, "\033[35m&\033[0m", line)
+
+  # print timestamp + MODEL_RESOLUTION
+  printf "\033[90m[%s]\033[0m \033[31mMODEL_RESOLUTION\033[0m: %s\n", ts, line
+}'
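`pretty_model_resolution.sh` only works against a running container because it follows `docker logs -f archgw`. To see what the filter keeps and drops, the same awk logic can be wrapped in a function that reads stdin, with the colors removed. The function name `strip_resolution_fields` is hypothetical, and so is the sample log line: its shape is inferred from the patterns the script matches, not copied from real archgw output.

```bash
# Hypothetical helper (not part of the demo): the same filtering as
# pretty_model_resolution.sh, minus the ANSI colors, reading from stdin
# instead of following `docker logs -f archgw`.
strip_resolution_fields() {
  awk '
  /MODEL_RESOLUTION:/ && $0 !~ /Arch-Router/ {
    # Timestamp inside the first [date time] bracket, if present.
    ts = ""
    if (match($0, /\[[0-9-]+ [0-9:.]+\]/)) {
      ts = substr($0, RSTART + 1, RLENGTH - 2)
    }

    # Keep everything after "MODEL_RESOLUTION:", then drop provider and streaming.
    split($0, parts, /MODEL_RESOLUTION: */)
    line = parts[2]
    sub(/ *provider='\''[^'\'']+'\''/, "", line)
    sub(/ *streaming=(true|false)/, "", line)

    printf "[%s] MODEL_RESOLUTION: %s\n", ts, line
  }'
}

# Hypothetical log line: its shape is inferred from the patterns the script
# matches, not copied from real archgw output.
sample="[2025-09-30 11:39:34.123] MODEL_RESOLUTION: req_model='arch.claude.code.small.fast' resolved_model='claude-3-haiku-20240307' provider='anthropic' streaming=true"

printf '%s\n' "$sample" | strip_resolution_fields
# prints: [2025-09-30 11:39:34.123] MODEL_RESOLUTION: req_model='arch.claude.code.small.fast' resolved_model='claude-3-haiku-20240307'
```

The same function can review existing logs without following them: `docker logs archgw 2>&1 | strip_resolution_fields`.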
--- a/demos/use_cases/model_alias_routing/arch_config_with_aliases.yaml
+++ b/demos/use_cases/model_alias_routing/arch_config_with_aliases.yaml
@@ -24,7 +24,7 @@ llm_providers:
    access_key: $OPENAI_API_KEY

  # Anthropic Models
-  - model: anthropic/claude-3-5-sonnet-20241022
+  - model: anthropic/claude-sonnet-4-20250514
    access_key: $ANTHROPIC_API_KEY

  - model: anthropic/claude-3-haiku-20240307
@@ -56,7 +56,7 @@ model_aliases:

  # Alias for creative tasks -> Claude model
  arch.creative.v1:
-    target: claude-3-5-sonnet-20241022
+    target: claude-sonnet-4-20250514

  # Alias for quick responses -> fast model
  arch.fast.v1:
@@ -67,7 +67,7 @@ model_aliases:
    target: gpt-5-mini-2025-08-07

  chat-model:
-    target: llama3.1
+    target: gpt-5-mini-2025-08-07

  creative-model:
-    target: claude-3-5-sonnet-20241022
+    target: claude-sonnet-4-20250514
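This change retargets several aliases: `arch.creative.v1` and `creative-model` now point at `claude-sonnet-4-20250514`, and `chat-model` at `gpt-5-mini-2025-08-07`. To confirm where an alias resolves after editing a config, a small lookup over the `model_aliases` section is enough. This is a minimal sketch with a hypothetical helper name (`resolve_alias`). It assumes the `<alias>:` / `target:` layout shown in these demo configs and only reads the file, so it reflects the configuration on disk, not what a running gateway has loaded.

```bash
# Hypothetical helper (not part of archgw): print the target model of an alias
# defined in the model_aliases section of an arch config.
# Usage: resolve_alias <config-file> <alias-name>
# Assumes the layout used in these demos:
#   model_aliases:
#     <alias>:
#       target: <model>
resolve_alias() {
  awk -v alias="$2" '
    /^model_aliases:/ { in_aliases = 1; next }

    # Any other top-level key ends the model_aliases section.
    in_aliases && /^[^ #]/ { in_aliases = 0; found = 0 }

    # A "<name>:" line starts an alias entry; note whether it is the requested one.
    in_aliases && $1 ~ /:$/ && $1 != "target:" { found = ($1 == (alias ":")); next }

    # Print the target of the requested alias and stop.
    found && $1 == "target:" { print $2; exit }
  ' "$1"
}
```

Run from `demos/use_cases/claude_code`, `resolve_alias config.yaml arch.claude.code.small.fast` prints `claude-3-haiku-20240307`. An unknown alias prints nothing.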
--- a/demos/use_cases/model_choice_with_test_harness/pyproject.toml
+++ b/demos/use_cases/model_choice_with_test_harness/pyproject.toml
@@ -12,7 +12,7 @@ python = ">=3.10,<3.13.3"
 pydantic = "^2.0"
 openai = "^1.0"
 pyyaml = "^6.0"
-archgw ="^0.3.13"
+archgw ="^0.3.14"

 [tool.poetry.group.dev.dependencies]
 pytest = "^8.3"
--- a/demos/use_cases/preference_based_routing/README.md
+++ b/demos/use_cases/preference_based_routing/README.md
@@ -14,9 +14,9 @@ Make sure your machine is up to date with [latest version of archgw]([url](https
 2. start archgw in the foreground
 ```bash
 (venv) $ archgw up --service archgw --foreground
-2025-05-30 18:00:09,953 - cli.main - INFO - Starting archgw cli version: 0.3.13
+2025-05-30 18:00:09,953 - cli.main - INFO - Starting archgw cli version: 0.3.14
 2025-05-30 18:00:09,953 - cli.main - INFO - Validating /Users/adilhafeez/src/intelligent-prompt-gateway/demos/use_cases/preference_based_routing/arch_config.yaml
-2025-05-30 18:00:10,422 - cli.core - INFO - Starting arch gateway, image name: archgw, tag: katanemo/archgw:0.3.13
+2025-05-30 18:00:10,422 - cli.core - INFO - Starting arch gateway, image name: archgw, tag: katanemo/archgw:0.3.14
 2025-05-30 18:00:10,662 - cli.core - INFO - archgw status: running, health status: starting
 2025-05-30 18:00:11,712 - cli.core - INFO - archgw status: running, health status: starting
 2025-05-30 18:00:12,761 - cli.core - INFO - archgw is running and is healthy!