feat: add provider arbitrage policy and fallback routing

2026-06-23 15:38:07 +02:00 · 2026-03-18 15:54:49 -07:00 · 2026-03-18 15:54:49 -07:00 · 07ad4c6ae2
commit 07ad4c6ae2
parent de2d8847f3
10 changed files with 670 additions and 57 deletions
--- a/demos/llm_routing/gpu_free_tier_arbitrage/README.md
+++ b/demos/llm_routing/gpu_free_tier_arbitrage/README.md
@@ -0,0 +1,38 @@
+# GPU Free-Tier Arbitrage Demo
+
+This demo package showcases provider-level free-tier-first routing and deterministic fallback using a local Plano endpoint on `localhost:12000`.
+
+## Files
+
+- `config.yaml` - demo Plano config with `arbitrage_policy`
+- `demo.rest` - runnable REST requests for IDE REST clients
+
+## Prerequisites
+
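+Set up the providers that `config.yaml` references:
+
+- `OPENAI_API_KEY_FAILURE` (a deliberately invalid OpenAI key, used to force a provider failure)
+- `GROQ_API_KEY`
+- A local Ollama server at `http://localhost:11434` serving `qwen3:8b` (no API key required)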
+
+## Run the demo
+
+From this directory:
+
+```bash
+planoai up config.yaml
+```
+
+Then run requests from `demo.rest` in your REST client.
+
+## What to show during the demo
+
+1. Run `free-tier-first showcase` and verify response success.
+2. Inspect logs/traces for provider selection reason and selected candidate.
+3. Force a retryable error on the first candidate (for example, by temporarily using an invalid key), then run `fallback showcase`.
+4. Verify fallback metadata appears in traces/logs:
+   - `routing.selection_reason`
+   - `routing.is_fallback`
+   - `routing.fallback_trigger`
+   - `routing.next_candidate`
+   - `routing.upstream_endpoint`
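
Reviewer note: the check in step 4 can be done mechanically once a trace span or log record is available as a dictionary. The sketch below is an illustration only and is not part of this commit. `missing_routing_fields` and the sample attribute values are hypothetical; only the five `routing.*` key names come from the README above.

```python
# Hypothetical helper: report which fallback metadata keys are absent
# from a span or log record represented as a plain dict.
EXPECTED_ROUTING_FIELDS = (
    "routing.selection_reason",
    "routing.is_fallback",
    "routing.fallback_trigger",
    "routing.next_candidate",
    "routing.upstream_endpoint",
)


def missing_routing_fields(attributes: dict) -> list[str]:
    """Return the expected routing keys that are not present in `attributes`."""
    return [key for key in EXPECTED_ROUTING_FIELDS if key not in attributes]


# Sample values are made up for illustration; real traces may look different.
sample_span = {
    "routing.selection_reason": "fallback",
    "routing.is_fallback": True,
    "routing.fallback_trigger": "http_401",
    "routing.next_candidate": "groq/llama-3.1-8b-instant",
}

print(missing_routing_fields(sample_span))  # ['routing.upstream_endpoint']
```

An empty list means every field the README asks for is present in the record.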
--- a/demos/llm_routing/gpu_free_tier_arbitrage/config.yaml
+++ b/demos/llm_routing/gpu_free_tier_arbitrage/config.yaml
@@ -0,0 +1,30 @@
+version: v0.3.0
+
+listeners:
+  - type: model
+    name: model_listener
+    port: 12000
+    max_retries: 1
+
+model_providers:
+  # Primary provider for the model.
+  - model: openai/gpt-5.2
+    # Deliberately invalid key, used to trigger failures when testing the arbitrage policy.
+    access_key: $OPENAI_API_KEY_FAILURE
+    default: true
+    arbitrage_policy:
+      enabled: true
+      rank:
+        # Demo low-cost/free-tier candidates (ordered).
+        - ollama/qwen3:8b
+        - groq/llama-3.1-8b-instant
+
+  # Candidates referenced by arbitrage_policy.rank.
+  - model: groq/llama-3.1-8b-instant
+    access_key: $GROQ_API_KEY
+
+  - model: ollama/qwen3:8b
+    base_url: http://localhost:11434
+
+tracing:
+  random_sampling: 100
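
Reviewer note: the ordered `rank` list suggests a "try candidates in order, move on after a retryable error" loop. The sketch below models that idea in isolation. It assumes nothing about how Plano implements it or where the primary provider sits relative to the ranked candidates, which this diff does not show. `RetryableError`, `route`, `fake_call` and the `routing.selected_candidate` key are hypothetical names introduced for illustration.

```python
from typing import Callable

# Ordered candidates copied from arbitrage_policy.rank in config.yaml.
RANK = ["ollama/qwen3:8b", "groq/llama-3.1-8b-instant"]


class RetryableError(Exception):
    """Hypothetical marker for errors that should trigger fallback."""


def route(candidates: list[str], call: Callable[[str], str]) -> dict:
    """Try each candidate in order; fall through to the next on RetryableError."""
    trigger = None
    for index, candidate in enumerate(candidates):
        try:
            response = call(candidate)
        except RetryableError as exc:
            trigger = str(exc)  # remember why the previous candidate failed
            continue
        return {
            "response": response,
            "routing.selected_candidate": candidate,
            "routing.is_fallback": index > 0,
            "routing.fallback_trigger": trigger,
        }
    raise RuntimeError(f"all candidates failed; last trigger: {trigger}")


def fake_call(candidate: str) -> str:
    # Simulate the first ranked candidate being unavailable.
    if candidate == "ollama/qwen3:8b":
        raise RetryableError("connection refused")
    return f"ok from {candidate}"


result = route(RANK, fake_call)
print(result["routing.selected_candidate"], result["routing.is_fallback"])
```

Because the loop walks the list in a fixed order and records the last failure reason, the same input always yields the same selection, which is one plausible reading of "deterministic fallback".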
--- a/demos/llm_routing/gpu_free_tier_arbitrage/demo.rest
+++ b/demos/llm_routing/gpu_free_tier_arbitrage/demo.rest
@@ -0,0 +1,31 @@
+@llm_endpoint = http://localhost:12000
+
+### free-tier-first showcase
+POST {{llm_endpoint}}/v1/chat/completions HTTP/1.1
+Content-Type: application/json
+
+{
+  "model": "gpt-5.2",
+  "stream": false,
+  "messages": [
+    {
+      "role": "user",
+      "content": "Reply with exactly: free-tier-first routing demo successful."
+    }
+  ]
+}
+
+### fallback showcase (run after forcing first candidate failure)
+POST {{llm_endpoint}}/v1/chat/completions HTTP/1.1
+Content-Type: application/json
+
+{
+  "model": "gpt-5.2",
+  "stream": false,
+  "messages": [
+    {
+      "role": "user",
+      "content": "Reply with exactly: fallback routing demo successful."
+    }
+  ]
+}
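
Reviewer note: for anyone without an IDE REST client, the two requests in `demo.rest` can be reproduced with the Python standard library. This is a sketch under the assumption that the gateway from `config.yaml` is listening on `localhost:12000`; `build_chat_request` is a hypothetical helper name, and the request is only built here, not sent.

```python
import json
import urllib.request

LLM_ENDPOINT = "http://localhost:12000"  # matches @llm_endpoint in demo.rest


def build_chat_request(prompt: str, model: str = "gpt-5.2") -> urllib.request.Request:
    """Build (but do not send) the chat completion request used in demo.rest."""
    body = {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{LLM_ENDPOINT}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Reply with exactly: fallback routing demo successful.")
print(request.get_method(), request.full_url)

# To actually send it, the gateway must be running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```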