mirror of
https://github.com/katanemo/plano.git
synced 2026-06-08 14:55:14 +02:00
adding code snippets in a single place for newsletter (#569)

* adding code snippets in a single place for newsletter
* fixing README and run_demo.sh
* renaming branch

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-136.local>

parent 3eb6af8829
commit b56311f458
7 changed files with 3057 additions and 0 deletions
119
demos/use_cases/model_choice_with_test_harness/README.md
Normal file
@@ -0,0 +1,119 @@
# Model Choice Newsletter Demo

This folder demonstrates a practical workflow for rapid model adoption and safe model switching using Arch Gateway (`archgw`). It includes both a minimal test harness and a sample proxy configuration.

---

## Step-by-Step Walkthrough: Adopting New Models

### Part 1: Testing Infrastructure

**Goal:** Quickly evaluate candidate models for a task using a repeatable, automated harness.

#### 1. Write Test Fixtures

Create a YAML file (`evals_summarize.yaml`) with real examples for your task. Each fixture includes:

- `id`: A unique identifier for the fixture.
- `input`: The prompt or scenario.
- `must_include`: List of anchor words that must appear in the output (matched case-insensitively).
- `schema`: The name of the expected output schema.

Example:

```yaml
# evals_summarize.yaml
task: summarize
fixtures:
  - id: sum-001
    input: "Thread about a billing dispute…"
    must_include: ["invoice"]
    schema: SummarizeOut
  - id: sum-002
    input: "Thread about a shipping delay…"
    must_include: ["status"]
    schema: SummarizeOut
```

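
Before running a benchmark, it can help to sanity-check the fixtures themselves. The following is a minimal sketch, assuming fixtures shaped like the example above; the helper name `check_fixtures` and the inline sample data are illustrative and are not part of `bench.py`.

```python
# Hypothetical helper (not part of bench.py): sanity-check fixtures before a run.
REQUIRED_KEYS = {"id", "input", "must_include", "schema"}


def check_fixtures(fixtures: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the fixtures look fine."""
    problems = []
    seen_ids = set()
    for i, fx in enumerate(fixtures):
        label = fx.get("id", f"fixture #{i}")
        missing = REQUIRED_KEYS - fx.keys()
        if missing:
            problems.append(f"{label}: missing keys {sorted(missing)}")
        if fx.get("id") in seen_ids:
            problems.append(f"{label}: duplicate id")
        seen_ids.add(fx.get("id"))
        anchors = fx.get("must_include", [])
        if not isinstance(anchors, list) or not all(isinstance(a, str) for a in anchors):
            problems.append(f"{label}: must_include should be a list of strings")
    return problems


# Inline sample mirroring evals_summarize.yaml; in practice the list would come
# from yaml.safe_load(...)["fixtures"]. The second entry deliberately lacks "schema".
sample = [
    {"id": "sum-001", "input": "Thread about a billing dispute…",
     "must_include": ["invoice"], "schema": "SummarizeOut"},
    {"id": "sum-002", "input": "Thread about a shipping delay…",
     "must_include": ["status"]},
]
print(check_fixtures(sample))
```

Catching a malformed fixture up front keeps a `0%` success rate from being misread as a model problem.
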
#### 2. Candidate Models

List the model aliases (e.g., `arch.summarize.v1`, `arch.reason.v1`) you want to test. The harness routes requests through `archgw`, so your code does not need provider API keys.

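
As a rough sketch of what "testing an alias" means in code, the snippet below builds one OpenAI-style chat request per alias. The function name `build_request` and the system prompt are assumptions made for illustration; nothing is sent over the network here.

```python
# Aliases named in this README; the gateway maps them to provider models.
CANDIDATES = ["arch.summarize.v1", "arch.reason.v1"]


def build_request(alias: str, user_input: str) -> dict:
    """Build keyword arguments for an OpenAI-style chat completion call."""
    return {
        "model": alias,  # the application only ever names the alias
        "messages": [
            {"role": "system", "content": "Be concise. Output valid JSON."},
            {"role": "user", "content": user_input},
        ],
        "response_format": {"type": "json_object"},
    }


requests_by_alias = {
    alias: build_request(alias, "Thread about a billing dispute…")
    for alias in CANDIDATES
}
for alias, request in requests_by_alias.items():
    print(alias, "->", len(request["messages"]), "messages")
```

Each dict could then be passed as `client.chat.completions.create(**request)` using a client whose `base_url` points at `archgw`, which is how `bench.py` issues its calls.
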
#### 3. Minimal Python Harness

See `bench.py` for a complete example. It:

- Loads fixtures.
- Sends requests to each candidate model via `archgw`.
- Validates output against schema and anchor words.
- Reports success rate and latency.

Example usage:

```sh
poetry install
python bench.py
```

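
The latency part of the report can be sketched on its own. This is a minimal example, assuming latencies are collected in seconds as in `bench.py`; the helper name `summarize_latency` is hypothetical.

```python
import statistics


def summarize_latency(latencies_s: list[float]) -> dict:
    """Return average and p95 latency in milliseconds for a list of seconds."""
    if not latencies_s:
        return {"avg_ms": None, "p95_ms": None}
    avg_ms = statistics.mean(latencies_s) * 1000
    if len(latencies_s) < 2:
        # statistics.quantiles needs at least two data points,
        # so fall back to the single observation.
        p95_ms = latencies_s[0] * 1000
    else:
        # n=100 yields 99 cut points; index 94 is the 95th percentile.
        p95_ms = statistics.quantiles(latencies_s, n=100)[94] * 1000
    return {"avg_ms": round(avg_ms), "p95_ms": round(p95_ms)}


print(summarize_latency([0.5]))
```

With only a handful of fixtures, the p95 value is an extrapolated estimate and can exceed the slowest observed request, so it is best read as a rough signal until the fixture set grows.
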
**Benchmarks:**

- ≥90% of outputs are schema-valid
- ≥80% of outputs contain their anchor words
- Latency within SLO
- Cost within budget

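
The first two thresholds can be checked mechanically. The sketch below assumes result dicts shaped like the ones `run_case` returns in `bench.py` (a `why` string such as `""`, `"schema"`, `"anchors"`, `"schema;anchors"`, or `"json decode"`); the function names are hypothetical, and the latency and cost criteria are left out because they depend on your own SLO and budget.

```python
def pass_rates(results: list[dict]) -> dict:
    """Compute the share of results that passed the schema and anchor checks."""
    total = len(results)
    if total == 0:
        return {"schema": 0.0, "anchors": 0.0}
    schema_ok = 0
    anchors_ok = 0
    for r in results:
        reasons = set(filter(None, r["why"].split(";")))
        if "json decode" in reasons:
            continue  # unparseable output counts as failing both checks
        schema_ok += "schema" not in reasons
        anchors_ok += "anchors" not in reasons
    return {"schema": schema_ok / total, "anchors": anchors_ok / total}


def meets_benchmarks(results: list[dict],
                     schema_min: float = 0.90,
                     anchors_min: float = 0.80) -> bool:
    """Apply the schema and anchor thresholds listed above."""
    rates = pass_rates(results)
    return rates["schema"] >= schema_min and rates["anchors"] >= anchors_min


# Ten illustrative results: nine clean passes and one anchor miss.
example = [{"ok": True, "lat": 0.4, "why": ""}] * 9 + [
    {"ok": False, "lat": 0.5, "why": "anchors"}
]
print(pass_rates(example), meets_benchmarks(example))
```

Tracking the two rates separately shows whether a candidate fails on structure or on content, which a single success percentage hides.
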
---

### Part 2: Network Infrastructure

**Goal:** Use a proxy server (`archgw`) to decouple your app from vendor-specific model names and centralize control.

#### Why Use a Proxy?

- Consistent API across providers
- Centralized key management
- Unified logging, metrics, and guardrails
- Intent-based model aliases (e.g., `arch.summarize.v1`)
- Safe model promotions and rollbacks
- Central governance and observability

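#### Example Proxy Config

See `arch_config.yaml` for a sample configuration mapping aliases to provider models. Note that `run_demo.sh` starts the gateway with `arch_config_with_aliases.yaml`, so use whichever of these names matches the config file in your checkout.
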
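
To make the idea of promotions and rollbacks concrete, here is a conceptual sketch of alias resolution. It is only a mental model, assuming an alias table shaped like the `model_aliases` section of the sample config; it is not how `archgw` is implemented, and `resolve` and `promote` are hypothetical names.

```python
# Illustrative table shaped like the model_aliases section of the sample config.
MODEL_ALIASES = {
    "arch.summarize.v1": {"target": "gpt-4o-mini"},
    "arch.reason.v1": {"target": "o3"},
}


def resolve(alias: str, aliases: dict) -> str:
    """Return the provider model an alias currently points at."""
    try:
        return aliases[alias]["target"]
    except KeyError:
        raise ValueError(f"unknown model alias: {alias}") from None


def promote(alias: str, new_target: str, aliases: dict) -> dict:
    """Return a copy of the alias table with one alias repointed."""
    updated = {name: dict(entry) for name, entry in aliases.items()}
    updated[alias] = {"target": new_target}
    return updated


promoted = promote("arch.summarize.v1", "o3", MODEL_ALIASES)
print(resolve("arch.summarize.v1", MODEL_ALIASES))  # before promotion
print(resolve("arch.summarize.v1", promoted))       # after promotion
```

Because the application only names the alias, a promotion or rollback is a change to this mapping rather than a change to application code.
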
---

## How to Run This Demo

1. **Install Poetry** (if not already installed):

   ```sh
   curl -sSL https://install.python-poetry.org | python3 -
   ```

2. **Install dependencies:**

   - Install all dependencies as described in the main Arch README ([link](https://github.com/katanemo/arch/?tab=readme-ov-file#prerequisites)).
   - Then run:

   ```sh
   poetry install
   ```

3. **Start Arch Gateway** (requires `OPENAI_API_KEY` to be set in your environment or in an existing `.env` file; run `bash run_demo.sh down` to stop the gateway afterwards):

   ```sh
   bash run_demo.sh
   ```

4. **Run the test harness:**

   ```sh
   python bench.py
   ```

---

## Files in This Folder

- `bench.py`: Minimal Python test harness
- `evals_summarize.yaml`: Example test fixtures
- `pyproject.toml`: Poetry environment file
- `run_demo.sh`: Script that starts and stops Arch Gateway for the demo
- `arch_config.yaml`: Sample archgw config (if present)

---

## Troubleshooting

- If you see `Success: 0/2 (0%)`, check your anchor words and prompt clarity.
- Make sure archgw is running and accessible at `http://localhost:12000/`.
- For schema validation errors, ensure your prompt instructs the model to output the correct JSON structure.

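
Two of the checks above can be scripted. The sketch below is illustrative only: `missing_anchors` mirrors the case-insensitive matching that `must_contain` performs in `bench.py` but reports which anchors are absent, and `gateway_reachable` simply tries to open a TCP connection to the port named above. Both function names and the sample output dict are assumptions.

```python
import json
import socket


def missing_anchors(output: dict, anchors: list[str]) -> list[str]:
    """List the anchor words that do not appear in the serialized output."""
    text = json.dumps(output).lower()
    return [a for a in anchors if a.lower() not in text]


def gateway_reachable(host: str = "localhost", port: int = 12000,
                      timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the gateway port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical model output for the billing fixture: it never says "invoice".
sample_output = {
    "title": "Billing dispute",
    "bullets": ["Customer questions a charge"],
    "next_actions": [],
}
print(missing_anchors(sample_output, ["invoice", "billing"]))
```

Calling `gateway_reachable()` before a benchmark run separates "the gateway is down" from "the model output failed validation". A successful connection only shows that something is listening on the port, not that the gateway is correctly configured.
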
@@ -0,0 +1,22 @@
version: v0.1.0

listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    default: true

  - model: openai/o3
    access_key: $OPENAI_API_KEY

model_aliases:
  arch.summarize.v1:
    target: gpt-4o-mini
  arch.reason.v1:
    target: o3
86
demos/use_cases/model_choice_with_test_harness/bench.py
Normal file
@@ -0,0 +1,86 @@
# bench.py
import json, time, yaml, statistics as stats
from pydantic import BaseModel, ValidationError
from openai import OpenAI

# archgw endpoint (keys are handled by archgw)
client = OpenAI(base_url="http://localhost:12000/v1", api_key="n/a")
MODELS = ["arch.summarize.v1", "arch.reason.v1"]
FIXTURES = "evals_summarize.yaml"


# Expected output shape
class SummarizeOut(BaseModel):
    title: str
    bullets: list[str]
    next_actions: list[str]


def load_fixtures(path):
    with open(path, "r") as f:
        return yaml.safe_load(f)["fixtures"]


def must_contain(text: str, anchors: list[str]) -> bool:
    t = text.lower()
    return all(a.lower() in t for a in anchors)


def schema_fmt(model: type[BaseModel]):
    return {"type": "json_object"}  # Simplified for broad compatibility


def run_case(model, fx):
    t0 = time.perf_counter()
    schema = SummarizeOut.model_json_schema()
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": f"Be concise. Output valid JSON matching this schema:\n{json.dumps(schema)}",
            },
            {"role": "user", "content": fx["input"]},
        ],
        response_format=schema_fmt(SummarizeOut),
    )
    dt = time.perf_counter() - t0

    content = resp.choices[0].message.content or "{}"
    passed, reasons = True, []

    try:
        data = json.loads(content)
    except json.JSONDecodeError:  # only malformed JSON counts as a decode failure
        return {"ok": False, "lat": dt, "why": "json decode"}

    try:
        SummarizeOut.model_validate(data)  # also rejects non-object JSON such as a list
    except ValidationError:
        passed = False
        reasons.append("schema")
    if not must_contain(json.dumps(data), fx.get("must_include", [])):
        passed = False
        reasons.append("anchors")

    return {"ok": passed, "lat": dt, "why": ";".join(reasons)}


def main():
    fixtures = load_fixtures(FIXTURES)
    for model in MODELS:
        results = [run_case(model, fx) for fx in fixtures]
        ok = sum(r["ok"] for r in results)
        total = len(results)
        latencies = [r["lat"] for r in results]

        print(f"\n››› {model}")
        print(f" Success: {ok}/{total} ({ok/total:.0%})")
        if len(latencies) >= 2:  # stats.quantiles needs at least two samples
            avg_lat = stats.mean(latencies)
            p95_lat = stats.quantiles(latencies, n=100)[94]
            print(f" Latency (ms): avg={avg_lat*1000:.0f}, p95={p95_lat*1000:.0f}")


if __name__ == "__main__":
    main()
@@ -0,0 +1,11 @@
# evals_summarize.yaml
task: summarize
fixtures:
  - id: sum-001
    input: "Thread about a billing dispute…"
    must_include: ["invoice"]
    schema: SummarizeOut
  - id: sum-002
    input: "Thread about a shipping delay…"
    must_include: ["status"]
    schema: SummarizeOut
2756
demos/use_cases/model_choice_with_test_harness/poetry.lock
generated
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,22 @@
[tool.poetry]
name = "model-choice-newsletter-code-snippets"
version = "0.1.0"
description = "Benchmarking model alias routing with Arch Gateway."
authors = ["Your Name <your@email.com>"]
license = "Apache-2.0"
readme = "README.md"
package-mode = false

[tool.poetry.dependencies]
python = ">=3.10,<3.13.3"
pydantic = "^2.0"
openai = "^1.0"
pyyaml = "^6.0"
archgw = "^0.3.12"

[tool.poetry.group.dev.dependencies]
pytest = "^8.3"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
41
demos/use_cases/model_choice_with_test_harness/run_demo.sh
Normal file
@@ -0,0 +1,41 @@
#!/bin/bash
set -e

# Function to start the demo
start_demo() {
  # Step 1: Check if .env file exists
  if [ -f ".env" ]; then
    echo ".env file already exists. Skipping creation."
  else
    # Step 2: Create `.env` file and set API keys
    if [ -z "$OPENAI_API_KEY" ]; then
      echo "Error: OPENAI_API_KEY environment variable is not set for the demo."
      exit 1
    fi
    echo "Creating .env file..."
    echo "OPENAI_API_KEY=$OPENAI_API_KEY" > .env
    echo ".env file created with API keys."
  fi

  # Step 3: Start Arch
  echo "Starting Arch with arch_config_with_aliases.yaml..."
  archgw up arch_config_with_aliases.yaml

  printf '\n\nArch started successfully.\n'
  printf 'Please run the following command to test the setup: python bench.py\n\n'
}

# Function to stop the demo
stop_demo() {
  # Step 1: Stop Arch
  echo "Stopping Arch..."
  archgw down
}

# Main script logic
if [ "$1" == "down" ]; then
  stop_demo
else
  # Default action is to bring the demo up
  start_demo
fi