adding code snippets in a single place for newsletter (#569)

* adding code snippets in a single place for newsletter

* fixing README and run_demo.sh

* renaming branch

---------

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-136.local>
Salman Paracha 2025-09-17 01:06:06 -07:00 committed by GitHub
parent 3eb6af8829
commit b56311f458
7 changed files with 3057 additions and 0 deletions


@ -0,0 +1,119 @@
# Model Choice Newsletter Demo
This folder demonstrates a practical workflow for rapid model adoption and safe model switching using Arch Gateway (`archgw`). It includes both a minimal test harness and a sample proxy configuration.

---
## Step-by-Step Walkthrough: Adopting New Models
### Part 1 — Testing Infrastructure
**Goal:** Quickly evaluate candidate models for a task using a repeatable, automated harness.
#### 1. Write Test Fixtures
Create a YAML file (`evals_summarize.yaml`) with real examples for your task. Each fixture includes:
- `input`: The prompt or scenario.
- `must_include`: List of anchor words that must appear in the output.
- `schema`: The expected output schema.

Example:
```yaml
# evals_summarize.yaml
task: summarize
fixtures:
  - id: sum-001
    input: "Thread about a billing dispute…"
    must_include: ["invoice"]
    schema: SummarizeOut
  - id: sum-002
    input: "Thread about a shipping delay…"
    must_include: ["status"]
    schema: SummarizeOut
```
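Before running a benchmark it can help to sanity-check the fixture file. The following is a minimal sketch, assuming the fixtures have already been parsed (for example with `yaml.safe_load`) into a list of dictionaries. `validate_fixtures` and `REQUIRED_KEYS` are hypothetical names and are not part of `bench.py`.

```python
# Hypothetical helper: checks parsed fixtures for the fields described above.
REQUIRED_KEYS = ("id", "input", "must_include", "schema")


def validate_fixtures(fixtures):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    seen_ids = set()
    for index, fx in enumerate(fixtures):
        label = fx.get("id", f"fixture #{index}")
        for key in REQUIRED_KEYS:
            if key not in fx:
                problems.append(f"{label}: missing '{key}'")
        # Anchor words are expected as a list, not a bare string.
        if not isinstance(fx.get("must_include", []), list):
            problems.append(f"{label}: 'must_include' must be a list")
        if "id" in fx:
            if fx["id"] in seen_ids:
                problems.append(f"{label}: duplicate id")
            seen_ids.add(fx["id"])
    return problems


if __name__ == "__main__":
    sample = [
        {"id": "sum-001", "input": "Thread about a billing dispute…",
         "must_include": ["invoice"], "schema": "SummarizeOut"},
        # Deliberately broken: duplicate id, string anchors, no schema.
        {"id": "sum-001", "input": "Thread about a shipping delay…",
         "must_include": "status"},
    ]
    for problem in validate_fixtures(sample):
        print(problem)
```

Catching a malformed fixture up front avoids spending model calls on a run whose results would be meaningless.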
#### 2. Candidate Models
List the model aliases (e.g., `arch.summarize.v1`, `arch.reason.v1`) you want to test. The harness will route requests through `archgw`, so you don't need provider API keys in your code.
#### 3. Minimal Python Harness
See `bench.py` for a complete example. It:
- Loads fixtures.
- Sends requests to each candidate model via `archgw`.
- Validates output against schema and anchor words.
- Reports success rate and latency.

Example usage:
```sh
poetry install
python bench.py
```
**Benchmarks:**
- ≥90% schema-valid
- ≥80% anchors present
- Latency within SLO
- Cost within budget
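
The thresholds above can be checked mechanically once per-fixture results are collected. This is a minimal sketch, assuming each result records separate `schema_ok` and `anchors_ok` flags plus a latency in seconds (the `bench.py` in this folder reports a single combined `ok` flag instead). The function name, the result keys, and the 2-second SLO are illustrative assumptions, and cost is left out because the harness does not measure it.

```python
import statistics


def meets_thresholds(results, slo_seconds=2.0,
                     min_schema_rate=0.90, min_anchor_rate=0.80):
    """Hypothetical gate: True if a candidate model passes every threshold."""
    if not results:
        return False
    total = len(results)
    schema_rate = sum(r["schema_ok"] for r in results) / total
    anchor_rate = sum(r["anchors_ok"] for r in results) / total
    latencies = [r["lat"] for r in results]
    # statistics.quantiles needs at least two points; with one point,
    # use that single value as the p95 estimate.
    if total >= 2:
        p95 = statistics.quantiles(latencies, n=100)[94]
    else:
        p95 = latencies[0]
    return (schema_rate >= min_schema_rate
            and anchor_rate >= min_anchor_rate
            and p95 <= slo_seconds)


if __name__ == "__main__":
    demo = [{"schema_ok": True, "anchors_ok": True, "lat": 0.8}] * 9
    demo.append({"schema_ok": False, "anchors_ok": True, "lat": 1.1})
    print(meets_thresholds(demo))
```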
---
### Part 2 — Network Infrastructure
**Goal:** Use a proxy server (`archgw`) to decouple your app from vendor-specific model names and centralize control.
#### Why Use a Proxy?
- Consistent API across providers
- Centralized key management
- Unified logging, metrics, and guardrails
- Intent-based model aliases (e.g., `arch.summarize.v1`)
- Safe model promotions and rollbacks
- Central governance and observability
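
To illustrate why intent-based aliases make promotions and rollbacks safe, here is a toy in-memory sketch. It is not how `archgw` is implemented; the `AliasTable` class and its methods are hypothetical, and the promotion to `o3` is only an example. The point is that application code asks for `arch.summarize.v1` and never sees the provider model name.

```python
class AliasTable:
    """Toy alias registry with one-step rollback (illustrative only)."""

    def __init__(self, targets):
        self._targets = dict(targets)
        self._previous = {}

    def resolve(self, alias):
        """Return the provider model currently behind an alias."""
        return self._targets[alias]

    def promote(self, alias, new_target):
        # Remember the old target so the change can be undone.
        self._previous[alias] = self._targets[alias]
        self._targets[alias] = new_target

    def rollback(self, alias):
        # Restore the target that was active before the last promotion.
        self._targets[alias] = self._previous.pop(alias)


if __name__ == "__main__":
    table = AliasTable({"arch.summarize.v1": "gpt-4o-mini"})
    print(table.resolve("arch.summarize.v1"))
    table.promote("arch.summarize.v1", "o3")
    print(table.resolve("arch.summarize.v1"))
    table.rollback("arch.summarize.v1")
    print(table.resolve("arch.summarize.v1"))
```

In this sketch callers only ever pass the alias, so swapping the model behind it requires no application change.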
#### Example Proxy Config
See `arch_config.yaml` for a sample configuration mapping aliases to provider models.

---
## How to Run This Demo
1. **Install Poetry** (if not already installed):
   ```sh
   curl -sSL https://install.python-poetry.org | python3 -
   ```
2. **Install dependencies:**
   - Install all prerequisites described in the main Arch README ([link](https://github.com/katanemo/arch/?tab=readme-ov-file#prerequisites)).
   - Then run:
     ```sh
     poetry install
     ```
3. **Start Arch Gateway:**
   ```sh
   bash run_demo.sh
   ```
4. **Run the test harness:**
   ```sh
   python bench.py
   ```
---
## Files in This Folder
- `bench.py` — Minimal Python test harness
- `evals_summarize.yaml` — Example test fixtures
- `pyproject.toml` — Poetry environment file
- `arch_config.yaml` — Sample archgw config (if present)
---
## Troubleshooting
- If you see `Success: 0/2 (0%)`, check your anchor words and prompt clarity.
- Make sure archgw is running and accessible at `http://localhost:12000/`.
- For schema validation errors, ensure your prompt instructs the model to output the correct JSON structure.
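
A quick way to confirm the second point is to test whether anything is listening on the gateway port. This sketch uses only the standard library and assumes the `localhost:12000` address given above; `is_listening` is a hypothetical helper name. It only checks that the TCP port accepts connections, not that the gateway is healthy.

```python
import socket


def is_listening(host="localhost", port=12000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    if is_listening():
        print("Port 12000 is accepting connections.")
    else:
        print("Nothing is listening on port 12000; is archgw running?")
```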


@ -0,0 +1,22 @@
version: v0.1.0

listeners:
  egress_traffic:
    address: 0.0.0.0
    port: 12000
    message_format: openai
    timeout: 30s

llm_providers:
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    default: true
  - model: openai/o3
    access_key: $OPENAI_API_KEY

model_aliases:
  arch.summarize.v1:
    target: gpt-4o-mini
  arch.reason.v1:
    target: o3


@ -0,0 +1,86 @@
# bench.py
import json
import statistics as stats
import time

import yaml
from openai import OpenAI
from pydantic import BaseModel, ValidationError

# archgw endpoint (keys are handled by archgw)
client = OpenAI(base_url="http://localhost:12000/v1", api_key="n/a")
MODELS = ["arch.summarize.v1", "arch.reason.v1"]
FIXTURES = "evals_summarize.yaml"


# Expected output shape
class SummarizeOut(BaseModel):
    title: str
    bullets: list[str]
    next_actions: list[str]


def load_fixtures(path):
    """Read the YAML fixture file and return its list of fixtures."""
    with open(path, "r") as f:
        return yaml.safe_load(f)["fixtures"]


def must_contain(text: str, anchors: list[str]) -> bool:
    """Case-insensitive check that every anchor word appears in the text."""
    t = text.lower()
    return all(a.lower() in t for a in anchors)


def schema_fmt(model: type[BaseModel]):
    return {"type": "json_object"}  # Simplified for broad compatibility


def run_case(model, fx):
    """Send one fixture to one model alias and grade the response."""
    t0 = time.perf_counter()
    schema = SummarizeOut.model_json_schema()
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": f"Be concise. Output valid JSON matching this schema:\n{json.dumps(schema)}",
            },
            {"role": "user", "content": fx["input"]},
        ],
        response_format=schema_fmt(SummarizeOut),
    )
    dt = time.perf_counter() - t0
    content = resp.choices[0].message.content or "{}"
    passed, reasons = True, []
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return {"ok": False, "lat": dt, "why": "json decode"}
    try:
        # model_validate also rejects non-dict JSON (e.g. a top-level list)
        SummarizeOut.model_validate(data)
    except ValidationError:
        passed = False
        reasons.append("schema")
    if not must_contain(json.dumps(data), fx.get("must_include", [])):
        passed = False
        reasons.append("anchors")
    return {"ok": passed, "lat": dt, "why": ";".join(reasons)}


def main():
    fixtures = load_fixtures(FIXTURES)
    for model in MODELS:
        results = [run_case(model, fx) for fx in fixtures]
        ok = sum(r["ok"] for r in results)
        total = len(results)
        latencies = [r["lat"] for r in results]
        print(f"\n {model}")
        if total == 0:
            print(" No fixtures found.")
            continue
        print(f" Success: {ok}/{total} ({ok/total:.0%})")
        avg_lat = stats.mean(latencies)
        # statistics.quantiles needs at least two data points
        p95_lat = stats.quantiles(latencies, n=100)[94] if total >= 2 else latencies[0]
        print(f" Latency (ms): avg={avg_lat*1000:.0f}, p95={p95_lat*1000:.0f}")


if __name__ == "__main__":
    main()


@ -0,0 +1,11 @@
# evals_summarize.yaml
task: summarize
fixtures:
  - id: sum-001
    input: "Thread about a billing dispute…"
    must_include: ["invoice"]
    schema: SummarizeOut
  - id: sum-002
    input: "Thread about a shipping delay…"
    must_include: ["status"]
    schema: SummarizeOut

File diff suppressed because it is too large


@ -0,0 +1,22 @@
[tool.poetry]
name = "model-choice-newsletter-code-snippets"
version = "0.1.0"
description = "Benchmarking model alias routing with Arch Gateway."
authors = ["Your Name <your@email.com>"]
license = "Apache 2.0"
readme = "README.md"
package-mode = false
[tool.poetry.dependencies]
python = ">=3.10,<3.13.3"
pydantic = "^2.0"
openai = "^1.0"
pyyaml = "^6.0"
archgw = "^0.3.12"
[tool.poetry.group.dev.dependencies]
pytest = "^8.3"
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"


@ -0,0 +1,41 @@
#!/bin/bash
set -e

# Function to start the demo
start_demo() {
  # Step 1: Check if .env file exists
  if [ -f ".env" ]; then
    echo ".env file already exists. Skipping creation."
  else
    # Step 2: Create `.env` file and set API keys
    if [ -z "$OPENAI_API_KEY" ]; then
      echo "Error: OPENAI_API_KEY environment variable is not set for the demo."
      exit 1
    fi

    echo "Creating .env file..."
    echo "OPENAI_API_KEY=$OPENAI_API_KEY" > .env
    echo ".env file created with API keys."
  fi

  # Step 3: Start Arch
  echo "Starting Arch with arch_config_with_aliases.yaml..."
  archgw up arch_config_with_aliases.yaml

  # printf interprets \n; plain echo in bash would print it literally
  printf '\n\nArch started successfully.\n'
  printf 'Please run the following command to test the setup: python bench.py\n\n'
}

# Function to stop the demo
stop_demo() {
  # Stop Arch
  echo "Stopping Arch..."
  archgw down
}

# Main script logic
if [ "$1" == "down" ]; then
  stop_demo
else
  # Default action is to bring the demo up
  start_demo
fi