adding code snippets in a single place for newsletter (#569)

* adding code snippets in a single place for newsletter
* fixing README and run_demo.sh
* renaming branch

Co-authored-by: Salman Paracha <salmanparacha@MacBook-Pro-136.local>
2026-07-23 16:51:04 +02:00 · 2025-09-17 01:06:06 -07:00 · 2025-09-17 01:06:06 -07:00 · b56311f458
commit b56311f458
parent 3eb6af8829
7 changed files with 3057 additions and 0 deletions
--- a/demos/use_cases/model_choice_with_test_harness/README.md
+++ b/demos/use_cases/model_choice_with_test_harness/README.md
@@ -0,0 +1,119 @@
+# Model Choice Newsletter Demo
+
+This folder demonstrates a practical workflow for rapid model adoption and safe model switching using Arch Gateway (`archgw`). It includes both a minimal test harness and a sample proxy configuration.
+
+---
+
+## Step-by-Step Walkthrough: Adopting New Models
+
+### Part 1 — Testing Infrastructure
+
+**Goal:** Quickly evaluate candidate models for a task using a repeatable, automated harness.
+
+#### 1. Write Test Fixtures
+
+Create a YAML file (`evals_summarize.yaml`) with real examples for your task. Each fixture includes:
+- `input`: The prompt or scenario.
+- `must_include`: List of anchor words that must appear in the output.
+- `schema`: The expected output schema.
+
+Example:
+```yaml
+# evals_summarize.yaml
+task: summarize
+fixtures:
+  - id: sum-001
+    input: "Thread about a billing dispute…"
+    must_include: ["invoice"]
+    schema: SummarizeOut
+  - id: sum-002
+    input: "Thread about a shipping delay…"
+    must_include: ["status"]
+    schema: SummarizeOut
+```
+
+#### 2. Candidate Models
+
+List the model aliases (e.g., `arch.summarize.v1`, `arch.reason.v1`) you want to test. The harness will route requests through `archgw`, so you don’t need provider API keys in your code.
+
+#### 3. Minimal Python Harness
+
+See `bench.py` for a complete example. It:
+- Loads fixtures.
+- Sends requests to each candidate model via `archgw`.
+- Validates output against schema and anchor words.
+- Reports success rate and latency.
+
+Example usage:
+```sh
+poetry install
+poetry run python bench.py
+```
+
+**Pass criteria (adjust to your task):**
+- ≥90% of outputs are schema-valid
+- ≥80% of outputs contain all anchor words
+- Latency within your SLO
+- Cost within your budget
+
+---
+
+### Part 2 — Network Infrastructure
+
+**Goal:** Use a proxy server (`archgw`) to decouple your app from vendor-specific model names and centralize control.
+
+#### Why Use a Proxy?
+
+- Consistent API across providers
+- Centralized key management
+- Unified logging, metrics, and guardrails
+- Intent-based model aliases (e.g., `arch.summarize.v1`)
+- Safe model promotions and rollbacks
+- Central governance and observability
+
+#### Example Proxy Config
+
+See `arch_config_with_aliases.yaml` for a sample configuration mapping aliases to provider models.
+
+---
+
+## How to Run This Demo
+
+1. **Install Poetry** (if not already installed):
+   ```sh
+   curl -sSL https://install.python-poetry.org | python3 -
+   ```
+
+2. **Install dependencies:**
+   - Install all dependencies as described in the main Arch README ([link](https://github.com/katanemo/arch/?tab=readme-ov-file#prerequisites))
+   - Then run:
+     ```sh
+     poetry install
+     ```
+
+3. **Start Arch Gateway** (requires `OPENAI_API_KEY` to be set, or an existing `.env` file):
+   ```sh
+   bash run_demo.sh
+   ```
+
+4. **Run the test harness:**
+   ```sh
+   poetry run python bench.py
+   ```
+
+---
+
+## Files in This Folder
+
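+- `bench.py`: Minimal Python test harness
+- `evals_summarize.yaml`: Example test fixtures
+- `pyproject.toml`: Poetry environment file
+- `arch_config_with_aliases.yaml`: Sample archgw config with model aliases
+- `run_demo.sh`: Starts Arch Gateway (`bash run_demo.sh`) or stops it (`bash run_demo.sh down`)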
+
+---
+
+## Troubleshooting
+
+- If you see `Success: 0/2 (0%)`, check your anchor words and prompt clarity.
+- Make sure archgw is running and accessible at `http://localhost:12000/`.
+- For schema validation errors, ensure your prompt instructs the model to output the correct JSON structure.
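The second troubleshooting item can be checked from Python before running the harness. The following is a minimal sketch and not part of this commit: `gateway_reachable` is a hypothetical helper name. It only confirms that something accepts TCP connections on the port, not that the listener is archgw or that the aliases are configured.

```python
import socket


def gateway_reachable(host: str = "localhost", port: int = 12000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling `gateway_reachable()` with no arguments probes the address used by `bench.py`; a `False` result suggests the gateway has not been started or is listening elsewhere.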
--- a/demos/use_cases/model_choice_with_test_harness/arch_config_with_aliases.yaml
+++ b/demos/use_cases/model_choice_with_test_harness/arch_config_with_aliases.yaml
@@ -0,0 +1,22 @@
+version: v0.1.0
+
+listeners:
+  egress_traffic:
+    address: 0.0.0.0
+    port: 12000
+    message_format: openai
+    timeout: 30s
+
+llm_providers:
+  - model: openai/gpt-4o-mini
+    access_key: $OPENAI_API_KEY
+    default: true
+
+  - model: openai/o3
+    access_key: $OPENAI_API_KEY
+
+model_aliases:
+  arch.summarize.v1:
+    target: gpt-4o-mini
+  arch.reason.v1:
+    target: o3
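To illustrate what the `model_aliases` block buys the calling code, here is a conceptual sketch in Python. It is not how archgw resolves aliases internally. `ALIASES` and `resolve_model` are hypothetical names that mirror the mapping in the config above, and passing unknown names through unchanged is an assumption of the sketch.

```python
# Mirrors the model_aliases section of the config above (illustrative only).
ALIASES = {
    "arch.summarize.v1": "gpt-4o-mini",
    "arch.reason.v1": "o3",
}


def resolve_model(name: str, aliases: dict[str, str] = ALIASES) -> str:
    """Map an intent-based alias to its target; unknown names pass through (assumption)."""
    return aliases.get(name, name)


# Promoting a different model is a one-entry change to the mapping; callers
# keep requesting "arch.summarize.v1" and never see the vendor model name.
promoted = {**ALIASES, "arch.summarize.v1": "o3"}

print(resolve_model("arch.summarize.v1"))
print(resolve_model("arch.summarize.v1", promoted))
```

The point of the design is that a promotion or rollback happens in one place (the gateway config) while application code continues to use the stable alias.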
--- a/demos/use_cases/model_choice_with_test_harness/bench.py
+++ b/demos/use_cases/model_choice_with_test_harness/bench.py
@@ -0,0 +1,86 @@
+# bench.py
+import json
+import statistics as stats
+import time
+
+import yaml
+from pydantic import BaseModel, ValidationError
+from openai import OpenAI
+
+# archgw endpoint (keys are handled by archgw)
+client = OpenAI(base_url="http://localhost:12000/v1", api_key="n/a")
+MODELS = ["arch.summarize.v1", "arch.reason.v1"]
+FIXTURES = "evals_summarize.yaml"
+
+
+# Expected output shape
+class SummarizeOut(BaseModel):
+    title: str
+    bullets: list[str]
+    next_actions: list[str]
+
+
+def load_fixtures(path):
+    with open(path, "r") as f:
+        return yaml.safe_load(f)["fixtures"]
+
+
+def must_contain(text: str, anchors: list[str]) -> bool:
+    t = text.lower()
+    return all(a.lower() in t for a in anchors)
+
+
+def schema_fmt(model: type[BaseModel]):
+    return {"type": "json_object"}  # Simplified for broad compatibility
+
+
+def run_case(model, fx):
+    t0 = time.perf_counter()
+    schema = SummarizeOut.model_json_schema()
+    resp = client.chat.completions.create(
+        model=model,
+        messages=[
+            {
+                "role": "system",
+                "content": f"Be concise. Output valid JSON matching this schema:\n{json.dumps(schema)}",
+            },
+            {"role": "user", "content": fx["input"]},
+        ],
+        response_format=schema_fmt(SummarizeOut),
+    )
+    dt = time.perf_counter() - t0
+
+    content = resp.choices[0].message.content or "{}"
+    passed, reasons = True, []
+
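+    try:
+        data = json.loads(content)
+    except json.JSONDecodeError:
+        return {"ok": False, "lat": dt, "why": "json decode"}
+
+    try:
+        # model_validate also rejects non-object JSON (e.g. a list) with ValidationError
+        SummarizeOut.model_validate(data)
+    except ValidationError: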
+        passed = False
+        reasons.append("schema")
+    if not must_contain(json.dumps(data), fx.get("must_include", [])):
+        passed = False
+        reasons.append("anchors")
+
+    return {"ok": passed, "lat": dt, "why": ";".join(reasons)}
+
+
+def main():
+    fixtures = load_fixtures(FIXTURES)
+    for model in MODELS:
+        results = [run_case(model, fx) for fx in fixtures]
+        ok = sum(r["ok"] for r in results)
+        total = len(results)
+        latencies = [r["lat"] for r in results]
+
+        print(f"\n››› {model}")
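+        rate = ok / total if total else 0.0  # avoid ZeroDivisionError on an empty fixture list
+        print(f"  Success: {ok}/{total} ({rate:.0%})")
+        if latencies:
+            avg_lat = stats.mean(latencies)
+            # quantiles() needs at least two samples on Python < 3.13
+            p95_lat = (
+                stats.quantiles(latencies, n=100)[94]
+                if len(latencies) > 1
+                else latencies[0]
+            )
+            print(f"  Latency (ms): avg={avg_lat*1000:.0f}, p95={p95_lat*1000:.0f}")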
+
+
+if __name__ == "__main__":
+    main()
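The README lists separate pass thresholds (≥90% schema-valid, ≥80% anchors present), while `bench.py` prints only a combined success rate. Below is a minimal sketch of how the per-case dicts returned by `run_case` could be turned into a go/no-go decision. `meets_thresholds` and its parameter names are hypothetical, and treating a JSON decode failure as failing both checks is an assumption.

```python
def meets_thresholds(results, min_schema=0.90, min_anchors=0.80):
    """Check run_case-style results against schema and anchor pass-rate thresholds.

    Each result is a dict with a "why" string: "" on success, "json decode",
    or a ";"-joined subset of "schema" and "anchors".
    """
    if not results:
        return False  # no evidence, so do not promote

    schema_ok = anchors_ok = 0
    for r in results:
        reasons = set(filter(None, r.get("why", "").split(";")))
        undecodable = "json decode" in reasons  # assumption: counts against both checks
        if not undecodable and "schema" not in reasons:
            schema_ok += 1
        if not undecodable and "anchors" not in reasons:
            anchors_ok += 1

    total = len(results)
    return schema_ok / total >= min_schema and anchors_ok / total >= min_anchors
```

In `main()`, this could be called as `meets_thresholds(results)` after each model's run, for example to decide whether an alias is a candidate for promotion.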
--- a/demos/use_cases/model_choice_with_test_harness/evals_summarize.yaml
+++ b/demos/use_cases/model_choice_with_test_harness/evals_summarize.yaml
@@ -0,0 +1,11 @@
+# evals_summarize.yaml
+task: summarize
+fixtures:
+  - id: sum-001
+    input: "Thread about a billing dispute…"
+    must_include: ["invoice"]
+    schema: SummarizeOut
+  - id: sum-002
+    input: "Thread about a shipping delay…"
+    must_include: ["status"]
+    schema: SummarizeOut
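Because `run_case` in `bench.py` reads `fx["input"]` directly, a malformed fixture only surfaces mid-run. A small pre-flight check could catch it earlier. The sketch below operates on already-parsed dicts (the shape `yaml.safe_load` would return for the file above). `check_fixture` is a hypothetical helper, and requiring a non-empty `id` is an assumption drawn from the two example fixtures.

```python
def check_fixture(fx: dict) -> list[str]:
    """Return a list of problems found in one parsed fixture (empty if none)."""
    problems = []
    for key in ("id", "input"):
        if not isinstance(fx.get(key), str) or not fx[key].strip():
            problems.append(f"missing or empty '{key}'")
    # must_include is optional in bench.py (defaults to an empty list).
    anchors = fx.get("must_include", [])
    if not isinstance(anchors, list) or not all(isinstance(a, str) for a in anchors):
        problems.append("'must_include' must be a list of strings")
    return problems


fixtures = [
    {"id": "sum-001", "input": "Thread about a billing dispute…",
     "must_include": ["invoice"], "schema": "SummarizeOut"},
    {"id": "sum-003", "must_include": "status"},  # deliberately malformed example
]
for fx in fixtures:
    print(fx.get("id"), check_fixture(fx))
```

Running the check over every fixture before the first model call keeps a typo in the YAML from being misread as a model failure.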
--- a/demos/use_cases/model_choice_with_test_harness/poetry.lock
+++ b/demos/use_cases/model_choice_with_test_harness/poetry.lock
--- a/demos/use_cases/model_choice_with_test_harness/pyproject.toml
+++ b/demos/use_cases/model_choice_with_test_harness/pyproject.toml
@@ -0,0 +1,22 @@
+[tool.poetry]
+name = "model-choice-newsletter-code-snippets"
+version = "0.1.0"
+description = "Benchmarking model alias routing with Arch Gateway."
+authors = ["Your Name <your@email.com>"]
+license = "Apache-2.0"
+readme = "README.md"
+package-mode = false
+
+[tool.poetry.dependencies]
+python = ">=3.10,<3.13.3"
+pydantic = "^2.0"
+openai = "^1.0"
+pyyaml = "^6.0"
+archgw = "^0.3.12"
+
+[tool.poetry.group.dev.dependencies]
+pytest = "^8.3"
+
+[build-system]
+requires = ["poetry-core>=1.0.0"]
+build-backend = "poetry.core.masonry.api"
--- a/demos/use_cases/model_choice_with_test_harness/run_demo.sh
+++ b/demos/use_cases/model_choice_with_test_harness/run_demo.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+set -e
+
+# Function to start the demo
+start_demo() {
+  # Step 1: Check if .env file exists
+  if [ -f ".env" ]; then
+    echo ".env file already exists. Skipping creation."
+  else
+    # Step 2: Create `.env` file and set API keys
+    if [ -z "$OPENAI_API_KEY" ]; then
+      echo "Error: OPENAI_API_KEY environment variable is not set for the demo."
+      exit 1
+    fi
+    echo "Creating .env file..."
+    echo "OPENAI_API_KEY=$OPENAI_API_KEY" > .env
+    echo ".env file created with API keys."
+  fi
+
+  # Step 3: Start Arch
+  echo "Starting Arch with arch_config_with_aliases.yaml..."
+  archgw up arch_config_with_aliases.yaml
+
+  # printf interprets \n; plain echo in bash would print it literally
+  printf "\n\nArch started successfully.\n"
+  printf "Please run the following command to test the setup: python bench.py\n\n"
+}
+
+# Function to stop the demo
+stop_demo() {
+  # Step 1: Stop Arch
+  echo "Stopping Arch..."
+  archgw down
+}
+
+# Main script logic
+if [ "$1" == "down" ]; then
+  stop_demo
+else
+  # Default action is to bring the demo up
+  start_demo
+fi