plano/demos/llm_routing/model_routing_service/prometheus.yaml at a7903d9271fb23523ba341ec7db4997314590687 - apunkt/plano

mirror of https://github.com/katanemo/plano.git synced 2026-05-01 20:03:40 +02:00

Adil Hafeez bd7afd911e add DigitalOcean pricing, startup validation, and demo update

- MetricsSource::DigitalOceanPricing variant: fetch public DO Gen-AI pricing, normalize as lowercase(creator)/model_id, cost = input + output per million
- cost_metrics endpoint format updated to { "model": { "input_per_million": X, "output_per_million": Y } }
- Startup errors: prefer:cheapest requires cost source, prefer:fastest requires prometheus
- Startup warning: models with no pricing/latency data ranked last
- One-per-type enforcement: digitalocean_pricing; error if cost_metrics + digitalocean_pricing both configured
- cost_snapshot() / latency_snapshot() on ModelMetricsService for startup checks
- Demo config updated to v0.4.0 top-level routing_preferences with cheapest + fastest policies
- docker-compose.yaml + prometheus.yaml + metrics_server.py for demo latency metrics
- Schema and docs updated
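
The rules in the commit message above can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's code: the function names, the error strings, and the model names in the usage note are hypothetical. It follows the commit message in lowercasing only the creator part of the key, and it assumes that a model without pricing data keeps its original relative order at the end of the ranking.

```python
from typing import Dict, List

# Shape taken from the commit message:
# { "model": { "input_per_million": X, "output_per_million": Y } }
CostMetrics = Dict[str, Dict[str, float]]


def normalize_key(creator: str, model_id: str) -> str:
    """Build the lookup key as lowercase(creator)/model_id."""
    return f"{creator.lower()}/{model_id}"


def combined_cost(entry: Dict[str, float]) -> float:
    """cost = input + output, both expressed per million tokens."""
    return entry["input_per_million"] + entry["output_per_million"]


def startup_errors(policies: List[str], has_cost_source: bool,
                   has_prometheus: bool) -> List[str]:
    """Return one (hypothetical) error message per unmet requirement."""
    errors = []
    if "cheapest" in policies and not has_cost_source:
        errors.append("prefer: cheapest requires a cost source")
    if "fastest" in policies and not has_prometheus:
        errors.append("prefer: fastest requires prometheus")
    return errors


def rank_cheapest(models: List[str], costs: CostMetrics) -> List[str]:
    """Sort by combined cost; models with no pricing data are ranked last."""
    def sort_key(model: str):
        if model in costs:
            return (False, combined_cost(costs[model]))
        return (True, 0.0)  # missing data: after every priced model
    return sorted(models, key=sort_key)
```

For example, with hypothetical prices `{"acme/small": {"input_per_million": 1.0, "output_per_million": 1.0}, "acme/large": {"input_per_million": 2.0, "output_per_million": 3.0}}`, calling `rank_cheapest(["acme/large", "other/unknown", "acme/small"], costs)` places `acme/small` first and the unpriced `other/unknown` last.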

2026-03-27 16:54:37 -07:00

8 lines

144 B

YAML

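# Defaults applied to every scrape job.
global:
  scrape_interval: 15s  # poll each target every 15 seconds
scrape_configs:
  # Single job that collects the demo's model latency metrics.
  - job_name: model_latency
    static_configs:
      - targets:
          # host:port of the metrics endpoint; Prometheus defaults to http and /metrics
          - model-metrics:8080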
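
To make the scrape target concrete, here is a minimal sketch of the kind of payload an endpoint such as `model-metrics:8080` could return in the Prometheus text exposition format. The demo's actual `metrics_server.py` is not shown here, so the metric name `model_latency_seconds`, the `model` label, and the function name are all assumptions chosen for illustration.

```python
from typing import Dict


def render_latency_metrics(latencies_seconds: Dict[str, float],
                           metric: str = "model_latency_seconds") -> str:
    """Render per-model latencies as Prometheus text exposition output.

    The metric and label names are hypothetical, not taken from the demo.
    """
    lines = [
        f"# HELP {metric} Hypothetical per-model latency in seconds.",
        f"# TYPE {metric} gauge",
    ]
    for model in sorted(latencies_seconds):
        value = latencies_seconds[model]
        # Label values must escape backslash, double quote, and newline.
        escaped = (model.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
        lines.append(f'{metric}{{model="{escaped}"}} {value}')
    # The exposition format expects a trailing newline.
    return "\n".join(lines) + "\n"
```

A server would return this string as the body of `GET /metrics` with content type `text/plain`; Prometheus then picks it up every 15 seconds under the `model_latency` job configured above.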