* - Refactored retry for rate limits into the base class
- ConsumerProducer is derived from Consumer to simplify code
- Added rate_limit_count metrics for rate limit events
* Add rate limit events to VertexAI and Google AI Studio
* Added Grafana rate limit dashboard
* Add rate limit handling to all LLMs
Added templates which produce K8s resources. With the provided GCP wrapper, it works on GCP K8s cluster. This isn't stable enough for other folks to use so will need more piloting before it can be documented and released.