feat: cache backend clients per endpoint instead of building one (with a fresh SSL context) per request
All checks were successful

Alpha Nerd 2026-06-07 09:55:54 +02:00
parent 1ce792c48b
commit 3cd530586c
Signed by: alpha-nerd
SSH key fingerprint: SHA256:QkkAgVoYi9TQ0UKPkiKSfnerZy2h4qhi3SVPXJmBN+M
5 changed files with 87 additions and 15 deletions

@@ -80,8 +80,10 @@ def _patches(exc, mark_unhealthy):
     stack.enter_context(patch("api.ollama.is_openai_compatible", lambda ep: False))
     stack.enter_context(patch("api.ollama.decrement_usage", AsyncMock()))
     stack.enter_context(patch("api.ollama._mark_backend_unhealthy", mark_unhealthy))
+    # The native path now fetches a cached client via get_ollama_client() rather
+    # than constructing ollama.AsyncClient inline, so patch that seam.
     stack.enter_context(
-        patch("api.ollama.ollama.AsyncClient", lambda *a, **k: _FakeAsyncClient(exc))
+        patch("api.ollama.get_ollama_client", lambda *a, **k: _FakeAsyncClient(exc))
     )
     return stack