b754daf1af
feat: after closing the probe session, reset
Build and Publish Docker Image (Semantic Cache) / build (amd64, linux/amd64, docker-amd64) (push) Successful in 3m52s
Build and Publish Docker Image / build (amd64, linux/amd64, docker-amd64) (push) Successful in 1m23s
Build and Publish Docker Image (Semantic Cache) / build (arm64, linux/arm64, docker-arm64) (push) Successful in 15m16s
Build and Publish Docker Image (Semantic Cache) / merge (push) Successful in 34s
Build and Publish Docker Image / build (arm64, linux/arm64, docker-arm64) (push) Successful in 11m59s
Build and Publish Docker Image / merge (push) Successful in 33s
2026-05-28 10:16:54 +02:00
820e217da6
fix: Lightweight health/introspection probes no longer compete with long-lived streaming completions for the proxy pool's per-host connection slots
2026-05-28 09:54:53 +02:00
4b5a70e787
refac: modularize apis VII
2026-05-19 14:57:39 +02:00
e74f5d1ba6
refac: request handling VI
2026-05-19 14:09:52 +02:00
8355bf9a1e
refac: modularize sse, routing, db and token handling V
2026-05-19 12:48:55 +02:00
3a9854c5db
refac: modularize backend IV
2026-05-19 12:05:51 +02:00
c88ba1e5a4
refac: modularize global states III
2026-05-19 11:18:06 +02:00
d2b31b6c7b
refac: modularize config II
2026-05-19 11:00:50 +02:00
90b6868f5a
refac: split into modules I
2026-05-19 10:05:27 +02:00
079b677e23
feat: completion errors on an endpoint:model key are caught, cached and rerouted (openai compatible endpoints)
PR Tests / test (pull_request) Successful in 57s
2026-05-18 18:14:28 +02:00
db6aa73903
fix:
PR Tests / test (pull_request) Successful in 58s
NYX Security Scan / nyx-scan (pull_request) Successful in 6m59s
- _fetch_loaded_models_internal now writes _loaded_error_cache[endpoint] = time.time() on /api/ps or /v1/models failure, and clears the entry on success
- choose_endpoint now filters out candidates with a fresh (<300s) loaded-models error.
- /health now probes both /api/version and /api/ps for Ollama endpoints
- dashboard adaptation
relates to #83
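
A minimal sketch of the error-cache behaviour described in the bullets above, not the code from router.py. Only `_loaded_error_cache` and the 300s window come from the commit message; the helper names `record_probe_result` and `filter_healthy` and the injectable `now` parameter are assumptions made for illustration.

```python
import time
from typing import Dict, List, Optional

ERROR_TTL_SECONDS = 300  # freshness window named in the commit message

# endpoint -> timestamp of the last failed loaded-models probe
_loaded_error_cache: Dict[str, float] = {}


def record_probe_result(endpoint: str, ok: bool, now: Optional[float] = None) -> None:
    """Hypothetical helper: stamp the endpoint on failure, clear it on success."""
    now = time.time() if now is None else now
    if ok:
        _loaded_error_cache.pop(endpoint, None)
    else:
        _loaded_error_cache[endpoint] = now


def filter_healthy(candidates: List[str], now: Optional[float] = None) -> List[str]:
    """Hypothetical helper: drop candidates with an error younger than the TTL."""
    now = time.time() if now is None else now
    return [
        ep
        for ep in candidates
        # Endpoints with no recorded error get an infinitely old timestamp.
        if now - _loaded_error_cache.get(ep, float("-inf")) >= ERROR_TTL_SECONDS
    ]
```

Passing `now` explicitly keeps the sketch deterministic in tests; a real selector would presumably call `time.time()` directly.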
2026-05-18 13:45:06 +02:00
6c869aa305
fix: mitigating WARNING router.py:1421:9 [cfg-resource-leak] — Image.open acquires file handle but not all exit paths release it
NYX Security Scan / nyx-scan (pull_request) Failing after 6m57s
2026-05-13 16:22:48 +02:00
84e3b30f2f
fix: removed dead config key
NYX Security Scan / nyx-scan (pull_request) Failing after 7m3s
2026-05-13 14:59:05 +02:00
ad0be90a70
fix: model naming for affinity status for llama endpoints
2026-05-13 14:35:45 +02:00
aa7ec6354a
feat: visualization of conversation affinity in dashboard
2026-05-13 13:38:37 +02:00
4acbaeb29c
fix: stopping background task properly on shutdown
2026-05-13 11:05:34 +02:00
27dfc07889
feat: add conversation-endpoint affinity to benefit from hot kv-caches if possible
2026-05-12 18:33:47 +02:00
90a54abc9b
feat: correct pass through of openai.APIStatusErrors
2026-05-08 12:19:03 +02:00
ecdd228a54
feat: better default referer handling
2026-05-08 12:15:51 +02:00
e296ac19ba
feat: new helper to bridge change of behaviour in llama.cpp v1/models status - now correctly reporting "sleeping" or "loaded" for auto-unload
2026-05-07 11:34:09 +02:00
182ddae539
fix: prevent dashboard and route hangs when endpoints are down by calling skip_error_cache also with reduced timeout
2026-05-01 13:49:34 +02:00
bbe7bd48c5
feat: populate error cache also from endpoint_details if necessary
2026-04-29 17:03:32 +02:00
5797615736
feat: enhance load balancing #23
2026-04-22 17:27:34 +02:00
5c64286b70
fix: correct user socket path
2026-04-17 13:31:19 +02:00
11637c9143
feat: support localhost llama_server access via unix sockets
2026-04-17 12:41:57 +02:00
1a2781ac23
fix: health check all endpoints with the right per-endpoint path
issue: resolving #24
2026-04-16 12:18:38 +02:00
1058f2418b
fix: security, exempt files to prevent path traversal
2026-04-10 17:40:44 +02:00
263c66aedd
feat: add hostname to dashboard
2026-04-10 17:29:43 +02:00
a432a65396
fix: params is never defined in ollama native backend
2026-04-08 13:01:56 +02:00
e7cd8d4d68
fix: usage locks now release before the subscriber queue awaits
2026-04-07 15:30:52 +02:00
2c87472483
fix: conditional to_thread for the image_transform to relieve threadpool pressure
2026-04-07 13:28:34 +02:00
81013ec3b1
fix: available_error_cache poisoning
2026-04-07 09:32:53 +02:00
5170162a80
fix: make image transform non-blocking
2026-04-07 09:18:12 +02:00
28afa4e9c0
fix: missing requirement
fix: strip assistant prefill during ollama -> openai translation + openai guard
2026-04-06 11:32:47 +02:00
c0dc0a10af
fix: catch non-standard openai sdk error bodies for parsing
2026-03-12 19:08:01 +01:00
1e9996c393
fix: exclude embedding models from preemptive context shift caches
2026-03-12 18:56:51 +01:00
e416542bf8
fix: model name normalization for context_cache preemptive context-shifting for smaller context-windows with previous failure
2026-03-12 16:08:01 +01:00
be60a348e1
fix: changing error_cache to stale-while-revalidate, same as available_models_cache
2026-03-12 14:47:54 +01:00
9acc37951a
feat: add reactive auto context-shift in openai endpoints to recover from out-of-context errors
2026-03-12 10:15:52 +01:00
95c643109a
feat: add an openai retry if a request with an image is sent to a text-only model
2026-03-12 10:06:18 +01:00
1ae989788b
fix(router): normalize multimodal input to extract text for embeddings
Extract text parts from multimodal payloads (lists/dicts).
Skip image_url and other non-text types to ensure embedding
models receive compatible text-only input.
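
A rough sketch of the normalization this commit describes, under the assumption that multimodal parts follow the OpenAI-style `{"type": "text", "text": ...}` / `{"type": "image_url", ...}` shape. The function name `extract_text` and the choice to join parts with a single space are illustrative, not taken from the router.

```python
from typing import Any


def extract_text(payload: Any) -> str:
    """Hypothetical helper: reduce a str/dict/list payload to plain text."""
    if isinstance(payload, str):
        return payload
    if isinstance(payload, dict):
        # Skip image_url and any other non-text part types.
        if payload.get("type", "text") != "text":
            return ""
        text = payload.get("text", "")
        return text if isinstance(text, str) else ""
    if isinstance(payload, list):
        parts = [extract_text(item) for item in payload]
        return " ".join(p for p in parts if p)
    return ""
```

For example, `extract_text([{"type": "text", "text": "hello"}, {"type": "image_url", "image_url": {"url": "x"}}, "world"])` yields `"hello world"`.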
2026-03-11 16:41:21 +01:00
fbdc73eebb
fix: improvements, fixes and opt-in cache
doc: semantic-cache.md added with detailed write-up
2026-03-10 15:19:37 +01:00
dd4b12da6a
feat: adding a semantic cache layer
2026-03-08 09:12:09 +01:00
b951cc82e3
bump version
2026-03-05 11:09:20 +01:00
8037706f0b
fix(db.py): remove full table scans with proper where clauses for dashboard statistics and calc in db rather than python
2026-03-03 17:20:33 +01:00
45315790d1
fix(router.py):
- added global for orphaned token_worker_task and flush_task
- fixed a regex to effectively _mask_secrets
- fixed several Type and KeyErrors
- fixed model deduplication for llama_server_endpoints
2026-03-03 16:34:16 +01:00
e96e890511
refactor: make choose_endpoint use cache incrementer for atomic updates
2026-03-03 14:57:37 +01:00
10c83c3e1e
fix(router): treat missing status as loaded for llama model check
Add check for `status is None` in `_is_llama_model_loaded`.
Models without a status field (e.g., single-model servers) are
assumed to be always loaded rather than failing the check.
Also updated docstring to clarify this behavior.
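
The rule can be sketched as below. This is an illustration only: the real `_is_llama_model_loaded` signature is not shown in the log, so the function name `is_model_loaded_sketch`, the dict-shaped model entry, and the handling of a nested `{"value": ...}` status are all assumptions.

```python
from typing import Any, Mapping


def is_model_loaded_sketch(model_entry: Mapping[str, Any]) -> bool:
    """Hypothetical stand-in: treat a missing status as 'always loaded'."""
    status = model_entry.get("status")
    if status is None:
        # Single-model servers may not report a status field at all.
        return True
    if isinstance(status, Mapping):
        # Assumed shape: some servers may wrap the state in an object.
        status = status.get("value")
    return status == "loaded"
```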
2026-03-02 08:54:46 +01:00
cac0580eec
feat: adding /v1/rerank endpoint with cohere, jina, llama.cpp compatibility
2026-02-28 09:31:25 +01:00
ad4a1d07b2
fix(/v1/embeddings): returning the async_gen forced FastAPI serialization, which caused Pydantic errors. Also sanitized nan/inf values to floats (0.0).
Use try - finally to properly decrement usage counters in case of error.
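
Both fixes in this commit can be sketched in a few lines. This is a simplified synchronous illustration, whereas the actual handler is presumably async; `sanitize_embedding`, `usage_counts`, and `with_usage` are hypothetical names, not identifiers from the router.

```python
import math
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")

# Hypothetical per-endpoint in-flight request counter.
usage_counts: Dict[str, int] = {}


def sanitize_embedding(values: Iterable[float]) -> List[float]:
    """Replace nan/inf with 0.0 so the vector stays JSON-serializable."""
    return [float(v) if math.isfinite(v) else 0.0 for v in values]


def with_usage(endpoint: str, fn: Callable[[], T]) -> T:
    """Run fn while counting it as in flight; always decrement afterwards."""
    usage_counts[endpoint] = usage_counts.get(endpoint, 0) + 1
    try:
        return fn()
    finally:
        # Runs on success and on error, so the counter cannot leak.
        usage_counts[endpoint] -= 1
```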
2026-02-27 16:39:27 +01:00