nomyo-router

Author	SHA1	Message	Date
alpha nerd	db6aa73903	fix: - _fetch_loaded_models_internal now writes _loaded_error_cache[endpoint] = time.time() on /api/ps or /v1/models failure, and clears the entry on success - choose_endpoint now filters out candidates with a fresh (<300s) loaded-models error. - /health now probes both /api/version and /api/ps for Ollama endpoints - dashboard adaption relates to #83	2026-05-18 13:45:06 +02:00
alpha nerd	6c869aa305	fix: mitigating WARNING router.py:1421:9 [cfg-resource-leak] — Image.open acquires file handle but not all exit paths release it	2026-05-13 16:22:48 +02:00
alpha nerd	84e3b30f2f	fix: removed dead config key	2026-05-13 14:59:05 +02:00
alpha nerd	ad0be90a70	fix: model naming for affinity status for llama endpoints	2026-05-13 14:35:45 +02:00
alpha nerd	aa7ec6354a	feat: visualization of conversation affinity in dashboard	2026-05-13 13:38:37 +02:00
alpha nerd	4acbaeb29c	fix: stopping background task properly on shutdown	2026-05-13 11:05:34 +02:00
alpha nerd	27dfc07889	feat: add conversation-endpoint affinity to benefit from hot kv-caches if possible	2026-05-12 18:33:47 +02:00
alpha nerd	90a54abc9b	feat: correct pass through of openai.APIStatusErrors	2026-05-08 12:19:03 +02:00
alpha nerd	ecdd228a54	feat: better default referer handling	2026-05-08 12:15:51 +02:00
alpha nerd	e296ac19ba	feat: new helper to bridge change of behaviour in llama.cpp v1/models status - now correctly reporting "sleeping" or "loaded" for auto-unload	2026-05-07 11:34:09 +02:00
alpha nerd	182ddae539	fix: prevent dashboard and route hangs when endpoints are down by calling skip_error_cache also with reduced timeout	2026-05-01 13:49:34 +02:00
alpha nerd	bbe7bd48c5	feat: populate error cache also from endpoint_details if necessary	2026-04-29 17:03:32 +02:00
alpha nerd	5797615736	feat: enhance load balancing #23	2026-04-22 17:27:34 +02:00
alpha nerd	5c64286b70	fix: correct user socket path	2026-04-17 13:31:19 +02:00
alpha nerd	11637c9143	feat: support localhost llama_server access via unix sockets	2026-04-17 12:41:57 +02:00
alpha nerd	1a2781ac23	fix: health check all endpoints with the right per-endpoint path, resolving issue #24	2026-04-16 12:18:38 +02:00
alpha nerd	1058f2418b	fix: security, exempt files to prevent path traversal	2026-04-10 17:40:44 +02:00
alpha nerd	263c66aedd	feat: add hostname to dashboard	2026-04-10 17:29:43 +02:00
alpha nerd	a432a65396	fix: params is never defined in ollama native backend	2026-04-08 13:01:56 +02:00
alpha nerd	e7cd8d4d68	fix: usage locks now release before the subscriber queue awaits	2026-04-07 15:30:52 +02:00
alpha nerd	2c87472483	fix: conditional to_thread for the image_transform to relieve threadpool pressure	2026-04-07 13:28:34 +02:00
alpha nerd	81013ec3b1	fix: available_error_cache poisoning	2026-04-07 09:32:53 +02:00
alpha nerd	5170162a80	fix: make image transform non-blocking	2026-04-07 09:18:12 +02:00
alpha nerd	28afa4e9c0	fix: missing requirement fix: strip assistant prefill when ollama -> openai translation + openai guard	2026-04-06 11:32:47 +02:00
alpha-nerd-nomyo	c0dc0a10af	fix: catch non-standard openai sdk error bodies for parsing	2026-03-12 19:08:01 +01:00
alpha-nerd-nomyo	1e9996c393	fix: exclude embedding models from preemptive context shift caches	2026-03-12 18:56:51 +01:00
alpha-nerd-nomyo	e416542bf8	fix: model name normalization for context_cash preemptive context-shifting for smaller context-windows with previous failure	2026-03-12 16:08:01 +01:00
alpha-nerd-nomyo	be60a348e1	fix: changing error_cache to stale-while-revalidate same as available_models_cache	2026-03-12 14:47:54 +01:00
alpha-nerd-nomyo	9acc37951a	feat: add reactive auto context-shift in openai endpoints to recover from out-of-context errors	2026-03-12 10:15:52 +01:00
alpha-nerd-nomyo	95c643109a	feat: add an openai retry if a request with an image is sent to a pure text model	2026-03-12 10:06:18 +01:00
alpha-nerd-nomyo	1ae989788b	fix(router): normalize multimodal input to extract text for embeddings Extract text parts from multimodal payloads (lists/dicts). Skip image_url and other non-text types to ensure embedding models receive compatible text-only input.	2026-03-11 16:41:21 +01:00
alpha-nerd-nomyo	fbdc73eebb	fix: improvements, fixes and opt-in cache doc: semantic-cache.md added with detailed write-up	2026-03-10 15:19:37 +01:00
alpha-nerd-nomyo	dd4b12da6a	feat: adding a semantic cache layer	2026-03-08 09:12:09 +01:00
alpha-nerd-nomyo	b951cc82e3	bump version	2026-03-05 11:09:20 +01:00
alpha-nerd-nomyo	8037706f0b	fix(db.py): remove full table scans with proper where clauses for dashboard statistics and calc in db rather than python	2026-03-03 17:20:33 +01:00
alpha-nerd-nomyo	45315790d1	fix(router.py): - added global for orphaned token_worker_task and flust_task - fixed a regex to effectively _mask_secrets - fixed several Type and KeyErrors - fixed model deduplication for llama_server_endpoints	2026-03-03 16:34:16 +01:00
alpha-nerd-nomyo	e96e890511	refactor: make choose_endpoint use cache incrementer for atomic updates	2026-03-03 14:57:37 +01:00
alpha-nerd-nomyo	10c83c3e1e	fix(router): treat missing status as loaded for llama model check Add check for `status is None` in `_is_llama_model_loaded`. Models without a status field (e.g., single-model servers) are assumed to be always loaded rather than failing the check. Also updated docstring to clarify this behavior.	2026-03-02 08:54:46 +01:00
alpha-nerd-nomyo	cac0580eec	feat: adding /v1/rerank endpoint with cohere,jina,llama.cpp compatibility	2026-02-28 09:31:25 +01:00
alpha-nerd-nomyo	ad4a1d07b2	fix(/v1/embeddings): returning the async_gen forced FastAPI serialization which caused Pydantic Errors. Also sanitized nan/inf values to floats (0.0). Use try/finally to properly decrement usage counters in case of error.	2026-02-27 16:39:27 +01:00
alpha-nerd-nomyo	d2ea65f74a	fix(router): use normalized model keys for endpoint selection Refactor endpoint selection logic to consistently use tracking model keys (normalized via `get_tracking_model`) instead of raw model names, ensuring usage counts are accurately compared with how increment/decrement operations store them. This fixes inconsistent load balancing and model affinity behavior caused by mismatches between raw and tracked model identifiers.	2026-02-19 17:32:54 +01:00
alpha-nerd-nomyo	07751ddd3b	fix: endpoint selection logic again	2026-02-19 10:11:53 +01:00
alpha-nerd-nomyo	7cba67cce0	feat(router): normalize model names for usage tracking across endpoints (continued) Introduce `get_tracking_model()` to standardize model names for consistent usage tracking in Prometheus metrics. This ensures llama-server models are stripped of HF prefixes and quantization suffixes, Ollama models append `:latest` when versionless, and external OpenAI models remain unchanged—aligning all tracking keys with the PS table.	2026-02-18 11:45:37 +01:00
alpha-nerd-nomyo	b2980a7d24	fix(router): handle invalid version responses with 503 error Filter out non-string version responses (e.g., empty lists from failed requests) and return a 503 Service Unavailable error if no valid versions are received from any endpoint.	2026-02-17 15:56:09 +01:00
alpha-nerd-nomyo	836c5f41ea	fix(router): normalize model names for usage tracking across endpoints	2026-02-17 11:35:53 +01:00
alpha-nerd-nomyo	372fe9fb72	feat(router): parallelize llama-server props fetch and add reasoning/tool call support - Fetch `/props` endpoints in parallel to get context length and auto-unload sleeping models - Add support for reasoning content and tool calls in streaming openai chat/completions responses	2026-02-15 17:05:35 +01:00
alpha-nerd-nomyo	4d40048fd2	fix: loaded_models_cache timing restored	2026-02-15 12:15:36 +01:00
alpha-nerd-nomyo	0bad604b02	feat: deduplicate background refresh tasks and extend cache TTL Adds lock-protected dictionaries to track running background refresh tasks, preventing duplicate executions per endpoint. Increases cache freshness thresholds from 30s to 300s to reduce blocking behavior. fix: /v1 endpoints use correct media_types and usage information with proper logging	2026-02-14 14:51:44 +01:00
alpha-nerd-nomyo	c9ff384bb2	fix(router): /v1/models endpoint now shows all available models	2026-02-13 16:27:06 +01:00
alpha-nerd-nomyo	4d80dc5e7c	feat: adding logprobs to /v1/chat/completion	2026-02-13 14:43:10 +01:00
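
Commit ad4a1d07b2 above mentions replacing nan/inf embedding values with 0.0 so the response can be serialized. A minimal sketch of that idea follows; the function name `sanitize_embedding` is hypothetical and not taken from the repository, and the actual code in router.py may differ.

```python
import math
from typing import Iterable, List


def sanitize_embedding(values: Iterable[float]) -> List[float]:
    """Hypothetical helper: replace NaN and +/-inf with 0.0.

    Standard JSON has no representation for NaN or infinity, so a vector
    containing them cannot be returned as a valid JSON response.
    """
    return [float(v) if math.isfinite(v) else 0.0 for v in values]
```

Finite values pass through unchanged (converted to `float`); only non-finite entries are replaced.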

149 commits