nomyo-router

Author	SHA1	Message	Date
Alpha Nerd	ba1b2fb651	Merge pull request 'dev-v0.7.x to prod' (#1) from dev-v0.7.x into main Reviewed-on: https://bitfreedom.net/code/code/nomyo-ai/nomyo-router/pulls/1	2026-04-02 09:17:59 +02:00
alpha-nerd-nomyo	b899ac8559	feat: add all models to TPS graph in dashboard	2026-04-01 18:10:48 +02:00
alpha-nerd-nomyo	f0dd124118	doc: update repo base_url	2026-04-01 17:00:14 +02:00
alpha-nerd-nomyo	031de165a1	feat: prettyfy dashboard	2026-03-27 16:24:57 +01:00
alpha-nerd-nomyo	c796fd6a47	fix: add missing git to docker for semcache dependency install	2026-03-23 17:06:46 +01:00
alpha-nerd-nomyo	c0dc0a10af	fix: catch non-standard openai sdk error bodies for parsing	2026-03-12 19:08:01 +01:00
alpha-nerd-nomyo	1e9996c393	fix: exclude embedding models from preemptive context shift caches	2026-03-12 18:56:51 +01:00
alpha-nerd-nomyo	21d6835253	Merge pull request #37 from nomyo-ai/dev-v0.7.x-semcache Dev v0.7.x semcache addtl. feature	2026-03-12 16:08:23 +01:00
alpha-nerd-nomyo	e416542bf8	fix: model name normalization for context_cash preemptive context-shifting for smaller context-windows with previous failure	2026-03-12 16:08:01 +01:00
alpha-nerd-nomyo	be60a348e1	fix: changing error_cache to stale-while-revalidate same as available_models_cache	2026-03-12 14:47:54 +01:00
alpha-nerd-nomyo	9acc37951a	feat: add reactive auto context-shift in openai endpoints to prevent recover from out of context errors	2026-03-12 10:15:52 +01:00
alpha-nerd-nomyo	95c643109a	feat: add an openai retry if request with image is send to a pure text model	2026-03-12 10:06:18 +01:00
alpha-nerd-nomyo	1ae989788b	fix(router): normalize multimodal input to extract text for embeddings Extract text parts from multimodal payloads (lists/dicts). Skip image_url and other non-text types to ensure embedding models receive compatible text-only input.	2026-03-11 16:41:21 +01:00
alpha-nerd-nomyo	7468bfffbb	Merge branch 'main' into dev-v0.7.x	2026-03-11 09:47:13 +01:00
alpha-nerd-nomyo	ca773d6ddb	Merge pull request #35 from nomyo-ai/dev-v0.7.x-semcache Dev v0.7.x semcache -> dev-v0.7.x	2026-03-11 09:40:55 +01:00
alpha-nerd-nomyo	46da392a53	fix: semcache version pinned	2026-03-11 09:40:00 +01:00
alpha-nerd-nomyo	95d03d828e	Merge pull request #34 from nomyo-ai/dev-v0.7.x docs: adding ghcr docker pull instructions	2026-03-10 15:58:45 +01:00
alpha-nerd-nomyo	fbdc73eebb	fix: improvements, fixes and opt-in cache doc: semantic-cache.md added with detailed write-up	2026-03-10 15:19:37 +01:00
alpha-nerd-nomyo	a5108486e3	conf: clean default conf	2026-03-08 09:35:40 +01:00
alpha-nerd-nomyo	e8b8981421	doc: updated usage.md	2026-03-08 09:26:53 +01:00
alpha-nerd-nomyo	dd4b12da6a	feat: adding a semantic cache layer	2026-03-08 09:12:09 +01:00
alpha-nerd-nomyo	c3d47c7ffe	docs: adding ghcr docker pull instructions	2026-03-05 11:54:42 +01:00
alpha-nerd-nomyo	cce8e66c3e	Merge pull request #32 from nomyo-ai/dev-v0.7.x Dev v0.7.x -> main	2026-03-05 11:12:38 +01:00
alpha-nerd-nomyo	b951cc82e3	bump version	2026-03-05 11:09:20 +01:00
alpha-nerd-nomyo	00a06dca51	feat: add docker publish workflow	2026-03-05 11:09:16 +01:00
alpha-nerd-nomyo	e51969a2bb	Merge pull request #30 from nomyo-ai/dev-v0.7.x - improved performance - added /v1/rerank endpoint - refactor of choose_endpoints for atomic upgrade of usage counters - fixes for security, type- and keyerrors - improved database handling	2026-03-04 11:01:22 +01:00
alpha-nerd-nomyo	8037706f0b	fix(db.py): remove full table scans with proper where clauses for dashboard statistics and calc in db rather than python	2026-03-03 17:20:33 +01:00
alpha-nerd-nomyo	45315790d1	fix(router.py): - added global for orphaned token_worker_task and flust_task - fixed a regex to effectively _mask_secrets - fixed several Type and KeyErrors - fixed model deduplication for llama_server_endpoints	2026-03-03 16:34:16 +01:00
alpha-nerd-nomyo	e96e890511	refactor: make choose_endpoint use cache incrementer for atomic updates	2026-03-03 14:57:37 +01:00
alpha-nerd-nomyo	e7196146ad	feat: add uvloop to requirements.txt as optional dependency to improve performance in high concurrent scenarios	2026-03-03 10:31:10 +01:00
alpha-nerd-nomyo	10c83c3e1e	fix(router): treat missing status as loaded for llama model check Add check for `status is None` in `_is_llama_model_loaded`. Models without a status field (e.g., single-model servers) are assumed to be always loaded rather than failing the check. Also updated docstring to clarify this behavior.	2026-03-02 08:54:46 +01:00
alpha-nerd-nomyo	cac0580eec	feat: adding /v1/rerank endpoint with cohere,jina,llama.cpp compatibility	2026-02-28 09:31:25 +01:00
alpha-nerd-nomyo	ad4a1d07b2	fix(/v1/embeddings): returning the async_gen forced FastAPI serialization which caused Pydantic Errors. Also sanizted nan/inf values to floats (0.0). Use try - finally to properly decrement usage counters in case of error.	2026-02-27 16:39:27 +01:00
alpha-nerd-nomyo	2542f10dfc	Merge pull request #29 from edingc/main fix: supress dockerfile build warnings	2026-02-25 13:15:40 +01:00
alpha-nerd-nomyo	a5a0bd51c0	Merge pull request #27 from nomyo-ai/dev-v0.6.X Dev v0.6.x to prod	2026-02-25 13:08:15 +01:00
Cody Eding	d17ce8380d	fix: supress dockerfile build warnings	2026-02-19 19:41:58 -05:00
alpha-nerd-nomyo	d2ea65f74a	fix(router): use normalized model keys for endpoint selection Refactor endpoint selection logic to consistently use tracking model keys (normalized via `get_tracking_model`) instead of raw model names, ensuring usage counts are accurately compared with how increment/decrement operations store them. This fixes inconsistent load balancing and model affinity behavior caused by mismatches between raw and tracked model identifiers.	2026-02-19 17:32:54 +01:00
alpha-nerd-nomyo	07751ddd3b	fix: endpoint selection logic again	2026-02-19 10:11:53 +01:00
alpha-nerd-nomyo	7cba67cce0	feat(router): normalize model names for usage tracking across endpoints (continued) Introduce `get_tracking_model()` to standardize model names for consistent usage tracking in Prometheus metrics. This ensures llama-server models are stripped of HF prefixes and quantization suffixes, Ollama models append `:latest` when versionless, and external OpenAI models remain unchanged—aligning all tracking keys with the PS table.	2026-02-18 11:45:37 +01:00
alpha-nerd-nomyo	b2980a7d24	fix(router): handle invalid version responses with 503 error Filter out non-string version responses (e.g., empty lists from failed requests) and return a 503 Service Unavailable error if no valid versions are received from any endpoint.	2026-02-17 15:56:09 +01:00
alpha-nerd-nomyo	836c5f41ea	fix(router): normalize model names for usage tracking across endpoints	2026-02-17 11:35:53 +01:00
alpha-nerd-nomyo	372fe9fb72	feat(router): parallelize llama-server props fetch and add reasoning/tool call support - Fetch `/props` endpoints in parallel to get context length and auto-unload sleeping models - Add support for reasoning content and tool calls in streaming openai chat/completions responses	2026-02-15 17:05:35 +01:00
alpha-nerd-nomyo	4d40048fd2	fix: loaded_models_cache timing restored	2026-02-15 12:15:36 +01:00
alpha-nerd-nomyo	0bad604b02	feat: deduplicate background refresh tasks and extend cache TTL Adds lock-protected dictionaries to track running background refresh tasks, preventing duplicate executions per endpoint. Increases cache freshness thresholds from 30s to 300s to reduce blocking behavior. fix: /v1 endpoints use correct media_types and usage information with proper logging	2026-02-14 14:51:44 +01:00
alpha-nerd-nomyo	f27c0e9582	Merge pull request #26 from nomyo-ai/dev-v0.6.X Dev v0.6.x -> prod for small feat add and fix	2026-02-13 16:30:10 +01:00
alpha-nerd-nomyo	c9ff384bb2	fix(router): /v1/models endpoint Shows now all available models	2026-02-13 16:27:06 +01:00
alpha-nerd-nomyo	4d80dc5e7c	feat: adding logprobs to /v1/chat/completion	2026-02-13 14:43:10 +01:00
alpha-nerd-nomyo	eda48562da	feat(router): add logprob support in /api/chat Add logprob support to the OpenAI-to-Ollama proxy by converting OpenAI logprob formats to Ollama types. Also update the ollama dependency.	2026-02-13 13:29:45 +01:00
alpha-nerd-nomyo	07af6e2e36	fix: better sample config	2026-02-13 10:52:14 +01:00
alpha-nerd-nomyo	9ef1b770ba	Merge pull request #25 from nomyo-ai/dev-v0.6 - updated reasoning handling - improved model and error caches - fixed openai tool calling incl. ollama translations - direct support for llama.cpp's llama_server via llama_server_endpoint config - basic llama_server model info in dashboard - improved endpoint info fetching behaviour in error cases	2026-02-13 10:34:42 +01:00

234 commits