nomyo-router

Author	SHA1	Message	Date
alpha-nerd-nomyo	4d80dc5e7c	feat: adding logprobs to /v1/chat/completions	2026-02-13 14:43:10 +01:00
alpha-nerd-nomyo	eda48562da	feat(router): add logprob support in /api/chat Add logprob support to the OpenAI-to-Ollama proxy by converting OpenAI logprob formats to Ollama types. Also update the ollama dependency.	2026-02-13 13:29:45 +01:00
alpha-nerd-nomyo	07af6e2e36	fix: better sample config	2026-02-13 10:52:14 +01:00
alpha-nerd-nomyo	9ef1b770ba	Merge pull request #25 from nomyo-ai/dev-v0.6 - updated reasoning handling - improved model and error caches - fixed openai tool calling incl. ollama translations - direct support for llama.cpp's llama_server via llama_server_endpoint config - basic llama_server model info in dashboard - improved endpoint info fetching behaviour in error cases	2026-02-13 10:34:42 +01:00
alpha-nerd-nomyo	1b355d8435	Merge branch 'main' into dev-v0.6	2026-02-13 10:33:36 +01:00
alpha-nerd-nomyo	c545f413a5	Merge pull request #23 from JTHesse/main Fix for SSL verification and SQL Bug	2026-02-13 10:19:07 +01:00
alpha-nerd-nomyo	08b77428b8	refactor(router): bump cache TTLs and skip error cache for health checks - Increased error and loaded model cache freshness thresholds from 10s to 30s. - Added `skip_error_cache` parameter to `endpoint_details` to prevent cached failures from blocking health checks. - Implemented automatic error recording in `_available_error_cache` on API request failures.	2026-02-13 10:11:41 +01:00
alpha-nerd-nomyo	f7ef413090	replays `3af166c8a4` to allow merging into main	2026-02-12 16:28:40 +01:00
alpha-nerd-nomyo	b649dcd8d6	proposal: use global truststore ctx for all connections	2026-02-12 16:15:39 +01:00
alpha-nerd-nomyo	5c4e1e81a6	Merge pull request #24 from nomyo-ai:dependabot/pip/pillow-12.1.1 Bump pillow from 11.3.0 to 12.1.1	2026-02-12 15:58:33 +01:00
dependabot[bot]	99f5a3bc91	Bump pillow from 11.3.0 to 12.1.1 Bumps [pillow](https://github.com/python-pillow/Pillow) from 11.3.0 to 12.1.1. - [Release notes](https://github.com/python-pillow/Pillow/releases) - [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst) - [Commits](https://github.com/python-pillow/Pillow/compare/11.3.0...12.1.1) --- updated-dependencies: - dependency-name: pillow dependency-version: 12.1.1 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-02-11 17:43:19 +00:00
Jan-Timo	dd30ab9422	fix SSL: CERTIFICATE_VERIFY_FAILED	2026-02-11 13:47:11 +01:00
Jan-Timo	3af166c8a4	fix sqlite3.OperationalError: no such table: main.token_time_series	2026-02-11 13:46:37 +01:00
alpha-nerd-nomyo	9875eb977a	feat: Add tool call normalization and streaming delta accumulation Adds support for correctly handling tool calls in chat requests. Normalizes tool call data (ensuring IDs, types, and JSON arguments) in non-streaming mode and accumulates OpenAI-style deltas during streaming to build the final Ollama response.	2026-02-10 20:21:46 +01:00
alpha-nerd-nomyo	4892998abc	feat(router): Add llama-server endpoints support and model parsing Add `llama_server_endpoints` configuration field to support llama_server OpenAI-compatible endpoints for status checks. Implement helper functions to parse model names and quantization levels from llama-server responses (best effort). Update `is_ext_openai_endpoint` to properly distinguish these endpoints from external OpenAI services. Update sample configuration documentation.	2026-02-10 16:46:51 +01:00
alpha-nerd-nomyo	1f81e69ce1	refactor(router.py): correctly implement OpenAI tool_calls to Ollama format conversion	2026-02-09 11:04:14 +01:00
alpha-nerd-nomyo	7deb088c6a	refactor(cache): split error cache and add stale-while-revalidate Refactor error tracking to use separate caches for 'available' and 'loaded' models, preventing cross-contamination of transient errors. Implement background refresh for available models to prevent blocking requests, and use stale-while-revalidate (300-600s) to serve stale data immediately when the cache is between 300s and 600s old.	2026-02-08 16:46:40 +01:00
alpha-nerd-nomyo	92cea1dead	feat: update reasoning handling Updated reasoning content handling in router.py to check for both "reasoning_content" and "reasoning" attributes.	2026-02-08 11:29:47 +01:00
alpha-nerd-nomyo	bd0d210b2a	feat: enforce api key authentication and update table header - Added proper API key validation in router.py with 401 response when key is missing - Implemented CORS headers for authentication requests - Updated table header from "Until" to "Unload" in static/index.html - Improved security by preventing API key leakage in access logs	2026-02-01 10:05:46 +01:00
alpha-nerd-nomyo	b718d575b7	Merge pull request #22 from nomyo-ai/dev-v0.5.x Dev v0.5.x	2026-01-30 18:18:42 +01:00
alpha-nerd-nomyo	d80b29e4f2	feat: enhance code quality and documentation - Renamed Feedback class to follow PascalCase convention - Fixed candidate enumeration start index from 0 to 1 - Simplified candidate content access by removing .message.content - Updated CONFIG_PATH environment variable name to CONFIG_PATH_ARG - Bumped version from 0.5 to 0.6 - Removed unnecessary return statement and trailing newline	2026-01-29 19:59:08 +01:00
alpha-nerd-nomyo	0ebfa7c519	security: bump orjson to >=3.11.5 preventing a recursive DOS attack	2026-01-29 18:12:05 +01:00
alpha-nerd-nomyo	4ca1a5667e	feat(router): implement in-flight request tracking to prevent cache stampede in high concurrency scenarios Added in-flight request tracking mechanism to prevent cache stampede when multiple concurrent requests arrive for the same endpoint. This introduces new dictionaries to track ongoing requests and a lock to coordinate access. The available_models method was refactored to use an internal helper function and includes request coalescing logic to ensure only one HTTP request is made per endpoint when cache entries expire. The loaded_models method was also updated to use the new caching and coalescing pattern.	2026-01-29 18:00:33 +01:00
alpha-nerd-nomyo	7c25ffafb2	Merge pull request #21 from YetheSamartaka/model-ps-improvements Add endpoint differentiation for models ps board	2026-01-29 10:57:22 +01:00
alpha-nerd-nomyo	efdf14a207	fix: optimize table column widths and improve time formatting for responsive layout - Reduced min-width of model columns from 340px to 200px with max-width of 300px - Added specific styling for narrow columns (3rd-5th) with fixed width and center alignment - Removed "Instance count" as it has redundant information - Enhanced time formatting logic to show relative time instead of absolute dates - Simplified digest display to show last 6 characters instead of truncated format - Added proper handling for various time value types (number, string, null)	2026-01-29 10:54:43 +01:00
alpha-nerd-nomyo	bfdae1e4a6	Merge branch 'dev-v0.5.x' of https://github.com/nomyo-ai/nomyo-router into dev-v0.5.x	2026-01-29 10:33:01 +01:00
alpha-nerd-nomyo	a1276e3de8	fix: correct indentation for publish_snapshot calls in usage functions This fix ensures that the snapshot publishing happens within the usage lock context, maintaining proper synchronization of usage counts.	2026-01-29 10:32:59 +01:00
YetheSamartaka	d3aa87ca15	Added endpoint differentiation for models ps board Added endpoint differentiation to the models PS board, showing which endpoint each model is loaded on and for how long. This makes it easier to view multiple instances of the same model deployed for load balancing	2026-01-27 13:29:54 +01:00
alpha-nerd-nomyo	ff402ba0bb	Update video link to clickable thumbnail Replace static video link with a clickable thumbnail.	2026-01-26 18:34:30 +01:00
alpha-nerd-nomyo	bdd4dd45d9	Merge pull request #20 from YetheSamartaka/main add: Optional router-level API key that gates router/API/web UI access	2026-01-26 18:14:55 +01:00
alpha-nerd-nomyo	ee1c460477	Empty key strings could bypass authentication in _extract_router_api_key() when malformed Authorization headers were sent - Added validation to check that the extracted key is not empty before returning it - Added CORS headers to enforce_router_api_key() for proper cross-origin request handling and CORS-related error prevention	2026-01-26 18:11:28 +01:00
alpha-nerd-nomyo	d4b2558116	refactor: improve snapshot safety and usage tracking Create atomic snapshots by deep copying usage data structures to prevent race conditions. Protect concurrent reads of usage counts with explicit locking in endpoint selection. Replace README screenshot with a video link.	2026-01-26 17:18:57 +01:00
alpha-nerd-nomyo	3e3f0dd383	fix: endpoint selection logic	2026-01-19 14:21:08 +01:00
alpha-nerd-nomyo	5ad5bfe66e	feat: endpoint selection more consistent and understandable	2026-01-18 09:31:53 +01:00
alpha-nerd-nomyo	067cdf641a	feat: add timestamp index and improve cache concurrency - Added index on token_time_series timestamp for faster queries - Introduced cache locks to prevent race conditions	2026-01-16 16:47:24 +01:00
YetheSamartaka	eca4a92a33	add: Optional router-level API key that gates router/API/web UI access Optional router-level API key that gates router/API/web UI access (leave empty to disable) ## Supplying the router API key If you set `nomyo-router-api-key` in `config.yaml` (or `NOMYO_ROUTER_API_KEY` env), every request to NOMYO Router must include the key: - HTTP header (recommended): `Authorization: Bearer <router_key>` - Query param (fallback): `?api_key=<router_key>` Examples: ```bash curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags curl "http://localhost:12434/api/tags?api_key=$NOMYO_ROUTER_API_KEY" ```	2026-01-14 09:28:02 +01:00
alpha-nerd-nomyo	6828411f95	Merge pull request #19 from nomyo-ai/dev-v0.5.x feat: buffer_locks preventing race conditions in high concurrency scenarios documentation folder	2026-01-06 10:51:29 +01:00
alpha-nerd-nomyo	ac2a4fe8e0	Merge pull request #18 from nomyo-ai/dependabot/pip/aiohttp-3.13.3 Bump aiohttp from 3.12.15 to 3.13.3	2026-01-06 10:49:32 +01:00
dependabot[bot]	66cabcf3a9	Bump aiohttp from 3.12.15 to 3.13.3 --- updated-dependencies: - dependency-name: aiohttp dependency-version: 3.13.3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-01-06 00:33:28 +00:00
alpha-nerd-nomyo	20a016269d	feat: added buffer_lock to prevent race conditions in high concurrency scenarios; added documentation	2026-01-05 17:16:31 +01:00
alpha-nerd-nomyo	dc36a81f6c	Merge pull request #17 from nomyo-ai/dev-v0.5.x feat add: Multiple Opinions Ensemble prefix any ollama model with "moe-" on /api/chat and the original user request gets passed to the selected model 3 times with temp=1 to get 3 different response variants. Each variant is then revisited and finally scored to find the best response among them all and finally returned to the user. Runs longer, uses more tokens for expected better quality response.	2025-12-17 16:50:26 +01:00
alpha-nerd-nomyo	434b6d4cca	finalize feat: Mixture of Experts: - prefix any ollama model with "moe-" on /api/chat and the original user request gets passed to the selected model 3 times with temp=1 to get response variants. Each variant is then revisited and finally scored to find the best response among them all and finally returned to the user. Runs longer, uses more tokens for expected better quality response.	2025-12-16 09:46:36 +01:00
alpha-nerd-nomyo	19a13cc613	fix(enhance.py): correct typo in function name from 'moe_select_candiadate' to 'moe_select_candidate' feat(router.py): add helper function _make_chat_request for handling enhancing chat requests to endpoints	2025-12-15 10:35:56 +01:00
alpha-nerd-nomyo	5eb5490d16	feat: improve model version handling in endpoint selection Add logic to only append ":latest" suffix to models without existing version suffixes, preventing duplicate version tags and ensuring correct endpoint selection for models following Ollama naming conventions.	2025-12-14 17:58:45 +01:00
alpha-nerd-nomyo	b35afbc1c9	Merge pull request #16 from nomyo-ai/dev-v0.5.x Dev v0.5.x - incl. hotfix	2025-12-13 12:37:19 +01:00
alpha-nerd-nomyo	3ccaf78e5d	fix: simplify model version handling in proxy functions Simplify the logic for handling model versions in `openai_chat_completions_proxy` and `openai_completions_proxy` by removing redundant conditions and initializing `local_model` earlier. This makes the code more readable and maintains the same functionality.	2025-12-13 12:34:24 +01:00
alpha-nerd-nomyo	34d6abd28b	refactor: optimize token aggregation query and enhance chat proxy - Refactored token aggregation query in db.py to use a single SQL query with SUM() instead of iterating through rows, improving performance - Combined import statements in db.py and router.py to reduce lines of code - Enhanced chat proxy in router.py to handle "moe-" prefixed models with multiple query execution and critique generation - Added last_user_content() helper function to extract user content from messages - Improved code readability and maintainability through these structural changes	2025-12-13 11:58:49 +01:00
alpha-nerd-nomyo	67edbb5f8e	Merge pull request #15 from nomyo-ai:dev-v0.5.x Dev-v0.5.x -> Main	2025-12-09 12:08:46 +01:00
alpha-nerd-nomyo	59a8ef3abb	refactor: use a persistent WAL-enabled connection with async locks - Introduce a lazily initialized, shared aiosqlite connection stored in self._db and two asyncio locks (_db_lock, _operation_lock) for safe concurrent access - Ensure the database directory exists before connecting and enable WAL journaling and foreign keys on first connect - Add close method to gracefully close the persistent connection - Guard initialization and write operations with _operation_lock to ensure single-threaded schema setup - Switch to ON CONFLICT UPSERT for token_counts updates and initialize token_time_series table - Add typing for _db (Optional[aiosqlite.Connection]) and adjust imports accordingly addition: Frontend button with total stats aggregation task and feedback span element to keep user informed and a small database footprint	2025-12-02 12:18:23 +01:00
alpha-nerd-nomyo	0ffb321154	fixing total stats model, button, labels and code clean up	2025-11-28 14:59:29 +01:00

188 commits