nomyo-router

Author	SHA1	Message	Date
alpha-nerd-nomyo	a1276e3de8	fix: correct indentation for publish_snapshot calls in usage functions This fix ensures that the snapshot publishing happens within the usage lock context, maintaining proper synchronization of usage counts.	2026-01-29 10:32:59 +01:00
YetheSamartaka	d3aa87ca15	Added endpoint differentiation for models ps board Added endpoint differentiation for models PS board to see where which model is loaded and for how long to ease the viewing of multiple same models deployed for load balancing	2026-01-27 13:29:54 +01:00
alpha-nerd-nomyo	ff402ba0bb	Update video link to clickable thumbnail Replace static video link with a clickable thumbnail.	2026-01-26 18:34:30 +01:00
alpha-nerd-nomyo	bdd4dd45d9	Merge pull request #20 from YetheSamartaka/main add: Optional router-level API key that gates router/API/web UI access	2026-01-26 18:14:55 +01:00
alpha-nerd-nomyo	ee1c460477	Empty key strings could bypass authentication in _extract_router_api_key() when malformed Authorization headers were sent - Added validation to check that the extracted key is not empty before returning it - Added CORS headers to enforce_router_api_key() for proper cross-origin request handling and CORS-related error prevention	2026-01-26 18:11:28 +01:00
alpha-nerd-nomyo	d4b2558116	refactor: improve snapshot safety and usage tracking Create atomic snapshots by deep copying usage data structures to prevent race conditions. Protect concurrent reads of usage counts with explicit locking in endpoint selection. Replace README screenshot with a video link.	2026-01-26 17:18:57 +01:00
alpha-nerd-nomyo	3e3f0dd383	fix: endpoint selection logic	2026-01-19 14:21:08 +01:00
alpha-nerd-nomyo	5ad5bfe66e	feat: endpoint selection more consistent and understandable	2026-01-18 09:31:53 +01:00
alpha-nerd-nomyo	067cdf641a	feat: add timestamp index and improve cache concurrency - Added index on token_time_series timestamp for faster queries - Introduced cache locks to prevent race conditions	2026-01-16 16:47:24 +01:00
YetheSamartaka	eca4a92a33	add: Optional router-level API key that gates router/API/web UI access Optional router-level API key that gates router/API/web UI access (leave empty to disable) ## Supplying the router API key If you set `nomyo-router-api-key` in `config.yaml` (or `NOMYO_ROUTER_API_KEY` env), every request to NOMYO Router must include the key: - HTTP header (recommended): `Authorization: Bearer <router_key>` - Query param (fallback): `?api_key=<router_key>` Examples: ```bash curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags curl "http://localhost:12434/api/tags?api_key=$NOMYO_ROUTER_API_KEY" ```	2026-01-14 09:28:02 +01:00
alpha-nerd-nomyo	6828411f95	Merge pull request #19 from nomyo-ai/dev-v0.5.x feat: buffer_locks preventing race conditions in high concurrency scenarios documentation folder	2026-01-06 10:51:29 +01:00
alpha-nerd-nomyo	ac2a4fe8e0	Merge pull request #18 from nomyo-ai/dependabot/pip/aiohttp-3.13.3 Bump aiohttp from 3.12.15 to 3.13.3	2026-01-06 10:49:32 +01:00
dependabot[bot]	66cabcf3a9	Bump aiohttp from 3.12.15 to 3.13.3 --- updated-dependencies: - dependency-name: aiohttp dependency-version: 3.13.3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-01-06 00:33:28 +00:00
alpha-nerd-nomyo	20a016269d	feat: added buffer_lock to prevent race condition in high concurrency scenarios added documentation	2026-01-05 17:16:31 +01:00
alpha-nerd-nomyo	dc36a81f6c	Merge pull request #17 from nomyo-ai/dev-v0.5.x feat add: Multiple Opinions Ensemble prefix any ollama model with "moe-" on /api/chat and the original user request gets passed to the selected model 3 times with temp=1 to get 3 different response variants. Each variant is then revisited and finally scored to find the best response among them all and finally returned to the user. Runs longer, uses more tokens for expected better quality response.	2025-12-17 16:50:26 +01:00
alpha-nerd-nomyo	434b6d4cca	finalize feat: Mixture of Experts: - prefix any ollama model with "moe-" on api/chat and the original user request gets passed to the selected model 3 times with temp=1 to get response variants. Each variant is then revisited and finally scored to find the best response among them all and finally returned to the user. Runs longer, uses more tokens for expected better quality response.	2025-12-16 09:46:36 +01:00
alpha-nerd-nomyo	19a13cc613	fix(enhance.py): correct typo in function name from 'moe_select_candiadate' to 'moe_select_candidate' feat(router.py): add helper function _make_chat_request for handling enhancing chat requests to endpoints	2025-12-15 10:35:56 +01:00
alpha-nerd-nomyo	5eb5490d16	feat: improve model version handling in endpoint selection Add logic to only append ":latest" suffix to models without existing version suffixes, preventing duplicate version tags and ensuring correct endpoint selection for models following Ollama naming conventions.	2025-12-14 17:58:45 +01:00
alpha-nerd-nomyo	b35afbc1c9	Merge pull request #16 from nomyo-ai/dev-v0.5.x Dev v0.5.x - incl. hotfix	2025-12-13 12:37:19 +01:00
alpha-nerd-nomyo	3ccaf78e5d	fix: simplify model version handling in proxy functions Simplify the logic for handling model versions in `openai_chat_completions_proxy` and `openai_completions_proxy` by removing redundant conditions and initializing `local_model` earlier. This makes the code more readable and maintains the same functionality.	2025-12-13 12:34:24 +01:00
alpha-nerd-nomyo	34d6abd28b	refactor: optimize token aggregation query and enhance chat proxy - Refactored token aggregation query in db.py to use a single SQL query with SUM() instead of iterating through rows, improving performance - Combined import statements in db.py and router.py to reduce lines of code - Enhanced chat proxy in router.py to handle "moe-" prefixed models with multiple query execution and critique generation - Added last_user_content() helper function to extract user content from messages - Improved code readability and maintainability through these structural changes	2025-12-13 11:58:49 +01:00
alpha-nerd-nomyo	67edbb5f8e	Merge pull request #15 from nomyo-ai:dev-v0.5.x Dev-v0.5.x -> Main	2025-12-09 12:08:46 +01:00
alpha-nerd-nomyo	59a8ef3abb	refactor: use a persistent WAL-enabled connection with async locks - Introduce a lazily initialized, shared aiosqlite connection stored in self._db and two asyncio locks (_db_lock, _operation_lock) for safe concurrent access - Ensure the database directory exists before connecting and enable WAL journaling and foreign keys on first connect - Add close method to gracefully close the persistent connection - Guard initialization and write operations with _operation_lock to ensure single-threaded schema setup - Switch to ON CONFLICT UPSERT for token_counts updates and initialize token_time_series table - Add typing for _db (Optional[aiosqlite.Connection]) and adjust imports accordingly addition: Frontend button with total stats aggregation task and feedback span element to keep user informed and a small database footprint	2025-12-02 12:18:23 +01:00
alpha-nerd-nomyo	0ffb321154	fixing total stats model, button, labels and code clean up	2025-11-28 14:59:29 +01:00
alpha-nerd-nomyo	1c3f9a9dc4	fix model naming to allow correct decrement usage counter in /v1 endpoints	2025-11-24 09:33:54 +01:00
alpha-nerd-nomyo	7b50a5a299	adding usage metrics to /v1 endpoints if stream == True	2025-11-21 09:56:42 +01:00
alpha-nerd-nomyo	45d1d442ee	sqlite: adding connection pooling and WAL	2025-11-20 15:37:04 +01:00
alpha-nerd-nomyo	aa23a4dd81	fixing timezone issues	2025-11-20 12:53:18 +01:00
alpha-nerd-nomyo	0d187e91b9	fixing chart timescales	2025-11-20 09:53:28 +01:00
alpha-nerd-nomyo	e0c6861f2f	aggregating token_counts for stats over all endpoints and adjusting the color mapping	2025-11-20 09:22:45 +01:00
alpha-nerd-nomyo	3f77a8ec62	chart enhancements	2025-11-19 17:28:31 +01:00
alpha-nerd-nomyo	79a7ca972b	initial chart view	2025-11-19 17:05:25 +01:00
alpha-nerd-nomyo	541f2826e0	fixing token_queue, prepping chart view	2025-11-18 19:02:36 +01:00
alpha-nerd-nomyo	baf5d98318	adding token timeseries counting in db for future data viz	2025-11-18 11:16:21 +01:00
alpha-nerd-nomyo	0dd0a3712c	Merge pull request #14 from nomyo-ai:dev-v0.4.2 Dev-v0.4.2 includes mainly performance and other improvements	2025-11-17 14:44:20 +01:00
alpha-nerd-nomyo	8a05f2ac44	cache loaded models to decrease load on ollamas	2025-11-17 14:40:24 +01:00
alpha-nerd-nomyo	06103e5f01	cache loaded models to decrease load on ollamas	2025-11-17 14:40:22 +01:00
alpha-nerd-nomyo	4c7ebb5af4	cancel token_worker_task only if running	2025-11-14 15:53:26 +01:00
alpha-nerd-nomyo	b9933e000f	rollback - needs more logic in v1/embedding	2025-11-13 13:32:46 +01:00
alpha-nerd-nomyo	9f90bc9cd0	fixing /v1/embedding ollama notations	2025-11-13 12:40:40 +01:00
alpha-nerd-nomyo	8aef941385	stopping the token_worker_task gracefully on shutdown	2025-11-13 10:13:10 +01:00
alpha-nerd-nomyo	f14d9dc7da	don't query non-Ollama endpoints for health status	2025-11-13 10:06:23 +01:00
alpha-nerd-nomyo	1427e98e6d	various performance improvements and json replacement orjson	2025-11-10 15:37:46 +01:00
alpha-nerd-nomyo	c6c1059ede	Merge pull request #12 from nomyo-ai/dev-v0.4.x token usage counter for non-stream openai ollama endpoints and improvements	2025-11-08 11:54:33 +01:00
alpha-nerd-nomyo	4e0b2f9fee	Merge pull request #11 from YetheSamartaka:main Add Docker support	2025-11-07 16:17:45 +01:00
YetheSamartaka	9a4bcb6f97	Add Docker support Adds comprehensive docker support	2025-11-07 13:59:16 +01:00
alpha-nerd-nomyo	47a39184ad	token usage counter for non-stream openai ollama endpoints added	2025-11-06 14:27:34 +01:00
alpha-nerd-nomyo	f0f6069577	Merge branch 'dev-v0.4.x' of https://github.com/nomyo-ai/nomyo-router into dev-v0.4.x	2025-11-04 17:55:22 +01:00
alpha-nerd-nomyo	4c9ec5b1b2	record and display total token usage on ollama endpoints using ollama client	2025-11-04 17:55:19 +01:00
alpha-nerd-nomyo	60694e885b	Update README.md	2025-10-31 13:54:22 +01:00
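
Commit 5eb5490d16 describes appending ":latest" only to model names that have no version suffix yet. The sketch below illustrates that rule under stated assumptions; it is not the router's actual code. The function name `with_default_tag` is hypothetical, and inspecting only the last path segment is an added assumption so that a registry host with a port is not mistaken for a tag.

```python
def with_default_tag(model: str) -> str:
    """Append ':latest' only when the model name carries no tag.

    Hypothetical helper for illustration; not taken from router.py.
    """
    # Assumption: only the last path segment can carry a tag, so a host
    # with a port ("localhost:5000/library/llama3") is still untagged.
    last_segment = model.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return model  # already tagged, e.g. "llama3:8b"
    return f"{model}:latest"


if __name__ == "__main__":
    for name in ("llama3", "llama3:8b", "localhost:5000/library/llama3"):
        print(name, "->", with_default_tag(name))
```

Leaving an existing tag untouched is what avoids the duplicate version tags the commit message mentions.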


262 commits
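
Commits eca4a92a33 and ee1c460477 describe a router-level API key supplied through `Authorization: Bearer <router_key>` or the `?api_key=<router_key>` query parameter, with empty keys rejected. The following is a minimal sketch of that behaviour, assuming plain dictionaries for headers and query parameters. The names `extract_router_api_key` and `is_authorized` are hypothetical, and the fallback to the query parameter after a malformed header is an assumption rather than confirmed router behaviour.

```python
import hmac
from typing import Mapping, Optional


def extract_router_api_key(headers: Mapping[str, str],
                           query: Mapping[str, str]) -> Optional[str]:
    """Return the supplied key, or None when none (or only an empty one) is sent."""
    # Assumption: the header mapping uses the exact key "Authorization".
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer":
        token = token.strip()
        if token:  # a bare "Bearer " must not count as a key
            return token
    # Fallback described in the commit message: ?api_key=<router_key>
    key = query.get("api_key", "").strip()
    return key or None


def is_authorized(expected_key: str,
                  headers: Mapping[str, str],
                  query: Mapping[str, str]) -> bool:
    """Gate a request; an empty configured key disables the check."""
    if not expected_key:
        return True
    supplied = extract_router_api_key(headers, query)
    if supplied is None:
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


if __name__ == "__main__":
    print(is_authorized("s3cret", {"Authorization": "Bearer s3cret"}, {}))
    print(is_authorized("s3cret", {"Authorization": "Bearer "}, {}))
```

Returning `None` for an empty token keeps a malformed `Authorization` header from being treated as a valid key, which is the bypass the second commit reports fixing.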