Commit graph

265 commits

Author SHA1 Message Date
7c25ffafb2
Merge pull request #21 from YetheSamartaka/model-ps-improvements
Add endpoint differentiation for models ps board
2026-01-29 10:57:22 +01:00
efdf14a207 fix: optimize table column widths and improve time formatting for responsive layout
- Reduced min-width of model columns from 340px to 200px with max-width of 300px
- Added specific styling for narrow columns (3rd-5th) with fixed width and center alignment
- Removed "Instance count" as it has redundant information
- Enhanced time formatting logic to show relative time instead of absolute dates
- Simplified digest display to show last 6 characters instead of truncated format
- Added proper handling for various time value types (number, string, null)
2026-01-29 10:54:43 +01:00
bfdae1e4a6 Merge branch 'dev-v0.5.x' of https://github.com/nomyo-ai/nomyo-router into dev-v0.5.x 2026-01-29 10:33:01 +01:00
a1276e3de8 fix: correct indentation for publish_snapshot calls in usage functions
This fix ensures that the snapshot publishing happens within the usage lock context, maintaining proper synchronization of usage counts.
2026-01-29 10:32:59 +01:00
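The indentation fix in a1276e3de8 can be pictured with a minimal sketch. The class `UsageTracker`, its attributes, and the body of `publish_snapshot` are hypothetical names chosen for illustration, not the router's actual code. The only point shown is that the snapshot is published while the usage lock is still held.

```python
import asyncio
import copy


class UsageTracker:
    """Hypothetical stand-in for the router's usage bookkeeping."""

    def __init__(self) -> None:
        self.lock = asyncio.Lock()                      # assumed usage lock
        self.counts: dict[str, dict[str, int]] = {}     # endpoint -> model -> count
        self.published: list[dict] = []                 # stands in for snapshot consumers

    def publish_snapshot(self) -> None:
        # Assumed to be called only while self.lock is held.
        self.published.append(copy.deepcopy(self.counts))

    async def increment(self, endpoint: str, model: str) -> None:
        async with self.lock:
            per_model = self.counts.setdefault(endpoint, {})
            per_model[model] = per_model.get(model, 0) + 1
            # Indented inside the `async with` block: the snapshot reflects
            # exactly this update and cannot interleave with another writer.
            self.publish_snapshot()
```

If the `publish_snapshot()` call were dedented by one level, it would run after the lock is released, and a concurrent writer could change the counts between the update and the publish.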
YetheSamartaka
d3aa87ca15 Added endpoint differentiation for models ps board
Added endpoint differentiation to the models PS board so it shows which model is loaded on which endpoint and for how long, making it easier to view multiple copies of the same model deployed for load balancing
2026-01-27 13:29:54 +01:00
ff402ba0bb
Update video link to clickable thumbnail
Replace static video link with a clickable thumbnail.
2026-01-26 18:34:30 +01:00
bdd4dd45d9
Merge pull request #20 from YetheSamartaka/main
add: Optional router-level API key that gates router/API/web UI access
2026-01-26 18:14:55 +01:00
ee1c460477 Empty key strings could bypass authentication in _extract_router_api_key() when malformed Authorization headers were sent
- Added validation to check that the extracted key is not empty before returning it
- Added CORS headers to enforce_router_api_key() for proper cross-origin request handling and CORS-related error prevention
2026-01-26 18:11:28 +01:00
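A minimal sketch of the empty-key check described in ee1c460477 follows. The function below is not the router's `_extract_router_api_key()`; its name, signature, and fallback order are assumptions, based only on the header and query parameter formats documented in eca4a92a33.

```python
from typing import Mapping, Optional


def extract_router_api_key(headers: Mapping[str, str],
                           query: Mapping[str, str]) -> Optional[str]:
    """Return the supplied key, or None if it is absent, malformed, or empty."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() == "bearer":
        token = token.strip()
        if token:  # rejects "Bearer" and "Bearer   " instead of returning ""
            return token
    # Assumed fallback: the ?api_key=... query parameter.
    key = query.get("api_key", "").strip()
    return key or None
```

Returning `None` rather than an empty string means a caller that compares the result with the configured key can never match on an empty value.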
d4b2558116 refactor: improve snapshot safety and usage tracking
Create atomic snapshots by deep copying usage data structures to prevent race conditions.
Protect concurrent reads of usage counts with explicit locking in endpoint selection.
Replace README screenshot with a video link.
2026-01-26 17:18:57 +01:00
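The two ideas in d4b2558116, deep-copied snapshots and locked reads during endpoint selection, might look roughly like this. `UsageState`, `snapshot`, and `least_used` are hypothetical names, and picking the endpoint with the lowest count is an assumed selection rule used only to give the locked read something to do.

```python
import asyncio
import copy


class UsageState:
    """Hypothetical container for per-endpoint, per-model usage counts."""

    def __init__(self) -> None:
        self.lock = asyncio.Lock()
        self.counts: dict[str, dict[str, int]] = {}

    async def snapshot(self) -> dict[str, dict[str, int]]:
        # Deep copy under the lock: later mutations of self.counts
        # (including the nested dicts) cannot leak into the snapshot.
        async with self.lock:
            return copy.deepcopy(self.counts)

    async def least_used(self, endpoints: list[str], model: str) -> str:
        # Read the counts under the same lock so the comparison sees
        # one consistent view of all endpoints.
        async with self.lock:
            return min(endpoints,
                       key=lambda ep: self.counts.get(ep, {}).get(model, 0))
```

A shallow copy such as `dict(self.counts)` would still share the inner per-model dictionaries with the live state, which is why the sketch uses `copy.deepcopy`.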
3e3f0dd383 fix: endpoint selection logic 2026-01-19 14:21:08 +01:00
5ad5bfe66e feat: endpoint selection more consistent and understandable 2026-01-18 09:31:53 +01:00
067cdf641a feat: add timestamp index and improve cache concurrency
- Added index on token_time_series timestamp for faster queries
- Introduced cache locks to prevent race conditions
2026-01-16 16:47:24 +01:00
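The index half of 067cdf641a can be sketched with the standard-library `sqlite3` module. The table name `token_time_series` and the `timestamp` column come from the commit message; the remaining columns, the index name, and the helper functions are assumptions for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Column layout is assumed; only the table and timestamp column are
    # named in the commit message.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS token_time_series (
               id INTEGER PRIMARY KEY,
               endpoint TEXT NOT NULL,
               model TEXT NOT NULL,
               total_tokens INTEGER NOT NULL,
               timestamp INTEGER NOT NULL
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_token_time_series_timestamp "
        "ON token_time_series(timestamp)"
    )


def tokens_since(conn: sqlite3.Connection, since: int) -> int:
    # A time-range filter like this is the kind of query the index serves.
    row = conn.execute(
        "SELECT COALESCE(SUM(total_tokens), 0) "
        "FROM token_time_series WHERE timestamp >= ?",
        (since,),
    ).fetchone()
    return row[0]


def query_plan(conn: sqlite3.Connection) -> list[str]:
    # Returns the planner's description so index usage can be inspected.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM token_time_series WHERE timestamp >= ?",
        (0,),
    ).fetchall()
    return [r[-1] for r in rows]
```

Calling `query_plan(conn)` after `create_schema(conn)` is one way to check whether SQLite chooses the index for a time-range query.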
YetheSamartaka
eca4a92a33 add: Optional router-level API key that gates router/API/web UI access
Optional router-level API key that gates router/API/web UI access (leave empty to disable)

## Supplying the router API key

If you set `nomyo-router-api-key` in `config.yaml` (or `NOMYO_ROUTER_API_KEY` env), every request to NOMYO Router must include the key:

- HTTP header (recommended): `Authorization: Bearer <router_key>`
- Query param (fallback): `?api_key=<router_key>`

Examples:
```bash
curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags
curl "http://localhost:12434/api/tags?api_key=$NOMYO_ROUTER_API_KEY"
```
2026-01-14 09:28:02 +01:00
6828411f95
Merge pull request #19 from nomyo-ai/dev-v0.5.x
feat:
buffer_locks preventing race conditions in high concurrency scenarios
documentation folder
2026-01-06 10:51:29 +01:00
ac2a4fe8e0
Merge pull request #18 from nomyo-ai/dependabot/pip/aiohttp-3.13.3
Bump aiohttp from 3.12.15 to 3.13.3
2026-01-06 10:49:32 +01:00
dependabot[bot]
66cabcf3a9
Bump aiohttp from 3.12.15 to 3.13.3
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-06 00:33:28 +00:00
20a016269d feat:
added buffer_lock to prevent race condition in high concurrency scenarios
added documentation
2026-01-05 17:16:31 +01:00
dc36a81f6c
Merge pull request #17 from nomyo-ai/dev-v0.5.x
feat add:
Multiple Opinions Ensemble

prefix any ollama model with "moe-" on /api/chat and the original user request gets passed to the selected model 3 times with temp=1 to get 3 different response variants.
Each variant is then revisited and scored to find the best response among them, which is returned to the user.

Runs longer and uses more tokens in exchange for an expected better-quality response.
2025-12-17 16:50:26 +01:00
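The flow described in dc36a81f6c (three variants at temperature 1, a revisit pass, then scoring) can be sketched as below. The callables `generate`, `critique`, and `score` are hypothetical placeholders for the actual model calls, and `moe_answer` and `split_moe_model` are illustrative names, not functions from enhance.py or router.py.

```python
from typing import Callable, Tuple

Generate = Callable[[str, float], str]   # (prompt, temperature) -> response
Critique = Callable[[str, str], str]     # (prompt, draft) -> revisited draft
Score = Callable[[str, str], float]      # (prompt, candidate) -> score


def split_moe_model(model: str) -> Tuple[str, bool]:
    """Strip the "moe-" prefix and report whether it was present."""
    prefix = "moe-"
    if model.startswith(prefix):
        return model[len(prefix):], True
    return model, False


def moe_answer(prompt: str, generate: Generate, critique: Critique,
               score: Score, n: int = 3) -> str:
    # 1. Ask the same model n times at temperature 1.0 for varied drafts.
    drafts = [generate(prompt, 1.0) for _ in range(n)]
    # 2. Revisit each draft.
    revisited = [critique(prompt, draft) for draft in drafts]
    # 3. Score the revisited drafts and return the best one.
    return max(revisited, key=lambda candidate: score(prompt, candidate))
```

The sketch makes the cost visible: with `n = 3` it issues at least three generation calls plus three revisit calls and three scoring calls for a single user request.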
434b6d4cca finalize feat:
Mixture of Experts:
- prefix any ollama model with "moe-" on api/chat and the original user request gets passed to the selected model 3 times with temp=1 to get response variants. Each variant is then revisited and scored to find the best response among them, which is returned to the user. Runs longer and uses more tokens in exchange for an expected better-quality response.
2025-12-16 09:46:36 +01:00
19a13cc613 fix(enhance.py): correct typo in function name from 'moe_select_candiadate' to 'moe_select_candidate'
feat(router.py): add helper function _make_chat_request for handling enhancing chat requests to endpoints
2025-12-15 10:35:56 +01:00
5eb5490d16 feat: improve model version handling in endpoint selection
Add logic to only append ":latest" suffix to models without existing version suffixes, preventing duplicate version tags and ensuring correct endpoint selection for models following Ollama naming conventions.
2025-12-14 17:58:45 +01:00
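The rule in 5eb5490d16 is small enough to sketch directly. `normalize_model_name` is a hypothetical helper name, and checking only the last path segment for a colon is an assumption made so that a registry host with a port is not mistaken for a tag.

```python
def normalize_model_name(model: str) -> str:
    """Append ":latest" only when the model name carries no tag yet."""
    # Only the last path segment can hold a tag; a colon earlier in the
    # name (for example a registry port) does not count.
    last_segment = model.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return model
    return f"{model}:latest"
```

For example, `normalize_model_name("llama3")` yields `"llama3:latest"`, while `"llama3:8b"` is returned unchanged instead of becoming `"llama3:8b:latest"`.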
b35afbc1c9
Merge pull request #16 from nomyo-ai/dev-v0.5.x
Dev v0.5.x - incl. hotfix
2025-12-13 12:37:19 +01:00
3ccaf78e5d fix: simplify model version handling in proxy functions
Simplify the logic for handling model versions in `openai_chat_completions_proxy` and `openai_completions_proxy` by removing redundant conditions and initializing `local_model` earlier. This makes the code more readable and maintains the same functionality.
2025-12-13 12:34:24 +01:00
34d6abd28b refactor: optimize token aggregation query and enhance chat proxy
- Refactored token aggregation query in db.py to use a single SQL query with SUM() instead of iterating through rows, improving performance
- Combined import statements in db.py and router.py to reduce lines of code
- Enhanced chat proxy in router.py to handle "moe-" prefixed models with multiple query execution and critique generation
- Added last_user_content() helper function to extract user content from messages
- Improved code readability and maintainability through these structural changes
2025-12-13 11:58:49 +01:00
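The aggregation change in 34d6abd28b, summing in SQL instead of iterating over rows in Python, might look like this with the standard-library `sqlite3` module. The table name `token_counts` appears elsewhere in this log; the column names `input_tokens` and `output_tokens` and the function name are assumptions.

```python
import sqlite3
from typing import Tuple


def total_token_counts(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Return (input_total, output_total) using a single aggregate query."""
    row = conn.execute(
        # COALESCE turns the NULL that SUM() yields on an empty table into 0.
        "SELECT COALESCE(SUM(input_tokens), 0), "
        "       COALESCE(SUM(output_tokens), 0) "
        "FROM token_counts"
    ).fetchone()
    return row[0], row[1]
```

The database does the addition in one pass and only a single row crosses into Python, which is where the performance gain over a row-by-row loop would come from.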
67edbb5f8e
Merge pull request #15 from nomyo-ai:dev-v0.5.x
Dev-v0.5.x -> Main
2025-12-09 12:08:46 +01:00
59a8ef3abb refactor: use a persistent WAL-enabled connection with async locks
- Introduce a lazily initialized, shared aiosqlite connection stored in self._db and two asyncio locks (_db_lock, _operation_lock) for safe concurrent access
- Ensure the database directory exists before connecting and enable WAL journaling and foreign keys on first connect
- Add close method to gracefully close the persistent connection
- Guard initialization and write operations with _operation_lock to ensure single-threaded schema setup
- Switch to ON CONFLICT UPSERT for token_counts updates and initialize token_time_series table
- Add typing for _db (Optional[aiosqlite.Connection]) and adjust imports accordingly

addition: Frontend button that triggers a total stats aggregation task, with a feedback span element to keep the user informed; the aggregation keeps the database footprint small
2025-12-02 12:18:23 +01:00
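The connection handling described in 59a8ef3abb can be sketched as follows. The commit uses aiosqlite; this sketch substitutes the blocking standard-library `sqlite3` module so it stays self-contained, so it shows the locking and UPSERT pattern rather than the actual implementation. The class name `TokenDB`, the method names, and the `token_counts` column layout are assumptions; `_db`, `_db_lock`, and `_operation_lock` follow the names in the commit message.

```python
import asyncio
import os
import sqlite3
from typing import Optional


class TokenDB:
    def __init__(self, path: str) -> None:
        self.path = path
        self._db: Optional[sqlite3.Connection] = None
        self._db_lock = asyncio.Lock()         # guards lazy connection setup
        self._operation_lock = asyncio.Lock()  # serializes schema setup and writes

    async def _get_db(self) -> sqlite3.Connection:
        async with self._db_lock:
            if self._db is None:
                # Ensure the database directory exists before connecting.
                os.makedirs(os.path.dirname(os.path.abspath(self.path)),
                            exist_ok=True)
                self._db = sqlite3.connect(self.path)
                self._db.execute("PRAGMA journal_mode=WAL")
                self._db.execute("PRAGMA foreign_keys=ON")
            return self._db

    async def init(self) -> None:
        db = await self._get_db()
        async with self._operation_lock:
            db.execute(
                "CREATE TABLE IF NOT EXISTS token_counts ("
                " endpoint TEXT NOT NULL, model TEXT NOT NULL,"
                " total_tokens INTEGER NOT NULL,"
                " PRIMARY KEY (endpoint, model))"
            )
            db.commit()

    async def add_tokens(self, endpoint: str, model: str, tokens: int) -> None:
        db = await self._get_db()
        async with self._operation_lock:
            # UPSERT: insert a new row or add to the existing total.
            db.execute(
                "INSERT INTO token_counts (endpoint, model, total_tokens) "
                "VALUES (?, ?, ?) "
                "ON CONFLICT(endpoint, model) DO UPDATE SET "
                "total_tokens = token_counts.total_tokens + excluded.total_tokens",
                (endpoint, model, tokens),
            )
            db.commit()

    async def close(self) -> None:
        async with self._db_lock:
            if self._db is not None:
                self._db.close()
                self._db = None
```

Keeping one connection open avoids reconnecting on every write, and WAL journaling lets readers proceed while a write is in progress.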
0ffb321154 fixing total stats model, button, labels and code clean up 2025-11-28 14:59:29 +01:00
1c3f9a9dc4 fix model naming so the usage counter decrements correctly in /v1 endpoints 2025-11-24 09:33:54 +01:00
7b50a5a299 adding usage metrics to /v1 endpoints if stream == True 2025-11-21 09:56:42 +01:00
45d1d442ee sqlite: adding connection pooling and WAL 2025-11-20 15:37:04 +01:00
aa23a4dd81 fixing timezone issues 2025-11-20 12:53:18 +01:00
0d187e91b9 fixing chart timescales 2025-11-20 09:53:28 +01:00
e0c6861f2f aggregating token_counts for stats over all endpoints and adjusting the color mapping 2025-11-20 09:22:45 +01:00
3f77a8ec62 chart enhancements 2025-11-19 17:28:31 +01:00
79a7ca972b initial chart view 2025-11-19 17:05:25 +01:00
541f2826e0 fixing token_queue, prepping chart view 2025-11-18 19:02:36 +01:00
baf5d98318 adding token timeseries counting in db for future data viz 2025-11-18 11:16:21 +01:00
0dd0a3712c
Merge pull request #14 from nomyo-ai:dev-v0.4.2
Dev-v0.4.2 mainly includes performance and other improvements
2025-11-17 14:44:20 +01:00
8a05f2ac44 cache loaded models to decrease load on ollamas 2025-11-17 14:40:24 +01:00
06103e5f01 cache loaded models to decrease load on ollamas 2025-11-17 14:40:22 +01:00
4c7ebb5af4 cancel token_worker_task only if running 2025-11-14 15:53:26 +01:00
b9933e000f rollback - needs more logic in v1/embedding 2025-11-13 13:32:46 +01:00
9f90bc9cd0 fixing /v1/embedding ollama notations 2025-11-13 12:40:40 +01:00
8aef941385 stopping the token_worker_task gracefully on shutdown 2025-11-13 10:13:10 +01:00
f14d9dc7da don't query non-Ollama endpoints for health status 2025-11-13 10:06:23 +01:00
1427e98e6d various performance improvements and replacement of json with orjson 2025-11-10 15:37:46 +01:00
c6c1059ede
Merge pull request #12 from nomyo-ai/dev-v0.4.x
token usage counter for non-stream openai ollama endpoints and improvements
2025-11-08 11:54:33 +01:00
4e0b2f9fee
Merge pull request #11 from YetheSamartaka:main
Add Docker support
2025-11-07 16:17:45 +01:00
YetheSamartaka
9a4bcb6f97 Add Docker support
Adds comprehensive Docker support
2025-11-07 13:59:16 +01:00
47a39184ad token usage counter for non-stream openai ollama endpoints added 2025-11-06 14:27:34 +01:00