nomyo-router

Author	SHA1	Message	Date
alpha-nerd-nomyo	3e3f0dd383	fix: endpoint selection logic	2026-01-19 14:21:08 +01:00
alpha-nerd-nomyo	5ad5bfe66e	feat: endpoint selection more consistent and understandable	2026-01-18 09:31:53 +01:00
alpha-nerd-nomyo	067cdf641a	feat: add timestamp index and improve cache concurrency - Added index on token_time_series timestamp for faster queries - Introduced cache locks to prevent race conditions	2026-01-16 16:47:24 +01:00
alpha-nerd-nomyo	20a016269d	feat: added buffer_lock to prevent race condition in high concurrency scenarios added documentation	2026-01-05 17:16:31 +01:00
alpha-nerd-nomyo	19a13cc613	fix(enhance.py): correct typo in function name from 'moe_select_candiadate' to 'moe_select_candidate' feat(router.py): add helper function _make_chat_request for handling enhancing chat requests to endpoints	2025-12-15 10:35:56 +01:00
alpha-nerd-nomyo	5eb5490d16	feat: improve model version handling in endpoint selection Add logic to only append ":latest" suffix to models without existing version suffixes, preventing duplicate version tags and ensuring correct endpoint selection for models following Ollama naming conventions.	2025-12-14 17:58:45 +01:00
alpha-nerd-nomyo	3ccaf78e5d	fix: simplify model version handling in proxy functions Simplify the logic for handling model versions in `openai_chat_completions_proxy` and `openai_completions_proxy` by removing redundant conditions and initializing `local_model` earlier. This makes the code more readable and maintains the same functionality.	2025-12-13 12:34:24 +01:00
alpha-nerd-nomyo	34d6abd28b	refactor: optimize token aggregation query and enhance chat proxy - Refactored token aggregation query in db.py to use a single SQL query with SUM() instead of iterating through rows, improving performance - Combined import statements in db.py and router.py to reduce lines of code - Enhanced chat proxy in router.py to handle "moe-" prefixed models with multiple query execution and critique generation - Added last_user_content() helper function to extract user content from messages - Improved code readability and maintainability through these structural changes	2025-12-13 11:58:49 +01:00
alpha-nerd-nomyo	59a8ef3abb	refactor: use a persistent WAL-enabled connection with async locks - Introduce a lazily initialized, shared aiosqlite connection stored in self._db and two asyncio locks (_db_lock, _operation_lock) for safe concurrent access - Ensure the database directory exists before connecting and enable WAL journaling and foreign keys on first connect - Add close method to gracefully close the persistent connection - Guard initialization and write operations with _operation_lock to ensure single-threaded schema setup - Switch to ON CONFLICT UPSERT for token_counts updates and initialize token_time_series table - Add typing for _db (Optional[aiosqlite.Connection]) and adjust imports accordingly addition: Frontend button with total stats aggregation task and feedback span element to keep user informed and a small database footprint	2025-12-02 12:18:23 +01:00
alpha-nerd-nomyo	0ffb321154	fixing total stats model, button, labels and code clean up	2025-11-28 14:59:29 +01:00
alpha-nerd-nomyo	1c3f9a9dc4	fix model naming to allow correct decrement usage counter in /v1 endpoints	2025-11-24 09:33:54 +01:00
alpha-nerd-nomyo	7b50a5a299	adding usage metrics to /v1 endpoints if stream == True	2025-11-21 09:56:42 +01:00
alpha-nerd-nomyo	aa23a4dd81	fixing timezone issues	2025-11-20 12:53:18 +01:00
alpha-nerd-nomyo	3f77a8ec62	chart enhancements	2025-11-19 17:28:31 +01:00
alpha-nerd-nomyo	79a7ca972b	initial chart view	2025-11-19 17:05:25 +01:00
alpha-nerd-nomyo	541f2826e0	fixing token_queue, prepping chart view	2025-11-18 19:02:36 +01:00
alpha-nerd-nomyo	baf5d98318	adding token timeseries counting in db for future data viz	2025-11-18 11:16:21 +01:00
alpha-nerd-nomyo	8a05f2ac44	cache loaded models to decrease load on ollamas	2025-11-17 14:40:24 +01:00
alpha-nerd-nomyo	4c7ebb5af4	cancel token_worker_task only if running	2025-11-14 15:53:26 +01:00
alpha-nerd-nomyo	b9933e000f	rollback - needs more logic in v1/embedding	2025-11-13 13:32:46 +01:00
alpha-nerd-nomyo	9f90bc9cd0	fixing /v1/embedding ollama notations	2025-11-13 12:40:40 +01:00
alpha-nerd-nomyo	8aef941385	stopping the token_worker_task gracefully on shutdown	2025-11-13 10:13:10 +01:00
alpha-nerd-nomyo	f14d9dc7da	don't query non-Ollama endpoints for health status	2025-11-13 10:06:23 +01:00
alpha-nerd-nomyo	1427e98e6d	various performance improvements and json replacement orjson	2025-11-10 15:37:46 +01:00
alpha-nerd-nomyo	c6c1059ede	Merge pull request #12 from nomyo-ai/dev-v0.4.x token usage counter for non-stream openai ollama endpoints and improvements	2025-11-08 11:54:33 +01:00
YetheSamartaka	9a4bcb6f97	Add Docker support Adds comprehensive docker support	2025-11-07 13:59:16 +01:00
alpha-nerd-nomyo	47a39184ad	token usage counter for non-stream openai ollama endpoints added	2025-11-06 14:27:34 +01:00
alpha-nerd-nomyo	4c9ec5b1b2	record and display total token usage on ollama endpoints using ollama client	2025-11-04 17:55:19 +01:00
alpha-nerd-nomyo	9007f686c2	performance increase of iso8601_ns ~49%	2025-10-30 10:17:18 +01:00
alpha-nerd-nomyo	26dcbf9c02	fixing app logic and eventListeners in frontend	2025-10-30 09:06:21 +01:00
alpha-nerd-nomyo	3585f90437	fixing typos and smaller issues	2025-10-28 11:08:52 +01:00
alpha-nerd-nomyo	b72673d693	check for base64 encoded images and remove alpha channel	2025-10-03 10:04:50 +02:00
alpha-nerd-nomyo	11f6e2dca6	data-url handling and removing alpha channel in images	2025-09-24 18:10:17 +02:00
alpha-nerd-nomyo	e66c0ed0fc	new requirement for image preprocessing to downsize and convert to png for faster and safer transaction	2025-09-24 11:46:38 +02:00
alpha-nerd-nomyo	738d981157	poc: message translation with images	2025-09-23 17:33:15 +02:00
alpha-nerd-nomyo	8327ab4ae1	rm print statements	2025-09-23 14:47:55 +02:00
alpha-nerd-nomyo	fcfabbe926	mitigating div by zero due to google genai sending completion_token=0 in first chunk	2025-09-23 13:08:17 +02:00
alpha-nerd-nomyo	a74cc5be0f	fixing endpoint usage metrics	2025-09-23 12:51:37 +02:00
alpha-nerd-nomyo	19df75afa9	fixing types and params	2025-09-22 19:01:14 +02:00
alpha-nerd-nomyo	c43dc4139f	adding optional parameters in ollama to openai translation	2025-09-22 14:04:19 +02:00
alpha-nerd-nomyo	18d2fca027	formatting Response Objects in rechunk and fixing TypeErrors in /api/chat and /api/generate	2025-09-22 09:30:27 +02:00
alpha-nerd-nomyo	aeca77c1a1	formatting, condensing rechunk	2025-09-21 16:33:43 +02:00
alpha-nerd-nomyo	43d95fbf38	fixing headers, using ollama.Responses in rechunk class, fixing reserved words var usage, fixing embedding output, fixing model naming in frontend	2025-09-21 16:20:36 +02:00
alpha-nerd-nomyo	f0e181d6b8	improving queue logic for high load scenarios	2025-09-19 16:38:48 +02:00
alpha-nerd-nomyo	8fe3880af7	randomize endpoint selection for bootstrapping ollamas	2025-09-18 18:49:11 +02:00
alpha-nerd-nomyo	deca8e37ad	fixing model re-naming in /v1 endpoints and thinking in rechunk	2025-09-17 11:40:48 +02:00
alpha-nerd-nomyo	f4678018bf	adding thinking to rechunk class	2025-09-16 17:51:51 +02:00
alpha-nerd-nomyo	795873b4c9	finalizing compliance tasks	2025-09-15 19:12:00 +02:00
alpha-nerd-nomyo	16dba93c0d	compliance for ollama embeddings endpoints using openai models	2025-09-15 17:48:17 +02:00
alpha-nerd-nomyo	4b5834d7df	compliance with ollama naming conventions and openai model['id']	2025-09-15 17:39:15 +02:00

80 commits
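
Commit 5eb5490d16 describes appending ":latest" only to model names that carry no version suffix. The repository's actual code is not shown in this log, so the following is only a minimal sketch of that rule. The function name `normalize_model_name` is hypothetical, and checking only the last path segment for a colon is an assumption made here so that a registry port is not mistaken for a tag.

```python
def normalize_model_name(model: str) -> str:
    """Append ':latest' only when the model name has no tag yet.

    Hypothetical helper for illustration; not taken from router.py.
    """
    # Assumption: only the final path segment can carry a tag, so a colon
    # earlier in the name (e.g. a registry port) is not treated as a tag.
    last_segment = model.rsplit("/", 1)[-1]
    if ":" in last_segment:
        # Already tagged, e.g. "llama3:8b"; leave it untouched.
        return model
    return f"{model}:latest"
```

Under this sketch, an already tagged name is returned unchanged, so applying the function twice gives the same result as applying it once, which avoids the duplicate version tags the commit message mentions.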