80 commits

Author SHA1 Message Date
3e3f0dd383 fix: endpoint selection logic 2026-01-19 14:21:08 +01:00
5ad5bfe66e feat: endpoint selection more consistent and understandable 2026-01-18 09:31:53 +01:00
067cdf641a feat: add timestamp index and improve cache concurrency
- Added index on token_time_series timestamp for faster queries
- Introduced cache locks to prevent race conditions
2026-01-16 16:47:24 +01:00
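Note on 067cdf641a: a minimal sketch of how a cache lock can prevent concurrent requests from refreshing the same entry at once. The `ModelCache` class, its `ttl` parameter, and the `fetch` callback are hypothetical names for illustration; the commit itself only states that cache locks were introduced.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Tuple


class ModelCache:
    """Hypothetical TTL cache; one lock serializes lookups and refreshes."""

    def __init__(self, ttl: float) -> None:
        self._ttl = ttl
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self._lock = asyncio.Lock()

    async def get(
        self, endpoint: str, fetch: Callable[[str], Awaitable[List[str]]]
    ) -> List[str]:
        # Holding the lock across the check and the refresh means that
        # concurrent callers wait for the first refresh instead of each
        # starting their own request against the endpoint.
        async with self._lock:
            hit = self._entries.get(endpoint)
            if hit is not None and time.monotonic() - hit[0] < self._ttl:
                return hit[1]
            models = await fetch(endpoint)
            self._entries[endpoint] = (time.monotonic(), models)
            return models
```

A single lock is the simplest choice; a per-endpoint lock would allow refreshes for different endpoints to run in parallel at the cost of more bookkeeping.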
20a016269d feat: add buffer_lock and documentation
- Added buffer_lock to prevent race conditions in high-concurrency scenarios
- Added documentation
2026-01-05 17:16:31 +01:00
19a13cc613 fix(enhance.py): correct typo in function name from 'moe_select_candiadate' to 'moe_select_candidate'
feat(router.py): add helper function _make_chat_request for handling enhancing chat requests to endpoints
2025-12-15 10:35:56 +01:00
5eb5490d16 feat: improve model version handling in endpoint selection
Add logic to only append ":latest" suffix to models without existing version suffixes, preventing duplicate version tags and ensuring correct endpoint selection for models following Ollama naming conventions.
2025-12-14 17:58:45 +01:00
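Note on 5eb5490d16: the rule described in this commit could look roughly like the sketch below. The function name `ensure_version_tag` is hypothetical, and the handling of registry hosts with a port is an added assumption, not something the commit message states.

```python
def ensure_version_tag(model: str, default_tag: str = "latest") -> str:
    """Append ':<default_tag>' only when the model name carries no tag yet."""
    # Inspect only the last path segment, so that a registry host with a
    # port (e.g. "localhost:5000/llama3") is not mistaken for a tag.
    last_segment = model.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return model
    return f"{model}:{default_tag}"
```

With this rule, `llama3` becomes `llama3:latest`, while `llama3:8b` is passed through unchanged and never ends up as `llama3:8b:latest`.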
3ccaf78e5d fix: simplify model version handling in proxy functions
Simplify the logic for handling model versions in `openai_chat_completions_proxy` and `openai_completions_proxy` by removing redundant conditions and initializing `local_model` earlier. This makes the code more readable and maintains the same functionality.
2025-12-13 12:34:24 +01:00
34d6abd28b refactor: optimize token aggregation query and enhance chat proxy
- Refactored token aggregation query in db.py to use a single SQL query with SUM() instead of iterating through rows, improving performance
- Combined import statements in db.py and router.py to reduce lines of code
- Enhanced chat proxy in router.py to handle "moe-" prefixed models with multiple query execution and critique generation
- Added last_user_content() helper function to extract user content from messages
- Improved code readability and maintainability through these structural changes
2025-12-13 11:58:49 +01:00
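Note on 34d6abd28b: a sketch of aggregating in SQL with SUM() instead of summing rows in Python. It uses the standard-library `sqlite3` module for brevity; the column names `input_tokens` and `output_tokens` and the function name are assumptions, since the commit does not show the schema.

```python
import sqlite3
from typing import Tuple


def total_token_usage(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Return (input, output) token totals using one aggregate query."""
    # COALESCE turns the NULL that SUM() yields on an empty table into 0.
    row = conn.execute(
        "SELECT COALESCE(SUM(input_tokens), 0), "
        "COALESCE(SUM(output_tokens), 0) "
        "FROM token_counts"
    ).fetchone()
    return int(row[0]), int(row[1])
```

Letting the database compute the sums avoids transferring every row to the application just to add the values up.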
59a8ef3abb refactor: use a persistent WAL-enabled connection with async locks
- Introduce a lazily initialized, shared aiosqlite connection stored in self._db and two asyncio locks (_db_lock, _operation_lock) for safe concurrent access
- Ensure the database directory exists before connecting and enable WAL journaling and foreign keys on first connect
- Add close method to gracefully close the persistent connection
- Guard initialization and write operations with _operation_lock to ensure single-threaded schema setup
- Switch to ON CONFLICT UPSERT for token_counts updates and initialize token_time_series table
- Add typing for _db (Optional[aiosqlite.Connection]) and adjust imports accordingly

addition: Frontend button for the total stats aggregation task, plus a feedback span element, to keep the user informed and the database footprint small
2025-12-02 12:18:23 +01:00
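Note on 59a8ef3abb: a sketch of the pattern this commit describes: a lazily opened shared connection, WAL journaling, two asyncio locks, and an ON CONFLICT upsert. To stay dependency-free it substitutes the standard-library `sqlite3` module for aiosqlite, so the database calls here block the event loop. The class name `TokenStore`, the method names other than `close`, and the `token_counts` columns are assumptions.

```python
import asyncio
import os
import sqlite3
from typing import Optional


class TokenStore:
    def __init__(self, path: str) -> None:
        self._path = path
        self._db: Optional[sqlite3.Connection] = None
        self._db_lock = asyncio.Lock()         # guards lazy connect and close
        self._operation_lock = asyncio.Lock()  # serializes schema setup and writes

    async def _get_db(self) -> sqlite3.Connection:
        async with self._db_lock:
            if self._db is None:
                # Make sure the database directory exists before connecting.
                os.makedirs(os.path.dirname(self._path) or ".", exist_ok=True)
                self._db = sqlite3.connect(self._path)
                self._db.execute("PRAGMA journal_mode=WAL")
                self._db.execute("PRAGMA foreign_keys=ON")
            return self._db

    async def init(self) -> None:
        db = await self._get_db()
        async with self._operation_lock:
            db.execute(
                "CREATE TABLE IF NOT EXISTS token_counts ("
                "endpoint TEXT, model TEXT, total_tokens INTEGER NOT NULL, "
                "PRIMARY KEY (endpoint, model))"
            )
            db.commit()

    async def add_tokens(self, endpoint: str, model: str, tokens: int) -> None:
        db = await self._get_db()
        async with self._operation_lock:
            # Upsert: insert a new row or add to the existing counter.
            db.execute(
                "INSERT INTO token_counts (endpoint, model, total_tokens) "
                "VALUES (?, ?, ?) "
                "ON CONFLICT(endpoint, model) DO UPDATE SET "
                "total_tokens = total_tokens + excluded.total_tokens",
                (endpoint, model, tokens),
            )
            db.commit()

    async def close(self) -> None:
        async with self._db_lock:
            if self._db is not None:
                self._db.close()
                self._db = None
```

The store should be created inside a running event loop (for example within the coroutine passed to `asyncio.run`), so the locks belong to the loop that uses them on older Python versions.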
0ffb321154 fixing total stats model, button, labels and code clean up 2025-11-28 14:59:29 +01:00
1c3f9a9dc4 fix model naming to allow correct decrement usage counter in /v1 endpoints 2025-11-24 09:33:54 +01:00
7b50a5a299 adding usage metrics to /v1 endpoints if stream == True 2025-11-21 09:56:42 +01:00
aa23a4dd81 fixing timezone issues 2025-11-20 12:53:18 +01:00
3f77a8ec62 chart enhancements 2025-11-19 17:28:31 +01:00
79a7ca972b initial chart view 2025-11-19 17:05:25 +01:00
541f2826e0 fixing token_queue, prepping chart view 2025-11-18 19:02:36 +01:00
baf5d98318 adding token timeseries counting in db for future data viz 2025-11-18 11:16:21 +01:00
8a05f2ac44 cache loaded models to decrease load on ollamas 2025-11-17 14:40:24 +01:00
4c7ebb5af4 cancel token_worker_task only if running 2025-11-14 15:53:26 +01:00
b9933e000f rollback - needs more logic in v1/embedding 2025-11-13 13:32:46 +01:00
9f90bc9cd0 fixing /v1/embedding ollama notations 2025-11-13 12:40:40 +01:00
8aef941385 stopping the token_worker_task gracefully on shutdown 2025-11-13 10:13:10 +01:00
f14d9dc7da don't query non-Ollama endpoints for health status 2025-11-13 10:06:23 +01:00
1427e98e6d various performance improvements and replacing json with orjson 2025-11-10 15:37:46 +01:00
c6c1059ede
Merge pull request #12 from nomyo-ai/dev-v0.4.x
token usage counter for non-stream openai ollama endpoints and improvements
2025-11-08 11:54:33 +01:00
YetheSamartaka
9a4bcb6f97 Add Docker support
Adds comprehensive docker support
2025-11-07 13:59:16 +01:00
47a39184ad token usage counter for non-stream openai ollama endpoints added 2025-11-06 14:27:34 +01:00
4c9ec5b1b2 record and display total token usage on ollama endpoints using ollama client 2025-11-04 17:55:19 +01:00
9007f686c2 performance increase of iso8601_ns ~49% 2025-10-30 10:17:18 +01:00
26dcbf9c02 fixing app logic and eventListeners in frontend 2025-10-30 09:06:21 +01:00
3585f90437 fixing typos and smaller issues 2025-10-28 11:08:52 +01:00
b72673d693 check for base64 encoded images and remove alpha channel 2025-10-03 10:04:50 +02:00
11f6e2dca6 data-url handling and removing alpha channel in images 2025-09-24 18:10:17 +02:00
e66c0ed0fc new requirement for image preprocessing to downsize and convert to png for faster and safer transaction 2025-09-24 11:46:38 +02:00
738d981157 poc: message translation with images 2025-09-23 17:33:15 +02:00
8327ab4ae1 rm print statements 2025-09-23 14:47:55 +02:00
fcfabbe926 mitigating div by zero due to google genai sending completion_token=0 in first chunk 2025-09-23 13:08:17 +02:00
a74cc5be0f fixing endpoint usage metrics 2025-09-23 12:51:37 +02:00
19df75afa9 fixing types and params 2025-09-22 19:01:14 +02:00
c43dc4139f adding optional parameters in ollama to openai translation 2025-09-22 14:04:19 +02:00
18d2fca027 formatting Response Objects in rechunk and fixing TypeErrors in /api/chat and /api/generate 2025-09-22 09:30:27 +02:00
aeca77c1a1 formatting, condensing rechunk 2025-09-21 16:33:43 +02:00
43d95fbf38 fixing headers, using ollama.Responses in rechunk class, fixing reserved words var usage, fixing embedding output, fixing model naming in frontend 2025-09-21 16:20:36 +02:00
f0e181d6b8 improving queue logic for high load scenarios 2025-09-19 16:38:48 +02:00
8fe3880af7 randomize endpoint selection for bootstrapping ollamas 2025-09-18 18:49:11 +02:00
deca8e37ad fixing model re-naming in /v1 endpoints and thinking in rechunk 2025-09-17 11:40:48 +02:00
f4678018bf adding thinking to rechunk class 2025-09-16 17:51:51 +02:00
795873b4c9 finalizing compliance tasks 2025-09-15 19:12:00 +02:00
16dba93c0d compliance for ollama embeddings endpoints using openai models 2025-09-15 17:48:17 +02:00
4b5834d7df compliance with ollama naming conventions and openai model['id'] 2025-09-15 17:39:15 +02:00