3ccaf78e5d
fix: simplify model version handling in proxy functions
Simplify the logic for handling model versions in `openai_chat_completions_proxy` and `openai_completions_proxy` by removing redundant conditions and initializing `local_model` earlier. This makes the code more readable while preserving the same behavior.
2025-12-13 12:34:24 +01:00
34d6abd28b
refactor: optimize token aggregation query and enhance chat proxy
- Refactored token aggregation query in db.py to use a single SQL query with SUM() instead of iterating through rows, improving performance
- Combined import statements in db.py and router.py to reduce lines of code
- Enhanced chat proxy in router.py to handle "moe-" prefixed models with multiple query execution and critique generation
- Added last_user_content() helper function to extract user content from messages
- Improved code readability and maintainability through these structural changes
2025-12-13 11:58:49 +01:00
59a8ef3abb
refactor: use a persistent WAL-enabled connection with async locks
- Introduce a lazily initialized, shared aiosqlite connection stored in self._db and two asyncio locks (_db_lock, _operation_lock) for safe concurrent access
- Ensure the database directory exists before connecting and enable WAL journaling and foreign keys on first connect
- Add close method to gracefully close the persistent connection
- Guard initialization and write operations with _operation_lock to ensure single-threaded schema setup
- Switch to ON CONFLICT UPSERT for token_counts updates and initialize token_time_series table
- Add typing for _db (Optional[aiosqlite.Connection]) and adjust imports accordingly
- Addition: frontend button that triggers a total-stats aggregation task, plus a feedback span element to keep the user informed while keeping the database footprint small
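A sketch of the pattern described above, using stdlib `sqlite3` in place of the project's `aiosqlite` so the example stays self-contained. Only `_db`, `_db_lock`, and `_operation_lock` come from the commit message; the class name, schema, and methods are assumptions:

```python
import asyncio
import sqlite3
from pathlib import Path
from typing import Optional


class TokenDB:
    """Lazily opened, shared WAL-mode connection guarded by asyncio locks."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._db: Optional[sqlite3.Connection] = None
        self._db_lock = asyncio.Lock()         # guards lazy connect/close
        self._operation_lock = asyncio.Lock()  # serializes writes and schema setup

    async def _connect(self) -> sqlite3.Connection:
        async with self._db_lock:
            if self._db is None:
                # Ensure the database directory exists before connecting.
                Path(self._path).parent.mkdir(parents=True, exist_ok=True)
                db = sqlite3.connect(self._path)
                db.execute("PRAGMA journal_mode=WAL")
                db.execute("PRAGMA foreign_keys=ON")
                db.execute(
                    "CREATE TABLE IF NOT EXISTS token_counts "
                    "(model TEXT PRIMARY KEY, tokens INTEGER NOT NULL)"
                )
                self._db = db
            return self._db

    async def add_tokens(self, model: str, n: int) -> None:
        db = await self._connect()
        async with self._operation_lock:
            # UPSERT: insert a new row or accumulate into the existing one.
            db.execute(
                "INSERT INTO token_counts (model, tokens) VALUES (?, ?) "
                "ON CONFLICT(model) DO UPDATE SET tokens = tokens + excluded.tokens",
                (model, n),
            )
            db.commit()

    async def close(self) -> None:
        async with self._db_lock:
            if self._db is not None:
                self._db.close()
                self._db = None
```

WAL mode lets readers proceed while a write is in flight, and the single shared connection avoids per-request open/close overhead; the locks compensate for the fact that one connection is not safe under concurrent coroutine writes.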
2025-12-02 12:18:23 +01:00
0ffb321154
fixing total stats model, button, and labels; code cleanup
2025-11-28 14:59:29 +01:00
1c3f9a9dc4
fix model naming to allow correctly decrementing the usage counter in /v1 endpoints
2025-11-24 09:33:54 +01:00
7b50a5a299
adding usage metrics to /v1 endpoints if stream == True
2025-11-21 09:56:42 +01:00
aa23a4dd81
fixing timezone issues
2025-11-20 12:53:18 +01:00
3f77a8ec62
chart enhancements
2025-11-19 17:28:31 +01:00
79a7ca972b
initial chart view
2025-11-19 17:05:25 +01:00
541f2826e0
fixing token_queue, prepping chart view
2025-11-18 19:02:36 +01:00
baf5d98318
adding token timeseries counting in db for future data viz
2025-11-18 11:16:21 +01:00
8a05f2ac44
cache loaded models to decrease load on ollamas
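A minimal TTL cache of the kind this commit suggests, keyed per endpoint so each Ollama instance is polled at most once per interval. Class name, method names, and the default TTL are assumptions:

```python
import time


class ModelCache:
    """Cache the loaded-models response per endpoint for a short TTL."""

    def __init__(self, ttl: float = 5.0) -> None:
        self._ttl = ttl
        self._entries: dict[str, tuple[float, object]] = {}

    def get(self, endpoint: str):
        # Return the cached value if it is still fresh, else None.
        entry = self._entries.get(endpoint)
        if entry and time.monotonic() - entry[0] < self._ttl:
            return entry[1]
        return None

    def put(self, endpoint: str, models) -> None:
        self._entries[endpoint] = (time.monotonic(), models)
```

Using `time.monotonic()` rather than `time.time()` keeps expiry correct across wall-clock adjustments.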
2025-11-17 14:40:24 +01:00
4c7ebb5af4
cancel token_worker_task only if running
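A sketch of the guarded cancellation this fix implies, assuming the worker is a plain `asyncio.Task` (the helper name and shutdown flow are assumptions):

```python
import asyncio


async def stop_worker(task: "asyncio.Task | None") -> None:
    # Cancel only if the task exists and is still running;
    # cancelling a finished task is a no-op but awaiting it twice is not.
    if task is None or task.done():
        return
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the worker was cancelled on purpose
```

Swallowing `CancelledError` here is deliberate: shutdown should not propagate the cancellation as an error.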
2025-11-14 15:53:26 +01:00
b9933e000f
rollback - needs more logic in v1/embedding
2025-11-13 13:32:46 +01:00
9f90bc9cd0
fixing /v1/embedding ollama notations
2025-11-13 12:40:40 +01:00
8aef941385
stopping the token_worker_task gracefully on shutdown
2025-11-13 10:13:10 +01:00
f14d9dc7da
don't query non-Ollama endpoints for health status
2025-11-13 10:06:23 +01:00
1427e98e6d
various performance improvements; replace json with orjson
2025-11-10 15:37:46 +01:00
c6c1059ede
Merge pull request #12 from nomyo-ai/dev-v0.4.x
token usage counter for non-stream openai ollama endpoints and improvements
2025-11-08 11:54:33 +01:00
YetheSamartaka
9a4bcb6f97
Add Docker support
Adds comprehensive Docker support
2025-11-07 13:59:16 +01:00
47a39184ad
token usage counter for non-stream openai ollama endpoints added
2025-11-06 14:27:34 +01:00
4c9ec5b1b2
record and display total token usage on ollama endpoints using ollama client
2025-11-04 17:55:19 +01:00
9007f686c2
performance increase of iso8601_ns ~49%
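The actual optimization is not shown in the log; one common way to speed up nanosecond ISO 8601 formatting is to skip `datetime.strftime` in favor of manual f-string formatting, sketched here (the signature and UTC assumption are mine):

```python
import time


def iso8601_ns(ns: int) -> str:
    """Format a UNIX timestamp in nanoseconds as ISO 8601 (UTC).

    Manual f-string formatting is typically much faster than
    datetime + strftime on hot paths such as per-chunk timestamps.
    """
    sec, rem = divmod(ns, 1_000_000_000)
    t = time.gmtime(sec)
    return (
        f"{t.tm_year:04d}-{t.tm_mon:02d}-{t.tm_mday:02d}"
        f"T{t.tm_hour:02d}:{t.tm_min:02d}:{t.tm_sec:02d}.{rem:09d}Z"
    )
```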
2025-10-30 10:17:18 +01:00
26dcbf9c02
fixing app logic and eventListeners in frontend
2025-10-30 09:06:21 +01:00
3585f90437
fixing typos and smaller issues
2025-10-28 11:08:52 +01:00
b72673d693
check for base64 encoded images and remove alpha channel
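A stdlib-only sketch of the base64/data-URL detection half of this commit; the alpha-channel removal would typically go through Pillow (e.g. `Image.convert("RGB")`) and is omitted here to keep the example dependency-free. The function name and regex are assumptions:

```python
import base64
import re

# Matches a data URL such as "data:image/png;base64,iVBOR..."
_DATA_URL = re.compile(r"^data:image/(\w+);base64,(.+)$", re.DOTALL)


def decode_image_field(value: str) -> bytes:
    """Accept either a bare base64 string or a data: URL; return raw bytes."""
    m = _DATA_URL.match(value)
    payload = m.group(2) if m else value
    # validate=True rejects stray characters instead of silently skipping them
    return base64.b64decode(payload, validate=True)
```

Accepting both forms matters because clients are inconsistent: some send the raw base64 payload, others the full data URL.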
2025-10-03 10:04:50 +02:00
11f6e2dca6
data-url handling and removing alpha channel in images
2025-09-24 18:10:17 +02:00
e66c0ed0fc
new requirement for image preprocessing to downsize and convert to PNG for faster and safer transfers
2025-09-24 11:46:38 +02:00
738d981157
poc: message translation with images
2025-09-23 17:33:15 +02:00
8327ab4ae1
rm print statements
2025-09-23 14:47:55 +02:00
fcfabbe926
mitigating div by zero due to google genai sending completion_token=0 in first chunk
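A minimal guard of the kind this fix implies: when the upstream reports zero completion tokens (as Google GenAI does in its first streamed chunk), return 0 instead of dividing. The function name and units are assumptions:

```python
def tokens_per_second(completion_tokens: int, eval_duration_ns: int) -> float:
    """Throughput metric that tolerates zeroed fields in early stream chunks."""
    if completion_tokens <= 0 or eval_duration_ns <= 0:
        return 0.0  # avoid ZeroDivisionError on partial/placeholder chunks
    return completion_tokens / (eval_duration_ns / 1_000_000_000)
```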
2025-09-23 13:08:17 +02:00
a74cc5be0f
fixing endpoint usage metrics
2025-09-23 12:51:37 +02:00
19df75afa9
fixing types and params
2025-09-22 19:01:14 +02:00
c43dc4139f
adding optional parameters in ollama to openai translation
2025-09-22 14:04:19 +02:00
18d2fca027
formatting Response Objects in rechunk and fixing TypeErrors in /api/chat and /api/generate
2025-09-22 09:30:27 +02:00
aeca77c1a1
formatting, condensing rechunk
2025-09-21 16:33:43 +02:00
43d95fbf38
fixing headers, using ollama.Responses in rechunk class, fixing reserved-words var usage, fixing embedding output, fixing model naming in frontend
2025-09-21 16:20:36 +02:00
f0e181d6b8
improving queue logic for high load scenarios
2025-09-19 16:38:48 +02:00
8fe3880af7
randomize endpoint selection for bootstrapping ollamas
2025-09-18 18:49:11 +02:00
deca8e37ad
fixing model re-naming in /v1 endpoints and thinking in rechunk
2025-09-17 11:40:48 +02:00
f4678018bf
adding thinking to rechunk class
2025-09-16 17:51:51 +02:00
795873b4c9
finalizing compliance tasks
2025-09-15 19:12:00 +02:00
16dba93c0d
compliance for ollama embeddings endpoints using openai models
2025-09-15 17:48:17 +02:00
4b5834d7df
compliance with ollama naming conventions and openai model['id']
2025-09-15 17:39:15 +02:00
da8b165f4a
fixing openai model relabeling for ollama client libs
2025-09-15 17:00:53 +02:00
ed84be2760
relabeling openai models with ollama-compatible tags
2025-09-15 11:57:00 +02:00
6c9ffad834
adding ollama embeddings conversion calls to openai endpoint
2025-09-15 11:47:55 +02:00
bd21906687
fixing /v1/embeddings
2025-09-15 09:04:38 +02:00
49b1ea16d0
hotfix ep2base
2025-09-13 18:11:05 +02:00
9ea852f154
adding fetch class and ollama client completions on openai endpoints
2025-09-13 16:57:09 +02:00