feat: add Multiple Opinions Ensemble
Prefix any Ollama model name with "moe-" on /api/chat, and the original user request is sent to the selected model 3 times with temp=1 to obtain 3 different response variants.
Each variant is then revisited (critiqued) and scored, and the best-scoring response is returned to the user.
This runs longer and uses more tokens in exchange for an expected higher-quality response.
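The sample/critique/score flow of the Multiple Opinions Ensemble can be sketched roughly as follows. This is a minimal sketch, not the router's actual code: `moe_chat`, `parse_score`, the prompt wording, the `chat` callable standing in for a request to Ollama's /api/chat, and the temperature used for critiquing and scoring are all assumptions.

```python
import asyncio
import re
from typing import Awaitable, Callable

# Hypothetical stand-in for a request to Ollama's /api/chat:
# (model, prompt, temperature) -> response text.
ChatFn = Callable[[str, str, float], Awaitable[str]]

N_VARIANTS = 3            # the commit describes 3 samples per request
SAMPLE_TEMPERATURE = 1.0  # temp=1 for diverse variants, per the description
JUDGE_TEMPERATURE = 0.0   # assumption: critique/scoring temperature is not specified


def parse_score(text: str) -> float:
    """Pull the first number out of a judge reply; unparseable replies score 0."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else 0.0


async def moe_chat(model: str, user_request: str, chat: ChatFn) -> str:
    """Sample several answers, critique and score each, return the best one."""
    base_model = model.removeprefix("moe-")  # "moe-llama3" -> "llama3" (Python 3.9+)

    # 1. Send the same request N times at temperature 1 to get distinct variants.
    variants = await asyncio.gather(
        *(chat(base_model, user_request, SAMPLE_TEMPERATURE) for _ in range(N_VARIANTS))
    )

    # 2. Revisit each variant by asking the model for a critique.
    critiques = await asyncio.gather(
        *(
            chat(
                base_model,
                f"Critique this answer to the request '{user_request}':\n\n{v}",
                JUDGE_TEMPERATURE,
            )
            for v in variants
        )
    )

    # 3. Score each variant in light of its critique.
    raw_scores = await asyncio.gather(
        *(
            chat(
                base_model,
                f"Request: {user_request}\nAnswer: {v}\nCritique: {c}\n"
                "Reply with a score from 0 to 10 only.",
                JUDGE_TEMPERATURE,
            )
            for v, c in zip(variants, critiques)
        )
    )
    scores = [parse_score(s) for s in raw_scores]

    # 4. Return the highest-scoring variant (ties go to the earliest one).
    best = max(range(len(variants)), key=scores.__getitem__)
    return variants[best]
```

Issuing each stage's calls with asyncio.gather lets the variants be generated concurrently if the backend can serve parallel requests; the critique and scoring stages still add roughly 2N extra model calls per user request, which is where the extra token cost comes from.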
Mixture of Experts:
- Prefix any Ollama model name with "moe-" on /api/chat, and the original user request is sent to the selected model 3 times with temp=1 to obtain response variants. Each variant is then revisited and scored, and the best response among them is returned to the user. This runs longer and uses more tokens in exchange for an expected higher-quality response.
Add logic to append the ":latest" suffix only to model names that do not already carry a version tag, preventing duplicate tags (e.g. "llama3:8b:latest") and ensuring correct endpoint selection for models that follow Ollama naming conventions.
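A minimal sketch of the tag check, assuming Ollama's `name:tag` convention. The function name `with_default_tag` is hypothetical, and treating a colon before the last `/` as a registry port rather than a tag is an assumption about how such names might appear.

```python
def with_default_tag(model: str, default_tag: str = "latest") -> str:
    """Append ':<default_tag>' only if the model name carries no tag yet.

    Hypothetical helper. Only the last path segment is inspected, on the
    assumption that a ':' earlier in the name (e.g. 'localhost:5000/llama3')
    is a registry port rather than a version tag.
    """
    last_segment = model.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return model  # already tagged, e.g. 'llama3:8b' or 'llama3:latest'
    return f"{model}:{default_tag}"
```

For example, "llama3" becomes "llama3:latest", while "llama3:8b" is passed through unchanged.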
Simplify the model-version handling in `openai_chat_completions_proxy` and `openai_completions_proxy` by removing redundant conditions and initializing `local_model` earlier. This makes the code more readable without changing behavior.
- Refactored the token aggregation in db.py to use a single SQL query with SUM() instead of iterating over rows, improving performance
- Combined import statements in db.py and router.py to reduce lines of code
- Enhanced the chat proxy in router.py to handle "moe-"-prefixed models by executing multiple queries and generating critiques
- Added a last_user_content() helper function to extract the user's content from the message list
- Improved code readability and maintainability through these structural changes
- Introduce a lazily initialized, shared aiosqlite connection stored in self._db and two asyncio locks (_db_lock, _operation_lock) for safe concurrent access
- Ensure the database directory exists before connecting and enable WAL journaling and foreign keys on first connect
- Add a close() method to gracefully close the persistent connection
- Guard initialization and write operations with _operation_lock so that schema setup runs only once and writes are serialized
- Switch token_counts updates to an INSERT ... ON CONFLICT DO UPDATE upsert, and initialize the token_time_series table
- Add typing for _db (Optional[aiosqlite.Connection]) and adjust imports accordingly
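The single-query aggregation in db.py could look roughly like this sketch. It uses the standard-library sqlite3 module and a hypothetical `token_counts` schema with `input_tokens` and `output_tokens` columns; the real column names in db.py are not shown in these messages.

```python
import sqlite3
from typing import Tuple


def total_tokens(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Return (input, output) token totals with one aggregate query.

    SQLite does the summing, so no rows are fetched into Python. COALESCE
    turns the NULL that SUM() yields on an empty table into 0.
    """
    row = conn.execute(
        "SELECT COALESCE(SUM(input_tokens), 0), COALESCE(SUM(output_tokens), 0) "
        "FROM token_counts"
    ).fetchone()
    return row[0], row[1]
```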
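One plausible shape for the last_user_content() helper, assuming messages use the usual chat format (a list of dicts with `role` and `content` keys) and that the helper returns the content of the most recent user message; the exact return behavior is an assumption.

```python
from typing import Any, Dict, List, Optional


def last_user_content(messages: List[Dict[str, Any]]) -> Optional[str]:
    """Return the content of the most recent user message, or None if there is none."""
    for message in reversed(messages):  # walk backwards from the newest message
        if message.get("role") == "user":
            return message.get("content")
    return None
```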
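The lazily opened, lock-guarded connection might be structured like the sketch below. Assumptions: the class name `TokenDB`, the `init()`, `_get_db()` and `execute_write()` method names and the table columns are hypothetical, and the standard-library sqlite3 module stands in for aiosqlite so the sketch runs without third-party packages. With aiosqlite each database call would be awaited; the plain sqlite3 calls here block the event loop.

```python
import asyncio
import os
import sqlite3
from typing import Optional


class TokenDB:
    """Shared, lazily opened connection guarded by two asyncio locks (sketch)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._db: Optional[sqlite3.Connection] = None  # aiosqlite.Connection in the real code
        self._db_lock = asyncio.Lock()         # guards opening/closing the connection
        self._operation_lock = asyncio.Lock()  # serializes schema setup and writes
        self._initialized = False

    async def _get_db(self) -> sqlite3.Connection:
        async with self._db_lock:
            if self._db is None:
                # Make sure the database directory exists before connecting.
                directory = os.path.dirname(self.path)
                if directory:
                    os.makedirs(directory, exist_ok=True)
                self._db = sqlite3.connect(self.path)
                # Pragmas applied once, on first connect.
                self._db.execute("PRAGMA journal_mode=WAL")
                self._db.execute("PRAGMA foreign_keys=ON")
            return self._db

    async def init(self) -> None:
        async with self._operation_lock:
            if self._initialized:
                return  # schema already set up by an earlier caller
            db = await self._get_db()
            db.execute(
                "CREATE TABLE IF NOT EXISTS token_counts ("
                "model TEXT PRIMARY KEY, "
                "input_tokens INTEGER NOT NULL DEFAULT 0, "
                "output_tokens INTEGER NOT NULL DEFAULT 0)"
            )
            db.commit()
            self._initialized = True

    async def execute_write(self, sql: str, params: tuple = ()) -> None:
        """Run one write statement under the operation lock."""
        async with self._operation_lock:
            db = await self._get_db()
            db.execute(sql, params)
            db.commit()

    async def close(self) -> None:
        async with self._db_lock:
            if self._db is not None:
                self._db.close()
                self._db = None
```

The two locks have separate jobs: `_db_lock` only protects creating and closing the connection object, while `_operation_lock` keeps schema setup and writes from interleaving. Because `init()` takes the locks in a fixed order (operation, then connection), the pair cannot deadlock each other.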
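A sketch of the ON CONFLICT upsert for token_counts, using sqlite3 and hypothetical columns `model`, `input_tokens` and `output_tokens` (the actual schema is not shown in these messages). SQLite supports this upsert syntax from version 3.24 onward.

```python
import sqlite3

# Hypothetical schema: the real token_counts columns are not shown in the commit.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS token_counts (
    model TEXT PRIMARY KEY,
    input_tokens INTEGER NOT NULL DEFAULT 0,
    output_tokens INTEGER NOT NULL DEFAULT 0
)
"""

# Insert a new row, or add to the existing counters if the model already exists.
# Unqualified column names refer to the stored row; `excluded.` refers to the
# values that failed to insert. Requires SQLite 3.24 or newer.
UPSERT_SQL = """
INSERT INTO token_counts (model, input_tokens, output_tokens)
VALUES (?, ?, ?)
ON CONFLICT(model) DO UPDATE SET
    input_tokens = input_tokens + excluded.input_tokens,
    output_tokens = output_tokens + excluded.output_tokens
"""


def add_tokens(conn: sqlite3.Connection, model: str, input_tokens: int, output_tokens: int) -> None:
    """Record token usage for one request in a single statement."""
    conn.execute(UPSERT_SQL, (model, input_tokens, output_tokens))
    conn.commit()
```

Compared with a SELECT followed by a separate INSERT or UPDATE, the upsert is a single atomic statement, so two concurrent requests for the same model cannot both take the "insert" path.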
addition: Frontend button that triggers a total-stats aggregation task, plus a feedback span element to keep the user informed; the aggregation keeps the database footprint small