Add full MiniMax provider support across the entire stack:
Backend:
- Add MINIMAX to LiteLLMProvider enum in db.py
- Add MINIMAX mapping to all provider_map dicts in llm_service.py,
llm_router_service.py, and llm_config.py
- Add Alembic migration (rev 106) for PostgreSQL enum
- Add MiniMax M2.5 example in global_llm_config.example.yaml
Frontend:
- Add MiniMax to LLM_PROVIDERS enum with apiBase
- Add MiniMax-M2.5 and MiniMax-M2.5-highspeed to LLM_MODELS
- Add MINIMAX to Zod validation schema
- Add MiniMax SVG icon and wire it up in provider-icons
Docs:
- Add MiniMax setup guide in chinese-llm-setup.md
MiniMax exposes an OpenAI-compatible API (https://api.minimax.io/v1),
and its models support a context window of up to 204K tokens.
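
As a rough illustration of the backend mapping, the sketch below turns a provider enum value into a LiteLLM-style model string. `PROVIDER_PREFIX` and `build_model_string` are hypothetical names, and routing MiniMax through the `openai` prefix is an assumption based on the API being OpenAI-compatible; the actual `provider_map` dicts may differ.

```python
# Hypothetical sketch; not the project's actual provider_map.
PROVIDER_PREFIX = {
    "OPENAI": "openai",
    # Assumption: MiniMax is OpenAI-compatible, so it reuses the openai prefix.
    "MINIMAX": "openai",
}

# API base taken from the description above.
MINIMAX_API_BASE = "https://api.minimax.io/v1"


def build_model_string(provider: str, model_name: str) -> str:
    """Return a '<prefix>/<model>' string for the given provider."""
    # Unknown providers fall back to their lowercased name as the prefix.
    prefix = PROVIDER_PREFIX.get(provider.upper(), provider.lower())
    return f"{prefix}/{model_name}"
```

With this mapping, `build_model_string("MINIMAX", "MiniMax-M2.5")` yields `openai/MiniMax-M2.5`, which would then be paired with `MINIMAX_API_BASE` as the API base.
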
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Added `_attach_model_profile` function to attach model context metadata to `ChatLiteLLM`.
- Updated `create_chat_litellm_from_config` and `create_chat_litellm_from_agent_config` to utilize the new profile attachment.
- Improved context profile caching in `llm_router_service.py` to include both minimum and maximum input tokens, along with token model names for better context management.
- Introduced new methods for token counting and context trimming based on model profiles.
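
A minimal sketch of context trimming against a model profile, assuming the profile exposes a `max_input_tokens` value. The helpers `approx_token_count` and `trim_to_context` are hypothetical and use a crude 4-characters-per-token estimate rather than a real tokenizer.

```python
from typing import Callable


def approx_token_count(text: str) -> int:
    """Very rough estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def trim_to_context(
    messages: list[str],
    max_input_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> list[str]:
    """Keep the most recent messages whose total token count fits the limit."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so recent context survives trimming.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_input_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```
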
- Introduced a mechanism to identify degenerate queries that lack meaningful search signals, improving search accuracy.
- Implemented a fallback method for browsing recent documents when queries are degenerate, ensuring relevant results are returned.
- Added limits on the number of chunks fetched per document to optimize performance and prevent excessive data loading.
- Updated the ConnectorService to allow for reusable query embeddings, enhancing efficiency in search operations.
- Enhanced LLM router service to support context window fallbacks, improving robustness when a request exceeds a model's context window.
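
One way a degenerate-query check could look, as a sketch only: the stop-word list, the `min_terms` threshold, and the function name `is_degenerate_query` are all assumptions, and the actual heuristic may be different.

```python
import re

# Hypothetical stop-word list; chosen for illustration only.
_STOP_WORDS = {
    "a", "an", "the", "of", "in", "on", "to", "is", "are",
    "what", "my", "me", "show", "all",
}


def is_degenerate_query(query: str, min_terms: int = 1) -> bool:
    """Return True when the query has too few meaningful terms to search on."""
    tokens = re.findall(r"\w+", query.lower())
    # Drop stop words and single-character tokens; what remains is the signal.
    meaningful = [t for t in tokens if t not in _STOP_WORDS and len(t) > 1]
    return len(meaningful) < min_terms
```

A caller would branch on the result: run the normal search when it returns `False`, and fall back to browsing recent documents when it returns `True`.
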
- Increased maximum file upload limit from 10 to 50 to improve user experience.
- Implemented batch processing for document uploads to avoid proxy timeouts, splitting files into manageable chunks.
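
The batching idea can be sketched as below. This is an illustration in Python, not the upload code itself; `split_into_batches` and the batch size are hypothetical.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each batch would be sent as its own request, so no single upload runs long enough to hit a proxy timeout.
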
- Enhanced garbage collection in chat streaming functions to prevent memory leaks and improve performance.
- Added memory delta tracking in system snapshots for better monitoring of resource usage.
- Updated LLM router and service configurations to prevent unbounded internal accumulation and improve efficiency.
- Improved in-memory rate limiting by evicting timestamps outside the current window and cleaning up empty keys.
- Updated LLM router service to cache context profiles and avoid redundant computations.
- Introduced cache eviction logic for MCP tools and sandbox instances to manage memory usage effectively.
- Added garbage collection triggers in chat streaming functions to reclaim resources promptly.
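
A minimal sketch of in-memory rate limiting with timestamp eviction and empty-key cleanup. The class name `SlidingWindowLimiter` and its methods are hypothetical, and the project's limiter may be structured differently.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests per key within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = {}  # key -> deque of request timestamps

    def _evict(self, hits, now: float) -> None:
        # Drop timestamps that have fallen outside the current window.
        cutoff = now - self.window_seconds
        while hits and hits[0] <= cutoff:
            hits.popleft()

    def allow(self, key: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits.setdefault(key, deque())
        self._evict(hits, now)
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

    def cleanup(self, now=None) -> None:
        """Remove keys with no timestamps left, so idle keys do not pile up."""
        now = time.monotonic() if now is None else now
        for key in list(self._hits):
            hits = self._hits[key]
            self._evict(hits, now)
            if not hits:
                del self._hits[key]
```
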
- Improved `search_knowledgebase_tool`.
- Added new endpoint to batch-fetch comments for multiple messages, reducing the number of API calls.
- Introduced CommentBatchRequest and CommentBatchResponse schemas for handling batch requests and responses.
- Updated chat_comments_service to validate message existence and permissions before fetching comments.
- Enhanced frontend with useBatchCommentsPreload hook to optimize comment loading for assistant messages.
- Introduced RequestPerfMiddleware to log request performance metrics, including slow request thresholds.
- Updated various services and retrievers to utilize the new performance logging utility for better tracking of execution times.
- Enhanced existing methods with detailed performance logs for operations such as embedding, searching, and indexing.
- Removed deprecated logging setup in stream_new_chat and replaced it with the new performance logger.
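
A performance logging utility of this kind could be sketched as a context manager. The logger name `perf`, the helper `log_duration`, and the default threshold are assumptions for illustration, not the project's actual utility.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name.
perf_logger = logging.getLogger("perf")


@contextmanager
def log_duration(operation: str, slow_threshold_ms: float = 1000.0):
    """Log how long the wrapped block took; warn when it exceeds the threshold."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Slow operations are logged at WARNING, everything else at DEBUG.
        level = logging.WARNING if elapsed_ms >= slow_threshold_ms else logging.DEBUG
        perf_logger.log(level, "%s took %.1f ms", operation, elapsed_ms)
```

Wrapping a call as `with log_duration("embedding"): ...` records its duration even if the block raises, since the logging happens in the `finally` clause.
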
- Introduced dynamic character budget calculation for document formatting based on model's context window.
- Updated `format_documents_for_context` to respect character limits and improve output quality.
- Added `max_input_tokens` parameter to various functions to facilitate context-aware processing.
- Enhanced error handling for context overflow in LLM router service.
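
The dynamic character budget can be sketched as follows. The reserved-token count, the 4-characters-per-token ratio, the floor, and both function names are assumptions for illustration; the values used by `format_documents_for_context` may differ.

```python
def char_budget_for_documents(
    max_input_tokens: int,
    reserved_tokens: int = 4000,    # assumed headroom for prompt and reply
    chars_per_token: float = 4.0,   # assumed average, not a tokenizer result
    min_chars: int = 2000,          # assumed floor for tiny context windows
) -> int:
    """Derive a character budget for documents from the model's context window."""
    available_tokens = max(max_input_tokens - reserved_tokens, 0)
    return max(int(available_tokens * chars_per_token), min_chars)


def truncate_to_budget(documents: list[str], budget: int) -> list[str]:
    """Keep documents in order, cutting off once the character budget is spent."""
    kept: list[str] = []
    remaining = budget
    for doc in documents:
        if remaining <= 0:
            break
        kept.append(doc[:remaining])
        remaining -= len(kept[-1])
    return kept
```
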
- Replaced direct embedding calls with a utility function across various components to streamline embedding logic.
- Added enable_summary flag to several models and routes to control summary generation behavior.