- Enhanced Azure Document Intelligence parser to raise an error for empty or whitespace-only content.
- Updated LLMRouterService to log premium model strings more clearly.
- Added automatic encoding detection for file reading in document processors.
- Improved error handling for empty markdown content extraction in file processors.
- Refactored DocumentUploadTab component for better accessibility and user interaction.
- Introduced a `ProcessingMode` enum to differentiate between basic and premium processing modes.
- Updated `EtlRequest` to include a `processing_mode` field, defaulting to basic.
- Enhanced ETL pipeline services to utilize the selected processing mode for Azure Document Intelligence and LlamaCloud parsing.
- Modified various routes and services to handle processing mode, affecting document upload and indexing tasks.
- Improved error handling and logging to include processing mode details.
- Added tests to validate processing mode functionality and its impact on ETL operations.
- Updated maximum file size limit to 500 MB per file.
- Removed restrictions on the number of files per upload and total upload size.
- Enhanced handling of user-mentioning documents in the knowledge base search middleware.
- Improved document reading and processing logic to accommodate new features and optimizations.
- Introduced a mapping of native Google document types to their legacy Composio equivalents, ensuring seamless search and indexing for both types.
- Updated relevant components to utilize the new mapping, enhancing the consistency of document type handling across the application.
- Improved search functionality to transparently include legacy types, maintaining accessibility for older documents until re-indexed.
- Replaced direct embedding calls with a utility function across various components to streamline embedding logic.
- Added enable_summary flag to several models and routes to control summary generation behavior.
- Added support for content hash fallback in document migration to prevent duplicate entries from different sources.
- Improved existing document update logic to handle renaming and metadata updates more effectively, particularly for Google Drive files.
- Updated functions to check for existing documents with enhanced logging for better traceability of duplicate content detection.
- Implemented support for both new file_id-based and legacy filename-based hash schemes in document processing.
- Added functions to generate unique identifier hashes and find existing documents with migration support.
- Improved existing document update logic to handle content changes and metadata updates, particularly for Google Drive files.
- Enhanced UI components to display appropriate file icons based on file types in the Google Drive connector.
- Updated document processing functions to accommodate the new connector structure and ensure seamless integration.
- Introduced a new DocumentProcessingNotificationHandler to manage notifications for document processing stages.
- Updated existing notification methods to include detailed progress updates for various stages (queued, parsing, chunking, embedding, storing, completed, failed).
- Refactored NotificationService to support the new document processing notification type and metadata schema.
- Updated multiple document processing tasks to create and manage notifications throughout the processing lifecycle.
- Adjusted UI components to reflect changes in notification types and improve user experience during document uploads and processing.