- Modified the Google Drive indexer to use SQLAlchemy's cast function for querying document metadata, ensuring proper type handling for file IDs.
- Improved the consistency of metadata queries across the indexing functions, enhancing reliability in document retrieval and processing.
- Updated error handling in the indexing functions for BookStack, Confluence, Google Calendar, Jira, Linear, and Luma connectors to log specific error messages when failures occur.
- Enhanced logging for cases where no pages or events are found, providing clearer informational messages instead of treating them as critical errors.
- Ensured consistent error reporting across all connector indexers, improving debugging and user feedback during indexing operations.
- Enhanced Google Calendar and Composio connector indexing to track and log duplicate content, preventing re-indexing of already processed events.
- Implemented robust error handling during final commits to manage integrity errors gracefully, ensuring successful indexing despite potential duplicates.
- Updated notification service to differentiate between actual errors and warnings for duplicate content, improving user feedback.
- Refactored date handling to ensure valid date ranges and adjusted end dates when necessary for better indexing accuracy.
- Added normalization for "undefined" strings to None in date parameters to prevent parsing errors.
- Improved date range validation to ensure start_date is strictly before end_date, adjusting end_date if necessary.
- Updated Google Calendar and Composio connector indexing logic to handle duplicate content more effectively, logging warnings for skipped events.
- Enhanced error handling during final commits to manage integrity errors gracefully.
- Refactored date handling in various connector indexers for consistency and reliability.
- Added user-friendly re-authentication messages for expired or revoked tokens in both Google Calendar and Gmail connectors.
- Updated error handling in indexing tasks to log specific authentication errors and provide clearer feedback to users.
- Enhanced the connector UI to handle indexing failures more effectively, improving overall user experience.
- Added methods to retrieve the starting page token and list changes in Google Drive, enabling delta sync capabilities.
- Updated Composio service to handle file download directory configuration.
- Modified indexing tasks to support delta sync, improving efficiency by processing only changed files.
- Adjusted date handling in connector tasks to allow optional start and end dates.
- Improved error handling and logging throughout the Composio indexing process.
- Enhanced the handling of file content from Composio, supporting both binary and text files with appropriate processing methods.
- Introduced robust error logging and handling for file content extraction, ensuring better visibility into issues during processing.
- Updated the indexing logic to accommodate new content processing methods, improving overall reliability and user feedback on errors.
- Added temporary file handling for binary files to facilitate text extraction using the ETL service.
- Added a new endpoint to list folders and files in a user's Composio Google Drive, supporting hierarchical structure.
- Implemented UI components for selecting specific folders and files to index, improving user control over indexing options.
- Introduced indexing options for maximum files per folder and inclusion of subfolders, allowing for customizable indexing behavior.
- Enhanced error handling and logging for Composio Drive operations, ensuring better visibility into issues during file retrieval and indexing.
- Updated the Composio configuration component to reflect new selection capabilities and indexing options.
- Updated the list_gmail_messages method to support pagination with page tokens, allowing for more efficient message retrieval.
- Modified the return structure to include next_page_token and result_size_estimate for better client-side handling.
- Improved error handling and logging throughout the Gmail indexing process, ensuring better visibility into failures.
- Implemented batch processing for Gmail messages, committing changes incrementally to prevent data loss.
- Ensured consistent timestamp updates for connectors, even when no documents are indexed, to maintain accurate UI states.
- Refactored the indexing logic to streamline message processing and enhance overall performance.
- Introduced new enum values for Composio connectors: COMPOSIO_GOOGLE_DRIVE_CONNECTOR, COMPOSIO_GMAIL_CONNECTOR, and COMPOSIO_GOOGLE_CALENDAR_CONNECTOR.
- Updated database migration to add these new enum values to the relevant types.
- Refactored Composio integration logic to handle specific connector types, improving the management of connected accounts and indexing processes.
- Enhanced frontend components to support the new Composio connector types, including updated UI elements and connector configuration handling.
- Improved backend services to manage Composio connected accounts more effectively, including deletion and indexing tasks.
- Refactored GitHubConnector to utilize gitingest CLI via subprocess, improving performance and avoiding async issues with Celery.
- Updated ingestion method to handle repository digests more efficiently, including error handling for subprocess execution.
- Adjusted GitHub indexer to call the new synchronous ingestion method.
- Clarified documentation regarding the optional nature of the Personal Access Token for public repositories.
- Added gitingest as a dependency to streamline the ingestion of GitHub repositories.
- Refactored GitHubConnector to utilize gitingest for efficient repository digest generation, reducing API calls.
- Updated GitHub indexer to process entire repository digests, enhancing performance and simplifying the indexing process.
- Modified GitHub connect form to indicate that the Personal Access Token is optional for public repositories.
- Updated Google Drive API calls to include md5Checksum in file metadata retrieval for improved content tracking.
- Added logic to check for rename-only updates based on md5Checksum, optimizing document processing by preventing unnecessary ETL operations for unchanged content.
- Enhanced existing document update logic to handle renaming and metadata updates more effectively, particularly for Google Drive files.
- Added support for content hash fallback in document migration to prevent duplicate entries from different sources.
- Updated functions to check for existing documents with enhanced logging for better traceability of duplicate content detection.
- Implemented support for both new file_id-based and legacy filename-based hash schemes in document processing.
- Added functions to generate unique identifier hashes and find existing documents with migration support.
- Improved existing document update logic to handle content changes and metadata updates, particularly for Google Drive files.
- Enhanced UI components to display appropriate file icons based on file types in the Google Drive connector.
- Updated document processing functions to accommodate the new connector structure and ensure seamless integration.
- Added logic to refresh connector and notification attributes after indexing to ensure up-to-date information.
- Enhanced periodic sync configuration to disable the option when no folders or files are selected for Google Drive, providing user feedback through a message.
- Updated the connector edit view to reflect the new disabled state for periodic sync based on selected items.
- Implemented validation in the connector dialog to prevent enabling periodic sync without selected items, improving user experience.
- Updated the Google Drive indexing functionality to include indexing options such as max files per folder, incremental sync, and inclusion of subfolders.
- Modified the API to accept a new `indexing_options` parameter in the request body.
- Enhanced the UI to allow users to configure these options when selecting folders and files for indexing.
- Updated related components and tasks to support the new indexing options, ensuring a more flexible and efficient indexing process.
- Simplified chat document processing display by removing the book emoji for a cleaner look.
- Enhanced the greeting function to prioritize user display names over email for a more personalized experience.
- Adjusted the ChatShareButton component by removing unused imports and unnecessary elements for better clarity and performance.
- Updated the title in the Electric SQL documentation for conciseness.
- Enhanced error handling in the indexing process to differentiate between actual failures and cases where no new documents are processed.
- Updated notification messages to reflect the status accurately, including a message for when no new items are synced.
- Standardized return values across various indexer tasks to return `None` on success, simplifying logging and error management.
- Added session refresh for notifications to prevent stale data after rollbacks in multiple document processing tasks.
- Wrapped notification update logic in try-except blocks to handle potential failures gracefully and log errors without crashing the process.
- Improved error handling for notification updates in various document processing functions, enhancing overall robustness.
- Introduced a new DocumentProcessingNotificationHandler to manage notifications for document processing stages.
- Updated existing notification methods to include detailed progress updates for various stages (queued, parsing, chunking, embedding, storing, completed, failed).
- Refactored NotificationService to support the new document processing notification type and metadata schema.
- Updated multiple document processing tasks to create and manage notifications throughout the processing lifecycle.
- Adjusted UI components to reflect changes in notification types and improve user experience during document uploads and processing.
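
The date-handling entries above (normalizing "undefined" strings to None and keeping start_date strictly before end_date) could be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the function names, the 365-day default lookback, and the choice to push end_date one day past start_date are all hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def normalize_date_param(value: Optional[str]) -> Optional[str]:
    """Map missing, empty, or literal "undefined" values to None."""
    if value is None:
        return None
    cleaned = value.strip()
    if cleaned == "" or cleaned.lower() == "undefined":
        return None
    return cleaned


def resolve_date_range(
    start_date: Optional[str],
    end_date: Optional[str],
    today: Optional[date] = None,
    default_lookback_days: int = 365,  # assumed default, not from the changelog
) -> Tuple[date, date]:
    """Return a (start, end) pair where start is strictly before end."""
    today = today or date.today()
    start_raw = normalize_date_param(start_date)
    end_raw = normalize_date_param(end_date)

    # Fall back to a lookback window when a bound is missing.
    start = (
        date.fromisoformat(start_raw)
        if start_raw
        else today - timedelta(days=default_lookback_days)
    )
    end = date.fromisoformat(end_raw) if end_raw else today

    # Assumed adjustment policy: move end one day past start when the
    # range is empty or inverted.
    if start >= end:
        end = start + timedelta(days=1)
    return start, end
```

With these assumptions, a request that sends the same day for both bounds yields a one-day range instead of an empty one, and a frontend that serializes a missing value as "undefined" no longer triggers a parsing error.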
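
The md5Checksum entry above describes skipping ETL when a Google Drive file was only renamed. A minimal sketch of that decision, assuming a stored record with a name and checksum and an incoming metadata dict that uses the Drive field names `name` and `md5Checksum`. The `StoredFile` type and `classify_update` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredFile:
    """Hypothetical shape of what the indexer keeps per Drive file."""
    file_id: str
    name: str
    md5_checksum: Optional[str]


def classify_update(stored: StoredFile, incoming: dict) -> str:
    """Classify an update as 'unchanged', 'rename_only', or 'content_changed'."""
    incoming_md5 = incoming.get("md5Checksum")

    # Without a checksum on both sides we cannot prove the content is
    # unchanged, so fall back to full reprocessing.
    if not stored.md5_checksum or not incoming_md5:
        return "content_changed"
    if stored.md5_checksum != incoming_md5:
        return "content_changed"
    if stored.name != incoming.get("name"):
        return "rename_only"
    return "unchanged"
```

Only the `content_changed` result would need to go through extraction again. A `rename_only` result can be handled by updating the title and metadata in place.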
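
The Gmail pagination entries above mention a return structure with next_page_token and result_size_estimate. A caller might consume such pages as sketched below. The `fetch_page` callable stands in for list_gmail_messages, whose real signature is not shown in this changelog, and the `messages` key and `max_pages` guard are assumptions.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_messages(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 100,  # assumed safety cap against endless token loops
) -> Iterator[dict]:
    """Yield messages page by page until no next_page_token is returned."""
    token: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(token)
        yield from page.get("messages", [])
        token = page.get("next_page_token")
        if not token:
            break
```

Because the generator yields as each page arrives, a consumer can commit in batches while iterating, which fits the incremental commits described for the Gmail indexer.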
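
The entries above on file_id-based and legacy filename-based hash schemes suggest a lookup that tries the new hash first and falls back to the legacy one. The sketch below uses an in-memory dict in place of the database. The hash input format, the function names, and the migrate-on-read behavior are assumptions for illustration only.

```python
import hashlib
from typing import Dict, Optional


def unique_identifier_hash(connector_type: str, identifier: str, search_space_id: int) -> str:
    """Hash an identifier scoped to a connector type and search space."""
    raw = f"{connector_type}:{identifier}:{search_space_id}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def find_existing_document(
    index: Dict[str, dict],
    connector_type: str,
    file_id: str,
    filename: str,
    search_space_id: int,
) -> Optional[dict]:
    """Look up by the file_id hash, then by the legacy filename hash."""
    new_hash = unique_identifier_hash(connector_type, file_id, search_space_id)
    doc = index.get(new_hash)
    if doc is not None:
        return doc

    legacy_hash = unique_identifier_hash(connector_type, filename, search_space_id)
    doc = index.pop(legacy_hash, None)
    if doc is not None:
        # Assumed behavior: re-key the legacy record under the new scheme
        # so later lookups hit the file_id hash directly.
        doc["unique_identifier_hash"] = new_hash
        index[new_hash] = doc
    return doc
```

Keying on file_id rather than filename means a renamed file still resolves to the same document, which is what allows rename-only updates to be detected at all.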