The Gmail indexer used a hardcoded 30-day default instead of respecting
last_indexed_at as the other connectors do. It now uses calculate_date_range()
for consistent behavior (last_indexed_at → now, or the past 365 days on the first run).
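Here is a minimal sketch of that date-window rule. It assumes the caller passes last_indexed_at as a timezone-aware datetime, or None before the first sync. The name calculate_date_range_sketch and its parameters are hypothetical. The actual calculate_date_range() signature is not shown in this description.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def calculate_date_range_sketch(
    last_indexed_at: Optional[datetime],
    now: Optional[datetime] = None,
    default_days: int = 365,
) -> Tuple[datetime, datetime]:
    """Return (start, end) for an incremental indexing run (hypothetical helper)."""
    end = now if now is not None else datetime.now(timezone.utc)
    if last_indexed_at is None:
        # First run: look back a fixed window instead of a hardcoded 30 days.
        start = end - timedelta(days=default_days)
    else:
        # Later runs: resume from the previous successful sync.
        start = last_indexed_at
    return start, end
```

Passing `now` explicitly keeps the function deterministic in tests.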
Prevent UniqueViolationError on the ix_documents_content_hash constraint by
calling check_duplicate_document_by_hash() before inserting new documents
in the 15 connector indexers that were missing this check.
Affected: clickup, luma, linear, jira, google_gmail, confluence,
bookstack, github, webcrawler, teams, slack, notion, discord,
airtable, obsidian indexers.
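The check-before-insert pattern can be sketched with Python's built-in sqlite3 module, which stands in here for the project's database session. Several parts of this sketch are assumptions made for illustration and do not come from the connectors' actual code: the SHA-256 content hash, the table layout, and the helper names content_hash, check_duplicate_by_hash, and insert_if_new.

```python
import hashlib
import sqlite3


def content_hash(content: str) -> str:
    # Assumption: SHA-256 of the document text; the real hashing scheme may differ.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def check_duplicate_by_hash(conn: sqlite3.Connection, digest: str) -> bool:
    # Hypothetical analogue of check_duplicate_document_by_hash().
    row = conn.execute(
        "SELECT 1 FROM documents WHERE content_hash = ? LIMIT 1", (digest,)
    ).fetchone()
    return row is not None


def insert_if_new(conn: sqlite3.Connection, title: str, content: str) -> bool:
    """Insert a document unless one with the same content hash exists.

    Returns True if a row was inserted, False if it was skipped as a duplicate.
    """
    digest = content_hash(content)
    if check_duplicate_by_hash(conn, digest):
        return False  # inserting would violate the unique index
    conn.execute(
        "INSERT INTO documents (title, content, content_hash) VALUES (?, ?, ?)",
        (title, content, digest),
    )
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents "
    "(id INTEGER PRIMARY KEY, title TEXT, content TEXT, content_hash TEXT)"
)
conn.execute("CREATE UNIQUE INDEX ix_documents_content_hash ON documents (content_hash)")
```

A check-then-insert is not atomic. Two workers can both pass the check before either one commits, so the unique index remains the final guard against duplicates.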
- Modified the Google Drive indexer to use SQLAlchemy's cast function for querying document metadata, ensuring proper type handling for file IDs.
- Improved the consistency of metadata queries across the indexing functions, enhancing reliability in document retrieval and processing.
- Updated error handling in the indexing functions for BookStack, Confluence, Google Calendar, Jira, Linear, and Luma connectors to log specific error messages when failures occur.
- Changed logging for cases where no pages or events are found to emit clear informational messages instead of treating them as critical errors.
- Ensured consistent error reporting across all connector indexers, improving debugging and user feedback during indexing operations.
- Enhanced Google Calendar and Composio connector indexing to track and log duplicate content, preventing re-indexing of already processed events.
- Implemented robust error handling during final commits to manage integrity errors gracefully, ensuring successful indexing despite potential duplicates.
- Updated notification service to differentiate between actual errors and warnings for duplicate content, improving user feedback.
- Refactored date handling to guarantee valid date ranges, adjusting end dates when necessary for more accurate indexing.
- Added normalization for "undefined" strings to None in date parameters to prevent parsing errors.
- Improved date range validation to ensure start_date is strictly before end_date, adjusting end_date if necessary.
- Updated Google Calendar and Composio connector indexing logic to handle duplicate content more effectively, logging warnings for skipped events.
- Enhanced error handling during final commits to manage integrity errors gracefully.
- Refactored date handling in various connector indexers for consistency and reliability.
- Added user-friendly re-authentication messages for expired or revoked tokens in both Google Calendar and Gmail connectors.
- Updated error handling in indexing tasks to log specific authentication errors and provide clearer feedback to users.
- Enhanced the connector UI to handle indexing failures more effectively, improving overall user experience.
- Refactored GitHubConnector to utilize gitingest CLI via subprocess, improving performance and avoiding async issues with Celery.
- Updated ingestion method to handle repository digests more efficiently, including error handling for subprocess execution.
- Adjusted GitHub indexer to call the new synchronous ingestion method.
- Clarified in the documentation that the Personal Access Token is optional for public repositories.
- Added gitingest as a dependency to streamline the ingestion of GitHub repositories.
- Refactored GitHubConnector to utilize gitingest for efficient repository digest generation, reducing API calls.
- Updated GitHub indexer to process entire repository digests, enhancing performance and simplifying the indexing process.
- Modified GitHub connect form to indicate that the Personal Access Token is optional for public repositories.
- Updated Google Drive API calls to include md5Checksum in file metadata retrieval for improved content tracking.
- Added logic to detect rename-only updates by comparing md5Checksum values, skipping unnecessary ETL operations when the file content is unchanged.
- Enhanced existing document update logic to handle renaming and metadata updates more effectively, particularly for Google Drive files.
- Implemented support for both new file_id-based and legacy filename-based hash schemes in document processing.
- Added functions to generate unique identifier hashes and to find existing documents, with support for migrating documents stored under the legacy hash.
- Improved existing document update logic to handle content changes and metadata updates, particularly for Google Drive files.
- Enhanced UI components to display appropriate file icons based on file types in the Google Drive connector.
- Updated document processing functions to accommodate the new connector structure and ensure seamless integration.
- Added logic to refresh connector and notification attributes after indexing to ensure up-to-date information.
- Disabled the periodic sync option for Google Drive when no folders or files are selected, showing a message that explains why.
- Updated the connector edit view to reflect the new disabled state for periodic sync based on selected items.
- Implemented validation in the connector dialog to prevent enabling periodic sync without selected items, improving user experience.
- Updated the Google Drive indexing functionality to include indexing options such as max files per folder, incremental sync, and inclusion of subfolders.
- Modified the API to accept a new 'indexing_options' parameter in the request body.
- Enhanced the UI to allow users to configure these options when selecting folders and files for indexing.
- Updated related components and tasks to support the new indexing options, ensuring a more flexible and efficient indexing process.
- Enhanced error handling in the indexing process to differentiate between actual failures and cases where no new documents are processed.
- Updated notification messages to reflect the status accurately, including a message for when no new items are synced.
- Standardized return values across various indexer tasks to return `None` on success, simplifying logging and error management.
- Updated date handling in indexing functions to permit future dates for Google Calendar and Luma connectors.
- Enhanced UI components to support future date selection, including a new button for selecting the next 30 days.
- Adjusted documentation and descriptions to clarify date range options for users.
- Added ClickUp OAuth authentication flow with new environment variables for client ID, client secret, and redirect URI.
- Introduced ClickUpHistoryConnector to manage OAuth-based authentication and token refresh for ClickUp API access.
- Created ClickUp connector routes for OAuth flow, including authorization and callback handling.
- Updated indexing logic to utilize the new ClickUpHistoryConnector, supporting both OAuth and legacy API token methods.
- Enhanced frontend components to reflect the new ClickUp integration and removed legacy API token forms.
- Introduced AirtableHistoryConnector to manage OAuth-based authentication and token refresh for Airtable API access.
- Added date string validation in AirtableConnector to ensure valid date inputs before processing.
- Updated indexing logic to utilize the new AirtableHistoryConnector, improving credential management and token handling.
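The change that handles integrity errors during final commits aims to keep a batch from failing outright. One way to do that is sketched below, again with sqlite3 standing in for the real SQLAlchemy session. The table layout and the name commit_tolerating_duplicates are hypothetical. The per-row fallback is an assumed strategy, not necessarily the one the indexers use.

```python
import sqlite3
from typing import Iterable, List, Tuple

INSERT_SQL = "INSERT INTO documents (title, content_hash) VALUES (?, ?)"


def commit_tolerating_duplicates(
    conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]
) -> Tuple[int, int]:
    """Write rows in one transaction; on a unique-constraint clash, fall back
    to per-row transactions that skip duplicates.

    Returns (inserted, skipped_duplicates).
    """
    batch: List[Tuple[str, str]] = list(rows)
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.executemany(INSERT_SQL, batch)
        return len(batch), 0
    except sqlite3.IntegrityError:
        pass  # the whole batch was rolled back; retry row by row below

    inserted = skipped = 0
    for row in batch:
        try:
            with conn:
                conn.execute(INSERT_SQL, row)
            inserted += 1
        except sqlite3.IntegrityError:
            skipped += 1  # duplicate content: report as a warning, not a failure
    return inserted, skipped


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, content_hash TEXT UNIQUE)"
)
```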
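The date-parameter fixes normalize "undefined" to None and force start_date strictly before end_date. They might look like the following sketch. The function names are hypothetical, and the sketch assumes ISO YYYY-MM-DD input. Moving end_date forward by one day is an assumed adjustment policy, and the actual indexers may adjust differently. Letting date.fromisoformat raise ValueError is also one simple form of the date-string validation mentioned for AirtableConnector.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def normalize_date_param(value: Optional[str]) -> Optional[str]:
    """Map missing or placeholder values (None, "", "undefined") to None."""
    if value is None:
        return None
    stripped = value.strip()
    if stripped in ("", "undefined"):
        return None
    return stripped


def parse_date_param(value: Optional[str]) -> Optional[date]:
    """Parse an ISO date after normalization; malformed strings raise ValueError."""
    normalized = normalize_date_param(value)
    return None if normalized is None else date.fromisoformat(normalized)


def ensure_start_before_end(start: date, end: date) -> Tuple[date, date]:
    """Guarantee start < end by moving end forward one day when needed (assumed policy)."""
    if start >= end:
        end = start + timedelta(days=1)
    return start, end
```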
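The gitingest refactor calls a CLI synchronously through subprocess. That reduces to a small helper with explicit error handling. This description does not give the exact gitingest command-line arguments. So the hypothetical run_cli_capture helper takes the full command as a list, and the sketch exercises it with a stand-in Python command.

```python
import subprocess
import sys
from typing import Sequence


def run_cli_capture(cmd: Sequence[str], timeout: float = 300.0) -> str:
    """Run a command synchronously and return its stdout as text.

    Raises RuntimeError if the executable is missing, times out, or exits non-zero.
    """
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout, check=False
        )
    except FileNotFoundError as exc:
        raise RuntimeError(f"executable not found: {cmd[0]}") from exc
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"command timed out after {timeout} s") from exc
    if result.returncode != 0:
        raise RuntimeError(
            f"command exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command; replace with the real gitingest invocation.
    print(run_cli_capture([sys.executable, "-c", "print('digest ok')"]))
```

The call blocks, so it can run directly inside a synchronous Celery task without an event loop. This matches the motivation given for the refactor.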
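The Google Drive rename-only check can be sketched as a comparison of stored and fetched file metadata. It uses the Drive API's `name` and `md5Checksum` file fields. StoredDoc, classify_drive_update, and the returned labels are hypothetical. Drive only reports md5Checksum for files with binary content, not for native Google Docs, Sheets, or Slides. For that reason the sketch falls back to full reprocessing when either checksum is missing.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class StoredDoc:
    file_name: str
    md5_checksum: Optional[str]


def classify_drive_update(stored: Optional[StoredDoc], file_meta: Mapping[str, Any]) -> str:
    """Return 'new', 'unchanged', 'rename_only', or 'content_changed'."""
    if stored is None:
        return "new"
    new_md5 = file_meta.get("md5Checksum")
    if new_md5 is None or stored.md5_checksum is None:
        # No checksum to compare (e.g. native Google Docs): reprocess to be safe.
        return "content_changed"
    if new_md5 != stored.md5_checksum:
        return "content_changed"  # run the full ETL pipeline
    if file_meta.get("name") != stored.file_name:
        return "rename_only"  # update title/metadata only, skip ETL
    return "unchanged"
```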
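Supporting both the file_id-based and legacy filename-based hash schemes during a migration can be sketched as a two-step lookup. It tries the new key first, falls back to the legacy key, and re-keys any legacy match. A plain dict stands in for the documents table. The hash input format and the function names are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def identifier_hash(source: str, identifier: str) -> str:
    """Hash a (source, identifier) pair into a stable key (hypothetical scheme)."""
    return hashlib.sha256(f"{source}:{identifier}".encode("utf-8")).hexdigest()


def find_existing_document(
    index: Dict[str, dict], source: str, file_id: str, file_name: str
) -> Optional[dict]:
    """Look up a document by file_id hash, migrating a legacy filename-hash entry."""
    new_key = identifier_hash(source, file_id)
    doc = index.get(new_key)
    if doc is not None:
        return doc
    legacy_key = identifier_hash(source, file_name)
    doc = index.pop(legacy_key, None)
    if doc is not None:
        # Re-key under the file_id-based hash; file IDs survive renames, names do not.
        doc["unique_identifier_hash"] = new_key
        index[new_key] = doc
    return doc
```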
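The new indexing_options request field could be parsed into a typed object, as sketched here. The fields mirror the Google Drive options named in this change: max files per folder, incremental sync, and inclusion of subfolders. The exact keys, defaults, and validation rule are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class IndexingOptions:
    # Hypothetical field names and defaults.
    max_files_per_folder: int = 100
    incremental_sync: bool = True
    include_subfolders: bool = False

    @classmethod
    def from_request_body(cls, body: Mapping[str, Any]) -> "IndexingOptions":
        """Build options from the optional 'indexing_options' key of a request body."""
        raw = body.get("indexing_options") or {}
        defaults = cls()
        options = cls(
            max_files_per_folder=int(raw.get("max_files_per_folder", defaults.max_files_per_folder)),
            incremental_sync=bool(raw.get("incremental_sync", defaults.incremental_sync)),
            include_subfolders=bool(raw.get("include_subfolders", defaults.include_subfolders)),
        )
        if options.max_files_per_folder < 1:
            raise ValueError("max_files_per_folder must be at least 1")
        return options
```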
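Notifications need to tell real failures apart from duplicate-content warnings and empty syncs. That can be sketched as a small mapping from indexer results to a status and message. The SyncStatus values, the summarize_sync name, and the message wording are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class SyncStatus(Enum):
    SUCCESS = "success"
    WARNING = "warning"
    FAILED = "failed"


def summarize_sync(
    indexed: int, skipped_duplicates: int = 0, error: Optional[str] = None
) -> Tuple[SyncStatus, str]:
    """Map raw indexer results to a notification status and message."""
    if error is not None:
        return SyncStatus.FAILED, f"Indexing failed: {error}"
    if indexed == 0 and skipped_duplicates == 0:
        # Nothing new is not an error.
        return SyncStatus.SUCCESS, "Sync complete. No new items found."
    if skipped_duplicates:
        return (
            SyncStatus.WARNING,
            f"Indexed {indexed} item(s); skipped {skipped_duplicates} duplicate(s).",
        )
    return SyncStatus.SUCCESS, f"Indexed {indexed} item(s)."
```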
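The OAuth history connectors for ClickUp and Airtable need to refresh access tokens before they expire. A provider-agnostic sketch of that check follows. Several details are assumptions: the names OAuthCredentials and ensure_fresh_credentials, the five-minute safety margin, and treating a missing expiry as a non-expiring legacy token. The actual token exchange is injected as a callable rather than tied to either provider's endpoint.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class OAuthCredentials:
    access_token: str
    refresh_token: Optional[str] = None
    expires_at: Optional[datetime] = None  # None: token without an expiry (e.g. legacy API token)


def ensure_fresh_credentials(
    creds: OAuthCredentials,
    refresh: Callable[[str], OAuthCredentials],
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),
) -> OAuthCredentials:
    """Return usable credentials, refreshing them if they expire within `margin`."""
    now = now if now is not None else datetime.now(timezone.utc)
    if creds.expires_at is None or creds.expires_at - margin > now:
        return creds  # still valid (or never expires)
    if not creds.refresh_token:
        # Nothing to refresh with: the user must re-authenticate.
        raise PermissionError("Access token expired; please reconnect the account.")
    return refresh(creds.refresh_token)
```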