Commit graph

177 commits

Author SHA1 Message Date
DESKTOP-RTLN3BA\$punk
17642493eb chore: linting 2026-03-31 14:45:46 -07:00
Anish Sarkar
526940e9fe fix: improve error handling and path retrieval in Dropbox indexing for better reliability 2026-03-30 23:51:21 +05:30
Anish Sarkar
d8d5102416 feat: introduce incremental sync option for Dropbox indexing, enhancing performance and user control 2026-03-30 23:27:48 +05:30
Anish Sarkar
1f12151e03 feat: implement Dropbox API client and folder management for enhanced file indexing 2026-03-30 22:17:50 +05:30
Anish Sarkar
04691d572b chore: ran linting 2026-03-30 01:50:41 +05:30
Anish Sarkar
5a3eece397 Merge remote-tracking branch 'upstream/dev' into feat/onedrive-connector 2026-03-29 11:55:06 +05:30
DESKTOP-RTLN3BA\$punk
2cc2d339e6 feat: made agent file system optimized 2026-03-28 16:39:46 -07:00
Anish Sarkar
5bddde60cb feat: implement Microsoft OneDrive connector with OAuth support and indexing capabilities 2026-03-28 14:31:25 +05:30
Anish Sarkar
4e0749f907 fix: update file skipping logic for failed documents in Google Drive indexer
- Modified the `_should_skip_file` function to skip previously failed documents during processing, improving error handling.
- Updated the corresponding test to reflect the new behavior, ensuring that failed documents are correctly identified and skipped during automatic sync.
2026-03-27 20:01:08 +05:30
Anish Sarkar
00934ff462 feat: enhance Google Drive client with improved logging and thread-safe operations
- Added logging to track the start and end of file download and export processes, improving visibility into execution time.
- Implemented per-thread HTTP transport for concurrent downloads and exports, ensuring thread safety.
- Refactored download and export methods to utilize resolved credentials, enhancing functionality.
- Updated unit tests to validate the new threading and logging features, ensuring robust parallel execution.
2026-03-27 19:25:45 +05:30
Anish Sarkar
0bc1c766ff feat: migrate Confluence and Jira indexers to unified parallel pipeline
- Refactored Confluence and Jira indexers to utilize the shared IndexingPipelineService for improved document processing.
- Updated the `_build_connector_doc` function in both indexers to create ConnectorDocument instances with enhanced metadata and fallback summaries.
- Modified the `index_confluence_pages` and `index_jira_issues` functions to return a tuple of (indexed_count, skipped_count, warning_or_error_message) for better error handling and reporting.
- Added unit tests for both indexers to validate the new parallel processing logic and ensure correct document creation and indexing behavior.
2026-03-27 16:02:09 +05:30
Anish Sarkar
db6dd058dd feat: migrate Linear and Notion indexers to unified parallel pipeline
- Refactored Linear and Notion indexers to utilize the shared IndexingPipelineService for improved document deduplication, summarization, chunking, and embedding with bounded parallel indexing.
- Updated the `_build_connector_doc` function in both indexers to create ConnectorDocument instances with enhanced metadata and fallback summaries.
- Modified the `index_linear_issues` and `index_notion_pages` functions to return a tuple of (indexed_count, skipped_count, warning_or_error_message) for better error handling and reporting.
- Added unit tests for both indexers to validate the new parallel processing logic and ensure correct document creation and indexing behavior.
2026-03-27 11:19:32 +05:30
Anish Sarkar
7c7f8b216c feat: implement batch indexing for selected Google Drive files
- Introduced `index_google_drive_selected_files` function to enable indexing of multiple user-selected files in parallel, improving efficiency.
- Refactored existing indexing logic to handle batch processing, including error handling for individual file failures.
- Added unit tests for the new batch indexing functionality, ensuring robustness and proper error collection during the indexing process.
2026-03-27 00:17:07 +05:30
Anish Sarkar
c016962064 feat: implement parallel file downloading and indexing in Google Drive indexer
- Added `_download_files_parallel` function to enable concurrent downloading of files from Google Drive, improving efficiency in document processing.
- Introduced `_download_and_index` function to handle the parallel downloading and indexing phases, streamlining the overall workflow.
- Updated `_index_full_scan` and `_index_with_delta_sync` methods to utilize the new parallel downloading functionality, enhancing performance.
- Added unit tests to validate the new parallel downloading and indexing logic, ensuring robustness and error handling during document processing.
2026-03-26 23:53:26 +05:30
Anish Sarkar
4fd776e7ef feat: implement parallel indexing for Google Calendar and Gmail connectors
- Refactored Google Calendar and Gmail indexers to utilize the new `index_batch_parallel` method for concurrent document indexing, enhancing performance.
- Updated the indexing logic to replace serial processing with parallel execution, allowing for improved efficiency in handling multiple documents.
- Adjusted logging and error handling to accommodate the new parallel processing approach, ensuring robust operation during indexing.
- Enhanced unit tests to validate the functionality of the parallel indexing method and its integration with existing workflows.
2026-03-26 19:34:04 +05:30
Anish Sarkar
c3d5c865fd fix: update file skipping logic in Google Drive indexer
- Modified the `_should_skip_file` function to prevent skipping of documents with a FAILED status, ensuring they are reprocessed even if their content remains unchanged.
- Added a new integration test to verify that FAILED documents are not skipped during the indexing process.
2026-03-25 18:51:40 +05:30
Anish Sarkar
f7b52470eb feat: enhance Google connectors indexing with content extraction and document migration
- Added `download_and_extract_content` function to extract content from Google Drive files as markdown.
- Updated Google Drive indexer to utilize the new content extraction method.
- Implemented document migration logic to update legacy Composio document types to their native Google types.
- Introduced identifier hashing for stable document identification.
- Improved file pre-filtering to handle unchanged and rename-only files efficiently.
2026-03-25 18:33:44 +05:30
CREDO23
5d8a62a4a6 merge upstream/dev into feat/migrate-electric-to-zero
Resolve 8 conflicts:
- Accept upstream deletion of 3 composio_*_connector.py (unified Google connectors)
- Accept our deletion of ElectricProvider.tsx, use-connectors-electric.ts,
  use-messages-electric.ts (replaced by Zero equivalents)
- Keep both new deps in package.json (@rocicorp/zero + @slate-serializers/html)
- Regenerate pnpm-lock.yaml
2026-03-24 17:40:34 +02:00
CREDO23
cf21eaacfc fix: critical timestamp parsing and audit fixes
- Fix timestamp conversion: String(epochMs) → new Date(epochMs).toISOString()
  in use-messages-sync, use-comments-sync, use-documents, use-inbox.
  Without this, date comparisons (isEdited, cutoff filters) would fail.
- Fix updated_at: undefined → null in use-inbox to match InboxItem type
- Fix ZeroProvider: skip Zero connection for unauthenticated users
- Clean 30+ stale "Electric SQL" comments in backend Python code
2026-03-23 19:49:28 +02:00
Anish Sarkar
2bc6a0c3bc chore: ran linting 2026-03-22 00:43:53 +05:30
Anish Sarkar
de8841fb86 chore: ran linting 2026-03-21 13:20:13 +05:30
Anish Sarkar
aaf34800e6 feat: enhance legacy document migration for Google connectors
- Implemented fallback logic in Google Calendar, Drive, and Gmail indexers to handle legacy Composio document types, ensuring smooth migration to native types.
- Updated document indexing functions to check for existing documents using both primary and legacy hashes, improving data integrity during indexing.
2026-03-20 03:39:05 +05:30
Anish Sarkar
8e7cda31c5 feat: update Google indexing functions to track skipped messages
- Modified the indexing functions for Google Calendar and Gmail to return the count of skipped messages alongside indexed messages, enhancing performance tracking.
- Updated related tests to accommodate the new return values, ensuring comprehensive coverage of the indexing process.
- Improved error handling to maintain consistency in returned values across different indexing functions.
2026-03-19 20:56:40 +05:30
Anish Sarkar
e9485ab2df feat: update Google Drive indexing to include skipped file tracking 2026-03-19 20:27:50 +05:30
Anish Sarkar
eac4cb6075 feat: enhance Google Drive indexing to track skipped files
- Updated the indexing function to return the count of skipped files alongside indexed files, improving tracking of indexing performance.
- Added logic to accumulate skipped file counts during the indexing process, providing better insights into potential issues.
- Enhanced notification updates to include skipped file counts, ensuring comprehensive progress reporting for users.
2026-03-19 20:27:36 +05:30
Anish Sarkar
2390bd7d26 feat: enhance Google Drive authentication error handling
- Improved error handling for Google Drive indexing and listing operations to manage authentication failures more effectively.
- Added logic to mark connectors as 'auth_expired' when a 401 error or invalid credentials are detected, prompting users to re-authenticate.
- Updated error messages to provide clearer guidance on authentication issues, ensuring a better user experience.
2026-03-19 18:24:41 +05:30
Anish Sarkar
83152e8e7e refactor: unify all 3 Google Composio and non-Composio connector types and pipelines, keeping the same credential adapters 2026-03-19 05:08:21 +05:30
Anish Sarkar
ac0f2fa2eb chore: ran linting 2026-03-17 04:40:46 +05:30
DESKTOP-RTLN3BA\$punk
2b33dfe728 refactor: update safe_set_chunks function to be asynchronous and modify all connector and document processor files to use the new async implementation 2026-03-15 00:44:27 -07:00
DESKTOP-RTLN3BA\$punk
e9892c8fe9 feat: added configurable summary calculation and various improvements
- Replaced direct embedding calls with a utility function across various components to streamline embedding logic.
- Added enable_summary flag to several models and routes to control summary generation behavior.
2026-02-26 18:24:57 -08:00
CREDO23
7d1bd1fab4 Implement KB sync after Notion page updates with block ID verification
- Add NotionKBSyncService for immediate KB updates after page changes
- Implement block ID verification to ensure content freshness
- Refactor duplicate block processing logic to shared utils
- Add user-friendly status messages
- Include debug logging for troubleshooting
2026-02-17 20:30:12 +02:00
Rohan Verma
26fd61fcbb Merge pull request #796 from AnishSarkar22/feat/sur-149-batch-index
impr: batch index for messaging connectors & some fixes
2026-02-09 15:00:16 -08:00
DESKTOP-RTLN3BA\$punk
17b7348f61 feat: fixed and improved search and background task management. 2026-02-09 14:03:56 -08:00
Anish Sarkar
20ab128b05 feat: implement batch indexing for Microsoft Teams messages to improve efficiency and conversational context 2026-02-09 14:31:22 +05:30
Anish Sarkar
e2dd80c604 chore: ran linting 2026-02-08 12:43:31 +05:30
Anish Sarkar
7cede99d29 feat: implement batch indexing for Slack messages to enhance efficiency and conversational context 2026-02-07 18:30:06 +05:30
Anish Sarkar
98870a9f9a feat: implement batch indexing for Discord messages to improve efficiency and context 2026-02-07 18:26:29 +05:30
Anish Sarkar
0fdd194d92 Merge remote-tracking branch 'upstream/dev' into fix/documents 2026-02-06 12:13:26 +05:30
DESKTOP-RTLN3BA\$punk
1511c26ef5 feat: add residential proxy configuration for web crawling and YouTube transcript fetching 2026-02-05 20:44:13 -08:00
Anish Sarkar
aa66928154 chore: ran linting 2026-02-06 05:35:15 +05:30
Anish Sarkar
cc1e796c12 feat: implement two-phase document indexing for webcrawler and YouTube video processors with real-time status updates 2026-02-06 04:54:50 +05:30
Anish Sarkar
629f6f9cf5 feat: implement two-phase document indexing for Obsidian and Circleback connectors with real-time status updates 2026-02-06 04:35:13 +05:30
Anish Sarkar
0f61a249c0 feat: implement two-phase document indexing for BookStack, Elasticsearch, and Luma connectors with real-time status updates 2026-02-06 04:31:55 +05:30
Anish Sarkar
bfa3be655e feat: implement two-phase document indexing for ClickUp and GitHub connectors with real-time status updates 2026-02-06 04:06:14 +05:30
Anish Sarkar
1d870e45a4 feat: implement two-phase document indexing for Confluence and Jira connectors with real-time status updates 2026-02-06 03:54:24 +05:30
Anish Sarkar
0249ea20a5 feat: implement two-phase document indexing for Discord and Teams connectors with real-time status updates 2026-02-06 03:42:03 +05:30
Anish Sarkar
2077344934 feat: implement two-phase document indexing for Linear and Slack connectors with real-time status updates 2026-02-06 02:59:21 +05:30
Anish Sarkar
c12401c1e8 feat: implement two-phase document indexing across Google connectors with real-time status updates 2026-02-06 02:24:35 +05:30
Manoj Aggarwal
e6c0fabd0a Merge branch 'dev' into bugs_prod 2026-02-05 10:53:16 -08:00
Anish Sarkar
3bbac0d4ea feat: implement two-phase document indexing for Airtable and Notion connectors with real-time status updates 2026-02-06 00:12:48 +05:30