Commit graph

26 commits

Author SHA1 Message Date
CREDO23
f556446d07 Fix test mocks for vision_llm kwarg 2026-04-10 18:20:49 +02:00
Anish Sarkar
0a26a6c5bb chore: ran linting 2026-04-07 05:55:39 +05:30
Anish Sarkar
aba5f6a124 refactor: improve file handling logic in Dropbox and OneDrive connectors to include unsupported file extension information 2026-04-07 05:19:23 +05:30
Anish Sarkar
122be76133 refactor: update _index_selected_files method signatures in Dropbox, Google Drive, and OneDrive indexers to include unsupported file count, enhancing error reporting and consistency across connectors 2026-04-07 03:16:46 +05:30
Anish Sarkar
3a1d700817 refactor: enhance file skipping logic across Dropbox, Google Drive, and OneDrive connectors to return unsupported extensions, improving error reporting and maintainability 2026-04-07 03:16:34 +05:30
Anish Sarkar
e7beeb2a36 refactor: unify file skipping logic across Dropbox, Google Drive, and OneDrive connectors by replacing classification checks with a centralized service-based approach, enhancing maintainability and consistency in file handling 2026-04-07 02:19:31 +05:30
Anish Sarkar
f03bf05aaa refactor: enhance Google Drive indexer to support file extension filtering, improving file handling and error reporting 2026-04-06 22:34:49 +05:30
Anish Sarkar
63a75052ca Merge remote-tracking branch 'upstream/dev' into feat/unified-etl-pipeline 2026-04-06 22:04:51 +05:30
Anish Sarkar
caca491774 test: add unit tests for Dropbox integration, covering delta sync methods, file type filtering, and re-authentication behavior 2026-04-06 18:36:48 +05:30
Anish Sarkar
f8913adaa3 test: add unit tests for content extraction from cloud connectors and ETL pipeline functionality 2026-04-05 17:46:04 +05:30
Anish Sarkar
a2b3541046 chore: ran linting 2026-04-04 03:11:56 +05:30
Anish Sarkar
0d2acc665d Merge remote-tracking branch 'upstream/dev' into feat/page-limit-connectors 2026-04-04 03:08:27 +05:30
Anish Sarkar
ce40da80ea feat: implement page limit estimation and enforcement in file based connector indexers
- Added a static method `estimate_pages_from_metadata` to `PageLimitService` for estimating page counts based on file metadata.
- Integrated page limit checks in Google Drive, Dropbox, and OneDrive indexers to prevent exceeding user quotas during file indexing.
- Updated relevant indexing methods to utilize the new page estimation logic and enforce limits accordingly.
- Enhanced tests for page limit functionality, ensuring accurate estimation and enforcement across different file types.
2026-04-04 02:51:28 +05:30
Anish Sarkar
746c730b2e chore: ran linting 2026-04-03 13:14:40 +05:30
Anish Sarkar
775dea7894 feat: add integration and unit tests for local folder indexing and document versioning 2026-04-02 11:12:16 +05:30
Anish Sarkar
272de1bb40 feat: add integration and unit tests for Dropbox indexing pipeline and parallel downloads 2026-03-30 22:19:15 +05:30
Anish Sarkar
04691d572b chore: ran linting 2026-03-30 01:50:41 +05:30
Anish Sarkar
5a3eece397 Merge remote-tracking branch 'upstream/dev' into feat/onedrive-connector 2026-03-29 11:55:06 +05:30
DESKTOP-RTLN3BA\$punk
2cc2d339e6 feat: made agent file sytem optimized 2026-03-28 16:39:46 -07:00
Anish Sarkar
028c88be72 feat: add integration and unit tests for OneDrive indexing pipeline and parallel downloads 2026-03-28 16:39:47 +05:30
Anish Sarkar
00934ff462 feat: enhance Google Drive client with improved logging and thread-safe operations
- Added logging to track the start and end of file download and export processes, improving visibility into execution time.
- Implemented per-thread HTTP transport for concurrent downloads and exports, ensuring thread safety.
- Refactored download and export methods to utilize resolved credentials, enhancing functionality.
- Updated unit tests to validate the new threading and logging features, ensuring robust parallel execution.
2026-03-27 19:25:45 +05:30
Anish Sarkar
d2a4b238d7 feat: enhance Google Drive client with thread-safe download and export methods
- Implemented per-thread HTTP transport for concurrent downloads to ensure thread safety.
- Refactored `download_file` and `download_file_to_disk` methods to utilize blocking calls on separate threads, improving performance during file operations.
- Added logging to track the start and end of download and export processes, providing better visibility into execution time.
- Updated unit tests to verify parallel execution of download and export operations, ensuring efficiency in handling multiple requests.
2026-03-27 19:25:03 +05:30
Anish Sarkar
0bc1c766ff feat: migrate Confluence and Jira indexers to unified parallel pipeline
- Refactored Confluence and Jira indexers to utilize the shared IndexingPipelineService for improved document processing.
- Updated the `_build_connector_doc` function in both indexers to create ConnectorDocument instances with enhanced metadata and fallback summaries.
- Modified the `index_confluence_pages` and `index_jira_issues` functions to return a tuple of (indexed_count, skipped_count, warning_or_error_message) for better error handling and reporting.
- Added unit tests for both indexers to validate the new parallel processing logic and ensure correct document creation and indexing behavior.
2026-03-27 16:02:09 +05:30
Anish Sarkar
db6dd058dd feat: migrate Linear and Notion indexers to unified parallel pipeline
- Refactored Linear and Notion indexers to utilize the shared IndexingPipelineService for improved document deduplication, summarization, chunking, and embedding with bounded parallel indexing.
- Updated the `_build_connector_doc` function in both indexers to create ConnectorDocument instances with enhanced metadata and fallback summaries.
- Modified the `index_linear_issues` and `index_notion_pages` functions to return a tuple of (indexed_count, skipped_count, warning_or_error_message) for better error handling and reporting.
- Added unit tests for both indexers to validate the new parallel processing logic and ensure correct document creation and indexing behavior.
2026-03-27 11:19:32 +05:30
Anish Sarkar
7c7f8b216c feat: implement batch indexing for selected Google Drive files
- Introduced `index_google_drive_selected_files` function to enable indexing of multiple user-selected files in parallel, improving efficiency.
- Refactored existing indexing logic to handle batch processing, including error handling for individual file failures.
- Added unit tests for the new batch indexing functionality, ensuring robustness and proper error collection during the indexing process.
2026-03-27 00:17:07 +05:30
Anish Sarkar
c016962064 feat: implement parallel file downloading and indexing in Google Drive indexer
- Added `_download_files_parallel` function to enable concurrent downloading of files from Google Drive, improving efficiency in document processing.
- Introduced `_download_and_index` function to handle the parallel downloading and indexing phases, streamlining the overall workflow.
- Updated `_index_full_scan` and `_index_with_delta_sync` methods to utilize the new parallel downloading functionality, enhancing performance.
- Added unit tests to validate the new parallel downloading and indexing logic, ensuring robustness and error handling during document processing.
2026-03-26 23:53:26 +05:30