Commit graph

87 commits

Author SHA1 Message Date
Anish Sarkar
53df393cf7 refactor: streamline local folder indexing logic by removing unused imports, enhancing content hashing, and improving document creation process 2026-04-02 23:28:23 +05:30
Anish Sarkar
c27d24a117 feat: enhance folder indexing by adding root folder ID support and implement folder creation and cleanup logic 2026-04-02 22:41:45 +05:30
Anish Sarkar
caf2525ab5 fix: update folder ID collection logic to include deleted directories and adjust test cases for document titles 2026-04-02 22:29:07 +05:30
Anish Sarkar
22ee5c99cc refactor: remove Local Folder connector and related tasks, implement new folder indexing endpoints 2026-04-02 22:21:31 +05:30
Anish Sarkar
775dea7894 feat: add integration and unit tests for local folder indexing and document versioning 2026-04-02 11:12:16 +05:30
DESKTOP-RTLN3BA\$punk
ad0e77c3d6 feat: enhance knowledge base search with date filtering 2026-03-31 20:13:46 -07:00
DESKTOP-RTLN3BA\$punk
a9fd45844d feat: integrate Stripe for page purchases and reconciliation tasks 2026-03-31 18:39:45 -07:00
Anish Sarkar
272de1bb40 feat: add integration and unit tests for Dropbox indexing pipeline and parallel downloads 2026-03-30 22:19:15 +05:30
Anish Sarkar
04691d572b chore: ran linting 2026-03-30 01:50:41 +05:30
Anish Sarkar
5a3eece397 Merge remote-tracking branch 'upstream/dev' into feat/onedrive-connector 2026-03-29 11:55:06 +05:30
DESKTOP-RTLN3BA\$punk
2cc2d339e6 feat: made agent file system optimized 2026-03-28 16:39:46 -07:00
Anish Sarkar
028c88be72 feat: add integration and unit tests for OneDrive indexing pipeline and parallel downloads 2026-03-28 16:39:47 +05:30
Anish Sarkar
489e48644f fix: revert native excel parsing 2026-03-27 22:15:24 +05:30
Anish Sarkar
3da0ffd683 feat: add native Excel parsing and improve Google Drive content extraction
- Introduced a new utility for parsing .xlsx files into markdown format, enhancing the ability to process Excel documents natively.
- Updated the Google Drive content extractor to utilize the new Excel parsing functionality, allowing for better handling of spreadsheet files.
- Enhanced file type detection and export logic to support various document formats, improving overall content extraction accuracy.
- Added unit tests to ensure the correctness of the new Excel parsing feature and its integration with existing content extraction workflows.
2026-03-27 21:47:14 +05:30
Anish Sarkar
4e0749f907 fix: update file skipping logic for failed documents in Google Drive indexer
- Modified the `_should_skip_file` function to skip previously failed documents during processing, improving error handling.
- Updated the corresponding test to reflect the new behavior, ensuring that failed documents are correctly identified and skipped during automatic sync.
2026-03-27 20:01:08 +05:30
Anish Sarkar
00934ff462 feat: enhance Google Drive client with improved logging and thread-safe operations
- Added logging to track the start and end of file download and export processes, improving visibility into execution time.
- Implemented per-thread HTTP transport for concurrent downloads and exports, ensuring thread safety.
- Refactored download and export methods to utilize resolved credentials, enhancing functionality.
- Updated unit tests to validate the new threading and logging features, ensuring robust parallel execution.
2026-03-27 19:25:45 +05:30
Anish Sarkar
d2a4b238d7 feat: enhance Google Drive client with thread-safe download and export methods
- Implemented per-thread HTTP transport for concurrent downloads to ensure thread safety.
- Refactored `download_file` and `download_file_to_disk` methods to utilize blocking calls on separate threads, improving performance during file operations.
- Added logging to track the start and end of download and export processes, providing better visibility into execution time.
- Updated unit tests to verify parallel execution of download and export operations, ensuring efficiency in handling multiple requests.
2026-03-27 19:25:03 +05:30
Anish Sarkar
0bc1c766ff feat: migrate Confluence and Jira indexers to unified parallel pipeline
- Refactored Confluence and Jira indexers to utilize the shared IndexingPipelineService for improved document processing.
- Updated the `_build_connector_doc` function in both indexers to create ConnectorDocument instances with enhanced metadata and fallback summaries.
- Modified the `index_confluence_pages` and `index_jira_issues` functions to return a tuple of (indexed_count, skipped_count, warning_or_error_message) for better error handling and reporting.
- Added unit tests for both indexers to validate the new parallel processing logic and ensure correct document creation and indexing behavior.
2026-03-27 16:02:09 +05:30
Anish Sarkar
db6dd058dd feat: migrate Linear and Notion indexers to unified parallel pipeline
- Refactored Linear and Notion indexers to utilize the shared IndexingPipelineService for improved document deduplication, summarization, chunking, and embedding with bounded parallel indexing.
- Updated the `_build_connector_doc` function in both indexers to create ConnectorDocument instances with enhanced metadata and fallback summaries.
- Modified the `index_linear_issues` and `index_notion_pages` functions to return a tuple of (indexed_count, skipped_count, warning_or_error_message) for better error handling and reporting.
- Added unit tests for both indexers to validate the new parallel processing logic and ensure correct document creation and indexing behavior.
2026-03-27 11:19:32 +05:30
Anish Sarkar
7c7f8b216c feat: implement batch indexing for selected Google Drive files
- Introduced `index_google_drive_selected_files` function to enable indexing of multiple user-selected files in parallel, improving efficiency.
- Refactored existing indexing logic to handle batch processing, including error handling for individual file failures.
- Added unit tests for the new batch indexing functionality, ensuring robustness and proper error collection during the indexing process.
2026-03-27 00:17:07 +05:30
Anish Sarkar
c016962064 feat: implement parallel file downloading and indexing in Google Drive indexer
- Added `_download_files_parallel` function to enable concurrent downloading of files from Google Drive, improving efficiency in document processing.
- Introduced `_download_and_index` function to handle the parallel downloading and indexing phases, streamlining the overall workflow.
- Updated `_index_full_scan` and `_index_with_delta_sync` methods to utilize the new parallel downloading functionality, enhancing performance.
- Added unit tests to validate the new parallel downloading and indexing logic, ensuring robustness and error handling during document processing.
2026-03-26 23:53:26 +05:30
Anish Sarkar
4fd776e7ef feat: implement parallel indexing for Google Calendar and Gmail connectors
- Refactored Google Calendar and Gmail indexers to utilize the new `index_batch_parallel` method for concurrent document indexing, enhancing performance.
- Updated the indexing logic to replace serial processing with parallel execution, allowing for improved efficiency in handling multiple documents.
- Adjusted logging and error handling to accommodate the new parallel processing approach, ensuring robust operation during indexing.
- Enhanced unit tests to validate the functionality of the parallel indexing method and its integration with existing workflows.
2026-03-26 19:34:04 +05:30
Anish Sarkar
e5cb6bfacf feat: implement parallel document indexing in IndexingPipelineService
- Added `index_batch_parallel` method to enable concurrent indexing of documents with bounded concurrency, improving performance and efficiency.
- Refactored existing indexing logic to utilize `asyncio.to_thread` for non-blocking execution of embedding and chunking functions.
- Introduced unit tests to validate the functionality of the new parallel indexing method, ensuring robustness and error handling during document processing.
2026-03-26 19:33:49 +05:30
Anish Sarkar
c3d5c865fd fix: update file skipping logic in Google Drive indexer
- Modified the `_should_skip_file` function to prevent skipping of documents with a FAILED status, ensuring they are reprocessed even if their content remains unchanged.
- Added a new integration test to verify that FAILED documents are not skipped during the indexing process.
2026-03-25 18:51:40 +05:30
Anish Sarkar
8c41fd91ba feat: add integration tests for indexing pipeline components
- Introduced integration tests for Calendar, Drive, and Gmail indexers to ensure proper document creation and migration.
- Added tests for batch indexing functionality to validate the processing of multiple documents.
- Implemented tests for legacy document migration to verify updates to document types and hashes.
- Enhanced test coverage for the IndexingPipelineService to ensure robust functionality across various document types.
2026-03-25 18:34:02 +05:30
Anish Sarkar
f7b52470eb feat: enhance Google connectors indexing with content extraction and document migration
- Added `download_and_extract_content` function to extract content from Google Drive files as markdown.
- Updated Google Drive indexer to utilize the new content extraction method.
- Implemented document migration logic to update legacy Composio document types to their native Google types.
- Introduced identifier hashing for stable document identification.
- Improved file pre-filtering to handle unchanged and rename-only files efficiently.
2026-03-25 18:33:44 +05:30
CREDO23
5d8a62a4a6 merge upstream/dev into feat/migrate-electric-to-zero
Resolve 8 conflicts:
- Accept upstream deletion of 3 composio_*_connector.py (unified Google connectors)
- Accept our deletion of ElectricProvider.tsx, use-connectors-electric.ts,
  use-messages-electric.ts (replaced by Zero equivalents)
- Keep both new deps in package.json (@rocicorp/zero + @slate-serializers/html)
- Regenerate pnpm-lock.yaml
2026-03-24 17:40:34 +02:00
CREDO23
0916c1addd remove stale ElectricSQL references from changelog and test fixtures 2026-03-24 17:07:11 +02:00
Anish Sarkar
2bc6a0c3bc chore: ran linting 2026-03-22 00:43:53 +05:30
Anish Sarkar
e37e6d2d18 chore: ran linting 2026-03-21 13:21:19 +05:30
Anish Sarkar
de8841fb86 chore: ran linting 2026-03-21 13:20:13 +05:30
Anish Sarkar
772150eb66 feat: add unit tests for DedupHITLToolCallsMiddleware 2026-03-21 13:19:58 +05:30
Anish Sarkar
8e7cda31c5 feat: update Google indexing functions to track skipped messages
- Modified the indexing functions for Google Calendar and Gmail to return the count of skipped messages alongside indexed messages, enhancing performance tracking.
- Updated related tests to accommodate the new return values, ensuring comprehensive coverage of the indexing process.
- Improved error handling to maintain consistency in returned values across different indexing functions.
2026-03-19 20:56:40 +05:30
Anish Sarkar
e9485ab2df feat: update Google Drive indexing to include skipped file tracking 2026-03-19 20:27:50 +05:30
Anish Sarkar
36f4709225 feat: add integration and unit tests for Google unification connectors
- Introduced comprehensive integration tests for Google Drive, Gmail, and Calendar indexers, ensuring proper credential handling for both Composio and native connectors.
- Added unit tests to validate the acceptance of Composio-sourced credentials across various connector types.
- Implemented fixtures to seed test data and facilitate testing of hybrid search functionality, ensuring accurate document type filtering.
2026-03-19 17:51:15 +05:30
Anish Sarkar
851856a54b fix: update document cleanup logic and mock Celery task in tests 2026-03-11 12:27:32 +05:30
DESKTOP-RTLN3BA\$punk
d8a05ae4d5 feat: refactor agent tools management and add UI integration
- Added endpoint to list agent tools with metadata, excluding hidden tools.
- Updated NewChatRequest and RegenerateRequest schemas to include disabled tools.
- Integrated disabled tools management in the NewChatPage and Composer components.
- Improved tool instructions and visibility in the system prompt.
- Refactored tool registration to support hidden tools and default enabled states.
- Enhanced document chunk creation to handle strict zip behavior.
- Cleaned up imports and formatting across various files for consistency.
2026-03-10 17:36:26 -07:00
Rohan Verma
547077e5b9 Merge pull request #865 from CREDO23/sur-182-fix-ux-experience-for-composio-google-drive-connector
[Perf] Batch embedding, non-blocking search, chunks index & Google Drive UX fix
2026-03-10 12:52:16 -07:00
CREDO23
e951fbb991 fix: update stale embed_text mock in document_upload tests 2026-03-09 21:47:27 +02:00
CREDO23
929445afd9 feat: use batch embedding in IndexingPipelineService.index 2026-03-09 16:13:44 +02:00
Anish Sarkar
8122370cec test: mark test_connector_document.py with unit pytest marker 2026-03-08 02:53:47 +05:30
Anish Sarkar
ca3710a239 fix: remove slowapi limiter for testing 2026-03-08 02:41:05 +05:30
Anish Sarkar
b2bf00e11a chore: ran linting 2026-02-28 02:28:03 +05:30
Anish Sarkar
ce82807f16 test: enhance reindexing tests for UploadDocumentAdapter 2026-02-28 02:18:02 +05:30
Anish Sarkar
37f76a8533 test: add should_summarize parameter to file upload adapter tests 2026-02-28 01:44:41 +05:30
Anish Sarkar
23a98d802c refactor: implement UploadDocumentAdapter for file indexing and reindexing 2026-02-28 01:38:32 +05:30
DESKTOP-RTLN3BA\$punk
a4dc84d1ab feat: add should_summarize parameter to task dispatchers
- Introduced should_summarize parameter in TaskDispatcher and CeleryTaskDispatcher to control summary generation.
- Updated InlineTaskDispatcher to support the new parameter for document processing.
2026-02-26 19:12:37 -08:00
Anish Sarkar
836d5293df refactor: remove unused TestStatusPolling class from document upload integration tests 2026-02-27 01:52:35 +05:30
Anish Sarkar
fd032f3709 refactor: simplify and clarify documentation in document upload integration tests 2026-02-27 01:48:25 +05:30
Anish Sarkar
7c09958ddc refactor: enhance document upload integration tests for API contract validation 2026-02-27 01:24:20 +05:30