SurfSense

mirror of https://github.com/MODSetter/SurfSense.git synced 2026-04-26 01:06:23 +02:00

Author	SHA1	Message	Date
Anish Sarkar	775dea7894	feat: add integration and unit tests for local folder indexing and document versioning	2026-04-02 11:12:16 +05:30
DESKTOP-RTLN3BA\$punk	ad0e77c3d6	feat: enhance knowledge base search with date filtering	2026-03-31 20:13:46 -07:00
Anish Sarkar	272de1bb40	feat: add integration and unit tests for Dropbox indexing pipeline and parallel downloads	2026-03-30 22:19:15 +05:30
Anish Sarkar	04691d572b	chore: ran linting	2026-03-30 01:50:41 +05:30
Anish Sarkar	5a3eece397	Merge remote-tracking branch 'upstream/dev' into feat/onedrive-connector	2026-03-29 11:55:06 +05:30
DESKTOP-RTLN3BA\$punk	2cc2d339e6	feat: made agent file system optimized	2026-03-28 16:39:46 -07:00
Anish Sarkar	028c88be72	feat: add integration and unit tests for OneDrive indexing pipeline and parallel downloads	2026-03-28 16:39:47 +05:30
Anish Sarkar	489e48644f	fix: revert native excel parsing	2026-03-27 22:15:24 +05:30
Anish Sarkar	3da0ffd683	feat: add native Excel parsing and improve Google Drive content extraction - Introduced a new utility for parsing .xlsx files into markdown format, enhancing the ability to process Excel documents natively. - Updated the Google Drive content extractor to utilize the new Excel parsing functionality, allowing for better handling of spreadsheet files. - Enhanced file type detection and export logic to support various document formats, improving overall content extraction accuracy. - Added unit tests to ensure the correctness of the new Excel parsing feature and its integration with existing content extraction workflows.	2026-03-27 21:47:14 +05:30
Anish Sarkar	00934ff462	feat: enhance Google Drive client with improved logging and thread-safe operations - Added logging to track the start and end of file download and export processes, improving visibility into execution time. - Implemented per-thread HTTP transport for concurrent downloads and exports, ensuring thread safety. - Refactored download and export methods to utilize resolved credentials, enhancing functionality. - Updated unit tests to validate the new threading and logging features, ensuring robust parallel execution.	2026-03-27 19:25:45 +05:30
Anish Sarkar	d2a4b238d7	feat: enhance Google Drive client with thread-safe download and export methods - Implemented per-thread HTTP transport for concurrent downloads to ensure thread safety. - Refactored `download_file` and `download_file_to_disk` methods to utilize blocking calls on separate threads, improving performance during file operations. - Added logging to track the start and end of download and export processes, providing better visibility into execution time. - Updated unit tests to verify parallel execution of download and export operations, ensuring efficiency in handling multiple requests.	2026-03-27 19:25:03 +05:30
Anish Sarkar	0bc1c766ff	feat: migrate Confluence and Jira indexers to unified parallel pipeline - Refactored Confluence and Jira indexers to utilize the shared IndexingPipelineService for improved document processing. - Updated the `_build_connector_doc` function in both indexers to create ConnectorDocument instances with enhanced metadata and fallback summaries. - Modified the `index_confluence_pages` and `index_jira_issues` functions to return a tuple of (indexed_count, skipped_count, warning_or_error_message) for better error handling and reporting. - Added unit tests for both indexers to validate the new parallel processing logic and ensure correct document creation and indexing behavior.	2026-03-27 16:02:09 +05:30
Anish Sarkar	db6dd058dd	feat: migrate Linear and Notion indexers to unified parallel pipeline - Refactored Linear and Notion indexers to utilize the shared IndexingPipelineService for improved document deduplication, summarization, chunking, and embedding with bounded parallel indexing. - Updated the `_build_connector_doc` function in both indexers to create ConnectorDocument instances with enhanced metadata and fallback summaries. - Modified the `index_linear_issues` and `index_notion_pages` functions to return a tuple of (indexed_count, skipped_count, warning_or_error_message) for better error handling and reporting. - Added unit tests for both indexers to validate the new parallel processing logic and ensure correct document creation and indexing behavior.	2026-03-27 11:19:32 +05:30
Anish Sarkar	7c7f8b216c	feat: implement batch indexing for selected Google Drive files - Introduced `index_google_drive_selected_files` function to enable indexing of multiple user-selected files in parallel, improving efficiency. - Refactored existing indexing logic to handle batch processing, including error handling for individual file failures. - Added unit tests for the new batch indexing functionality, ensuring robustness and proper error collection during the indexing process.	2026-03-27 00:17:07 +05:30
Anish Sarkar	c016962064	feat: implement parallel file downloading and indexing in Google Drive indexer - Added `_download_files_parallel` function to enable concurrent downloading of files from Google Drive, improving efficiency in document processing. - Introduced `_download_and_index` function to handle the parallel downloading and indexing phases, streamlining the overall workflow. - Updated `_index_full_scan` and `_index_with_delta_sync` methods to utilize the new parallel downloading functionality, enhancing performance. - Added unit tests to validate the new parallel downloading and indexing logic, ensuring robustness and error handling during document processing.	2026-03-26 23:53:26 +05:30
Anish Sarkar	4fd776e7ef	feat: implement parallel indexing for Google Calendar and Gmail connectors - Refactored Google Calendar and Gmail indexers to utilize the new `index_batch_parallel` method for concurrent document indexing, enhancing performance. - Updated the indexing logic to replace serial processing with parallel execution, allowing for improved efficiency in handling multiple documents. - Adjusted logging and error handling to accommodate the new parallel processing approach, ensuring robust operation during indexing. - Enhanced unit tests to validate the functionality of the parallel indexing method and its integration with existing workflows.	2026-03-26 19:34:04 +05:30
Anish Sarkar	e5cb6bfacf	feat: implement parallel document indexing in IndexingPipelineService - Added `index_batch_parallel` method to enable concurrent indexing of documents with bounded concurrency, improving performance and efficiency. - Refactored existing indexing logic to utilize `asyncio.to_thread` for non-blocking execution of embedding and chunking functions. - Introduced unit tests to validate the functionality of the new parallel indexing method, ensuring robustness and error handling during document processing.	2026-03-26 19:33:49 +05:30
Anish Sarkar	8c41fd91ba	feat: add integration tests for indexing pipeline components - Introduced integration tests for Calendar, Drive, and Gmail indexers to ensure proper document creation and migration. - Added tests for batch indexing functionality to validate the processing of multiple documents. - Implemented tests for legacy document migration to verify updates to document types and hashes. - Enhanced test coverage for the IndexingPipelineService to ensure robust functionality across various document types.	2026-03-25 18:34:02 +05:30
Anish Sarkar	f7b52470eb	feat: enhance Google connectors indexing with content extraction and document migration - Added `download_and_extract_content` function to extract content from Google Drive files as markdown. - Updated Google Drive indexer to utilize the new content extraction method. - Implemented document migration logic to update legacy Composio document types to their native Google types. - Introduced identifier hashing for stable document identification. - Improved file pre-filtering to handle unchanged and rename-only files efficiently.	2026-03-25 18:33:44 +05:30
Anish Sarkar	2bc6a0c3bc	chore: ran linting	2026-03-22 00:43:53 +05:30
Anish Sarkar	e37e6d2d18	chore: ran linting	2026-03-21 13:21:19 +05:30
Anish Sarkar	de8841fb86	chore: ran linting	2026-03-21 13:20:13 +05:30
Anish Sarkar	772150eb66	feat: add unit tests for DedupHITLToolCallsMiddleware	2026-03-21 13:19:58 +05:30
Anish Sarkar	36f4709225	feat: add integration and unit tests for Google unification connectors - Introduced comprehensive integration tests for Google Drive, Gmail, and Calendar indexers, ensuring proper credential handling for both Composio and native connectors. - Added unit tests to validate the acceptance of Composio-sourced credentials across various connector types. - Implemented fixtures to seed test data and facilitate testing of hybrid search functionality, ensuring accurate document type filtering.	2026-03-19 17:51:15 +05:30
Anish Sarkar	8122370cec	test: mark test_connector_document.py with unit pytest marker	2026-03-08 02:53:47 +05:30
DESKTOP-RTLN3BA\$punk	aabc24f82c	feat: enhance performance logging and caching in various components - Introduced slow callback logging in FastAPI to identify blocking calls. - Added performance logging for agent creation and tool loading processes. - Implemented caching for MCP tools to reduce redundant server calls. - Enhanced sandbox management with in-process caching for improved efficiency. - Refactored several functions for better readability and performance tracking. - Updated tests to ensure proper functionality of new features and optimizations.	2026-02-26 13:00:31 -08:00
CREDO23	0de74f4bf7	add docstrings to all indexing pipeline tests	2026-02-25 20:30:31 +02:00
CREDO23	cad400be1b	add file upload adapter and make index() return refreshed document	2026-02-25 19:56:59 +02:00
CREDO23	1b4ed35de3	fix: correct test fixtures and add missing summarizer tests	2026-02-25 11:15:48 +02:00
CREDO23	af22fa7c88	refactor: remove redundant and low-value tests, enforce connector_id and created_by_id constraints	2026-02-25 08:29:53 +02:00
CREDO23	5b616eac5a	fix: plug all gaps found in deep review of indexing pipeline	2026-02-25 02:20:44 +02:00
CREDO23	a0134a5830	test: add document hashing unit tests and clean up conftest mocks	2026-02-24 22:48:40 +02:00
CREDO23	d5e10bd8f9	test: add ConnectorDocument unit tests and factory fixture	2026-02-24 22:20:08 +02:00
CREDO23	10a6ba6924	test: bootstrap pytest environment for backend	2026-02-24 18:19:56 +02:00

34 commits
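
Several of the commits above (e5cb6bfacf and the connector migrations that follow it) describe bounded parallel indexing that uses `asyncio.to_thread` and returns a tuple of (indexed_count, skipped_count, warning_or_error_message). The following is a minimal sketch of that general pattern, not the project's actual code: the function name `index_batch_parallel_sketch`, the `index_one` callback, its True/False return convention, and the `max_concurrency` parameter are all assumptions made for illustration.

```python
import asyncio
from typing import Callable, Iterable, Optional, Tuple


async def index_batch_parallel_sketch(
    docs: Iterable[str],
    index_one: Callable[[str], bool],  # hypothetical: True = indexed, False = skipped
    max_concurrency: int = 4,          # hypothetical bound, not taken from the repo
) -> Tuple[int, int, Optional[str]]:
    """Index documents concurrently while capping the number in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(doc: str) -> bool:
        # The semaphore caps how many blocking calls run at once;
        # to_thread keeps the event loop free while each call blocks.
        async with semaphore:
            return await asyncio.to_thread(index_one, doc)

    # return_exceptions=True collects per-document failures
    # instead of aborting the whole batch on the first error.
    results = await asyncio.gather(
        *(worker(doc) for doc in docs), return_exceptions=True
    )

    indexed = sum(1 for r in results if r is True)
    skipped = sum(1 for r in results if r is False)
    failures = [r for r in results if isinstance(r, BaseException)]
    message = f"{len(failures)} document(s) failed" if failures else None
    return indexed, skipped, message
```

Calling `asyncio.run(index_batch_parallel_sketch(docs, index_one))` with any blocking `index_one` callable yields the three-element tuple. A single failing document is reported in the message rather than raised, which mirrors the per-file error collection mentioned in the batch indexing commits.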