SurfSense

mirror of https://github.com/MODSetter/SurfSense.git synced 2026-06-30 21:59:46 +02:00

Author	SHA1	Message	Date
CREDO23	a95bf58c8f	Make Vision LLM opt-in for uploads and connectors	2026-04-10 16:45:51 +02:00
Anish Sarkar	0a26a6c5bb	chore: ran linting	2026-04-07 05:55:39 +05:30
Anish Sarkar	5803fe79da	refactor: update filename handling in Google Drive connector to include Google Workspace file extensions, improving content extraction accuracy	2026-04-07 05:43:34 +05:30
Anish Sarkar	1b87719a92	refactor: enhance file skipping logic in Google Drive connector to check for Google Workspace files before unsupported extensions	2026-04-07 05:36:29 +05:30
Anish Sarkar	a624c86b04	refactor: update file skipping logic in Dropbox, Google Drive, and OneDrive connectors to return unsupported extension information	2026-04-07 05:11:15 +05:30
Anish Sarkar	3a1d700817	refactor: enhance file skipping logic across Dropbox, Google Drive, and OneDrive connectors to return unsupported extensions, improving error reporting and maintainability	2026-04-07 03:16:34 +05:30
Anish Sarkar	e7beeb2a36	refactor: unify file skipping logic across Dropbox, Google Drive, and OneDrive connectors by replacing classification checks with a centralized service-based approach, enhancing maintainability and consistency in file handling	2026-04-07 02:19:31 +05:30
Anish Sarkar	dc7047f64d	refactor: implement file type classification for supported extensions across Dropbox, Google Drive, and OneDrive connectors, enhancing file handling and error management	2026-04-06 22:03:47 +05:30
Anish Sarkar	8224360afa	refactor: unify file parsing logic across Dropbox, Google Drive, and OneDrive using the ETL pipeline	2026-04-05 17:30:29 +05:30
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	2cc2d339e6	feat: made agent file system optimized	2026-03-28 16:39:46 -07:00
Anish Sarkar	6d4eb32345	fix: update export format for Google Docs to use correct MIME type	2026-03-27 22:20:32 +05:30
Anish Sarkar	489e48644f	fix: revert native excel parsing	2026-03-27 22:15:24 +05:30
Anish Sarkar	dff8a1df37	feat: add descendant checking for folder filtering in Google Drive changes	2026-03-27 22:00:31 +05:30
Anish Sarkar	3da0ffd683	feat: add native Excel parsing and improve Google Drive content extraction - Introduced a new utility for parsing .xlsx files into markdown format, enhancing the ability to process Excel documents natively. - Updated the Google Drive content extractor to utilize the new Excel parsing functionality, allowing for better handling of spreadsheet files. - Enhanced file type detection and export logic to support various document formats, improving overall content extraction accuracy. - Added unit tests to ensure the correctness of the new Excel parsing feature and its integration with existing content extraction workflows.	2026-03-27 21:47:14 +05:30
Anish Sarkar	00934ff462	feat: enhance Google Drive client with improved logging and thread-safe operations - Added logging to track the start and end of file download and export processes, improving visibility into execution time. - Implemented per-thread HTTP transport for concurrent downloads and exports, ensuring thread safety. - Refactored download and export methods to utilize resolved credentials, enhancing functionality. - Updated unit tests to validate the new threading and logging features, ensuring robust parallel execution.	2026-03-27 19:25:45 +05:30
Anish Sarkar	d2a4b238d7	feat: enhance Google Drive client with thread-safe download and export methods - Implemented per-thread HTTP transport for concurrent downloads to ensure thread safety. - Refactored `download_file` and `download_file_to_disk` methods to utilize blocking calls on separate threads, improving performance during file operations. - Added logging to track the start and end of download and export processes, providing better visibility into execution time. - Updated unit tests to verify parallel execution of download and export operations, ensuring efficiency in handling multiple requests.	2026-03-27 19:25:03 +05:30
Anish Sarkar	da6bbcfe39	feat: add file streaming download functionality to Google Drive client - Introduced `download_file_to_disk` method to stream files directly to disk in chunks, reducing memory usage during downloads. - Updated `download_and_extract_content` function to utilize the new streaming download method for binary files, enhancing efficiency in handling large files. - Improved error handling for download operations, providing clearer feedback on failures.	2026-03-27 08:54:06 +05:30
Anish Sarkar	2f30e48e90	feat: implement async service locking in Google Drive client - Introduced an asyncio lock to the GoogleDriveClient to ensure thread-safe access to the service instance. - Refactored the get_service method to utilize the lock, preventing concurrent attempts to create the service and improving stability in multi-threaded environments.	2026-03-27 00:06:21 +05:30
Anish Sarkar	f7b52470eb	feat: enhance Google connectors indexing with content extraction and document migration - Added `download_and_extract_content` function to extract content from Google Drive files as markdown. - Updated Google Drive indexer to utilize the new content extraction method. - Implemented document migration logic to update legacy Composio document types to their native Google types. - Introduced identifier hashing for stable document identification. - Improved file pre-filtering to handle unchanged and rename-only files efficiently.	2026-03-25 18:33:44 +05:30
Anish Sarkar	2bc6a0c3bc	chore: ran linting	2026-03-22 00:43:53 +05:30
Anish Sarkar	83152e8e7e	refactor: unify all 3 Google Composio and non-Composio connector types and pipelines, keeping the same credential adapters	2026-03-19 05:08:21 +05:30
CREDO23	f1fac7dedc	add create_file and trash_file to GoogleDriveClient	2026-02-20 16:25:25 +02:00
Anish Sarkar	2125c76841	feat: merge new credentials with existing connector configurations to preserve user settings	2026-02-02 19:03:05 +05:30
Anish Sarkar	bf08982029	feat: add connector_id to documents for source tracking and implement connector deletion task	2026-02-02 16:23:26 +05:30
Anish Sarkar	5e555a8f9a	fix: improve notification for token expiration and revocation errors for multiple connectors	2026-01-31 16:24:43 +05:30
Anish Sarkar	f538d59ca3	feat: enhance Google Drive file metadata handling - Updated Google Drive API calls to include md5Checksum in file metadata retrieval for improved content tracking. - Added logic to check for rename-only updates based on md5Checksum, optimizing document processing by preventing unnecessary ETL operations for unchanged content. - Enhanced existing document update logic to handle renaming and metadata updates more effectively, particularly for Google Drive files.	2026-01-17 16:24:53 +05:30
Anish Sarkar	645e849d93	chore: ran both frontend and backend linting	2026-01-03 00:18:17 +05:30
Anish Sarkar	45489423d1	feat: implement token encryption and state management for OAuth connectors - Added encryption for sensitive tokens (access token, refresh token, client secret) in Google Calendar, Google Drive, Gmail, Linear, and Notion connectors to enhance security. - Introduced OAuthStateManager for secure state parameter generation and validation, improving the integrity of OAuth flows. - Updated callback routes to handle state validation and error management, ensuring robust handling of authorization processes. - Enhanced indexers to support decryption of tokens for backward compatibility, maintaining functionality with existing encrypted credentials. - Improved validation for date parameters in connector routes to ensure proper input handling.	2026-01-02 23:46:03 +05:30
CREDO23	9c78726b6b	feat: add file selection to Google Drive connector - Add structured request body with folders and files arrays - Support individual file indexing alongside folder indexing - Remove deprecated folder_ids/folder_names query params - Update UI to allow selecting both folders and files	2025-12-31 14:15:07 +02:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	c19d300c9d	feat: added circleback connector	2025-12-30 09:00:59 -08:00
CREDO23	7618662e70	refactor: rename GOOGLE_DRIVE_CONNECTOR to GOOGLE_DRIVE_FILE document type	2025-12-29 20:38:26 +02:00
CREDO23	16bc991b13	feat: add Google Drive connector to knowledge base search	2025-12-29 18:13:27 +02:00
CREDO23	acf47e3b0c	refactor(connectors): remove verbose docstrings and obvious comments - Simplify module docstrings (remove meta-commentary about 'small focused modules') - Remove redundant inline comments (e.g., 'Log task start', 'Get connector from database') - Trim verbose function docstrings to essential information only - Remove over-explanatory comments that restate what code does - Keep necessary documentation, remove noise for better readability	2025-12-28 18:53:13 +02:00
CREDO23	a5935bc677	feat(connectors): add connector parameter to file processor for source tracking - Add optional 'connector' parameter with 'type' and 'metadata' fields - Create helper function _update_document_from_connector - Use document_metadata column (not metadata) for JSON field - Merge metadata with existing using dict spread operator - Google Drive documents now marked as GOOGLE_DRIVE_CONNECTOR - Backward compatible - no changes to existing logic - Simple and clean implementation	2025-12-28 18:01:39 +02:00
CREDO23	b2b891e4d7	fix(connectors): properly commit Google Drive document type changes - Return file metadata from content_extractor for indexer to use - Update document type and metadata in indexer after processing - Explicitly commit changes to database - Ensures documents are properly marked as GOOGLE_DRIVE_CONNECTOR type	2025-12-28 17:15:29 +02:00
CREDO23	9f1fd20944	feat(connectors): mark Google Drive documents with GOOGLE_DRIVE_CONNECTOR type - Change document_type from file type (PDF, DOCX) to GOOGLE_DRIVE_CONNECTOR - Store original file type in metadata for reference - Add Google Drive specific metadata (file_id, mime_type, source) - Include export format info for Google Workspace files - Enables proper source tracking and bulk management	2025-12-28 16:55:14 +02:00
CREDO23	3e67d5f31e	feat(connectors): add Google Drive delta sync with change tracking - Get start page token for change tracking baseline - Fetch incremental changes using Google Drive Changes API - Categorize changes into added, modified, and removed files - Enable efficient re-indexing of only changed content	2025-12-28 15:55:06 +02:00
CREDO23	84bde67979	feat(connectors): add Google Drive folder browsing and file listing - List folder contents with full pagination support - Query root folder or specific parent folder - Return both folders and files with metadata (size, icons, links) - Filter out shortcuts and trashed items	2025-12-28 15:54:58 +02:00
CREDO23	40304c6795	feat(connectors): add Google Drive content extraction using existing ETL - Download files from Google Drive to temporary location - Export Google Workspace files as PDF - Delegate content extraction to existing process_file_in_background - Reuse SurfSense's ETL services (Unstructured, LlamaCloud, Docling)	2025-12-28 15:54:50 +02:00
CREDO23	701c3409b3	feat(connectors): add Google Drive file type detection and mapping - Detect Google Workspace files (Docs, Sheets, Slides) - Map to PDF export format to preserve rich content (images, formatting) - Identify files to skip (shortcuts, unsupported types)	2025-12-28 15:54:42 +02:00
CREDO23	74386affdc	feat(connectors): add Google Drive API client wrapper - Build and manage Google Drive service with credentials - List files with query support and pagination - Download binary files and export Google Workspace files as PDF - Handle HTTP errors gracefully	2025-12-28 15:54:32 +02:00
CREDO23	2c8717b14b	feat(connectors): add Google Drive credentials module for OAuth management - Handle Google OAuth credential initialization and validation - Automatic token refresh with database persistence - Reuse existing tokens when valid	2025-12-28 15:54:26 +02:00

42 commits