42 commits

Author SHA1 Message Date
CREDO23
a95bf58c8f Make Vision LLM opt-in for uploads and connectors 2026-04-10 16:45:51 +02:00
Anish Sarkar
0a26a6c5bb chore: ran linting 2026-04-07 05:55:39 +05:30
Anish Sarkar
5803fe79da refactor: update filename handling in Google Drive connector to include Google Workspace file extensions, improving content extraction accuracy 2026-04-07 05:43:34 +05:30
Anish Sarkar
1b87719a92 refactor: enhance file skipping logic in Google Drive connector to check for Google Workspace files before unsupported extensions 2026-04-07 05:36:29 +05:30
Anish Sarkar
a624c86b04 refactor: update file skipping logic in Dropbox, Google Drive, and OneDrive connectors to return unsupported extension information 2026-04-07 05:11:15 +05:30
Anish Sarkar
3a1d700817 refactor: enhance file skipping logic across Dropbox, Google Drive, and OneDrive connectors to return unsupported extensions, improving error reporting and maintainability 2026-04-07 03:16:34 +05:30
Anish Sarkar
e7beeb2a36 refactor: unify file skipping logic across Dropbox, Google Drive, and OneDrive connectors by replacing classification checks with a centralized service-based approach, enhancing maintainability and consistency in file handling 2026-04-07 02:19:31 +05:30
Anish Sarkar
dc7047f64d refactor: implement file type classification for supported extensions across Dropbox, Google Drive, and OneDrive connectors, enhancing file handling and error management 2026-04-06 22:03:47 +05:30
Anish Sarkar
8224360afa refactor: unify file parsing logic across Dropbox, Google Drive, and OneDrive using the ETL pipeline 2026-04-05 17:30:29 +05:30
DESKTOP-RTLN3BA\$punk
2cc2d339e6 feat: made agent file system optimized 2026-03-28 16:39:46 -07:00
Anish Sarkar
6d4eb32345 fix: update export format for Google Docs to use correct MIME type 2026-03-27 22:20:32 +05:30
Anish Sarkar
489e48644f fix: revert native excel parsing 2026-03-27 22:15:24 +05:30
Anish Sarkar
dff8a1df37 feat: add descendant checking for folder filtering in Google Drive changes 2026-03-27 22:00:31 +05:30
Anish Sarkar
3da0ffd683 feat: add native Excel parsing and improve Google Drive content extraction
- Introduced a new utility for parsing .xlsx files into markdown format, enhancing the ability to process Excel documents natively.
- Updated the Google Drive content extractor to utilize the new Excel parsing functionality, allowing for better handling of spreadsheet files.
- Enhanced file type detection and export logic to support various document formats, improving overall content extraction accuracy.
- Added unit tests to ensure the correctness of the new Excel parsing feature and its integration with existing content extraction workflows.
2026-03-27 21:47:14 +05:30
Anish Sarkar
00934ff462 feat: enhance Google Drive client with improved logging and thread-safe operations
- Added logging to track the start and end of file download and export processes, improving visibility into execution time.
- Implemented per-thread HTTP transport for concurrent downloads and exports, ensuring thread safety.
- Refactored download and export methods to utilize resolved credentials, enhancing functionality.
- Updated unit tests to validate the new threading and logging features, ensuring robust parallel execution.
2026-03-27 19:25:45 +05:30
Anish Sarkar
d2a4b238d7 feat: enhance Google Drive client with thread-safe download and export methods
- Implemented per-thread HTTP transport for concurrent downloads to ensure thread safety.
- Refactored `download_file` and `download_file_to_disk` methods to utilize blocking calls on separate threads, improving performance during file operations.
- Added logging to track the start and end of download and export processes, providing better visibility into execution time.
- Updated unit tests to verify parallel execution of download and export operations, ensuring efficiency in handling multiple requests.
2026-03-27 19:25:03 +05:30
Anish Sarkar
da6bbcfe39 feat: add file streaming download functionality to Google Drive client
- Introduced `download_file_to_disk` method to stream files directly to disk in chunks, reducing memory usage during downloads.
- Updated `download_and_extract_content` function to utilize the new streaming download method for binary files, enhancing efficiency in handling large files.
- Improved error handling for download operations, providing clearer feedback on failures.
2026-03-27 08:54:06 +05:30
Anish Sarkar
2f30e48e90 feat: implement async service locking in Google Drive client
- Introduced an asyncio lock to the GoogleDriveClient to ensure thread-safe access to the service instance.
- Refactored the get_service method to utilize the lock, preventing concurrent attempts to create the service and improving stability in multi-threaded environments.
2026-03-27 00:06:21 +05:30
Anish Sarkar
f7b52470eb feat: enhance Google connectors indexing with content extraction and document migration
- Added `download_and_extract_content` function to extract content from Google Drive files as markdown.
- Updated Google Drive indexer to utilize the new content extraction method.
- Implemented document migration logic to update legacy Composio document types to their native Google types.
- Introduced identifier hashing for stable document identification.
- Improved file pre-filtering to handle unchanged and rename-only files efficiently.
2026-03-25 18:33:44 +05:30
Anish Sarkar
2bc6a0c3bc chore: ran linting 2026-03-22 00:43:53 +05:30
Anish Sarkar
83152e8e7e refactor: unify all 3 Google Composio and non-Composio connector types and pipelines, keeping the same credential adapters 2026-03-19 05:08:21 +05:30
CREDO23
f1fac7dedc add create_file and trash_file to GoogleDriveClient 2026-02-20 16:25:25 +02:00
Anish Sarkar
2125c76841 feat: merge new credentials with existing connector configurations to preserve user settings 2026-02-02 19:03:05 +05:30
Anish Sarkar
bf08982029 feat: add connector_id to documents for source tracking and implement connector deletion task 2026-02-02 16:23:26 +05:30
Anish Sarkar
5e555a8f9a fix: improve notification for token expiration and revocation errors for multiple connectors 2026-01-31 16:24:43 +05:30
Anish Sarkar
f538d59ca3 feat: enhance Google Drive file metadata handling
- Updated Google Drive API calls to include md5Checksum in file metadata retrieval for improved content tracking.
- Added logic to check for rename-only updates based on md5Checksum, optimizing document processing by preventing unnecessary ETL operations for unchanged content.
- Enhanced existing document update logic to handle renaming and metadata updates more effectively, particularly for Google Drive files.
2026-01-17 16:24:53 +05:30
Anish Sarkar
645e849d93 chore: ran both frontend and backend linting 2026-01-03 00:18:17 +05:30
Anish Sarkar
45489423d1 feat: implement token encryption and state management for OAuth connectors
- Added encryption for sensitive tokens (access token, refresh token, client secret) in Google Calendar, Google Drive, Gmail, Linear, and Notion connectors to enhance security.
- Introduced OAuthStateManager for secure state parameter generation and validation, improving the integrity of OAuth flows.
- Updated callback routes to handle state validation and error management, ensuring robust handling of authorization processes.
- Enhanced indexers to support decryption of tokens for backward compatibility, maintaining functionality with existing encrypted credentials.
- Improved validation for date parameters in connector routes to ensure proper input handling.
2026-01-02 23:46:03 +05:30
CREDO23
9c78726b6b feat: add file selection to Google Drive connector
- Add structured request body with folders and files arrays
- Support individual file indexing alongside folder indexing
- Remove deprecated folder_ids/folder_names query params
- Update UI to allow selecting both folders and files
2025-12-31 14:15:07 +02:00
DESKTOP-RTLN3BA\$punk
c19d300c9d feat: added circleback connector 2025-12-30 09:00:59 -08:00
CREDO23
7618662e70 refactor: rename GOOGLE_DRIVE_CONNECTOR to GOOGLE_DRIVE_FILE document type 2025-12-29 20:38:26 +02:00
CREDO23
16bc991b13 feat: add Google Drive connector to knowledge base search 2025-12-29 18:13:27 +02:00
CREDO23
acf47e3b0c refactor(connectors): remove verbose docstrings and obvious comments
- Simplify module docstrings (remove meta-commentary about 'small focused modules')
- Remove redundant inline comments (e.g., 'Log task start', 'Get connector from database')
- Trim verbose function docstrings to essential information only
- Remove over-explanatory comments that restate what code does
- Keep necessary documentation, remove noise for better readability
2025-12-28 18:53:13 +02:00
CREDO23
a5935bc677 feat(connectors): add connector parameter to file processor for source tracking
- Add optional 'connector' parameter with 'type' and 'metadata' fields
- Create helper function _update_document_from_connector
- Use document_metadata column (not metadata) for JSON field
- Merge metadata with existing using dict spread operator
- Google Drive documents now marked as GOOGLE_DRIVE_CONNECTOR
- Backward compatible - no changes to existing logic
- Simple and clean implementation
2025-12-28 18:01:39 +02:00
CREDO23
b2b891e4d7 fix(connectors): properly commit Google Drive document type changes
- Return file metadata from content_extractor for indexer to use
- Update document type and metadata in indexer after processing
- Explicitly commit changes to database
- Ensures documents are properly marked as GOOGLE_DRIVE_CONNECTOR type
2025-12-28 17:15:29 +02:00
CREDO23
9f1fd20944 feat(connectors): mark Google Drive documents with GOOGLE_DRIVE_CONNECTOR type
- Change document_type from file type (PDF, DOCX) to GOOGLE_DRIVE_CONNECTOR
- Store original file type in metadata for reference
- Add Google Drive specific metadata (file_id, mime_type, source)
- Include export format info for Google Workspace files
- Enables proper source tracking and bulk management
2025-12-28 16:55:14 +02:00
CREDO23
3e67d5f31e feat(connectors): add Google Drive delta sync with change tracking
- Get start page token for change tracking baseline
- Fetch incremental changes using Google Drive Changes API
- Categorize changes into added, modified, and removed files
- Enable efficient re-indexing of only changed content
2025-12-28 15:55:06 +02:00
CREDO23
84bde67979 feat(connectors): add Google Drive folder browsing and file listing
- List folder contents with full pagination support
- Query root folder or specific parent folder
- Return both folders and files with metadata (size, icons, links)
- Filter out shortcuts and trashed items
2025-12-28 15:54:58 +02:00
CREDO23
40304c6795 feat(connectors): add Google Drive content extraction using existing ETL
- Download files from Google Drive to temporary location
- Export Google Workspace files as PDF
- Delegate content extraction to existing process_file_in_background
- Reuse Surfsense's ETL services (Unstructured, LlamaCloud, Docling)
2025-12-28 15:54:50 +02:00
CREDO23
701c3409b3 feat(connectors): add Google Drive file type detection and mapping
- Detect Google Workspace files (Docs, Sheets, Slides)
- Map to PDF export format to preserve rich content (images, formatting)
- Identify files to skip (shortcuts, unsupported types)
2025-12-28 15:54:42 +02:00
CREDO23
74386affdc feat(connectors): add Google Drive API client wrapper
- Build and manage Google Drive service with credentials
- List files with query support and pagination
- Download binary files and export Google Workspace files as PDF
- Handle HTTP errors gracefully
2025-12-28 15:54:32 +02:00
CREDO23
2c8717b14b feat(connectors): add Google Drive credentials module for OAuth management
- Handle Google OAuth credential initialization and validation
- Automatic token refresh with database persistence
- Reuse existing tokens when valid
2025-12-28 15:54:26 +02:00