SurfSense

mirror of https://github.com/MODSetter/SurfSense.git synced 2026-05-01 20:03:30 +02:00

Author	SHA1	Message	Date
CREDO23	9c78726b6b	feat: add file selection to Google Drive connector - Add structured request body with folders and files arrays - Support individual file indexing alongside folder indexing - Remove deprecated folder_ids/folder_names query params - Update UI to allow selecting both folders and files	2025-12-31 14:15:07 +02:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	c19d300c9d	feat: added circleback connector	2025-12-30 09:00:59 -08:00
CREDO23	7618662e70	refactor: rename GOOGLE_DRIVE_CONNECTOR to GOOGLE_DRIVE_FILE document type	2025-12-29 20:38:26 +02:00
CREDO23	16bc991b13	feat: add Google Drive connector to knowledge base search	2025-12-29 18:13:27 +02:00
CREDO23	c5c61a2c6b	Merge branch 'dev' into google-drive-connector Merge in dev	2025-12-28 19:00:09 +02:00
CREDO23	acf47e3b0c	refactor(connectors): remove verbose docstrings and obvious comments - Simplify module docstrings (remove meta-commentary about 'small focused modules') - Remove redundant inline comments (e.g., 'Log task start', 'Get connector from database') - Trim verbose function docstrings to essential information only - Remove over-explanatory comments that restate what code does - Keep necessary documentation, remove noise for better readability	2025-12-28 18:53:13 +02:00
CREDO23	a5935bc677	feat(connectors): add connector parameter to file processor for source tracking - Add optional 'connector' parameter with 'type' and 'metadata' fields - Create helper function _update_document_from_connector - Use document_metadata column (not metadata) for JSON field - Merge metadata with existing using dict spread operator - Google Drive documents now marked as GOOGLE_DRIVE_CONNECTOR - Backward compatible - no changes to existing logic - Simple and clean implementation	2025-12-28 18:01:39 +02:00
CREDO23	b2b891e4d7	fix(connectors): properly commit Google Drive document type changes - Return file metadata from content_extractor for indexer to use - Update document type and metadata in indexer after processing - Explicitly commit changes to database - Ensures documents are properly marked as GOOGLE_DRIVE_CONNECTOR type	2025-12-28 17:15:29 +02:00
CREDO23	9f1fd20944	feat(connectors): mark Google Drive documents with GOOGLE_DRIVE_CONNECTOR type - Change document_type from file type (PDF, DOCX) to GOOGLE_DRIVE_CONNECTOR - Store original file type in metadata for reference - Add Google Drive specific metadata (file_id, mime_type, source) - Include export format info for Google Workspace files - Enables proper source tracking and bulk management	2025-12-28 16:55:14 +02:00
CREDO23	3e67d5f31e	feat(connectors): add Google Drive delta sync with change tracking - Get start page token for change tracking baseline - Fetch incremental changes using Google Drive Changes API - Categorize changes into added, modified, and removed files - Enable efficient re-indexing of only changed content	2025-12-28 15:55:06 +02:00
CREDO23	84bde67979	feat(connectors): add Google Drive folder browsing and file listing - List folder contents with full pagination support - Query root folder or specific parent folder - Return both folders and files with metadata (size, icons, links) - Filter out shortcuts and trashed items	2025-12-28 15:54:58 +02:00
CREDO23	40304c6795	feat(connectors): add Google Drive content extraction using existing ETL - Download files from Google Drive to temporary location - Export Google Workspace files as PDF - Delegate content extraction to existing process_file_in_background - Reuse SurfSense's ETL services (Unstructured, LlamaCloud, Docling)	2025-12-28 15:54:50 +02:00
CREDO23	701c3409b3	feat(connectors): add Google Drive file type detection and mapping - Detect Google Workspace files (Docs, Sheets, Slides) - Map to PDF export format to preserve rich content (images, formatting) - Identify files to skip (shortcuts, unsupported types)	2025-12-28 15:54:42 +02:00
CREDO23	74386affdc	feat(connectors): add Google Drive API client wrapper - Build and manage Google Drive service with credentials - List files with query support and pagination - Download binary files and export Google Workspace files as PDF - Handle HTTP errors gracefully	2025-12-28 15:54:32 +02:00
CREDO23	2c8717b14b	feat(connectors): add Google Drive credentials module for OAuth management - Handle Google OAuth credential initialization and validation - Automatic token refresh with database persistence - Reuse existing tokens when valid	2025-12-28 15:54:26 +02:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	9d0721de43	feat: Replace AsyncChromiumLoader with Playwright for web crawling and content extraction in link preview and web crawler connector modules.	2025-12-27 13:58:00 -08:00
Anish Sarkar	ebc04f590e	refactor: improve write_todos tool and UI components - Refactored the write_todos tool to enhance argument and result schemas using Zod for better validation and type safety. - Updated the WriteTodosToolUI to streamline the rendering logic and improve loading states, ensuring a smoother user experience. - Enhanced the Plan and TodoItem components to better handle streaming states and display progress, providing clearer feedback during task management. - Cleaned up code formatting and structure for improved readability and maintainability.	2025-12-26 17:49:56 +05:30
Anish Sarkar	d9df63f57e	refactor: enhance web crawling functionality with Firecrawl integration - Updated WebCrawlerConnector to prioritize Firecrawl API for crawling if an API key is provided, falling back to Chromium if Firecrawl fails. - Improved error handling to log failures from both Firecrawl and Chromium. - Enhanced link preview tool to use a random User-Agent for better compatibility with web servers. - Passed Firecrawl API key to the stream_new_chat function for improved configuration management.	2025-12-26 02:37:20 +05:30
CREDO23	64cd65bc1f	use trafilatura to extract page content from the chromium result	2025-12-19 10:05:51 +02:00
CREDO23	1f60d1c22f	add user agent to AsyncChromiumLoader	2025-12-17 19:43:54 +02:00
CREDO23	4cfeffb38a	refactor: update the webcrawler connector formatter	2025-12-17 18:42:37 +02:00
Differ	e238fab638	Merge remote-tracking branch 'upstream/main' into feat/bookstack-connector	2025-12-06 09:15:02 +08:00
Differ	500bc60d02	fix: add input validation, retry limit, code formatting, and exclude i18n from secret detection	2025-12-05 09:58:49 +08:00
CREDO23	803f792a9d	clean up	2025-12-04 12:55:19 +02:00
CREDO23	521cea3ef0	update query params for get issues by date range method	2025-12-04 12:53:18 +02:00
Differ	6b1b8d0f2e	feat: add BookStack connector for wiki documentation indexing	2025-12-04 14:08:44 +08:00
CREDO23	107f013ff9	jira-connector: update get_issues_by_date_range method	2025-12-04 01:21:46 +02:00
CREDO23	abf017eabb	jira-connector: update get_issues_by_date_range method	2025-12-04 00:48:54 +02:00
CREDO23	4df6b09db9	jira-connector: update get all issues method	2025-12-04 00:42:10 +02:00
CREDO23	875924e5fd	jira-connector: update make_api_request to accept POST with payload	2025-12-04 00:38:13 +02:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	0b1ca97acf	refactor(webcrawler): update scraping logic to use v2 API and improve error handling	2025-11-26 14:30:08 -08:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	8f30cfd69a	chore(lint): ruff checks	2025-11-26 13:22:31 -08:00
samkul-swe	6d19e0fad8	Fixing search logic	2025-11-22 13:33:16 -08:00
samkul-swe	896e410e2a	Webcrawler connector draft	2025-11-21 23:27:21 -08:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	a3a5b13f48	chore: linting	2025-11-03 16:00:58 -08:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	e65d74f2e2	refactor: added batch commits and Increased task time limits in celery_app.py - Increased task time limits in celery_app.py for longer processing times. - Enhanced pagination logic in NotionHistoryConnector to handle large result sets. - Implemented batch commits every 10 documents across various indexers (Airtable, ClickUp, Confluence, Discord, GitHub, Google Calendar, Gmail, JIRA, Linear, Luma, Notion, Slack) to improve performance and reduce database load. - Updated final commit logging for clarity on total documents processed.	2025-11-03 15:57:19 -08:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	0e6669ac4e	fix: celery_app path and gmail indexing	2025-10-21 21:11:41 -07:00
Anish Sarkar	bbb2abfc02	fix: ran formatter as per coderabbitai	2025-10-17 02:44:44 +05:30
Anish Sarkar	0ff1b586a2	feat: update Elasticsearch integration and logging - revised Elasticsearch connector enum revision IDs - added `TaskLoggingService` to elasticsearch_indexer - integrated Elasticsearch into prompts.py as requested	2025-10-17 02:21:56 +05:30
Anish Sarkar	929035f802	Merge remote-tracking branch 'upstream/main' into feature/elasticsearch-connector	2025-10-16 16:24:37 +05:30
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	31982cea9a	chore: removed content trunking for better UI	2025-10-14 14:19:48 -07:00
Anish Sarkar	55d752e3c8	feat: added elasticsearch connector	2025-10-12 09:39:04 +05:30
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	aea09a5dad	feat: Moved searchconnectors association from user to searchspace - Need to move llm configs to searchspace	2025-10-08 21:13:01 -07:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	94367e4226	chore: linting and formatting	2025-09-28 22:26:26 -07:00
samkul-swe	9d2b808e66	Added Luma connector	2025-09-28 14:59:10 -07:00
Rohan Verma	662212d4e2	Merge pull request #295 from CREDO23/feature/airtable-connector [Feature] Add Airtable connector	2025-09-03 12:49:14 -07:00
Rohan Verma	c2030cec48	Merge pull request #275 from CREDO23/improvement/persist-refreshed-token-in-google-related-connector [Improvement] Google connectors | Update the connector config after refreshing the token	2025-08-26 18:47:36 -07:00
CREDO23	45d2c18c16	update airtable indexer	2025-08-26 19:17:46 +02:00
CREDO23	c4b7c45d6d	Add Airtable connector	2025-08-26 15:41:24 +02:00
CREDO23	ecbb1f27e0	clean up	2025-08-26 11:53:27 +02:00

93 commits