Commit graph

93 commits

Author SHA1 Message Date
CREDO23
9c78726b6b feat: add file selection to Google Drive connector
- Add structured request body with folders and files arrays
- Support individual file indexing alongside folder indexing
- Remove deprecated folder_ids/folder_names query params
- Update UI to allow selecting both folders and files
2025-12-31 14:15:07 +02:00
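The structured request body described in this commit could look roughly like the sketch below. The class and field names (`DriveItem`, `DriveIndexRequest`, `id`, `name`) are hypothetical and chosen for illustration; the commit message only states that the body carries `folders` and `files` arrays.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DriveItem:
    """One selected Drive entry (hypothetical shape: id plus display name)."""
    id: str
    name: str


@dataclass
class DriveIndexRequest:
    """Request body with separate arrays for folders and individual files."""
    folders: list[DriveItem] = field(default_factory=list)
    files: list[DriveItem] = field(default_factory=list)

    @classmethod
    def from_dict(cls, payload: dict) -> "DriveIndexRequest":
        # Missing keys default to empty lists so either array may be omitted.
        return cls(
            folders=[DriveItem(**item) for item in payload.get("folders", [])],
            files=[DriveItem(**item) for item in payload.get("files", [])],
        )

    def is_empty(self) -> bool:
        # True when nothing was selected for indexing.
        return not self.folders and not self.files
```

Keeping folders and files in separate arrays lets a caller index a single file without having to select its parent folder.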
DESKTOP-RTLN3BA\$punk
c19d300c9d feat: added circleback connector 2025-12-30 09:00:59 -08:00
CREDO23
7618662e70 refactor: rename GOOGLE_DRIVE_CONNECTOR to GOOGLE_DRIVE_FILE document type 2025-12-29 20:38:26 +02:00
CREDO23
16bc991b13 feat: add Google Drive connector to knowledge base search 2025-12-29 18:13:27 +02:00
CREDO23
c5c61a2c6b Merge branch 'dev' into google-drive-connector
Merge in dev
2025-12-28 19:00:09 +02:00
CREDO23
acf47e3b0c refactor(connectors): remove verbose docstrings and obvious comments
- Simplify module docstrings (remove meta-commentary about 'small focused modules')
- Remove redundant inline comments (e.g., 'Log task start', 'Get connector from database')
- Trim verbose function docstrings to essential information only
- Remove over-explanatory comments that restate what code does
- Keep necessary documentation, remove noise for better readability
2025-12-28 18:53:13 +02:00
CREDO23
a5935bc677 feat(connectors): add connector parameter to file processor for source tracking
- Add optional 'connector' parameter with 'type' and 'metadata' fields
- Create helper function _update_document_from_connector
- Use document_metadata column (not metadata) for JSON field
- Merge metadata with existing using dict spread operator
- Google Drive documents now marked as GOOGLE_DRIVE_CONNECTOR
- Backward compatible - no changes to existing logic
- Simple and clean implementation
2025-12-28 18:01:39 +02:00
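The metadata merge mentioned in this commit can be sketched with Python dict unpacking. The function name `merge_document_metadata` is hypothetical; the actual helper named in the commit is `_update_document_from_connector`, whose body is not shown here.

```python
from typing import Optional


def merge_document_metadata(existing: Optional[dict], incoming: dict) -> dict:
    """Return a new dict where connector metadata overrides existing keys.

    Neither input is mutated; a missing (None) existing value is treated as empty.
    """
    return {**(existing or {}), **incoming}
```

Because the incoming dict is unpacked last, its values win on key collisions while untouched keys from the existing metadata are preserved.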
CREDO23
b2b891e4d7 fix(connectors): properly commit Google Drive document type changes
- Return file metadata from content_extractor for indexer to use
- Update document type and metadata in indexer after processing
- Explicitly commit changes to database
- Ensures documents are properly marked as GOOGLE_DRIVE_CONNECTOR type
2025-12-28 17:15:29 +02:00
CREDO23
9f1fd20944 feat(connectors): mark Google Drive documents with GOOGLE_DRIVE_CONNECTOR type
- Change document_type from file type (PDF, DOCX) to GOOGLE_DRIVE_CONNECTOR
- Store original file type in metadata for reference
- Add Google Drive specific metadata (file_id, mime_type, source)
- Include export format info for Google Workspace files
- Enables proper source tracking and bulk management
2025-12-28 16:55:14 +02:00
CREDO23
3e67d5f31e feat(connectors): add Google Drive delta sync with change tracking
- Get start page token for change tracking baseline
- Fetch incremental changes using Google Drive Changes API
- Categorize changes into added, modified, and removed files
- Enable efficient re-indexing of only changed content
2025-12-28 15:55:06 +02:00
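A minimal sketch of the categorization step in this commit, assuming each change record is a dict shaped like a Drive v3 change entry (`fileId`, `removed`, and a nested `file` with `trashed`). The function name and the `indexed_ids` parameter are hypothetical; how the connector actually tracks already-indexed files is not stated in the commit.

```python
def categorize_changes(changes: list, indexed_ids: set) -> dict:
    """Split change records into added, modified, and removed file IDs."""
    result = {"added": [], "modified": [], "removed": []}
    for change in changes:
        file_id = change["fileId"]
        file_info = change.get("file") or {}
        if change.get("removed") or file_info.get("trashed"):
            # Deleted or trashed files should be dropped from the index.
            result["removed"].append(file_id)
        elif file_id in indexed_ids:
            # Already indexed, so the change is an update.
            result["modified"].append(file_id)
        else:
            # Not seen before, so it is a new file.
            result["added"].append(file_id)
    return result
```

Only the files listed under `added` and `modified` need to be re-indexed, which is what makes the sync incremental.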
CREDO23
84bde67979 feat(connectors): add Google Drive folder browsing and file listing
- List folder contents with full pagination support
- Query root folder or specific parent folder
- Return both folders and files with metadata (size, icons, links)
- Filter out shortcuts and trashed items
2025-12-28 15:54:58 +02:00
CREDO23
40304c6795 feat(connectors): add Google Drive content extraction using existing ETL
- Download files from Google Drive to temporary location
- Export Google Workspace files as PDF
- Delegate content extraction to existing process_file_in_background
- Reuse Surfsense's ETL services (Unstructured, LlamaCloud, Docling)
2025-12-28 15:54:50 +02:00
CREDO23
701c3409b3 feat(connectors): add Google Drive file type detection and mapping
- Detect Google Workspace files (Docs, Sheets, Slides)
- Map to PDF export format to preserve rich content (images, formatting)
- Identify files to skip (shortcuts, unsupported types)
2025-12-28 15:54:42 +02:00
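The detection and mapping in this commit might be sketched as follows. The MIME type strings are the standard Google Workspace ones; the constant and function names are hypothetical, and treating every other `application/vnd.google-apps.*` type as skippable is an assumption about what "unsupported types" covers.

```python
from typing import Optional

PDF_MIME = "application/pdf"
GOOGLE_APPS_PREFIX = "application/vnd.google-apps."

# Google Workspace types that are exported rather than downloaded directly.
EXPORTABLE_TYPES = {
    "application/vnd.google-apps.document",      # Docs
    "application/vnd.google-apps.spreadsheet",   # Sheets
    "application/vnd.google-apps.presentation",  # Slides
}


def export_format(mime_type: str) -> Optional[str]:
    """Return the export MIME type for Workspace files, or None for binary files."""
    return PDF_MIME if mime_type in EXPORTABLE_TYPES else None


def should_skip(mime_type: str) -> bool:
    """Skip shortcuts and any other Google-native type that cannot be exported."""
    return mime_type.startswith(GOOGLE_APPS_PREFIX) and mime_type not in EXPORTABLE_TYPES
```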
CREDO23
74386affdc feat(connectors): add Google Drive API client wrapper
- Build and manage Google Drive service with credentials
- List files with query support and pagination
- Download binary files and export Google Workspace files as PDF
- Handle HTTP errors gracefully
2025-12-28 15:54:32 +02:00
CREDO23
2c8717b14b feat(connectors): add Google Drive credentials module for OAuth management
- Handle Google OAuth credential initialization and validation
- Automatic token refresh with database persistence
- Reuse existing tokens when valid
2025-12-28 15:54:26 +02:00
DESKTOP-RTLN3BA\$punk
9d0721de43 feat: Replace AsyncChromiumLoader with Playwright for web crawling and content extraction in link preview and web crawler connector modules. 2025-12-27 13:58:00 -08:00
Anish Sarkar
ebc04f590e refactor: improve write_todos tool and UI components
- Refactored the write_todos tool to enhance argument and result schemas using Zod for better validation and type safety.
- Updated the WriteTodosToolUI to streamline the rendering logic and improve loading states, ensuring a smoother user experience.
- Enhanced the Plan and TodoItem components to better handle streaming states and display progress, providing clearer feedback during task management.
- Cleaned up code formatting and structure for improved readability and maintainability.
2025-12-26 17:49:56 +05:30
Anish Sarkar
d9df63f57e refactor: enhance web crawling functionality with Firecrawl integration
- Updated WebCrawlerConnector to prioritize Firecrawl API for crawling if an API key is provided, falling back to Chromium if Firecrawl fails.
- Improved error handling to log failures from both Firecrawl and Chromium.
- Enhanced link preview tool to use a random User-Agent for better compatibility with web servers.
- Passed Firecrawl API key to the stream_new_chat function for improved configuration management.
2025-12-26 02:37:20 +05:30
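The prioritize-then-fall-back behaviour described in this commit follows a common pattern, sketched below. The function name and the `primary`/`fallback` callables are hypothetical stand-ins for the Firecrawl and Chromium crawlers; no real Firecrawl or Chromium API is called here.

```python
from typing import Callable, Optional


def crawl_with_fallback(
    url: str,
    primary: Callable[[str, str], str],
    fallback: Callable[[str], str],
    api_key: Optional[str] = None,
) -> str:
    """Try the primary crawler when an API key is set, else use the fallback."""
    errors = []
    if api_key:
        try:
            return primary(url, api_key)
        except Exception as exc:  # record the failure and fall through
            errors.append(f"primary failed: {exc}")
    try:
        return fallback(url)
    except Exception as exc:
        errors.append(f"fallback failed: {exc}")
        # Surface failures from both crawlers in a single error.
        raise RuntimeError("; ".join(errors)) from exc
```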
CREDO23
64cd65bc1f use trafilatura to extract page content from the chromium result 2025-12-19 10:05:51 +02:00
CREDO23
1f60d1c22f add user agent to AsyncChromiumLoader 2025-12-17 19:43:54 +02:00
CREDO23
4cfeffb38a refactor: update the webcrawler connector formatter 2025-12-17 18:42:37 +02:00
Differ
e238fab638 Merge remote-tracking branch 'upstream/main' into feat/bookstack-connector 2025-12-06 09:15:02 +08:00
Differ
500bc60d02 fix: add input validation, retry limit, code formatting, and exclude i18n from secret detection 2025-12-05 09:58:49 +08:00
CREDO23
803f792a9d clean up 2025-12-04 12:55:19 +02:00
CREDO23
521cea3ef0 update query params for get issues by date range method 2025-12-04 12:53:18 +02:00
Differ
6b1b8d0f2e feat: add BookStack connector for wiki documentation indexing 2025-12-04 14:08:44 +08:00
CREDO23
107f013ff9 jira-connector: update get_issues_by_date_range method 2025-12-04 01:21:46 +02:00
CREDO23
abf017eabb jira-connector: update get_issues_by_date_range method 2025-12-04 00:48:54 +02:00
CREDO23
4df6b09db9 jira-connector: update get all issues method 2025-12-04 00:42:10 +02:00
CREDO23
875924e5fd jira-connector: update make_api_request to accept POST with payload 2025-12-04 00:38:13 +02:00
DESKTOP-RTLN3BA\$punk
0b1ca97acf refactor(webcrawler): update scraping logic to use v2 API and improve error handling 2025-11-26 14:30:08 -08:00
DESKTOP-RTLN3BA\$punk
8f30cfd69a chore(lint): ruff checks 2025-11-26 13:22:31 -08:00
samkul-swe
6d19e0fad8 Fixing search logic 2025-11-22 13:33:16 -08:00
samkul-swe
896e410e2a Webcrawler connector draft 2025-11-21 23:27:21 -08:00
DESKTOP-RTLN3BA\$punk
a3a5b13f48 chore: linting 2025-11-03 16:00:58 -08:00
DESKTOP-RTLN3BA\$punk
e65d74f2e2 refactor: added batch commits and increased task time limits in celery_app.py
- Increased task time limits in celery_app.py for longer processing times.
- Enhanced pagination logic in NotionHistoryConnector to handle large result sets.
- Implemented batch commits every 10 documents across various indexers (Airtable, ClickUp, Confluence, Discord, GitHub, Google Calendar, Gmail, JIRA, Linear, Luma, Notion, Slack) to improve performance and reduce database load.
- Updated final commit logging for clarity on total documents processed.
2025-11-03 15:57:19 -08:00
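The batch-commit pattern from this commit can be sketched as below. `InMemorySession` is a hypothetical stand-in for a database session so the example runs on its own; `index_documents` and `BATCH_SIZE` are likewise illustrative names, with the value 10 taken from the commit message.

```python
BATCH_SIZE = 10  # commit every 10 documents, as stated in the commit message


class InMemorySession:
    """Hypothetical stand-in for a database session; counts commits."""

    def __init__(self):
        self.pending = []
        self.commits = 0

    def add(self, doc):
        self.pending.append(doc)

    def commit(self):
        self.pending.clear()
        self.commits += 1


def index_documents(session, documents, batch_size=BATCH_SIZE):
    """Add documents one by one, committing once per full batch."""
    processed = 0
    for doc in documents:
        session.add(doc)
        processed += 1
        if processed % batch_size == 0:
            session.commit()
    if processed % batch_size != 0:
        # Final commit for the leftover partial batch.
        session.commit()
    return processed
```

Committing per batch rather than per document reduces database round trips, while still bounding how much work is lost if a long-running task fails partway through.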
DESKTOP-RTLN3BA\$punk
0e6669ac4e fix: celery_app path and gmail indexing 2025-10-21 21:11:41 -07:00
Anish Sarkar
bbb2abfc02 fix: ran formatter as per coderabbitai 2025-10-17 02:44:44 +05:30
Anish Sarkar
0ff1b586a2 feat: update Elasticsearch integration and logging
- revised Elasticsearch connector enum revision IDs
- added `TaskLoggingService` to elasticsearch_indexer
- integrated Elasticsearch into prompts.py as requested
2025-10-17 02:21:56 +05:30
Anish Sarkar
929035f802 Merge remote-tracking branch 'upstream/main' into feature/elasticsearch-connector 2025-10-16 16:24:37 +05:30
DESKTOP-RTLN3BA\$punk
31982cea9a chore: removed content trunking for better UI 2025-10-14 14:19:48 -07:00
Anish Sarkar
55d752e3c8 feat: added elasticsearch connector 2025-10-12 09:39:04 +05:30
DESKTOP-RTLN3BA\$punk
aea09a5dad feat: Moved searchconnectors association from user to searchspace
- Need to move llm configs to searchspace
2025-10-08 21:13:01 -07:00
DESKTOP-RTLN3BA\$punk
94367e4226 chore: linting and formatting 2025-09-28 22:26:26 -07:00
samkul-swe
9d2b808e66 Added Luma connector 2025-09-28 14:59:10 -07:00
Rohan Verma
662212d4e2
Merge pull request #295 from CREDO23/feature/airtable-connector
[Feature] Add Airtable connector
2025-09-03 12:49:14 -07:00
Rohan Verma
c2030cec48
Merge pull request #275 from CREDO23/improvement/persist-refreshed-token-in-google-related-connector
[Improvement] Google connectors | Update the connector config after refreshing the token
2025-08-26 18:47:36 -07:00
CREDO23
45d2c18c16 update airtable indexer 2025-08-26 19:17:46 +02:00
CREDO23
c4b7c45d6d Add Airtable connector 2025-08-26 15:41:24 +02:00
CREDO23
ecbb1f27e0 clean up 2025-08-26 11:53:27 +02:00