Commit graph

50 commits

Author SHA1 Message Date
DESKTOP-RTLN3BA\$punk
17b7348f61 feat: fixed and improved search and background task management. 2026-02-09 14:03:56 -08:00
DESKTOP-RTLN3BA\$punk
20a13df7e7 Merge branch 'dev' of https://github.com/MODSetter/SurfSense into dev 2026-02-06 14:02:51 -08:00
DESKTOP-RTLN3BA\$punk
cdc217dbe2 feat: update YouTube transcript fetching to select primary language transcripts 2026-02-06 14:02:46 -08:00
Anish Sarkar
72205ce11b feat: implement Redis heartbeat mechanism for document processing tasks and enhance stale notification cleanup 2026-02-06 18:09:05 +05:30
Anish Sarkar
0fdd194d92 Merge remote-tracking branch 'upstream/dev' into fix/documents 2026-02-06 12:13:26 +05:30
DESKTOP-RTLN3BA\$punk
1511c26ef5 feat: add residential proxy configuration for web crawling and YouTube transcript fetching 2026-02-05 20:44:13 -08:00
Anish Sarkar
aa66928154 chore: ran linting 2026-02-06 05:35:15 +05:30
Anish Sarkar
ed2fc5c636 feat: enhance document upload process with two-phase indexing and real-time status updates 2026-02-06 05:15:47 +05:30
Anish Sarkar
cc1e796c12 feat: implement two-phase document indexing for webcrawler and YouTube video processors with real-time status updates 2026-02-06 04:54:50 +05:30
Anish Sarkar
629f6f9cf5 feat: implement two-phase document indexing for Obsidian and Circleback connectors with real-time status updates 2026-02-06 04:35:13 +05:30
Anish Sarkar
c12401c1e8 feat: implement two-phase document indexing across Google connectors with real-time status updates 2026-02-06 02:24:35 +05:30
Anish Sarkar
bf08982029 feat: add connector_id to documents for source tracking and implement connector deletion task 2026-02-02 16:23:26 +05:30
Anish Sarkar
e0ade20e68 feat: add created_by_id column to documents for ownership tracking and update related connectors 2026-02-02 12:32:24 +05:30
CREDO23
949ec949f6 style(backend): run ruff format on 10 files 2026-01-28 22:20:02 +02:00
DESKTOP-RTLN3BA\$punk
b598cbeac3 feat(backend): Enhance LlamaCloud upload resilience with dynamic timeout calculations and increased retry settings 2026-01-27 17:50:45 -08:00
Anish Sarkar
e0be1b9133 chore: ran backend and frontend linting 2026-01-17 16:30:07 +05:30
Anish Sarkar
49efc50767 feat: enhance document processing with content hash deduplication
- Added support for content hash fallback in document migration to prevent duplicate entries from different sources.
- Improved existing document update logic to handle renaming and metadata updates more effectively, particularly for Google Drive files.
- Updated functions to check for existing documents with enhanced logging for better traceability of duplicate content detection.
2026-01-17 15:39:36 +05:30
Anish Sarkar
6550c378b2 feat: enhance Google Drive document handling and UI integration
- Implemented support for both new file_id-based and legacy filename-based hash schemes in document processing.
- Added functions to generate unique identifier hashes and find existing documents with migration support.
- Improved existing document update logic to handle content changes and metadata updates, particularly for Google Drive files.
- Enhanced UI components to display appropriate file icons based on file types in the Google Drive connector.
- Updated document processing functions to accommodate the new connector structure and ensure seamless integration.
2026-01-17 14:57:31 +05:30
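The dual hash scheme described in 6550c378b2 can be pictured roughly as follows: look up a document by the new file_id-based hash first, then fall back to the legacy filename-based hash and migrate the match to the new key. This is only a sketch under stated assumptions. The names `identifier_hash` and `find_existing`, the in-memory `store` dict, and the choice of SHA-256 are all hypothetical; the commit message does not say which hash function or storage layer is used.

```python
import hashlib


def identifier_hash(doc_type: str, key: str) -> str:
    """Hypothetical helper: hash a document type plus an identifying key."""
    raw = f"{doc_type}:{key}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()  # SHA-256 is an assumption


def find_existing(store: dict, doc_type: str, file_id: str, filename: str):
    """Return a stored document, preferring the file_id-based hash.

    `store` stands in for the documents table (hypothetical), keyed by hash.
    """
    new_hash = identifier_hash(doc_type, file_id)
    if new_hash in store:
        return store[new_hash]

    # Fall back to the legacy filename-based hash.
    legacy_hash = identifier_hash(doc_type, filename)
    doc = store.pop(legacy_hash, None)
    if doc is not None:
        # Migrate the entry so later lookups hit the new scheme directly.
        store[new_hash] = doc
    return doc
```

Keying on the file ID rather than the filename means a renamed file still resolves to the same document, which fits the commit's stated focus on Google Drive files.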
DESKTOP-RTLN3BA\$punk
8aad15d392 Reapply "Merge pull request #686 from AnishSarkar22/feat/replace-logs"
This reverts commit 3418c0e026.
2026-01-16 11:32:06 -08:00
DESKTOP-RTLN3BA\$punk
3418c0e026 Revert "Merge pull request #686 from AnishSarkar22/feat/replace-logs"
This reverts commit 5963a1125e, reversing changes made to 0d2a2f8ea1.
2026-01-16 00:49:33 -08:00
Anish Sarkar
ab63b23f0a Merge remote-tracking branch 'upstream/dev' into feat/replace-logs 2026-01-15 15:52:47 +05:30
DESKTOP-RTLN3BA\$punk
7ae68455b3 chore: linting 2026-01-15 00:05:53 -08:00
DESKTOP-RTLN3BA\$punk
bab89274e0 feat: implement LlamaCloud parsing with retry logic for transient errors 2026-01-14 23:53:17 -08:00
Anish Sarkar
5bd6bd3d67 chore: ran both frontend and backend linting 2026-01-14 02:05:40 +05:30
Anish Sarkar
12671ede0e feat: Enhance document processing notifications and refactor related services
- Introduced a new DocumentProcessingNotificationHandler to manage notifications for document processing stages.
- Updated existing notification methods to include detailed progress updates for various stages (queued, parsing, chunking, embedding, storing, completed, failed).
- Refactored NotificationService to support the new document processing notification type and metadata schema.
- Updated multiple document processing tasks to create and manage notifications throughout the processing lifecycle.
- Adjusted UI components to reflect changes in notification types and improve user experience during document uploads and processing.
2026-01-13 19:09:12 +05:30
DESKTOP-RTLN3BA\$punk
c19d300c9d feat: added circleback connector 2025-12-30 09:00:59 -08:00
CREDO23
7618662e70 refactor: rename GOOGLE_DRIVE_CONNECTOR to GOOGLE_DRIVE_FILE document type 2025-12-29 20:38:26 +02:00
CREDO23
27beac4f62 fix: Google Drive folder handling and connector page updates 2025-12-28 19:57:10 +02:00
CREDO23
a5935bc677 feat(connectors): add connector parameter to file processor for source tracking
- Add optional 'connector' parameter with 'type' and 'metadata' fields
- Create helper function _update_document_from_connector
- Use document_metadata column (not metadata) for JSON field
- Merge metadata with existing using dict spread operator
- Google Drive documents now marked as GOOGLE_DRIVE_CONNECTOR
- Backward compatible - no changes to existing logic
- Simple and clean implementation
2025-12-28 18:01:39 +02:00
DESKTOP-RTLN3BA\$punk
8c9aa68faa feat: update document tracking to use 'updated_at' timestamp instead of 'last_edited_at' 2025-12-12 01:32:14 -08:00
Anish Sarkar
5e53207edc refactor: update alembic migration revision ID and added some defaults for blocknote in file_processors.py file 2025-11-30 04:57:07 +05:30
Anish Sarkar
b98c312fb1 Merge remote-tracking branch 'upstream/main' into feature/blocknote-editor 2025-11-30 04:10:49 +05:30
Anish Sarkar
91bc344b56 feat: Added celery tasks to populate blocknote_document for existing documents 2025-11-30 03:49:43 +05:30
Anish Sarkar
3fac196c35 code quality issues fixed 2025-11-23 16:39:23 +05:30
Anish Sarkar
e68286f22e introduced blocknote editor 2025-11-23 15:23:31 +05:30
samkul-swe
8333697598 Removed the CRAWLED_URL document processors 2025-11-21 23:27:21 -08:00
DESKTOP-RTLN3BA\$punk
9466bf595c feat: Implement LLM configuration validation in create and update routes
- Added `validate_llm_config` function to `llm_service.py` for validating LLM configurations via test API calls.
- Integrated validation in `create_llm_config` and `update_llm_config` routes in `llm_config_routes.py`, raising HTTP exceptions for invalid configurations.
- Enhanced error handling to provide detailed feedback on configuration issues.
2025-11-05 12:15:05 -08:00
samkul-swe
e49c455c01 Making async 2025-11-04 15:27:57 -08:00
samkul-swe
b03365cded Add web crawling 2025-11-04 13:05:09 -08:00
Chirag
b3026e4412 fix: resolve ruff F823 error by importing getLogger and ERROR directly 2025-11-02 12:03:03 +05:30
Chirag
094bdfad45 fix: suppress pdfminer warnings to prevent upload halting
- Added warning suppression for pdfminer warnings during Docling PDF processing
- Suppresses 'Cannot set gray non-stroke color' warnings that cause uploads to halt
- Temporarily sets pdfminer logger to ERROR level during document processing
- Fixes issue where files ~34MB would fail due to pdfminer warning spam

Resolves issue where PDF uploads would halt with repeated pdfminer warnings
2025-11-01 17:42:23 +05:30
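The temporary logger-level change described in 094bdfad45 can be sketched as a small context manager. This is an illustration, not the code from the commit: the helper name `quiet_logger` is hypothetical, and it assumes the warnings are emitted under the `pdfminer` logger name.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logger(name: str, level: int = logging.ERROR):
    """Temporarily raise a logger's level and restore the old level on exit."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the previous level even if processing raised.
        logger.setLevel(previous)


# Usage: wrap only the PDF conversion step (placeholder body here).
with quiet_logger("pdfminer"):
    pass  # run the document conversion here
```

Restoring the level in `finally` keeps the suppression scoped to document processing, so pdfminer warnings elsewhere remain visible.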
DESKTOP-RTLN3BA\$punk
4be9d099bf feat: added file limit tracking for a user 2025-10-30 14:58:08 -07:00
DESKTOP-RTLN3BA\$punk
18adf79649 feat(fix): document type filtering 2025-10-21 21:53:55 -07:00
DESKTOP-RTLN3BA\$punk
c99cd710ea feat: add unique identifier hash for documents to prevent duplicates across various connectors 2025-10-14 21:11:19 -07:00
DESKTOP-RTLN3BA\$punk
633ea3ac0f feat: moved LLMConfigs from User to SearchSpaces
- RBAC soon??
- Updated various services and routes to handle search space-specific LLM preferences.
- Modified frontend components to pass search space ID for LLM configuration management.
- Removed onboarding page and settings page as part of the refactor.
2025-10-10 00:50:29 -07:00
sandeeppainuly
7bb8e77ee1 Update transcript processing to use new API object attributes 2025-09-28 12:09:17 +02:00
sandeeppainuly
c08508c0c4 Fix YouTube transcript API: replace deprecated get_transcript with fetch method 2025-09-28 12:03:36 +02:00
DESKTOP-RTLN3BA\$punk
1c4c61eb04 feat: Fixed Document Summary Content across connectors and processors 2025-08-18 20:51:48 -07:00
DESKTOP-RTLN3BA\$punk
54374bd7be ruff format 2025-08-12 15:33:17 -07:00
DESKTOP-RTLN3BA\$punk
5aa52375c3 refactor: refactored background_tasks & indexing_tasks 2025-08-12 15:28:13 -07:00