SurfSense/surfsense_backend/app/tasks/document_processors
Anish Sarkar 49efc50767 feat: enhance document processing with content hash deduplication
- Added support for content hash fallback in document migration to prevent duplicate entries from different sources.
- Improved existing document update logic to handle renaming and metadata updates more effectively, particularly for Google Drive files.
- Updated functions to check for existing documents with enhanced logging for better traceability of duplicate content detection.
2026-01-17 15:39:36 +05:30
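The commit notes above describe a content hash fallback for duplicate detection plus an in-place update path for renamed files. A minimal sketch of that idea follows. It is not taken from the SurfSense code: `Document`, `InMemoryStore`, `upsert_document`, and the field names are hypothetical, and an in-memory dict stands in for the database.

```python
import hashlib
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("dedup_sketch")


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Document:  # hypothetical model, not the project's schema
    title: str
    content: str
    identifier_hash: str
    content_hash: str
    metadata: dict = field(default_factory=dict)


class InMemoryStore:  # hypothetical stand-in for the database
    def __init__(self) -> None:
        self.by_identifier: dict[str, Document] = {}
        self.by_content: dict[str, Document] = {}


def upsert_document(store, source, source_id, title, content, metadata=None):
    """Create, update, or skip a document; returns (document, status)."""
    identifier_hash = sha256_hex(f"{source}:{source_id}")
    content_hash = sha256_hex(content)

    # 1. Primary lookup: the same item from the same source.
    existing = store.by_identifier.get(identifier_hash)
    if existing is not None:
        if existing.content_hash != content_hash:
            # Content changed: re-index the document under its new hash.
            if store.by_content.get(existing.content_hash) is existing:
                del store.by_content[existing.content_hash]
            existing.content = content
            existing.content_hash = content_hash
            store.by_content.setdefault(content_hash, existing)
        # Renames and metadata changes are applied in place.
        existing.title = title
        existing.metadata.update(metadata or {})
        return existing, "updated"

    # 2. Fallback lookup: identical content that arrived from another source.
    duplicate = store.by_content.get(content_hash)
    if duplicate is not None:
        logger.info(
            "Duplicate content from %s:%s matches existing document %r",
            source, source_id, duplicate.title,
        )
        return duplicate, "duplicate"

    # 3. No match on either hash: store a new document.
    doc = Document(title, content, identifier_hash, content_hash, dict(metadata or {}))
    store.by_identifier[identifier_hash] = doc
    store.by_content[content_hash] = doc
    return doc, "created"
```

In this sketch, uploading the same text through a second source returns the existing document with status `"duplicate"`, while re-syncing the original item under a new title updates it rather than creating a second entry.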
__init__.py Removed the CRAWLED_URL document processors 2025-11-21 23:27:21 -08:00
base.py feat: update document tracking to use 'updated_at' timestamp instead of 'last_edited_at' 2025-12-12 01:32:14 -08:00
circleback_processor.py feat: added circleback connector 2025-12-30 09:00:59 -08:00
extension_processor.py feat: update document tracking to use 'updated_at' timestamp instead of 'last_edited_at' 2025-12-12 01:32:14 -08:00
file_processors.py feat: enhance document processing with content hash deduplication 2026-01-17 15:39:36 +05:30
markdown_processor.py feat: enhance document processing with content hash deduplication 2026-01-17 15:39:36 +05:30
youtube_processor.py feat: update document tracking to use 'updated_at' timestamp instead of 'last_edited_at' 2025-12-12 01:32:14 -08:00