SurfSense/surfsense_backend/app/tasks/document_processors
Anish Sarkar 3da0ffd683 feat: add native Excel parsing and improve Google Drive content extraction
- Introduced a new utility for parsing .xlsx files into markdown format, enhancing the ability to process Excel documents natively.
- Updated the Google Drive content extractor to utilize the new Excel parsing functionality, allowing for better handling of spreadsheet files.
- Enhanced file type detection and export logic to support various document formats, improving overall content extraction accuracy.
- Added unit tests to ensure the correctness of the new Excel parsing feature and its integration with existing content extraction workflows.
2026-03-27 21:47:14 +05:30
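The latest commit describes parsing `.xlsx` files into Markdown. The utility itself is not shown in this listing, so what follows is only a minimal sketch of the rows-to-Markdown step. It assumes the cell values have already been read into rows, for example with openpyxl's `load_workbook(path, read_only=True, data_only=True)` and `ws.iter_rows(values_only=True)`. The names `rows_to_markdown` and `_cell_to_text` are hypothetical, and treating the first non-empty row as the header is an illustrative choice. Neither reflects the repository's actual API.

```python
from typing import Any, Iterable, Optional, Sequence


def _cell_to_text(value: Any) -> str:
    """Render one cell as Markdown-safe text (None becomes an empty string)."""
    if value is None:
        return ""
    text = str(value)
    # Escape pipes so they do not split a table column, and flatten newlines.
    return text.replace("|", "\\|").replace("\r\n", " ").replace("\n", " ").strip()


def rows_to_markdown(
    rows: Iterable[Sequence[Any]], sheet_name: Optional[str] = None
) -> str:
    """Convert rows of cell values into a GitHub-flavored Markdown table.

    Hypothetical helper: the first non-empty row is used as the header, and
    shorter rows are padded so every row has the same number of columns.
    """
    # Skip rows in which every cell is empty.
    cleaned = [
        [_cell_to_text(v) for v in row]
        for row in rows
        if any(v is not None and str(v).strip() for v in row)
    ]
    if not cleaned:
        return ""

    width = max(len(r) for r in cleaned)
    padded = [r + [""] * (width - len(r)) for r in cleaned]

    header, *body = padded
    lines = []
    if sheet_name:
        # Optional per-sheet heading so multi-sheet workbooks stay readable.
        lines.append(f"## {sheet_name}")
        lines.append("")
    lines.append("| " + " | ".join(header) + " |")
    lines.append("|" + "|".join([" --- "] * width) + "|")
    for r in body:
        lines.append("| " + " | ".join(r) + " |")
    return "\n".join(lines)


# Possible usage with openpyxl (not executed here):
# from openpyxl import load_workbook
# wb = load_workbook("report.xlsx", read_only=True, data_only=True)
# markdown = "\n\n".join(
#     rows_to_markdown(ws.iter_rows(values_only=True), ws.title)
#     for ws in wb.worksheets
# )
```

Opening the workbook with `data_only=True` makes openpyxl return the cached results of formulas rather than the formula strings, which is usually what a text-extraction pipeline wants.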
__init__.py Removed the CRAWLED_URL document processors 2025-11-21 23:27:21 -08:00
base.py chore: ran linting 2026-03-17 04:40:46 +05:30
circleback_processor.py refactor: update safe_set_chunks function to be asynchronous and modify all connector and document processor files to use the new async implementation 2026-03-15 00:44:27 -07:00
extension_processor.py refactor: update safe_set_chunks function to be asynchronous and modify all connector and document processor files to use the new async implementation 2026-03-15 00:44:27 -07:00
file_processors.py feat: add native Excel parsing and improve Google Drive content extraction 2026-03-27 21:47:14 +05:30
markdown_processor.py feat: unify handling of native and legacy document types for Google connectors 2026-03-20 03:41:32 +05:30
youtube_processor.py refactor: update safe_set_chunks function to be asynchronous and modify all connector and document processor files to use the new async implementation 2026-03-15 00:44:27 -07:00