SurfSense/surfsense_backend/app/tasks/document_processors
CREDO23 0fb1d3d37b feat(etl-cache): route all file-based sources through the parse cache
Every file ingestion path (Dropbox, Google Drive / Composio Drive, OneDrive,
local folder, Obsidian, and the legacy upload handlers) now parses via the
extract_with_cache facade instead of calling EtlPipelineService.extract
directly, so identical bytes are deduplicated globally regardless of source.
vision_llm is passed through, keeping the existing cacheability gate intact.
2026-06-12 14:47:25 +02:00
File | Last commit | Date
__init__.py | refactor: consolidate document processing logic and remove unused files and ETL strategies | 2026-04-05 17:29:24 +05:30
_direct_converters.py | refactor: improve content extraction and encoding handling | 2026-04-16 00:25:46 -07:00
_helpers.py | refactor: consolidate document processing logic and remove unused files and ETL strategies | 2026-04-05 17:29:24 +05:30
_save.py | feat(backend): Remove LLM summaries from document indexing | 2026-06-04 00:50:19 +05:30
base.py | chore: ran linting | 2026-03-17 04:40:46 +05:30
circleback_processor.py | feat(backend): Remove LLM summaries from document indexing | 2026-06-04 00:50:19 +05:30
extension_processor.py | feat(backend): Remove LLM summaries from document indexing | 2026-06-04 00:50:19 +05:30
file_processors.py | feat(etl-cache): route all file-based sources through the parse cache | 2026-06-12 14:47:25 +02:00
markdown_processor.py | feat(backend): Remove LLM summaries from document indexing | 2026-06-04 00:50:19 +05:30
youtube_processor.py | feat(proxy): integrate Scrapling for enhanced web scraping capabilities | 2026-06-09 00:15:10 -07:00