SurfSense

mirror of https://github.com/MODSetter/SurfSense.git synced 2026-06-14 20:55:15 +02:00

CREDO23 0fb1d3d37b feat(etl-cache): route all file-based sources through the parse cache	2026-06-12 14:47:25 +02:00

Every file ingestion path (Dropbox, Google Drive / Composio Drive, OneDrive, local folder, Obsidian, and the legacy upload handlers) now parses via the extract_with_cache facade instead of calling EtlPipelineService.extract directly, so identical bytes are deduplicated globally regardless of source. vision_llm is passed through, keeping the existing cacheability gate intact.
__init__.py	refactor: consolidate document processing logic and remove unused files and ETL strategies	2026-04-05 17:29:24 +05:30
_direct_converters.py	refactor: improve content extraction and encoding handling	2026-04-16 00:25:46 -07:00
_helpers.py	refactor: consolidate document processing logic and remove unused files and ETL strategies	2026-04-05 17:29:24 +05:30
_save.py	feat(backend): Remove LLM summaries from document indexing	2026-06-04 00:50:19 +05:30
base.py	chore: ran linting	2026-03-17 04:40:46 +05:30
circleback_processor.py	feat(backend): Remove LLM summaries from document indexing	2026-06-04 00:50:19 +05:30
extension_processor.py	feat(backend): Remove LLM summaries from document indexing	2026-06-04 00:50:19 +05:30
file_processors.py	feat(etl-cache): route all file-based sources through the parse cache	2026-06-12 14:47:25 +02:00
markdown_processor.py	feat(backend): Remove LLM summaries from document indexing	2026-06-04 00:50:19 +05:30
youtube_processor.py	feat(proxy): integrate Scrapling for enhanced web scraping capabilities	2026-06-09 00:15:10 -07:00
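
The latest commit message in this listing describes parsing through a cache so that identical bytes are deduplicated regardless of which source they came from. A minimal sketch of that idea follows, assuming the cache is keyed by a SHA-256 hash of the file bytes and that passing a vision LLM makes a result non-cacheable. Both are assumptions for illustration; the listing does not show the real key or gate. The names extract_with_cache_sketch, _PARSE_CACHE, and the extract callback are hypothetical and are not the repository's actual API.

```python
import hashlib
from typing import Callable, Dict, Optional

# Hypothetical in-memory stand-in for the parse cache.
# The real storage backend is not shown in this listing.
_PARSE_CACHE: Dict[str, str] = {}


def extract_with_cache_sketch(
    data: bytes,
    extract: Callable[[bytes], str],
    vision_llm: Optional[object] = None,
) -> str:
    """Parse `data`, reusing an earlier result for identical bytes."""
    # Assumed cacheability gate: skip the cache when a vision LLM is involved.
    if vision_llm is not None:
        return extract(data)

    # Key on content only, so the originating source (Dropbox, OneDrive,
    # local folder, ...) does not affect whether the cache is hit.
    key = hashlib.sha256(data).hexdigest()
    if key not in _PARSE_CACHE:
        _PARSE_CACHE[key] = extract(data)
    return _PARSE_CACHE[key]
```

Because the key depends only on the bytes, two connectors that fetch the same file would trigger a single parse in this sketch; the second call returns the stored result.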