SurfSense/surfsense_backend/app/indexing_pipeline
Anish Sarkar f7b52470eb feat: enhance Google connectors indexing with content extraction and document migration
- Added `download_and_extract_content` function to extract content from Google Drive files as markdown.
- Updated Google Drive indexer to utilize the new content extraction method.
- Implemented document migration logic to update legacy Composio document types to their native Google types.
- Introduced identifier hashing for stable document identification.
- Improved file pre-filtering to handle unchanged and rename-only files efficiently.
2026-03-25 18:33:44 +05:30
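The identifier hashing and pre-filtering mentioned in the commit message above can be illustrated with a minimal sketch. This is not the code in `document_hashing.py` or `indexing_pipeline_service.py`. Every name here (`compute_identifier_hash`, `classify_file`, `IndexedFile`, `FileAction`, the `namespace` parameter, and the `content_fingerprint` field) is hypothetical, and the choice of SHA-256 is an assumption. The idea is that a hash built only from stable identifiers survives renames, while a separate content fingerprint decides whether a file needs re-indexing.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FileAction(Enum):
    SKIP = "skip"                # nothing changed
    RENAME_ONLY = "rename_only"  # title changed, content did not
    REINDEX = "reindex"          # new file or changed content


@dataclass(frozen=True)
class IndexedFile:
    """Hypothetical record of what was stored at the last indexing run."""
    identifier_hash: str
    title: str
    content_fingerprint: str  # any value that changes when the content changes


def compute_identifier_hash(namespace: str, source_id: str, search_space_id: int) -> str:
    """Hash only stable identifiers, so a renamed file keeps the same hash."""
    key = f"{namespace}:{source_id}:{search_space_id}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def classify_file(
    existing: Optional[IndexedFile], title: str, content_fingerprint: str
) -> FileAction:
    """Decide how much work an incoming file needs before any download happens."""
    if existing is None or existing.content_fingerprint != content_fingerprint:
        return FileAction.REINDEX
    if existing.title != title:
        return FileAction.RENAME_ONLY
    return FileAction.SKIP


if __name__ == "__main__":
    ident = compute_identifier_hash("google_drive", "file-123", 7)
    stored = IndexedFile(ident, "Q1 report", "fp-a")
    print(classify_file(stored, "Q1 report", "fp-a"))        # unchanged
    print(classify_file(stored, "Q1 report (final)", "fp-a"))  # renamed only
    print(classify_file(stored, "Q1 report", "fp-b"))        # content changed
```

Under these assumptions, a rename-only file needs just a metadata update, and an unchanged file is skipped without downloading or extracting its content.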
adapters refactor: implement UploadDocumentAdapter for file indexing and reindexing 2026-02-28 01:38:32 +05:30
__init__.py test: add ConnectorDocument unit tests and factory fixture 2026-02-24 22:20:08 +02:00
connector_document.py feat: enhance performance logging and caching in various components 2026-02-26 13:00:31 -08:00
document_chunker.py feat: enhance performance logging and caching in various components 2026-02-26 13:00:31 -08:00
document_embedder.py feat: re-export embed_texts from document_embedder 2026-03-09 15:54:02 +02:00
document_hashing.py feat: enhance Google connectors indexing with content extraction and document migration 2026-03-25 18:33:44 +05:30
document_persistence.py fix bugs in indexing pipeline exception handling 2026-02-25 16:27:12 +02:00
document_summarizer.py feat: enhance performance logging and caching in various components 2026-02-26 13:00:31 -08:00
exceptions.py Merge branch 'dev' of https://github.com/MODSetter/SurfSense into dev 2026-02-26 13:01:24 -08:00
indexing_pipeline_service.py feat: enhance Google connectors indexing with content extraction and document migration 2026-03-25 18:33:44 +05:30
pipeline_logger.py feat: enhance performance logging and caching in various components 2026-02-26 13:00:31 -08:00