SurfSense

mirror of https://github.com/MODSetter/SurfSense.git synced 2026-06-22 21:28:12 +02:00

Author	SHA1	Message	Date
DESKTOP-RTLN3BA\$punk	2f793e7a69	refactor: improve content extraction and encoding handling - Enhanced Azure Document Intelligence parser to raise an error for empty or whitespace-only content. - Updated LLMRouterService to log premium model strings more clearly. - Added automatic encoding detection for file reading in document processors. - Improved error handling for empty markdown content extraction in file processors. - Refactored DocumentUploadTab component for better accessibility and user interaction.	2026-04-16 00:25:46 -07:00
DESKTOP-RTLN3BA\$punk	656e061f84	feat: add processing mode support for document uploads and ETL pipeline, improded error handling ux - Introduced a `ProcessingMode` enum to differentiate between basic and premium processing modes. - Updated `EtlRequest` to include a `processing_mode` field, defaulting to basic. - Enhanced ETL pipeline services to utilize the selected processing mode for Azure Document Intelligence and LlamaCloud parsing. - Modified various routes and services to handle processing mode, affecting document upload and indexing tasks. - Improved error handling and logging to include processing mode details. - Added tests to validate processing mode functionality and its impact on ETL operations.	2026-04-14 21:26:00 -07:00
CREDO23	a95bf58c8f	Make Vision LLM opt-in for uploads and connectors	2026-04-10 16:45:51 +02:00
CREDO23	ff2a9c77f9	Pass vision_llm in legacy process_file_in_background path	2026-04-09 15:28:21 +02:00
CREDO23	7e90a8ed3c	Route uploaded images to vision LLM with document-parser fallback	2026-04-09 14:33:33 +02:00
Anish Sarkar	8d810467dd	refactor: add support for XHTML file conversion to markdown in document processors	2026-04-07 05:57:13 +05:30
Anish Sarkar	0a26a6c5bb	chore: ran linting	2026-04-07 05:55:39 +05:30
Anish Sarkar	dc7047f64d	refactor: implement file type classification for supported extensions across Dropbox, Google Drive, and OneDrive connectors, enhancing file handling and error management	2026-04-06 22:03:47 +05:30
Anish Sarkar	87af012a60	refactor: streamline file processing by integrating ETL pipeline for all file types and removing redundant functions	2026-04-05 17:45:18 +05:30
Anish Sarkar	1248363ca9	refactor: consolidate document processing logic and remove unused files and ETL strategies	2026-04-05 17:29:24 +05:30
DESKTOP-RTLN3BA\$punk	62e698d8aa	refactor: streamline document upload limits and enhance handling of mentioned documents - Updated maximum file size limit to 500 MB per file. - Removed restrictions on the number of files per upload and total upload size. - Enhanced handling of user-mentioning documents in the knowledge base search middleware. - Improved document reading and processing logic to accommodate new features and optimizations.	2026-04-02 19:39:10 -07:00
Anish Sarkar	de8841fb86	chore: ran linting	2026-03-21 13:20:13 +05:30
Anish Sarkar	d21593ee71	feat: unify handling of native and legacy document types for Google connectors - Introduced a mapping of native Google document types to their legacy Composio equivalents, ensuring seamless search and indexing for both types. - Updated relevant components to utilize the new mapping, enhancing the consistency of document type handling across the application. - Improved search functionality to transparently include legacy types, maintaining accessibility for older documents until re-indexed.	2026-03-20 03:41:32 +05:30
Anish Sarkar	83152e8e7e	refactor: unify all 3 google Composio and non-Composio connector types and pipelines keeping same credential adapters	2026-03-19 05:08:21 +05:30
Anish Sarkar	ac0f2fa2eb	chore: ran linting	2026-03-17 04:40:46 +05:30
DESKTOP-RTLN3BA\$punk	2b33dfe728	refactor: update safe_set_chunks function to be asynchronous and modify all connector and document processor files to use the new async implementation	2026-03-15 00:44:27 -07:00
Anish Sarkar	23a98d802c	refactor: implement UploadDocumentAdapter for file indexing and reindexing	2026-02-28 01:38:32 +05:30
DESKTOP-RTLN3BA\$punk	e9892c8fe9	feat: added configable summary calculation and various improvements - Replaced direct embedding calls with a utility function across various components to streamline embedding logic. - Added enable_summary flag to several models and routes to control summary generation behavior.	2026-02-26 18:24:57 -08:00
Anish Sarkar	9ccee054a5	chore: ran linting	2026-02-26 03:05:20 +05:30
Anish Sarkar	f59a70f7a5	Merge remote-tracking branch 'upstream/dev' into feat/document-test	2026-02-26 02:22:10 +05:30
Anish Sarkar	380c1c3877	fix: Refactor document ID usage in file processing to improve clarity	2026-02-26 01:28:09 +05:30
CREDO23	4293910e8e	plug file upload into indexing pipeline adapter and add integration tests	2026-02-25 20:20:52 +02:00
Anish Sarkar	8b497da130	feat: add source_markdown column to documents and implement migration logic for existing records using a pure-Python BlockNote JSON to Markdown converter	2026-02-17 11:34:11 +05:30
DESKTOP-RTLN3BA\$punk	17b7348f61	feat: fixed and improved search and background task management.	2026-02-09 14:03:56 -08:00
DESKTOP-RTLN3BA\$punk	20a13df7e7	Merge branch 'dev' of https://github.com/MODSetter/SurfSense into dev	2026-02-06 14:02:51 -08:00
DESKTOP-RTLN3BA\$punk	cdc217dbe2	feat: update YouTube transcript fetching to select primary language transcripts	2026-02-06 14:02:46 -08:00
Anish Sarkar	72205ce11b	feat: implement Redis heartbeat mechanism for document processing tasks and enhance stale notification cleanup	2026-02-06 18:09:05 +05:30
Anish Sarkar	0fdd194d92	Merge remote-tracking branch 'upstream/dev' into fix/documents	2026-02-06 12:13:26 +05:30
DESKTOP-RTLN3BA\$punk	1511c26ef5	feat: add residential proxy configuration for web crawling and YouTube transcript fetching	2026-02-05 20:44:13 -08:00
Anish Sarkar	aa66928154	chore: ran linting	2026-02-06 05:35:15 +05:30
Anish Sarkar	ed2fc5c636	feat: enhance document upload process with two-phase indexing and real-time status updates	2026-02-06 05:15:47 +05:30
Anish Sarkar	cc1e796c12	feat: implement two-phase document indexing for webcrawler and YouTube video processors with real-time status updates	2026-02-06 04:54:50 +05:30
Anish Sarkar	629f6f9cf5	feat: implement two-phase document indexing for Obsidian and Circleback connectors with real-time status updates	2026-02-06 04:35:13 +05:30
Anish Sarkar	c12401c1e8	feat: implement two-phase document indexing across Google connectors with real-time status updates	2026-02-06 02:24:35 +05:30
Anish Sarkar	bf08982029	feat: add connector_id to documents for source tracking and implement connector deletion task	2026-02-02 16:23:26 +05:30
Anish Sarkar	e0ade20e68	feat: add created_by_id column to documents for ownership tracking and update related connectors	2026-02-02 12:32:24 +05:30
CREDO23	949ec949f6	style(backend): run ruff format on 10 files	2026-01-28 22:20:02 +02:00
DESKTOP-RTLN3BA\$punk	b598cbeac3	feat(backend): Enhance LlamaCloud upload resilience with dynamic timeout calculations and increased retry settings	2026-01-27 17:50:45 -08:00
Anish Sarkar	e0be1b9133	chore: ran backend and frontend linting	2026-01-17 16:30:07 +05:30
Anish Sarkar	49efc50767	feat: enhance document processing with content hash deduplication - Added support for content hash fallback in document migration to prevent duplicate entries from different sources. - Improved existing document update logic to handle renaming and metadata updates more effectively, particularly for Google Drive files. - Updated functions to check for existing documents with enhanced logging for better traceability of duplicate content detection.	2026-01-17 15:39:36 +05:30
Anish Sarkar	6550c378b2	feat: enhance Google Drive document handling and UI integration - Implemented support for both new file_id-based and legacy filename-based hash schemes in document processing. - Added functions to generate unique identifier hashes and find existing documents with migration support. - Improved existing document update logic to handle content changes and metadata updates, particularly for Google Drive files. - Enhanced UI components to display appropriate file icons based on file types in the Google Drive connector. - Updated document processing functions to accommodate the new connector structure and ensure seamless integration.	2026-01-17 14:57:31 +05:30
DESKTOP-RTLN3BA\$punk	8aad15d392	Reapply "Merge pull request #686 from AnishSarkar22/feat/replace-logs" This reverts commit `3418c0e026`.	2026-01-16 11:32:06 -08:00
DESKTOP-RTLN3BA\$punk	3418c0e026	Revert "Merge pull request #686 from AnishSarkar22/feat/replace-logs" This reverts commit `5963a1125e`, reversing changes made to `0d2a2f8ea1`.	2026-01-16 00:49:33 -08:00
Anish Sarkar	ab63b23f0a	Merge remote-tracking branch 'upstream/dev' into feat/replace-logs	2026-01-15 15:52:47 +05:30
DESKTOP-RTLN3BA\$punk	7ae68455b3	chore: linting	2026-01-15 00:05:53 -08:00
DESKTOP-RTLN3BA\$punk	bab89274e0	feat: implement LlamaCloud parsing with retry logic for transient errors	2026-01-14 23:53:17 -08:00
Anish Sarkar	5bd6bd3d67	chore: ran both frontend and backend linting	2026-01-14 02:05:40 +05:30
Anish Sarkar	12671ede0e	feat: Enhance document processing notifications and refactor related services - Introduced a new DocumentProcessingNotificationHandler to manage notifications for document processing stages. - Updated existing notification methods to include detailed progress updates for various stages (queued, parsing, chunking, embedding, storing, completed, failed). - Refactored NotificationService to support the new document processing notification type and metadata schema. - Updated multiple document processing tasks to create and manage notifications throughout the processing lifecycle. - Adjusted UI components to reflect changes in notification types and improve user experience during document uploads and processing.	2026-01-13 19:09:12 +05:30
DESKTOP-RTLN3BA\$punk	c19d300c9d	feat: added circleback connector	2025-12-30 09:00:59 -08:00
CREDO23	7618662e70	refactor: rename GOOGLE_DRIVE_CONNECTOR to GOOGLE_DRIVE_FILE document type	2025-12-29 20:38:26 +02:00

73 commits