SurfSense

mirror of https://github.com/MODSetter/SurfSense.git synced 2026-06-26 21:39:43 +02:00

Author	SHA1	Message	Date
DhruvTilva	e16e4e2c5c	fix: guard missing text_as_html in Table element markdown conversion When the Unstructured API returns a Table element without text_as_html in its metadata (e.g. local install or free-tier API), the lambda was raising KeyError: 'text_as_html', crashing the entire document indexing pipeline for any file containing tables. Guard the key access with .get() and fall back to the plain extracted text content (x) so the pipeline continues and the table content is still indexed, just without HTML formatting.	2026-06-25 23:52:15 +05:30
CREDO23	32a6e54ce6	Merge remote-tracking branch 'upstream/dev' into features/documents-injestion-layered-cached	2026-06-14 11:30:33 +02:00
CREDO23	5a71769dba	fix(chunks): set position on remaining chunk insert paths document_converters, the github size-fallback chunker, revert_service restores, and the kb-persistence middleware now write explicit positions (the middleware read path also orders by position).	2026-06-12 18:53:08 +02:00
Anish Sarkar	3dd54230e7	fix(chat): normalize provider-safe message history	2026-06-12 02:17:37 +05:30
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	ce952d2ad1	chore: linting	2026-06-09 00:42:26 -07:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	640ef5f15d	feat(proxy): integrate Scrapling for enhanced web scraping capabilities - Replaced Playwright with Scrapling's fetchers in the web crawling and YouTube processing modules for improved performance and flexibility. - Updated proxy configuration to support dynamic proxy selection via environment variables. - Enhanced logging to track performance metrics during web scraping operations. - Refactored related modules to utilize the new proxy utilities and streamline the scraping process.	2026-06-09 00:15:10 -07:00
Anish Sarkar	81fa219b30	feat(backend): Remove LLM summaries from document indexing	2026-06-04 00:50:19 +05:30
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	40ca9e6ed2	refactor: remove `search_surfsense_docs` tool and related references - Deleted the `search_surfsense_docs` tool and its associated files, streamlining the agent's toolset. - Updated various components and prompts to remove references to the now-removed tool, ensuring consistency across the codebase. - Adjusted documentation to direct users to the SurfSense documentation link for product-related queries instead.	2026-05-28 22:35:14 -07:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	9d6e9b7e2d	feat: enhance task management and timeout configurations in multi-agent chat - Added new environment variables for controlling task execution limits, including `SURFSENSE_SUBAGENT_INVOKE_TIMEOUT_SECONDS`, `SURFSENSE_TASK_BATCH_CONCURRENCY`, and `SURFSENSE_TASK_BATCH_MAX_SIZE`. - Updated documentation to reflect new batch processing capabilities for `task` calls, allowing for concurrent execution of multiple subagent tasks. - Improved error handling and receipt generation for deliverables, ensuring consistent feedback on task status. - Refactored middleware to incorporate search space ID for better task management.	2026-05-27 14:58:10 -07:00
Anish Sarkar	6095b48b5f	feat(observability): add SurfSense metric helpers	2026-05-21 23:02:20 +05:30
CREDO23	d5ee8cc4cd	Merge remote-tracking branch 'upstream/dev' into improvement-agent-speed	2026-05-20 19:22:49 +02:00
CREDO23	a3d6fa6196	perf(document-converters): offload sync embed_text/embed_texts to thread generate_document_summary and create_document_chunks are async helpers called from the chat path and from many connector indexers. Both wrapped embed_text/embed_texts directly inside the coroutine, blocking the event loop for the full duration of the embedding call.	2026-05-20 10:03:42 +02:00
Anish Sarkar	01d7379914	refactor: add public URL handling for SurfSense documents across various components and schemas	2026-05-15 02:05:11 +05:30
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	ca9bbee06d	chore: linting	2026-04-28 21:37:51 -07:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	e6433f78c4	Merge commit '61f4d05cd1' into dev_mod	2026-04-28 09:25:41 -07:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	31a372bb84	feat: updated agent harness	2026-04-28 09:22:19 -07:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	8d50f90060	chore: linting	2026-04-27 14:04:50 -07:00
CREDO23	2d962f6dd2	Merge upstream/dev	2026-04-27 22:44:40 +02:00
CREDO23	d1080b1298	Extend new chat streaming for multimodal user turns	2026-04-24 18:48:02 +02:00
Anish Sarkar	9b1b9a90c0	Merge remote-tracking branch 'upstream/dev' into feat/obsidian-plugin	2026-04-24 21:34:55 +05:30
CREDO23	0eae96bffb	fix: harden MCP OAuth and connector edge cases	2026-04-22 20:54:42 +02:00
CREDO23	328219e46f	disable first-run indexing for live connectors	2026-04-21 21:52:17 +02:00
Anish Sarkar	99623a85d5	refactor: remove legacy Obsidian connector support	2026-04-22 00:10:24 +05:30
CREDO23	45acf9de15	add async retry utility with tenacity	2026-04-21 20:28:36 +02:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	4a51ccdc2c	cloud: added openrouter integration with global configs	2026-04-15 23:46:29 -07:00
CREDO23	7e90a8ed3c	Route uploaded images to vision LLM with document-parser fallback	2026-04-09 14:33:33 +02:00
Anish Sarkar	20fa93f0ba	refactor: make Azure Document Intelligence an internal LLAMACLOUD accelerator instead of a standalone ETL service	2026-04-08 03:26:24 +05:30
Anish Sarkar	1fa8d1220b	feat: add support for Azure Document Intelligence in ETL pipeline	2026-04-08 00:59:12 +05:30
Anish Sarkar	0a26a6c5bb	chore: ran linting	2026-04-07 05:55:39 +05:30
Anish Sarkar	e7beeb2a36	refactor: unify file skipping logic across Dropbox, Google Drive, and OneDrive connectors by replacing classification checks with a centralized service-based approach, enhancing maintainability and consistency in file handling	2026-04-07 02:19:31 +05:30
Anish Sarkar	0fb92b7c56	refactor: streamline file skipping logic in Dropbox indexer by removing redundant checks, improving code clarity	2026-04-06 22:17:50 +05:30
Anish Sarkar	63a75052ca	Merge remote-tracking branch 'upstream/dev' into feat/unified-etl-pipeline	2026-04-06 22:04:51 +05:30
Anish Sarkar	dc7047f64d	refactor: implement file type classification for supported extensions across Dropbox, Google Drive, and OneDrive connectors, enhancing file handling and error management	2026-04-06 22:03:47 +05:30
Anish Sarkar	e814540727	refactor: move PKCE pair generation for airtable - Removed the `generate_pkce_pair` function from `airtable_add_connector_route.py` and relocated it to `oauth_security.py` for better organization. - Updated imports in `airtable_add_connector_route.py` to reflect the new location of the PKCE generation function.	2026-04-04 03:36:54 +05:30
Anish Sarkar	8e6b1c77ea	feat: implement PKCE support in native Google OAuth flows - Added `generate_code_verifier` function to create a PKCE code verifier for enhanced security. - Updated Google Calendar, Drive, and Gmail connector routes to utilize the PKCE code verifier during OAuth authorization. - Modified state management to include the code verifier for secure state generation and validation.	2026-04-04 03:35:34 +05:30
Anish Sarkar	746c730b2e	chore: ran linting	2026-04-03 13:14:40 +05:30
Anish Sarkar	96a58d0d30	feat: implement local folder indexing and document versioning capabilities	2026-04-02 11:11:57 +05:30
Anish Sarkar	0d5b902c26	feat: extend Dropbox support in chat event streaming and connector naming for enhanced integration	2026-03-30 23:07:25 +05:30
Anish Sarkar	5bddde60cb	feat: implement Microsoft OneDrive connector with OAuth support and indexing capabilities	2026-03-28 14:31:25 +05:30
Anish Sarkar	489e48644f	fix: revert native excel parsing	2026-03-27 22:15:24 +05:30
Anish Sarkar	3da0ffd683	feat: add native Excel parsing and improve Google Drive content extraction - Introduced a new utility for parsing .xlsx files into markdown format, enhancing the ability to process Excel documents natively. - Updated the Google Drive content extractor to utilize the new Excel parsing functionality, allowing for better handling of spreadsheet files. - Enhanced file type detection and export logic to support various document formats, improving overall content extraction accuracy. - Added unit tests to ensure the correctness of the new Excel parsing feature and its integration with existing content extraction workflows.	2026-03-27 21:47:14 +05:30
Anish Sarkar	683a4c17dd	feat: implement thread-safe embedding access in document converters - Added a reentrant lock to ensure thread-safe access to the tokenizer and embedding model, preventing runtime errors during concurrent operations. - Updated the `truncate_for_embedding` and `embed_text` functions to utilize the lock, ensuring safe execution in multi-threaded environments. - Enhanced the `embed_texts` function to maintain thread safety while processing multiple texts for embedding.	2026-03-27 11:31:00 +05:30
Anish Sarkar	e37e6d2d18	chore: ran linting	2026-03-21 13:21:19 +05:30
Anish Sarkar	83152e8e7e	refactor: unify all 3 google Composio and non-Composio connector types and pipelines keeping same credential adapters	2026-03-19 05:08:21 +05:30
Anish Sarkar	8baba0693d	feat: ensure unique connector names for MCP connectors	2026-03-18 16:09:35 +05:30
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	d8a05ae4d5	feat: refactor agent tools management and add UI integration - Added endpoint to list agent tools with metadata, excluding hidden tools. - Updated NewChatRequest and RegenerateRequest schemas to include disabled tools. - Integrated disabled tools management in the NewChatPage and Composer components. - Improved tool instructions and visibility in the system prompt. - Refactored tool registration to support hidden tools and default enabled states. - Enhanced document chunk creation to handle strict zip behavior. - Cleaned up imports and formatting across various files for consistency.	2026-03-10 17:36:26 -07:00
CREDO23	6eabfe2396	perf: conditional batch embedding — batch for API, sequential for local	2026-03-09 19:12:43 +02:00
CREDO23	c4f2e9a3a5	feat: use batch embedding in create_document_chunks	2026-03-09 16:21:14 +02:00
CREDO23	15aeec1fcb	feat: add embed_texts batch embedding utility	2026-03-09 15:53:40 +02:00
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	ecb0a25cc8	feat: enhance memory management and session handling in database operations - Introduced a shielded async session context manager to ensure safe session closure during cancellations. - Updated various database operations to utilize the new shielded session, preventing orphaned connections. - Added environment variables to optimize glibc memory management, improving overall application performance. - Implemented a function to trim the native heap, allowing for better memory reclamation on Linux systems.	2026-02-28 23:59:28 -08:00
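
Commit ecb0a25cc8 mentions a shielded async session context manager that keeps session closure safe during cancellations. The sketch below illustrates one way such a helper could look using only `asyncio`. `FakeSession` and `shielded_session` are hypothetical names invented for this example, not the repository's actual code, and a real session object is assumed to expose an async `close()`.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in for a database session with an async close()."""

    def __init__(self):
        self.closed = False

    async def close(self):
        # Simulate the time needed to hand the connection back to a pool.
        await asyncio.sleep(0.01)
        self.closed = True


@asynccontextmanager
async def shielded_session(factory=FakeSession):
    """Yield a session and always close it, even when the caller is cancelled."""
    session = factory()
    try:
        yield session
    finally:
        # shield() lets close() run to completion even if the surrounding
        # task receives another cancellation while waiting on it.
        await asyncio.shield(session.close())
```

Used as `async with shielded_session() as session: ...`, the cleanup in `finally` runs whether the body finishes, raises, or is cancelled, which is the property the commit message associates with avoiding orphaned connections.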

162 commits
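
Commit e16e4e2c5c describes guarding the `text_as_html` lookup with `.get()` and falling back to the plain extracted text when the key is absent. A minimal sketch of that pattern follows, assuming the element metadata is available as a plain dict. The function name `table_content` and its parameters are hypothetical and chosen only for illustration; the repository's own code uses a lambda whose exact shape is not shown in the log.

```python
def table_content(text, metadata):
    """Return the table's HTML when present, otherwise the plain text.

    `text` is the plain extracted content of the element; `metadata` is
    assumed to be a dict that may or may not contain "text_as_html".
    """
    # metadata["text_as_html"] would raise KeyError when the key is missing;
    # .get() returns None instead, so the fallback can take over.
    html = metadata.get("text_as_html")
    return html if html else text
```

With this guard, a table element without HTML metadata still contributes its plain text instead of aborting the whole run.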