1307 commits

Author SHA1 Message Date
CREDO23
fffef4cb5e perf: add missing index on chunks.document_id for faster search retrieval 2026-03-09 21:16:53 +02:00
CREDO23
6eabfe2396 perf: conditional batch embedding — batch for API, sequential for local 2026-03-09 19:12:43 +02:00
CREDO23
a49a4db6d6 perf: use asyncio.to_thread for embed_text in FastAPI paths 2026-03-09 16:33:24 +02:00
CREDO23
c4f2e9a3a5 feat: use batch embedding in create_document_chunks 2026-03-09 16:21:14 +02:00
CREDO23
929445afd9 feat: use batch embedding in IndexingPipelineService.index 2026-03-09 16:13:44 +02:00
CREDO23
cb4b155b9d feat: re-export embed_texts from document_embedder 2026-03-09 15:54:02 +02:00
CREDO23
15aeec1fcb feat: add embed_texts batch embedding utility 2026-03-09 15:53:40 +02:00
Anish Sarkar
e8cf677b25 refactor: update title generation logic to improve user experience by generating titles in parallel with assistant responses 2026-03-09 01:28:53 +05:30
Anish Sarkar
a11c95e30f feat: add last_login column to user table and update user login tracking 2026-03-08 18:24:29 +05:30
Anish Sarkar
f8b0e946ce chore: ran linting 2026-03-07 04:46:48 +05:30
Anish Sarkar
2ea67c1764 Merge remote-tracking branch 'upstream/dev' into feat/document-revamp 2026-03-07 04:37:37 +05:30
Anish Sarkar
1a688c7161 feat: enhance notifications system by introducing category-based filtering for comments and status, improving user experience in the inbox and API interactions 2026-03-06 19:35:35 +05:30
Anish Sarkar
bd783cc2d0 feat: add filtering options for notifications by 'unread' and 'errors', enhancing user experience in the notifications list 2026-03-06 18:32:28 +05:30
Anish Sarkar
d03f938fcd feat: implement source type filtering in notifications API and UI, enhancing user experience by allowing users to filter notifications by connector and document types in the status tab 2026-03-06 17:25:07 +05:30
Anish Sarkar
dc6c18b3f6 refactor: add sorting functionality to document retrieval and enhance DocumentsSidebar for improved search and pagination handling 2026-03-06 11:22:33 +05:30
Rohan Verma
7b6f832483 Merge pull request #842 from AnishSarkar22/refactor/upload-document-adapter-class
refactor: Migrate document reindexing to `UploadDocumentAdapter` with unified indexing pipeline
2026-03-05 18:45:16 -08:00
DESKTOP-RTLN3BA\$punk
81fb1f327c fix: update LLM retrieval in podcast transcript creation to use agent LLM instead of document summary LLM 2026-03-03 13:35:29 -08:00
Rohan Verma
672b4e1808 Merge pull request #838 from AnishSarkar22/fix/docker
feat: docker-compose and docker CI pipeline enhancements
2026-03-02 13:54:27 -08:00
Anish Sarkar
d24691a968 fix: increase timeout for alembic migrations in entrypoint script to prevent premature failures 2026-03-02 23:45:24 +05:30
Anish Sarkar
6d00b0debf Merge remote-tracking branch 'upstream/dev' into refactor/upload-document-adapter-class 2026-03-01 22:35:17 +05:30
DESKTOP-RTLN3BA\$punk
ecb0a25cc8 feat: enhance memory management and session handling in database operations
- Introduced a shielded async session context manager to ensure safe session closure during cancellations.
- Updated various database operations to utilize the new shielded session, preventing orphaned connections.
- Added environment variables to optimize glibc memory management, improving overall application performance.
- Implemented a function to trim the native heap, allowing for better memory reclamation on Linux systems.
2026-02-28 23:59:28 -08:00
DESKTOP-RTLN3BA\$punk
dd3da2bc36 refactor: improve session management and cleanup in chat streaming
- Added proper session closure to prevent connection leaks during streaming.
- Implemented a fresh session for cleanup tasks to ensure data integrity.
- Enhanced error handling during session operations to improve robustness.
- Removed unnecessary session parameters from function signatures for clarity.
2026-02-28 23:17:11 -08:00
DESKTOP-RTLN3BA\$punk
40a091f8cc feat: enhance knowledge base search and document retrieval
- Introduced a mechanism to identify degenerate queries that lack meaningful search signals, improving search accuracy.
- Implemented a fallback method for browsing recent documents when queries are degenerate, ensuring relevant results are returned.
- Added limits on the number of chunks fetched per document to optimize performance and prevent excessive data loading.
- Updated the ConnectorService to allow for reusable query embeddings, enhancing efficiency in search operations.
- Enhanced LLM router service to support context window fallbacks, improving robustness during context window limitations.
2026-02-28 19:40:24 -08:00
DESKTOP-RTLN3BA\$punk
d959a6a6c8 feat: optimize document upload process and enhance memory management
- Increased maximum file upload limit from 10 to 50 to improve user experience.
- Implemented batch processing for document uploads to avoid proxy timeouts, splitting files into manageable chunks.
- Enhanced garbage collection in chat streaming functions to prevent memory leaks and improve performance.
- Added memory delta tracking in system snapshots for better monitoring of resource usage.
- Updated LLM router and service configurations to prevent unbounded internal accumulation and improve efficiency.
2026-02-28 17:22:34 -08:00
DESKTOP-RTLN3BA\$punk
f4b2ab0899 feat: enhance caching mechanisms to prevent memory leaks
- Improved in-memory rate limiting by evicting timestamps outside the current window and cleaning up empty keys.
- Updated LLM router service to cache context profiles and avoid redundant computations.
- Introduced cache eviction logic for MCP tools and sandbox instances to manage memory usage effectively.
- Added garbage collection triggers in chat streaming functions to reclaim resources promptly.
2026-02-27 17:56:00 -08:00
DESKTOP-RTLN3BA\$punk
0e723a5b8b feat: perf optimizations
- Improved search_knowledgebase_tool.
- Added new endpoint to batch-fetch comments for multiple messages, reducing the number of API calls.
- Introduced CommentBatchRequest and CommentBatchResponse schemas for handling batch requests and responses.
- Updated chat_comments_service to validate message existence and permissions before fetching comments.
- Enhanced frontend with useBatchCommentsPreload hook to optimize comment loading for assistant messages.
2026-02-27 17:19:25 -08:00
DESKTOP-RTLN3BA\$punk
664c43ca13 feat: add performance logging middleware and enhance performance tracking across services
- Introduced RequestPerfMiddleware to log request performance metrics, including slow request thresholds.
- Updated various services and retrievers to utilize the new performance logging utility for better tracking of execution times.
- Enhanced existing methods with detailed performance logs for operations such as embedding, searching, and indexing.
- Removed deprecated logging setup in stream_new_chat and replaced it with the new performance logger.
2026-02-27 16:32:30 -08:00
Anish Sarkar
b2bf00e11a chore: ran linting 2026-02-28 02:28:03 +05:30
Anish Sarkar
ce82807f16 test: enhance reindexing tests for UploadDocumentAdapter 2026-02-28 02:18:02 +05:30
Anish Sarkar
37f76a8533 test: add should_summarize parameter to file upload adapter tests 2026-02-28 01:44:41 +05:30
Anish Sarkar
23a98d802c refactor: implement UploadDocumentAdapter for file indexing and reindexing 2026-02-28 01:38:32 +05:30
DESKTOP-RTLN3BA\$punk
1e4b8d3e89 feat: enhance document formatting and context management for LLM tools
- Introduced dynamic character budget calculation for document formatting based on model's context window.
- Updated `format_documents_for_context` to respect character limits and improve output quality.
- Added `max_input_tokens` parameter to various functions to facilitate context-aware processing.
- Enhanced error handling for context overflow in LLM router service.
2026-02-26 20:47:19 -08:00
DESKTOP-RTLN3BA\$punk
a4dc84d1ab feat: add should_summarize parameter to task dispatchers
- Introduced should_summarize parameter in TaskDispatcher and CeleryTaskDispatcher to control summary generation.
- Updated InlineTaskDispatcher to support the new parameter for document processing.
2026-02-26 19:12:37 -08:00
DESKTOP-RTLN3BA\$punk
6f4bf11a32 Merge branch 'dev' of https://github.com/MODSetter/SurfSense into dev 2026-02-26 18:25:05 -08:00
DESKTOP-RTLN3BA\$punk
e9892c8fe9 feat: added configurable summary calculation and various improvements
- Replaced direct embedding calls with a utility function across various components to streamline embedding logic.
- Added enable_summary flag to several models and routes to control summary generation behavior.
2026-02-26 18:24:57 -08:00
Anish Sarkar
f419efcde1 Merge remote-tracking branch 'upstream/dev' into fix/docker 2026-02-27 05:00:23 +05:30
Rohan Verma
2f08dc9cf4 Merge pull request #839 from AnishSarkar22/feat/document-test
fix: enhanced document upload, page limit, upload limit tests
2026-02-26 13:48:45 -08:00
DESKTOP-RTLN3BA\$punk
23f553ef84 Merge branch 'dev' of https://github.com/MODSetter/SurfSense into dev 2026-02-26 13:01:24 -08:00
DESKTOP-RTLN3BA\$punk
aabc24f82c feat: enhance performance logging and caching in various components
- Introduced slow callback logging in FastAPI to identify blocking calls.
- Added performance logging for agent creation and tool loading processes.
- Implemented caching for MCP tools to reduce redundant server calls.
- Enhanced sandbox management with in-process caching for improved efficiency.
- Refactored several functions for better readability and performance tracking.
- Updated tests to ensure proper functionality of new features and optimizations.
2026-02-26 13:00:31 -08:00
Anish Sarkar
836d5293df refactor: remove unused TestStatusPolling class from document upload integration tests 2026-02-27 01:52:35 +05:30
Anish Sarkar
fd032f3709 refactor: simplify and clarify documentation in document upload integration tests 2026-02-27 01:48:25 +05:30
Anish Sarkar
7c09958ddc refactor: enhance document upload integration tests for API contract validation 2026-02-27 01:24:20 +05:30
Anish Sarkar
1068ea25a7 refactor: standardize test database configuration across test files 2026-02-27 00:45:51 +05:30
Anish Sarkar
f09b5b0ea4 refactor: replace hardcoded embedding dimension with dynamic configuration
- Updated the embedding dimension in test configurations to use the value from the application config, enhancing maintainability and consistency across tests.
2026-02-27 00:17:39 +05:30
Anish Sarkar
223c2de0d2 refactor: update database connection handling in test configurations 2026-02-27 00:05:21 +05:30
Anish Sarkar
87711ee381 chore: clean up .env.example and pyproject.toml
- Removed commented-out testing configuration from .env.example to streamline the file.
- Updated markers in pyproject.toml to remove the e2e test marker, clarifying the purpose of the remaining markers.
2026-02-26 23:56:01 +05:30
Anish Sarkar
3393e435f9 feat: implement task dispatcher for document processing
- Introduced a TaskDispatcher abstraction to decouple the upload endpoint from Celery, allowing for easier testing with synchronous implementations.
- Updated the create_documents_file_upload function to utilize the new dispatcher for task management.
- Removed direct Celery task imports from the upload function, enhancing modularity.
- Added integration tests for document upload, including page limit enforcement and file size restrictions.
2026-02-26 23:55:47 +05:30
Anish Sarkar
b5874a587a chore: resolve merge conflict by removing legacy all-in-one Docker files
Keeps the deletion of Dockerfile.allinone, docker-compose.yml (root), and
scripts/docker/entrypoint-allinone.sh from fix/docker. Ports the Daytona
sandbox env vars added by upstream/dev into docker/docker-compose.yml and
docker/docker-compose.dev.yml instead.

Made-with: Cursor
2026-02-26 13:57:56 +05:30
Anish Sarkar
bf60a5049f feat: add end-to-end test for document searchability after upload
- Introduced a new test class to verify that uploaded documents appear in search results once their status is ready.
- Implemented assertions to ensure the uploaded document's ID is present in the search response.
2026-02-26 03:33:37 +05:30
Anish Sarkar
9ccee054a5 chore: ran linting 2026-02-26 03:05:20 +05:30