SurfSense

mirror of https://github.com/MODSetter/SurfSense.git synced 2026-06-24 21:38:09 +02:00

Author	SHA1	Message	Date
CREDO23	aee0f1ef7d	add persist_scratch_index unit tests	2026-06-17 14:59:24 +02:00
CREDO23	a8a1f01945	update index batch parallel tests	2026-06-17 14:59:24 +02:00
CREDO23	f82dedf712	feat(indexing): add pure chunk reconciler for content-addressed diffs Greedy multiset match on chunk text decides which rows keep their embeddings, which texts need embedding, and which rows are deleted. No DB, no embeddings; fully unit-tested (reuse, head insert, middle edit, deletion, duplicates, reorder, full rewrite).	2026-06-12 18:52:46 +02:00
CREDO23	91d947ff79	refactor(embedding-cache): rename index cache to embedding cache The cached payload is the indexing pipeline's embeddings (markdown is chunked then embedded), so "embedding cache" names the expensive output directly and removes the "index" ambiguity (DB index vs vector index vs indexing phase). Renames the service, settings, eligibility, eviction task, metrics, config flags (INDEX_CACHE_* -> EMBEDDING_CACHE_*), object prefix, and the table (index_cache_embedding_sets -> embedding_cache_sets) with its constraint and indexes. Migration 161 renamed accordingly.	2026-06-12 17:00:01 +02:00
CREDO23	8cf578d965	test(index-cache): add unit tests and repoint embed/chunk patch targets	2026-06-12 16:48:18 +02:00
Anish Sarkar	e588782a9b	refactor(tests): Update tests to remove summary references and adjust for embedding errors	2026-06-04 01:51:21 +05:30
Anish Sarkar	ddfe60c2f0	feat(tests): Update tests for summary-free indexing	2026-06-04 00:53:51 +05:30
Anish Sarkar	8de7d86d56	Merge remote-tracking branch 'upstream/dev' into fix/backend-tests	2026-05-16 19:40:01 +05:30
DESKTOP-RTLN3BA\$punk	9fb9778bd0	test: enhance index batch parallel tests to include hybrid chunker Updated the test for the indexing pipeline to verify that both the standard and hybrid chunkers are called via asyncio.to_thread, ensuring non-blocking behavior. This change reflects the routing of non-code documents through the hybrid chunker, maintaining the event loop contract.	2026-05-15 18:02:04 -07:00
Anish Sarkar	9b926b3133	refactor: update test for index() to use chunk_text_hybrid	2026-05-13 00:22:43 +05:30
DESKTOP-RTLN3BA\$punk	2cc2d339e6	feat: made agent file system optimized	2026-03-28 16:39:46 -07:00
Anish Sarkar	4fd776e7ef	feat: implement parallel indexing for Google Calendar and Gmail connectors - Refactored Google Calendar and Gmail indexers to utilize the new `index_batch_parallel` method for concurrent document indexing, enhancing performance. - Updated the indexing logic to replace serial processing with parallel execution, allowing for improved efficiency in handling multiple documents. - Adjusted logging and error handling to accommodate the new parallel processing approach, ensuring robust operation during indexing. - Enhanced unit tests to validate the functionality of the parallel indexing method and its integration with existing workflows.	2026-03-26 19:34:04 +05:30
Anish Sarkar	e5cb6bfacf	feat: implement parallel document indexing in IndexingPipelineService - Added `index_batch_parallel` method to enable concurrent indexing of documents with bounded concurrency, improving performance and efficiency. - Refactored existing indexing logic to utilize `asyncio.to_thread` for non-blocking execution of embedding and chunking functions. - Introduced unit tests to validate the functionality of the new parallel indexing method, ensuring robustness and error handling during document processing.	2026-03-26 19:33:49 +05:30
Anish Sarkar	8c41fd91ba	feat: add integration tests for indexing pipeline components - Introduced integration tests for Calendar, Drive, and Gmail indexers to ensure proper document creation and migration. - Added tests for batch indexing functionality to validate the processing of multiple documents. - Implemented tests for legacy document migration to verify updates to document types and hashes. - Enhanced test coverage for the IndexingPipelineService to ensure robust functionality across various document types.	2026-03-25 18:34:02 +05:30
Anish Sarkar	f7b52470eb	feat: enhance Google connectors indexing with content extraction and document migration - Added `download_and_extract_content` function to extract content from Google Drive files as markdown. - Updated Google Drive indexer to utilize the new content extraction method. - Implemented document migration logic to update legacy Composio document types to their native Google types. - Introduced identifier hashing for stable document identification. - Improved file pre-filtering to handle unchanged and rename-only files efficiently.	2026-03-25 18:33:44 +05:30
Anish Sarkar	8122370cec	test: mark test_connector_document.py with unit pytest marker	2026-03-08 02:53:47 +05:30
DESKTOP-RTLN3BA\$punk	aabc24f82c	feat: enhance performance logging and caching in various components - Introduced slow callback logging in FastAPI to identify blocking calls. - Added performance logging for agent creation and tool loading processes. - Implemented caching for MCP tools to reduce redundant server calls. - Enhanced sandbox management with in-process caching for improved efficiency. - Refactored several functions for better readability and performance tracking. - Updated tests to ensure proper functionality of new features and optimizations.	2026-02-26 13:00:31 -08:00
CREDO23	0de74f4bf7	add docstrings to all indexing pipeline tests	2026-02-25 20:30:31 +02:00
CREDO23	cad400be1b	add file upload adapter and make index() return refreshed document	2026-02-25 19:56:59 +02:00
CREDO23	1b4ed35de3	fix: correct test fixtures and add missing summarizer tests	2026-02-25 11:15:48 +02:00
CREDO23	af22fa7c88	refactor: remove redundant and low-value tests, enforce connector_id and created_by_id constraints	2026-02-25 08:29:53 +02:00
CREDO23	5b616eac5a	fix: plug all gaps found in deep review of indexing pipeline	2026-02-25 02:20:44 +02:00
CREDO23	a0134a5830	test: add document hashing unit tests and clean up conftest mocks	2026-02-24 22:48:40 +02:00
CREDO23	d5e10bd8f9	test: add ConnectorDocument unit tests and factory fixture	2026-02-24 22:20:08 +02:00
CREDO23	10a6ba6924	test: bootstrap pytest environment for backend	2026-02-24 18:19:56 +02:00

25 commits
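
Commit f82dedf712 describes a pure chunk reconciler that uses a greedy multiset match on chunk text. The sketch below illustrates that idea only and is not the repository's code: the names `ExistingChunk` and `reconcile_chunks` and the shape of the return value are hypothetical, and the real implementation may differ.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ExistingChunk:
    """Hypothetical stand-in for a stored chunk row (row id plus text)."""
    row_id: int
    text: str


def reconcile_chunks(existing, new_texts):
    """Greedy multiset match on chunk text (illustrative sketch).

    Returns (reused, to_embed, to_delete):
      reused    -- (new_position, row_id) pairs whose embeddings are kept
      to_embed  -- (new_position, text) pairs that need a fresh embedding
      to_delete -- row ids that no longer appear in the new chunk list
    """
    # Bucket existing rows by text; each deque keeps the original row order,
    # so duplicate texts are consumed first-come, first-served.
    available = defaultdict(deque)
    for chunk in existing:
        available[chunk.text].append(chunk.row_id)

    reused, to_embed = [], []
    for position, text in enumerate(new_texts):
        bucket = available.get(text)  # .get() avoids creating empty buckets
        if bucket:
            reused.append((position, bucket.popleft()))
        else:
            to_embed.append((position, text))

    # Rows left unmatched after the pass are stale.
    to_delete = sorted(row_id for bucket in available.values() for row_id in bucket)
    return reused, to_embed, to_delete


if __name__ == "__main__":
    rows = [ExistingChunk(1, "a"), ExistingChunk(2, "b"), ExistingChunk(3, "a")]
    print(reconcile_chunks(rows, ["x", "a", "b", "b"]))
```

Because the match is keyed on text rather than position, a head insert or a reorder reuses every unchanged row, and only new or edited texts are sent for embedding.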