Commit graph

25 commits

Author SHA1 Message Date
CREDO23
aee0f1ef7d add persist_scratch_index unit tests 2026-06-17 14:59:24 +02:00
CREDO23
a8a1f01945 update index batch parallel tests 2026-06-17 14:59:24 +02:00
CREDO23
f82dedf712 feat(indexing): add pure chunk reconciler for content-addressed diffs
Greedy multiset match on chunk text decides which rows keep their embeddings,
which texts need embedding, and which rows are deleted. No DB, no embeddings;
fully unit-tested (reuse, head insert, middle edit, deletion, duplicates,
reorder, full rewrite).
2026-06-12 18:52:46 +02:00
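The greedy multiset match described in the message above could be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `reconcile_chunks`, `ReconcilePlan`, and the `(row_id, text)` tuple shape are hypothetical names chosen for the example. Each new chunk text claims the oldest unused existing row with identical text, so duplicates are matched one-for-one; leftover rows are marked for deletion and unmatched texts are queued for embedding.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ReconcilePlan:
    # (position in the new chunk list, id of the existing row whose embedding is kept)
    reused: list[tuple[int, int]] = field(default_factory=list)
    # (position in the new chunk list, text that still needs an embedding)
    to_embed: list[tuple[int, str]] = field(default_factory=list)
    # ids of existing rows that no new chunk claimed
    to_delete: list[int] = field(default_factory=list)


def reconcile_chunks(
    existing: list[tuple[int, str]], new_texts: list[str]
) -> ReconcilePlan:
    """Pure function: no database access and no embedding calls."""
    # Multiset of existing rows keyed by chunk text; a deque keeps row order
    # so duplicates are consumed first-in, first-out.
    available: dict[str, deque[int]] = defaultdict(deque)
    for row_id, text in existing:
        available[text].append(row_id)

    plan = ReconcilePlan()
    for position, text in enumerate(new_texts):
        bucket = available.get(text)
        if bucket:
            # Identical text already has a row: keep its embedding.
            plan.reused.append((position, bucket.popleft()))
        else:
            # New or edited text: it must be embedded.
            plan.to_embed.append((position, text))

    # Whatever was not claimed is stale.
    for bucket in available.values():
        plan.to_delete.extend(bucket)
    plan.to_delete.sort()
    return plan
```

Because matching is by text rather than by position, a pure reorder reuses every row, and a head insert embeds only the new chunk.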
CREDO23
91d947ff79 refactor(embedding-cache): rename index cache to embedding cache
The cached payload is the indexing pipeline's embeddings (markdown is
chunked then embedded), so "embedding cache" names the expensive output
directly and removes the "index" ambiguity (DB index vs vector index vs
indexing phase). Renames the service, settings, eligibility, eviction
task, metrics, config flags (INDEX_CACHE_* -> EMBEDDING_CACHE_*), object
prefix, and the table (index_cache_embedding_sets -> embedding_cache_sets)
with its constraint and indexes. Migration 161 renamed accordingly.
2026-06-12 17:00:01 +02:00
CREDO23
8cf578d965 test(index-cache): add unit tests and repoint embed/chunk patch targets 2026-06-12 16:48:18 +02:00
Anish Sarkar
e588782a9b refactor(tests): Update tests to remove summary references and adjust for embedding errors 2026-06-04 01:51:21 +05:30
Anish Sarkar
ddfe60c2f0 feat(tests): Update tests for summary-free indexing 2026-06-04 00:53:51 +05:30
Anish Sarkar
8de7d86d56 Merge remote-tracking branch 'upstream/dev' into fix/backend-tests 2026-05-16 19:40:01 +05:30
DESKTOP-RTLN3BA\$punk
9fb9778bd0 test: enhance index batch parallel tests to include hybrid chunker
Updated the test for the indexing pipeline to verify that both the standard and hybrid chunkers are called via asyncio.to_thread, ensuring non-blocking behavior. This change reflects the routing of non-code documents through the hybrid chunker, maintaining the event loop contract.
2026-05-15 18:02:04 -07:00
Anish Sarkar
9b926b3133 refactor: update test for index() to use chunk_text_hybrid 2026-05-13 00:22:43 +05:30
DESKTOP-RTLN3BA\$punk
2cc2d339e6 feat: made agent file system optimized 2026-03-28 16:39:46 -07:00
Anish Sarkar
4fd776e7ef feat: implement parallel indexing for Google Calendar and Gmail connectors
- Refactored Google Calendar and Gmail indexers to utilize the new `index_batch_parallel` method for concurrent document indexing, enhancing performance.
- Updated the indexing logic to replace serial processing with parallel execution, allowing for improved efficiency in handling multiple documents.
- Adjusted logging and error handling to accommodate the new parallel processing approach, ensuring robust operation during indexing.
- Enhanced unit tests to validate the functionality of the parallel indexing method and its integration with existing workflows.
2026-03-26 19:34:04 +05:30
Anish Sarkar
e5cb6bfacf feat: implement parallel document indexing in IndexingPipelineService
- Added `index_batch_parallel` method to enable concurrent indexing of documents with bounded concurrency, improving performance and efficiency.
- Refactored existing indexing logic to utilize `asyncio.to_thread` for non-blocking execution of embedding and chunking functions.
- Introduced unit tests to validate the functionality of the new parallel indexing method, ensuring robustness and error handling during document processing.
2026-03-26 19:33:49 +05:30
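A minimal sketch of the pattern this message describes, bounded concurrency combined with `asyncio.to_thread`, might look like the following. Only the method name `index_batch_parallel` comes from the commit message; the signature, the `max_concurrency` parameter, and the `chunk_and_embed` helper are assumptions made for illustration, and the real service method may differ.

```python
import asyncio
import time


def chunk_and_embed(text: str) -> list[str]:
    """Hypothetical blocking stand-in for the chunking and embedding work."""
    time.sleep(0.01)  # simulate CPU-bound or blocking I/O work
    return text.split()


async def index_batch_parallel(documents: list[str], max_concurrency: int = 4) -> list:
    """Index documents concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def index_one(document: str) -> list[str]:
        async with semaphore:
            # Run the blocking call in a worker thread so the event loop stays free.
            return await asyncio.to_thread(chunk_and_embed, document)

    # return_exceptions=True keeps one failing document from cancelling the batch;
    # failures come back as exception objects in the result list.
    return await asyncio.gather(
        *(index_one(document) for document in documents),
        return_exceptions=True,
    )


if __name__ == "__main__":
    print(asyncio.run(index_batch_parallel(["alpha beta", "gamma"], max_concurrency=2)))
```

The semaphore caps how many documents are processed at once, while `asyncio.gather` preserves input order in the results.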
Anish Sarkar
8c41fd91ba feat: add integration tests for indexing pipeline components
- Introduced integration tests for Calendar, Drive, and Gmail indexers to ensure proper document creation and migration.
- Added tests for batch indexing functionality to validate the processing of multiple documents.
- Implemented tests for legacy document migration to verify updates to document types and hashes.
- Enhanced test coverage for the IndexingPipelineService to ensure robust functionality across various document types.
2026-03-25 18:34:02 +05:30
Anish Sarkar
f7b52470eb feat: enhance Google connectors indexing with content extraction and document migration
- Added `download_and_extract_content` function to extract content from Google Drive files as markdown.
- Updated Google Drive indexer to utilize the new content extraction method.
- Implemented document migration logic to update legacy Composio document types to their native Google types.
- Introduced identifier hashing for stable document identification.
- Improved file pre-filtering to handle unchanged and rename-only files efficiently.
2026-03-25 18:33:44 +05:30
Anish Sarkar
8122370cec test: mark test_connector_document.py with unit pytest marker 2026-03-08 02:53:47 +05:30
DESKTOP-RTLN3BA\$punk
aabc24f82c feat: enhance performance logging and caching in various components
- Introduced slow callback logging in FastAPI to identify blocking calls.
- Added performance logging for agent creation and tool loading processes.
- Implemented caching for MCP tools to reduce redundant server calls.
- Enhanced sandbox management with in-process caching for improved efficiency.
- Refactored several functions for better readability and performance tracking.
- Updated tests to ensure proper functionality of new features and optimizations.
2026-02-26 13:00:31 -08:00
CREDO23
0de74f4bf7 add docstrings to all indexing pipeline tests 2026-02-25 20:30:31 +02:00
CREDO23
cad400be1b add file upload adapter and make index() return refreshed document 2026-02-25 19:56:59 +02:00
CREDO23
1b4ed35de3 fix: correct test fixtures and add missing summarizer tests 2026-02-25 11:15:48 +02:00
CREDO23
af22fa7c88 refactor: remove redundant and low-value tests, enforce connector_id and created_by_id constraints 2026-02-25 08:29:53 +02:00
CREDO23
5b616eac5a fix: plug all gaps found in deep review of indexing pipeline 2026-02-25 02:20:44 +02:00
CREDO23
a0134a5830 test: add document hashing unit tests and clean up conftest mocks 2026-02-24 22:48:40 +02:00
CREDO23
d5e10bd8f9 test: add ConnectorDocument unit tests and factory fixture 2026-02-24 22:20:08 +02:00
CREDO23
10a6ba6924 test: bootstrap pytest environment for backend 2026-02-24 18:19:56 +02:00