feat: add processing mode support for document uploads and ETL pipeline, improved error handling UX

- Introduced a `ProcessingMode` enum to differentiate between basic and premium processing modes.
- Updated `EtlRequest` to include a `processing_mode` field, defaulting to basic.
- Enhanced ETL pipeline services to utilize the selected processing mode for Azure Document Intelligence and LlamaCloud parsing.
- Modified various routes and services to handle processing mode, affecting document upload and indexing tasks.
- Improved error handling and logging to include processing mode details.
- Added tests to validate processing mode functionality and its impact on ETL operations.
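
The enum and request field described above could look roughly like the following. This is a minimal sketch, not the code from this commit: only the names `ProcessingMode`, `coerce`, `EtlRequest`, `processing_mode`, and the `"basic"` default appear in the diff. The `"premium"` value string, the fallback-to-basic behavior of `coerce`, the `file_path` field, and the use of a dataclass are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ProcessingMode(str, Enum):
    """Processing tier selected at upload time."""

    BASIC = "basic"
    PREMIUM = "premium"  # assumed value string; the commit only names the mode

    @classmethod
    def coerce(cls, value):
        """Turn loose input (e.g. a form string) into a ProcessingMode.

        Assumption: unknown or missing values fall back to BASIC rather
        than raising. The real implementation may behave differently.
        """
        if isinstance(value, cls):
            return value
        if isinstance(value, str):
            try:
                return cls(value.strip().lower())
            except ValueError:
                pass
        return cls.BASIC


@dataclass
class EtlRequest:
    """Simplified stand-in for the real request model."""

    file_path: str  # hypothetical field, for illustration only
    processing_mode: ProcessingMode = ProcessingMode.BASIC


# Usage: validate once at the route boundary, then pass the plain string on.
mode = ProcessingMode.coerce("premium")
request = EtlRequest(file_path="report.pdf", processing_mode=mode)
print(request.processing_mode.value)
```

Coercing at the route boundary and passing `.value` downstream, as the diff below does, keeps task payloads as plain strings that serialize cleanly.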
DESKTOP-RTLN3BA\$punk 2026-04-14 21:26:00 -07:00
parent b659f41bab
commit 656e061f84
104 changed files with 1900 additions and 909 deletions


@@ -124,6 +124,7 @@ async def create_documents_file_upload(
search_space_id: int = Form(...),
should_summarize: bool = Form(False),
use_vision_llm: bool = Form(False),
processing_mode: str = Form("basic"),
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
dispatcher: TaskDispatcher = Depends(get_task_dispatcher),
@@ -142,12 +143,15 @@ async def create_documents_file_upload(
from datetime import datetime
from app.db import DocumentStatus
from app.etl_pipeline.etl_document import ProcessingMode
from app.tasks.document_processors.base import (
check_document_by_unique_identifier,
get_current_timestamp,
)
from app.utils.document_converters import generate_unique_identifier_hash
validated_mode = ProcessingMode.coerce(processing_mode)
try:
await check_permission(
session,
@@ -274,6 +278,7 @@ async def create_documents_file_upload(
user_id=str(user.id),
should_summarize=should_summarize,
use_vision_llm=use_vision_llm,
processing_mode=validated_mode.value,
)
return {
@@ -1493,6 +1498,7 @@ async def folder_upload(
root_folder_id: int | None = Form(None),
enable_summary: bool = Form(False),
use_vision_llm: bool = Form(False),
processing_mode: str = Form("basic"),
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
@@ -1504,6 +1510,10 @@ async def folder_upload(
import json
import tempfile
from app.etl_pipeline.etl_document import ProcessingMode
validated_mode = ProcessingMode.coerce(processing_mode)
await check_permission(
session,
user,
@@ -1558,6 +1568,7 @@ async def folder_upload(
watched_metadata = {
"watched": True,
"folder_path": folder_name,
"processing_mode": validated_mode.value,
}
existing_root = (
await session.execute(
@@ -1621,6 +1632,7 @@ async def folder_upload(
enable_summary=enable_summary,
use_vision_llm=use_vision_llm,
file_mappings=list(file_mappings),
processing_mode=validated_mode.value,
)
return {