feat: added Celery and removed background_tasks for MQs

- removed pre-commit hooks
- updated Docker setup
- updated GitHub Docker actions
- updated docs
2026-06-08 20:25:19 +02:00 · 2025-10-20 00:30:00 -07:00 · c80bbfa867
commit c80bbfa867
parent 031dc055da
27 changed files with 1664 additions and 1038 deletions
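
A minimal sketch, not taken from this commit, of how the Celery settings it introduces might be wired together when a worker runs on the host machine. It assumes `REDIS_PORT` from `.env.example` is the host-side port published by the `redis` compose service; the helper name `celery_urls`, the `localhost` default, and the use of Redis databases 0 and 1 are all hypothetical.

```python
import os


def celery_urls(env=None, host="localhost"):
    """Build hypothetical Celery broker/result-backend URLs from REDIS_PORT.

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    # Fall back to 6379, the value shown for REDIS_PORT in .env.example.
    port = int(env.get("REDIS_PORT", "6379"))
    base = f"redis://{host}:{port}"
    # Assumption: separate Redis databases for queued messages and task results.
    return {"broker_url": f"{base}/0", "result_backend": f"{base}/1"}
```

Inside the compose network a worker would more likely reach Redis through the `redis` service name, which is why `host` is left as a parameter rather than hard-coded.
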
--- a/.env.example
+++ b/.env.example
@@ -1,3 +1,9 @@
+# Docker Specific Env's Only - Can skip if needed
+
+# Celery Config
+REDIS_PORT=6379
+FLOWER_PORT=5555
+
 # Frontend Configuration
 FRONTEND_PORT=3000
 NEXT_PUBLIC_API_URL=http://backend:8000
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,16 +1,13 @@
-<!--- Provide a general summary of your changes in the Title above -->
+<!--- Summarize your pull request in a few sentences -->

 ## Description
-<!--- Describe your changes in detail -->
+<!--- Clearly describe what has changed in this pull request -->

 ## Motivation and Context
 <!--- Why is this change required? What problem does it solve? -->
 <!--- If this PR relates to an open issue, please link to the issue here: FIX #123 -->
 FIX #

-## Changes Overview
-<!-- List the primary changes/improvements made in this PR -->
- 

 ## Screenshots
 <!-- If applicable, add screenshots or images to demonstrate the changes visually -->
@@ -19,27 +16,26 @@ FIX #
 <!-- Document any API changes if applicable -->
 - [ ] This PR includes API changes

-## Types of changes
-<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
-- [ ] Bug fix (non-breaking change which fixes an issue)
-- [ ] New feature (non-breaking change which adds functionality)
-- [ ] Performance improvement (non-breaking change which enhances performance)
-- [ ] Documentation update
-- [ ] Breaking change (fix or feature that would cause existing functionality to change)
+## Change Type
+<!--- Indicate what kind(s) of changes this PR includes: -->
+- [ ] Bug fix
+- [ ] New feature
+- [ ] Performance improvement
+- [ ] Refactoring
+- [ ] Documentation
+- [ ] Dependency/Build system
+- [ ] Breaking change
+- [ ] Other (specify):

-## Testing
-<!-- Describe the tests that have been run to verify your changes -->
-- [ ] I have tested these changes locally
-- [ ] I have added/updated unit tests
-- [ ] I have added/updated integration tests
+## Testing Performed
+<!--- Briefly describe how you have tested these changes and what verification was performed -->
+- [ ] Tested locally
+- [ ] Manual/QA verification

-## Checklist:
-<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
-<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
-- [ ] My code follows the code style of this project
-- [ ] My change requires documentation updates
-- [ ] I have updated the documentation accordingly
-- [ ] My change requires dependency updates
-- [ ] I have updated the dependencies accordingly
-- [ ] My code builds clean without any errors or warnings
-- [ ] All new and existing tests passed
+## Checklist
+<!--- Please confirm the following by marking with an 'x' as appropriate -->
+- [ ] Follows project coding standards and conventions
+- [ ] Documentation updated as needed
+- [ ] Dependencies updated as needed
+- [ ] No lint/build errors or new warnings
+- [ ] All relevant tests are passing
--- a/.github/workflows/docker-publish.yml
+++ b/.github/workflows/docker-publish.yml
@@ -4,40 +4,40 @@ on:
  workflow_dispatch:

 jobs:
-  build_and_push_backend:
-    runs-on: ubuntu-latest
-    permissions:
-      contents: read
-      packages: write
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
+  # build_and_push_backend:
+  #   runs-on: ubuntu-latest
+  #   permissions:
+  #     contents: read
+  #     packages: write
+  #   steps:
+  #     - name: Checkout repository
+  #       uses: actions/checkout@v4

-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v3
+  #     - name: Set up QEMU
+  #       uses: docker/setup-qemu-action@v3

-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
+  #     - name: Set up Docker Buildx
+  #       uses: docker/setup-buildx-action@v3

-      - name: Log in to GitHub Container Registry
-        uses: docker/login-action@v3
-        with:
-          registry: ghcr.io
-          username: ${{ github.actor }}
-          password: ${{ secrets.GITHUB_TOKEN }}
+  #     - name: Log in to GitHub Container Registry
+  #       uses: docker/login-action@v3
+  #       with:
+  #         registry: ghcr.io
+  #         username: ${{ github.actor }}
+  #         password: ${{ secrets.GITHUB_TOKEN }}

-      - name: Build and push backend image
-        uses: docker/build-push-action@v5
-        with:
-          context: ./surfsense_backend
-          file: ./surfsense_backend/Dockerfile
-          push: true
-          tags: ghcr.io/${{ github.repository_owner }}/surfsense_backend:${{ github.sha }}
-          platforms: linux/amd64,linux/arm64
-          labels: |
-            org.opencontainers.image.source=${{ github.repositoryUrl }}
-            org.opencontainers.image.created=${{ fromJSON(steps.meta.outputs.json).labels['org.opencontainers.image.created'] }}
-            org.opencontainers.image.revision=${{ github.sha }}
+  #     - name: Build and push backend image
+  #       uses: docker/build-push-action@v5
+  #       with:
+  #         context: ./surfsense_backend
+  #         file: ./surfsense_backend/Dockerfile
+  #         push: true
+  #         tags: ghcr.io/${{ github.repository_owner }}/surfsense_backend:${{ github.sha }}
+  #         platforms: linux/amd64,linux/arm64
+  #         labels: |
+  #           org.opencontainers.image.source=${{ github.repositoryUrl }}
+  #           org.opencontainers.image.created=${{ fromJSON(steps.meta.outputs.json).labels['org.opencontainers.image.created'] }}
+  #           org.opencontainers.image.revision=${{ github.sha }}

  build_and_push_frontend:
    runs-on: ubuntu-latest
--- a/.github/workflows/docker_build.yaml
+++ b/.github/workflows/docker_build.yaml
@@ -124,52 +124,52 @@ jobs:
          git ls-remote --tags origin | grep "refs/tags/${{ steps.tag_version.outputs.next_version }}" || (echo "Tag push verification failed!" && exit 1)
          echo "Tag successfully pushed."
  
-  build_and_push_backend_image: 
-    runs-on: ubuntu-latest
-    needs: tag_release # Depends on the tag being created successfully
-    permissions:
-      packages: write # Need permission to write to GHCR
-      contents: read # Need permission to read repo contents (checkout)
+  # build_and_push_backend_image: 
+  #   runs-on: ubuntu-latest
+  #   needs: tag_release # Depends on the tag being created successfully
+  #   permissions:
+  #     packages: write # Need permission to write to GHCR
+  #     contents: read # Need permission to read repo contents (checkout)

-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
+  #   steps:
+  #     - name: Checkout code
+  #       uses: actions/checkout@v4

-      - name: Login to GitHub Container Registry
-        uses: docker/login-action@v3
-        with:
-          registry: ghcr.io
-          username: ${{ github.repository_owner }}
-          password: ${{ secrets.GITHUB_TOKEN }}
+  #     - name: Login to GitHub Container Registry
+  #       uses: docker/login-action@v3
+  #       with:
+  #         registry: ghcr.io
+  #         username: ${{ github.repository_owner }}
+  #         password: ${{ secrets.GITHUB_TOKEN }}

-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v3
+  #     - name: Set up QEMU
+  #       uses: docker/setup-qemu-action@v3

-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
+  #     - name: Set up Docker Buildx
+  #       uses: docker/setup-buildx-action@v3

-      - name: Extract metadata (tags, labels) for Docker build
-        id: meta
-        uses: docker/metadata-action@v5
-        with:
-          images: ghcr.io/${{ github.repository_owner }}/surfsense_backend
-          tags: |
-            # Use the tag generated in the previous job
-            type=raw,value=${{ needs.tag_release.outputs.new_tag }}
-            # Optionally add 'latest' tag if building from the default branch
-            type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', github.event.repository.default_branch) || github.event.inputs.branch == github.event.repository.default_branch }}
+  #     - name: Extract metadata (tags, labels) for Docker build
+  #       id: meta
+  #       uses: docker/metadata-action@v5
+  #       with:
+  #         images: ghcr.io/${{ github.repository_owner }}/surfsense_backend
+  #         tags: |
+  #           # Use the tag generated in the previous job
+  #           type=raw,value=${{ needs.tag_release.outputs.new_tag }}
+  #           # Optionally add 'latest' tag if building from the default branch
+  #           type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', github.event.repository.default_branch) || github.event.inputs.branch == github.event.repository.default_branch }}

-      - name: Build and push surfsense backend
-        uses: docker/build-push-action@v5
-        with:
-          context: ./surfsense_backend
-          push: true
-          tags: ${{ steps.meta.outputs.tags }}
-          labels: ${{ steps.meta.outputs.labels }}
-          platforms: linux/amd64,linux/arm64
-          # Optional: Add build cache for faster builds
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
+  #     - name: Build and push surfsense backend
+  #       uses: docker/build-push-action@v5
+  #       with:
+  #         context: ./surfsense_backend
+  #         push: true
+  #         tags: ${{ steps.meta.outputs.tags }}
+  #         labels: ${{ steps.meta.outputs.labels }}
+  #         platforms: linux/amd64,linux/arm64
+  #         # Optional: Add build cache for faster builds
+  #         cache-from: type=gha
+  #         cache-to: type=gha,mode=max

  build_and_push_ui_image: 
    runs-on: ubuntu-latest
--- a/.github/workflows/pre-commit.yml
+++ b/.github/workflows/pre-commit.yml
@@ -1,59 +0,0 @@
-name: pre-commit
-
-on:
-  push:
-  pull_request:
-    branches: [main, dev]
-
-jobs:
-  pre-commit:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 0  # Required for detecting diffs
-
-      - name: Fetch main branch
-        run: |
-          # Ensure we have the main branch reference for comparison
-          git fetch origin main:main 2>/dev/null || git fetch origin main 2>/dev/null || true
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.12'
-
-      - name: Cache pre-commit environments
-        uses: actions/cache@v4
-        with:
-          path: ~/.cache/pre-commit
-          key: pre-commit-${{ hashFiles('.pre-commit-config.yaml') }}
-          restore-keys: |
-            pre-commit-
-
-      - name: Install pre-commit
-        run: |
-          pip install pre-commit
-
-      - name: Install hook environments (cache)
-        run: |
-          pre-commit install-hooks
-
-      - name: Run pre-commit on changed files
-        run: |
-          # Use pre-commit's native diff detection with fallback strategies
-          if git show-ref --verify --quiet refs/heads/main; then
-            # Main branch exists locally, use pre-commit's native diff mode
-            echo "Running pre-commit with native diff detection against main branch"
-            pre-commit run --from-ref main --to-ref HEAD
-          elif git show-ref --verify --quiet refs/remotes/origin/main; then
-            # Origin/main exists, use it as reference
-            echo "Running pre-commit with native diff detection against origin/main"
-            pre-commit run --from-ref origin/main --to-ref HEAD
-          else
-            # Fallback: run on all files (for first commits or when main is unavailable)
-            echo "Main branch reference not found, running pre-commit on all files"
-            echo "⚠️  This may take longer and show more issues than normal"
-            pre-commit run --all-files
-          fi
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,95 +0,0 @@
-# Pre-commit configuration for SurfSense
-# See https://pre-commit.com for more information
-
-repos:
-  # General file quality hooks
-  - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v5.0.0
-    hooks:
-      - id: check-yaml
-        args: [--multi, --unsafe]
-      - id: check-json
-        exclude: '(tsconfig\.json|\.vscode/.*\.json)$'
-      - id: check-toml
-      - id: check-merge-conflict
-      - id: check-added-large-files
-        args: [--maxkb=10240]  # 10MB limit
-      - id: debug-statements
-      - id: check-case-conflict
-
-  # Security - detect secrets across all file types
-  - repo: https://github.com/Yelp/detect-secrets
-    rev: v1.5.0
-    hooks:
-      - id: detect-secrets
-        args: ['--baseline', '.secrets.baseline']
-        exclude: |
-          (?x)^(
-            .*\.env\.example|
-            .*\.env\.template|
-            .*/tests/.*|
-            .*test.*\.py|
-            test_.*\.py|
-            .github/workflows/.*\.yml|
-            .github/workflows/.*\.yaml|
-            .*pnpm-lock\.yaml|
-            .*alembic\.ini|
-            .*alembic/versions/.*\.py|
-            .*\.mdx$
-          )$
-
-  # Python Backend Hooks (surfsense_backend) - Using Ruff for linting and formatting
-  - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.12.5
-    hooks:
-      - id: ruff
-        name: ruff-check
-        files: ^surfsense_backend/
-        exclude: ^surfsense_backend/(test_.*\.py|.*test.*\.py)
-        args: [--fix]
-      - id: ruff-format
-        name: ruff-format
-        files: ^surfsense_backend/
-        exclude: ^surfsense_backend/(test_.*\.py|.*test.*\.py)
-
-  - repo: https://github.com/PyCQA/bandit
-    rev: 1.8.6
-    hooks:
-      - id: bandit
-        files: ^surfsense_backend/
-        args: ['-f', 'json', '--severity-level', 'high', '--confidence-level', 'high']
-        exclude: ^surfsense_backend/(tests/|test_.*\.py|.*test.*\.py|alembic/)
-
-  # Biome hooks for TypeScript/JavaScript projects
-  - repo: local
-    hooks:
-      # Biome check for surfsense_web
-      - id: biome-check-web
-        name: biome-check-web
-        entry: bash -c 'cd surfsense_web && npx @biomejs/biome check --diagnostic-level=error .'
-        language: system
-        files: ^surfsense_web/
-        pass_filenames: false
-        always_run: true
-        stages: [pre-commit]
-
-      # Biome check for surfsense_browser_extension
-      # - id: biome-check-extension
-      #   name: biome-check-extension
-      #   entry: bash -c 'cd surfsense_browser_extension && npx @biomejs/biome check --diagnostic-level=error .'
-      #   language: system
-      #   files: ^surfsense_browser_extension/
-      #   pass_filenames: false
-      #   always_run: true
-      #   stages: [pre-commit]
-
-  # Commit message linting
-  - repo: https://github.com/commitizen-tools/commitizen
-    rev: v4.8.3
-    hooks:
-      - id: commitizen
-        stages: [commit-msg]
-
-# Global configuration
-default_stages: [pre-commit]
-fail_fast: false
--- a/DEPLOYMENT_GUIDE.md
+++ b/DEPLOYMENT_GUIDE.md
@@ -1,124 +0,0 @@
-# SurfSense Deployment Guide
-
-This guide explains the different deployment options available for SurfSense using Docker Compose.
-
-## Deployment Options
-
-SurfSense uses a flexible Docker Compose configuration that allows you to easily switch between deployment modes without manually editing files. Our approach uses Docker's built-in override functionality with two configuration files:
-
-1. **docker-compose.yml**: Contains essential core services (database and pgAdmin)
-2. **docker-compose.override.yml**: Contains application services (frontend and backend)
-
-This structure provides several advantages:
-- No need to comment/uncomment services manually
-- Clear separation between core infrastructure and application services
-- Easy switching between development and production environments
-
-## Deployment Modes
-
-### Full Stack Mode (Development)
-
-This mode runs everything: frontend, backend, database, and pgAdmin. It's ideal for development environments where you need the complete application stack.
-
-```bash
-# Both files are automatically used (docker-compose.yml + docker-compose.override.yml)
-docker compose up -d
-```
-
-### Core Services Mode (Production)
-
-This mode runs only the database and pgAdmin services. It's suitable for production environments where you might want to deploy the frontend and backend separately or need to run database migrations.
-
-```bash
-# Explicitly use only the main file
-docker compose -f docker-compose.yml up -d
-```
-
-## Custom Deployment Options
-
-### Running Specific Services
-
-You can specify which services to start by naming them:
-
-```bash
-# Start only database
-docker compose up -d db
-
-# Start database and pgAdmin
-docker compose up -d db pgadmin
-
-# Start only backend (requires db to be running)
-docker compose up -d backend
-```
-
-### Using Custom Override Files
-
-You can create and use custom override files for different environments:
-
-```bash
-# Create a staging configuration
-docker compose -f docker-compose.yml -f docker-compose.staging.yml up -d
-```
-
-## Environment Variables
-
-The deployment can be customized using environment variables:
-
-```bash
-# Change default ports
-FRONTEND_PORT=4000 BACKEND_PORT=9000 docker compose up -d
-
-# Or use a .env file
-# Create or modify .env file with your desired values
-docker compose up -d
-```
-
-## Common Deployment Workflows
-
-### Initial Setup
-
-```bash
-# Clone the repository
-git clone https://github.com/MODSetter/SurfSense.git
-cd SurfSense
-
-# Copy example env files
-cp .env.example .env
-cp surfsense_backend/.env.example surfsense_backend/.env
-cp surfsense_web/.env.example surfsense_web/.env
-
-# Edit the .env files with your configuration
-
-# Start full stack for development
-docker compose up -d
-```
-
-### Database-Only Mode (for migrations or maintenance)
-
-```bash
-# Start just the database
-docker compose -f docker-compose.yml up -d db
-
-# Run migrations or maintenance tasks
-docker compose exec db psql -U postgres -d surfsense
-```
-
-### Scaling in Production
-
-For production deployments, you might want to:
-
-1. Run core services with Docker Compose
-2. Deploy frontend/backend with specialized services like Vercel, Netlify, or dedicated application servers
-
-This separation allows for better scaling and resource utilization in production environments.
-
-## Troubleshooting
-
-If you encounter issues with the deployment:
-
-- Check container logs: `docker compose logs -f [service_name]`
-- Ensure all required environment variables are set
-- Verify network connectivity between containers
-- Check that required ports are available and not blocked by firewalls
-
-For more detailed setup instructions, refer to [DOCKER_SETUP.md](DOCKER_SETUP.md). 
--- a/DOCKER_SETUP.md
+++ b/DOCKER_SETUP.md
@@ -1,192 +0,0 @@
-# Docker Setup for SurfSense
-
-This document explains how to run the SurfSense project using Docker Compose.
-
-## Prerequisites
-
-- Docker and Docker Compose installed on your machine
-- Git (to clone the repository)
-
-## Environment Variables Configuration
-
-SurfSense Docker setup supports configuration through environment variables. You can set these variables in two ways:
-
-1. Create a `.env` file in the project root directory (copy from `.env.example`)
-2. Set environment variables directly in your shell before running Docker Compose
-
-The following environment variables are available:
-
-```
-# Frontend Configuration
-FRONTEND_PORT=3000
-NEXT_PUBLIC_API_URL=http://backend:8000
-
-# Backend Configuration
-BACKEND_PORT=8000
-
-# Database Configuration
-POSTGRES_USER=postgres
-POSTGRES_PASSWORD=postgres
-POSTGRES_DB=surfsense
-POSTGRES_PORT=5432
-
-# pgAdmin Configuration
-PGADMIN_PORT=5050
-PGADMIN_DEFAULT_EMAIL=admin@surfsense.com
-PGADMIN_DEFAULT_PASSWORD=surfsense
-```
-
-## Deployment Options
-
-SurfSense uses a flexible Docker Compose setup that allows you to choose between different deployment modes:
-
-### Option 1: Full-Stack Deployment (Development Mode)
-Includes frontend, backend, database, and pgAdmin. This is the default when running `docker compose up`.
-
-### Option 2: Core Services Only (Production Mode)
-Includes only database and pgAdmin, suitable for production environments where you might deploy frontend/backend separately.
-
-Our setup uses two files:
-- `docker-compose.yml`: Contains core services (database and pgAdmin)
-- `docker-compose.override.yml`: Contains application services (frontend and backend)
-
-## Setup
-
-1. Make sure you have all the necessary environment variables set up:
-   - Run `cp surfsense_backend/.env.example surfsense_backend/.env` to create .env file, and fill in the required values
-   - Run `cp surfsense_web/.env.example surfsense_web/.env` to create .env file, fill in the required values
-   - Optionally: Copy `.env.example` to `.env` in the project root to customize Docker settings
-
-2. Deploy based on your needs:
-
-   **Full Stack (Development Mode)**:
-   ```bash
-   # Both files are automatically used
-   docker compose up --build
-   ```
-
-   **Core Services Only (Production Mode)**:
-   ```bash
-   # Explicitly use only the main file
-   docker compose -f docker-compose.yml up --build
-   ```
-
-3. To run in detached mode (in the background):
-   ```bash
-   # Full stack
-   docker compose up -d
-   
-   # Core services only
-   docker compose -f docker-compose.yml up -d
-   ```
-
-4. Access the applications:
-   - Frontend: http://localhost:3000 (when using full stack)
-   - Backend API: http://localhost:8000 (when using full stack)
-   - API Documentation: http://localhost:8000/docs (when using full stack)
-   - pgAdmin: http://localhost:5050
-
-## Customizing the Deployment
-
-If you need to make temporary changes to either full stack or core services deployment, you can:
-
-1. **Temporarily disable override file**:
-   ```bash
-   docker compose -f docker-compose.yml up -d
-   ```
-
-2. **Use a custom override file**:
-   ```bash
-   docker compose -f docker-compose.yml -f custom-override.yml up -d
-   ```
-
-3. **Temporarily modify which services start**:
-   ```bash
-   docker compose up -d db pgadmin
-   ```
-
-## Useful Commands
-
-- Stop the containers:
-  ```bash
-  docker compose down
-  ```
-
-- View logs:
-  ```bash
-  # All services
-  docker compose logs -f
-  
-  # Specific service
-  docker compose logs -f backend
-  docker compose logs -f frontend
-  docker compose logs -f db
-  docker compose logs -f pgadmin
-  ```
-
-- Restart a specific service:
-  ```bash
-  docker compose restart backend
-  ```
-
-- Execute commands in a running container:
-  ```bash
-  # Backend
-  docker compose exec backend python -m pytest
-  
-  # Frontend
-  docker compose exec frontend pnpm lint
-  ```
-
-## Database
-
-The PostgreSQL database with pgvector extensions is available at:
-- Host: localhost
-- Port: 5432 (configurable via POSTGRES_PORT)
-- Username: postgres (configurable via POSTGRES_USER)
-- Password: postgres (configurable via POSTGRES_PASSWORD)
-- Database: surfsense (configurable via POSTGRES_DB)
-
-You can connect to it using any PostgreSQL client or the included pgAdmin.
-
-## pgAdmin
-
-pgAdmin is a web-based administration tool for PostgreSQL. It is included in the Docker setup for easier database management.
-
-- URL: http://localhost:5050 (configurable via PGADMIN_PORT)
-- Default Email: admin@surfsense.com (configurable via PGADMIN_DEFAULT_EMAIL)
-- Default Password: surfsense (configurable via PGADMIN_DEFAULT_PASSWORD)
-
-### Connecting to the Database in pgAdmin
-
-1. Log in to pgAdmin using the credentials above
-2. Right-click on "Servers" in the left sidebar and select "Create" > "Server"
-3. In the "General" tab, give your connection a name (e.g., "SurfSense DB")
-4. In the "Connection" tab, enter the following:
-   - Host: db
-   - Port: 5432
-   - Maintenance database: surfsense
-   - Username: postgres 
-   - Password: postgres
-5. Click "Save" to establish the connection
-
-## Troubleshooting
-
-- If you encounter permission errors, you may need to run the docker commands with `sudo`.
-- If ports are already in use, modify the port mappings in the `.env` file or directly in the `docker-compose.yml` file.
-- For backend dependency issues, you may need to modify the `Dockerfile` in the backend directory.
-- If you encounter frontend dependency errors, adjust the frontend's `Dockerfile` accordingly.
-- If pgAdmin doesn't connect to the database, ensure you're using `db` as the hostname, not `localhost`, as that's the Docker network name.
-- If you need only specific services, you can explicitly name them: `docker compose up db pgadmin`
-
-## Understanding Docker Compose File Structure
-
-The project uses Docker's default override mechanism:
-
-1. **docker-compose.yml**: Contains essential services (database and pgAdmin)
-2. **docker-compose.override.yml**: Contains development services (frontend and backend)
-
-When you run `docker compose up` without additional flags, Docker automatically merges both files.
-When you run `docker compose -f docker-compose.yml up`, only the specified file is used.
-
-This approach lets you maintain a cleaner codebase without manually commenting/uncommenting services in your configuration files. 
--- a/PRE_COMMIT.md
+++ b/PRE_COMMIT.md
@@ -1,237 +0,0 @@
-# Pre-commit Hooks for SurfSense Contributors
-
-Welcome to SurfSense! As an open-source project, we use pre-commit hooks to maintain code quality, security, and consistency across our multi-component codebase. This guide will help you set up and work with our pre-commit configuration.
-
-## 🚀 What is Pre-commit?
-
-Pre-commit is a framework for managing multi-language pre-commit hooks. It runs automatically before each commit to catch issues early, ensuring high code quality and consistency across the project.
-
-## 📁 Project Structure
-
-SurfSense consists of three main components:
-- **`surfsense_backend/`** - Python backend API
-- **`surfsense_web/`** - Next.js web application
-- **`surfsense_browser_extension/`** - TypeScript browser extension
-
-## 🛠 Installation
-
-### Prerequisites
-- Python 3.8 or higher
-- Node.js 18+ and pnpm (for frontend components)
-- Git
-
-### Install Pre-commit
-
-```bash
-# Install pre-commit globally
-pip install pre-commit
-
-# Or using your preferred package manager
-# pipx install pre-commit  # Recommended for isolation
-```
-
-### Setup Pre-commit Hooks
-
-1. **Clone the repository**:
-   ```bash
-   git clone https://github.com/masabinhok/SurfSense.git
-   cd SurfSense
-   ```
-
-2. **Install the pre-commit hooks**:
-   ```bash
-   pre-commit install
-   ```
-
-3. **Install commit message hooks** (optional, for conventional commits):
-   ```bash
-   pre-commit install --hook-type commit-msg
-   ```
-
-## 🔧 Configuration Files Added
-
-When you install pre-commit, the following files are part of the setup:
-
-- **`.pre-commit-config.yaml`** - Main pre-commit configuration
-- **`.secrets.baseline`** - Baseline file for secret detection (prevents false positives)
-- **`.github/workflows/pre-commit.yml`** - CI workflow that runs pre-commit on PRs
-
-## 🎯 What Gets Checked
-
-### All Files
-- ✅ Trailing whitespace removal
-- ✅ YAML, JSON, and TOML validation
-- ✅ Large file detection (>10MB)
-- ✅ Merge conflict markers
-- 🔒 **Secret detection** using detect-secrets
-
-### Python Backend (`surfsense_backend/`)
-- 🐍 **Black** - Code formatting
-- 📦 **isort** - Import sorting
-- ⚡ **Ruff** - Fast linting and formatting
-- 🔍 **MyPy** - Static type checking
-- 🛡️ **Bandit** - Security vulnerability scanning
-
-### Frontend (`surfsense_web/` & `surfsense_browser_extension/`)
-- 💅 **Prettier** - Code formatting
-- 🔍 **ESLint** - Linting (Next.js config)
-- 📝 **TypeScript** - Compilation checks
-
-### Commit Messages
-- 📝 **Commitizen** - Conventional commit format validation
-
-## 🚀 Usage
-
-### Normal Workflow
-Pre-commit will run automatically when you commit:
-
-```bash
-git add .
-git commit -m "feat: add new feature"
-# Pre-commit hooks will run automatically
-```
-
-### Manual Execution
-
-Run on staged files only:
-```bash
-pre-commit run
-```
-
-Run on specific files:
-```bash
-pre-commit run --files path/to/file.py path/to/file.ts
-```
-
-Run all hooks on all files:
-```bash
-pre-commit run --all-files
-```
-
-⚠️ **Warning**: Running `--all-files` may generate numerous errors as this codebase has existing linting and type issues that are being gradually resolved.
-
-### Advanced Commands
-
-Update all hooks to latest versions:
-```bash
-pre-commit autoupdate
-```
-
-Run only specific hooks:
-```bash
-pre-commit run black                    # Run only black
-pre-commit run --all-files prettier     # Run prettier on all files
-```
-
-Clean pre-commit cache:
-```bash
-pre-commit clean
-```
-
-## 🆘 Bypassing Pre-commit (When Necessary)
-
-Sometimes you might need to bypass pre-commit hooks (use sparingly!):
-
-### Skip all hooks for one commit:
-```bash
-git commit -m "fix: urgent hotfix" --no-verify
-```
-
-### Skip specific hooks:
-```bash
-SKIP=mypy,black git commit -m "feat: work in progress"
-```
-
-Available hook IDs to skip:
-- `trailing-whitespace`, `check-yaml`, `check-json`
-- `detect-secrets`
-- `black`, `isort`, `ruff`, `ruff-format`, `mypy`, `bandit`
-- `prettier`, `eslint`
-- `typescript-check-web`, `typescript-check-extension`
-- `commitizen`
-
-## 🐛 Common Issues & Solutions
-
-### Secret Detection False Positives
-
-If detect-secrets flags legitimate content as secrets:
-
-1. **Review the detection** - Ensure it's not actually a secret
-2. **Update baseline**:
-   ```bash
-   detect-secrets scan --baseline .secrets.baseline --update
-   git add .secrets.baseline
-   ```
-
-### TypeScript/Node.js Issues
-
-Ensure dependencies are installed:
-```bash
-cd surfsense_web && pnpm install
-cd surfsense_browser_extension && pnpm install
-```
-
-### Python Environment Issues
-
-For Python hooks, ensure you're in the correct environment:
-```bash
-cd surfsense_backend
-# If using uv
-uv sync
-# Or traditional pip
-pip install -r requirements.txt
-```
-
-### Hook Installation Issues
-
-If hooks aren't running:
-```bash
-pre-commit uninstall
-pre-commit install --install-hooks
-```
-
-## 📊 Performance Tips
-
-- **Incremental runs**: Pre-commit only runs on changed files by default
-- **Parallel execution**: Many hooks run in parallel for speed
-- **Caching**: Pre-commit caches environments to speed up subsequent runs
-
-## 🔄 CI Integration
-
-Pre-commit also runs in our GitHub Actions CI pipeline on every PR to `main`. The CI:
-- Runs only on changed files for efficiency
-- Provides the same feedback as local pre-commit
-- Prevents merging code that doesn't pass quality checks
-
-## 📋 Best Practices
-
-1. **Install pre-commit early** in your development setup
-2. **Fix issues incrementally** rather than bypassing hooks
-3. **Update your branch regularly** to avoid conflicts with formatting changes
-4. **Run `--all-files` periodically** on feature branches (in small chunks)
-5. **Keep the `.secrets.baseline` updated** when legitimate secrets-like strings are added
-
-## 💡 Contributing to Pre-commit Config
-
-To modify the pre-commit configuration:
-
-1. Edit `.pre-commit-config.yaml`
-2. Test your changes:
-   ```bash
-   pre-commit run --all-files  # Test with caution!
-   ```
-3. Update the baseline if needed:
-   ```bash
-   detect-secrets scan --baseline .secrets.baseline --update
-   ```
-4. Submit a PR with your changes
-
-## 🆘 Getting Help
-
-- **Pre-commit docs**: https://pre-commit.com/
-- **Project issues**: Open an issue on GitHub
-- **Hook-specific help**: Check individual tool documentation (Black, Ruff, ESLint, etc.)
-
---
-
-Thank you for contributing to SurfSense! 🏄‍♀️ Quality code makes everyone's surfing experience smoother.
--- a/README.md
+++ b/README.md
@ -156,7 +156,7 @@ SurfSense provides two installation methods:
 Both installation guides include detailed OS-specific instructions for Windows, macOS, and Linux.

 Before installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.net/docs/) including:
- PGVector setup
+- Auth setup
 - **File Processing ETL Service** (choose one):
  - Unstructured.io API key (supports 34+ formats)
  - LlamaIndex API key (enhanced parsing, supports 50+ formats)
--- a/docker-compose.override.yml
+++ b/docker-compose.override.yml
@ -1,34 +0,0 @@
-version: '3.8'
-
-services:
-  frontend:
-    image: ghcr.io/modsetter/surfsense_ui:latest
-    ports:
-      - "${FRONTEND_PORT:-3000}:3000"
-    volumes:
-      - ./surfsense_web:/app
-      - /app/node_modules
-    env_file:
-      - ./surfsense_web/.env
-    depends_on:
-      - backend
-    environment:
-      - NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL:-http://backend:8000}
-
-  backend:
-    image: ghcr.io/modsetter/surfsense_backend:latest
-    ports:
-      - "${BACKEND_PORT:-8000}:8000"
-    volumes:
-      - ./surfsense_backend:/app
-    depends_on:
-      - db
-    env_file:
-      - ./surfsense_backend/.env
-    environment:
-      - DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-postgres}@db:5432/${POSTGRES_DB:-surfsense}
-      - PYTHONPATH=/app
-      - UVICORN_LOOP=asyncio
-      - UNSTRUCTURED_HAS_PATCHED_LOOP=1
-      - LANGCHAIN_TRACING_V2=false
-      - LANGSMITH_TRACING=false
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -1,4 +1,4 @@
-version: '3.8'
+version: "3.8"

 services:
  db:
@ -24,6 +24,89 @@ services:
    depends_on:
      - db

+  redis:
+    image: redis:7-alpine
+    ports:
+      - "${REDIS_PORT:-6379}:6379"
+    volumes:
+      - redis_data:/data
+    command: redis-server --appendonly yes
+
+  backend:
+    build: ./surfsense_backend
+    # image: ghcr.io/modsetter/surfsense_backend:latest
+    ports:
+      - "${BACKEND_PORT:-8000}:8000"
+    volumes:
+      - ./surfsense_backend:/app
+      - shared_temp:/tmp
+    env_file:
+      - ./surfsense_backend/.env
+    environment:
+      - DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-postgres}@db:5432/${POSTGRES_DB:-surfsense}
+      - CELERY_BROKER_URL=redis://redis:${REDIS_PORT:-6379}/0
+      - CELERY_RESULT_BACKEND=redis://redis:${REDIS_PORT:-6379}/0
+      - PYTHONPATH=/app
+      - UVICORN_LOOP=asyncio
+      - UNSTRUCTURED_HAS_PATCHED_LOOP=1
+      - LANGCHAIN_TRACING_V2=false
+      - LANGSMITH_TRACING=false
+    depends_on:
+      - db
+      - redis
+
+  celery_worker:
+    build: ./surfsense_backend
+    # image: ghcr.io/modsetter/surfsense_backend:latest
+    command: celery -A app.celery_app worker --loglevel=info --concurrency=1 --pool=solo
+    volumes:
+      - ./surfsense_backend:/app
+      - shared_temp:/tmp
+    env_file:
+      - ./surfsense_backend/.env
+    environment:
+      - DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-postgres}@db:5432/${POSTGRES_DB:-surfsense}
+      - CELERY_BROKER_URL=redis://redis:${REDIS_PORT:-6379}/0
+      - CELERY_RESULT_BACKEND=redis://redis:${REDIS_PORT:-6379}/0
+      - PYTHONPATH=/app
+    depends_on:
+      - db
+      - redis
+      - backend
+
+  # flower:
+  #   build: ./surfsense_backend
+  #   # image: ghcr.io/modsetter/surfsense_backend:latest
+  #   command: celery -A app.celery_app flower --port=5555
+  #   ports:
+  #     - "${FLOWER_PORT:-5555}:5555"
+  #   env_file:
+  #     - ./surfsense_backend/.env
+  #   environment:
+  #     - CELERY_BROKER_URL=redis://redis:${REDIS_PORT:-6379}/0
+  #     - CELERY_RESULT_BACKEND=redis://redis:${REDIS_PORT:-6379}/0
+  #     - PYTHONPATH=/app
+  #   depends_on:
+  #     - redis
+  #     - celery_worker
+
+  frontend:
+    # build: ./surfsense_web
+    image: ghcr.io/modsetter/surfsense_ui:latest
+    ports:
+      - "${FRONTEND_PORT:-3000}:3000"
+    volumes:
+      - ./surfsense_web:/app
+      - /app/node_modules
+    env_file:
+      - ./surfsense_web/.env
+    environment:
+      - NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL:-http://backend:8000}
+    depends_on:
+      - backend
+
 volumes:
  postgres_data:
-  pgadmin_data: 
+  pgadmin_data:
+  redis_data:
+  shared_temp:
--- a/surfsense_backend/.env.example
+++ b/surfsense_backend/.env.example
@ -1,5 +1,9 @@
 DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense

+# Celery Config
+CELERY_BROKER_URL=redis://localhost:6379/0
+CELERY_RESULT_BACKEND=redis://localhost:6379/0
+
 SECRET_KEY=SECRET
 NEXT_FRONTEND_URL=http://localhost:3000

@ -17,7 +21,7 @@ AIRTABLE_CLIENT_SECRET=your_airtable_client_secret
 AIRTABLE_REDIRECT_URI=http://localhost:8000/api/v1/auth/airtable/connector/callback

 # Embedding Model
-EMBEDDING_MODEL=mixedbread-ai/mxbai-embed-large-v1
+EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

 RERANKERS_MODEL_NAME=ms-marco-MiniLM-L-12-v2
 RERANKERS_MODEL_TYPE=flashrank
--- a/surfsense_backend/app/celery_app.py
+++ b/surfsense_backend/app/celery_app.py
@ -0,0 +1,59 @@
+"""Celery application configuration and setup."""
+
+import os
+
+from celery import Celery
+from dotenv import load_dotenv
+
+# Load environment variables
+load_dotenv()
+
+# Get Celery configuration from environment
+CELERY_BROKER_URL = os.getenv("CELERY_BROKER_URL", "redis://localhost:6379/0")
+CELERY_RESULT_BACKEND = os.getenv("CELERY_RESULT_BACKEND", "redis://localhost:6379/0")
+
+# Create Celery app
+celery_app = Celery(
+    "surfsense",
+    broker=CELERY_BROKER_URL,
+    backend=CELERY_RESULT_BACKEND,
+    include=[
+        "app.tasks.celery_tasks.document_tasks",
+        "app.tasks.celery_tasks.podcast_tasks",
+        "app.tasks.celery_tasks.connector_tasks",
+    ],
+)
+
+# Celery configuration
+celery_app.conf.update(
+    # Task settings
+    task_serializer="json",
+    accept_content=["json"],
+    result_serializer="json",
+    timezone="UTC",
+    enable_utc=True,
+    # Task execution settings
+    task_track_started=True,
+    task_time_limit=3600,  # 1 hour hard limit
+    task_soft_time_limit=3000,  # 50 minutes soft limit
+    # Result backend settings
+    result_expires=86400,  # Results expire after 24 hours
+    result_extended=True,
+    # Worker settings
+    worker_prefetch_multiplier=1,
+    worker_max_tasks_per_child=1000,
+    # Retry settings
+    task_acks_late=True,
+    task_reject_on_worker_lost=True,
+    # Broker settings
+    broker_connection_retry_on_startup=True,
+)
+
+# Optional: Configure Celery Beat for periodic tasks
+celery_app.conf.beat_schedule = {
+    # Example: add periodic tasks here if needed (requires `from celery.schedules import crontab`)
+    # "periodic-task-name": {
+    #     "task": "app.tasks.celery_tasks.some_task",
+    #     "schedule": crontab(minute=0, hour=0),  # Run daily at midnight
+    # },
+}
--- a/surfsense_backend/app/routes/documents_routes.py
+++ b/surfsense_backend/app/routes/documents_routes.py
@ -1,7 +1,7 @@
 # Force asyncio to use standard event loop before unstructured imports
 import asyncio

-from fastapi import APIRouter, BackgroundTasks, Depends, Form, HTTPException, UploadFile
+from fastapi import APIRouter, Depends, Form, HTTPException, UploadFile
 from litellm import atranscription
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy.future import select
@ -56,35 +56,41 @@ async def create_documents(
    request: DocumentsCreate,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user),
-    fastapi_background_tasks: BackgroundTasks = BackgroundTasks(),
 ):
    try:
        # Check if the user owns the search space
        await check_ownership(session, SearchSpace, request.search_space_id, user)

        if request.document_type == DocumentType.EXTENSION:
+            from app.tasks.celery_tasks.document_tasks import (
+                process_extension_document_task,
+            )
+
            for individual_document in request.content:
-                fastapi_background_tasks.add_task(
-                    process_extension_document_with_new_session,
-                    individual_document,
-                    request.search_space_id,
-                    str(user.id),
+                # Convert document to dict for Celery serialization
+                document_dict = {
+                    "metadata": {
+                        "VisitedWebPageTitle": individual_document.metadata.VisitedWebPageTitle,
+                        "VisitedWebPageURL": individual_document.metadata.VisitedWebPageURL,
+                    },
+                    "content": individual_document.content,
+                }
+                process_extension_document_task.delay(
+                    document_dict, request.search_space_id, str(user.id)
                )
        elif request.document_type == DocumentType.CRAWLED_URL:
+            from app.tasks.celery_tasks.document_tasks import process_crawled_url_task
+
            for url in request.content:
-                fastapi_background_tasks.add_task(
-                    process_crawled_url_with_new_session,
-                    url,
-                    request.search_space_id,
-                    str(user.id),
+                process_crawled_url_task.delay(
+                    url, request.search_space_id, str(user.id)
                )
        elif request.document_type == DocumentType.YOUTUBE_VIDEO:
+            from app.tasks.celery_tasks.document_tasks import process_youtube_video_task
+
            for url in request.content:
-                fastapi_background_tasks.add_task(
-                    process_youtube_video_with_new_session,
-                    url,
-                    request.search_space_id,
-                    str(user.id),
+                process_youtube_video_task.delay(
+                    url, request.search_space_id, str(user.id)
                )
        else:
            raise HTTPException(status_code=400, detail="Invalid document type")
@ -106,7 +112,6 @@ async def create_documents_file_upload(
    search_space_id: int = Form(...),
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user),
-    fastapi_background_tasks: BackgroundTasks = BackgroundTasks(),
 ):
    try:
        await check_ownership(session, SearchSpace, search_space_id, user)
@ -131,12 +136,12 @@ async def create_documents_file_upload(
                with open(temp_path, "wb") as f:
                    f.write(content)

-                fastapi_background_tasks.add_task(
-                    process_file_in_background_with_new_session,
-                    temp_path,
-                    file.filename,
-                    search_space_id,
-                    str(user.id),
+                from app.tasks.celery_tasks.document_tasks import (
+                    process_file_upload_task,
+                )
+
+                process_file_upload_task.delay(
+                    temp_path, file.filename, search_space_id, str(user.id)
                )
            except Exception as e:
                raise HTTPException(
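Note on the `document_dict` conversion in this file: `celery_app.py` sets `task_serializer="json"`, so arguments passed to `.delay()` have to survive a JSON round trip. The sketch below illustrates that constraint using only the standard library. The `PageMetadata` and `ExtensionDocument` dataclasses and the `to_task_payload` helper are hypothetical stand-ins for the request models, and stdlib `json` stands in for Celery's serializer, which may accept a few extra types.

```python
import json
from dataclasses import dataclass


@dataclass
class PageMetadata:
    """Hypothetical stand-in for the extension document's metadata model."""

    VisitedWebPageTitle: str
    VisitedWebPageURL: str


@dataclass
class ExtensionDocument:
    """Hypothetical stand-in for one item of `request.content`."""

    metadata: PageMetadata
    content: str


def to_task_payload(doc: ExtensionDocument) -> dict:
    """Flatten the document into plain dicts and strings before enqueueing."""
    return {
        "metadata": {
            "VisitedWebPageTitle": doc.metadata.VisitedWebPageTitle,
            "VisitedWebPageURL": doc.metadata.VisitedWebPageURL,
        },
        "content": doc.content,
    }


doc = ExtensionDocument(
    metadata=PageMetadata("Example", "https://example.com"),
    content="page text",
)
payload = to_task_payload(doc)

# The plain-dict payload survives a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(payload))

# The model object itself is rejected by the stdlib JSON encoder.
try:
    json.dumps(doc)
    object_is_serializable = True
except TypeError:
    object_is_serializable = False
```

Only the fields copied into the dict reach the worker, so any field the task needs has to be added to the payload explicitly.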
--- a/surfsense_backend/app/routes/podcasts_routes.py
+++ b/surfsense_backend/app/routes/podcasts_routes.py
@ -1,7 +1,7 @@
 import os
 from pathlib import Path

-from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
+from fastapi import APIRouter, Depends, HTTPException
 from fastapi.responses import StreamingResponse
 from sqlalchemy.exc import IntegrityError, SQLAlchemyError
 from sqlalchemy.ext.asyncio import AsyncSession
@ -176,7 +176,6 @@ async def generate_podcast(
    request: PodcastGenerateRequest,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user),
-    fastapi_background_tasks: BackgroundTasks = BackgroundTasks(),
 ):
    try:
        # Check if the user owns the search space
@ -205,14 +204,14 @@ async def generate_podcast(
                    detail="One or more chat IDs do not belong to this user or search space",
                )

-            # Only add a single task with the first chat ID
+            from app.tasks.celery_tasks.podcast_tasks import (
+                generate_chat_podcast_task,
+            )
+
+            # Enqueue one Celery task per chat ID
            for chat_id in valid_chat_ids:
-                fastapi_background_tasks.add_task(
-                    generate_chat_podcast_with_new_session,
-                    chat_id,
-                    request.search_space_id,
-                    request.podcast_title,
-                    user.id,
+                generate_chat_podcast_task.delay(
+                    chat_id, request.search_space_id, request.podcast_title, user.id
                )

        return {
--- a/surfsense_backend/app/routes/search_source_connectors_routes.py
+++ b/surfsense_backend/app/routes/search_source_connectors_routes.py
@ -14,7 +14,7 @@ import logging
 from datetime import datetime, timedelta
 from typing import Any

-from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, Query
+from fastapi import APIRouter, Depends, HTTPException, Query
 from pydantic import BaseModel, Field, ValidationError
 from sqlalchemy.exc import IntegrityError
 from sqlalchemy.ext.asyncio import AsyncSession
@ -351,7 +351,6 @@ async def index_connector_content(
    ),
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user),
-    background_tasks: BackgroundTasks = None,
 ):
    """
    Index content from a connector to a search space.
@ -409,107 +408,83 @@ async def index_connector_content(
        indexing_to = end_date if end_date else today_str

        if connector.connector_type == SearchSourceConnectorType.SLACK_CONNECTOR:
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import (
+                index_slack_messages_task,
+            )
+
            logger.info(
                f"Triggering Slack indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_slack_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_slack_messages_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Slack indexing started in the background."

        elif connector.connector_type == SearchSourceConnectorType.NOTION_CONNECTOR:
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import index_notion_pages_task
+
            logger.info(
                f"Triggering Notion indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_notion_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_notion_pages_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Notion indexing started in the background."

        elif connector.connector_type == SearchSourceConnectorType.GITHUB_CONNECTOR:
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import index_github_repos_task
+
            logger.info(
                f"Triggering GitHub indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_github_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_github_repos_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "GitHub indexing started in the background."

        elif connector.connector_type == SearchSourceConnectorType.LINEAR_CONNECTOR:
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import index_linear_issues_task
+
            logger.info(
                f"Triggering Linear indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_linear_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_linear_issues_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Linear indexing started in the background."

        elif connector.connector_type == SearchSourceConnectorType.JIRA_CONNECTOR:
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import index_jira_issues_task
+
            logger.info(
                f"Triggering Jira indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_jira_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_jira_issues_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Jira indexing started in the background."

        elif connector.connector_type == SearchSourceConnectorType.CONFLUENCE_CONNECTOR:
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import (
+                index_confluence_pages_task,
+            )
+
            logger.info(
                f"Triggering Confluence indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_confluence_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_confluence_pages_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Confluence indexing started in the background."

        elif connector.connector_type == SearchSourceConnectorType.CLICKUP_CONNECTOR:
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import index_clickup_tasks_task
+
            logger.info(
                f"Triggering ClickUp indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_clickup_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_clickup_tasks_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "ClickUp indexing started in the background."

@ -517,77 +492,65 @@ async def index_connector_content(
            connector.connector_type
            == SearchSourceConnectorType.GOOGLE_CALENDAR_CONNECTOR
        ):
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import (
+                index_google_calendar_events_task,
+            )
+
            logger.info(
                f"Triggering Google Calendar indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_google_calendar_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_google_calendar_events_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Google Calendar indexing started in the background."
        elif connector.connector_type == SearchSourceConnectorType.AIRTABLE_CONNECTOR:
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import (
+                index_airtable_records_task,
+            )
+
            logger.info(
                f"Triggering Airtable indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_airtable_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_airtable_records_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Airtable indexing started in the background."
        elif (
            connector.connector_type == SearchSourceConnectorType.GOOGLE_GMAIL_CONNECTOR
        ):
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import (
+                index_google_gmail_messages_task,
+            )
+
            logger.info(
                f"Triggering Google Gmail indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_google_gmail_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_google_gmail_messages_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Google Gmail indexing started in the background."

        elif connector.connector_type == SearchSourceConnectorType.DISCORD_CONNECTOR:
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import (
+                index_discord_messages_task,
+            )
+
            logger.info(
                f"Triggering Discord indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_discord_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_discord_messages_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Discord indexing started in the background."

        elif connector.connector_type == SearchSourceConnectorType.LUMA_CONNECTOR:
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import index_luma_events_task
+
            logger.info(
                f"Triggering Luma indexing for connector {connector_id} into search space {search_space_id} from {indexing_from} to {indexing_to}"
            )
-            background_tasks.add_task(
-                run_luma_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_luma_events_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Luma indexing started in the background."

@ -595,17 +558,15 @@ async def index_connector_content(
            connector.connector_type
            == SearchSourceConnectorType.ELASTICSEARCH_CONNECTOR
        ):
-            # Run indexing in background
+            from app.tasks.celery_tasks.connector_tasks import (
+                index_elasticsearch_documents_task,
+            )
+
            logger.info(
                f"Triggering Elasticsearch indexing for connector {connector_id} into search space {search_space_id}"
            )
-            background_tasks.add_task(
-                run_elasticsearch_indexing_with_new_session,
-                connector_id,
-                search_space_id,
-                str(user.id),
-                indexing_from,
-                indexing_to,
+            index_elasticsearch_documents_task.delay(
+                connector_id, search_space_id, str(user.id), indexing_from, indexing_to
            )
            response_message = "Elasticsearch indexing started in the background."

--- a/surfsense_backend/app/tasks/celery_tasks/__init__.py
+++ b/surfsense_backend/app/tasks/celery_tasks/__init__.py
@ -0,0 +1 @@
+"""Celery tasks package."""
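Note on the task modules in this package: each synchronous Celery entry point creates a fresh event loop, runs one coroutine on it, and closes the loop. A minimal sketch of that pattern as a shared helper follows. It assumes nothing about the real indexing functions: `run_in_fresh_loop`, `_index_example`, and `index_example_task` are hypothetical names, and the Celery decorator is left out so the snippet runs with the standard library alone.

```python
import asyncio
from collections.abc import Coroutine
from typing import Any, TypeVar

T = TypeVar("T")


def run_in_fresh_loop(coro: Coroutine[Any, Any, T]) -> T:
    """Run `coro` to completion on a new event loop, then close that loop."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        return loop.run_until_complete(coro)
    finally:
        # Detach the loop before closing it so later code cannot reuse it.
        asyncio.set_event_loop(None)
        loop.close()


async def _index_example(connector_id: int, search_space_id: int) -> str:
    """Hypothetical stand-in for a real indexing coroutine."""
    await asyncio.sleep(0)
    return f"indexed connector {connector_id} into search space {search_space_id}"


def index_example_task(connector_id: int, search_space_id: int) -> str:
    """Synchronous wrapper; in the package this would be a Celery task."""
    return run_in_fresh_loop(_index_example(connector_id, search_space_id))
```

A helper of this shape would let every task body shrink to a single call, at the cost of one more indirection when reading a traceback.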
--- a/surfsense_backend/app/tasks/celery_tasks/connector_tasks.py
+++ b/surfsense_backend/app/tasks/celery_tasks/connector_tasks.py
@ -0,0 +1,589 @@
+"""Celery tasks for connector indexing."""
+
+import logging
+
+from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
+from sqlalchemy.pool import NullPool
+
+from app.celery_app import celery_app
+from app.config import config
+
+logger = logging.getLogger(__name__)
+
+
+def get_celery_session_maker():
+    """
+    Create a new async session maker for Celery tasks.
+    This is necessary because Celery tasks run in a new event loop,
+    and the default session maker is bound to the main app's event loop.
+    """
+    engine = create_async_engine(
+        config.DATABASE_URL,
+        poolclass=NullPool,  # Don't use connection pooling for Celery tasks
+        echo=False,
+    )
+    return async_sessionmaker(engine, expire_on_commit=False)
+
+
+@celery_app.task(name="index_slack_messages", bind=True)
+def index_slack_messages_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Slack messages."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_slack_messages(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_slack_messages(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Slack messages with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_slack_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_slack_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_notion_pages", bind=True)
+def index_notion_pages_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Notion pages."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_notion_pages(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_notion_pages(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Notion pages with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_notion_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_notion_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_github_repos", bind=True)
+def index_github_repos_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index GitHub repositories."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_github_repos(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_github_repos(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index GitHub repositories with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_github_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_github_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_linear_issues", bind=True)
+def index_linear_issues_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Linear issues."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_linear_issues(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_linear_issues(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Linear issues with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_linear_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_linear_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_jira_issues", bind=True)
+def index_jira_issues_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Jira issues."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_jira_issues(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_jira_issues(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Jira issues with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_jira_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_jira_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_confluence_pages", bind=True)
+def index_confluence_pages_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Confluence pages."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_confluence_pages(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_confluence_pages(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Confluence pages with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_confluence_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_confluence_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_clickup_tasks", bind=True)
+def index_clickup_tasks_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index ClickUp tasks."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_clickup_tasks(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_clickup_tasks(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index ClickUp tasks with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_clickup_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_clickup_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_google_calendar_events", bind=True)
+def index_google_calendar_events_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Google Calendar events."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_google_calendar_events(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_google_calendar_events(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Google Calendar events with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_google_calendar_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_google_calendar_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_airtable_records", bind=True)
+def index_airtable_records_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Airtable records."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_airtable_records(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_airtable_records(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Airtable records with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_airtable_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_airtable_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_google_gmail_messages", bind=True)
+def index_google_gmail_messages_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Google Gmail messages."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_google_gmail_messages(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_google_gmail_messages(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Google Gmail messages with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_google_gmail_indexing,
+    )
+
+    # TODO: derive max_messages and days_back from start_date/end_date.
+    # Until then, start_date and end_date are ignored and fixed defaults are used.
+    max_messages = 100
+    days_back = 30
+
+    async with get_celery_session_maker()() as session:
+        await run_google_gmail_indexing(
+            session, connector_id, search_space_id, user_id, max_messages, days_back
+        )
+
+
+@celery_app.task(name="index_discord_messages", bind=True)
+def index_discord_messages_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Discord messages."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_discord_messages(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_discord_messages(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Discord messages with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_discord_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_discord_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_luma_events", bind=True)
+def index_luma_events_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Luma events."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_luma_events(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_luma_events(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Luma events with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_luma_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_luma_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
+
+
+@celery_app.task(name="index_elasticsearch_documents", bind=True)
+def index_elasticsearch_documents_task(
+    self,
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Celery task to index Elasticsearch documents."""
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _index_elasticsearch_documents(
+                connector_id, search_space_id, user_id, start_date, end_date
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _index_elasticsearch_documents(
+    connector_id: int,
+    search_space_id: int,
+    user_id: str,
+    start_date: str,
+    end_date: str,
+):
+    """Index Elasticsearch documents with new session."""
+    from app.routes.search_source_connectors_routes import (
+        run_elasticsearch_indexing,
+    )
+
+    async with get_celery_session_maker()() as session:
+        await run_elasticsearch_indexing(
+            session, connector_id, search_space_id, user_id, start_date, end_date
+        )
--- a/surfsense_backend/app/tasks/celery_tasks/document_tasks.py
+++ b/surfsense_backend/app/tasks/celery_tasks/document_tasks.py
@ -0,0 +1,318 @@
+"""Celery tasks for document processing."""
+
+import logging
+
+from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
+from sqlalchemy.pool import NullPool
+
+from app.celery_app import celery_app
+from app.config import config
+from app.services.task_logging_service import TaskLoggingService
+from app.tasks.document_processors import (
+    add_crawled_url_document,
+    add_extension_received_document,
+    add_youtube_video_document,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def get_celery_session_maker():
+    """
+    Create a new async session maker for Celery tasks.
+    This is necessary because Celery tasks run in a new event loop,
+    and the default session maker is bound to the main app's event loop.
+    """
+    engine = create_async_engine(
+        config.DATABASE_URL,
+        poolclass=NullPool,  # Don't use connection pooling for Celery tasks
+        echo=False,
+    )
+    return async_sessionmaker(engine, expire_on_commit=False)
+
+
+@celery_app.task(name="process_extension_document", bind=True)
+def process_extension_document_task(
+    self, individual_document_dict, search_space_id: int, user_id: str
+):
+    """
+    Celery task to process an extension document.
+
+    Args:
+        individual_document_dict: Document data as dictionary
+        search_space_id: ID of the search space
+        user_id: ID of the user
+    """
+    import asyncio
+
+    # Create a new event loop for this task
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _process_extension_document(
+                individual_document_dict, search_space_id, user_id
+            )
+        )
+    finally:
+        loop.close()
+
+
+async def _process_extension_document(
+    individual_document_dict, search_space_id: int, user_id: str
+):
+    """Process extension document with new session."""
+    from pydantic import BaseModel
+
+    # Rebuild the document object from the serialized dict.
+    # TODO: move these ad-hoc models into a shared schema module.
+    class DocumentMetadata(BaseModel):
+        VisitedWebPageTitle: str
+        VisitedWebPageURL: str
+
+    class IndividualDocument(BaseModel):
+        metadata: DocumentMetadata
+        content: str
+
+    individual_document = IndividualDocument(**individual_document_dict)
+
+    async with get_celery_session_maker()() as session:
+        task_logger = TaskLoggingService(session, search_space_id)
+
+        log_entry = await task_logger.log_task_start(
+            task_name="process_extension_document",
+            source="document_processor",
+            message=f"Starting processing of extension document from {individual_document.metadata.VisitedWebPageTitle}",
+            metadata={
+                "document_type": "EXTENSION",
+                "url": individual_document.metadata.VisitedWebPageURL,
+                "title": individual_document.metadata.VisitedWebPageTitle,
+                "user_id": user_id,
+            },
+        )
+
+        try:
+            result = await add_extension_received_document(
+                session, individual_document, search_space_id, user_id
+            )
+
+            if result:
+                await task_logger.log_task_success(
+                    log_entry,
+                    f"Successfully processed extension document: {individual_document.metadata.VisitedWebPageTitle}",
+                    {"document_id": result.id, "content_hash": result.content_hash},
+                )
+            else:
+                await task_logger.log_task_success(
+                    log_entry,
+                    f"Extension document already exists (duplicate): {individual_document.metadata.VisitedWebPageTitle}",
+                    {"duplicate_detected": True},
+                )
+        except Exception as e:
+            await task_logger.log_task_failure(
+                log_entry,
+                f"Failed to process extension document: {individual_document.metadata.VisitedWebPageTitle}",
+                str(e),
+                {"error_type": type(e).__name__},
+            )
+            logger.error(f"Error processing extension document: {e!s}")
+            raise
+
+
+@celery_app.task(name="process_crawled_url", bind=True)
+def process_crawled_url_task(self, url: str, search_space_id: int, user_id: str):
+    """
+    Celery task to crawl and process a URL.
+
+    Args:
+        url: URL to crawl and process
+        search_space_id: ID of the search space
+        user_id: ID of the user
+    """
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(_process_crawled_url(url, search_space_id, user_id))
+    finally:
+        loop.close()
+
+
+async def _process_crawled_url(url: str, search_space_id: int, user_id: str):
+    """Process crawled URL with new session."""
+    async with get_celery_session_maker()() as session:
+        task_logger = TaskLoggingService(session, search_space_id)
+
+        log_entry = await task_logger.log_task_start(
+            task_name="process_crawled_url",
+            source="document_processor",
+            message=f"Starting URL crawling and processing for: {url}",
+            metadata={"document_type": "CRAWLED_URL", "url": url, "user_id": user_id},
+        )
+
+        try:
+            result = await add_crawled_url_document(
+                session, url, search_space_id, user_id
+            )
+
+            if result:
+                await task_logger.log_task_success(
+                    log_entry,
+                    f"Successfully crawled and processed URL: {url}",
+                    {
+                        "document_id": result.id,
+                        "title": result.title,
+                        "content_hash": result.content_hash,
+                    },
+                )
+            else:
+                await task_logger.log_task_success(
+                    log_entry,
+                    f"URL document already exists (duplicate): {url}",
+                    {"duplicate_detected": True},
+                )
+        except Exception as e:
+            await task_logger.log_task_failure(
+                log_entry,
+                f"Failed to crawl URL: {url}",
+                str(e),
+                {"error_type": type(e).__name__},
+            )
+            logger.error(f"Error processing crawled URL: {e!s}")
+            raise
+
+
+@celery_app.task(name="process_youtube_video", bind=True)
+def process_youtube_video_task(self, url: str, search_space_id: int, user_id: str):
+    """
+    Celery task to process a YouTube video.
+
+    Args:
+        url: YouTube video URL
+        search_space_id: ID of the search space
+        user_id: ID of the user
+    """
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(_process_youtube_video(url, search_space_id, user_id))
+    finally:
+        loop.close()
+
+
+async def _process_youtube_video(url: str, search_space_id: int, user_id: str):
+    """Process YouTube video with new session."""
+    async with get_celery_session_maker()() as session:
+        task_logger = TaskLoggingService(session, search_space_id)
+
+        log_entry = await task_logger.log_task_start(
+            task_name="process_youtube_video",
+            source="document_processor",
+            message=f"Starting YouTube video processing for: {url}",
+            metadata={"document_type": "YOUTUBE_VIDEO", "url": url, "user_id": user_id},
+        )
+
+        try:
+            result = await add_youtube_video_document(
+                session, url, search_space_id, user_id
+            )
+
+            if result:
+                await task_logger.log_task_success(
+                    log_entry,
+                    f"Successfully processed YouTube video: {result.title}",
+                    {
+                        "document_id": result.id,
+                        "video_id": result.document_metadata.get("video_id"),
+                        "content_hash": result.content_hash,
+                    },
+                )
+            else:
+                await task_logger.log_task_success(
+                    log_entry,
+                    f"YouTube video document already exists (duplicate): {url}",
+                    {"duplicate_detected": True},
+                )
+        except Exception as e:
+            await task_logger.log_task_failure(
+                log_entry,
+                f"Failed to process YouTube video: {url}",
+                str(e),
+                {"error_type": type(e).__name__},
+            )
+            logger.error(f"Error processing YouTube video: {e!s}")
+            raise
+
+
+@celery_app.task(name="process_file_upload", bind=True)
+def process_file_upload_task(
+    self, file_path: str, filename: str, search_space_id: int, user_id: str
+):
+    """
+    Celery task to process an uploaded file.
+
+    Args:
+        file_path: Path to the uploaded file
+        filename: Original filename
+        search_space_id: ID of the search space
+        user_id: ID of the user
+    """
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _process_file_upload(file_path, filename, search_space_id, user_id)
+        )
+    finally:
+        loop.close()
+
+
+async def _process_file_upload(
+    file_path: str, filename: str, search_space_id: int, user_id: str
+):
+    """Process file upload with new session."""
+    from app.routes.documents_routes import process_file_in_background
+
+    async with get_celery_session_maker()() as session:
+        task_logger = TaskLoggingService(session, search_space_id)
+
+        log_entry = await task_logger.log_task_start(
+            task_name="process_file_upload",
+            source="document_processor",
+            message=f"Starting file processing for: {filename}",
+            metadata={
+                "document_type": "FILE",
+                "filename": filename,
+                "file_path": file_path,
+                "user_id": user_id,
+            },
+        )
+
+        try:
+            await process_file_in_background(
+                file_path,
+                filename,
+                search_space_id,
+                user_id,
+                session,
+                task_logger,
+                log_entry,
+            )
+        except Exception as e:
+            await task_logger.log_task_failure(
+                log_entry,
+                f"Failed to process file: {filename}",
+                str(e),
+                {"error_type": type(e).__name__},
+            )
+            logger.error(f"Error processing file: {e!s}")
+            raise
--- a/surfsense_backend/app/tasks/celery_tasks/podcast_tasks.py
+++ b/surfsense_backend/app/tasks/celery_tasks/podcast_tasks.py
@ -0,0 +1,66 @@
+"""Celery tasks for podcast generation."""
+
+import logging
+
+from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
+from sqlalchemy.pool import NullPool
+
+from app.celery_app import celery_app
+from app.config import config
+from app.tasks.podcast_tasks import generate_chat_podcast
+
+logger = logging.getLogger(__name__)
+
+
+def get_celery_session_maker():
+    """
+    Create a new async session maker for Celery tasks.
+    This is necessary because Celery tasks run in a new event loop,
+    and the default session maker is bound to the main app's event loop.
+    """
+    engine = create_async_engine(
+        config.DATABASE_URL,
+        poolclass=NullPool,  # Don't use connection pooling for Celery tasks
+        echo=False,
+    )
+    return async_sessionmaker(engine, expire_on_commit=False)
+
+
+@celery_app.task(name="generate_chat_podcast", bind=True)
+def generate_chat_podcast_task(
+    self, chat_id: int, search_space_id: int, podcast_title: str, user_id: int
+):
+    """
+    Celery task to generate a podcast from a chat.
+
+    Args:
+        chat_id: ID of the chat to generate podcast from
+        search_space_id: ID of the search space
+        podcast_title: Title for the podcast
+        user_id: ID of the user
+    """
+    import asyncio
+
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        loop.run_until_complete(
+            _generate_chat_podcast(chat_id, search_space_id, podcast_title, user_id)
+        )
+    finally:
+        loop.close()
+
+
+async def _generate_chat_podcast(
+    chat_id: int, search_space_id: int, podcast_title: str, user_id: int
+):
+    """Generate chat podcast with new session."""
+    async with get_celery_session_maker()() as session:
+        try:
+            await generate_chat_podcast(
+                session, chat_id, search_space_id, podcast_title, user_id
+            )
+        except Exception as e:
+            logger.error(f"Error generating podcast from chat: {e!s}")
+            raise
--- a/surfsense_backend/celery_worker.py
+++ b/surfsense_backend/celery_worker.py
@ -0,0 +1,13 @@
+"""Celery worker startup script."""
+
+import os
+import sys
+
+# Add the app directory to the Python path
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "app"))
+
+from app.celery_app import celery_app
+
+if __name__ == "__main__":
+    # Hand control to Celery's command-line entry point for this app
+    celery_app.start()
--- a/surfsense_backend/pyproject.toml
+++ b/surfsense_backend/pyproject.toml
@ -45,6 +45,9 @@ dependencies = [
    "langchain-litellm>=0.2.3",
    "elasticsearch>=9.1.1",
    "faster-whisper>=1.1.0",
+    "celery[redis]>=5.5.3",
+    "flower>=2.0.1",
+    "redis>=5.2.1",
 ]

 [dependency-groups]
--- a/surfsense_backend/uv.lock
+++ b/surfsense_backend/uv.lock
@ -147,6 +147,18 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/dd/e2/88e425adac5ad887a087c38d04fe2030010572a3e0e627f8a6e8c33eeda8/alembic-1.16.2-py3-none-any.whl", hash = "sha256:5f42e9bd0afdbd1d5e3ad856c01754530367debdebf21ed6894e34af52b3bb03", size = 242717 },
 ]

+[[package]]
+name = "amqp"
+version = "5.3.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "vine" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/79/fc/ec94a357dfc6683d8c86f8b4cfa5416a4c36b28052ec8260c77aca96a443/amqp-5.3.1.tar.gz", hash = "sha256:cddc00c725449522023bad949f70fff7b48f0b1ade74d170a6f10ab044739432", size = 129013 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/26/99/fc813cd978842c26c82534010ea849eee9ab3a13ea2b74e95cb9c99e747b/amqp-5.3.1-py3-none-any.whl", hash = "sha256:43b3319e1b4e7d1251833a93d672b4af1e40f3d632d479b98661a95f117880a2", size = 50944 },
+]
+
 [[package]]
 name = "annotated-types"
 version = "0.7.0"
@ -415,6 +427,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/50/cd/30110dc0ffcf3b131156077b90e9f60ed75711223f306da4db08eff8403b/beautifulsoup4-4.13.4-py3-none-any.whl", hash = "sha256:9bbbb14bfde9d79f38b8cd5f8c7c85f4b8f2523190ebed90e950a8dea4cb1c4b", size = 187285 },
 ]

+[[package]]
+name = "billiard"
+version = "4.2.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/b9/6a/1405343016bce8354b29d90aad6b0bf6485b5e60404516e4b9a3a9646cf0/billiard-4.2.2.tar.gz", hash = "sha256:e815017a062b714958463e07ba15981d802dc53d41c5b69d28c5a7c238f8ecf3", size = 155592 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a6/80/ef8dff49aae0e4430f81842f7403e14e0ca59db7bbaf7af41245b67c6b25/billiard-4.2.2-py3-none-any.whl", hash = "sha256:4bc05dcf0d1cc6addef470723aac2a6232f3c7ed7475b0b580473a9145829457", size = 86896 },
+]
+
 [[package]]
 name = "blis"
 version = "1.3.0"
@ -476,6 +497,30 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/9e/96/d32b941a501ab566a16358d68b6eb4e4acc373fab3c3c4d7d9e649f7b4bb/catalogue-2.0.10-py3-none-any.whl", hash = "sha256:58c2de0020aa90f4a2da7dfad161bf7b3b054c86a5f09fcedc0b2b740c109a9f", size = 17325 },
 ]

+[[package]]
+name = "celery"
+version = "5.5.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "billiard" },
+    { name = "click" },
+    { name = "click-didyoumean" },
+    { name = "click-plugins" },
+    { name = "click-repl" },
+    { name = "kombu" },
+    { name = "python-dateutil" },
+    { name = "vine" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/bb/7d/6c289f407d219ba36d8b384b42489ebdd0c84ce9c413875a8aae0c85f35b/celery-5.5.3.tar.gz", hash = "sha256:6c972ae7968c2b5281227f01c3a3f984037d21c5129d07bf3550cc2afc6b10a5", size = 1667144 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/c9/af/0dcccc7fdcdf170f9a1585e5e96b6fb0ba1749ef6be8c89a6202284759bd/celery-5.5.3-py3-none-any.whl", hash = "sha256:0b5761a07057acee94694464ca482416b959568904c9dfa41ce8413a7d65d525", size = 438775 },
+]
+
+[package.optional-dependencies]
+redis = [
+    { name = "kombu", extra = ["redis"] },
+]
+
 [[package]]
 name = "certifi"
 version = "2025.6.15"
@ -657,6 +702,43 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/85/32/10bb5764d90a8eee674e9dc6f4db6a0ab47c8c4d0d83c27f7c39ac415a4d/click-8.2.1-py3-none-any.whl", hash = "sha256:61a3265b914e850b85317d0b3109c7f8cd35a670f963866005d6ef1d5175a12b", size = 102215 },
 ]

+[[package]]
+name = "click-didyoumean"
+version = "0.3.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "click" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/30/ce/217289b77c590ea1e7c24242d9ddd6e249e52c795ff10fac2c50062c48cb/click_didyoumean-0.3.1.tar.gz", hash = "sha256:4f82fdff0dbe64ef8ab2279bd6aa3f6a99c3b28c05aa09cbfc07c9d7fbb5a463", size = 3089 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/1b/5b/974430b5ffdb7a4f1941d13d83c64a0395114503cc357c6b9ae4ce5047ed/click_didyoumean-0.3.1-py3-none-any.whl", hash = "sha256:5c4bb6007cfea5f2fd6583a2fb6701a22a41eb98957e63d0fac41c10e7c3117c", size = 3631 },
+]
+
+[[package]]
+name = "click-plugins"
+version = "1.1.1.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "click" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/c3/a4/34847b59150da33690a36da3681d6bbc2ec14ee9a846bc30a6746e5984e4/click_plugins-1.1.1.2.tar.gz", hash = "sha256:d7af3984a99d243c131aa1a828331e7630f4a88a9741fd05c927b204bcf92261", size = 8343 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/3d/9a/2abecb28ae875e39c8cad711eb1186d8d14eab564705325e77e4e6ab9ae5/click_plugins-1.1.1.2-py2.py3-none-any.whl", hash = "sha256:008d65743833ffc1f5417bf0e78e8d2c23aab04d9745ba817bd3e71b0feb6aa6", size = 11051 },
+]
+
+[[package]]
+name = "click-repl"
+version = "0.3.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "click" },
+    { name = "prompt-toolkit" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/cb/a2/57f4ac79838cfae6912f997b4d1a64a858fb0c86d7fcaae6f7b58d267fca/click-repl-0.3.0.tar.gz", hash = "sha256:17849c23dba3d667247dc4defe1757fff98694e90fe37474f3feebb69ced26a9", size = 10449 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/52/40/9d857001228658f0d59e97ebd4c346fe73e138c6de1bce61dc568a57c7f8/click_repl-0.3.0-py3-none-any.whl", hash = "sha256:fb7e06deb8da8de86180a33a9da97ac316751c094c6899382da7feeeeb51b812", size = 10289 },
+]
+
 [[package]]
 name = "cloudpathlib"
 version = "0.21.1"
@ -1424,6 +1506,22 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/b8/25/155f9f080d5e4bc0082edfda032ea2bc2b8fab3f4d25d46c1e9dd22a1a89/flatbuffers-25.2.10-py2.py3-none-any.whl", hash = "sha256:ebba5f4d5ea615af3f7fd70fc310636fbb2bbd1f566ac0a23d98dd412de50051", size = 30953 },
 ]

+[[package]]
+name = "flower"
+version = "2.0.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "celery" },
+    { name = "humanize" },
+    { name = "prometheus-client" },
+    { name = "pytz" },
+    { name = "tornado" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/09/a1/357f1b5d8946deafdcfdd604f51baae9de10aafa2908d0b7322597155f92/flower-2.0.1.tar.gz", hash = "sha256:5ab717b979530770c16afb48b50d2a98d23c3e9fe39851dcf6bc4d01845a02a0", size = 3220408 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a6/ff/ee2f67c0ff146ec98b5df1df637b2bc2d17beeb05df9f427a67bd7a7d79c/flower-2.0.1-py2.py3-none-any.whl", hash = "sha256:9db2c621eeefbc844c8dd88be64aef61e84e2deb29b271e02ab2b5b9f01068e2", size = 383553 },
+]
+
 [[package]]
 name = "fonttools"
 version = "4.58.4"
@ -1921,6 +2019,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/f0/0f/310fb31e39e2d734ccaa2c0fb981ee41f7bd5056ce9bc29b2248bd569169/humanfriendly-10.0-py2.py3-none-any.whl", hash = "sha256:1697e1a8a8f550fd43c2865cd84542fc175a61dcb779b6fee18cf6b6ccba1477", size = 86794 },
 ]

+[[package]]
+name = "humanize"
+version = "4.14.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/b6/43/50033d25ad96a7f3845f40999b4778f753c3901a11808a584fed7c00d9f5/humanize-4.14.0.tar.gz", hash = "sha256:2fa092705ea640d605c435b1ca82b2866a1b601cdf96f076d70b79a855eba90d", size = 82939 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/c3/5b/9512c5fb6c8218332b530f13500c6ff5f3ce3342f35e0dd7be9ac3856fd3/humanize-4.14.0-py3-none-any.whl", hash = "sha256:d57701248d040ad456092820e6fde56c930f17749956ac47f4f655c0c547bfff", size = 132092 },
+]
+
 [[package]]
 name = "hyperframe"
 version = "6.1.0"
@ -2259,6 +2366,26 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/ea/cc/75f41633c75224ba820a4533163bc8b070b6bf25416014074c63284c2d4e/kokoro-0.9.4-py3-none-any.whl", hash = "sha256:a129dc6364a286bd6a92c396e9862459d3d3e45f2c15596ed5a94dcee5789efd", size = 32592 },
 ]

+[[package]]
+name = "kombu"
+version = "5.5.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "amqp" },
+    { name = "packaging" },
+    { name = "tzdata" },
+    { name = "vine" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/0f/d3/5ff936d8319ac86b9c409f1501b07c426e6ad41966fedace9ef1b966e23f/kombu-5.5.4.tar.gz", hash = "sha256:886600168275ebeada93b888e831352fe578168342f0d1d5833d88ba0d847363", size = 461992 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/ef/70/a07dcf4f62598c8ad579df241af55ced65bed76e42e45d3c368a6d82dbc1/kombu-5.5.4-py3-none-any.whl", hash = "sha256:a12ed0557c238897d8e518f1d1fdf84bd1516c5e305af2dacd85c2015115feb8", size = 210034 },
+]
+
+[package.optional-dependencies]
+redis = [
+    { name = "redis" },
+]
+
 [[package]]
 name = "kubernetes"
 version = "33.1.0"
@ -4029,6 +4156,27 @@ version = "1.6"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/2a/68/d8412d1e0d70edf9791cbac5426dc859f4649afc22f2abbeb0d947cf70fd/progress-1.6.tar.gz", hash = "sha256:c9c86e98b5c03fa1fe11e3b67c1feda4788b8d0fe7336c2ff7d5644ccfba34cd", size = 7842 }

+[[package]]
+name = "prometheus-client"
+version = "0.23.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/23/53/3edb5d68ecf6b38fcbcc1ad28391117d2a322d9a1a3eff04bfdb184d8c3b/prometheus_client-0.23.1.tar.gz", hash = "sha256:6ae8f9081eaaaf153a2e959d2e6c4f4fb57b12ef76c8c7980202f1e57b48b2ce", size = 80481 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/b8/db/14bafcb4af2139e046d03fd00dea7873e48eafe18b7d2797e73d6681f210/prometheus_client-0.23.1-py3-none-any.whl", hash = "sha256:dd1913e6e76b59cfe44e7a4b83e01afc9873c1bdfd2ed8739f1e76aeca115f99", size = 61145 },
+]
+
+[[package]]
+name = "prompt-toolkit"
+version = "3.0.52"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "wcwidth" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a1/96/06e01a7b38dce6fe1db213e061a4602dd6032a8a97ef6c1a862537732421/prompt_toolkit-3.0.52.tar.gz", hash = "sha256:28cde192929c8e7321de85de1ddbe736f1375148b02f2e17edd840042b1be855", size = 434198 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/84/03/0d3ce49e2505ae70cf43bc5bb3033955d2fc9f932163e84dc0779cc47f48/prompt_toolkit-3.0.52-py3-none-any.whl", hash = "sha256:9aac639a3bbd33284347de5ad8d68ecc044b91a762dc39b7c21095fcd6a19955", size = 391431 },
+]
+
 [[package]]
 name = "propcache"
 version = "0.3.2"
@ -4727,6 +4875,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/e1/67/921ec3024056483db83953ae8e48079ad62b92db7880013ca77632921dd0/readme_renderer-44.0-py3-none-any.whl", hash = "sha256:2fbca89b81a08526aadf1357a8c2ae889ec05fb03f5da67f9769c9a592166151", size = 13310 },
 ]

+[[package]]
+name = "redis"
+version = "5.2.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/47/da/d283a37303a995cd36f8b92db85135153dc4f7a8e4441aa827721b442cfb/redis-5.2.1.tar.gz", hash = "sha256:16f2e22dff21d5125e8481515e386711a34cbec50f0e44413dd7d9c060a54e0f", size = 4608355 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/3c/5f/fa26b9b2672cbe30e07d9a5bdf39cf16e3b80b42916757c5f92bca88e4ba/redis-5.2.1-py3-none-any.whl", hash = "sha256:ee7e1056b9aea0f04c6c2ed59452947f34c4940ee025f5dd83e6a6418b6989e4", size = 261502 },
+]
+
 [[package]]
 name = "referencing"
 version = "0.36.2"
@ -5426,6 +5583,7 @@ source = { virtual = "." }
 dependencies = [
    { name = "alembic" },
    { name = "asyncpg" },
+    { name = "celery", extra = ["redis"] },
    { name = "chonkie", extra = ["all"] },
    { name = "discord-py" },
    { name = "docling" },
@ -5435,6 +5593,7 @@ dependencies = [
    { name = "fastapi-users", extra = ["oauth", "sqlalchemy"] },
    { name = "faster-whisper" },
    { name = "firecrawl-py" },
+    { name = "flower" },
    { name = "github3-py" },
    { name = "google-api-python-client" },
    { name = "google-auth-oauthlib" },
@ -5452,6 +5611,7 @@ dependencies = [
    { name = "pgvector" },
    { name = "playwright" },
    { name = "python-ffmpeg" },
+    { name = "redis" },
    { name = "rerankers", extra = ["flashrank"] },
    { name = "sentence-transformers" },
    { name = "slack-sdk" },
@ -5475,6 +5635,7 @@ dev = [
 requires-dist = [
    { name = "alembic", specifier = ">=1.13.0" },
    { name = "asyncpg", specifier = ">=0.30.0" },
+    { name = "celery", extras = ["redis"], specifier = ">=5.5.3" },
    { name = "chonkie", extras = ["all"], specifier = ">=1.0.6" },
    { name = "discord-py", specifier = ">=2.5.2" },
    { name = "docling", specifier = ">=2.15.0" },
@ -5484,6 +5645,7 @@ requires-dist = [
    { name = "fastapi-users", extras = ["oauth", "sqlalchemy"], specifier = ">=14.0.1" },
    { name = "faster-whisper", specifier = ">=1.1.0" },
    { name = "firecrawl-py", specifier = ">=1.12.0" },
+    { name = "flower", specifier = ">=2.0.1" },
    { name = "github3-py", specifier = "==4.0.1" },
    { name = "google-api-python-client", specifier = ">=2.156.0" },
    { name = "google-auth-oauthlib", specifier = ">=1.2.1" },
@ -5501,6 +5663,7 @@ requires-dist = [
    { name = "pgvector", specifier = ">=0.3.6" },
    { name = "playwright", specifier = ">=1.50.0" },
    { name = "python-ffmpeg", specifier = ">=2.0.12" },
+    { name = "redis", specifier = ">=5.2.1" },
    { name = "rerankers", extras = ["flashrank"], specifier = ">=0.7.1" },
    { name = "sentence-transformers", specifier = ">=3.4.1" },
    { name = "slack-sdk", specifier = ">=3.34.0" },
@ -5751,6 +5914,25 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/ab/c0/131628e6d42682b0502c63fd7f647b8b5ca4bd94088f6c85ca7225db8ac4/torchvision-0.22.1-cp313-cp313t-win_amd64.whl", hash = "sha256:7414eeacfb941fa21acddcd725f1617da5630ec822e498660a4b864d7d998075", size = 1629892 },
 ]

+[[package]]
+name = "tornado"
+version = "6.5.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/09/ce/1eb500eae19f4648281bb2186927bb062d2438c2e5093d1360391afd2f90/tornado-6.5.2.tar.gz", hash = "sha256:ab53c8f9a0fa351e2c0741284e06c7a45da86afb544133201c5cc8578eb076a0", size = 510821 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/f6/48/6a7529df2c9cc12efd2e8f5dd219516184d703b34c06786809670df5b3bd/tornado-6.5.2-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:2436822940d37cde62771cff8774f4f00b3c8024fe482e16ca8387b8a2724db6", size = 442563 },
+    { url = "https://files.pythonhosted.org/packages/f2/b5/9b575a0ed3e50b00c40b08cbce82eb618229091d09f6d14bce80fc01cb0b/tornado-6.5.2-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:583a52c7aa94ee046854ba81d9ebb6c81ec0fd30386d96f7640c96dad45a03ef", size = 440729 },
+    { url = "https://files.pythonhosted.org/packages/1b/4e/619174f52b120efcf23633c817fd3fed867c30bff785e2cd5a53a70e483c/tornado-6.5.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b0fe179f28d597deab2842b86ed4060deec7388f1fd9c1b4a41adf8af058907e", size = 444295 },
+    { url = "https://files.pythonhosted.org/packages/95/fa/87b41709552bbd393c85dd18e4e3499dcd8983f66e7972926db8d96aa065/tornado-6.5.2-cp39-abi3-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:b186e85d1e3536d69583d2298423744740986018e393d0321df7340e71898882", size = 443644 },
+    { url = "https://files.pythonhosted.org/packages/f9/41/fb15f06e33d7430ca89420283a8762a4e6b8025b800ea51796ab5e6d9559/tornado-6.5.2-cp39-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e792706668c87709709c18b353da1f7662317b563ff69f00bab83595940c7108", size = 443878 },
+    { url = "https://files.pythonhosted.org/packages/11/92/fe6d57da897776ad2e01e279170ea8ae726755b045fe5ac73b75357a5a3f/tornado-6.5.2-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:06ceb1300fd70cb20e43b1ad8aaee0266e69e7ced38fa910ad2e03285009ce7c", size = 444549 },
+    { url = "https://files.pythonhosted.org/packages/9b/02/c8f4f6c9204526daf3d760f4aa555a7a33ad0e60843eac025ccfd6ff4a93/tornado-6.5.2-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:74db443e0f5251be86cbf37929f84d8c20c27a355dd452a5cfa2aada0d001ec4", size = 443973 },
+    { url = "https://files.pythonhosted.org/packages/ae/2d/f5f5707b655ce2317190183868cd0f6822a1121b4baeae509ceb9590d0bd/tornado-6.5.2-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:b5e735ab2889d7ed33b32a459cac490eda71a1ba6857b0118de476ab6c366c04", size = 443954 },
+    { url = "https://files.pythonhosted.org/packages/e8/59/593bd0f40f7355806bf6573b47b8c22f8e1374c9b6fd03114bd6b7a3dcfd/tornado-6.5.2-cp39-abi3-win32.whl", hash = "sha256:c6f29e94d9b37a95013bb669616352ddb82e3bfe8326fccee50583caebc8a5f0", size = 445023 },
+    { url = "https://files.pythonhosted.org/packages/c7/2a/f609b420c2f564a748a2d80ebfb2ee02a73ca80223af712fca591386cafb/tornado-6.5.2-cp39-abi3-win_amd64.whl", hash = "sha256:e56a5af51cc30dd2cae649429af65ca2f6571da29504a07995175df14c18f35f", size = 445427 },
+    { url = "https://files.pythonhosted.org/packages/5e/4f/e1f65e8f8c76d73658b33d33b81eed4322fb5085350e4328d5c956f0c8f9/tornado-6.5.2-cp39-abi3-win_arm64.whl", hash = "sha256:d6c33dc3672e3a1f3618eb63b7ef4683a7688e7b9e6e8f0d9aa5726360a004af", size = 444456 },
+]
+
 [[package]]
 name = "tqdm"
 version = "4.67.1"
@ -6180,6 +6362,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/fa/6e/3e955517e22cbdd565f2f8b2e73d52528b14b8bcfdb04f62466b071de847/validators-0.35.0-py3-none-any.whl", hash = "sha256:e8c947097eae7892cb3d26868d637f79f47b4a0554bc6b80065dfe5aac3705dd", size = 44712 },
 ]

+[[package]]
+name = "vine"
+version = "5.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/bd/e4/d07b5f29d283596b9727dd5275ccbceb63c44a1a82aa9e4bfd20426762ac/vine-5.1.0.tar.gz", hash = "sha256:8b62e981d35c41049211cf62a0a1242d8c1ee9bd15bb196ce38aefd6799e61e0", size = 48980 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/03/ff/7c0c86c43b3cbb927e0ccc0255cb4057ceba4799cd44ae95174ce8e8b5b2/vine-5.1.0-py3-none-any.whl", hash = "sha256:40fdf3c48b2cfe1c38a49e9ae2da6fda88e4794c810050a728bd7413811fb1dc", size = 9636 },
+]
+
 [[package]]
 name = "wasabi"
 version = "1.1.3"
@ -6259,6 +6450,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/32/fa/a4f5c2046385492b2273213ef815bf71a0d4c1943b784fb904e184e30201/watchfiles-1.1.0-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:af06c863f152005c7592df1d6a7009c836a247c9d8adb78fef8575a5a98699db", size = 623315 },
 ]

+[[package]]
+name = "wcwidth"
+version = "0.2.14"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/24/30/6b0809f4510673dc723187aeaf24c7f5459922d01e2f794277a3dfb90345/wcwidth-0.2.14.tar.gz", hash = "sha256:4d478375d31bc5395a3c55c40ccdf3354688364cd61c4f6adacaa9215d0b3605", size = 102293 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/af/b5/123f13c975e9f27ab9c0770f514345bd406d0e8d3b7a0723af9d43f710af/wcwidth-0.2.14-py2.py3-none-any.whl", hash = "sha256:a7bb560c8aee30f9957e5f9895805edd20602f2d7f720186dfd906e82b4982e1", size = 37286 },
+]
+
 [[package]]
 name = "weasel"
 version = "0.4.1"
--- a/surfsense_web/content/docs/docker-installation.mdx
+++ b/surfsense_web/content/docs/docker-installation.mdx
@ -17,7 +17,7 @@ Before you begin, ensure you have:
 - [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) installed on your machine
 - [Git](https://git-scm.com/downloads) (to clone the repository)
 - Completed all the [prerequisite setup steps](/docs) including:
-  - PGVector setup
+  - Auth setup
  - **File Processing ETL Service** (choose one):
    - Unstructured.io API key (Supports 34+ formats)
    - LlamaIndex API key (enhanced parsing, supports 50+ formats)
@ -56,7 +56,7 @@ Before you begin, ensure you have:

    Edit all `.env` files and fill in the required values:

-### Docker-Specific Environment Variables
+### Docker-Specific Environment Variables (Optional)

 | ENV VARIABLE               | DESCRIPTION                                                                 | DEFAULT VALUE       |
 |----------------------------|-----------------------------------------------------------------------------|---------------------|
@ -64,6 +64,8 @@ Before you begin, ensure you have:
 | BACKEND_PORT               | Port for the backend API service                                            | 8000                |
 | POSTGRES_PORT              | Port for the PostgreSQL database                                            | 5432                |
 | PGADMIN_PORT               | Port for pgAdmin web interface                                              | 5050                |
+| REDIS_PORT                 | Port for Redis (used by Celery)                                             | 6379                |
+| FLOWER_PORT                | Port for Flower (Celery monitoring tool)                                    | 5555                |
 | POSTGRES_USER              | PostgreSQL username                                                         | postgres            |
 | POSTGRES_PASSWORD          | PostgreSQL password                                                         | postgres            |
 | POSTGRES_DB                | PostgreSQL database name                                                    | surfsense           |
@ -81,7 +83,7 @@ Before you begin, ensure you have:
 | AUTH_TYPE                  | Authentication method: `GOOGLE` for OAuth with Google, `LOCAL` for email/password authentication                                                                                          |
 | GOOGLE_OAUTH_CLIENT_ID     | (Optional) Client ID from Google Cloud Console (required if AUTH_TYPE=GOOGLE)                                                                                                                        |
 | GOOGLE_OAUTH_CLIENT_SECRET | (Optional) Client secret from Google Cloud Console (required if AUTH_TYPE=GOOGLE)                                                                                                                    |
-| EMBEDDING_MODEL            | Name of the embedding model (e.g., `mixedbread-ai/mxbai-embed-large-v1`)                                                                                                                 |
+| EMBEDDING_MODEL            | Name of the embedding model (e.g., `sentence-transformers/all-MiniLM-L6-v2`, `openai://text-embedding-ada-002`)                                                                                                                 |
 | RERANKERS_MODEL_NAME       | Name of the reranker model (e.g., `ms-marco-MiniLM-L-12-v2`)                                                                                                                              |
 | RERANKERS_MODEL_TYPE       | Type of reranker model (e.g., `flashrank`)                                                                                                                                                |
 | TTS_SERVICE                | Text-to-Speech API provider for Podcasts (e.g., `local/kokoro`, `openai/tts-1`). See [supported providers](https://docs.litellm.ai/docs/text_to_speech#supported-providers)                            |
@ -94,6 +96,8 @@ Before you begin, ensure you have:
 | ETL_SERVICE                | Document parsing service: `UNSTRUCTURED` (supports 34+ formats), `LLAMACLOUD` (supports 50+ formats including legacy document types), or `DOCLING` (local processing, supports PDF, Office docs, images, HTML, CSV)                                                  |
 | UNSTRUCTURED_API_KEY       | API key for Unstructured.io service for document parsing (required if ETL_SERVICE=UNSTRUCTURED)                                                                                           |
 | LLAMA_CLOUD_API_KEY        | API key for LlamaCloud service for document parsing (required if ETL_SERVICE=LLAMACLOUD)                                                                                                  |
+| CELERY_BROKER_URL          | Redis connection URL for Celery broker (e.g., `redis://localhost:6379/0`)                                                                                                                |
+| CELERY_RESULT_BACKEND      | Redis connection URL for Celery result backend (e.g., `redis://localhost:6379/0`)                                                                                                        |


 **Optional Backend LangSmith Observability:**
--- a/surfsense_web/content/docs/index.mdx
+++ b/surfsense_web/content/docs/index.mdx
@ -4,50 +4,8 @@ description: Required setup's before setting up SurfSense
 full: true
 ---

-## PGVector installation Guide 

-SurfSense requires the pgvector extension for PostgreSQL:
-
-### Linux and Mac
-
-Compile and install the extension (supports Postgres 13+)
-
-```sh
-cd /tmp
-git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
-cd pgvector
-make
-make install # may need sudo
-```
-
-See the [installation notes](https://github.com/pgvector/pgvector/tree/master#installation-notes---linux-and-mac) if you run into issues
-
-### Windows
-
-Ensure [C++ support in Visual Studio](https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170#download-and-install-the-tools) is installed, and run:
-
-```cmd
-call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
-```
-
-Note: The exact path will vary depending on your Visual Studio version and edition
-
-Then use `nmake` to build:
-
-```cmd
-set "PGROOT=C:\Program Files\PostgreSQL\16"
-cd %TEMP%
-git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
-cd pgvector
-nmake /F Makefile.win
-nmake /F Makefile.win install
-```
-
-See the [installation notes](https://github.com/pgvector/pgvector/tree/master#installation-notes---windows) if you run into issues
-
----
-
-## Google OAuth Setup (Optional)
+## Auth Setup 

 SurfSense supports both Google OAuth and local email/password authentication. Google OAuth is optional - if you prefer local authentication, you can skip this section.

--- a/surfsense_web/content/docs/manual-installation.mdx
+++ b/surfsense_web/content/docs/manual-installation.mdx
@ -10,14 +10,28 @@ This guide provides step-by-step instructions for setting up SurfSense without D

 ## Prerequisites

-Before beginning the manual installation, ensure you have completed all the [prerequisite setup steps](/docs), including:
+Before beginning the manual installation, ensure you have the following installed and configured:

-- PGVector setup
+### Required Software
+- **Python 3.12+** - Backend runtime environment
+- **Node.js 20+** - Frontend runtime environment  
+- **PostgreSQL 14+** - Database server
+- **PGVector** - PostgreSQL extension for vector similarity search
+- **Redis** - Message broker for Celery task queue
+- **Git** - Version control (to clone the repository)
+
+### Required Services & API Keys
+
+Complete all the [setup steps](/docs), including:
+
+- **Authentication Setup** (choose one):
+  - Google OAuth credentials (for `AUTH_TYPE=GOOGLE`)
+  - Local authentication setup (for `AUTH_TYPE=LOCAL`)
 - **File Processing ETL Service** (choose one):
-    - Unstructured.io API key (Supports 34+ formats)
-    - LlamaIndex API key (enhanced parsing, supports 50+ formats)
-    - Docling (local processing, no API key required, supports PDF, Office docs, images, HTML, CSV)
-- Other required API keys
+  - Unstructured.io API key (Supports 34+ formats)
+  - LlamaCloud API key (enhanced parsing, supports 50+ formats)
+  - Docling (local processing, no API key required, supports PDF, Office docs, images, HTML, CSV)
+- **Other API keys** as needed for your use case

 ## Backend Setup

@ -58,7 +72,7 @@ Edit the `.env` file and set the following variables:
 | AUTH_TYPE                  | Authentication method: `GOOGLE` for OAuth with Google, `LOCAL` for email/password authentication                                                                                          |
 | GOOGLE_OAUTH_CLIENT_ID     | (Optional) Client ID from Google Cloud Console (required if AUTH_TYPE=GOOGLE)                                                                                                                        |
 | GOOGLE_OAUTH_CLIENT_SECRET | (Optional) Client secret from Google Cloud Console (required if AUTH_TYPE=GOOGLE)                                                                                                                    |
-| EMBEDDING_MODEL            | Name of the embedding model (e.g., `mixedbread-ai/mxbai-embed-large-v1`)                                                                                                                 |
+| EMBEDDING_MODEL            | Name of the embedding model (e.g., `sentence-transformers/all-MiniLM-L6-v2`, `openai://text-embedding-ada-002`)                                                                                                                 |
 | RERANKERS_MODEL_NAME       | Name of the reranker model (e.g., `ms-marco-MiniLM-L-12-v2`)                                                                                                                              |
 | RERANKERS_MODEL_TYPE       | Type of reranker model (e.g., `flashrank`)                                                                                                                                                |
 | TTS_SERVICE                | Text-to-Speech API provider for Podcasts (e.g., `local/kokoro`, `openai/tts-1`). See [supported providers](https://docs.litellm.ai/docs/text_to_speech#supported-providers)                            |
@ -70,9 +84,11 @@ Edit the `.env` file and set the following variables:
 | ETL_SERVICE                | Document parsing service: `UNSTRUCTURED` (supports 34+ formats), `LLAMACLOUD` (supports 50+ formats including legacy document types), or `DOCLING` (local processing, supports PDF, Office docs, images, HTML, CSV)                                                  |
 | UNSTRUCTURED_API_KEY       | API key for Unstructured.io service for document parsing (required if ETL_SERVICE=UNSTRUCTURED)                                                                                           |
 | LLAMA_CLOUD_API_KEY        | API key for LlamaCloud service for document parsing (required if ETL_SERVICE=LLAMACLOUD)                                                                                                  |
+| CELERY_BROKER_URL          | Redis connection URL for Celery broker (e.g., `redis://localhost:6379/0`)                                                                                                                |
+| CELERY_RESULT_BACKEND      | Redis connection URL for Celery result backend (e.g., `redis://localhost:6379/0`)                                                                                                        |


-**Optional Backend LangSmith Observability:**
+**(Optional) Backend LangSmith Observability:**
 | ENV VARIABLE | DESCRIPTION |
 |--------------|-------------|
 | LANGSMITH_TRACING | Enable LangSmith tracing (e.g., `true`) |
@ -80,7 +96,7 @@ Edit the `.env` file and set the following variables:
 | LANGSMITH_API_KEY | Your LangSmith API key |
 | LANGSMITH_PROJECT | LangSmith project name (e.g., `surfsense`) |

-**Uvicorn Server Configuration**
+**(Optional) Uvicorn Server Configuration**
 | ENV VARIABLE | DESCRIPTION | DEFAULT VALUE |
 |------------------------------|---------------------------------------------|---------------|
 | UVICORN_HOST                 | Host address to bind the server             | 0.0.0.0       |
@ -149,7 +165,91 @@ uv sync
 uv sync
 ```

-### 3. Run the Backend
+### 3. Start Redis Server
+
+Redis is required as the message broker for the Celery task queue. Start the Redis server:
+
+**Linux:**
+
+```bash
+# Start Redis server
+sudo systemctl start redis
+
+# Or run the server directly in the foreground
+redis-server
+```
+
+**macOS:**
+
+```bash
+# If installed via Homebrew
+brew services start redis
+
+# Or run directly
+redis-server
+```
+
+**Windows:**
+
+```powershell
+# Option 1: If using Redis on Windows (via WSL or Windows port)
+redis-server
+
+# Option 2: If installed as a Windows service
+net start Redis
+```
+
+**Alternative for Windows - Run Redis in Docker:**
+
+If you have Docker Desktop installed, you can run Redis in a container:
+
+```powershell
+# Pull and run Redis container
+docker run -d --name redis -p 6379:6379 redis:latest
+
+# To stop Redis
+docker stop redis
+
+# To start Redis again
+docker start redis
+
+# To remove Redis container
+docker rm -f redis
+```
+
+Verify Redis is running by connecting to it:
+
+```bash
+redis-cli ping
+# Should return: PONG
+```
+
+### 4. Start Celery Worker
+
+In a new terminal window, start the Celery worker to handle background tasks:
+
+**Linux/macOS/Windows:**
+
+```bash
+# Make sure you're in the surfsense_backend directory
+cd surfsense_backend
+
+# Start Celery worker
+uv run celery -A celery_worker.celery_app worker --loglevel=info --concurrency=1 --pool=solo
+```
+
+**Optional: Start Flower for monitoring Celery tasks:**
+
+In another terminal window:
+
+```bash
+# Start Flower (Celery monitoring tool)
+uv run celery -A celery_worker.celery_app flower --port=5555
+```
+
+Access Flower at [http://localhost:5555](http://localhost:5555) to monitor your Celery tasks.
+
+### 5. Run the Backend

 Start the backend server:

@ -303,9 +403,11 @@ To verify your installation:
 ## Troubleshooting

 - **Database Connection Issues**: Verify your PostgreSQL server is running and pgvector is properly installed
+- **Redis Connection Issues**: Ensure the Redis server is running (`redis-cli ping` should return `PONG`). Check that `CELERY_BROKER_URL` and `CELERY_RESULT_BACKEND` are correctly set in your `.env` file
+- **Celery Worker Issues**: Make sure the Celery worker is running in a separate terminal. Check worker logs for any errors
 - **Authentication Problems**: Check your Google OAuth configuration and ensure redirect URIs are set correctly
 - **LLM Errors**: Confirm your LLM API keys are valid and the selected models are accessible
-- **File Upload Failures**: Validate your Unstructured.io API key
+- **File Upload Failures**: Validate your ETL service API key (Unstructured.io or LlamaCloud) or ensure Docling is properly configured
 - **Windows-specific**: If you encounter path issues, ensure you're using the correct path separator (`\` instead of `/`)
 - **macOS-specific**: If you encounter permission issues, you may need to use `sudo` for some installation commands