Merge upstream/dev into feat/vision-autocomplete

CREDO23 2026-04-04 09:15:13 +02:00
commit d7315e7f27
142 changed files with 9440 additions and 3390 deletions

View file

@ -21,9 +21,28 @@
</div>
# SurfSense
Conecta cualquier LLM a tus fuentes de conocimiento internas y chatea con él en tiempo real junto a tu equipo. Alternativa de código abierto a NotebookLM, Perplexity y Glean.
SurfSense es un agente de investigación de IA altamente personalizable, conectado a fuentes externas como motores de búsqueda (SearxNG, Tavily, LinkUp), Google Drive, OneDrive, Dropbox, Slack, Microsoft Teams, Linear, Jira, ClickUp, Confluence, BookStack, Gmail, Notion, YouTube, GitHub, Discord, Airtable, Google Calendar, Luma, Circleback, Elasticsearch, Obsidian y más por venir.
NotebookLM es una de las mejores y más útiles plataformas de IA que existen, pero una vez que comienzas a usarla regularmente también sientes sus limitaciones dejando algo que desear.
1. Hay límites en la cantidad de fuentes que puedes agregar en un notebook.
2. Hay límites en la cantidad de notebooks que puedes tener.
3. No puedes tener fuentes que excedan 500,000 palabras y más de 200MB.
4. Estás bloqueado con los servicios de Google (LLMs, modelos de uso, etc.) sin opción de configurarlos.
5. Fuentes de datos externas e integraciones de servicios limitadas.
6. El agente de NotebookLM está específicamente optimizado solo para estudiar e investigar, pero puedes hacer mucho más con los datos de origen.
7. Falta de soporte multijugador.
...y más.
**SurfSense está específicamente hecho para resolver estos problemas.** SurfSense te permite:
- **Controla Tu Flujo de Datos** - Mantén tus datos privados y seguros.
- **Sin Límites de Datos** - Agrega una cantidad ilimitada de fuentes y notebooks.
- **Sin Dependencia de Proveedores** - Configura cualquier modelo LLM, de imagen, TTS y STT.
- **25+ Fuentes de Datos Externas** - Agrega tus fuentes desde Google Drive, OneDrive, Dropbox, Notion y muchos otros servicios externos.
- **Soporte Multijugador en Tiempo Real** - Trabaja fácilmente con los miembros de tu equipo en un notebook compartido.
...y más por venir.
@ -34,7 +53,7 @@ https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1
## Ejemplo de Agente de Video
https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562
https://github.com/user-attachments/assets/012a7ffa-6f76-4f06-9dda-7632b470057a
@ -133,24 +152,29 @@ Para Docker Compose, instalación manual y otras opciones de despliegue, consult
<p align="center"><img src="https://github.com/user-attachments/assets/3b04477d-8f42-4baa-be95-867c1eaeba87" alt="Comentarios en Tiempo Real" /></p>
## Funcionalidades Principales
## SurfSense vs Google NotebookLM
| Funcionalidad | Descripción |
|----------------|-------------|
| Alternativa OSS | Reemplazo directo de NotebookLM, Perplexity y Glean con colaboración en equipo en tiempo real |
| 50+ Formatos de Archivo | Sube documentos, imágenes, videos vía LlamaCloud, Unstructured o Docling (local) |
| Búsqueda Híbrida | Semántica + Texto completo con Índices Jerárquicos y Reciprocal Rank Fusion |
| Respuestas con Citas | Chatea con tu base de conocimiento y obtén respuestas citadas al estilo Perplexity |
| Arquitectura de Agentes Profundos | Impulsado por [LangChain Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview) con planificación, subagentes y acceso al sistema de archivos |
| Soporte Universal de LLM | 100+ LLMs, 6000+ modelos de embeddings, todos los principales rerankers vía OpenAI spec y LiteLLM |
| Privacidad Primero | Soporte completo de LLM local (vLLM, Ollama) tus datos son tuyos |
| Colaboración en Equipo | RBAC con roles de Propietario / Admin / Editor / Visor, chat en tiempo real e hilos de comentarios |
| Generación de Videos | Genera videos con narración y visuales |
| Generación de Presentaciones | Crea presentaciones editables basadas en diapositivas |
| Generación de Podcasts | Podcast de 3 min en menos de 20 segundos; múltiples proveedores TTS (OpenAI, Azure, Kokoro) |
| Extensión de Navegador | Extensión multi-navegador para guardar cualquier página web, incluyendo páginas protegidas por autenticación |
| 27+ Conectores | Motores de búsqueda, Google Drive, OneDrive, Dropbox, Slack, Teams, Jira, Notion, GitHub, Discord y [más](#fuentes-externas) |
| Auto-Hospedable | Código abierto, Docker en un solo comando o Docker Compose completo para producción |
| Característica | Google NotebookLM | SurfSense |
|---------|-------------------|-----------|
| **Fuentes por Notebook** | 50 (Gratis) a 600 (Ultra, $249.99/mes) | Ilimitadas |
| **Número de Notebooks** | 100 (Gratis) a 500 (planes de pago) | Ilimitados |
| **Límite de Tamaño de Fuente** | 500,000 palabras / 200MB por fuente | Sin límite |
| **Precios** | Nivel gratuito disponible; Pro $19.99/mes, Ultra $249.99/mes | Gratuito y de código abierto, auto-hospedable en tu propia infra |
| **Soporte de LLM** | Solo Google Gemini | 100+ LLMs vía OpenAI spec y LiteLLM |
| **Modelos de Embeddings** | Solo Google | 6,000+ modelos de embeddings, todos los principales rerankers |
| **LLMs Locales / Privados** | No disponible | Soporte completo (vLLM, Ollama) - tus datos son tuyos |
| **Auto-Hospedable** | No | Sí - Docker en un solo comando o Docker Compose completo |
| **Código Abierto** | No | Sí |
| **Conectores Externos** | Google Drive, YouTube, sitios web | 27+ conectores - Motores de búsqueda, Google Drive, OneDrive, Dropbox, Slack, Teams, Jira, Notion, GitHub, Discord y [más](#fuentes-externas) |
| **Soporte de Formatos de Archivo** | PDFs, Docs, Slides, Sheets, CSV, Word, EPUB, imágenes, URLs web, YouTube | 50+ formatos - documentos, imágenes, videos vía LlamaCloud, Unstructured o Docling (local) |
| **Búsqueda** | Búsqueda semántica | Búsqueda Híbrida - Semántica + Texto completo con Índices Jerárquicos y Reciprocal Rank Fusion |
| **Respuestas con Citas** | Sí | Sí - Respuestas citadas al estilo Perplexity |
| **Arquitectura de Agentes** | No | Sí - impulsado por [LangChain Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview) con planificación, subagentes y acceso al sistema de archivos |
| **Multijugador en Tiempo Real** | Notebooks compartidos con roles de Visor/Editor (sin chat en tiempo real) | RBAC con roles de Propietario / Admin / Editor / Visor, chat en tiempo real e hilos de comentarios |
| **Generación de Videos** | Resúmenes en video cinemáticos vía Veo 3 (solo Ultra) | Disponible (NotebookLM es mejor aquí, mejorando activamente) |
| **Generación de Presentaciones** | Diapositivas más atractivas pero no editables | Crea presentaciones editables basadas en diapositivas |
| **Generación de Podcasts** | Resúmenes de audio con hosts e idiomas personalizables | Disponible con múltiples proveedores TTS (NotebookLM es mejor aquí, mejorando activamente) |
| **Extensión de Navegador** | No | Extensión multi-navegador para guardar cualquier página web, incluyendo páginas protegidas por autenticación |
<details>
<summary><b>Lista completa de Fuentes Externas</b></summary>

View file

@ -21,9 +21,28 @@
</div>
# SurfSense
किसी भी LLM को अपने आंतरिक ज्ञान स्रोतों से जोड़ें और अपनी टीम के साथ रीयल-टाइम में चैट करें। NotebookLM, Perplexity और Glean का ओपन सोर्स विकल्प।
SurfSense एक अत्यधिक अनुकूलन योग्य AI शोध एजेंट है, जो बाहरी स्रोतों से जुड़ा है जैसे सर्च इंजन (SearxNG, Tavily, LinkUp), Google Drive, OneDrive, Dropbox, Slack, Microsoft Teams, Linear, Jira, ClickUp, Confluence, BookStack, Gmail, Notion, YouTube, GitHub, Discord, Airtable, Google Calendar, Luma, Circleback, Elasticsearch, Obsidian और भी बहुत कुछ आने वाला है।
NotebookLM वहाँ उपलब्ध सबसे अच्छे और सबसे उपयोगी AI प्लेटफ़ॉर्म में से एक है, लेकिन जब आप इसे नियमित रूप से उपयोग करना शुरू करते हैं तो आप इसकी सीमाओं को भी महसूस करते हैं जो कुछ और की चाह छोड़ती हैं।
1. एक notebook में जोड़े जा सकने वाले स्रोतों की मात्रा पर सीमाएं हैं।
2. आपके पास कितने notebooks हो सकते हैं इस पर सीमाएं हैं।
3. आपके पास ऐसे स्रोत नहीं हो सकते जो 500,000 शब्दों और 200MB से अधिक हों।
4. आप Google सेवाओं (LLMs, उपयोग मॉडल, आदि) में बंद हैं और उन्हें कॉन्फ़िगर करने का कोई विकल्प नहीं है।
5. सीमित बाहरी डेटा स्रोत और सेवा एकीकरण।
6. NotebookLM एजेंट विशेष रूप से केवल अध्ययन और शोध के लिए अनुकूलित है, लेकिन आप स्रोत डेटा के साथ और भी बहुत कुछ कर सकते हैं।
7. मल्टीप्लेयर सपोर्ट की कमी।
...और भी बहुत कुछ।
**SurfSense विशेष रूप से इन समस्याओं को हल करने के लिए बनाया गया है।** SurfSense आपको सक्षम बनाता है:
- **अपने डेटा प्रवाह को नियंत्रित करें** - अपने डेटा को निजी और सुरक्षित रखें।
- **कोई डेटा सीमा नहीं** - असीमित मात्रा में स्रोत और notebooks जोड़ें।
- **कोई विक्रेता लॉक-इन नहीं** - किसी भी LLM, इमेज, TTS और STT मॉडल को कॉन्फ़िगर करें।
- **25+ बाहरी डेटा स्रोत** - Google Drive, OneDrive, Dropbox, Notion और कई अन्य बाहरी सेवाओं से अपने स्रोत जोड़ें।
- **रीयल-टाइम मल्टीप्लेयर सपोर्ट** - एक साझा notebook में अपनी टीम के सदस्यों के साथ आसानी से काम करें।
...और भी बहुत कुछ आने वाला है।
@ -34,7 +53,7 @@ https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1
## वीडियो एजेंट नमूना
https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562
https://github.com/user-attachments/assets/012a7ffa-6f76-4f06-9dda-7632b470057a
@ -133,24 +152,29 @@ Docker Compose, मैनुअल इंस्टॉलेशन और अन
<p align="center"><img src="https://github.com/user-attachments/assets/3b04477d-8f42-4baa-be95-867c1eaeba87" alt="रीयल-टाइम कमेंट्स" /></p>
## प्रमुख विशेषताएं
## SurfSense vs Google NotebookLM
| विशेषता | विवरण |
|----------|--------|
| OSS विकल्प | रीयल-टाइम टीम सहयोग के साथ NotebookLM, Perplexity और Glean का सीधा प्रतिस्थापन |
| 50+ फ़ाइल फ़ॉर्मेट | LlamaCloud, Unstructured या Docling (लोकल) के माध्यम से दस्तावेज़, चित्र, वीडियो अपलोड करें |
| हाइब्रिड सर्च | हायरार्किकल इंडाइसेस और Reciprocal Rank Fusion के साथ सिमैंटिक + फुल टेक्स्ट सर्च |
| उद्धृत उत्तर | अपने ज्ञान आधार के साथ चैट करें और Perplexity शैली के उद्धृत उत्तर पाएं |
| डीप एजेंट आर्किटेक्चर | [LangChain Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview) द्वारा संचालित, योजना, सब-एजेंट और फ़ाइल सिस्टम एक्सेस |
| यूनिवर्सल LLM सपोर्ट | 100+ LLMs, 6000+ एम्बेडिंग मॉडल, सभी प्रमुख रीरैंकर्स OpenAI spec और LiteLLM के माध्यम से |
| प्राइवेसी फर्स्ट | पूर्ण लोकल LLM सपोर्ट (vLLM, Ollama) आपका डेटा आपका रहता है |
| टीम सहयोग | मालिक / एडमिन / संपादक / दर्शक भूमिकाओं के साथ RBAC, रीयल-टाइम चैट और कमेंट थ्रेड |
| वीडियो जनरेशन | नैरेशन और विज़ुअल के साथ वीडियो बनाएं |
| प्रेजेंटेशन जनरेशन | संपादन योग्य, स्लाइड आधारित प्रेजेंटेशन बनाएं |
| पॉडकास्ट जनरेशन | 20 सेकंड से कम में 3 मिनट का पॉडकास्ट; कई TTS प्रदाता (OpenAI, Azure, Kokoro) |
| ब्राउज़र एक्सटेंशन | किसी भी वेबपेज को सहेजने के लिए क्रॉस-ब्राउज़र एक्सटेंशन, प्रमाणीकरण सुरक्षित पेज सहित |
| 27+ कनेक्टर्स | सर्च इंजन, Google Drive, OneDrive, Dropbox, Slack, Teams, Jira, Notion, GitHub, Discord और [अधिक](#बाहरी-स्रोत) |
| सेल्फ-होस्ट करने योग्य | ओपन सोर्स, Docker एक कमांड या प्रोडक्शन के लिए पूर्ण Docker Compose |
| विशेषता | Google NotebookLM | SurfSense |
|---------|-------------------|-----------|
| **प्रति Notebook स्रोत** | 50 (मुफ़्त) से 600 (Ultra, $249.99/माह) | असीमित |
| **Notebooks की संख्या** | 100 (मुफ़्त) से 500 (सशुल्क योजनाएं) | असीमित |
| **स्रोत आकार सीमा** | 500,000 शब्द / 200MB प्रति स्रोत | कोई सीमा नहीं |
| **मूल्य निर्धारण** | मुफ़्त स्तर उपलब्ध; Pro $19.99/माह, Ultra $249.99/माह | मुफ़्त और ओपन सोर्स, अपनी इंफ्रा पर सेल्फ-होस्ट करें |
| **LLM सपोर्ट** | केवल Google Gemini | 100+ LLMs OpenAI spec और LiteLLM के माध्यम से |
| **एम्बेडिंग मॉडल** | केवल Google | 6,000+ एम्बेडिंग मॉडल, सभी प्रमुख रीरैंकर्स |
| **लोकल / प्राइवेट LLMs** | उपलब्ध नहीं | पूर्ण सपोर्ट (vLLM, Ollama) - आपका डेटा आपका रहता है |
| **सेल्फ-होस्ट करने योग्य** | नहीं | हाँ - Docker एक कमांड या पूर्ण Docker Compose |
| **ओपन सोर्स** | नहीं | हाँ |
| **बाहरी कनेक्टर्स** | Google Drive, YouTube, वेबसाइटें | 27+ कनेक्टर्स - सर्च इंजन, Google Drive, OneDrive, Dropbox, Slack, Teams, Jira, Notion, GitHub, Discord और [अधिक](#बाहरी-स्रोत) |
| **फ़ाइल फ़ॉर्मेट सपोर्ट** | PDFs, Docs, Slides, Sheets, CSV, Word, EPUB, इमेज, वेब URLs, YouTube | 50+ फ़ॉर्मेट - दस्तावेज़, इमेज, वीडियो LlamaCloud, Unstructured या Docling (लोकल) के माध्यम से |
| **सर्च** | सिमैंटिक सर्च | हाइब्रिड सर्च - हायरार्किकल इंडाइसेस और Reciprocal Rank Fusion के साथ सिमैंटिक + फुल टेक्स्ट |
| **उद्धृत उत्तर** | हाँ | हाँ - Perplexity शैली के उद्धृत उत्तर |
| **एजेंट आर्किटेक्चर** | नहीं | हाँ - [LangChain Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview) द्वारा संचालित, योजना, सब-एजेंट और फ़ाइल सिस्टम एक्सेस |
| **रीयल-टाइम मल्टीप्लेयर** | दर्शक/संपादक भूमिकाओं के साथ साझा notebooks (कोई रीयल-टाइम चैट नहीं) | मालिक / एडमिन / संपादक / दर्शक भूमिकाओं के साथ RBAC, रीयल-टाइम चैट और कमेंट थ्रेड |
| **वीडियो जनरेशन** | Veo 3 के माध्यम से सिनेमैटिक वीडियो ओवरव्यू (केवल Ultra) | उपलब्ध (NotebookLM यहाँ बेहतर है, सक्रिय रूप से सुधार हो रहा है) |
| **प्रेजेंटेशन जनरेशन** | बेहतर दिखने वाली स्लाइड्स लेकिन संपादन योग्य नहीं | संपादन योग्य, स्लाइड आधारित प्रेजेंटेशन बनाएं |
| **पॉडकास्ट जनरेशन** | कस्टमाइज़ेबल होस्ट और भाषाओं के साथ ऑडियो ओवरव्यू | कई TTS प्रदाताओं के साथ उपलब्ध (NotebookLM यहाँ बेहतर है, सक्रिय रूप से सुधार हो रहा है) |
| **ब्राउज़र एक्सटेंशन** | नहीं | किसी भी वेबपेज को सहेजने के लिए क्रॉस-ब्राउज़र एक्सटेंशन, प्रमाणीकरण सुरक्षित पेज सहित |
<details>
<summary><b>बाहरी स्रोतों की पूरी सूची</b></summary>

View file

@ -21,9 +21,28 @@
</div>
# SurfSense
Connect any LLM to your internal knowledge sources and chat with it in real time alongside your team. OSS alternative to NotebookLM, Perplexity, and Glean.
SurfSense is a highly customizable AI research agent, connected to external sources such as Search Engines (SearxNG, Tavily, LinkUp), Google Drive, OneDrive, Dropbox, Slack, Microsoft Teams, Linear, Jira, ClickUp, Confluence, BookStack, Gmail, Notion, YouTube, GitHub, Discord, Airtable, Google Calendar, Luma, Circleback, Elasticsearch, Obsidian and more to come.
NotebookLM is one of the best and most useful AI platforms out there, but once you start using it regularly, its limitations become apparent and leave something to be desired.
1. There are limits on the number of sources you can add to a notebook.
2. There are limits on the number of notebooks you can have.
3. You cannot add sources that exceed 500,000 words or 200MB.
4. You are vendor-locked into Google services (LLMs, usage models, etc.) with no option to configure them.
5. Limited external data sources and service integrations.
6. The NotebookLM agent is specifically optimised for studying and researching, but you can do so much more with the source data.
7. Lack of multiplayer support.
...and more.
**SurfSense is specifically made to solve these problems.** SurfSense empowers you to:
- **Control Your Data Flow** - Keep your data private and secure.
- **No Data Limits** - Add an unlimited amount of sources and notebooks.
- **No Vendor Lock-in** - Configure any LLM, image, TTS, and STT models to use.
- **25+ External Data Sources** - Add your sources from Google Drive, OneDrive, Dropbox, Notion, and many other external services.
- **Real-Time Multiplayer Support** - Work easily with your team members in a shared notebook.
...and more to come.
@ -134,24 +153,29 @@ For Docker Compose, manual installation, and other deployment options, see the [
<p align="center"><img src="https://github.com/user-attachments/assets/3b04477d-8f42-4baa-be95-867c1eaeba87" alt="Realtime Comments" /></p>
## Key Features
## SurfSense vs Google NotebookLM
| Feature | Description |
|---------|-------------|
| OSS Alternative | Drop-in replacement for NotebookLM, Perplexity, and Glean with real-time team collaboration |
| 50+ File Formats | Upload documents, images, videos via LlamaCloud, Unstructured, or Docling (local) |
| Hybrid Search | Semantic + Full Text Search with Hierarchical Indices and Reciprocal Rank Fusion |
| Cited Answers | Chat with your knowledge base and get Perplexity-style cited responses |
| Deep Agent Architecture | Powered by [LangChain Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview) with planning, subagents, and file system access |
| Universal LLM Support | 100+ LLMs, 6000+ embedding models, all major rerankers via OpenAI spec & LiteLLM |
| Privacy First | Full local LLM support (vLLM, Ollama); your data stays yours |
| Team Collaboration | RBAC with Owner / Admin / Editor / Viewer roles, real-time chat & comment threads |
| Video Generation | Generate videos with narration and visuals |
| Presentation Generation | Create editable, slide based presentations |
| Podcast Generation | 3 min podcast in under 20 seconds; multiple TTS providers (OpenAI, Azure, Kokoro) |
| Browser Extension | Cross-browser extension to save any webpage, including auth-protected pages |
| 27+ Connectors | Search Engines, Google Drive, OneDrive, Dropbox, Slack, Teams, Jira, Notion, GitHub, Discord & [more](#external-sources) |
| Self Hostable | Open source, Docker one-liner or full Docker Compose for production |
| Feature | Google NotebookLM | SurfSense |
|---------|-------------------|-----------|
| **Sources per Notebook** | 50 (Free) to 600 (Ultra, $249.99/mo) | Unlimited |
| **Number of Notebooks** | 100 (Free) to 500 (paid tiers) | Unlimited |
| **Source Size Limit** | 500,000 words / 200MB per source | No limit |
| **Pricing** | Free tier available; Pro $19.99/mo, Ultra $249.99/mo | Free and open source, self-host on your own infra |
| **LLM Support** | Google Gemini only | 100+ LLMs via OpenAI spec & LiteLLM |
| **Embedding Models** | Google only | 6,000+ embedding models, all major rerankers |
| **Local / Private LLMs** | Not available | Full support (vLLM, Ollama) - your data stays yours |
| **Self Hostable** | No | Yes - Docker one-liner or full Docker Compose |
| **Open Source** | No | Yes |
| **External Connectors** | Google Drive, YouTube, websites | 27+ connectors - Search Engines, Google Drive, OneDrive, Dropbox, Slack, Teams, Jira, Notion, GitHub, Discord & [more](#external-sources) |
| **File Format Support** | PDFs, Docs, Slides, Sheets, CSV, Word, EPUB, images, web URLs, YouTube | 50+ formats - documents, images, videos via LlamaCloud, Unstructured, or Docling (local) |
| **Search** | Semantic search | Hybrid Search - Semantic + Full Text with Hierarchical Indices & Reciprocal Rank Fusion |
| **Cited Answers** | Yes | Yes - Perplexity-style cited responses |
| **Agentic Architecture** | No | Yes - powered by [LangChain Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview) with planning, subagents, and file system access |
| **Real-Time Multiplayer** | Shared notebooks with Viewer/Editor roles (no real-time chat) | RBAC with Owner / Admin / Editor / Viewer roles, real-time chat & comment threads |
| **Video Generation** | Cinematic Video Overviews via Veo 3 (Ultra only) | Available (NotebookLM is better here, actively improving) |
| **Presentation Generation** | Better looking slides but not editable | Create editable, slide-based presentations |
| **Podcast Generation** | Audio Overviews with customizable hosts and languages | Available with multiple TTS providers (NotebookLM is better here, actively improving) |
| **Browser Extension** | No | Cross-browser extension to save any webpage, including auth-protected pages |
<details>
<summary><b>Full list of External Sources</b></summary>

View file

@ -21,9 +21,28 @@
</div>
# SurfSense
Conecte qualquer LLM às suas fontes de conhecimento internas e converse com ele em tempo real junto com sua equipe. Alternativa de código aberto ao NotebookLM, Perplexity e Glean.
SurfSense é um agente de pesquisa de IA altamente personalizável, conectado a fontes externas como mecanismos de busca (SearxNG, Tavily, LinkUp), Google Drive, OneDrive, Dropbox, Slack, Microsoft Teams, Linear, Jira, ClickUp, Confluence, BookStack, Gmail, Notion, YouTube, GitHub, Discord, Airtable, Google Calendar, Luma, Circleback, Elasticsearch, Obsidian e mais por vir.
O NotebookLM é uma das melhores e mais úteis plataformas de IA disponíveis, mas quando você começa a usá-lo regularmente também sente suas limitações deixando algo a desejar.
1. Há limites na quantidade de fontes que você pode adicionar em um notebook.
2. Há limites no número de notebooks que você pode ter.
3. Você não pode ter fontes que excedam 500.000 palavras e mais de 200MB.
4. Você fica preso aos serviços do Google (LLMs, modelos de uso, etc.) sem opção de configurá-los.
5. Fontes de dados externas e integrações de serviços limitadas.
6. O agente do NotebookLM é especificamente otimizado apenas para estudar e pesquisar, mas você pode fazer muito mais com os dados de origem.
7. Falta de suporte multiplayer.
...e mais.
**O SurfSense foi feito especificamente para resolver esses problemas.** O SurfSense permite que você:
- **Controle Seu Fluxo de Dados** - Mantenha seus dados privados e seguros.
- **Sem Limites de Dados** - Adicione uma quantidade ilimitada de fontes e notebooks.
- **Sem Dependência de Fornecedor** - Configure qualquer modelo LLM, de imagem, TTS e STT.
- **25+ Fontes de Dados Externas** - Adicione suas fontes do Google Drive, OneDrive, Dropbox, Notion e muitos outros serviços externos.
- **Suporte Multiplayer em Tempo Real** - Trabalhe facilmente com os membros da sua equipe em um notebook compartilhado.
...e mais por vir.
@ -34,7 +53,7 @@ https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1
## Exemplo de Agente de Vídeo
https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562
https://github.com/user-attachments/assets/012a7ffa-6f76-4f06-9dda-7632b470057a
@ -133,24 +152,29 @@ Para Docker Compose, instalação manual e outras opções de implantação, con
<p align="center"><img src="https://github.com/user-attachments/assets/3b04477d-8f42-4baa-be95-867c1eaeba87" alt="Comentários em Tempo Real" /></p>
## Funcionalidades Principais
## SurfSense vs Google NotebookLM
| Funcionalidade | Descrição |
|----------------|-----------|
| Alternativa OSS | Substituto direto do NotebookLM, Perplexity e Glean com colaboração em equipe em tempo real |
| 50+ Formatos de Arquivo | Faça upload de documentos, imagens, vídeos via LlamaCloud, Unstructured ou Docling (local) |
| Busca Híbrida | Semântica + Texto completo com Índices Hierárquicos e Reciprocal Rank Fusion |
| Respostas com Citações | Converse com sua base de conhecimento e obtenha respostas citadas no estilo Perplexity |
| Arquitetura de Agentes Profundos | Alimentado por [LangChain Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview) com planejamento, subagentes e acesso ao sistema de arquivos |
| Suporte Universal de LLM | 100+ LLMs, 6000+ modelos de embeddings, todos os principais rerankers via OpenAI spec e LiteLLM |
| Privacidade em Primeiro Lugar | Suporte completo a LLM local (vLLM, Ollama) seus dados ficam com você |
| Colaboração em Equipe | RBAC com papéis de Proprietário / Admin / Editor / Visualizador, chat em tempo real e threads de comentários |
| Geração de Vídeos | Gera vídeos com narração e visuais |
| Geração de Apresentações | Cria apresentações editáveis baseadas em slides |
| Geração de Podcasts | Podcast de 3 min em menos de 20 segundos; múltiplos provedores TTS (OpenAI, Azure, Kokoro) |
| Extensão de Navegador | Extensão multi-navegador para salvar qualquer página web, incluindo páginas protegidas por autenticação |
| 27+ Conectores | Mecanismos de busca, Google Drive, OneDrive, Dropbox, Slack, Teams, Jira, Notion, GitHub, Discord e [mais](#fontes-externas) |
| Auto-Hospedável | Código aberto, Docker em um único comando ou Docker Compose completo para produção |
| Recurso | Google NotebookLM | SurfSense |
|---------|-------------------|-----------|
| **Fontes por Notebook** | 50 (Grátis) a 600 (Ultra, $249.99/mês) | Ilimitadas |
| **Número de Notebooks** | 100 (Grátis) a 500 (planos pagos) | Ilimitados |
| **Limite de Tamanho da Fonte** | 500.000 palavras / 200MB por fonte | Sem limite |
| **Preços** | Nível gratuito disponível; Pro $19.99/mês, Ultra $249.99/mês | Gratuito e de código aberto, auto-hospedável na sua própria infra |
| **Suporte a LLM** | Apenas Google Gemini | 100+ LLMs via OpenAI spec e LiteLLM |
| **Modelos de Embeddings** | Apenas Google | 6.000+ modelos de embeddings, todos os principais rerankers |
| **LLMs Locais / Privados** | Não disponível | Suporte completo (vLLM, Ollama) - seus dados ficam com você |
| **Auto-Hospedável** | Não | Sim - Docker em um único comando ou Docker Compose completo |
| **Código Aberto** | Não | Sim |
| **Conectores Externos** | Google Drive, YouTube, sites | 27+ conectores - Mecanismos de busca, Google Drive, OneDrive, Dropbox, Slack, Teams, Jira, Notion, GitHub, Discord e [mais](#fontes-externas) |
| **Suporte a Formatos de Arquivo** | PDFs, Docs, Slides, Sheets, CSV, Word, EPUB, imagens, URLs web, YouTube | 50+ formatos - documentos, imagens, vídeos via LlamaCloud, Unstructured ou Docling (local) |
| **Busca** | Busca semântica | Busca Híbrida - Semântica + Texto completo com Índices Hierárquicos e Reciprocal Rank Fusion |
| **Respostas com Citações** | Sim | Sim - Respostas citadas no estilo Perplexity |
| **Arquitetura de Agentes** | Não | Sim - alimentado por [LangChain Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview) com planejamento, subagentes e acesso ao sistema de arquivos |
| **Multiplayer em Tempo Real** | Notebooks compartilhados com papéis de Visualizador/Editor (sem chat em tempo real) | RBAC com papéis de Proprietário / Admin / Editor / Visualizador, chat em tempo real e threads de comentários |
| **Geração de Vídeos** | Visões gerais cinemáticas via Veo 3 (apenas Ultra) | Disponível (NotebookLM é melhor aqui, melhorando ativamente) |
| **Geração de Apresentações** | Slides mais bonitos mas não editáveis | Cria apresentações editáveis baseadas em slides |
| **Geração de Podcasts** | Visões gerais em áudio com hosts e idiomas personalizáveis | Disponível com múltiplos provedores TTS (NotebookLM é melhor aqui, melhorando ativamente) |
| **Extensão de Navegador** | Não | Extensão multi-navegador para salvar qualquer página web, incluindo páginas protegidas por autenticação |
<details>
<summary><b>Lista completa de Fontes Externas</b></summary>

View file

@ -21,9 +21,28 @@
</div>
# SurfSense
将任何 LLM 连接到您的内部知识源并与团队成员实时聊天。NotebookLM、Perplexity 和 Glean 的开源替代方案。
SurfSense 是一个高度可定制的 AI 研究助手可以连接外部数据源如搜索引擎SearxNG、Tavily、LinkUp、Google Drive、OneDrive、Dropbox、Slack、Microsoft Teams、Linear、Jira、ClickUp、Confluence、BookStack、Gmail、Notion、YouTube、GitHub、Discord、Airtable、Google Calendar、Luma、Circleback、Elasticsearch、Obsidian 等,未来还会支持更多。
NotebookLM 是目前最好、最实用的 AI 平台之一,但当你开始经常使用它时,你也会感受到它的局限性,总觉得还有不足之处。
1. 一个笔记本中可以添加的来源数量有限制。
2. 可以拥有的笔记本数量有限制。
3. 来源不能超过 500,000 个单词和 200MB。
4. 你被锁定在 Google 服务中LLM、使用模型等没有配置选项。
5. 有限的外部数据源和服务集成。
6. NotebookLM 代理专门针对学习和研究进行了优化,但你可以用源数据做更多事情。
7. 缺乏多人协作支持。
...还有更多。
**SurfSense 正是为了解决这些问题而生。** SurfSense 赋予你:
- **控制你的数据流** - 保持数据私密和安全。
- **无数据限制** - 添加无限数量的来源和笔记本。
- **无供应商锁定** - 配置任何 LLM、图像、TTS 和 STT 模型。
- **25+ 外部数据源** - 从 Google Drive、OneDrive、Dropbox、Notion 和许多其他外部服务添加你的来源。
- **实时多人协作支持** - 在共享笔记本中轻松与团队成员协作。
...更多功能即将推出。
@ -34,7 +53,7 @@ https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1
## 视频代理示例
https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562
https://github.com/user-attachments/assets/012a7ffa-6f76-4f06-9dda-7632b470057a
@ -133,24 +152,29 @@ irm https://raw.githubusercontent.com/MODSetter/SurfSense/main/docker/scripts/in
<p align="center"><img src="https://github.com/user-attachments/assets/3b04477d-8f42-4baa-be95-867c1eaeba87" alt="实时评论" /></p>
## 核心功能
## SurfSense vs Google NotebookLM
| 功能 | 描述 |
|------|------|
| 开源替代方案 | 支持实时团队协作的 NotebookLM、Perplexity 和 Glean 替代品 |
| 50+ 文件格式 | 通过 LlamaCloud、Unstructured 或 Docling本地上传文档、图像、视频 |
| 混合搜索 | 语义搜索 + 全文搜索,结合层次化索引和倒数排名融合 |
| 引用回答 | 与知识库对话,获得 Perplexity 风格的引用回答 |
| 深度代理架构 | 基于 [LangChain Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview) 构建,支持规划、子代理和文件系统访问 |
| 通用 LLM 支持 | 100+ LLM、6000+ 嵌入模型、所有主流重排序器,通过 OpenAI spec 和 LiteLLM |
| 隐私优先 | 完整本地 LLM 支持vLLM、Ollama您的数据由您掌控 |
| 团队协作 | RBAC 角色控制(所有者/管理员/编辑者/查看者),实时聊天和评论线程 |
| 视频生成 | 生成带有旁白和视觉效果的视频 |
| 演示文稿生成 | 创建可编辑的幻灯片式演示文稿 |
| 播客生成 | 20 秒内生成 3 分钟播客;多种 TTS 提供商OpenAI、Azure、Kokoro |
| 浏览器扩展 | 跨浏览器扩展,保存任何网页,包括需要身份验证的页面 |
| 27+ 连接器 | 搜索引擎、Google Drive、OneDrive、Dropbox、Slack、Teams、Jira、Notion、GitHub、Discord 等[更多](#外部数据源) |
| 可自托管 | 开源Docker 一行命令或完整 Docker Compose 用于生产环境 |
| 功能 | Google NotebookLM | SurfSense |
|---------|-------------------|-----------|
| **每个笔记本的来源数** | 50免费到 600Ultra$249.99/月) | 无限制 |
| **笔记本数量** | 100免费到 500付费方案 | 无限制 |
| **来源大小限制** | 500,000 词 / 200MB 每个来源 | 无限制 |
| **定价** | 免费版可用Pro $19.99/月Ultra $249.99/月 | 免费开源,在自己的基础设施上自托管 |
| **LLM 支持** | 仅 Google Gemini | 100+ LLM通过 OpenAI spec 和 LiteLLM |
| **嵌入模型** | 仅 Google | 6,000+ 嵌入模型,所有主流重排序器 |
| **本地 / 私有 LLM** | 不可用 | 完整支持vLLM、Ollama- 您的数据由您掌控 |
| **可自托管** | 否 | 是 - Docker 一行命令或完整 Docker Compose |
| **开源** | 否 | 是 |
| **外部连接器** | Google Drive、YouTube、网站 | 27+ 连接器 - 搜索引擎、Google Drive、OneDrive、Dropbox、Slack、Teams、Jira、Notion、GitHub、Discord 等[更多](#外部数据源) |
| **文件格式支持** | PDF、Docs、Slides、Sheets、CSV、Word、EPUB、图像、网页 URL、YouTube | 50+ 格式 - 文档、图像、视频,通过 LlamaCloud、Unstructured 或 Docling本地 |
| **搜索** | 语义搜索 | 混合搜索 - 语义 + 全文搜索,结合层次化索引和倒数排名融合 |
| **引用回答** | 是 | 是 - Perplexity 风格的引用回答 |
| **代理架构** | 否 | 是 - 基于 [LangChain Deep Agents](https://docs.langchain.com/oss/python/deepagents/overview) 构建,支持规划、子代理和文件系统访问 |
| **实时多人协作** | 共享笔记本,支持查看者/编辑者角色(无实时聊天) | RBAC 角色控制(所有者/管理员/编辑者/查看者),实时聊天和评论线程 |
| **视频生成** | 通过 Veo 3 的电影级视频概览(仅 Ultra | 可用NotebookLM 在此方面更好,正在积极改进) |
| **演示文稿生成** | 更美观的幻灯片但不可编辑 | 创建可编辑的幻灯片式演示文稿 |
| **播客生成** | 可自定义主持人和语言的音频概览 | 可用,支持多种 TTS 提供商NotebookLM 在此方面更好,正在积极改进) |
| **浏览器扩展** | 否 | 跨浏览器扩展,保存任何网页,包括需要身份验证的页面 |
<details>
<summary><b>外部数据源完整列表</b></summary>

View file

@ -24,7 +24,7 @@ SurfSense now supports the following Chinese domestic LLMs
1. Log in to the SurfSense Dashboard
2. Go to **Settings** → **API Keys** (or **LLM Configurations**)
3. Click **Add LLM Model**
3. Click **Add Model**
4. Choose your Chinese domestic LLM provider from the **Provider** dropdown
5. Fill in the required fields (see the provider-specific configuration details below)
6. Click **Save**

package-lock.json generated Normal file (6 additions)
View file

@ -0,0 +1,6 @@
{
"name": "SurfSense",
"lockfileVersion": 3,
"requires": true,
"packages": {}
}

View file

@ -42,9 +42,7 @@ def upgrade() -> None:
if not exists:
table_list = ", ".join(TABLES)
conn.execute(
sa.text(
f"CREATE PUBLICATION {PUBLICATION_NAME} FOR TABLE {table_list}"
)
sa.text(f"CREATE PUBLICATION {PUBLICATION_NAME} FOR TABLE {table_list}")
)

View file

@ -0,0 +1,123 @@
"""optimize zero_publication with column lists
Recreates the zero_publication using column lists for the documents
table so that large text columns (content, source_markdown,
blocknote_document, etc.) are excluded from WAL replication.
This prevents RangeError: Invalid string length in zero-cache's
change-streamer when documents have very large content.
Also resets REPLICA IDENTITY to DEFAULT on tables that had it set
to FULL for the old Electric SQL setup (migration 66/75/76).
With DEFAULT (primary-key) identity, column-list publications
only need to include the PK, not every column.
IMPORTANT: before AND after running this migration:
1. Stop zero-cache (it holds replication locks that will deadlock DDL)
2. Run: alembic upgrade head
3. Delete / reset the zero-cache data volume
4. Restart zero-cache (it will do a fresh initial sync)
Revision ID: 117
Revises: 116
"""
from collections.abc import Sequence
import sqlalchemy as sa
from alembic import op
revision: str = "117"
down_revision: str | None = "116"
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None
PUBLICATION_NAME = "zero_publication"
TABLES_WITH_FULL_IDENTITY = [
"documents",
"notifications",
"search_source_connectors",
"new_chat_messages",
"chat_comments",
"chat_session_state",
]
DOCUMENT_COLS = [
"id",
"title",
"document_type",
"search_space_id",
"folder_id",
"created_by_id",
"status",
"created_at",
"updated_at",
]
PUBLICATION_DDL_FULL = f"""\
CREATE PUBLICATION {PUBLICATION_NAME} FOR TABLE
notifications, documents, folders,
search_source_connectors, new_chat_messages,
chat_comments, chat_session_state
"""
def _terminate_blocked_pids(conn, table: str) -> None:
"""Kill backends whose locks on *table* would block our AccessExclusiveLock."""
conn.execute(
sa.text(
"SELECT pg_terminate_backend(l.pid) "
"FROM pg_locks l "
"JOIN pg_class c ON c.oid = l.relation "
"WHERE c.relname = :tbl "
" AND l.pid != pg_backend_pid()"
),
{"tbl": table},
)
def upgrade() -> None:
conn = op.get_bind()
conn.execute(sa.text("SET lock_timeout = '10s'"))
for tbl in sorted(TABLES_WITH_FULL_IDENTITY):
_terminate_blocked_pids(conn, tbl)
conn.execute(sa.text(f'LOCK TABLE "{tbl}" IN ACCESS EXCLUSIVE MODE'))
for tbl in TABLES_WITH_FULL_IDENTITY:
conn.execute(sa.text(f'ALTER TABLE "{tbl}" REPLICA IDENTITY DEFAULT'))
conn.execute(sa.text(f"DROP PUBLICATION IF EXISTS {PUBLICATION_NAME}"))
has_zero_ver = conn.execute(
sa.text(
"SELECT 1 FROM information_schema.columns "
"WHERE table_name = 'documents' AND column_name = '_0_version'"
)
).fetchone()
cols = DOCUMENT_COLS + (['"_0_version"'] if has_zero_ver else [])
col_list = ", ".join(cols)
conn.execute(
sa.text(
f"CREATE PUBLICATION {PUBLICATION_NAME} FOR TABLE "
f"notifications, "
f"documents ({col_list}), "
f"folders, "
f"search_source_connectors, "
f"new_chat_messages, "
f"chat_comments, "
f"chat_session_state"
)
)
def downgrade() -> None:
conn = op.get_bind()
conn.execute(sa.text(f"DROP PUBLICATION IF EXISTS {PUBLICATION_NAME}"))
conn.execute(sa.text(PUBLICATION_DDL_FULL))
for tbl in TABLES_WITH_FULL_IDENTITY:
conn.execute(sa.text(f'ALTER TABLE "{tbl}" REPLICA IDENTITY FULL'))
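For reference, a quick way to confirm the slimmed-down publication after running `alembic upgrade head`: on PostgreSQL 15+ the `pg_publication_tables` view exposes an `attnames` column listing the published columns per table. This is a hypothetical verification snippet, not part of the migration; the connection string is a placeholder.

```python
# Hypothetical check (not part of migration 117): list the column sets
# that zero_publication actually publishes per table (PostgreSQL 15+).
import sqlalchemy as sa

engine = sa.create_engine("postgresql+psycopg://user:pass@localhost/surfsense")  # placeholder DSN

with engine.connect() as conn:
    rows = conn.execute(
        sa.text(
            "SELECT tablename, attnames "
            "FROM pg_publication_tables "
            "WHERE pubname = :pub "
            "ORDER BY tablename"
        ),
        {"pub": "zero_publication"},
    ).all()
    for tablename, attnames in rows:
        # documents should list only the columns from DOCUMENT_COLS; the other
        # tables show their full column set since no column list was given.
        print(tablename, attnames)
```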

View file

@ -0,0 +1,149 @@
"""Add LOCAL_FOLDER_FILE document type, folder metadata, and document_versions table
Revision ID: 118
Revises: 117
"""
from collections.abc import Sequence
import sqlalchemy as sa
from alembic import op
revision: str = "118"
down_revision: str | None = "117"
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None
PUBLICATION_NAME = "zero_publication"
def upgrade() -> None:
conn = op.get_bind()
# Add LOCAL_FOLDER_FILE to documenttype enum
op.execute(
"""
DO $$
BEGIN
IF NOT EXISTS (
SELECT 1 FROM pg_type t
JOIN pg_enum e ON t.oid = e.enumtypid
WHERE t.typname = 'documenttype' AND e.enumlabel = 'LOCAL_FOLDER_FILE'
) THEN
ALTER TYPE documenttype ADD VALUE 'LOCAL_FOLDER_FILE';
END IF;
END
$$;
"""
)
# Add JSONB metadata column to folders table
col_exists = conn.execute(
sa.text(
"SELECT 1 FROM information_schema.columns "
"WHERE table_name = 'folders' AND column_name = 'metadata'"
)
).fetchone()
if not col_exists:
op.add_column(
"folders",
sa.Column("metadata", sa.dialects.postgresql.JSONB, nullable=True),
)
# Create document_versions table
table_exists = conn.execute(
sa.text(
"SELECT 1 FROM information_schema.tables WHERE table_name = 'document_versions'"
)
).fetchone()
if not table_exists:
op.create_table(
"document_versions",
sa.Column("id", sa.Integer(), nullable=False, autoincrement=True),
sa.Column("document_id", sa.Integer(), nullable=False),
sa.Column("version_number", sa.Integer(), nullable=False),
sa.Column("source_markdown", sa.Text(), nullable=True),
sa.Column("content_hash", sa.String(), nullable=False),
sa.Column("title", sa.String(), nullable=True),
sa.Column(
"created_at",
sa.TIMESTAMP(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.ForeignKeyConstraint(
["document_id"],
["documents.id"],
ondelete="CASCADE",
),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint(
"document_id",
"version_number",
name="uq_document_version",
),
)
op.execute(
"CREATE INDEX IF NOT EXISTS ix_document_versions_document_id "
"ON document_versions (document_id)"
)
op.execute(
"CREATE INDEX IF NOT EXISTS ix_document_versions_created_at "
"ON document_versions (created_at)"
)
# Add document_versions to Zero publication
pub_exists = conn.execute(
sa.text("SELECT 1 FROM pg_publication WHERE pubname = :name"),
{"name": PUBLICATION_NAME},
).fetchone()
if pub_exists:
already_in_pub = conn.execute(
sa.text(
"SELECT 1 FROM pg_publication_tables "
"WHERE pubname = :name AND tablename = 'document_versions'"
),
{"name": PUBLICATION_NAME},
).fetchone()
if not already_in_pub:
op.execute(
f"ALTER PUBLICATION {PUBLICATION_NAME} ADD TABLE document_versions"
)
def downgrade() -> None:
conn = op.get_bind()
# Remove from publication
pub_exists = conn.execute(
sa.text("SELECT 1 FROM pg_publication WHERE pubname = :name"),
{"name": PUBLICATION_NAME},
).fetchone()
if pub_exists:
already_in_pub = conn.execute(
sa.text(
"SELECT 1 FROM pg_publication_tables "
"WHERE pubname = :name AND tablename = 'document_versions'"
),
{"name": PUBLICATION_NAME},
).fetchone()
if already_in_pub:
op.execute(
f"ALTER PUBLICATION {PUBLICATION_NAME} DROP TABLE document_versions"
)
op.execute("DROP INDEX IF EXISTS ix_document_versions_created_at")
op.execute("DROP INDEX IF EXISTS ix_document_versions_document_id")
op.execute("DROP TABLE IF EXISTS document_versions")
# Drop metadata column from folders
col_exists = conn.execute(
sa.text(
"SELECT 1 FROM information_schema.columns "
"WHERE table_name = 'folders' AND column_name = 'metadata'"
)
).fetchone()
if col_exists:
op.drop_column("folders", "metadata")

View file

@ -17,10 +17,10 @@ depends_on: str | Sequence[str] | None = None
def upgrade() -> None:
"""
Add the new_llm_configs table that combines LLM model settings with prompt configuration.
Add the new_llm_configs table that combines model settings with prompt configuration.
This table includes:
- LLM model configuration (provider, model_name, api_key, etc.)
- Model configuration (provider, model_name, api_key, etc.)
- Configurable system instructions
- Citation toggle
"""
@ -41,7 +41,7 @@ def upgrade() -> None:
name VARCHAR(100) NOT NULL,
description VARCHAR(500),
-- LLM Model Configuration (same as llm_configs, excluding language)
-- Model Configuration (same as llm_configs, excluding language)
provider litellmprovider NOT NULL,
custom_provider VARCHAR(100),
model_name VARCHAR(100) NOT NULL,

View file

@ -159,6 +159,7 @@ async def create_surfsense_deep_agent(
additional_tools: Sequence[BaseTool] | None = None,
firecrawl_api_key: str | None = None,
thread_visibility: ChatVisibility | None = None,
mentioned_document_ids: list[int] | None = None,
):
"""
Create a SurfSense deep agent with configurable tools and prompts.
@ -451,6 +452,7 @@ async def create_surfsense_deep_agent(
search_space_id=search_space_id,
available_connectors=available_connectors,
available_document_types=available_document_types,
mentioned_document_ids=mentioned_document_ids,
),
SurfSenseFilesystemMiddleware(
search_space_id=search_space_id,

View file

@ -66,6 +66,16 @@ the `<chunk_index>`, identify chunks marked `matched="true"`, then use
those sections instead of reading the entire file sequentially.
Use `<chunk id='...'>` values as citation IDs in your answers.
## User-Mentioned Documents
When the `ls` output tags a file with `[MENTIONED BY USER read deeply]`,
the user **explicitly selected** that document. These files are your highest-
priority sources:
1. **Always read them thoroughly**: scan the full `<chunk_index>`, then read
all major sections, not just matched chunks.
2. **Prefer their content** over other search results when answering.
3. **Cite from them first** whenever applicable.
"""
# =============================================================================

View file

@ -28,7 +28,13 @@ from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.agents.new_chat.utils import parse_date_or_datetime, resolve_date_range
from app.db import NATIVE_TO_LEGACY_DOCTYPE, Document, Folder, shielded_async_session
from app.db import (
NATIVE_TO_LEGACY_DOCTYPE,
Chunk,
Document,
Folder,
shielded_async_session,
)
from app.retriever.chunks_hybrid_search import ChucksHybridSearchRetriever
from app.utils.document_converters import embed_texts
from app.utils.perf import get_perf_logger
@ -430,21 +436,36 @@ async def _get_folder_paths(
def _build_synthetic_ls(
existing_files: dict[str, Any] | None,
new_files: dict[str, Any],
*,
mentioned_paths: set[str] | None = None,
) -> tuple[AIMessage, ToolMessage]:
"""Build a synthetic ls("/documents") tool-call + result for the LLM context.
Paths are listed with *new* (rank-ordered) files first, then existing files
that were already in state from prior turns.
Mentioned files are listed first. A separate header tells the LLM which
files the user explicitly selected; the path list itself stays clean so
paths can be passed directly to ``read_file`` without stripping tags.
"""
_mentioned = mentioned_paths or set()
merged: dict[str, Any] = {**(existing_files or {}), **new_files}
doc_paths = [
p for p, v in merged.items() if p.startswith("/documents/") and v is not None
]
new_set = set(new_files)
new_paths = [p for p in doc_paths if p in new_set]
mentioned_list = [p for p in doc_paths if p in _mentioned]
new_non_mentioned = [p for p in doc_paths if p in new_set and p not in _mentioned]
old_paths = [p for p in doc_paths if p not in new_set]
ordered = new_paths + old_paths
ordered = mentioned_list + new_non_mentioned + old_paths
parts: list[str] = []
if mentioned_list:
parts.append(
"USER-MENTIONED documents (read these thoroughly before answering):"
)
for p in mentioned_list:
parts.append(f" {p}")
parts.append("")
parts.append(str(ordered) if ordered else "No documents found.")
tool_call_id = f"auto_ls_{uuid.uuid4().hex[:12]}"
ai_msg = AIMessage(
@ -452,7 +473,7 @@ def _build_synthetic_ls(
tool_calls=[{"name": "ls", "args": {"path": "/documents"}, "id": tool_call_id}],
)
tool_msg = ToolMessage(
content=str(ordered) if ordered else "No documents found.",
content="\n".join(parts),
tool_call_id=tool_call_id,
)
return ai_msg, tool_msg
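To make the new ordering concrete, here is a hypothetical call with made-up paths and contents; it relies only on the function as defined above.

```python
# Illustrative call (paths are made up): mentioned paths come first in the
# header and in the ordered list, then new non-mentioned paths, then files
# already present from earlier turns.
ai_msg, tool_msg = _build_synthetic_ls(
    existing_files={"/documents/old-notes.xml": {"content": "<document>...</document>"}},
    new_files={
        "/documents/roadmap-2026.xml": {"content": "<document>...</document>"},
        "/documents/q3-retro.xml": {"content": "<document>...</document>"},
    },
    mentioned_paths={"/documents/roadmap-2026.xml"},
)
# tool_msg.content begins with the "USER-MENTIONED documents" header and
# "  /documents/roadmap-2026.xml", then lists
# ['/documents/roadmap-2026.xml', '/documents/q3-retro.xml', '/documents/old-notes.xml']
```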
@ -524,12 +545,92 @@ async def search_knowledge_base(
return results[:top_k]
async def fetch_mentioned_documents(
*,
document_ids: list[int],
search_space_id: int,
) -> list[dict[str, Any]]:
"""Fetch explicitly mentioned documents with *all* their chunks.
Returns the same dict structure as ``search_knowledge_base`` so results
can be merged directly into ``build_scoped_filesystem``. Unlike search
results, every chunk is included (no top-K limiting) and none are marked
as ``matched`` since the entire document is relevant by virtue of the
user's explicit mention.
"""
if not document_ids:
return []
async with shielded_async_session() as session:
doc_result = await session.execute(
select(Document).where(
Document.id.in_(document_ids),
Document.search_space_id == search_space_id,
)
)
docs = {doc.id: doc for doc in doc_result.scalars().all()}
if not docs:
return []
chunk_result = await session.execute(
select(Chunk.id, Chunk.content, Chunk.document_id)
.where(Chunk.document_id.in_(list(docs.keys())))
.order_by(Chunk.document_id, Chunk.id)
)
chunks_by_doc: dict[int, list[dict[str, Any]]] = {doc_id: [] for doc_id in docs}
for row in chunk_result.all():
if row.document_id in chunks_by_doc:
chunks_by_doc[row.document_id].append(
{"chunk_id": row.id, "content": row.content}
)
results: list[dict[str, Any]] = []
for doc_id in document_ids:
doc = docs.get(doc_id)
if doc is None:
continue
metadata = doc.document_metadata or {}
results.append(
{
"document_id": doc.id,
"content": "",
"score": 1.0,
"chunks": chunks_by_doc.get(doc.id, []),
"matched_chunk_ids": [],
"document": {
"id": doc.id,
"title": doc.title,
"document_type": (
doc.document_type.value
if getattr(doc, "document_type", None)
else None
),
"metadata": metadata,
},
"source": (
doc.document_type.value
if getattr(doc, "document_type", None)
else None
),
"_user_mentioned": True,
}
)
return results
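The entries mirror the `search_knowledge_base` output shape; a rough, made-up example of one entry follows (ids, title, and document type are placeholders).

```python
# Illustrative shape of one fetch_mentioned_documents() entry: every chunk
# of the document is included and nothing is flagged as matched.
example_entry = {
    "document_id": 101,
    "content": "",
    "score": 1.0,
    "chunks": [
        {"chunk_id": 9001, "content": "First chunk text ..."},
        {"chunk_id": 9002, "content": "Second chunk text ..."},
    ],
    "matched_chunk_ids": [],
    "document": {
        "id": 101,
        "title": "Roadmap 2026",
        "document_type": "FILE",  # placeholder DocumentType value
        "metadata": {},
    },
    "source": "FILE",
    "_user_mentioned": True,
}
```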
async def build_scoped_filesystem(
*,
documents: Sequence[dict[str, Any]],
search_space_id: int,
) -> dict[str, dict[str, str]]:
"""Build a StateBackend-compatible files dict from search results."""
) -> tuple[dict[str, dict[str, str]], dict[int, str]]:
"""Build a StateBackend-compatible files dict from search results.
Returns ``(files, doc_id_to_path)`` so callers can reliably map a
document id back to its filesystem path without guessing by title.
Paths are collision-proof: when two documents resolve to the same
path the doc-id is appended to disambiguate.
"""
async with shielded_async_session() as session:
folder_paths = await _get_folder_paths(session, search_space_id)
doc_ids = [
@ -551,6 +652,7 @@ async def build_scoped_filesystem(
}
files: dict[str, dict[str, str]] = {}
doc_id_to_path: dict[int, str] = {}
for document in documents:
doc_meta = document.get("document") or {}
title = str(doc_meta.get("title") or "untitled")
@ -559,6 +661,9 @@ async def build_scoped_filesystem(
base_folder = folder_paths.get(folder_id, "/documents")
file_name = _safe_filename(title)
path = f"{base_folder}/{file_name}"
if path in files:
stem = file_name.removesuffix(".xml")
path = f"{base_folder}/{stem} ({doc_id}).xml"
matched_ids = set(document.get("matched_chunk_ids") or [])
xml_content = _build_document_xml(document, matched_chunk_ids=matched_ids)
files[path] = {
@ -567,7 +672,9 @@ async def build_scoped_filesystem(
"created_at": "",
"modified_at": "",
}
return files
if isinstance(doc_id, int):
doc_id_to_path[doc_id] = path
return files, doc_id_to_path
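A minimal sketch of the updated call shape (the document list and ids are hypothetical); the second return value is what later lets the middleware map a mentioned document id to a path without guessing by title.

```python
# Hypothetical usage: unpack both return values and resolve a mentioned
# document id straight to its filesystem path.
async def demo(merged_results, search_space_id: int) -> None:
    files, doc_id_to_path = await build_scoped_filesystem(
        documents=merged_results,        # mentioned results first, then search hits
        search_space_id=search_space_id,
    )
    mentioned_path = doc_id_to_path.get(101)  # e.g. "/documents/roadmap-2026.xml"
    print(len(files), mentioned_path)
```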
class KnowledgeBaseSearchMiddleware(AgentMiddleware): # type: ignore[type-arg]
@ -583,12 +690,14 @@ class KnowledgeBaseSearchMiddleware(AgentMiddleware): # type: ignore[type-arg]
available_connectors: list[str] | None = None,
available_document_types: list[str] | None = None,
top_k: int = 10,
mentioned_document_ids: list[int] | None = None,
) -> None:
self.llm = llm
self.search_space_id = search_space_id
self.available_connectors = available_connectors
self.available_document_types = available_document_types
self.top_k = top_k
self.mentioned_document_ids = mentioned_document_ids or []
async def _plan_search_inputs(
self,
@ -680,6 +789,18 @@ class KnowledgeBaseSearchMiddleware(AgentMiddleware): # type: ignore[type-arg]
user_text=user_text,
)
# --- 1. Fetch mentioned documents (user-selected, all chunks) ---
mentioned_results: list[dict[str, Any]] = []
if self.mentioned_document_ids:
mentioned_results = await fetch_mentioned_documents(
document_ids=self.mentioned_document_ids,
search_space_id=self.search_space_id,
)
# Clear after first turn so they are not re-fetched on subsequent
# messages within the same agent instance.
self.mentioned_document_ids = []
# --- 2. Run KB hybrid search ---
search_results = await search_knowledge_base(
query=planned_query,
search_space_id=self.search_space_id,
@ -689,19 +810,50 @@ class KnowledgeBaseSearchMiddleware(AgentMiddleware): # type: ignore[type-arg]
start_date=start_date,
end_date=end_date,
)
new_files = await build_scoped_filesystem(
documents=search_results,
# --- 3. Merge: mentioned first, then search (dedup by doc id) ---
seen_doc_ids: set[int] = set()
merged: list[dict[str, Any]] = []
for doc in mentioned_results:
doc_id = (doc.get("document") or {}).get("id")
if doc_id is not None:
seen_doc_ids.add(doc_id)
merged.append(doc)
for doc in search_results:
doc_id = (doc.get("document") or {}).get("id")
if doc_id is not None and doc_id in seen_doc_ids:
continue
merged.append(doc)
# --- 4. Build scoped filesystem ---
new_files, doc_id_to_path = await build_scoped_filesystem(
documents=merged,
search_space_id=self.search_space_id,
)
ai_msg, tool_msg = _build_synthetic_ls(existing_files, new_files)
# Identify which paths belong to user-mentioned documents using
# the authoritative doc_id -> path mapping (no title guessing).
mentioned_doc_ids = {
(d.get("document") or {}).get("id") for d in mentioned_results
}
mentioned_paths = {
doc_id_to_path[did] for did in mentioned_doc_ids if did in doc_id_to_path
}
ai_msg, tool_msg = _build_synthetic_ls(
existing_files,
new_files,
mentioned_paths=mentioned_paths,
)
if t0 is not None:
_perf_log.info(
"[kb_fs_middleware] completed in %.3fs query=%r optimized=%r new_files=%d total=%d",
"[kb_fs_middleware] completed in %.3fs query=%r optimized=%r "
"mentioned=%d new_files=%d total=%d",
asyncio.get_event_loop().time() - t0,
user_text[:80],
planned_query[:120],
len(mentioned_results),
len(new_files),
len(new_files) + len(existing_files or {}),
)

View file

@ -17,7 +17,7 @@
# - Configure router_settings below to customize the load balancing behavior
#
# Structure matches NewLLMConfig:
# - LLM model configuration (provider, model_name, api_key, etc.)
# - Model configuration (provider, model_name, api_key, etc.)
# - Prompt configuration (system_instructions, citations_enabled)
# Router Settings for Auto Mode

View file

@ -64,6 +64,7 @@ class DocumentType(StrEnum):
COMPOSIO_GOOGLE_DRIVE_CONNECTOR = "COMPOSIO_GOOGLE_DRIVE_CONNECTOR"
COMPOSIO_GMAIL_CONNECTOR = "COMPOSIO_GMAIL_CONNECTOR"
COMPOSIO_GOOGLE_CALENDAR_CONNECTOR = "COMPOSIO_GOOGLE_CALENDAR_CONNECTOR"
LOCAL_FOLDER_FILE = "LOCAL_FOLDER_FILE"
# Native Google document types → their legacy Composio equivalents.
@ -955,6 +956,7 @@ class Folder(BaseModel, TimestampMixin):
onupdate=lambda: datetime.now(UTC),
index=True,
)
folder_metadata = Column("metadata", JSONB, nullable=True)
parent = relationship("Folder", remote_side="Folder.id", backref="children")
search_space = relationship("SearchSpace", back_populates="folders")
@ -1039,6 +1041,26 @@ class Document(BaseModel, TimestampMixin):
)
class DocumentVersion(BaseModel, TimestampMixin):
__tablename__ = "document_versions"
__table_args__ = (
UniqueConstraint("document_id", "version_number", name="uq_document_version"),
)
document_id = Column(
Integer,
ForeignKey("documents.id", ondelete="CASCADE"),
nullable=False,
index=True,
)
version_number = Column(Integer, nullable=False)
source_markdown = Column(Text, nullable=True)
content_hash = Column(String, nullable=False)
title = Column(String, nullable=True)
document = relationship("Document", backref="versions")
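Not part of this diff, but as a sketch of how the table is meant to be used: appending a version means picking the next version_number for the document, with the uq_document_version constraint guarding against two writers taking the same pair.

```python
# Hypothetical helper (not in this diff): append a new document_versions
# row with the next version_number for the document.
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession


async def add_document_version(
    session: AsyncSession,
    document_id: int,
    source_markdown: str,
    content_hash: str,
    title: str | None = None,
) -> DocumentVersion:
    next_number = (
        await session.execute(
            select(func.coalesce(func.max(DocumentVersion.version_number), 0) + 1).where(
                DocumentVersion.document_id == document_id
            )
        )
    ).scalar_one()
    version = DocumentVersion(
        document_id=document_id,
        version_number=next_number,
        source_markdown=source_markdown,
        content_hash=content_hash,
        title=title,
    )
    session.add(version)
    await session.flush()  # raises IntegrityError on a uq_document_version conflict
    return version
```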
class Chunk(BaseModel, TimestampMixin):
__tablename__ = "chunks"

View file

@ -59,7 +59,7 @@ class PipelineMessages:
LLM_AUTH = "LLM authentication failed. Check your API key."
LLM_PERMISSION = "LLM request denied. Check your account permissions."
LLM_NOT_FOUND = "LLM model not found. Check your model configuration."
LLM_NOT_FOUND = "Model not found. Check your model configuration."
LLM_BAD_REQUEST = "LLM rejected the request. Document content may be invalid."
LLM_UNPROCESSABLE = (
"Document exceeds the LLM context window even after optimization."
@ -67,7 +67,7 @@ class PipelineMessages:
LLM_RESPONSE = "LLM returned an invalid response."
LLM_AUTH = "LLM authentication failed. Check your API key."
LLM_PERMISSION = "LLM request denied. Check your account permissions."
LLM_NOT_FOUND = "LLM model not found. Check your model configuration."
LLM_NOT_FOUND = "Model not found. Check your model configuration."
LLM_BAD_REQUEST = "LLM rejected the request. Document content may be invalid."
LLM_UNPROCESSABLE = (
"Document exceeds the LLM context window even after optimization."

View file

@ -85,7 +85,7 @@ router.include_router(confluence_add_connector_router)
router.include_router(clickup_add_connector_router)
router.include_router(dropbox_add_connector_router)
router.include_router(new_llm_config_router) # LLM configs with prompt configuration
router.include_router(model_list_router) # Dynamic LLM model catalogue from OpenRouter
router.include_router(model_list_router) # Dynamic model catalogue from OpenRouter
router.include_router(logs_router)
router.include_router(circleback_webhook_router) # Circleback meeting webhooks
router.include_router(surfsense_docs_router) # Surfsense documentation for citations

View file

@ -1,7 +1,8 @@
# Force asyncio to use standard event loop before unstructured imports
import asyncio
from fastapi import APIRouter, Depends, Form, HTTPException, UploadFile
from fastapi import APIRouter, Depends, Form, HTTPException, Query, UploadFile
from pydantic import BaseModel as PydanticBaseModel
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.future import select
from sqlalchemy.orm import selectinload
@ -10,6 +11,8 @@ from app.db import (
Chunk,
Document,
DocumentType,
DocumentVersion,
Folder,
Permission,
SearchSpace,
SearchSpaceMembership,
@ -17,6 +20,7 @@ from app.db import (
get_async_session,
)
from app.schemas import (
ChunkRead,
DocumentRead,
DocumentsCreate,
DocumentStatusBatchResponse,
@ -26,6 +30,7 @@ from app.schemas import (
DocumentTitleSearchResponse,
DocumentUpdate,
DocumentWithChunksRead,
FolderRead,
PaginatedResponse,
)
from app.services.task_dispatcher import TaskDispatcher, get_task_dispatcher
@ -45,9 +50,7 @@ os.environ["UNSTRUCTURED_HAS_PATCHED_LOOP"] = "1"
router = APIRouter()
MAX_FILES_PER_UPLOAD = 10
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024 # 50 MB per file
MAX_TOTAL_SIZE_BYTES = 200 * 1024 * 1024 # 200 MB total
MAX_FILE_SIZE_BYTES = 500 * 1024 * 1024 # 500 MB per file
@router.post("/documents")
@ -156,13 +159,6 @@ async def create_documents_file_upload(
if not files:
raise HTTPException(status_code=400, detail="No files provided")
if len(files) > MAX_FILES_PER_UPLOAD:
raise HTTPException(
status_code=413,
detail=f"Too many files. Maximum {MAX_FILES_PER_UPLOAD} files per upload.",
)
total_size = 0
for file in files:
file_size = file.size or 0
if file_size > MAX_FILE_SIZE_BYTES:
@ -171,14 +167,6 @@ async def create_documents_file_upload(
detail=f"File '{file.filename}' ({file_size / (1024 * 1024):.1f} MB) "
f"exceeds the {MAX_FILE_SIZE_BYTES // (1024 * 1024)} MB per-file limit.",
)
total_size += file_size
if total_size > MAX_TOTAL_SIZE_BYTES:
raise HTTPException(
status_code=413,
detail=f"Total upload size ({total_size / (1024 * 1024):.1f} MB) "
f"exceeds the {MAX_TOTAL_SIZE_BYTES // (1024 * 1024)} MB limit.",
)
# ===== Read all files concurrently to avoid blocking the event loop =====
async def _read_and_save(file: UploadFile) -> tuple[str, str, int]:
@ -206,16 +194,6 @@ async def create_documents_file_upload(
saved_files = await asyncio.gather(*(_read_and_save(f) for f in files))
actual_total_size = sum(size for _, _, size in saved_files)
if actual_total_size > MAX_TOTAL_SIZE_BYTES:
for temp_path, _, _ in saved_files:
os.unlink(temp_path)
raise HTTPException(
status_code=413,
detail=f"Total upload size ({actual_total_size / (1024 * 1024):.1f} MB) "
f"exceeds the {MAX_TOTAL_SIZE_BYTES // (1024 * 1024)} MB limit.",
)
# ===== PHASE 1: Create pending documents for all files =====
created_documents: list[Document] = []
files_to_process: list[tuple[Document, str, str]] = []
@ -451,13 +429,15 @@ async def read_documents(
reason=doc.status.get("reason"),
)
raw_content = doc.content or ""
api_documents.append(
DocumentRead(
id=doc.id,
title=doc.title,
document_type=doc.document_type,
document_metadata=doc.document_metadata,
content=doc.content,
content="",
content_preview=raw_content[:300],
content_hash=doc.content_hash,
unique_identifier_hash=doc.unique_identifier_hash,
created_at=doc.created_at,
@ -609,13 +589,15 @@ async def search_documents(
reason=doc.status.get("reason"),
)
raw_content = doc.content or ""
api_documents.append(
DocumentRead(
id=doc.id,
title=doc.title,
document_type=doc.document_type,
document_metadata=doc.document_metadata,
content=doc.content,
content="",
content_preview=raw_content[:300],
content_hash=doc.content_hash,
unique_identifier_hash=doc.unique_identifier_hash,
created_at=doc.created_at,
@ -884,16 +866,19 @@ async def get_document_type_counts(
@router.get("/documents/by-chunk/{chunk_id}", response_model=DocumentWithChunksRead)
async def get_document_by_chunk_id(
chunk_id: int,
chunk_window: int = Query(
5, ge=0, description="Number of chunks before/after the cited chunk to include"
),
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""
Retrieves a document based on a chunk ID, including all its chunks ordered by creation time.
Requires DOCUMENTS_READ permission for the search space.
The document's embedding and chunk embeddings are excluded from the response.
Retrieves a document based on a chunk ID, including a window of chunks around the cited one.
Uses SQL-level pagination to avoid loading all chunks into memory.
"""
try:
# First, get the chunk and verify it exists
from sqlalchemy import and_, func, or_
chunk_result = await session.execute(select(Chunk).filter(Chunk.id == chunk_id))
chunk = chunk_result.scalars().first()
@ -902,11 +887,8 @@ async def get_document_by_chunk_id(
status_code=404, detail=f"Chunk with id {chunk_id} not found"
)
# Get the associated document
document_result = await session.execute(
select(Document)
.options(selectinload(Document.chunks))
.filter(Document.id == chunk.document_id)
select(Document).filter(Document.id == chunk.document_id)
)
document = document_result.scalars().first()
@ -916,7 +898,6 @@ async def get_document_by_chunk_id(
detail="Document not found",
)
# Check permission for the search space
await check_permission(
session,
user,
@ -925,10 +906,38 @@ async def get_document_by_chunk_id(
"You don't have permission to read documents in this search space",
)
# Sort chunks by creation time
sorted_chunks = sorted(document.chunks, key=lambda x: x.created_at)
total_result = await session.execute(
select(func.count())
.select_from(Chunk)
.filter(Chunk.document_id == document.id)
)
total_chunks = total_result.scalar() or 0
cited_idx_result = await session.execute(
select(func.count())
.select_from(Chunk)
.filter(
Chunk.document_id == document.id,
or_(
Chunk.created_at < chunk.created_at,
and_(Chunk.created_at == chunk.created_at, Chunk.id < chunk.id),
),
)
)
cited_idx = cited_idx_result.scalar() or 0
start = max(0, cited_idx - chunk_window)
end = min(total_chunks, cited_idx + chunk_window + 1)
windowed_result = await session.execute(
select(Chunk)
.filter(Chunk.document_id == document.id)
.order_by(Chunk.created_at, Chunk.id)
.offset(start)
.limit(end - start)
)
windowed_chunks = windowed_result.scalars().all()
# Return the document with its chunks
return DocumentWithChunksRead(
id=document.id,
title=document.title,
@ -940,7 +949,9 @@ async def get_document_by_chunk_id(
created_at=document.created_at,
updated_at=document.updated_at,
search_space_id=document.search_space_id,
chunks=sorted_chunks,
chunks=windowed_chunks,
total_chunks=total_chunks,
chunk_start_index=start,
)
except HTTPException:
raise
@ -950,6 +961,108 @@ async def get_document_by_chunk_id(
) from e
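# Minimal standalone sketch of the chunk-window arithmetic used in the endpoint
# above: given the zero-based index of the cited chunk and the total chunk count,
# compute the [start, end) slice that the SQL OFFSET/LIMIT query fetches.
# The function name and the example numbers below are illustrative only.
def compute_chunk_window(cited_idx: int, total_chunks: int, chunk_window: int = 5) -> tuple[int, int]:
    start = max(0, cited_idx - chunk_window)
    end = min(total_chunks, cited_idx + chunk_window + 1)
    return start, end

# e.g. a document with 40 chunks where chunk index 2 is cited and chunk_window=5
# yields (0, 8): offset=0, limit=8, so the cited chunk still gets 5 chunks of
# trailing context even though only 2 chunks precede it.
assert compute_chunk_window(2, 40) == (0, 8)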
@router.get("/documents/watched-folders", response_model=list[FolderRead])
async def get_watched_folders(
search_space_id: int,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""Return root folders that are marked as watched (metadata->>'watched' = 'true')."""
await check_permission(
session,
user,
search_space_id,
Permission.DOCUMENTS_READ.value,
"You don't have permission to read documents in this search space",
)
folders = (
(
await session.execute(
select(Folder).where(
Folder.search_space_id == search_space_id,
Folder.parent_id.is_(None),
Folder.folder_metadata.isnot(None),
Folder.folder_metadata["watched"].astext == "true",
)
)
)
.scalars()
.all()
)
return folders
@router.get(
"/documents/{document_id}/chunks",
response_model=PaginatedResponse[ChunkRead],
)
async def get_document_chunks_paginated(
document_id: int,
page: int = Query(0, ge=0),
page_size: int = Query(20, ge=1, le=100),
start_offset: int | None = Query(
None, ge=0, description="Direct offset; overrides page * page_size"
),
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""
Paginated chunk loading for a document.
Supports both page-based and offset-based access.
"""
try:
from sqlalchemy import func
doc_result = await session.execute(
select(Document).filter(Document.id == document_id)
)
document = doc_result.scalars().first()
if not document:
raise HTTPException(status_code=404, detail="Document not found")
await check_permission(
session,
user,
document.search_space_id,
Permission.DOCUMENTS_READ.value,
"You don't have permission to read documents in this search space",
)
total_result = await session.execute(
select(func.count())
.select_from(Chunk)
.filter(Chunk.document_id == document_id)
)
total = total_result.scalar() or 0
offset = start_offset if start_offset is not None else page * page_size
chunks_result = await session.execute(
select(Chunk)
.filter(Chunk.document_id == document_id)
.order_by(Chunk.created_at, Chunk.id)
.offset(offset)
.limit(page_size)
)
chunks = chunks_result.scalars().all()
return PaginatedResponse(
items=chunks,
total=total,
page=offset // page_size if page_size else page,
page_size=page_size,
has_more=(offset + len(chunks)) < total,
)
except HTTPException:
raise
except Exception as e:
raise HTTPException(
status_code=500, detail=f"Failed to fetch chunks: {e!s}"
) from e
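# Illustrative sketch (not part of the route) of how page, page_size and
# start_offset interact above: start_offset, when supplied, overrides
# page * page_size, and has_more is derived from the number of rows returned.
def resolve_offset(page: int, page_size: int, start_offset: int | None) -> int:
    return start_offset if start_offset is not None else page * page_size

# e.g. with total=95 chunks and page_size=20:
#   page=4          -> offset 80, 15 rows returned, has_more is False
#   start_offset=37 -> offset 37, 20 rows returned, has_more is True
assert resolve_offset(4, 20, None) == 80
assert resolve_offset(0, 20, 37) == 37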
@router.get("/documents/{document_id}", response_model=DocumentRead)
async def read_document(
document_id: int,
@ -980,13 +1093,14 @@ async def read_document(
"You don't have permission to read documents in this search space",
)
# Convert database object to API-friendly format
raw_content = document.content or ""
return DocumentRead(
id=document.id,
title=document.title,
document_type=document.document_type,
document_metadata=document.document_metadata,
content=document.content,
content=raw_content,
content_preview=raw_content[:300],
content_hash=document.content_hash,
unique_identifier_hash=document.unique_identifier_hash,
created_at=document.created_at,
@ -1135,3 +1249,297 @@ async def delete_document(
raise HTTPException(
status_code=500, detail=f"Failed to delete document: {e!s}"
) from e
# ====================================================================
# Version History Endpoints
# ====================================================================
@router.get("/documents/{document_id}/versions")
async def list_document_versions(
document_id: int,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""List all versions for a document, ordered by version_number descending."""
document = (
await session.execute(select(Document).where(Document.id == document_id))
).scalar_one_or_none()
if not document:
raise HTTPException(status_code=404, detail="Document not found")
await check_permission(
session, user, document.search_space_id, Permission.DOCUMENTS_READ.value
)
versions = (
(
await session.execute(
select(DocumentVersion)
.where(DocumentVersion.document_id == document_id)
.order_by(DocumentVersion.version_number.desc())
)
)
.scalars()
.all()
)
return [
{
"version_number": v.version_number,
"title": v.title,
"content_hash": v.content_hash,
"created_at": v.created_at.isoformat() if v.created_at else None,
}
for v in versions
]
@router.get("/documents/{document_id}/versions/{version_number}")
async def get_document_version(
document_id: int,
version_number: int,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""Get full version content including source_markdown."""
document = (
await session.execute(select(Document).where(Document.id == document_id))
).scalar_one_or_none()
if not document:
raise HTTPException(status_code=404, detail="Document not found")
await check_permission(
session, user, document.search_space_id, Permission.DOCUMENTS_READ.value
)
version = (
await session.execute(
select(DocumentVersion).where(
DocumentVersion.document_id == document_id,
DocumentVersion.version_number == version_number,
)
)
).scalar_one_or_none()
if not version:
raise HTTPException(status_code=404, detail="Version not found")
return {
"version_number": version.version_number,
"title": version.title,
"content_hash": version.content_hash,
"source_markdown": version.source_markdown,
"created_at": version.created_at.isoformat() if version.created_at else None,
}
@router.post("/documents/{document_id}/versions/{version_number}/restore")
async def restore_document_version(
document_id: int,
version_number: int,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""Restore a previous version: snapshot current state, then overwrite document content."""
document = (
await session.execute(select(Document).where(Document.id == document_id))
).scalar_one_or_none()
if not document:
raise HTTPException(status_code=404, detail="Document not found")
await check_permission(
session, user, document.search_space_id, Permission.DOCUMENTS_UPDATE.value
)
version = (
await session.execute(
select(DocumentVersion).where(
DocumentVersion.document_id == document_id,
DocumentVersion.version_number == version_number,
)
)
).scalar_one_or_none()
if not version:
raise HTTPException(status_code=404, detail="Version not found")
# Snapshot current state before restoring
from app.utils.document_versioning import create_version_snapshot
await create_version_snapshot(session, document)
# Restore the version's content onto the document
document.source_markdown = version.source_markdown
document.title = version.title or document.title
document.content_needs_reindexing = True
await session.commit()
from app.tasks.celery_tasks.document_reindex_tasks import reindex_document_task
reindex_document_task.delay(document_id, str(user.id))
return {
"message": f"Restored version {version_number}",
"document_id": document_id,
"restored_version": version_number,
}
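# Hedged client-side sketch of the version-history flow above. The base URL,
# API prefix and auth header are assumptions about the deployment; only the
# route suffixes are taken from the endpoints defined in this file.
import httpx

BASE = "http://localhost:8000/api/v1"  # assumed prefix
DOC_ID = 42  # hypothetical document id

with httpx.Client(base_url=BASE, headers={"Authorization": "Bearer <token>"}) as client:
    versions = client.get(f"/documents/{DOC_ID}/versions").json()
    if versions:
        latest = versions[0]["version_number"]  # list is ordered descending
        client.post(f"/documents/{DOC_ID}/versions/{latest}/restore")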
# ===== Local folder indexing endpoints =====
class FolderIndexRequest(PydanticBaseModel):
folder_path: str
folder_name: str
search_space_id: int
exclude_patterns: list[str] | None = None
file_extensions: list[str] | None = None
root_folder_id: int | None = None
enable_summary: bool = False
class FolderIndexFilesRequest(PydanticBaseModel):
folder_path: str
folder_name: str
search_space_id: int
target_file_paths: list[str]
root_folder_id: int | None = None
enable_summary: bool = False
@router.post("/documents/folder-index")
async def folder_index(
request: FolderIndexRequest,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""Full-scan index of a local folder. Creates the root Folder row synchronously
and dispatches the heavy indexing work to a Celery task.
Returns the root_folder_id so the desktop app can persist it.
"""
from app.config import config as app_config
if not app_config.is_self_hosted():
raise HTTPException(
status_code=400,
detail="Local folder indexing is only available in self-hosted mode",
)
await check_permission(
session,
user,
request.search_space_id,
Permission.DOCUMENTS_CREATE.value,
"You don't have permission to create documents in this search space",
)
watched_metadata = {
"watched": True,
"folder_path": request.folder_path,
"exclude_patterns": request.exclude_patterns,
"file_extensions": request.file_extensions,
}
root_folder_id = request.root_folder_id
if root_folder_id:
existing = (
await session.execute(select(Folder).where(Folder.id == root_folder_id))
).scalar_one_or_none()
if not existing:
root_folder_id = None
else:
existing.folder_metadata = watched_metadata
await session.commit()
if not root_folder_id:
root_folder = Folder(
name=request.folder_name,
search_space_id=request.search_space_id,
created_by_id=str(user.id),
position="a0",
folder_metadata=watched_metadata,
)
session.add(root_folder)
await session.flush()
root_folder_id = root_folder.id
await session.commit()
from app.tasks.celery_tasks.document_tasks import index_local_folder_task
index_local_folder_task.delay(
search_space_id=request.search_space_id,
user_id=str(user.id),
folder_path=request.folder_path,
folder_name=request.folder_name,
exclude_patterns=request.exclude_patterns,
file_extensions=request.file_extensions,
root_folder_id=root_folder_id,
enable_summary=request.enable_summary,
)
return {
"message": "Folder indexing started",
"status": "processing",
"root_folder_id": root_folder_id,
}
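# Example request body for POST /documents/folder-index, mirroring the
# FolderIndexRequest schema above. Paths and patterns are illustrative;
# root_folder_id is omitted on the first scan and persisted by the caller
# from the response for subsequent re-scans of the same folder.
example_folder_index_payload = {
    "folder_path": "/home/me/notes",
    "folder_name": "notes",
    "search_space_id": 1,
    "exclude_patterns": ["node_modules", ".git"],
    "file_extensions": [".md", ".txt", ".pdf"],
    "enable_summary": False,
}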
@router.post("/documents/folder-index-files")
async def folder_index_files(
request: FolderIndexFilesRequest,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""Index multiple files within a watched folder (batched chokidar trigger).
Validates that all target_file_paths are under folder_path.
Dispatches a single Celery task that processes them in parallel.
"""
from app.config import config as app_config
if not app_config.is_self_hosted():
raise HTTPException(
status_code=400,
detail="Local folder indexing is only available in self-hosted mode",
)
if not request.target_file_paths:
raise HTTPException(
status_code=400, detail="target_file_paths must not be empty"
)
await check_permission(
session,
user,
request.search_space_id,
Permission.DOCUMENTS_CREATE.value,
"You don't have permission to create documents in this search space",
)
from pathlib import Path
for fp in request.target_file_paths:
try:
Path(fp).relative_to(request.folder_path)
except ValueError as err:
raise HTTPException(
status_code=400,
detail=f"target_file_path {fp} must be inside folder_path",
) from err
from app.tasks.celery_tasks.document_tasks import index_local_folder_task
index_local_folder_task.delay(
search_space_id=request.search_space_id,
user_id=str(user.id),
folder_path=request.folder_path,
folder_name=request.folder_name,
target_file_paths=request.target_file_paths,
root_folder_id=request.root_folder_id,
enable_summary=request.enable_summary,
)
return {
"message": f"Batch indexing started for {len(request.target_file_paths)} file(s)",
"status": "processing",
"file_count": len(request.target_file_paths),
}
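# Standalone illustration of the Path.relative_to containment check used in
# folder_index_files: paths inside folder_path pass, anything outside raises
# ValueError and is rejected with a 400. The example paths are hypothetical.
from pathlib import Path

folder = "/home/me/notes"
Path("/home/me/notes/projects/plan.md").relative_to(folder)  # inside -> ok
try:
    Path("/etc/passwd").relative_to(folder)  # outside -> ValueError
except ValueError:
    pass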

View file

@ -15,11 +15,10 @@ import pypandoc
import typst
from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import StreamingResponse
from sqlalchemy import select
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload
from app.db import Document, DocumentType, Permission, User, get_async_session
from app.db import Chunk, Document, DocumentType, Permission, User, get_async_session
from app.routes.reports_routes import (
_FILE_EXTENSIONS,
_MEDIA_TYPES,
@ -44,6 +43,9 @@ router = APIRouter()
async def get_editor_content(
search_space_id: int,
document_id: int,
max_length: int | None = Query(
None, description="Truncate source_markdown to this many characters"
),
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
@ -65,9 +67,7 @@ async def get_editor_content(
)
result = await session.execute(
select(Document)
.options(selectinload(Document.chunks))
.filter(
select(Document).filter(
Document.id == document_id,
Document.search_space_id == search_space_id,
)
@ -77,80 +77,152 @@ async def get_editor_content(
if not document:
raise HTTPException(status_code=404, detail="Document not found")
# Priority 1: Return source_markdown if it exists (check `is not None` to allow empty strings)
if document.source_markdown is not None:
count_result = await session.execute(
select(func.count()).select_from(Chunk).filter(Chunk.document_id == document_id)
)
chunk_count = count_result.scalar() or 0
def _build_response(md: str) -> dict:
size_bytes = len(md.encode("utf-8"))
truncated = False
output_md = md
if max_length is not None and len(md) > max_length:
output_md = md[:max_length]
truncated = True
return {
"document_id": document.id,
"title": document.title,
"document_type": document.document_type.value,
"source_markdown": document.source_markdown,
"source_markdown": output_md,
"content_size_bytes": size_bytes,
"chunk_count": chunk_count,
"truncated": truncated,
"updated_at": document.updated_at.isoformat()
if document.updated_at
else None,
}
# Priority 2: Lazy-migrate from blocknote_document (pure Python, no external deps)
if document.source_markdown is not None:
return _build_response(document.source_markdown)
if document.blocknote_document:
from app.utils.blocknote_to_markdown import blocknote_to_markdown
markdown = blocknote_to_markdown(document.blocknote_document)
if markdown:
# Persist the migration so we don't repeat it
document.source_markdown = markdown
await session.commit()
return {
"document_id": document.id,
"title": document.title,
"document_type": document.document_type.value,
"source_markdown": markdown,
"updated_at": document.updated_at.isoformat()
if document.updated_at
else None,
}
return _build_response(markdown)
# Priority 3: For NOTE type with no content, return empty markdown
if document.document_type == DocumentType.NOTE:
empty_markdown = ""
document.source_markdown = empty_markdown
await session.commit()
return {
"document_id": document.id,
"title": document.title,
"document_type": document.document_type.value,
"source_markdown": empty_markdown,
"updated_at": document.updated_at.isoformat()
if document.updated_at
else None,
}
return _build_response(empty_markdown)
# Priority 4: Reconstruct from chunks
chunks = sorted(document.chunks, key=lambda c: c.id)
chunk_contents_result = await session.execute(
select(Chunk.content)
.filter(Chunk.document_id == document_id)
.order_by(Chunk.id)
)
chunk_contents = chunk_contents_result.scalars().all()
if not chunks:
if not chunk_contents:
doc_status = document.status or {}
state = (
doc_status.get("state", "ready")
if isinstance(doc_status, dict)
else "ready"
)
if state in ("pending", "processing"):
raise HTTPException(
status_code=409,
detail="This document is still being processed. Please wait a moment and try again.",
)
raise HTTPException(
status_code=400,
detail="This document has no content and cannot be edited. Please re-upload to enable editing.",
detail="This document has no viewable content yet. It may still be syncing. Try again in a few seconds, or re-upload if the issue persists.",
)
markdown_content = "\n\n".join(chunk.content for chunk in chunks)
markdown_content = "\n\n".join(chunk_contents)
if not markdown_content.strip():
raise HTTPException(
status_code=400,
detail="This document has empty content and cannot be edited.",
detail="This document appears to be empty. Try re-uploading or editing it to add content.",
)
# Persist the lazy migration
document.source_markdown = markdown_content
await session.commit()
return {
"document_id": document.id,
"title": document.title,
"document_type": document.document_type.value,
"source_markdown": markdown_content,
"updated_at": document.updated_at.isoformat() if document.updated_at else None,
}
return _build_response(markdown_content)
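# Minimal sketch (local names only) of the truncation contract implemented by
# _build_response above: the client receives the truncated markdown plus the
# full byte size and a flag, so it can decide whether to fetch the remainder,
# e.g. via the paginated chunks endpoint.
def truncate_markdown(md: str, max_length: int | None) -> dict:
    size_bytes = len(md.encode("utf-8"))
    if max_length is not None and len(md) > max_length:
        return {"source_markdown": md[:max_length], "content_size_bytes": size_bytes, "truncated": True}
    return {"source_markdown": md, "content_size_bytes": size_bytes, "truncated": False}

assert truncate_markdown("x" * 10, 4) == {"source_markdown": "xxxx", "content_size_bytes": 10, "truncated": True}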
@router.get(
"/search-spaces/{search_space_id}/documents/{document_id}/download-markdown"
)
async def download_document_markdown(
search_space_id: int,
document_id: int,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""
Download the full document content as a .md file.
Reconstructs markdown from source_markdown or chunks.
"""
await check_permission(
session,
user,
search_space_id,
Permission.DOCUMENTS_READ.value,
"You don't have permission to read documents in this search space",
)
result = await session.execute(
select(Document).filter(
Document.id == document_id,
Document.search_space_id == search_space_id,
)
)
document = result.scalars().first()
if not document:
raise HTTPException(status_code=404, detail="Document not found")
markdown: str | None = document.source_markdown
if markdown is None and document.blocknote_document:
from app.utils.blocknote_to_markdown import blocknote_to_markdown
markdown = blocknote_to_markdown(document.blocknote_document)
if markdown is None:
chunk_contents_result = await session.execute(
select(Chunk.content)
.filter(Chunk.document_id == document_id)
.order_by(Chunk.id)
)
chunk_contents = chunk_contents_result.scalars().all()
if chunk_contents:
markdown = "\n\n".join(chunk_contents)
if not markdown or not markdown.strip():
raise HTTPException(
status_code=400, detail="Document has no content to download"
)
safe_title = (
"".join(
c if c.isalnum() or c in " -_" else "_"
for c in (document.title or "document")
).strip()[:80]
or "document"
)
return StreamingResponse(
io.BytesIO(markdown.encode("utf-8")),
media_type="text/markdown; charset=utf-8",
headers={"Content-Disposition": f'attachment; filename="{safe_title}.md"'},
)
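# Worked example of the safe_title sanitisation above: characters other than
# alphanumerics, space, dash and underscore become underscores, the result is
# capped at 80 characters, and an empty title falls back to "document".
def sanitise_title(title: str | None) -> str:
    return (
        "".join(c if c.isalnum() or c in " -_" else "_" for c in (title or "document")).strip()[:80]
        or "document"
    )

assert sanitise_title("Q3 report: revenue/forecast") == "Q3 report_ revenue_forecast"
assert sanitise_title(None) == "document"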
@router.post("/search-spaces/{search_space_id}/documents/{document_id}/save")
@ -258,9 +330,7 @@ async def export_document(
)
result = await session.execute(
select(Document)
.options(selectinload(Document.chunks))
.filter(
select(Document).filter(
Document.id == document_id,
Document.search_space_id == search_space_id,
)
@ -269,16 +339,20 @@ async def export_document(
if not document:
raise HTTPException(status_code=404, detail="Document not found")
# Resolve markdown content (same priority as editor-content endpoint)
markdown_content: str | None = document.source_markdown
if markdown_content is None and document.blocknote_document:
from app.utils.blocknote_to_markdown import blocknote_to_markdown
markdown_content = blocknote_to_markdown(document.blocknote_document)
if markdown_content is None:
chunks = sorted(document.chunks, key=lambda c: c.id)
if chunks:
markdown_content = "\n\n".join(chunk.content for chunk in chunks)
chunk_contents_result = await session.execute(
select(Chunk.content)
.filter(Chunk.document_id == document_id)
.order_by(Chunk.id)
)
chunk_contents = chunk_contents_result.scalars().all()
if chunk_contents:
markdown_content = "\n\n".join(chunk_contents)
if not markdown_content or not markdown_content.strip():
raise HTTPException(status_code=400, detail="Document has no content to export")

View file

@ -192,6 +192,33 @@ async def get_folder_breadcrumb(
) from e
@router.patch("/folders/{folder_id}/watched")
async def stop_watching_folder(
folder_id: int,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""Clear the watched flag from a folder's metadata."""
folder = await session.get(Folder, folder_id)
if not folder:
raise HTTPException(status_code=404, detail="Folder not found")
await check_permission(
session,
user,
folder.search_space_id,
Permission.DOCUMENTS_UPDATE.value,
"You don't have permission to update folders in this search space",
)
if folder.folder_metadata and isinstance(folder.folder_metadata, dict):
updated = {**folder.folder_metadata, "watched": False}
folder.folder_metadata = updated
await session.commit()
return {"message": "Folder watch status updated"}
@router.put("/folders/{folder_id}", response_model=FolderRead)
async def update_folder(
folder_id: int,
@ -340,7 +367,7 @@ async def delete_folder(
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user),
):
"""Delete a folder and cascade-delete subfolders. Documents are async-deleted via Celery."""
"""Mark documents for deletion and dispatch Celery to delete docs first, then folders."""
try:
folder = await session.get(Folder, folder_id)
if not folder:
@ -372,30 +399,29 @@ async def delete_folder(
)
await session.commit()
await session.execute(Folder.__table__.delete().where(Folder.id == folder_id))
await session.commit()
try:
from app.tasks.celery_tasks.document_tasks import (
delete_folder_documents_task,
)
if document_ids:
try:
from app.tasks.celery_tasks.document_tasks import (
delete_folder_documents_task,
)
delete_folder_documents_task.delay(document_ids)
except Exception as err:
delete_folder_documents_task.delay(
document_ids, folder_subtree_ids=list(subtree_ids)
)
except Exception as err:
if document_ids:
await session.execute(
Document.__table__.update()
.where(Document.id.in_(document_ids))
.values(status={"state": "ready"})
)
await session.commit()
raise HTTPException(
status_code=503,
detail="Folder deleted but document cleanup could not be queued. Documents have been restored.",
) from err
raise HTTPException(
status_code=503,
detail="Could not queue folder deletion. Documents have been restored.",
) from err
return {
"message": "Folder deleted successfully",
"message": "Folder deletion started",
"documents_queued_for_deletion": len(document_ids),
}

View file

@ -1,5 +1,5 @@
"""
API route for fetching the available LLM models catalogue.
API route for fetching the available models catalogue.
Serves a dynamically-updated list sourced from the OpenRouter public API,
with a local JSON fallback when the API is unreachable.
@ -30,7 +30,7 @@ async def list_available_models(
user: User = Depends(current_active_user),
):
"""
Return all available LLM models grouped by provider.
Return all available models grouped by provider.
The list is sourced from the OpenRouter public API and cached for 1 hour.
If the API is unreachable, a local fallback file is used instead.

View file

@ -1,7 +1,7 @@
"""
API routes for NewLLMConfig CRUD operations.
NewLLMConfig combines LLM model settings with prompt configuration:
NewLLMConfig combines model settings with prompt configuration:
- LLM provider, model, API key, etc.
- Configurable system instructions
- Citation toggle

View file

@ -55,23 +55,12 @@ from app.schemas import (
)
from app.services.composio_service import ComposioService, get_composio_service
from app.services.notification_service import NotificationService
from app.tasks.connector_indexers import (
index_airtable_records,
index_clickup_tasks,
index_confluence_pages,
index_crawled_urls,
index_discord_messages,
index_elasticsearch_documents,
index_github_repos,
index_google_calendar_events,
index_google_gmail_messages,
index_jira_issues,
index_linear_issues,
index_luma_events,
index_notion_pages,
index_slack_messages,
)
from app.users import current_active_user
# NOTE: connector indexer functions are imported lazily inside each
# ``run_*_indexing`` helper to break a circular import cycle:
# connector_indexers.__init__ → airtable_indexer → airtable_history
# → app.routes.__init__ → this file → connector_indexers (not ready yet)
from app.utils.connector_naming import ensure_unique_connector_name
from app.utils.indexing_locks import (
acquire_connector_indexing_lock,
@ -1378,6 +1367,8 @@ async def run_slack_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_slack_messages
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -1824,6 +1815,8 @@ async def run_notion_indexing_with_new_session(
Create a new session and run the Notion indexing task.
This prevents session leaks by creating a dedicated session for the background task.
"""
from app.tasks.connector_indexers import index_notion_pages
async with async_session_maker() as session:
await _run_indexing_with_notifications(
session=session,
@ -1858,6 +1851,8 @@ async def run_notion_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_notion_pages
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -1910,6 +1905,8 @@ async def run_github_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_github_repos
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -1961,6 +1958,8 @@ async def run_linear_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_linear_issues
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -2011,6 +2010,8 @@ async def run_discord_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_discord_messages
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -2113,6 +2114,8 @@ async def run_jira_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_jira_issues
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -2166,6 +2169,8 @@ async def run_confluence_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_confluence_pages
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -2217,6 +2222,8 @@ async def run_clickup_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_clickup_tasks
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -2268,6 +2275,8 @@ async def run_airtable_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_airtable_records
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -2321,6 +2330,8 @@ async def run_google_calendar_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_google_calendar_events
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -2370,6 +2381,7 @@ async def run_google_gmail_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_google_gmail_messages
# Create a wrapper function that calls index_google_gmail_messages with max_messages
async def gmail_indexing_wrapper(
@ -2836,6 +2848,8 @@ async def run_luma_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_luma_events
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -2888,6 +2902,8 @@ async def run_elasticsearch_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_elasticsearch_documents
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,
@ -2938,6 +2954,8 @@ async def run_web_page_indexing(
start_date: Start date for indexing
end_date: End date for indexing
"""
from app.tasks.connector_indexers import index_crawled_urls
await _run_indexing_with_notifications(
session=session,
connector_id=connector_id,

View file

@ -53,25 +53,26 @@ class DocumentRead(BaseModel):
title: str
document_type: DocumentType
document_metadata: dict
content: str # Changed to string to match frontend
content: str = ""
content_preview: str = ""
content_hash: str
unique_identifier_hash: str | None
created_at: datetime
updated_at: datetime | None
search_space_id: int
folder_id: int | None = None
created_by_id: UUID | None = None # User who created/uploaded this document
created_by_id: UUID | None = None
created_by_name: str | None = None
created_by_email: str | None = None
status: DocumentStatusSchema | None = (
None # Processing status (ready, processing, failed)
)
status: DocumentStatusSchema | None = None
model_config = ConfigDict(from_attributes=True)
class DocumentWithChunksRead(DocumentRead):
chunks: list[ChunkRead] = []
total_chunks: int = 0
chunk_start_index: int = 0
model_config = ConfigDict(from_attributes=True)

View file

@ -1,6 +1,7 @@
"""Pydantic schemas for folder CRUD, move, and reorder operations."""
from datetime import datetime
from typing import Any
from uuid import UUID
from pydantic import BaseModel, ConfigDict, Field
@ -34,6 +35,9 @@ class FolderRead(BaseModel):
created_by_id: UUID | None
created_at: datetime
updated_at: datetime
metadata: dict[str, Any] | None = Field(
default=None, validation_alias="folder_metadata"
)
model_config = ConfigDict(from_attributes=True)

View file

@ -1,7 +1,7 @@
"""
Pydantic schemas for the NewLLMConfig API.
NewLLMConfig combines LLM model settings with prompt configuration:
NewLLMConfig combines model settings with prompt configuration:
- LLM provider, model, API key, etc.
- Configurable system instructions
- Citation toggle
@ -26,7 +26,7 @@ class NewLLMConfigBase(BaseModel):
None, max_length=500, description="Optional description"
)
# LLM Model Configuration
# Model Configuration
provider: LiteLLMProvider = Field(..., description="LiteLLM provider type")
custom_provider: str | None = Field(
None, max_length=100, description="Custom provider name when provider is CUSTOM"
@ -71,7 +71,7 @@ class NewLLMConfigUpdate(BaseModel):
name: str | None = Field(None, max_length=100)
description: str | None = Field(None, max_length=500)
# LLM Model Configuration
# Model Configuration
provider: LiteLLMProvider | None = None
custom_provider: str | None = Field(None, max_length=100)
model_name: str | None = Field(None, max_length=100)
@ -106,7 +106,7 @@ class NewLLMConfigPublic(BaseModel):
name: str
description: str | None = None
# LLM Model Configuration (no api_key)
# Model Configuration (no api_key)
provider: LiteLLMProvider
custom_provider: str | None = None
model_name: str
@ -149,7 +149,7 @@ class GlobalNewLLMConfigRead(BaseModel):
name: str
description: str | None = None
# LLM Model Configuration (no api_key)
# Model Configuration (no api_key)
provider: str # String because YAML doesn't enforce enum, "AUTO" for Auto mode
custom_provider: str | None = None
model_name: str

View file

@ -1,5 +1,5 @@
"""
Service for fetching and caching the available LLM model list.
Service for fetching and caching the available model list.
Uses the OpenRouter public API as the primary source, with a local
fallback JSON file when the API is unreachable.

View file

@ -1,6 +1,7 @@
"""Celery tasks for document processing."""
import asyncio
import contextlib
import logging
import os
from uuid import UUID
@ -10,6 +11,7 @@ from app.config import config
from app.services.notification_service import NotificationService
from app.services.task_logging_service import TaskLoggingService
from app.tasks.celery_tasks import get_celery_session_maker
from app.tasks.connector_indexers.local_folder_indexer import index_local_folder
from app.tasks.document_processors import (
add_extension_received_document,
add_youtube_video_document,
@ -141,21 +143,30 @@ async def _delete_document_background(document_id: int) -> None:
retry_backoff_max=300,
max_retries=5,
)
def delete_folder_documents_task(self, document_ids: list[int]):
"""Celery task to batch-delete documents orphaned by folder deletion."""
def delete_folder_documents_task(
self,
document_ids: list[int],
folder_subtree_ids: list[int] | None = None,
):
"""Celery task to delete documents first, then the folder rows."""
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
loop.run_until_complete(_delete_folder_documents(document_ids))
loop.run_until_complete(
_delete_folder_documents(document_ids, folder_subtree_ids)
)
finally:
loop.close()
async def _delete_folder_documents(document_ids: list[int]) -> None:
"""Delete chunks in batches, then document rows for each orphaned document."""
async def _delete_folder_documents(
document_ids: list[int],
folder_subtree_ids: list[int] | None = None,
) -> None:
"""Delete chunks in batches, then document rows, then folder rows."""
from sqlalchemy import delete as sa_delete, select
from app.db import Chunk, Document
from app.db import Chunk, Document, Folder
async with get_celery_session_maker()() as session:
batch_size = 500
@ -177,6 +188,12 @@ async def _delete_folder_documents(document_ids: list[int]) -> None:
await session.delete(doc)
await session.commit()
if folder_subtree_ids:
await session.execute(
sa_delete(Folder).where(Folder.id.in_(folder_subtree_ids))
)
await session.commit()
@celery_app.task(
name="delete_search_space_background",
@ -1243,3 +1260,154 @@ async def _process_circleback_meeting(
heartbeat_task.cancel()
if notification:
_stop_heartbeat(notification.id)
# ===== Local folder indexing task =====
@celery_app.task(name="index_local_folder", bind=True)
def index_local_folder_task(
self,
search_space_id: int,
user_id: str,
folder_path: str,
folder_name: str,
exclude_patterns: list[str] | None = None,
file_extensions: list[str] | None = None,
root_folder_id: int | None = None,
enable_summary: bool = False,
target_file_paths: list[str] | None = None,
):
"""Celery task to index a local folder. Config is passed directly — no connector row."""
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
loop.run_until_complete(
_index_local_folder_async(
search_space_id=search_space_id,
user_id=user_id,
folder_path=folder_path,
folder_name=folder_name,
exclude_patterns=exclude_patterns,
file_extensions=file_extensions,
root_folder_id=root_folder_id,
enable_summary=enable_summary,
target_file_paths=target_file_paths,
)
)
finally:
loop.close()
async def _index_local_folder_async(
search_space_id: int,
user_id: str,
folder_path: str,
folder_name: str,
exclude_patterns: list[str] | None = None,
file_extensions: list[str] | None = None,
root_folder_id: int | None = None,
enable_summary: bool = False,
target_file_paths: list[str] | None = None,
):
"""Run local folder indexing with notification + heartbeat."""
is_batch = bool(target_file_paths)
is_full_scan = not target_file_paths
file_count = len(target_file_paths) if target_file_paths else None
if is_batch:
doc_name = f"{folder_name} ({file_count} file{'s' if file_count != 1 else ''})"
else:
doc_name = folder_name
notification = None
notification_id: int | None = None
heartbeat_task = None
async with get_celery_session_maker()() as session:
try:
notification = (
await NotificationService.document_processing.notify_processing_started(
session=session,
user_id=UUID(user_id),
document_type="LOCAL_FOLDER_FILE",
document_name=doc_name,
search_space_id=search_space_id,
)
)
notification_id = notification.id
_start_heartbeat(notification_id)
heartbeat_task = asyncio.create_task(_run_heartbeat_loop(notification_id))
except Exception:
logger.warning(
"Failed to create notification for local folder indexing",
exc_info=True,
)
async def _heartbeat_progress(completed_count: int) -> None:
"""Refresh heartbeat and optionally update notification progress."""
if notification:
with contextlib.suppress(Exception):
await NotificationService.document_processing.notify_processing_progress(
session=session,
notification=notification,
stage="indexing",
stage_message=f"Syncing files ({completed_count}/{file_count or '?'})",
)
try:
_indexed, _skipped_or_failed, _rfid, err = await index_local_folder(
session=session,
search_space_id=search_space_id,
user_id=user_id,
folder_path=folder_path,
folder_name=folder_name,
exclude_patterns=exclude_patterns,
file_extensions=file_extensions,
root_folder_id=root_folder_id,
enable_summary=enable_summary,
target_file_paths=target_file_paths,
on_heartbeat_callback=_heartbeat_progress
if (is_batch or is_full_scan)
else None,
)
if notification:
try:
await session.refresh(notification)
if err:
await NotificationService.document_processing.notify_processing_completed(
session=session,
notification=notification,
error_message=err,
)
else:
await NotificationService.document_processing.notify_processing_completed(
session=session,
notification=notification,
)
except Exception:
logger.warning(
"Failed to update notification after local folder indexing",
exc_info=True,
)
except Exception as e:
logger.exception(f"Local folder indexing failed: {e}")
if notification:
try:
await session.refresh(notification)
await NotificationService.document_processing.notify_processing_completed(
session=session,
notification=notification,
error_message=str(e)[:200],
)
except Exception:
pass
raise
finally:
if heartbeat_task:
heartbeat_task.cancel()
if notification_id is not None:
_stop_heartbeat(notification_id)

View file

@ -39,7 +39,6 @@ from app.agents.new_chat.llm_config import (
)
from app.db import (
ChatVisibility,
Document,
NewChatMessage,
NewChatThread,
Report,
@ -63,74 +62,6 @@ _perf_log = get_perf_logger()
_background_tasks: set[asyncio.Task] = set()
def format_mentioned_documents_as_context(documents: list[Document]) -> str:
"""
Format mentioned documents as context for the agent.
Uses the same XML structure as knowledge_base.format_documents_for_context
to ensure citations work properly with chunk IDs.
"""
if not documents:
return ""
context_parts = ["<mentioned_documents>"]
context_parts.append(
"The user has explicitly mentioned the following documents from their knowledge base. "
"These documents are directly relevant to the query and should be prioritized as primary sources. "
"Use [citation:CHUNK_ID] format for citations (e.g., [citation:123])."
)
context_parts.append("")
for doc in documents:
# Build metadata JSON
metadata = doc.document_metadata or {}
metadata_json = json.dumps(metadata, ensure_ascii=False)
# Get URL from metadata
url = (
metadata.get("url")
or metadata.get("source")
or metadata.get("page_url")
or ""
)
context_parts.append("<document>")
context_parts.append("<document_metadata>")
context_parts.append(f" <document_id>{doc.id}</document_id>")
context_parts.append(
f" <document_type>{doc.document_type.value}</document_type>"
)
context_parts.append(f" <title><![CDATA[{doc.title}]]></title>")
context_parts.append(f" <url><![CDATA[{url}]]></url>")
context_parts.append(
f" <metadata_json><![CDATA[{metadata_json}]]></metadata_json>"
)
context_parts.append("</document_metadata>")
context_parts.append("")
context_parts.append("<document_content>")
# Use chunks if available (preferred for proper citations)
if hasattr(doc, "chunks") and doc.chunks:
for chunk in doc.chunks:
context_parts.append(
f" <chunk id='{chunk.id}'><![CDATA[{chunk.content}]]></chunk>"
)
else:
# Fallback to document content if chunks not loaded
# Use document ID as chunk ID prefix for consistency
context_parts.append(
f" <chunk id='{doc.id}'><![CDATA[{doc.content}]]></chunk>"
)
context_parts.append("</document_content>")
context_parts.append("</document>")
context_parts.append("")
context_parts.append("</mentioned_documents>")
return "\n".join(context_parts)
def format_mentioned_surfsense_docs_as_context(
documents: list[SurfsenseDocsDocument],
) -> str:
@ -1317,6 +1248,7 @@ async def stream_new_chat(
firecrawl_api_key=firecrawl_api_key,
thread_visibility=visibility,
disabled_tools=disabled_tools,
mentioned_document_ids=mentioned_document_ids,
)
_perf_log.info(
"[stream_new_chat] Agent created in %.3fs", time.perf_counter() - _t0
@ -1340,18 +1272,9 @@ async def stream_new_chat(
thread.needs_history_bootstrap = False
await session.commit()
# Fetch mentioned documents if any (with chunks for proper citations)
mentioned_documents: list[Document] = []
if mentioned_document_ids:
result = await session.execute(
select(Document)
.options(selectinload(Document.chunks))
.filter(
Document.id.in_(mentioned_document_ids),
Document.search_space_id == search_space_id,
)
)
mentioned_documents = list(result.scalars().all())
# Mentioned KB documents are now handled by KnowledgeBaseSearchMiddleware
# which merges them into the scoped filesystem with full document
# structure. Only SurfSense docs and report context are inlined here.
# Fetch mentioned SurfSense docs if any
mentioned_surfsense_docs: list[SurfsenseDocsDocument] = []
@ -1379,15 +1302,10 @@ async def stream_new_chat(
)
recent_reports = list(recent_reports_result.scalars().all())
# Format the user query with context (mentioned documents + SurfSense docs)
# Format the user query with context (SurfSense docs + reports only)
final_query = user_query
context_parts = []
if mentioned_documents:
context_parts.append(
format_mentioned_documents_as_context(mentioned_documents)
)
if mentioned_surfsense_docs:
context_parts.append(
format_mentioned_surfsense_docs_as_context(mentioned_surfsense_docs)
@ -1479,7 +1397,7 @@ async def stream_new_chat(
yield streaming_service.format_start_step()
# Initial thinking step - analyzing the request
if mentioned_documents or mentioned_surfsense_docs:
if mentioned_surfsense_docs:
initial_title = "Analyzing referenced content"
action_verb = "Analyzing"
else:
@ -1490,18 +1408,6 @@ async def stream_new_chat(
query_text = user_query[:80] + ("..." if len(user_query) > 80 else "")
processing_parts.append(query_text)
if mentioned_documents:
doc_names = []
for doc in mentioned_documents:
title = doc.title
if len(title) > 30:
title = title[:27] + "..."
doc_names.append(title)
if len(doc_names) == 1:
processing_parts.append(f"[{doc_names[0]}]")
else:
processing_parts.append(f"[{len(doc_names)} documents]")
if mentioned_surfsense_docs:
doc_names = []
for doc in mentioned_surfsense_docs:
@ -1527,7 +1433,7 @@ async def stream_new_chat(
# These ORM objects (with eagerly-loaded chunks) can be very large.
# They're only needed to build context strings already copied into
# final_query / langchain_messages — release them before streaming.
del mentioned_documents, mentioned_surfsense_docs, recent_reports
del mentioned_surfsense_docs, recent_reports
del langchain_messages, final_query
# Check if this is the first assistant response so we can generate

View file

@ -42,9 +42,9 @@ from .jira_indexer import index_jira_issues
# Issue tracking and project management
from .linear_indexer import index_linear_issues
from .luma_indexer import index_luma_events
# Documentation and knowledge management
from .luma_indexer import index_luma_events
from .notion_indexer import index_notion_pages
from .obsidian_indexer import index_obsidian_vault
from .slack_indexer import index_slack_messages

File diff suppressed because it is too large

View file

@ -12,16 +12,14 @@ Available processors:
- YouTube processor: Process YouTube videos and extract transcripts
"""
# URL crawler
# Extension processor
from .extension_processor import add_extension_received_document
# File processors
from .file_processors import (
# File processors (backward-compatible re-exports from _save)
from ._save import (
add_received_file_document_using_docling,
add_received_file_document_using_llamacloud,
add_received_file_document_using_unstructured,
)
from .extension_processor import add_extension_received_document
# Markdown processor
from .markdown_processor import add_received_markdown_file_document
@ -32,9 +30,9 @@ from .youtube_processor import add_youtube_video_document
__all__ = [
# Extension processing
"add_extension_received_document",
# File processing with different ETL services
"add_received_file_document_using_docling",
"add_received_file_document_using_llamacloud",
# File processing with different ETL services
"add_received_file_document_using_unstructured",
# Markdown file processing
"add_received_markdown_file_document",

View file

@ -0,0 +1,74 @@
"""
Constants for file document processing.
Centralizes file type classification, LlamaCloud retry configuration,
and timeout calculation parameters.
"""
import ssl
from enum import Enum
import httpx
# ---------------------------------------------------------------------------
# File type classification
# ---------------------------------------------------------------------------
MARKDOWN_EXTENSIONS = (".md", ".markdown", ".txt")
AUDIO_EXTENSIONS = (".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm")
DIRECT_CONVERT_EXTENSIONS = (".csv", ".tsv", ".html", ".htm")
class FileCategory(Enum):
MARKDOWN = "markdown"
AUDIO = "audio"
DIRECT_CONVERT = "direct_convert"
DOCUMENT = "document"
def classify_file(filename: str) -> FileCategory:
"""Classify a file by its extension into a processing category."""
lower = filename.lower()
if lower.endswith(MARKDOWN_EXTENSIONS):
return FileCategory.MARKDOWN
if lower.endswith(AUDIO_EXTENSIONS):
return FileCategory.AUDIO
if lower.endswith(DIRECT_CONVERT_EXTENSIONS):
return FileCategory.DIRECT_CONVERT
return FileCategory.DOCUMENT
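# Example usage (filenames are illustrative); classification is case-insensitive
# and anything unrecognised falls through to the DOCUMENT category for ETL parsing.
assert classify_file("notes.MD") is FileCategory.MARKDOWN
assert classify_file("standup.m4a") is FileCategory.AUDIO
assert classify_file("export.tsv") is FileCategory.DIRECT_CONVERT
assert classify_file("contract.pdf") is FileCategory.DOCUMENT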
# ---------------------------------------------------------------------------
# LlamaCloud retry configuration
# ---------------------------------------------------------------------------
LLAMACLOUD_MAX_RETRIES = 5
LLAMACLOUD_BASE_DELAY = 10 # seconds (exponential backoff base)
LLAMACLOUD_MAX_DELAY = 120 # max delay between retries (2 minutes)
LLAMACLOUD_RETRYABLE_EXCEPTIONS = (
ssl.SSLError,
httpx.ConnectError,
httpx.ConnectTimeout,
httpx.ReadError,
httpx.ReadTimeout,
httpx.WriteError,
httpx.WriteTimeout,
httpx.RemoteProtocolError,
httpx.LocalProtocolError,
ConnectionError,
ConnectionResetError,
TimeoutError,
OSError,
)
# ---------------------------------------------------------------------------
# Timeout calculation constants
# ---------------------------------------------------------------------------
UPLOAD_BYTES_PER_SECOND_SLOW = (
100 * 1024
) # 100 KB/s (conservative for slow connections)
MIN_UPLOAD_TIMEOUT = 120 # Minimum 2 minutes for any file
MAX_UPLOAD_TIMEOUT = 1800 # Maximum 30 minutes for very large files
BASE_JOB_TIMEOUT = 600 # 10 minutes base for job processing
PER_PAGE_JOB_TIMEOUT = 60 # 1 minute per page for processing
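# The helpers that consume these constants live in ``_helpers`` and are not part
# of this diff; the sketch below is only a plausible illustration of how they
# might combine them: upload time bounded by a conservative throughput estimate
# and clamped to the limits above, job time growing with the page count.
def calculate_upload_timeout_sketch(file_size_bytes: int) -> float:
    estimated = file_size_bytes / UPLOAD_BYTES_PER_SECOND_SLOW
    return min(max(estimated, MIN_UPLOAD_TIMEOUT), MAX_UPLOAD_TIMEOUT)

def calculate_job_timeout_sketch(estimated_pages: int) -> float:
    return BASE_JOB_TIMEOUT + estimated_pages * PER_PAGE_JOB_TIMEOUT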

View file

@ -0,0 +1,90 @@
"""
Lossless file-to-markdown converters for text-based formats.
These converters handle file types that can be faithfully represented as
markdown without any external ETL/OCR service:
- CSV / TSV markdown table (stdlib ``csv``)
- HTML / HTM markdown (``markdownify``)
"""
from __future__ import annotations
import csv
from collections.abc import Callable
from pathlib import Path
from markdownify import markdownify
# The stdlib csv module defaults to a 128 KB field-size limit which is too
# small for real-world exports (e.g. chat logs, CRM dumps). We raise it once
# at import time so every csv.reader call in this module can handle large fields.
csv.field_size_limit(2**31 - 1)
def _escape_pipe(cell: str) -> str:
"""Escape literal pipe characters inside a markdown table cell."""
return cell.replace("|", "\\|")
def csv_to_markdown(file_path: str, *, delimiter: str = ",") -> str:
"""Convert a CSV (or TSV) file to a markdown table.
The first row is treated as the header. An empty file returns an
empty string so the caller can decide how to handle it.
"""
with open(file_path, encoding="utf-8", newline="") as fh:
reader = csv.reader(fh, delimiter=delimiter)
rows = list(reader)
if not rows:
return ""
header, *body = rows
col_count = len(header)
lines: list[str] = []
header_cells = [_escape_pipe(c.strip()) for c in header]
lines.append("| " + " | ".join(header_cells) + " |")
lines.append("| " + " | ".join(["---"] * col_count) + " |")
for row in body:
padded = row + [""] * (col_count - len(row))
cells = [_escape_pipe(c.strip()) for c in padded[:col_count]]
lines.append("| " + " | ".join(cells) + " |")
return "\n".join(lines) + "\n"
def tsv_to_markdown(file_path: str) -> str:
"""Convert a TSV file to a markdown table."""
return csv_to_markdown(file_path, delimiter="\t")
def html_to_markdown(file_path: str) -> str:
"""Convert an HTML file to markdown via ``markdownify``."""
html = Path(file_path).read_text(encoding="utf-8")
return markdownify(html).strip()
_CONVERTER_MAP: dict[str, Callable[..., str]] = {
".csv": csv_to_markdown,
".tsv": tsv_to_markdown,
".html": html_to_markdown,
".htm": html_to_markdown,
}
def convert_file_directly(file_path: str, filename: str) -> str:
"""Dispatch to the appropriate lossless converter based on file extension.
Raises ``ValueError`` if the extension is not supported.
"""
suffix = Path(filename).suffix.lower()
converter = _CONVERTER_MAP.get(suffix)
if converter is None:
raise ValueError(
f"No direct converter for extension '{suffix}' (file: {filename})"
)
return converter(file_path)
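# Example usage: write a tiny CSV to a temporary file and convert it. The file
# name and contents are illustrative; unsupported extensions raise ValueError
# so callers can fall back to a full ETL service instead.
if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf-8") as fh:
        fh.write("name,qty\nwidget,3\ngadget|pro,1\n")
    print(convert_file_directly(fh.name, "inventory.csv"))
    # | name | qty |
    # | --- | --- |
    # | widget | 3 |
    # | gadget\|pro | 1 |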

View file

@ -0,0 +1,209 @@
"""
ETL parsing strategies for different document processing services.
Provides parse functions for Unstructured, LlamaCloud, and Docling, along with
LlamaCloud retry logic and dynamic timeout calculations.
"""
import asyncio
import logging
import os
import random
import warnings
from logging import ERROR, getLogger
import httpx
from app.config import config as app_config
from app.db import Log
from app.services.task_logging_service import TaskLoggingService
from ._constants import (
LLAMACLOUD_BASE_DELAY,
LLAMACLOUD_MAX_DELAY,
LLAMACLOUD_MAX_RETRIES,
LLAMACLOUD_RETRYABLE_EXCEPTIONS,
PER_PAGE_JOB_TIMEOUT,
)
from ._helpers import calculate_job_timeout, calculate_upload_timeout
# ---------------------------------------------------------------------------
# LlamaCloud parsing with retry
# ---------------------------------------------------------------------------
async def parse_with_llamacloud_retry(
file_path: str,
estimated_pages: int,
task_logger: TaskLoggingService | None = None,
log_entry: Log | None = None,
):
"""
Parse a file with LlamaCloud with retry logic for transient SSL/connection errors.
Uses dynamic timeout calculations based on file size and page count to handle
very large files reliably.
Returns:
LlamaParse result object
Raises:
Exception: If all retries fail
"""
from llama_cloud_services import LlamaParse
from llama_cloud_services.parse.utils import ResultType
file_size_bytes = os.path.getsize(file_path)
file_size_mb = file_size_bytes / (1024 * 1024)
upload_timeout = calculate_upload_timeout(file_size_bytes)
job_timeout = calculate_job_timeout(estimated_pages, file_size_bytes)
custom_timeout = httpx.Timeout(
connect=120.0,
read=upload_timeout,
write=upload_timeout,
pool=120.0,
)
logging.info(
f"LlamaCloud upload configured: file_size={file_size_mb:.1f}MB, "
f"pages={estimated_pages}, upload_timeout={upload_timeout:.0f}s, "
f"job_timeout={job_timeout:.0f}s"
)
last_exception = None
attempt_errors: list[str] = []
for attempt in range(1, LLAMACLOUD_MAX_RETRIES + 1):
try:
async with httpx.AsyncClient(timeout=custom_timeout) as custom_client:
parser = LlamaParse(
api_key=app_config.LLAMA_CLOUD_API_KEY,
num_workers=1,
verbose=True,
language="en",
result_type=ResultType.MD,
max_timeout=int(max(2000, job_timeout + upload_timeout)),
job_timeout_in_seconds=job_timeout,
job_timeout_extra_time_per_page_in_seconds=PER_PAGE_JOB_TIMEOUT,
custom_client=custom_client,
)
result = await parser.aparse(file_path)
if attempt > 1:
logging.info(
f"LlamaCloud upload succeeded on attempt {attempt} after "
f"{len(attempt_errors)} failures"
)
return result
except LLAMACLOUD_RETRYABLE_EXCEPTIONS as e:
last_exception = e
error_type = type(e).__name__
error_msg = str(e)[:200]
attempt_errors.append(f"Attempt {attempt}: {error_type} - {error_msg}")
if attempt < LLAMACLOUD_MAX_RETRIES:
base_delay = min(
LLAMACLOUD_BASE_DELAY * (2 ** (attempt - 1)),
LLAMACLOUD_MAX_DELAY,
)
jitter = base_delay * 0.25 * (2 * random.random() - 1)
delay = base_delay + jitter
if task_logger and log_entry:
await task_logger.log_task_progress(
log_entry,
f"LlamaCloud upload failed "
f"(attempt {attempt}/{LLAMACLOUD_MAX_RETRIES}), "
f"retrying in {delay:.0f}s",
{
"error_type": error_type,
"error_message": error_msg,
"attempt": attempt,
"retry_delay": delay,
"file_size_mb": round(file_size_mb, 1),
"upload_timeout": upload_timeout,
},
)
else:
logging.warning(
f"LlamaCloud upload failed "
f"(attempt {attempt}/{LLAMACLOUD_MAX_RETRIES}): "
f"{error_type}. File: {file_size_mb:.1f}MB. "
f"Retrying in {delay:.0f}s..."
)
await asyncio.sleep(delay)
else:
logging.error(
f"LlamaCloud upload failed after {LLAMACLOUD_MAX_RETRIES} "
f"attempts. File size: {file_size_mb:.1f}MB, "
f"Pages: {estimated_pages}. "
f"Errors: {'; '.join(attempt_errors)}"
)
except Exception:
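# Non-retryable errors (anything outside LLAMACLOUD_RETRYABLE_EXCEPTIONS)
# propagate immediately instead of consuming retry attempts.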
raise
raise last_exception or RuntimeError(
f"LlamaCloud parsing failed after {LLAMACLOUD_MAX_RETRIES} retries. "
f"File size: {file_size_mb:.1f}MB"
)
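# Standalone illustration of the backoff schedule used above: exponential growth
# from LLAMACLOUD_BASE_DELAY, capped at LLAMACLOUD_MAX_DELAY, with +/-25% jitter.
# With the current constants the base delays after attempts 1..4 are 10s, 20s,
# 40s and 80s; the final attempt raises instead of sleeping again.
def backoff_delay(attempt: int) -> float:
    base = min(LLAMACLOUD_BASE_DELAY * (2 ** (attempt - 1)), LLAMACLOUD_MAX_DELAY)
    return base + base * 0.25 * (2 * random.random() - 1)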
# ---------------------------------------------------------------------------
# Per-service parse functions
# ---------------------------------------------------------------------------
async def parse_with_unstructured(file_path: str):
"""
Parse a file using the Unstructured ETL service.
Returns:
List of LangChain Document elements.
"""
from langchain_unstructured import UnstructuredLoader
loader = UnstructuredLoader(
file_path,
mode="elements",
post_processors=[],
languages=["eng"],
include_orig_elements=False,
include_metadata=False,
strategy="auto",
)
return await loader.aload()
async def parse_with_docling(file_path: str, filename: str) -> str:
"""
Parse a file using the Docling ETL service (via the Docling service wrapper).
Returns:
Markdown content string.
"""
from app.services.docling_service import create_docling_service
docling_service = create_docling_service()
pdfminer_logger = getLogger("pdfminer")
original_level = pdfminer_logger.level
with warnings.catch_warnings():
warnings.filterwarnings("ignore", category=UserWarning, module="pdfminer")
warnings.filterwarnings(
"ignore", message=".*Cannot set gray non-stroke color.*"
)
warnings.filterwarnings("ignore", message=".*invalid float value.*")
pdfminer_logger.setLevel(ERROR)
try:
result = await docling_service.process_document(file_path, filename)
finally:
pdfminer_logger.setLevel(original_level)
return result["content"]

View file

@ -0,0 +1,218 @@
"""
Document helper functions for deduplication, migration, and connector updates.
Provides reusable logic shared across file processors and ETL strategies.
"""
import logging
from sqlalchemy.ext.asyncio import AsyncSession
from app.db import Document, DocumentStatus, DocumentType
from app.utils.document_converters import generate_unique_identifier_hash
from ._constants import (
BASE_JOB_TIMEOUT,
MAX_UPLOAD_TIMEOUT,
MIN_UPLOAD_TIMEOUT,
PER_PAGE_JOB_TIMEOUT,
UPLOAD_BYTES_PER_SECOND_SLOW,
)
from .base import (
check_document_by_unique_identifier,
check_duplicate_document,
)
# ---------------------------------------------------------------------------
# Unique identifier helpers
# ---------------------------------------------------------------------------
def get_google_drive_unique_identifier(
connector: dict | None,
filename: str,
search_space_id: int,
) -> tuple[str, str | None]:
"""
Get unique identifier hash, using file_id for Google Drive (stable across renames).
Returns:
Tuple of (primary_hash, legacy_hash or None).
For Google Drive: (file_id-based hash, filename-based hash for migration).
For other sources: (filename-based hash, None).
"""
if connector and connector.get("type") == DocumentType.GOOGLE_DRIVE_FILE:
metadata = connector.get("metadata", {})
file_id = metadata.get("google_drive_file_id")
if file_id:
primary_hash = generate_unique_identifier_hash(
DocumentType.GOOGLE_DRIVE_FILE, file_id, search_space_id
)
legacy_hash = generate_unique_identifier_hash(
DocumentType.GOOGLE_DRIVE_FILE, filename, search_space_id
)
return primary_hash, legacy_hash
primary_hash = generate_unique_identifier_hash(
DocumentType.FILE, filename, search_space_id
)
return primary_hash, None
# ---------------------------------------------------------------------------
# Document deduplication and migration
# ---------------------------------------------------------------------------
async def handle_existing_document_update(
session: AsyncSession,
existing_document: Document,
content_hash: str,
connector: dict | None,
filename: str,
primary_hash: str,
) -> tuple[bool, Document | None]:
"""
Handle update logic for an existing document.
Returns:
Tuple of (should_skip_processing, document_to_return):
- (True, document): Content unchanged, return existing document
- (False, None): Content changed, needs re-processing
"""
if existing_document.unique_identifier_hash != primary_hash:
existing_document.unique_identifier_hash = primary_hash
logging.info(f"Migrated document to file_id-based identifier: {filename}")
if existing_document.content_hash == content_hash:
if connector and connector.get("type") == DocumentType.GOOGLE_DRIVE_FILE:
connector_metadata = connector.get("metadata", {})
new_name = connector_metadata.get("google_drive_file_name")
doc_metadata = existing_document.document_metadata or {}
old_name = doc_metadata.get("FILE_NAME") or doc_metadata.get(
"google_drive_file_name"
)
if new_name and old_name and old_name != new_name:
from sqlalchemy.orm.attributes import flag_modified
existing_document.title = new_name
if not existing_document.document_metadata:
existing_document.document_metadata = {}
existing_document.document_metadata["FILE_NAME"] = new_name
existing_document.document_metadata["google_drive_file_name"] = new_name
flag_modified(existing_document, "document_metadata")
await session.commit()
logging.info(
f"File renamed in Google Drive: '{old_name}''{new_name}' "
f"(no re-processing needed)"
)
logging.info(f"Document for file {filename} unchanged. Skipping.")
return True, existing_document
# Content has changed — guard against content_hash collision before
# expensive ETL processing.
collision_doc = await check_duplicate_document(session, content_hash)
if collision_doc and collision_doc.id != existing_document.id:
logging.warning(
"Content-hash collision for %s: identical content exists in "
"document #%s (%s). Skipping re-processing.",
filename,
collision_doc.id,
collision_doc.document_type,
)
if DocumentStatus.is_state(
existing_document.status, DocumentStatus.PENDING
) or DocumentStatus.is_state(
existing_document.status, DocumentStatus.PROCESSING
):
await session.delete(existing_document)
await session.commit()
return True, None
return True, existing_document
logging.info(f"Content changed for file {filename}. Updating document.")
return False, None
async def find_existing_document_with_migration(
session: AsyncSession,
primary_hash: str,
legacy_hash: str | None,
content_hash: str | None = None,
) -> Document | None:
"""
Find existing document, checking primary hash, legacy hash, and content_hash.
Supports migration from filename-based to file_id-based hashing for
Google Drive files, with content_hash fallback for cross-source dedup.
"""
existing_document = await check_document_by_unique_identifier(session, primary_hash)
if not existing_document and legacy_hash:
existing_document = await check_document_by_unique_identifier(
session, legacy_hash
)
if existing_document:
logging.info(
"Found legacy document (filename-based hash), "
"will migrate to file_id-based hash"
)
if not existing_document and content_hash:
existing_document = await check_duplicate_document(session, content_hash)
if existing_document:
logging.info(
f"Found duplicate content from different source (content_hash match). "
f"Original document ID: {existing_document.id}, "
f"type: {existing_document.document_type}"
)
return existing_document
# ---------------------------------------------------------------------------
# Connector helpers
# ---------------------------------------------------------------------------
async def update_document_from_connector(
document: Document | None,
connector: dict | None,
session: AsyncSession,
) -> None:
"""Update document type, metadata, and connector_id from connector info."""
if not document or not connector:
return
if "type" in connector:
document.document_type = connector["type"]
if "metadata" in connector:
if not document.document_metadata:
document.document_metadata = connector["metadata"]
else:
merged = {**document.document_metadata, **connector["metadata"]}
document.document_metadata = merged
if "connector_id" in connector:
document.connector_id = connector["connector_id"]
await session.commit()
# ---------------------------------------------------------------------------
# Timeout calculations
# ---------------------------------------------------------------------------
def calculate_upload_timeout(file_size_bytes: int) -> float:
"""Calculate upload timeout based on file size (conservative for slow connections)."""
estimated_time = (file_size_bytes / UPLOAD_BYTES_PER_SECOND_SLOW) * 1.5
return max(MIN_UPLOAD_TIMEOUT, min(estimated_time, MAX_UPLOAD_TIMEOUT))
def calculate_job_timeout(estimated_pages: int, file_size_bytes: int) -> float:
"""Calculate job processing timeout based on page count and file size."""
page_based_timeout = BASE_JOB_TIMEOUT + (estimated_pages * PER_PAGE_JOB_TIMEOUT)
size_based_timeout = BASE_JOB_TIMEOUT + (file_size_bytes / (10 * 1024 * 1024)) * 60
return max(page_based_timeout, size_based_timeout)
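# NOTE (editorial sketch): the _constants values imported above are not shown in this
# diff, so the worked example below plugs in hypothetical numbers purely to make the
# arithmetic concrete; the real constants may differ.
def _timeout_worked_example() -> None:
    upload_bytes_per_second_slow = 256 * 1024  # hypothetical: 256 KB/s
    min_upload_timeout = 60.0                  # hypothetical floor (seconds)
    max_upload_timeout = 1800.0                # hypothetical ceiling (seconds)

    file_size = 25 * 1024 * 1024  # a 25 MB upload
    estimated = (file_size / upload_bytes_per_second_slow) * 1.5  # 150.0 s
    upload_timeout = max(min_upload_timeout, min(estimated, max_upload_timeout))
    print(f"upload timeout: {upload_timeout:.0f}s")  # 150s under these assumptions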

View file

@ -0,0 +1,285 @@
"""
Unified document save/update logic for file processors.
Replaces the three nearly-identical ``add_received_file_document_using_*``
functions with a single ``save_file_document`` function plus thin wrappers
for backward compatibility.
"""
import logging
from langchain_core.documents import Document as LangChainDocument
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.ext.asyncio import AsyncSession
from app.db import Document, DocumentStatus, DocumentType
from app.services.llm_service import get_user_long_context_llm
from app.utils.document_converters import (
create_document_chunks,
embed_text,
generate_content_hash,
generate_document_summary,
)
from ._helpers import (
find_existing_document_with_migration,
get_google_drive_unique_identifier,
handle_existing_document_update,
)
from .base import get_current_timestamp, safe_set_chunks
# ---------------------------------------------------------------------------
# Summary generation
# ---------------------------------------------------------------------------
async def _generate_summary(
markdown_content: str,
file_name: str,
etl_service: str,
user_llm,
enable_summary: bool,
) -> tuple[str, list[float]]:
"""
Generate a document summary and embedding.
Docling uses its own large-document summary strategy; other ETL services
use the standard ``generate_document_summary`` helper.
"""
if not enable_summary:
summary = f"File: {file_name}\n\n{markdown_content[:4000]}"
return summary, embed_text(summary)
if etl_service == "DOCLING":
from app.services.docling_service import create_docling_service
docling_service = create_docling_service()
summary_text = await docling_service.process_large_document_summary(
content=markdown_content, llm=user_llm, document_title=file_name
)
meta = {
"file_name": file_name,
"etl_service": etl_service,
"document_type": "File Document",
}
parts = ["# DOCUMENT METADATA"]
for key, value in meta.items():
if value:
formatted_key = key.replace("_", " ").title()
parts.append(f"**{formatted_key}:** {value}")
enhanced = "\n".join(parts) + "\n\n# DOCUMENT SUMMARY\n\n" + summary_text
return enhanced, embed_text(enhanced)
# Standard summary (Unstructured / LlamaCloud / others)
meta = {
"file_name": file_name,
"etl_service": etl_service,
"document_type": "File Document",
}
return await generate_document_summary(markdown_content, user_llm, meta)
# ---------------------------------------------------------------------------
# Unified save function
# ---------------------------------------------------------------------------
async def save_file_document(
session: AsyncSession,
file_name: str,
markdown_content: str,
search_space_id: int,
user_id: str,
etl_service: str,
connector: dict | None = None,
enable_summary: bool = True,
) -> Document | None:
"""
Process and store a file document with deduplication and migration support.
Handles both creating new documents and updating existing ones. This is
the single implementation behind the per-ETL-service wrapper functions.
Args:
session: Database session
file_name: Name of the processed file
markdown_content: Markdown content to store
search_space_id: ID of the search space
user_id: ID of the user
etl_service: Name of the ETL service (UNSTRUCTURED, LLAMACLOUD, DOCLING)
connector: Optional connector info for Google Drive files
enable_summary: Whether to generate an AI summary
Returns:
Document object if successful, None if duplicate detected
"""
try:
primary_hash, legacy_hash = get_google_drive_unique_identifier(
connector, file_name, search_space_id
)
content_hash = generate_content_hash(markdown_content, search_space_id)
existing_document = await find_existing_document_with_migration(
session, primary_hash, legacy_hash, content_hash
)
if existing_document:
should_skip, doc = await handle_existing_document_update(
session,
existing_document,
content_hash,
connector,
file_name,
primary_hash,
)
if should_skip:
return doc
user_llm = await get_user_long_context_llm(session, user_id, search_space_id)
if not user_llm:
raise RuntimeError(
f"No long context LLM configured for user {user_id} "
f"in search space {search_space_id}"
)
summary_content, summary_embedding = await _generate_summary(
markdown_content, file_name, etl_service, user_llm, enable_summary
)
chunks = await create_document_chunks(markdown_content)
doc_metadata = {"FILE_NAME": file_name, "ETL_SERVICE": etl_service}
if existing_document:
existing_document.title = file_name
existing_document.content = summary_content
existing_document.content_hash = content_hash
existing_document.embedding = summary_embedding
existing_document.document_metadata = doc_metadata
await safe_set_chunks(session, existing_document, chunks)
existing_document.source_markdown = markdown_content
existing_document.content_needs_reindexing = False
existing_document.updated_at = get_current_timestamp()
existing_document.status = DocumentStatus.ready()
await session.commit()
await session.refresh(existing_document)
return existing_document
doc_type = DocumentType.FILE
if connector and connector.get("type") == DocumentType.GOOGLE_DRIVE_FILE:
doc_type = DocumentType.GOOGLE_DRIVE_FILE
document = Document(
search_space_id=search_space_id,
title=file_name,
document_type=doc_type,
document_metadata=doc_metadata,
content=summary_content,
embedding=summary_embedding,
chunks=chunks,
content_hash=content_hash,
unique_identifier_hash=primary_hash,
source_markdown=markdown_content,
content_needs_reindexing=False,
updated_at=get_current_timestamp(),
created_by_id=user_id,
connector_id=connector.get("connector_id") if connector else None,
status=DocumentStatus.ready(),
)
session.add(document)
await session.commit()
await session.refresh(document)
return document
except SQLAlchemyError as db_error:
await session.rollback()
if "ix_documents_content_hash" in str(db_error):
logging.warning(
"content_hash collision during commit for %s (%s). Skipping.",
file_name,
etl_service,
)
return None
raise db_error
except Exception as e:
await session.rollback()
raise RuntimeError(
f"Failed to process file document using {etl_service}: {e!s}"
) from e
# ---------------------------------------------------------------------------
# Backward-compatible wrapper functions
# ---------------------------------------------------------------------------
async def add_received_file_document_using_unstructured(
session: AsyncSession,
file_name: str,
unstructured_processed_elements: list[LangChainDocument],
search_space_id: int,
user_id: str,
connector: dict | None = None,
enable_summary: bool = True,
) -> Document | None:
"""Process and store a file document using the Unstructured service."""
from app.utils.document_converters import convert_document_to_markdown
markdown_content = await convert_document_to_markdown(
unstructured_processed_elements
)
return await save_file_document(
session,
file_name,
markdown_content,
search_space_id,
user_id,
"UNSTRUCTURED",
connector,
enable_summary,
)
async def add_received_file_document_using_llamacloud(
session: AsyncSession,
file_name: str,
llamacloud_markdown_document: str,
search_space_id: int,
user_id: str,
connector: dict | None = None,
enable_summary: bool = True,
) -> Document | None:
"""Process and store document content parsed by LlamaCloud."""
return await save_file_document(
session,
file_name,
llamacloud_markdown_document,
search_space_id,
user_id,
"LLAMACLOUD",
connector,
enable_summary,
)
async def add_received_file_document_using_docling(
session: AsyncSession,
file_name: str,
docling_markdown_document: str,
search_space_id: int,
user_id: str,
connector: dict | None = None,
enable_summary: bool = True,
) -> Document | None:
"""Process and store document content parsed by Docling."""
return await save_file_document(
session,
file_name,
docling_markdown_document,
search_space_id,
user_id,
"DOCLING",
connector,
enable_summary,
)
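# NOTE (editorial sketch): a hedged example of calling the unified ``save_file_document``
# entry point directly instead of the per-service wrappers above. It assumes the caller
# already holds an AsyncSession and parsed markdown; the IDs below are illustrative only.
async def _example_save_markdown(
    session: AsyncSession, file_name: str, markdown: str
) -> Document | None:
    return await save_file_document(
        session,
        file_name=file_name,
        markdown_content=markdown,
        search_space_id=1,       # illustrative search space
        user_id="user-123",      # illustrative user
        etl_service="DOCLING",
        connector=None,          # not a Google Drive file
        enable_summary=True,     # generate an AI summary and embedding
    )
# A return value of None means the content was detected as a duplicate and skipped.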

View file

@ -14,88 +14,19 @@ from app.utils.document_converters import (
create_document_chunks,
generate_content_hash,
generate_document_summary,
generate_unique_identifier_hash,
)
from ._helpers import (
find_existing_document_with_migration,
get_google_drive_unique_identifier,
)
from .base import (
check_document_by_unique_identifier,
check_duplicate_document,
get_current_timestamp,
safe_set_chunks,
)
def _get_google_drive_unique_identifier(
connector: dict | None,
filename: str,
search_space_id: int,
) -> tuple[str, str | None]:
"""
Get unique identifier hash for a file, with special handling for Google Drive.
For Google Drive files, uses file_id as the unique identifier (doesn't change on rename).
For other files, uses filename.
Args:
connector: Optional connector info dict with type and metadata
filename: The filename (used for non-Google Drive files or as fallback)
search_space_id: The search space ID
Returns:
Tuple of (primary_hash, legacy_hash or None)
"""
if connector and connector.get("type") == DocumentType.GOOGLE_DRIVE_FILE:
metadata = connector.get("metadata", {})
file_id = metadata.get("google_drive_file_id")
if file_id:
primary_hash = generate_unique_identifier_hash(
DocumentType.GOOGLE_DRIVE_FILE, file_id, search_space_id
)
legacy_hash = generate_unique_identifier_hash(
DocumentType.GOOGLE_DRIVE_FILE, filename, search_space_id
)
return primary_hash, legacy_hash
primary_hash = generate_unique_identifier_hash(
DocumentType.FILE, filename, search_space_id
)
return primary_hash, None
async def _find_existing_document_with_migration(
session: AsyncSession,
primary_hash: str,
legacy_hash: str | None,
content_hash: str | None = None,
) -> Document | None:
"""
Find existing document, checking both new hash and legacy hash for migration,
with fallback to content_hash for cross-source deduplication.
"""
existing_document = await check_document_by_unique_identifier(session, primary_hash)
if not existing_document and legacy_hash:
existing_document = await check_document_by_unique_identifier(
session, legacy_hash
)
if existing_document:
logging.info(
"Found legacy document (filename-based hash), will migrate to file_id-based hash"
)
# Fallback: check by content_hash to catch duplicates from different sources
if not existing_document and content_hash:
existing_document = await check_duplicate_document(session, content_hash)
if existing_document:
logging.info(
f"Found duplicate content from different source (content_hash match). "
f"Original document ID: {existing_document.id}, type: {existing_document.document_type}"
)
return existing_document
async def _handle_existing_document_update(
session: AsyncSession,
existing_document: Document,
@ -224,7 +155,7 @@ async def add_received_markdown_file_document(
try:
# Generate unique identifier hash (uses file_id for Google Drive, filename for others)
primary_hash, legacy_hash = _get_google_drive_unique_identifier(
primary_hash, legacy_hash = get_google_drive_unique_identifier(
connector, file_name, search_space_id
)
@ -232,7 +163,7 @@ async def add_received_markdown_file_document(
content_hash = generate_content_hash(file_in_markdown, search_space_id)
# Check if document exists (with migration support for Google Drive and content_hash fallback)
existing_document = await _find_existing_document_with_migration(
existing_document = await find_existing_document_with_migration(
session, primary_hash, legacy_hash, content_hash
)

View file

@ -0,0 +1,107 @@
"""Document versioning: snapshot creation and cleanup.
Rules:
- 30-minute debounce window: if the latest version was created < 30 min ago,
overwrite it instead of creating a new row.
- Maximum 20 versions per document.
- Versions older than 90 days are cleaned up.
"""
from datetime import UTC, datetime, timedelta
from sqlalchemy import delete, func, select
from sqlalchemy.ext.asyncio import AsyncSession
from app.db import Document, DocumentVersion
MAX_VERSIONS_PER_DOCUMENT = 20
DEBOUNCE_MINUTES = 30
RETENTION_DAYS = 90
def _now() -> datetime:
return datetime.now(UTC)
async def create_version_snapshot(
session: AsyncSession,
document: Document,
) -> DocumentVersion | None:
"""Snapshot the document's current state into a DocumentVersion row.
Returns the created/updated DocumentVersion, or None if nothing was done.
"""
now = _now()
latest = (
await session.execute(
select(DocumentVersion)
.where(DocumentVersion.document_id == document.id)
.order_by(DocumentVersion.version_number.desc())
.limit(1)
)
).scalar_one_or_none()
if latest is not None:
age = now - latest.created_at.replace(tzinfo=UTC)
if age < timedelta(minutes=DEBOUNCE_MINUTES):
latest.source_markdown = document.source_markdown
latest.content_hash = document.content_hash
latest.title = document.title
latest.created_at = now
await session.flush()
return latest
max_num = (
await session.execute(
select(func.coalesce(func.max(DocumentVersion.version_number), 0)).where(
DocumentVersion.document_id == document.id
)
)
).scalar_one()
version = DocumentVersion(
document_id=document.id,
version_number=max_num + 1,
source_markdown=document.source_markdown,
content_hash=document.content_hash,
title=document.title,
created_at=now,
)
session.add(version)
await session.flush()
# Cleanup: remove versions older than 90 days
cutoff = now - timedelta(days=RETENTION_DAYS)
await session.execute(
delete(DocumentVersion).where(
DocumentVersion.document_id == document.id,
DocumentVersion.created_at < cutoff,
)
)
# Cleanup: cap at MAX_VERSIONS_PER_DOCUMENT
count = (
await session.execute(
select(func.count())
.select_from(DocumentVersion)
.where(DocumentVersion.document_id == document.id)
)
).scalar_one()
if count > MAX_VERSIONS_PER_DOCUMENT:
excess = count - MAX_VERSIONS_PER_DOCUMENT
oldest_ids_result = await session.execute(
select(DocumentVersion.id)
.where(DocumentVersion.document_id == document.id)
.order_by(DocumentVersion.version_number.asc())
.limit(excess)
)
oldest_ids = [row[0] for row in oldest_ids_result.all()]
if oldest_ids:
await session.execute(
delete(DocumentVersion).where(DocumentVersion.id.in_(oldest_ids))
)
await session.flush()
return version
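# NOTE (editorial sketch): the helper below isolates the debounce rule applied above as
# a tiny pure function, just to make the 30-minute window explicit. It is illustrative
# and not part of the module's public API.
def _should_overwrite_latest(latest_created_at: datetime, now: datetime) -> bool:
    """True when the latest version is recent enough to overwrite instead of appending."""
    return (now - latest_created_at) < timedelta(minutes=DEBOUNCE_MINUTES)
# Example: a snapshot 10 minutes after the previous one overwrites it; one taken
# 31 minutes later creates a new version row instead.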

View file

@ -2,12 +2,11 @@
Integration tests for backend file upload limit enforcement.
These tests verify that the API rejects uploads that exceed:
- Max files per upload (10)
- Max per-file size (50 MB)
- Max total upload size (200 MB)
- Max per-file size (500 MB)
The limits mirror the frontend's DocumentUploadTab.tsx constants and are
enforced server-side to protect against direct API calls.
No file count or total size limits are enforced; the frontend batches
uploads in groups of 5 and there is no cap on how many files a user can
upload in a single session.
Prerequisites:
- PostgreSQL + pgvector
@ -24,60 +23,12 @@ pytestmark = pytest.mark.integration
# ---------------------------------------------------------------------------
# Test A: File count limit
# ---------------------------------------------------------------------------
class TestFileCountLimit:
"""Uploading more than 10 files in a single request should be rejected."""
async def test_11_files_returns_413(
self,
client: httpx.AsyncClient,
headers: dict[str, str],
search_space_id: int,
):
files = [
("files", (f"file_{i}.txt", io.BytesIO(b"test content"), "text/plain"))
for i in range(11)
]
resp = await client.post(
"/api/v1/documents/fileupload",
headers=headers,
files=files,
data={"search_space_id": str(search_space_id)},
)
assert resp.status_code == 413
assert "too many files" in resp.json()["detail"].lower()
async def test_10_files_accepted(
self,
client: httpx.AsyncClient,
headers: dict[str, str],
search_space_id: int,
cleanup_doc_ids: list[int],
):
files = [
("files", (f"file_{i}.txt", io.BytesIO(b"test content"), "text/plain"))
for i in range(10)
]
resp = await client.post(
"/api/v1/documents/fileupload",
headers=headers,
files=files,
data={"search_space_id": str(search_space_id)},
)
assert resp.status_code == 200
cleanup_doc_ids.extend(resp.json().get("document_ids", []))
# ---------------------------------------------------------------------------
# Test B: Per-file size limit
# Test: Per-file size limit (500 MB)
# ---------------------------------------------------------------------------
class TestPerFileSizeLimit:
"""A single file exceeding 50 MB should be rejected."""
"""A single file exceeding 500 MB should be rejected."""
async def test_oversized_file_returns_413(
self,
@ -85,7 +36,7 @@ class TestPerFileSizeLimit:
headers: dict[str, str],
search_space_id: int,
):
oversized = io.BytesIO(b"\x00" * (50 * 1024 * 1024 + 1))
oversized = io.BytesIO(b"\x00" * (500 * 1024 * 1024 + 1))
resp = await client.post(
"/api/v1/documents/fileupload",
headers=headers,
@ -102,11 +53,11 @@ class TestPerFileSizeLimit:
search_space_id: int,
cleanup_doc_ids: list[int],
):
at_limit = io.BytesIO(b"\x00" * (50 * 1024 * 1024))
at_limit = io.BytesIO(b"\x00" * (500 * 1024 * 1024))
resp = await client.post(
"/api/v1/documents/fileupload",
headers=headers,
files=[("files", ("exact50mb.txt", at_limit, "text/plain"))],
files=[("files", ("exact500mb.txt", at_limit, "text/plain"))],
data={"search_space_id": str(search_space_id)},
)
assert resp.status_code == 200
@ -114,26 +65,23 @@ class TestPerFileSizeLimit:
# ---------------------------------------------------------------------------
# Test C: Total upload size limit
# Test: Multiple files accepted without count limit
# ---------------------------------------------------------------------------
class TestTotalSizeLimit:
"""Multiple files whose combined size exceeds 200 MB should be rejected."""
class TestNoFileCountLimit:
"""Many files in a single request should be accepted."""
async def test_total_size_over_200mb_returns_413(
async def test_many_files_accepted(
self,
client: httpx.AsyncClient,
headers: dict[str, str],
search_space_id: int,
cleanup_doc_ids: list[int],
):
chunk_size = 45 * 1024 * 1024 # 45 MB each
files = [
(
"files",
(f"chunk_{i}.txt", io.BytesIO(b"\x00" * chunk_size), "text/plain"),
)
for i in range(5) # 5 x 45 MB = 225 MB > 200 MB
("files", (f"file_{i}.txt", io.BytesIO(b"test content"), "text/plain"))
for i in range(20)
]
resp = await client.post(
"/api/v1/documents/fileupload",
@ -141,5 +89,5 @@ class TestTotalSizeLimit:
files=files,
data={"search_space_id": str(search_space_id)},
)
assert resp.status_code == 413
assert "total upload size" in resp.json()["detail"].lower()
assert resp.status_code == 200
cleanup_doc_ids.extend(resp.json().get("document_ids", []))

View file

@ -0,0 +1,167 @@
"""Integration tests for document versioning snapshot + cleanup."""
from datetime import UTC, datetime, timedelta
import pytest
import pytest_asyncio
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession
from app.db import Document, DocumentType, DocumentVersion, SearchSpace, User
pytestmark = pytest.mark.integration
@pytest_asyncio.fixture
async def db_document(
db_session: AsyncSession, db_user: User, db_search_space: SearchSpace
) -> Document:
doc = Document(
title="Test Doc",
document_type=DocumentType.LOCAL_FOLDER_FILE,
document_metadata={},
content="Summary of test doc.",
content_hash="abc123",
unique_identifier_hash="local_folder:test-folder:test.md",
source_markdown="# Test\n\nOriginal content.",
search_space_id=db_search_space.id,
created_by_id=db_user.id,
)
db_session.add(doc)
await db_session.flush()
return doc
async def _version_count(session: AsyncSession, document_id: int) -> int:
result = await session.execute(
select(func.count())
.select_from(DocumentVersion)
.where(DocumentVersion.document_id == document_id)
)
return result.scalar_one()
async def _get_versions(
session: AsyncSession, document_id: int
) -> list[DocumentVersion]:
result = await session.execute(
select(DocumentVersion)
.where(DocumentVersion.document_id == document_id)
.order_by(DocumentVersion.version_number)
)
return list(result.scalars().all())
class TestCreateVersionSnapshot:
"""V1-V5: TDD slices for create_version_snapshot."""
async def test_v1_creates_first_version(self, db_session, db_document):
"""V1: First snapshot creates version 1 with the document's current state."""
from app.utils.document_versioning import create_version_snapshot
await create_version_snapshot(db_session, db_document)
versions = await _get_versions(db_session, db_document.id)
assert len(versions) == 1
assert versions[0].version_number == 1
assert versions[0].source_markdown == "# Test\n\nOriginal content."
assert versions[0].content_hash == "abc123"
assert versions[0].title == "Test Doc"
assert versions[0].document_id == db_document.id
async def test_v2_creates_version_2_after_30_min(
self, db_session, db_document, monkeypatch
):
"""V2: After 30+ minutes, a new version is created (not overwritten)."""
from app.utils.document_versioning import create_version_snapshot
t0 = datetime(2025, 1, 1, 12, 0, 0, tzinfo=UTC)
monkeypatch.setattr("app.utils.document_versioning._now", lambda: t0)
await create_version_snapshot(db_session, db_document)
# Simulate content change and time passing
db_document.source_markdown = "# Test\n\nUpdated content."
db_document.content_hash = "def456"
t1 = t0 + timedelta(minutes=31)
monkeypatch.setattr("app.utils.document_versioning._now", lambda: t1)
await create_version_snapshot(db_session, db_document)
versions = await _get_versions(db_session, db_document.id)
assert len(versions) == 2
assert versions[0].version_number == 1
assert versions[1].version_number == 2
assert versions[1].source_markdown == "# Test\n\nUpdated content."
async def test_v3_overwrites_within_30_min(
self, db_session, db_document, monkeypatch
):
"""V3: Within 30 minutes, the latest version is overwritten."""
from app.utils.document_versioning import create_version_snapshot
t0 = datetime(2025, 1, 1, 12, 0, 0, tzinfo=UTC)
monkeypatch.setattr("app.utils.document_versioning._now", lambda: t0)
await create_version_snapshot(db_session, db_document)
count_after_first = await _version_count(db_session, db_document.id)
assert count_after_first == 1
# Simulate quick edit within 30 minutes
db_document.source_markdown = "# Test\n\nQuick edit."
db_document.content_hash = "quick123"
t1 = t0 + timedelta(minutes=10)
monkeypatch.setattr("app.utils.document_versioning._now", lambda: t1)
await create_version_snapshot(db_session, db_document)
count_after_second = await _version_count(db_session, db_document.id)
assert count_after_second == 1 # still 1, not 2
versions = await _get_versions(db_session, db_document.id)
assert versions[0].source_markdown == "# Test\n\nQuick edit."
assert versions[0].content_hash == "quick123"
async def test_v4_cleanup_90_day_old_versions(
self, db_session, db_document, monkeypatch
):
"""V4: Versions older than 90 days are cleaned up."""
from app.utils.document_versioning import create_version_snapshot
base = datetime(2025, 1, 1, 12, 0, 0, tzinfo=UTC)
# Create 5 versions spread across time: 3 older than 90 days, 2 recent
for i in range(5):
db_document.source_markdown = f"Content v{i + 1}"
db_document.content_hash = f"hash_{i + 1}"
t = base + timedelta(days=i) if i < 3 else base + timedelta(days=150 + i)
monkeypatch.setattr("app.utils.document_versioning._now", lambda _t=t: _t)
await create_version_snapshot(db_session, db_document)
# Now trigger cleanup from a "current" time that makes the first 3 versions > 90 days old
now = base + timedelta(days=200)
monkeypatch.setattr("app.utils.document_versioning._now", lambda: now)
db_document.source_markdown = "Content v6"
db_document.content_hash = "hash_6"
await create_version_snapshot(db_session, db_document)
versions = await _get_versions(db_session, db_document.id)
# The first 3 (old) should be cleaned up; versions 4, 5, 6 remain
for v in versions:
age = now - v.created_at.replace(tzinfo=UTC)
assert age <= timedelta(days=90), f"Version {v.version_number} is too old"
async def test_v5_cap_at_20_versions(self, db_session, db_document, monkeypatch):
"""V5: More than 20 versions triggers cap — oldest gets deleted."""
from app.utils.document_versioning import create_version_snapshot
base = datetime(2025, 6, 1, 12, 0, 0, tzinfo=UTC)
# Create 21 versions (all within 90 days, each 31 min apart)
for i in range(21):
db_document.source_markdown = f"Content v{i + 1}"
db_document.content_hash = f"hash_{i + 1}"
t = base + timedelta(minutes=31 * i)
monkeypatch.setattr("app.utils.document_versioning._now", lambda _t=t: _t)
await create_version_snapshot(db_session, db_document)
versions = await _get_versions(db_session, db_document.id)
assert len(versions) == 20
# The lowest version_number should be 2 (version 1 was the oldest and got capped)
assert versions[0].version_number == 2

View file

@ -0,0 +1,78 @@
"""Unit tests for scan_folder() pure logic — Tier 2 TDD slices (S1-S4)."""
from pathlib import Path
import pytest
pytestmark = pytest.mark.unit
class TestScanFolder:
"""S1-S4: scan_folder() with real tmp_path filesystem."""
def test_s1_single_md_file(self, tmp_path: Path):
"""S1: scan_folder on a dir with one .md file returns correct entry."""
from app.tasks.connector_indexers.local_folder_indexer import scan_folder
md = tmp_path / "note.md"
md.write_text("# Hello")
results = scan_folder(str(tmp_path))
assert len(results) == 1
entry = results[0]
assert entry["relative_path"] == "note.md"
assert entry["size"] > 0
assert "modified_at" in entry
assert entry["path"] == str(md)
def test_s2_extension_filter(self, tmp_path: Path):
"""S2: file_extensions filter returns only matching files."""
from app.tasks.connector_indexers.local_folder_indexer import scan_folder
(tmp_path / "a.md").write_text("md")
(tmp_path / "b.txt").write_text("txt")
(tmp_path / "c.pdf").write_bytes(b"%PDF")
results = scan_folder(str(tmp_path), file_extensions=[".md"])
names = {r["relative_path"] for r in results}
assert names == {"a.md"}
def test_s3_exclude_patterns(self, tmp_path: Path):
"""S3: exclude_patterns skips files inside excluded directories."""
from app.tasks.connector_indexers.local_folder_indexer import scan_folder
(tmp_path / "good.md").write_text("good")
nm = tmp_path / "node_modules"
nm.mkdir()
(nm / "dep.js").write_text("module")
git = tmp_path / ".git"
git.mkdir()
(git / "config").write_text("gitconfig")
results = scan_folder(str(tmp_path), exclude_patterns=["node_modules", ".git"])
names = {r["relative_path"] for r in results}
assert "good.md" in names
assert not any("node_modules" in n for n in names)
assert not any(".git" in n for n in names)
def test_s4_nested_dirs(self, tmp_path: Path):
"""S4: nested subdirectories produce correct relative paths."""
from app.tasks.connector_indexers.local_folder_indexer import scan_folder
daily = tmp_path / "notes" / "daily"
daily.mkdir(parents=True)
weekly = tmp_path / "notes" / "weekly"
weekly.mkdir(parents=True)
(daily / "today.md").write_text("today")
(weekly / "review.md").write_text("review")
(tmp_path / "root.txt").write_text("root")
results = scan_folder(str(tmp_path))
paths = {r["relative_path"] for r in results}
assert "notes/daily/today.md" in paths or "notes\\daily\\today.md" in paths
assert "notes/weekly/review.md" in paths or "notes\\weekly\\review.md" in paths
assert "root.txt" in paths

View file

@ -248,7 +248,7 @@ class TestKnowledgeBaseSearchMiddlewarePlanner:
return []
async def fake_build_scoped_filesystem(**kwargs):
return {}
return {}, {}
monkeypatch.setattr(
"app.agents.new_chat.middleware.knowledge_search.search_knowledge_base",
@ -298,7 +298,7 @@ class TestKnowledgeBaseSearchMiddlewarePlanner:
return []
async def fake_build_scoped_filesystem(**kwargs):
return {}
return {}, {}
monkeypatch.setattr(
"app.agents.new_chat.middleware.knowledge_search.search_knowledge_base",
@ -334,7 +334,7 @@ class TestKnowledgeBaseSearchMiddlewarePlanner:
return []
async def fake_build_scoped_filesystem(**kwargs):
return {}
return {}, {}
monkeypatch.setattr(
"app.agents.new_chat.middleware.knowledge_search.search_knowledge_base",

View file

@ -30,6 +30,8 @@
},
"dependencies": {
"bindings": "^1.5.0",
"chokidar": "^5.0.0",
"electron-store": "^11.0.2",
"electron-updater": "^6.8.3",
"get-port-please": "^3.2.0",
"node-mac-permissions": "^2.5.0"

View file

@ -11,6 +11,12 @@ importers:
bindings:
specifier: ^1.5.0
version: 1.5.0
chokidar:
specifier: ^5.0.0
version: 5.0.0
electron-store:
specifier: ^11.0.2
version: 11.0.2
electron-updater:
specifier: ^6.8.3
version: 6.8.3
@ -362,6 +368,14 @@ packages:
resolution: {integrity: sha512-MnA+YT8fwfJPgBx3m60MNqakm30XOkyIoH1y6huTQvC0PwZG7ki8NacLBcrPbNoo8vEZy7Jpuk7+jMO+CUovTQ==}
engines: {node: '>= 14'}
ajv-formats@3.0.1:
resolution: {integrity: sha512-8iUql50EUR+uUcdRQ3HDqa6EVyo3docL8g5WJ3FNcWmu62IbkGUue/pEyLBW8VGKKucTPgqeks4fIU1DA4yowQ==}
peerDependencies:
ajv: ^8.0.0
peerDependenciesMeta:
ajv:
optional: true
ajv-keywords@3.5.2:
resolution: {integrity: sha512-5p6WTN0DdTGVQk6VjcEju19IgaHudalcfabD7yhDGeA6bcQnmL+CpveLJq/3hvfwd1aof6L386Ougkx6RfyMIQ==}
peerDependencies:
@ -370,6 +384,9 @@ packages:
ajv@6.14.0:
resolution: {integrity: sha512-IWrosm/yrn43eiKqkfkHis7QioDleaXQHdDVPKg0FSwwd/DuvyX79TZnFOnYpB7dcsFAMmtFztZuXPDvSePkFw==}
ajv@8.18.0:
resolution: {integrity: sha512-PlXPeEWMXMZ7sPYOHqmDyCJzcfNrUr3fGNKtezX14ykXOEIvyK81d+qydx89KY5O71FKMPaQ2vBfBFI5NHR63A==}
ansi-regex@5.0.1:
resolution: {integrity: sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==}
engines: {node: '>=8'}
@ -421,6 +438,9 @@ packages:
resolution: {integrity: sha512-+q/t7Ekv1EDY2l6Gda6LLiX14rU9TV20Wa3ofeQmwPFZbOMo9DXrLbOjFaaclkXKWidIaopwAObQDqwWtGUjqg==}
engines: {node: '>= 4.0.0'}
atomically@2.1.1:
resolution: {integrity: sha512-P4w9o2dqARji6P7MHprklbfiArZAWvo07yW7qs3pdljb3BWr12FIB7W+p0zJiuiVsUpRO0iZn1kFFcpPegg0tQ==}
axios@1.13.6:
resolution: {integrity: sha512-ChTCHMouEe2kn713WHbQGcuYrr6fXTBiu460OTwWrWob16g1bXn4vtz07Ope7ewMozJAnEquLk5lWQWtBig9DQ==}
@ -490,6 +510,10 @@ packages:
resolution: {integrity: sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA==}
engines: {node: '>=10'}
chokidar@5.0.0:
resolution: {integrity: sha512-TQMmc3w+5AxjpL8iIiwebF73dRDF4fBIieAqGn9RGCWaEVwQ6Fb2cGe31Yns0RRIzii5goJ1Y7xbMwo1TxMplw==}
engines: {node: '>= 20.19.0'}
chownr@3.0.0:
resolution: {integrity: sha512-+IxzY9BZOQd/XuYPRmrvEVjF/nqj5kgT4kEq7VofrDoM1MxoRjEWkrCC3EtLi59TVawxTAn+orJwFQcrqEN1+g==}
engines: {node: '>=18'}
@ -559,6 +583,10 @@ packages:
engines: {node: '>=18'}
hasBin: true
conf@15.1.0:
resolution: {integrity: sha512-Uy5YN9KEu0WWDaZAVJ5FAmZoaJt9rdK6kH+utItPyGsCqCgaTKkrmZx3zoE0/3q6S3bcp3Ihkk+ZqPxWxFK5og==}
engines: {node: '>=20'}
core-util-is@1.0.2:
resolution: {integrity: sha512-3lqz5YjWTYnW6dlDa5TLaTCcShfar1e40rmcJVwCBJC6mWlFuj0eCHIElmG1g5kyuJ/GD+8Wn4FFCcz4gJPfaQ==}
@ -572,6 +600,10 @@ packages:
resolution: {integrity: sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==}
engines: {node: '>= 8'}
debounce-fn@6.0.0:
resolution: {integrity: sha512-rBMW+F2TXryBwB54Q0d8drNEI+TfoS9JpNTAoVpukbWEhjXQq4rySFYLaqXMFXwdv61Zb2OHtj5bviSoimqxRQ==}
engines: {node: '>=18'}
debug@4.4.3:
resolution: {integrity: sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==}
engines: {node: '>=6.0'}
@ -623,6 +655,10 @@ packages:
os: [darwin]
hasBin: true
dot-prop@10.1.0:
resolution: {integrity: sha512-MVUtAugQMOff5RnBy2d9N31iG0lNwg1qAoAOn7pOK5wf94WIaE3My2p3uwTQuvS2AcqchkcR3bHByjaM0mmi7Q==}
engines: {node: '>=20'}
dotenv-expand@11.0.7:
resolution: {integrity: sha512-zIHwmZPRshsCdpMDyVsqGmgyP0yT8GAgXUnkdAoJisxvf33k7yO6OuoKmcTGuXPWSsm8Oh88nZicRLA9Y0rUeA==}
engines: {node: '>=12'}
@ -658,6 +694,10 @@ packages:
electron-publish@26.8.1:
resolution: {integrity: sha512-q+jrSTIh/Cv4eGZa7oVR+grEJo/FoLMYBAnSL5GCtqwUpr1T+VgKB/dn1pnzxIxqD8S/jP1yilT9VrwCqINR4w==}
electron-store@11.0.2:
resolution: {integrity: sha512-4VkNRdN+BImL2KcCi41WvAYbh6zLX5AUTi4so68yPqiItjbgTjqpEnGAqasgnG+lB6GuAyUltKwVopp6Uv+gwQ==}
engines: {node: '>=20'}
electron-updater@6.8.3:
resolution: {integrity: sha512-Z6sgw3jgbikWKXei1ENdqFOxBP0WlXg3TtKfz0rgw2vIZFJUyI4pD7ZN7jrkm7EoMK+tcm/qTnPUdqfZukBlBQ==}
@ -686,6 +726,10 @@ packages:
resolution: {integrity: sha512-+h1lkLKhZMTYjog1VEpJNG7NZJWcuc2DDk/qsqSTRRCOXiLjeQ1d1/udrUGhqMxUgAlwKNZ0cf2uqan5GLuS2A==}
engines: {node: '>=6'}
env-paths@3.0.0:
resolution: {integrity: sha512-dtJUTepzMW3Lm/NPxRf3wP4642UWhjL2sQxc+ym2YMj1m/H2zDNQOlezafzkHwn6sMstjHTwG6iQQsctDW/b1A==}
engines: {node: ^12.20.0 || ^14.13.1 || >=16.0.0}
err-code@2.0.3:
resolution: {integrity: sha512-2bmlRpNKBxT/CRmPOlyISQpNj+qSeYvcym/uT0Jx2bMOlKLtSy1ZmLuVxSEKKyor/N5yhvp/ZiG1oE3DEYMSFA==}
@ -739,6 +783,9 @@ packages:
fast-json-stable-stringify@2.1.0:
resolution: {integrity: sha512-lhd/wF+Lk98HZoTCtlVraHtfh5XYijIjalXck7saUtuanSDyLMxnHhSXEDJqHxD7msR8D0uCmqlkwjCV8xvwHw==}
fast-uri@3.1.0:
resolution: {integrity: sha512-iPeeDKJSWf4IEOasVVrknXpaBV0IApz/gp7S2bb7Z4Lljbl2MGJRqInZiUrQwV16cpzw/D3S5j5Julj/gT52AA==}
fd-slicer@1.1.0:
resolution: {integrity: sha512-cE1qsB/VwyQozZ+q1dGxR8LBYNZeofhEdUNGSMbQD3Gw2lAzX9Zb3uIU6Ebc/Fmyjo9AWWfnn0AUCHqtevs/8g==}
@ -969,6 +1016,12 @@ packages:
json-schema-traverse@0.4.1:
resolution: {integrity: sha512-xbbCH5dCYU5T8LcEhhuh7HJ88HXuW3qsI3Y0zOZFKfZEHcpWiHU/Jxzk629Brsab/mMiHQti9wMP+845RPe3Vg==}
json-schema-traverse@1.0.0:
resolution: {integrity: sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug==}
json-schema-typed@8.0.2:
resolution: {integrity: sha512-fQhoXdcvc3V28x7C7BMs4P5+kNlgUURe2jmUT1T//oBRMDrqy1QPelJimwZGo7Hg9VPV3EQV5Bnq4hbFy2vetA==}
json-stringify-safe@5.0.1:
resolution: {integrity: sha512-ZClg6AaYvamvYEE82d3Iyd3vSSIjQ+odgjaTzRuO3s7toCdFKczob2i0zCh7JE8kWn17yvAWhUVxvqGwUalsRA==}
@ -999,6 +1052,9 @@ packages:
lodash@4.17.23:
resolution: {integrity: sha512-LgVTMpQtIopCi79SJeDiP0TfWi5CNEc/L/aRdTh3yIvmZXTnheWpKjSZhnvMl8iXbC1tFg9gdHHDMLoV7CnG+w==}
lodash@4.18.1:
resolution: {integrity: sha512-dMInicTPVE8d1e5otfwmmjlxkZoUpiVLwyeTdUsi/Caj/gfzzblBcCE5sRHV/AsjuCmxWrte2TNGSYuCeCq+0Q==}
log-symbols@4.1.0:
resolution: {integrity: sha512-8XPvpAA8uyhfteu8pIvQxpJZ7SYYdpUivZpGy6sFsBuKRY/7rQGavedeB8aK+Zkyq6upMFVL/9AW6vOYzfRyLg==}
engines: {node: '>=10'}
@ -1043,6 +1099,10 @@ packages:
resolution: {integrity: sha512-OqbOk5oEQeAZ8WXWydlu9HJjz9WVdEIvamMCcXmuqUYjTknH/sqsWvhQ3vgwKFRR1HpjvNBKQ37nbJgYzGqGcg==}
engines: {node: '>=6'}
mimic-function@5.0.1:
resolution: {integrity: sha512-VP79XUPxV2CigYP3jWwAUFSku2aKqBH7uTAapFWCBqutsbmDo96KY5o8uh6U+/YSIn5OxJnXp73beVkpqMIGhA==}
engines: {node: '>=18'}
mimic-response@1.0.1:
resolution: {integrity: sha512-j5EctnkH7amfV/q5Hgmoal1g2QHFJRraOtmx0JpIqkxhBhI/lJSl1nMpQ45hVarwNETOoWEimndZ4QK0RHxuxQ==}
engines: {node: '>=4'}
@ -1245,10 +1305,18 @@ packages:
resolution: {integrity: sha512-9u/sniCrY3D5WdsERHzHE4G2YCXqoG5FTHUiCC4SIbr6XcLZBY05ya9EKjYek9O5xOAwjGq+1JdGBAS7Q9ScoA==}
engines: {node: '>= 6'}
readdirp@5.0.0:
resolution: {integrity: sha512-9u/XQ1pvrQtYyMpZe7DXKv2p5CNvyVwzUB6uhLAnQwHMSgKMBR62lc7AHljaeteeHXn11XTAaLLUVZYVZyuRBQ==}
engines: {node: '>= 20.19.0'}
require-directory@2.1.1:
resolution: {integrity: sha512-fGxEI7+wsG9xrvdjsrlmL22OMTTiHRwAMroiEeMgq8gzoLC/PQr7RsRDSTLUg/bZAZtF+TVIkHc6/4RIKrui+Q==}
engines: {node: '>=0.10.0'}
require-from-string@2.0.2:
resolution: {integrity: sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw==}
engines: {node: '>=0.10.0'}
resedit@1.7.2:
resolution: {integrity: sha512-vHjcY2MlAITJhC0eRD/Vv8Vlgmu9Sd3LX9zZvtGzU5ZImdTN3+d6e/4mnTyV8vEbyf1sgNIrWxhWlrys52OkEA==}
engines: {node: '>=12', npm: '>=6'}
@ -1388,6 +1456,12 @@ packages:
resolution: {integrity: sha512-yDPMNjp4WyfYBkHnjIRLfca1i6KMyGCtsVgoKe/z1+6vukgaENdgGBZt+ZmKPc4gavvEZ5OgHfHdrazhgNyG7w==}
engines: {node: '>=12'}
stubborn-fs@2.0.0:
resolution: {integrity: sha512-Y0AvSwDw8y+nlSNFXMm2g6L51rBGdAQT20J3YSOqxC53Lo3bjWRtr2BKcfYoAf352WYpsZSTURrA0tqhfgudPA==}
stubborn-utils@1.0.2:
resolution: {integrity: sha512-zOh9jPYI+xrNOyisSelgym4tolKTJCQd5GBhK0+0xJvcYDcwlOoxF/rnFKQ2KRZknXSG9jWAp66fwP6AxN9STg==}
sumchecker@3.0.1:
resolution: {integrity: sha512-MvjXzkz/BOfyVDkG0oFOtBxHX2u3gKbMHIF/dXblZsgD3BWOFLmHovIpZY7BykJdAjcqRCBi1WYBNdEC9yI7vg==}
engines: {node: '>= 8.0'}
@ -1400,6 +1474,10 @@ packages:
resolution: {integrity: sha512-MpUEN2OodtUzxvKQl72cUF7RQ5EiHsGvSsVG0ia9c5RbWGL2CI4C7EpPS8UTBIplnlzZiNuV56w+FuNxy3ty2Q==}
engines: {node: '>=10'}
tagged-tag@1.0.0:
resolution: {integrity: sha512-yEFYrVhod+hdNyx7g5Bnkkb0G6si8HJurOoOEgC8B/O0uXLHlaey/65KRv6cuWBNhBgHKAROVpc7QyYqE5gFng==}
engines: {node: '>=20'}
tar@7.5.11:
resolution: {integrity: sha512-ChjMH33/KetonMTAtpYdgUFr0tbz69Fp2v7zWxQfYZX4g5ZN2nOBXm1R2xyA+lMIKrLKIoKAwFj93jE/avX9cQ==}
engines: {node: '>=18'}
@ -1442,11 +1520,19 @@ packages:
resolution: {integrity: sha512-34R7HTnG0XIJcBSn5XhDd7nNFPRcXYRZrBB2O2jdKqYODldSzBAqzsWoZYYvduky73toYS/ESqxPvkDf/F0XMg==}
engines: {node: '>=10'}
type-fest@5.5.0:
resolution: {integrity: sha512-PlBfpQwiUvGViBNX84Yxwjsdhd1TUlXr6zjX7eoirtCPIr08NAmxwa+fcYBTeRQxHo9YC9wwF3m9i700sHma8g==}
engines: {node: '>=20'}
typescript@5.9.3:
resolution: {integrity: sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==}
engines: {node: '>=14.17'}
hasBin: true
uint8array-extras@1.5.0:
resolution: {integrity: sha512-rvKSBiC5zqCCiDZ9kAOszZcDvdAHwwIKJG33Ykj43OKcWsnmcBRL09YTU4nOeHZ8Y2a7l1MgTd08SBe9A8Qj6A==}
engines: {node: '>=18'}
undici-types@7.16.0:
resolution: {integrity: sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw==}
@ -1490,6 +1576,9 @@ packages:
wcwidth@1.0.1:
resolution: {integrity: sha512-XHPEwS0q6TaxcvG85+8EYkbiCux2XtWG2mkc47Ng2A77BQu9+DqIOJldST4HgPkuea7dvKSj5VgX3P1d4rW8Tg==}
when-exit@2.1.5:
resolution: {integrity: sha512-VGkKJ564kzt6Ms1dbgPP/yuIoQCrsFAnRbptpC5wOEsDaNsbCB2bnfnaA8i/vRs5tjUSEOtIuvl9/MyVsvQZCg==}
which@2.0.2:
resolution: {integrity: sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==}
engines: {node: '>= 8'}
@ -1850,6 +1939,10 @@ snapshots:
agent-base@7.1.4: {}
ajv-formats@3.0.1(ajv@8.18.0):
optionalDependencies:
ajv: 8.18.0
ajv-keywords@3.5.2(ajv@6.14.0):
dependencies:
ajv: 6.14.0
@ -1861,6 +1954,13 @@ snapshots:
json-schema-traverse: 0.4.1
uri-js: 4.4.1
ajv@8.18.0:
dependencies:
fast-deep-equal: 3.1.3
fast-uri: 3.1.0
json-schema-traverse: 1.0.0
require-from-string: 2.0.2
ansi-regex@5.0.1: {}
ansi-regex@6.2.2: {}
@ -1932,6 +2032,11 @@ snapshots:
at-least-node@1.0.0: {}
atomically@2.1.1:
dependencies:
stubborn-fs: 2.0.0
when-exit: 2.1.5
axios@1.13.6:
dependencies:
follow-redirects: 1.15.11
@ -2046,6 +2151,10 @@ snapshots:
ansi-styles: 4.3.0
supports-color: 7.2.0
chokidar@5.0.0:
dependencies:
readdirp: 5.0.0
chownr@3.0.0: {}
chromium-pickle-js@0.2.0: {}
@ -2106,6 +2215,18 @@ snapshots:
tree-kill: 1.2.2
yargs: 17.7.2
conf@15.1.0:
dependencies:
ajv: 8.18.0
ajv-formats: 3.0.1(ajv@8.18.0)
atomically: 2.1.1
debounce-fn: 6.0.0
dot-prop: 10.1.0
env-paths: 3.0.0
json-schema-typed: 8.0.2
semver: 7.7.4
uint8array-extras: 1.5.0
core-util-is@1.0.2:
optional: true
@ -2123,6 +2244,10 @@ snapshots:
shebang-command: 2.0.0
which: 2.0.2
debounce-fn@6.0.0:
dependencies:
mimic-function: 5.0.1
debug@4.4.3:
dependencies:
ms: 2.1.3
@ -2188,6 +2313,10 @@ snapshots:
verror: 1.10.1
optional: true
dot-prop@10.1.0:
dependencies:
type-fest: 5.5.0
dotenv-expand@11.0.7:
dependencies:
dotenv: 16.6.1
@ -2246,6 +2375,11 @@ snapshots:
transitivePeerDependencies:
- supports-color
electron-store@11.0.2:
dependencies:
conf: 15.1.0
type-fest: 5.5.0
electron-updater@6.8.3:
dependencies:
builder-util-runtime: 9.5.1
@ -2264,7 +2398,7 @@ snapshots:
'@electron/asar': 3.4.1
debug: 4.4.3
fs-extra: 7.0.1
lodash: 4.17.23
lodash: 4.18.1
temp: 0.9.4
optionalDependencies:
'@electron/windows-sign': 1.2.2
@ -2294,6 +2428,8 @@ snapshots:
env-paths@2.2.1: {}
env-paths@3.0.0: {}
err-code@2.0.3: {}
es-define-property@1.0.1: {}
@ -2367,6 +2503,8 @@ snapshots:
fast-json-stable-stringify@2.1.0: {}
fast-uri@3.1.0: {}
fd-slicer@1.1.0:
dependencies:
pend: 1.2.0
@ -2624,6 +2762,10 @@ snapshots:
json-schema-traverse@0.4.1: {}
json-schema-traverse@1.0.0: {}
json-schema-typed@8.0.2: {}
json-stringify-safe@5.0.1:
optional: true
@ -2651,6 +2793,8 @@ snapshots:
lodash@4.17.23: {}
lodash@4.18.1: {}
log-symbols@4.1.0:
dependencies:
chalk: 4.1.2
@ -2697,6 +2841,8 @@ snapshots:
mimic-fn@2.1.0: {}
mimic-function@5.0.1: {}
mimic-response@1.0.1: {}
mimic-response@3.1.0: {}
@ -2899,8 +3045,12 @@ snapshots:
string_decoder: 1.3.0
util-deprecate: 1.0.2
readdirp@5.0.0: {}
require-directory@2.1.1: {}
require-from-string@2.0.2: {}
resedit@1.7.2:
dependencies:
pe-library: 0.4.1
@ -3038,6 +3188,12 @@ snapshots:
dependencies:
ansi-regex: 6.2.2
stubborn-fs@2.0.0:
dependencies:
stubborn-utils: 1.0.2
stubborn-utils@1.0.2: {}
sumchecker@3.0.1:
dependencies:
debug: 4.4.3
@ -3052,6 +3208,8 @@ snapshots:
dependencies:
has-flag: 4.0.0
tagged-tag@1.0.0: {}
tar@7.5.11:
dependencies:
'@isaacs/fs-minipass': 4.0.1
@ -3098,8 +3256,14 @@ snapshots:
type-fest@0.13.1:
optional: true
type-fest@5.5.0:
dependencies:
tagged-tag: 1.0.0
typescript@5.9.3: {}
uint8array-extras@1.5.0: {}
undici-types@7.16.0: {}
undici-types@7.18.2: {}
@ -3145,6 +3309,8 @@ snapshots:
dependencies:
defaults: 1.0.4
when-exit@2.1.5: {}
which@2.0.2:
dependencies:
isexe: 2.0.0

View file

@ -17,4 +17,19 @@ export const IPC_CHANNELS = {
DISMISS_SUGGESTION: 'dismiss-suggestion',
SET_AUTOCOMPLETE_ENABLED: 'set-autocomplete-enabled',
GET_AUTOCOMPLETE_ENABLED: 'get-autocomplete-enabled',
// Folder sync channels
FOLDER_SYNC_SELECT_FOLDER: 'folder-sync:select-folder',
FOLDER_SYNC_ADD_FOLDER: 'folder-sync:add-folder',
FOLDER_SYNC_REMOVE_FOLDER: 'folder-sync:remove-folder',
FOLDER_SYNC_GET_FOLDERS: 'folder-sync:get-folders',
FOLDER_SYNC_GET_STATUS: 'folder-sync:get-status',
FOLDER_SYNC_FILE_CHANGED: 'folder-sync:file-changed',
FOLDER_SYNC_WATCHER_READY: 'folder-sync:watcher-ready',
FOLDER_SYNC_PAUSE: 'folder-sync:pause',
FOLDER_SYNC_RESUME: 'folder-sync:resume',
FOLDER_SYNC_RENDERER_READY: 'folder-sync:renderer-ready',
FOLDER_SYNC_GET_PENDING_EVENTS: 'folder-sync:get-pending-events',
FOLDER_SYNC_ACK_EVENTS: 'folder-sync:ack-events',
BROWSE_FILES: 'browse:files',
READ_LOCAL_FILES: 'browse:read-local-files',
} as const;

View file

@ -6,6 +6,20 @@ import {
requestScreenRecording,
restartApp,
} from '../modules/permissions';
import {
selectFolder,
addWatchedFolder,
removeWatchedFolder,
getWatchedFolders,
getWatcherStatus,
getPendingFileEvents,
acknowledgeFileEvents,
pauseWatcher,
resumeWatcher,
markRendererReady,
browseFiles,
readLocalFiles,
} from '../modules/folder-watcher';
export function registerIpcHandlers(): void {
ipcMain.on(IPC_CHANNELS.OPEN_EXTERNAL, (_event, url: string) => {
@ -38,4 +52,41 @@ export function registerIpcHandlers(): void {
ipcMain.handle(IPC_CHANNELS.RESTART_APP, () => {
restartApp();
});
// Folder sync handlers
ipcMain.handle(IPC_CHANNELS.FOLDER_SYNC_SELECT_FOLDER, () => selectFolder());
ipcMain.handle(IPC_CHANNELS.FOLDER_SYNC_ADD_FOLDER, (_event, config) =>
addWatchedFolder(config)
);
ipcMain.handle(IPC_CHANNELS.FOLDER_SYNC_REMOVE_FOLDER, (_event, folderPath: string) =>
removeWatchedFolder(folderPath)
);
ipcMain.handle(IPC_CHANNELS.FOLDER_SYNC_GET_FOLDERS, () => getWatchedFolders());
ipcMain.handle(IPC_CHANNELS.FOLDER_SYNC_GET_STATUS, () => getWatcherStatus());
ipcMain.handle(IPC_CHANNELS.FOLDER_SYNC_PAUSE, () => pauseWatcher());
ipcMain.handle(IPC_CHANNELS.FOLDER_SYNC_RESUME, () => resumeWatcher());
ipcMain.handle(IPC_CHANNELS.FOLDER_SYNC_RENDERER_READY, () => {
markRendererReady();
});
ipcMain.handle(IPC_CHANNELS.FOLDER_SYNC_GET_PENDING_EVENTS, () =>
getPendingFileEvents()
);
ipcMain.handle(IPC_CHANNELS.FOLDER_SYNC_ACK_EVENTS, (_event, eventIds: string[]) =>
acknowledgeFileEvents(eventIds)
);
ipcMain.handle(IPC_CHANNELS.BROWSE_FILES, () => browseFiles());
ipcMain.handle(IPC_CHANNELS.READ_LOCAL_FILES, (_event, paths: string[]) =>
readLocalFiles(paths)
);
}

View file

@ -7,6 +7,7 @@ import { setupAutoUpdater } from './modules/auto-updater';
import { setupMenu } from './modules/menu';
import { registerQuickAsk, unregisterQuickAsk } from './modules/quick-ask';
import { registerAutocomplete, unregisterAutocomplete } from './modules/autocomplete';
import { registerFolderWatcher, unregisterFolderWatcher } from './modules/folder-watcher';
import { registerIpcHandlers } from './ipc/handlers';
registerGlobalErrorHandlers();
@ -30,6 +31,7 @@ app.whenReady().then(async () => {
createMainWindow('/dashboard');
registerQuickAsk();
registerAutocomplete();
registerFolderWatcher();
setupAutoUpdater();
handlePendingDeepLink();
@ -50,4 +52,5 @@ app.on('window-all-closed', () => {
app.on('will-quit', () => {
unregisterQuickAsk();
unregisterAutocomplete();
unregisterFolderWatcher();
});

View file

@ -0,0 +1,534 @@
import { BrowserWindow, dialog } from 'electron';
import chokidar, { type FSWatcher } from 'chokidar';
import { randomUUID } from 'crypto';
import * as path from 'path';
import * as fs from 'fs';
import { IPC_CHANNELS } from '../ipc/channels';
export interface WatchedFolderConfig {
path: string;
name: string;
excludePatterns: string[];
fileExtensions: string[] | null;
rootFolderId: number | null;
searchSpaceId: number;
active: boolean;
}
interface WatcherEntry {
config: WatchedFolderConfig;
watcher: FSWatcher | null;
}
type MtimeMap = Record<string, number>;
type FolderSyncAction = 'add' | 'change' | 'unlink';
export interface FolderSyncFileChangedEvent {
id: string;
rootFolderId: number | null;
searchSpaceId: number;
folderPath: string;
folderName: string;
relativePath: string;
fullPath: string;
action: FolderSyncAction;
timestamp: number;
}
const STORE_KEY = 'watchedFolders';
const OUTBOX_STORE_KEY = 'events';
const MTIME_TOLERANCE_S = 1.0;
let store: any = null;
let mtimeStore: any = null;
let outboxStore: any = null;
let watchers: Map<string, WatcherEntry> = new Map();
/**
* In-memory cache of mtime maps, keyed by folder path.
* Persisted to electron-store on mutation.
*/
const mtimeMaps: Map<string, MtimeMap> = new Map();
let rendererReady = false;
const outboxEvents: Map<string, FolderSyncFileChangedEvent> = new Map();
let outboxLoaded = false;
export function markRendererReady() {
rendererReady = true;
}
async function getStore() {
if (!store) {
const { default: Store } = await import('electron-store');
store = new Store({
name: 'folder-watcher',
defaults: {
[STORE_KEY]: [] as WatchedFolderConfig[],
},
});
}
return store;
}
async function getMtimeStore() {
if (!mtimeStore) {
const { default: Store } = await import('electron-store');
mtimeStore = new Store({
name: 'folder-mtime-maps',
defaults: {} as Record<string, MtimeMap>,
});
}
return mtimeStore;
}
async function getOutboxStore() {
if (!outboxStore) {
const { default: Store } = await import('electron-store');
outboxStore = new Store({
name: 'folder-sync-outbox',
defaults: {
[OUTBOX_STORE_KEY]: [] as FolderSyncFileChangedEvent[],
},
});
}
return outboxStore;
}
function makeEventKey(event: Pick<FolderSyncFileChangedEvent, 'folderPath' | 'relativePath'>): string {
return `${event.folderPath}:${event.relativePath}`;
}
function persistOutbox() {
getOutboxStore().then((s) => {
s.set(OUTBOX_STORE_KEY, Array.from(outboxEvents.values()));
});
}
async function loadOutbox() {
if (outboxLoaded) return;
const s = await getOutboxStore();
const stored: FolderSyncFileChangedEvent[] = s.get(OUTBOX_STORE_KEY, []);
outboxEvents.clear();
for (const event of stored) {
if (!event?.id || !event.folderPath || !event.relativePath) continue;
outboxEvents.set(makeEventKey(event), event);
}
outboxLoaded = true;
}
function sendFileChangedEvent(
data: Omit<FolderSyncFileChangedEvent, 'id'>
) {
const event: FolderSyncFileChangedEvent = {
id: randomUUID(),
...data,
};
outboxEvents.set(makeEventKey(event), event);
persistOutbox();
if (rendererReady) {
sendToRenderer(IPC_CHANNELS.FOLDER_SYNC_FILE_CHANGED, event);
}
}
function loadMtimeMap(folderPath: string): MtimeMap {
return mtimeMaps.get(folderPath) ?? {};
}
function persistMtimeMap(folderPath: string) {
const map = mtimeMaps.get(folderPath) ?? {};
getMtimeStore().then((s) => s.set(folderPath, map));
}
function walkFolderMtimes(config: WatchedFolderConfig): MtimeMap {
const root = config.path;
const result: MtimeMap = {};
const excludes = new Set(config.excludePatterns);
function walk(dir: string) {
let entries: fs.Dirent[];
try {
entries = fs.readdirSync(dir, { withFileTypes: true });
} catch {
return;
}
for (const entry of entries) {
const name = entry.name;
if (name.startsWith('.') || excludes.has(name)) continue;
const full = path.join(dir, name);
if (entry.isDirectory()) {
walk(full);
} else if (entry.isFile()) {
if (
config.fileExtensions &&
config.fileExtensions.length > 0
) {
const ext = path.extname(name).toLowerCase();
if (!config.fileExtensions.includes(ext)) continue;
}
try {
const stat = fs.statSync(full);
const rel = path.relative(root, full);
result[rel] = stat.mtimeMs;
} catch {
// File may have been removed between readdir and stat
}
}
}
}
walk(root);
return result;
}
function getMainWindow(): BrowserWindow | null {
const windows = BrowserWindow.getAllWindows();
return windows.length > 0 ? windows[0] : null;
}
function sendToRenderer(channel: string, data: any) {
const win = getMainWindow();
if (win && !win.isDestroyed()) {
win.webContents.send(channel, data);
}
}
async function startWatcher(config: WatchedFolderConfig) {
if (watchers.has(config.path)) {
return;
}
const ms = await getMtimeStore();
const storedMap: MtimeMap = ms.get(config.path) ?? {};
mtimeMaps.set(config.path, { ...storedMap });
const ignored = [
/(^|[/\\])\../, // dotfiles by default
...config.excludePatterns.map((p) => `**/${p}/**`),
];
const watcher = chokidar.watch(config.path, {
persistent: true,
ignoreInitial: true,
awaitWriteFinish: {
stabilityThreshold: 500,
pollInterval: 100,
},
ignored,
});
let ready = false;
watcher.on('ready', () => {
ready = true;
const currentMap = walkFolderMtimes(config);
const storedSnapshot = loadMtimeMap(config.path);
const now = Date.now();
// Track which files are unchanged so we can selectively update the mtime map
const unchangedMap: MtimeMap = {};
for (const [rel, currentMtime] of Object.entries(currentMap)) {
const storedMtime = storedSnapshot[rel];
if (storedMtime === undefined) {
sendFileChangedEvent({
rootFolderId: config.rootFolderId,
searchSpaceId: config.searchSpaceId,
folderPath: config.path,
folderName: config.name,
relativePath: rel,
fullPath: path.join(config.path, rel),
action: 'add',
timestamp: now,
});
} else if (Math.abs(currentMtime - storedMtime) >= MTIME_TOLERANCE_S * 1000) {
sendFileChangedEvent({
rootFolderId: config.rootFolderId,
searchSpaceId: config.searchSpaceId,
folderPath: config.path,
folderName: config.name,
relativePath: rel,
fullPath: path.join(config.path, rel),
action: 'change',
timestamp: now,
});
} else {
unchangedMap[rel] = currentMtime;
}
}
for (const rel of Object.keys(storedSnapshot)) {
if (!(rel in currentMap)) {
sendFileChangedEvent({
rootFolderId: config.rootFolderId,
searchSpaceId: config.searchSpaceId,
folderPath: config.path,
folderName: config.name,
relativePath: rel,
fullPath: path.join(config.path, rel),
action: 'unlink',
timestamp: now,
});
}
}
// Only update the mtime map for unchanged files; changed files keep their
// stored mtime so they'll be re-detected if the app crashes before indexing.
mtimeMaps.set(config.path, unchangedMap);
persistMtimeMap(config.path);
sendToRenderer(IPC_CHANNELS.FOLDER_SYNC_WATCHER_READY, {
rootFolderId: config.rootFolderId,
folderPath: config.path,
});
});
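// Illustrative run of the 'ready' reconciliation above: with a persisted snapshot of
// { "a.md": 1000, "b.md": 5000 } and an on-disk walk of { "a.md": 1000, "b.md": 9000, "c.md": 3000 },
// "c.md" emits 'add', "b.md" emits 'change' (outside the mtime tolerance), "a.md" is unchanged, and a
// stored path missing from disk would emit 'unlink'. Only "a.md" enters unchangedMap, so the changed
// files stay detectable until the renderer indexes and acknowledges them.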
const handleFileEvent = (filePath: string, action: FolderSyncAction) => {
if (!ready) return;
const relativePath = path.relative(config.path, filePath);
if (
config.fileExtensions &&
config.fileExtensions.length > 0
) {
const ext = path.extname(filePath).toLowerCase();
if (!config.fileExtensions.includes(ext)) return;
}
const map = mtimeMaps.get(config.path);
if (map) {
if (action === 'unlink') {
delete map[relativePath];
} else {
try {
map[relativePath] = fs.statSync(filePath).mtimeMs;
} catch {
// File may have been removed between event and stat
}
}
persistMtimeMap(config.path);
}
sendFileChangedEvent({
rootFolderId: config.rootFolderId,
searchSpaceId: config.searchSpaceId,
folderPath: config.path,
folderName: config.name,
relativePath,
fullPath: filePath,
action,
timestamp: Date.now(),
});
};
watcher.on('add', (fp) => handleFileEvent(fp, 'add'));
watcher.on('change', (fp) => handleFileEvent(fp, 'change'));
watcher.on('unlink', (fp) => handleFileEvent(fp, 'unlink'));
watchers.set(config.path, { config, watcher });
}
function stopWatcher(folderPath: string) {
persistMtimeMap(folderPath);
const entry = watchers.get(folderPath);
if (entry?.watcher) {
entry.watcher.close();
}
watchers.delete(folderPath);
}
export async function selectFolder(): Promise<string | null> {
const result = await dialog.showOpenDialog({
properties: ['openDirectory'],
title: 'Select a folder to watch',
});
if (result.canceled || result.filePaths.length === 0) {
return null;
}
return result.filePaths[0];
}
export async function addWatchedFolder(
config: WatchedFolderConfig
): Promise<WatchedFolderConfig[]> {
const s = await getStore();
const folders: WatchedFolderConfig[] = s.get(STORE_KEY, []);
const existing = folders.findIndex((f: WatchedFolderConfig) => f.path === config.path);
if (existing >= 0) {
folders[existing] = config;
} else {
folders.push(config);
}
s.set(STORE_KEY, folders);
if (config.active) {
await startWatcher(config);
}
return folders;
}
export async function removeWatchedFolder(
folderPath: string
): Promise<WatchedFolderConfig[]> {
const s = await getStore();
const folders: WatchedFolderConfig[] = s.get(STORE_KEY, []);
const updated = folders.filter((f: WatchedFolderConfig) => f.path !== folderPath);
s.set(STORE_KEY, updated);
stopWatcher(folderPath);
mtimeMaps.delete(folderPath);
const ms = await getMtimeStore();
ms.delete(folderPath);
return updated;
}
export async function getWatchedFolders(): Promise<WatchedFolderConfig[]> {
const s = await getStore();
return s.get(STORE_KEY, []);
}
export async function getWatcherStatus(): Promise<
{ path: string; active: boolean; watching: boolean }[]
> {
const s = await getStore();
const folders: WatchedFolderConfig[] = s.get(STORE_KEY, []);
return folders.map((f: WatchedFolderConfig) => ({
path: f.path,
active: f.active,
watching: watchers.has(f.path),
}));
}
export async function getPendingFileEvents(): Promise<FolderSyncFileChangedEvent[]> {
await loadOutbox();
return Array.from(outboxEvents.values()).sort((a, b) => a.timestamp - b.timestamp);
}
export async function acknowledgeFileEvents(eventIds: string[]): Promise<{ acknowledged: number }> {
if (!eventIds || eventIds.length === 0) return { acknowledged: 0 };
await loadOutbox();
const ackSet = new Set(eventIds);
let acknowledged = 0;
for (const [key, event] of outboxEvents.entries()) {
if (ackSet.has(event.id)) {
outboxEvents.delete(key);
acknowledged += 1;
}
}
if (acknowledged > 0) {
persistOutbox();
}
return { acknowledged };
}
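// Outbox flow (illustrative): pending events stay in outboxEvents (persisted to disk) until the
// renderer confirms indexing, so nothing is lost if the app quits mid-sync. A typical consumer,
// reached via the preload bridge:
//   const events = await getPendingFileEvents();          // oldest first
//   ...index them...
//   await acknowledgeFileEvents(events.map((e) => e.id)); // removes them from the outbox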
export async function pauseWatcher(): Promise<void> {
for (const [, entry] of watchers) {
if (entry.watcher) {
await entry.watcher.close();
entry.watcher = null;
}
}
}
export async function resumeWatcher(): Promise<void> {
for (const [, entry] of watchers) {
if (!entry.watcher && entry.config.active) {
await startWatcher(entry.config);
}
}
}
export async function registerFolderWatcher(): Promise<void> {
await loadOutbox();
const s = await getStore();
const folders: WatchedFolderConfig[] = s.get(STORE_KEY, []);
for (const config of folders) {
if (config.active && fs.existsSync(config.path)) {
await startWatcher(config);
}
}
}
export async function unregisterFolderWatcher(): Promise<void> {
for (const [folderPath] of watchers) {
stopWatcher(folderPath);
}
watchers.clear();
}
export async function browseFiles(): Promise<string[] | null> {
const result = await dialog.showOpenDialog({
properties: ['openFile', 'multiSelections'],
title: 'Select files',
});
if (result.canceled || result.filePaths.length === 0) return null;
return result.filePaths;
}
const MIME_MAP: Record<string, string> = {
'.pdf': 'application/pdf',
'.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
'.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
'.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
'.html': 'text/html', '.htm': 'text/html',
'.csv': 'text/csv',
'.txt': 'text/plain',
'.md': 'text/markdown', '.markdown': 'text/markdown',
'.mp3': 'audio/mpeg', '.mpeg': 'audio/mpeg', '.mpga': 'audio/mpeg',
'.mp4': 'audio/mp4', '.m4a': 'audio/mp4',
'.wav': 'audio/wav',
'.webm': 'audio/webm',
'.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',
'.png': 'image/png',
'.bmp': 'image/bmp',
'.webp': 'image/webp',
'.tiff': 'image/tiff',
'.doc': 'application/msword',
'.rtf': 'application/rtf',
'.xml': 'application/xml',
'.epub': 'application/epub+zip',
'.xls': 'application/vnd.ms-excel',
'.ppt': 'application/vnd.ms-powerpoint',
'.eml': 'message/rfc822',
'.odt': 'application/vnd.oasis.opendocument.text',
'.msg': 'application/vnd.ms-outlook',
};
export interface LocalFileData {
name: string;
data: ArrayBuffer;
mimeType: string;
size: number;
}
export function readLocalFiles(filePaths: string[]): LocalFileData[] {
return filePaths.map((p) => {
const buf = fs.readFileSync(p);
const ext = path.extname(p).toLowerCase();
return {
name: path.basename(p),
data: buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength),
mimeType: MIME_MAP[ext] || 'application/octet-stream',
size: buf.byteLength,
};
});
}
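// Illustrative renderer-side use (sketch): the returned ArrayBuffers survive the IPC structured
// clone, so they can be wrapped back into File objects for upload:
//   const files = (await window.electronAPI.readLocalFiles(paths)).map(
//     (f) => new File([f.data], f.name, { type: f.mimeType })
//   );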

View file

@ -38,4 +38,34 @@ contextBridge.exposeInMainWorld('electronAPI', {
dismissSuggestion: () => ipcRenderer.invoke(IPC_CHANNELS.DISMISS_SUGGESTION),
setAutocompleteEnabled: (enabled: boolean) => ipcRenderer.invoke(IPC_CHANNELS.SET_AUTOCOMPLETE_ENABLED, enabled),
getAutocompleteEnabled: () => ipcRenderer.invoke(IPC_CHANNELS.GET_AUTOCOMPLETE_ENABLED),
// Folder sync
selectFolder: () => ipcRenderer.invoke(IPC_CHANNELS.FOLDER_SYNC_SELECT_FOLDER),
addWatchedFolder: (config: any) => ipcRenderer.invoke(IPC_CHANNELS.FOLDER_SYNC_ADD_FOLDER, config),
removeWatchedFolder: (folderPath: string) => ipcRenderer.invoke(IPC_CHANNELS.FOLDER_SYNC_REMOVE_FOLDER, folderPath),
getWatchedFolders: () => ipcRenderer.invoke(IPC_CHANNELS.FOLDER_SYNC_GET_FOLDERS),
getWatcherStatus: () => ipcRenderer.invoke(IPC_CHANNELS.FOLDER_SYNC_GET_STATUS),
onFileChanged: (callback: (data: any) => void) => {
const listener = (_event: unknown, data: any) => callback(data);
ipcRenderer.on(IPC_CHANNELS.FOLDER_SYNC_FILE_CHANGED, listener);
return () => {
ipcRenderer.removeListener(IPC_CHANNELS.FOLDER_SYNC_FILE_CHANGED, listener);
};
},
onWatcherReady: (callback: (data: any) => void) => {
const listener = (_event: unknown, data: any) => callback(data);
ipcRenderer.on(IPC_CHANNELS.FOLDER_SYNC_WATCHER_READY, listener);
return () => {
ipcRenderer.removeListener(IPC_CHANNELS.FOLDER_SYNC_WATCHER_READY, listener);
};
},
pauseWatcher: () => ipcRenderer.invoke(IPC_CHANNELS.FOLDER_SYNC_PAUSE),
resumeWatcher: () => ipcRenderer.invoke(IPC_CHANNELS.FOLDER_SYNC_RESUME),
signalRendererReady: () => ipcRenderer.invoke(IPC_CHANNELS.FOLDER_SYNC_RENDERER_READY),
getPendingFileEvents: () => ipcRenderer.invoke(IPC_CHANNELS.FOLDER_SYNC_GET_PENDING_EVENTS),
acknowledgeFileEvents: (eventIds: string[]) => ipcRenderer.invoke(IPC_CHANNELS.FOLDER_SYNC_ACK_EVENTS, eventIds),
// Browse files via native dialog
browseFiles: () => ipcRenderer.invoke(IPC_CHANNELS.BROWSE_FILES),
readLocalFiles: (paths: string[]) => ipcRenderer.invoke(IPC_CHANNELS.READ_LOCAL_FILES, paths),
});
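// Illustrative renderer-side wiring of the folder-sync bridge above (sketch; reindexFile is a
// hypothetical app-specific handler, not part of this API):
//   const pending = await window.electronAPI.getPendingFileEvents();
//   for (const evt of pending) await reindexFile(evt);
//   await window.electronAPI.acknowledgeFileEvents(pending.map((e) => e.id));
//   const unsubscribe = window.electronAPI.onFileChanged((evt) => reindexFile(evt));
//   // later, e.g. on unmount: unsubscribe();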

View file

@ -29,7 +29,7 @@ interface ChangelogPageItem {
export default async function ChangelogPage() {
const allPages = source.getPages() as ChangelogPageItem[];
const sortedChangelogs = allPages.sort((a, b) => {
const sortedChangelogs = allPages.toSorted((a, b) => {
const dateA = new Date(a.data.date).getTime();
const dateB = new Date(b.data.date).getTime();
return dateB - dateA;

View file

@ -160,10 +160,10 @@ export function LocalLoginForm() {
placeholder="you@example.com"
value={username}
onChange={(e) => setUsername(e.target.value)}
className={`mt-1 block w-full rounded-md border px-3 py-1.5 md:py-2 shadow-sm focus:outline-none focus:ring-2 focus:ring-offset-2 bg-background text-foreground transition-all ${
className={`mt-1 block w-full rounded-md border px-3 py-1.5 md:py-2 shadow-sm focus:outline-none focus:ring-1 bg-background text-foreground transition-all ${
error.title
? "border-destructive focus:border-destructive focus:ring-destructive"
: "border-border focus:border-primary focus:ring-primary"
? "border-destructive focus:border-destructive focus:ring-destructive/40"
: "border-border focus:border-primary focus:ring-primary/40"
}`}
disabled={isLoggingIn}
/>
@ -181,10 +181,10 @@ export function LocalLoginForm() {
placeholder="Enter your password"
value={password}
onChange={(e) => setPassword(e.target.value)}
className={`mt-1 block w-full rounded-md border pr-10 px-3 py-1.5 md:py-2 shadow-sm focus:outline-none focus:ring-2 focus:ring-offset-2 bg-background text-foreground transition-all ${
className={`mt-1 block w-full rounded-md border pr-10 px-3 py-1.5 md:py-2 shadow-sm focus:outline-none focus:ring-1 bg-background text-foreground transition-all ${
error.title
? "border-destructive focus:border-destructive focus:ring-destructive"
: "border-border focus:border-primary focus:ring-primary"
? "border-destructive focus:border-destructive focus:ring-destructive/40"
: "border-border focus:border-primary focus:ring-primary/40"
}`}
disabled={isLoggingIn}
/>

View file

@ -115,7 +115,7 @@ function LoginContent() {
<div className="relative w-full overflow-hidden">
<AmbientBackground />
<div className="mx-auto flex h-screen max-w-lg flex-col items-center justify-center">
<Logo className="h-16 w-16 md:h-32 md:w-32 rounded-md transition-all" />
<Logo priority className="h-16 w-16 md:h-32 md:w-32 rounded-md transition-all" />
<h1 className="mt-4 mb-6 text-xl font-bold text-neutral-800 dark:text-neutral-100 md:mt-8 md:mb-8 md:text-3xl lg:text-4xl transition-all">
{t("sign_in")}
</h1>

View file

@ -160,7 +160,7 @@ export default function RegisterPage() {
<div className="relative w-full overflow-hidden">
<AmbientBackground />
<div className="mx-auto flex h-screen max-w-lg flex-col items-center justify-center px-6 md:px-0">
<Logo className="h-16 w-16 md:h-32 md:w-32 rounded-md transition-all" />
<Logo priority className="h-16 w-16 md:h-32 md:w-32 rounded-md transition-all" />
<h1 className="mt-4 mb-6 text-xl font-bold text-neutral-800 dark:text-neutral-100 md:mt-8 md:mb-8 md:text-3xl lg:text-4xl transition-all">
{t("create_account")}
</h1>
@ -229,10 +229,7 @@ export default function RegisterPage() {
</AnimatePresence>
<div>
<label
htmlFor="email"
className="block text-sm font-medium text-gray-700 dark:text-gray-300"
>
<label htmlFor="email" className="block text-sm font-medium text-foreground">
{t("email")}
</label>
<input
@ -242,20 +239,17 @@ export default function RegisterPage() {
placeholder="you@example.com"
value={email}
onChange={(e) => setEmail(e.target.value)}
className={`mt-1 block w-full rounded-md border px-3 py-1.5 md:py-2 shadow-sm focus:outline-none focus:ring-2 focus:ring-offset-2 dark:bg-gray-800 dark:text-white transition-all ${
className={`mt-1 block w-full rounded-md border px-3 py-1.5 md:py-2 shadow-sm focus:outline-none focus:ring-1 bg-background text-foreground transition-all ${
error.title
? "border-red-300 focus:border-red-500 focus:ring-red-500 dark:border-red-700"
: "border-gray-300 focus:border-blue-500 focus:ring-blue-500 dark:border-gray-700"
? "border-destructive focus:border-destructive focus:ring-destructive/40"
: "border-border focus:border-primary focus:ring-primary/40"
}`}
disabled={isRegistering}
/>
</div>
<div>
<label
htmlFor="password"
className="block text-sm font-medium text-gray-700 dark:text-gray-300"
>
<label htmlFor="password" className="block text-sm font-medium text-foreground">
{t("password")}
</label>
<input
@ -265,10 +259,10 @@ export default function RegisterPage() {
placeholder="Enter your password"
value={password}
onChange={(e) => setPassword(e.target.value)}
className={`mt-1 block w-full rounded-md border px-3 py-1.5 md:py-2 shadow-sm focus:outline-none focus:ring-2 focus:ring-offset-2 dark:bg-gray-800 dark:text-white transition-all ${
className={`mt-1 block w-full rounded-md border px-3 py-1.5 md:py-2 shadow-sm focus:outline-none focus:ring-1 bg-background text-foreground transition-all ${
error.title
? "border-red-300 focus:border-red-500 focus:ring-red-500 dark:border-red-700"
: "border-gray-300 focus:border-blue-500 focus:ring-blue-500 dark:border-gray-700"
? "border-destructive focus:border-destructive focus:ring-destructive/40"
: "border-border focus:border-primary focus:ring-primary/40"
}`}
disabled={isRegistering}
/>
@ -277,7 +271,7 @@ export default function RegisterPage() {
<div>
<label
htmlFor="confirmPassword"
className="block text-sm font-medium text-gray-700 dark:text-gray-300"
className="block text-sm font-medium text-foreground"
>
{t("confirm_password")}
</label>
@ -288,10 +282,10 @@ export default function RegisterPage() {
placeholder="Confirm your password"
value={confirmPassword}
onChange={(e) => setConfirmPassword(e.target.value)}
className={`mt-1 block w-full rounded-md border px-3 py-1.5 md:py-2 shadow-sm focus:outline-none focus:ring-2 focus:ring-offset-2 dark:bg-gray-800 dark:text-white transition-all ${
className={`mt-1 block w-full rounded-md border px-3 py-1.5 md:py-2 shadow-sm focus:outline-none focus:ring-1 bg-background text-foreground transition-all ${
error.title
? "border-red-300 focus:border-red-500 focus:ring-red-500 dark:border-red-700"
: "border-gray-300 focus:border-blue-500 focus:ring-blue-500 dark:border-gray-700"
? "border-destructive focus:border-destructive focus:ring-destructive/40"
: "border-border focus:border-primary focus:ring-primary/40"
}`}
disabled={isRegistering}
/>
@ -300,7 +294,7 @@ export default function RegisterPage() {
<button
type="submit"
disabled={isRegistering}
className="relative w-full rounded-md bg-blue-600 px-4 py-1.5 md:py-2 text-white shadow-sm hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-offset-2 disabled:cursor-not-allowed disabled:opacity-50 transition-all text-sm md:text-base flex items-center justify-center gap-2"
className="relative w-full rounded-md bg-primary px-4 py-1.5 md:py-2 text-primary-foreground shadow-sm hover:bg-primary/90 focus:outline-none focus:ring-1 focus:ring-primary/40 disabled:cursor-not-allowed disabled:opacity-50 transition-all text-sm md:text-base flex items-center justify-center gap-2"
>
<span className={isRegistering ? "invisible" : ""}>{t("register")}</span>
{isRegistering && (
@ -312,12 +306,9 @@ export default function RegisterPage() {
</form>
<div className="mt-4 text-center text-sm">
<p className="text-gray-600 dark:text-gray-400">
<p className="text-muted-foreground">
{t("already_have_account")}{" "}
<Link
href="/login"
className="font-medium text-blue-600 hover:text-blue-500 dark:text-blue-400"
>
<Link href="/login" className="font-medium text-primary hover:text-primary/90">
{t("sign_in")}
</Link>
</p>

View file

@ -17,6 +17,7 @@ import { DocumentUploadDialogProvider } from "@/components/assistant-ui/document
import { LayoutDataProvider } from "@/components/layout";
import { OnboardingTour } from "@/components/onboarding-tour";
import { Card, CardContent, CardDescription, CardHeader, CardTitle } from "@/components/ui/card";
import { useFolderSync } from "@/hooks/use-folder-sync";
import { useGlobalLoadingEffect } from "@/hooks/use-global-loading";
export function DashboardClientLayout({
@ -159,6 +160,9 @@ export function DashboardClientLayout({
// Use global loading screen - spinner animation won't reset
useGlobalLoadingEffect(shouldShowLoading);
// Wire desktop app file watcher -> single-file re-index API
useFolderSync();
if (shouldShowLoading) {
return null;
}

View file

@ -35,6 +35,7 @@ export function getDocumentTypeLabel(type: string): string {
BOOKSTACK_CONNECTOR: "BookStack",
CIRCLEBACK: "Circleback",
OBSIDIAN_CONNECTOR: "Obsidian",
LOCAL_FOLDER_FILE: "Local Folder",
SURFSENSE_DOCS: "SurfSense Docs",
NOTE: "Note",
COMPOSIO_GOOGLE_DRIVE_CONNECTOR: "Composio Google Drive",

View file

@ -267,12 +267,23 @@ export function DocumentsTableShell({
const [metadataJson, setMetadataJson] = useState<Record<string, unknown> | null>(null);
const [metadataLoading, setMetadataLoading] = useState(false);
const [previewScrollPos, setPreviewScrollPos] = useState<"top" | "middle" | "bottom">("top");
const previewRafRef = useRef<number>();
const handlePreviewScroll = useCallback((e: React.UIEvent<HTMLDivElement>) => {
const el = e.currentTarget;
const atTop = el.scrollTop <= 2;
const atBottom = el.scrollHeight - el.scrollTop - el.clientHeight <= 2;
setPreviewScrollPos(atTop ? "top" : atBottom ? "bottom" : "middle");
if (previewRafRef.current) return;
previewRafRef.current = requestAnimationFrame(() => {
const atTop = el.scrollTop <= 2;
const atBottom = el.scrollHeight - el.scrollTop - el.clientHeight <= 2;
setPreviewScrollPos(atTop ? "top" : atBottom ? "bottom" : "middle");
previewRafRef.current = undefined;
});
}, []);
useEffect(
() => () => {
if (previewRafRef.current) cancelAnimationFrame(previewRafRef.current);
},
[]
);
const [deleteDoc, setDeleteDoc] = useState<Document | null>(null);
const [isDeleting, setIsDeleting] = useState(false);
@ -329,14 +340,15 @@ export function DocumentsTableShell({
const handleViewDocument = useCallback(async (doc: Document) => {
setViewingDoc(doc);
if (doc.content) {
setViewingContent(doc.content);
const preview = doc.content_preview || doc.content;
if (preview) {
setViewingContent(preview);
return;
}
setViewingLoading(true);
try {
const fullDoc = await documentsApiService.getDocument({ id: doc.id });
setViewingContent(fullDoc.content);
setViewingContent(fullDoc.content_preview || fullDoc.content);
} catch (err) {
console.error("[DocumentsTableShell] Failed to fetch document content:", err);
setViewingContent("Failed to load document content.");
@ -630,7 +642,7 @@ export function DocumentsTableShell({
return (
<tr
key={doc.id}
className={`group border-b border-border/50 transition-colors ${
className={`list-item-lazy group border-b border-border/50 transition-colors ${
isMentioned ? "bg-primary/5 hover:bg-primary/8" : "hover:bg-muted/30"
} ${canInteract && hasChatMode ? "cursor-pointer" : ""}`}
onClick={handleRowClick}
@ -748,6 +760,7 @@ export function DocumentsTableShell({
onClick={() =>
onOpenInTab ? onOpenInTab(doc) : handleViewDocument(doc)
}
disabled={isBeingProcessed}
>
<Eye className="h-4 w-4" />
Open
@ -871,7 +884,7 @@ export function DocumentsTableShell({
return (
<MobileCardWrapper key={doc.id} onLongPress={() => setMobileActionDoc(doc)}>
<div
className={`relative px-3 py-2 transition-colors ${
className={`list-item-lazy relative px-3 py-2 transition-colors ${
isMentioned ? "bg-primary/5" : "hover:bg-muted/20"
} ${canInteract && hasChatMode ? "cursor-pointer" : ""}`}
>
@ -951,7 +964,30 @@ export function DocumentsTableShell({
<Spinner size="lg" className="text-muted-foreground" />
</div>
) : (
<MarkdownViewer content={viewingContent} />
<>
<MarkdownViewer content={viewingContent} maxLength={50_000} />
{viewingDoc && (
<div className="mt-4 flex justify-center">
<Button
variant="outline"
size="sm"
onClick={() => {
if (viewingDoc) {
openEditor({
documentId: viewingDoc.id,
searchSpaceId: Number(searchSpaceId),
title: viewingDoc.title,
});
handleCloseViewer();
}
}}
>
<Eye className="h-3.5 w-3.5 mr-1.5" />
View full document
</Button>
</div>
)}
</>
)}
</div>
</DrawerContent>
@ -1020,6 +1056,10 @@ export function DocumentsTableShell({
<Button
variant="secondary"
className="justify-start gap-2"
disabled={
mobileActionDoc?.status?.state === "pending" ||
mobileActionDoc?.status?.state === "processing"
}
onClick={() => {
if (mobileActionDoc) handleViewDocument(mobileActionDoc);
setMobileActionDoc(null);

View file

@ -9,9 +9,9 @@ export type Document = {
id: number;
title: string;
document_type: DocumentType;
// Optional: Only needed when viewing document details (lazy loaded)
document_metadata?: any;
content?: string;
content_preview?: string;
created_at: string;
search_space_id: number;
created_by_id?: string | null;

View file

@ -8,7 +8,7 @@ import {
} from "@assistant-ui/react";
import { useQueryClient } from "@tanstack/react-query";
import { useAtomValue, useSetAtom } from "jotai";
import { useParams, useSearchParams } from "next/navigation";
import { useParams } from "next/navigation";
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
import { toast } from "sonner";
import { z } from "zod";
@ -228,13 +228,14 @@ export default function NewChatPage() {
return prev;
}
const memberById = new Map(membersData?.map((m) => [m.user_id, m]) ?? []);
const prevById = new Map(prev.map((m) => [m.id, m]));
return syncedMessages.map((msg) => {
const member = msg.author_id
? membersData?.find((m) => m.user_id === msg.author_id)
: null;
const member = msg.author_id ? (memberById.get(msg.author_id) ?? null) : null;
// Preserve existing author info if member lookup fails (e.g., cloned chats)
const existingMsg = prev.find((m) => m.id === `msg-${msg.id}`);
const existingMsg = prevById.get(`msg-${msg.id}`);
const existingAuthor = existingMsg?.metadata?.custom?.author as
| { displayName?: string | null; avatarUrl?: string | null }
| undefined;
@ -388,22 +389,32 @@ export default function NewChatPage() {
}, [searchSpaceId, queryClient]);
// Handle scroll to comment from URL query params (e.g., from inbox item click)
const searchParams = useSearchParams();
const targetCommentIdParam = searchParams.get("commentId");
// Set target comment ID from URL param - the AssistantMessage and CommentItem
// components will handle scrolling and highlighting once comments are loaded
// Read from window.location.search inside the effect instead of subscribing via
// useSearchParams() — avoids re-rendering this heavy component tree on every
// unrelated query-string change. (Vercel Best Practice: rerender-defer-reads 5.2)
useEffect(() => {
if (targetCommentIdParam && !isInitializing) {
const commentId = Number.parseInt(targetCommentIdParam, 10);
if (!Number.isNaN(commentId)) {
setTargetCommentId(commentId);
const readAndApplyCommentId = () => {
const params = new URLSearchParams(window.location.search);
const raw = params.get("commentId");
if (raw && !isInitializing) {
const commentId = Number.parseInt(raw, 10);
if (!Number.isNaN(commentId)) {
setTargetCommentId(commentId);
}
}
}
};
readAndApplyCommentId();
// Also respond to SPA navigations (back/forward) that change the query string
window.addEventListener("popstate", readAndApplyCommentId);
// Cleanup on unmount or when navigating away
return () => clearTargetCommentId();
}, [targetCommentIdParam, isInitializing, setTargetCommentId, clearTargetCommentId]);
return () => {
window.removeEventListener("popstate", readAndApplyCommentId);
clearTargetCommentId();
};
}, [isInitializing, setTargetCommentId, clearTargetCommentId]);
// Sync current thread state to atom
useEffect(() => {

View file

@ -60,7 +60,7 @@ export function CommunityPromptsContent() {
{list.length === 0 && (
<div className="rounded-lg border border-dashed border-border/60 p-8 text-center">
<Globe className="mx-auto size-8 text-muted-foreground/40" />
<Globe className="mx-auto size-8 text-muted-foreground" />
<p className="mt-2 text-sm text-muted-foreground">No community prompts yet</p>
<p className="text-xs text-muted-foreground/60">
Share your own prompts from the My Prompts tab

View file

@ -1,7 +1,7 @@
"use client";
import { useAtomValue } from "jotai";
import { AlertTriangle, Globe, Lock, PenLine, Plus, Sparkles, Trash2 } from "lucide-react";
import { AlertTriangle, Globe, Lock, PenLine, Sparkles, Trash2 } from "lucide-react";
import { useCallback, useState } from "react";
import { toast } from "sonner";
import {
@ -23,6 +23,7 @@ import {
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { ShortcutKbd } from "@/components/ui/shortcut-kbd";
import { Spinner } from "@/components/ui/spinner";
import { Switch } from "@/components/ui/switch";
import type { PromptRead } from "@/contracts/types/prompts.types";
@ -144,9 +145,8 @@ export function PromptsContent() {
<div className="space-y-6 min-w-0 overflow-hidden">
<div className="flex items-center justify-between">
<p className="text-sm text-muted-foreground">
Create prompt templates triggered with{" "}
<kbd className="rounded border bg-muted px-1.5 py-0.5 text-xs font-mono">/</kbd> in the
chat composer.
Create prompt templates triggered with <ShortcutKbd keys={["/"]} className="ml-0" /> in
the chat composer.
</p>
{!showForm && (
<Button
@ -158,7 +158,6 @@ export function PromptsContent() {
}}
className="shrink-0 gap-1.5"
>
<Plus className="size-3.5" />
New
</Button>
)}

View file

@ -3,7 +3,7 @@
import { useAtomValue } from "jotai";
import { AlertCircle, Plus, Search } from "lucide-react";
import { motion } from "motion/react";
import { useRouter, useSearchParams } from "next/navigation";
import { useRouter } from "next/navigation";
import { useTranslations } from "next-intl";
import { useEffect, useState } from "react";
import { searchSpacesAtom } from "@/atoms/search-spaces/search-space-query.atoms";
@ -89,7 +89,6 @@ function EmptyState({ onCreateClick }: { onCreateClick: () => void }) {
export default function DashboardPage() {
const router = useRouter();
const searchParams = useSearchParams();
const [showCreateDialog, setShowCreateDialog] = useState(false);
const t = useTranslations("dashboard");
@ -99,11 +98,12 @@ export default function DashboardPage() {
if (isLoading) return;
if (searchSpaces.length > 0) {
const params = searchParams.toString();
const query = params ? `?${params}` : "";
// Read the query string at the time of redirect — no subscription needed.
// (Vercel Best Practice: rerender-defer-reads 5.2)
const query = window.location.search;
router.replace(`/dashboard/${searchSpaces[0].id}/new-chat${query}`);
}
}, [isLoading, searchSpaces, router, searchParams]);
}, [isLoading, searchSpaces, router]);
// Show loading while fetching or while we have spaces and are about to redirect
const shouldShowLoading = isLoading || searchSpaces.length > 0;

View file

@ -1,11 +1,16 @@
import { DocsBody, DocsDescription, DocsPage, DocsTitle } from "fumadocs-ui/page";
import { notFound } from "next/navigation";
import { cache } from "react";
import { source } from "@/lib/source";
import { getMDXComponents } from "@/mdx-components";
const getDocPage = cache((slug?: string[]) => {
return source.getPage(slug);
});
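// React's cache() memoizes per request, so Page and generateMetadata below share a single
// source.getPage lookup instead of resolving the slug twice.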
export default async function Page(props: { params: Promise<{ slug?: string[] }> }) {
const params = await props.params;
const page = source.getPage(params.slug);
const page = getDocPage(params.slug);
if (!page) notFound();
const MDX = page.data.body;
@ -37,7 +42,7 @@ export async function generateStaticParams() {
export async function generateMetadata(props: { params: Promise<{ slug?: string[] }> }) {
const params = await props.params;
const page = source.getPage(params.slug);
const page = getDocPage(params.slug);
if (!page) notFound();
return {

View file

@ -1,6 +1,5 @@
"use client";
import posthog from "posthog-js";
import { useEffect } from "react";
export default function ErrorPage({
@ -11,7 +10,11 @@ export default function ErrorPage({
reset: () => void;
}) {
useEffect(() => {
posthog.captureException(error);
import("posthog-js")
.then(({ default: posthog }) => {
posthog.captureException(error);
})
.catch(() => {});
}, [error]);
return (

View file

@ -246,6 +246,17 @@ button {
}
}
/* content-visibility utilities — skip layout/paint for off-screen list items */
.list-item-lazy {
content-visibility: auto;
contain-intrinsic-size: 0 48px;
}
.sidebar-item-lazy {
content-visibility: auto;
contain-intrinsic-size: 0 40px;
}
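/* contain-intrinsic-size reserves an estimated item height (48px / 40px) so scroll geometry stays
   stable while off-screen items skip layout and paint. */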
@source "../node_modules/@llamaindex/chat-ui/**/*.{ts,tsx}";
@source "../node_modules/streamdown/dist/*.js";
@source "../node_modules/@streamdown/code/dist/*.js";

View file

@ -34,7 +34,7 @@ export const createNewLLMConfigMutationAtom = atomWithMutation((get) => {
});
},
onError: (error: Error) => {
toast.error(error.message || "Failed to create LLM model");
toast.error(error.message || "Failed to create model");
},
};
});
@ -109,10 +109,11 @@ export const updateLLMPreferencesMutationAtom = atomWithMutation((get) => {
mutationFn: async (request: UpdateLLMPreferencesRequest) => {
return newLLMConfigApiService.updateLLMPreferences(request);
},
onSuccess: () => {
queryClient.invalidateQueries({
queryKey: cacheKeys.newLLMConfigs.preferences(Number(searchSpaceId)),
});
onSuccess: (_data, request: UpdateLLMPreferencesRequest) => {
queryClient.setQueryData(
cacheKeys.newLLMConfigs.preferences(Number(searchSpaceId)),
(old: Record<string, unknown> | undefined) => ({ ...old, ...request.data })
);
},
onError: (error: Error) => {
toast.error(error.message || "Failed to update LLM preferences");

View file

@ -66,7 +66,7 @@ export const defaultSystemInstructionsAtom = atomWithQuery(() => {
});
/**
* Query atom for the dynamic LLM model catalogue.
* Query atom for the dynamic model catalogue.
* Fetched from the backend (which proxies OpenRouter's public API).
* Falls back to the static hardcoded list on error.
*/

View file

@ -5,9 +5,11 @@ import { cn } from "@/lib/utils";
export const Logo = ({
className,
disableLink = false,
priority = false,
}: {
className?: string;
disableLink?: boolean;
priority?: boolean;
}) => {
const image = (
<Image
@ -16,6 +18,7 @@ export const Logo = ({
alt="logo"
width={128}
height={128}
priority={priority}
/>
);

View file

@ -1,6 +1,5 @@
"use client";
import { useSearchParams } from "next/navigation";
import { useEffect } from "react";
import { useGlobalLoadingEffect } from "@/hooks/use-global-loading";
import { getAndClearRedirectPath, setBearerToken, setRefreshToken } from "@/lib/auth-utils";
@ -26,8 +25,6 @@ const TokenHandler = ({
tokenParamName = "token",
storageKey = "surfsense_bearer_token",
}: TokenHandlerProps) => {
const searchParams = useSearchParams();
// Always show loading for this component - spinner animation won't reset
useGlobalLoadingEffect(true);
@ -35,9 +32,13 @@ const TokenHandler = ({
// Only run on client-side
if (typeof window === "undefined") return;
// Get tokens from URL parameters
const token = searchParams.get(tokenParamName);
const refreshToken = searchParams.get("refresh_token");
// Read tokens from URL at mount time — no subscription needed.
// TokenHandler only runs once after an auth redirect, so a stale read
// is impossible and useSearchParams() would add a pointless subscription.
// (Vercel Best Practice: rerender-defer-reads 5.2)
const params = new URLSearchParams(window.location.search);
const token = params.get(tokenParamName);
const refreshToken = params.get("refresh_token");
if (token) {
try {
@ -74,7 +75,7 @@ const TokenHandler = ({
window.location.href = redirectPath;
}
}
}, [searchParams, tokenParamName, storageKey, redirectPath]);
}, [tokenParamName, storageKey, redirectPath]);
// Return null - the global provider handles the loading UI
return null;

View file

@ -11,7 +11,6 @@ import {
} from "@/atoms/new-llm-config/new-llm-config-query.atoms";
import { activeSearchSpaceIdAtom } from "@/atoms/search-spaces/search-space-query.atoms";
import { searchSpaceSettingsDialogAtom } from "@/atoms/settings/settings-dialog.atoms";
import { currentUserAtom } from "@/atoms/user/user-query.atoms";
import { Alert, AlertDescription, AlertTitle } from "@/components/ui/alert";
import { Button } from "@/components/ui/button";
import { Dialog, DialogContent, DialogTitle } from "@/components/ui/dialog";
@ -47,7 +46,6 @@ export const ConnectorIndicator = forwardRef<ConnectorIndicatorHandle, Connector
(_props, ref) => {
const searchSpaceId = useAtomValue(activeSearchSpaceIdAtom);
const setSearchSpaceSettingsDialog = useSetAtom(searchSpaceSettingsDialogAtom);
useAtomValue(currentUserAtom);
const { data: preferences = {}, isFetching: preferencesLoading } =
useAtomValue(llmPreferencesAtom);
const { data: globalConfigs = [], isFetching: globalConfigsLoading } =
@ -376,14 +374,17 @@ export const ConnectorIndicator = forwardRef<ConnectorIndicatorHandle, Connector
<div className="px-4 sm:px-12 py-4 sm:py-8 pb-12 sm:pb-16">
{/* LLM Configuration Warning */}
{!llmConfigLoading && !hasDocumentSummaryLLM && (
<Alert variant="destructive" className="mb-6">
<Alert
variant="destructive"
className="mb-6 bg-muted/50 rounded-xl border-destructive/30"
>
<AlertTriangle className="h-4 w-4" />
<AlertTitle>LLM Configuration Required</AlertTitle>
<AlertDescription className="mt-2">
<p className="mb-3">
{isAutoMode && !hasGlobalConfigs
? "Auto mode is selected but no global LLM configurations are available. Please configure a custom LLM in Settings to process and summarize documents from your connected sources."
: "You need to configure a Document Summary LLM before adding connectors. This LLM is used to process and summarize documents from your connected sources."}
? "Auto mode requires a global LLM configuration. Please add one in Settings"
: "A Document Summary LLM is required to process uploads, configure one in Settings"}
</p>
<Button
size="sm"

View file

@ -58,7 +58,6 @@ export function getConnectFormComponent(connectorType: string): ConnectFormCompo
return MCPConnectForm;
case "OBSIDIAN_CONNECTOR":
return ObsidianConnectForm;
// Add other connector types here as needed
default:
return null;
}

View file

@ -34,9 +34,12 @@ export const CirclebackConfig: FC<CirclebackConfigProps> = ({ connector, onNameC
const [isLoading, setIsLoading] = useState(true);
const [copied, setCopied] = useState(false);
// Fetch webhook info
useEffect(() => {
const fetchWebhookInfo = async () => {
const controller = new AbortController();
const doFetch = async () => {
if (!connector.search_space_id) return;
const baseUrl = process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL;
@ -49,8 +52,11 @@ export const CirclebackConfig: FC<CirclebackConfigProps> = ({ connector, onNameC
setIsLoading(true);
try {
const response = await authenticatedFetch(
`${baseUrl}/api/v1/webhooks/circleback/${connector.search_space_id}/info`
`${baseUrl}/api/v1/webhooks/circleback/${connector.search_space_id}/info`,
{ signal: controller.signal }
);
if (controller.signal.aborted) return;
if (response.ok) {
const data: unknown = await response.json();
// Runtime validation with zod schema
@ -59,16 +65,18 @@ export const CirclebackConfig: FC<CirclebackConfigProps> = ({ connector, onNameC
setWebhookUrl(validatedData.webhook_url);
}
} catch (error) {
if (controller.signal.aborted) return;
console.error("Failed to fetch webhook info:", error);
// Reset state on error
setWebhookInfo(null);
setWebhookUrl("");
} finally {
setIsLoading(false);
if (!controller.signal.aborted) setIsLoading(false);
}
};
fetchWebhookInfo();
doFetch().catch(() => {});
return () => controller.abort();
}, [connector.search_space_id]);
const handleNameChange = (value: string) => {

View file

@ -272,7 +272,7 @@ export const ConnectorEditView: FC<ConnectorEditViewProps> = ({
{/* AI Summary toggle */}
<SummaryConfig enabled={enableSummary} onEnabledChange={onEnableSummaryChange} />
{/* Date range selector - not shown for file-based connectors (Drive, Dropbox, OneDrive), Webcrawler, or GitHub (indexes full repo snapshots) */}
{/* Date range selector - not shown for file-based connectors (Drive, Dropbox, OneDrive), Webcrawler, GitHub, or Local Folder */}
{connector.connector_type !== "GOOGLE_DRIVE_CONNECTOR" &&
connector.connector_type !== "COMPOSIO_GOOGLE_DRIVE_CONNECTOR" &&
connector.connector_type !== "DROPBOX_CONNECTOR" &&
@ -293,9 +293,7 @@ export const ConnectorEditView: FC<ConnectorEditViewProps> = ({
/>
)}
{/* Periodic sync - shown for all indexable connectors */}
{(() => {
// Check if Google Drive (regular or Composio) has folders/files selected
const isGoogleDrive = connector.connector_type === "GOOGLE_DRIVE_CONNECTOR";
const isComposioGoogleDrive =
connector.connector_type === "COMPOSIO_GOOGLE_DRIVE_CONNECTOR";

View file

@ -158,7 +158,7 @@ export const IndexingConfigurationView: FC<IndexingConfigurationViewProps> = ({
{/* AI Summary toggle */}
<SummaryConfig enabled={enableSummary} onEnabledChange={onEnableSummaryChange} />
{/* Date range selector - not shown for file-based connectors (Drive, Dropbox, OneDrive), Webcrawler, or GitHub (indexes full repo snapshots) */}
{/* Date range selector - not shown for file-based connectors (Drive, Dropbox, OneDrive), Webcrawler, GitHub, or Local Folder */}
{config.connectorType !== "GOOGLE_DRIVE_CONNECTOR" &&
config.connectorType !== "COMPOSIO_GOOGLE_DRIVE_CONNECTOR" &&
config.connectorType !== "DROPBOX_CONNECTOR" &&
@ -179,9 +179,10 @@ export const IndexingConfigurationView: FC<IndexingConfigurationViewProps> = ({
/>
)}
{/* Periodic sync - not shown for Google Drive (regular and Composio), Dropbox, or OneDrive */}
{config.connectorType !== "GOOGLE_DRIVE_CONNECTOR" &&
config.connectorType !== "COMPOSIO_GOOGLE_DRIVE_CONNECTOR" && (
config.connectorType !== "COMPOSIO_GOOGLE_DRIVE_CONNECTOR" &&
config.connectorType !== "DROPBOX_CONNECTOR" &&
config.connectorType !== "ONEDRIVE_CONNECTOR" && (
<PeriodicSyncConfig
enabled={periodicEnabled}
frequencyMinutes={frequencyMinutes}

View file

@ -76,29 +76,26 @@ export const AllConnectorsTab: FC<AllConnectorsTabProps> = ({
}) => {
// Check if self-hosted mode (for showing self-hosted only connectors)
const selfHosted = isSelfHosted();
const isDesktop = typeof window !== "undefined" && !!window.electronAPI;
const matchesSearch = (title: string, description: string) =>
title.toLowerCase().includes(searchQuery.toLowerCase()) ||
description.toLowerCase().includes(searchQuery.toLowerCase());
const passesDeploymentFilter = (c: { selfHostedOnly?: boolean; desktopOnly?: boolean }) =>
(!c.selfHostedOnly || selfHosted) && (!c.desktopOnly || isDesktop);
// Filter connectors based on search and deployment mode
const filteredOAuth = OAUTH_CONNECTORS.filter(
(c) =>
// Filter by search query
(c.title.toLowerCase().includes(searchQuery.toLowerCase()) ||
c.description.toLowerCase().includes(searchQuery.toLowerCase())) &&
// Filter self-hosted only connectors in cloud mode
(!("selfHostedOnly" in c) || !c.selfHostedOnly || selfHosted)
(c) => matchesSearch(c.title, c.description) && passesDeploymentFilter(c)
);
const filteredCrawlers = CRAWLERS.filter(
(c) =>
(c.title.toLowerCase().includes(searchQuery.toLowerCase()) ||
c.description.toLowerCase().includes(searchQuery.toLowerCase())) &&
(!("selfHostedOnly" in c) || !c.selfHostedOnly || selfHosted)
(c) => matchesSearch(c.title, c.description) && passesDeploymentFilter(c)
);
const filteredOther = OTHER_CONNECTORS.filter(
(c) =>
(c.title.toLowerCase().includes(searchQuery.toLowerCase()) ||
c.description.toLowerCase().includes(searchQuery.toLowerCase())) &&
(!("selfHostedOnly" in c) || !c.selfHostedOnly || selfHosted)
(c) => matchesSearch(c.title, c.description) && passesDeploymentFilter(c)
);
// Filter Composio connectors

View file

@ -125,38 +125,35 @@ const DocumentUploadPopupContent: FC<{
onPointerDownOutside={(e) => e.preventDefault()}
onInteractOutside={(e) => e.preventDefault()}
onEscapeKeyDown={(e) => e.preventDefault()}
className="select-none max-w-4xl w-[95vw] sm:w-full h-[calc(100dvh-2rem)] sm:h-[85vh] flex flex-col p-0 gap-0 overflow-hidden border border-border ring-0 bg-muted dark:bg-muted text-foreground [&>button]:right-3 sm:[&>button]:right-12 [&>button]:top-3 sm:[&>button]:top-10 [&>button]:opacity-80 hover:[&>button]:opacity-100 [&>button]:z-[100] [&>button_svg]:size-4 sm:[&>button_svg]:size-5"
className="select-none max-w-2xl w-[95vw] sm:w-[640px] h-[min(440px,75dvh)] sm:h-[min(500px,80vh)] flex flex-col p-0 gap-0 overflow-hidden border border-border ring-0 bg-muted dark:bg-muted text-foreground [&>button]:right-3 sm:[&>button]:right-6 [&>button]:top-3 sm:[&>button]:top-5 [&>button]:opacity-80 hover:[&>button]:opacity-100 [&>button]:z-[100] [&>button_svg]:size-4 sm:[&>button_svg]:size-5"
>
<DialogTitle className="sr-only">Upload Document</DialogTitle>
{/* Scrollable container for mobile */}
<div className="flex-1 min-h-0 overflow-y-auto overscroll-contain">
{/* Header - scrolls with content on mobile */}
<div className="sticky top-0 z-20 bg-muted px-4 sm:px-12 pt-4 sm:pt-10 pb-2 sm:pb-0">
{/* Upload header */}
<div className="flex items-center gap-2 sm:gap-4 mb-2 sm:mb-6">
<div className="flex-1 min-w-0 pr-8 sm:pr-0">
<h2 className="text-base sm:text-2xl font-semibold tracking-tight">
Upload Documents
</h2>
<p className="text-xs sm:text-base text-muted-foreground mt-0.5 sm:mt-1 line-clamp-1 sm:line-clamp-none">
Upload and sync your documents to your search space
</p>
</div>
<div className="sticky top-0 z-20 bg-muted px-4 sm:px-6 pt-4 sm:pt-5 pb-10">
<div className="flex items-center gap-2 mb-1 pr-8 sm:pr-0">
<h2 className="text-base sm:text-lg font-semibold tracking-tight">
Upload Documents
</h2>
</div>
<p className="text-xs sm:text-sm text-muted-foreground line-clamp-1">
Upload and sync your documents to your search space
</p>
</div>
{/* Content */}
<div className="px-4 sm:px-12 pb-4 sm:pb-16">
<div className="px-4 sm:px-6 pb-4 sm:pb-6">
{!isLoading && !hasDocumentSummaryLLM ? (
<Alert variant="destructive" className="mb-4">
<Alert
variant="destructive"
className="mb-4 bg-muted/50 rounded-xl border-destructive/30"
>
<AlertTriangle className="h-4 w-4" />
<AlertTitle>LLM Configuration Required</AlertTitle>
<AlertDescription className="mt-2">
<p className="mb-3">
{isAutoMode && !hasGlobalConfigs
? "Auto mode is selected but no global LLM configurations are available. Please configure a custom LLM in Settings to process and summarize your uploaded documents."
: "You need to configure a Document Summary LLM before uploading files. This LLM is used to process and summarize your uploaded documents."}
? "Auto mode requires a global LLM configuration. Please add one in Settings"
: "A Document Summary LLM is required to process uploads, configure one in Settings"}
</p>
<Button
size="sm"
@ -179,9 +176,6 @@ const DocumentUploadPopupContent: FC<{
)}
</div>
</div>
{/* Bottom fade shadow - hidden on very small screens */}
<div className="hidden sm:block absolute bottom-0 left-0 right-0 h-7 bg-gradient-to-t from-muted via-muted/80 to-transparent pointer-events-none z-10" />
</DialogContent>
</Dialog>
);

View file

@ -6,6 +6,7 @@ import { ImageIcon, ImageOffIcon } from "lucide-react";
import { memo, type PropsWithChildren, useEffect, useRef, useState } from "react";
import { createPortal } from "react-dom";
import { cn } from "@/lib/utils";
import NextImage from "next/image";
const imageVariants = cva("aui-image-root relative overflow-hidden rounded-lg", {
variants: {
@ -86,23 +87,57 @@ function ImagePreview({
>
<ImageOffIcon className="size-8 text-muted-foreground" />
</div>
) : (
) : isDataOrBlobUrl(src) ? (
// biome-ignore lint/performance/noImgElement: data/blob URLs need plain img
<img
ref={imgRef}
src={src}
alt={alt}
className={cn("block h-auto w-full object-contain", !loaded && "invisible", className)}
onLoad={(e) => {
if (typeof src === "string") setLoadedSrc(src);
onLoad?.(e);
}}
onError={(e) => {
if (typeof src === "string") setErrorSrc(src);
onError?.(e);
}}
{...props}
/>
) : (
// biome-ignore lint/performance/noImgElement: intentional for dynamic external URLs
<img
ref={imgRef}
src={src}
alt={alt}
className={cn("block h-auto w-full object-contain", !loaded && "invisible", className)}
onLoad={(e) => {
if (typeof src === "string") setLoadedSrc(src);
onLoad?.(e);
}}
onError={(e) => {
if (typeof src === "string") setErrorSrc(src);
onError?.(e);
}}
{...props}
/>
// <img
// ref={imgRef}
// src={src}
// alt={alt}
// className={cn("block h-auto w-full object-contain", !loaded && "invisible", className)}
// onLoad={(e) => {
// if (typeof src === "string") setLoadedSrc(src);
// onLoad?.(e);
// }}
// onError={(e) => {
// if (typeof src === "string") setErrorSrc(src);
// onError?.(e);
// }}
// {...props}
// />
<NextImage
fill
src={src || ""}
alt={alt}
sizes="(max-width: 768px) 100vw, (max-width: 1200px) 80vw, 60vw"
className={cn("block object-contain", !loaded && "invisible", className)}
onLoad={() => {
if (typeof src === "string") setLoadedSrc(src);
onLoad?.();
}}
onError={() => {
if (typeof src === "string") setErrorSrc(src);
onError?.();
}}
unoptimized={false}
{...props}
/>
)}
</div>
);
@ -126,7 +161,10 @@ type ImageZoomProps = PropsWithChildren<{
src: string;
alt?: string;
}>;
function isDataOrBlobUrl(src: string | undefined): boolean {
if (!src || typeof src !== "string") return false;
return src.startsWith("data:") || src.startsWith("blob:");
}
function ImageZoom({ src, alt = "Image preview", children }: ImageZoomProps) {
const [isMounted, setIsMounted] = useState(false);
const [isOpen, setIsOpen] = useState(false);
@ -177,22 +215,39 @@ function ImageZoom({ src, alt = "Image preview", children }: ImageZoomProps) {
aria-label="Close zoomed image"
>
{/** biome-ignore lint/performance/noImgElement: <explanation> */}
<img
data-slot="image-zoom-content"
src={src}
alt={alt}
className="aui-image-zoom-content fade-in zoom-in-95 max-h-[90vh] max-w-[90vw] animate-in object-contain duration-200"
onClick={(e) => {
e.stopPropagation();
handleClose();
}}
onKeyDown={(e) => {
if (e.key === "Enter") {
e.stopPropagation();
handleClose();
}
}}
/>
{isDataOrBlobUrl(src) ? (
// biome-ignore lint/performance/noImgElement: data/blob URLs need plain img
<img
data-slot="image-zoom-content"
src={src}
alt={alt}
className="aui-image-zoom-content fade-in zoom-in-95 max-h-[90vh] max-w-[90vw] animate-in object-contain duration-200"
onClick={(e) => {
e.stopPropagation();
handleClose();
}}
onKeyDown={(e) => {
if (e.key === "Enter") {
e.stopPropagation();
handleClose();
}
}}
/>
) : (
<NextImage
data-slot="image-zoom-content"
fill
src={src}
alt={alt}
sizes="90vw"
className="aui-image-zoom-content fade-in zoom-in-95 object-contain duration-200"
onClick={(e) => {
e.stopPropagation();
handleClose();
}}
unoptimized={false}
/>
)}
</button>,
document.body
)}

View file

@ -32,7 +32,7 @@ export const InlineCitation: FC<InlineCitationProps> = ({ chunkId, isDocsChunk =
<button
type="button"
onClick={() => setIsOpen(true)}
className="ml-0.5 inline-flex h-5 min-w-5 cursor-pointer items-center justify-center rounded-md bg-muted/60 px-1.5 text-[11px] font-medium text-muted-foreground align-super shadow-sm transition-colors hover:bg-muted hover:text-foreground focus-visible:ring-ring focus-visible:ring-2 focus-visible:outline-none"
className="ml-0.5 inline-flex h-5 min-w-5 cursor-pointer items-center justify-center rounded-md bg-muted/60 px-1.5 text-[11px] font-medium text-muted-foreground align-baseline shadow-sm transition-colors hover:bg-muted hover:text-foreground focus-visible:ring-ring focus-visible:ring-2 focus-visible:outline-none"
title={`View source chunk #${chunkId}`}
>
{chunkId}

View file

@ -15,6 +15,7 @@ import {
ChevronDown,
ChevronUp,
Clipboard,
Dot,
Globe,
Plus,
Settings2,
@ -816,12 +817,23 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
const isDesktop = useMediaQuery("(min-width: 640px)");
const { openDialog: openUploadDialog } = useDocumentUploadDialog();
const [toolsScrollPos, setToolsScrollPos] = useState<"top" | "middle" | "bottom">("top");
const toolsRafRef = useRef<number>();
const handleToolsScroll = useCallback((e: React.UIEvent<HTMLDivElement>) => {
const el = e.currentTarget;
const atTop = el.scrollTop <= 2;
const atBottom = el.scrollHeight - el.scrollTop - el.clientHeight <= 2;
setToolsScrollPos(atTop ? "top" : atBottom ? "bottom" : "middle");
if (toolsRafRef.current) return;
toolsRafRef.current = requestAnimationFrame(() => {
const atTop = el.scrollTop <= 2;
const atBottom = el.scrollHeight - el.scrollTop - el.clientHeight <= 2;
setToolsScrollPos(atTop ? "top" : atBottom ? "bottom" : "middle");
toolsRafRef.current = undefined;
});
}, []);
useEffect(
() => () => {
if (toolsRafRef.current) cancelAnimationFrame(toolsRafRef.current);
},
[]
);
const isComposerTextEmpty = useAuiState(({ composer }) => {
const text = composer.text?.trim() || "";
return text.length === 0;
@ -834,6 +846,7 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
const { data: agentTools } = useAtomValue(agentToolsAtom);
const disabledTools = useAtomValue(disabledToolsAtom);
const disabledToolsSet = useMemo(() => new Set(disabledTools), [disabledTools]);
const toggleTool = useSetAtom(toggleToolAtom);
const setDisabledTools = useSetAtom(disabledToolsAtom);
const hydrateDisabled = useSetAtom(hydrateDisabledToolsAtom);
@ -846,18 +859,18 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
const toggleToolGroup = useCallback(
(toolNames: string[]) => {
const allDisabled = toolNames.every((name) => disabledTools.includes(name));
const allDisabled = toolNames.every((name) => disabledToolsSet.has(name));
if (allDisabled) {
setDisabledTools((prev) => prev.filter((t) => !toolNames.includes(t)));
} else {
setDisabledTools((prev) => [...new Set([...prev, ...toolNames])]);
}
},
[disabledTools, setDisabledTools]
[disabledToolsSet, setDisabledTools]
);
const hasWebSearchTool = agentTools?.some((t) => t.name === "web_search") ?? false;
const isWebSearchEnabled = hasWebSearchTool && !disabledTools.includes("web_search");
const isWebSearchEnabled = hasWebSearchTool && !disabledToolsSet.has("web_search");
const filteredTools = useMemo(
() => agentTools?.filter((t) => t.name !== "web_search"),
[agentTools]
@ -957,7 +970,7 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
{group.label}
</div>
{group.tools.map((tool) => {
const isDisabled = disabledTools.includes(tool.name);
const isDisabled = disabledToolsSet.has(tool.name);
const ToolIcon = getToolIcon(tool.name);
return (
<div
@ -989,7 +1002,7 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
const iconKey = group.connectorIcon ?? "";
const iconInfo = CONNECTOR_TOOL_ICON_PATHS[iconKey];
const toolNames = group.tools.map((t) => t.name);
const allDisabled = toolNames.every((n) => disabledTools.includes(n));
const allDisabled = toolNames.every((n) => disabledToolsSet.has(n));
return (
<div
key={group.label}
@ -1063,7 +1076,7 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
>
<div className="sr-only">Manage Tools</div>
<div
className="max-h-48 sm:max-h-64 overflow-y-auto py-0.5 sm:py-1"
className="max-h-48 sm:max-h-64 overflow-y-auto overscroll-none py-0.5 sm:py-1"
onScroll={handleToolsScroll}
style={{
maskImage: `linear-gradient(to bottom, ${toolsScrollPos === "top" ? "black" : "transparent"}, black 16px, black calc(100% - 16px), ${toolsScrollPos === "bottom" ? "black" : "transparent"})`,
@ -1078,7 +1091,7 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
{group.label}
</div>
{group.tools.map((tool) => {
const isDisabled = disabledTools.includes(tool.name);
const isDisabled = disabledToolsSet.has(tool.name);
const ToolIcon = getToolIcon(tool.name);
const row = (
<div className="flex w-full items-center gap-2 sm:gap-3 px-2.5 sm:px-3 py-1 sm:py-1.5 hover:bg-muted-foreground/10 transition-colors">
@ -1115,7 +1128,7 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
const iconKey = group.connectorIcon ?? "";
const iconInfo = CONNECTOR_TOOL_ICON_PATHS[iconKey];
const toolNames = group.tools.map((t) => t.name);
const allDisabled = toolNames.every((n) => disabledTools.includes(n));
const allDisabled = toolNames.every((n) => disabledToolsSet.has(n));
const groupDef = TOOL_GROUPS.find((g) => g.label === group.label);
const row = (
<div className="flex w-full items-center gap-2 sm:gap-3 px-2.5 sm:px-3 py-1 sm:py-1.5 hover:bg-muted-foreground/10 transition-colors">
@ -1146,7 +1159,11 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
<TooltipTrigger asChild>{row}</TooltipTrigger>
<TooltipContent side="right" className="max-w-72 text-xs">
{groupDef?.tooltip ??
group.tools.map((t) => t.description).join(" · ")}
group.tools.flatMap((t, i) =>
i === 0
? [t.description]
: [<Dot key={i} className="inline h-4 w-4" />, t.description]
)}
</TooltipContent>
</Tooltip>
);

View file

@ -1,6 +1,6 @@
import type { ToolCallMessagePartComponent } from "@assistant-ui/react";
import { CheckIcon, ChevronDownIcon, ChevronUpIcon, XCircleIcon } from "lucide-react";
import { useState } from "react";
import { useMemo, useState } from "react";
import { getToolIcon } from "@/contracts/enums/toolIcons";
import { cn } from "@/lib/utils";
@ -19,17 +19,28 @@ export const ToolFallback: ToolCallMessagePartComponent = ({
const isCancelled = status?.type === "incomplete" && status.reason === "cancelled";
const isError = status?.type === "incomplete" && status.reason === "error";
const isRunning = status?.type === "running" || status?.type === "requires-action";
const errorData = status?.type === "incomplete" ? status.error : undefined;
const serializedError = useMemo(
() => (errorData && typeof errorData !== "string" ? JSON.stringify(errorData) : null),
[errorData]
);
const serializedResult = useMemo(
() => (result !== undefined && typeof result !== "string" ? JSON.stringify(result, null, 2) : null),
[result]
);
const cancelledReason =
isCancelled && status.error
? typeof status.error === "string"
? status.error
: JSON.stringify(status.error)
: serializedError
: null;
const errorReason =
isError && status.error
? typeof status.error === "string"
? status.error
: JSON.stringify(status.error)
: serializedError
: null;
const Icon = getToolIcon(toolName);
@ -122,7 +133,7 @@ export const ToolFallback: ToolCallMessagePartComponent = ({
<div>
<p className="text-xs font-medium text-muted-foreground mb-1">Result</p>
<pre className="text-xs text-foreground/80 whitespace-pre-wrap break-all">
{typeof result === "string" ? result : JSON.stringify(result, null, 2)}
{typeof result === "string" ? result : serializedResult}
</pre>
</div>
</>

View file

@ -15,13 +15,14 @@ function convertDisplayToData(displayContent: string, mentions: InsertedMention[
const sortedMentions = [...mentions].sort((a, b) => b.displayName.length - a.displayName.length);
for (const mention of sortedMentions) {
const displayPattern = new RegExp(
`@${escapeRegExp(mention.displayName)}(?=\\s|$|[.,!?;:])`,
"g"
);
const dataFormat = `@[${mention.id}]`;
result = result.replace(displayPattern, dataFormat);
const mentionPatterns = sortedMentions.map((mention) => ({
pattern: new RegExp(`@${escapeRegExp(mention.displayName)}(?=\\s|$|[.,!?;:])`, "g"),
dataFormat: `@[${mention.id}]`,
}));
for (const { pattern, dataFormat } of mentionPatterns) {
pattern.lastIndex = 0; // reset global regex state
result = result.replace(pattern, dataFormat);
}
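// e.g. (illustrative): with a mention { id: 7, displayName: "Jane Doe" }, "Ask @Jane Doe tomorrow"
// becomes "Ask @[7] tomorrow"; sorting by displayName length first keeps the longer "@Jane Doe"
// from being partially rewritten by a shorter "@Jane" mention.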
return result;

View file

@ -5,6 +5,7 @@ import {
Clock,
Download,
Eye,
History,
MoreHorizontal,
Move,
PenLine,
@ -39,6 +40,7 @@ import { Tooltip, TooltipContent, TooltipTrigger } from "@/components/ui/tooltip
import type { DocumentTypeEnum } from "@/contracts/types/document.types";
import { cn } from "@/lib/utils";
import { DND_TYPES } from "./FolderNode";
import { isVersionableType } from "./version-history";
const EDITABLE_DOCUMENT_TYPES = new Set(["FILE", "NOTE"]);
@ -60,6 +62,7 @@ interface DocumentNodeProps {
onDelete: (doc: DocumentNodeDoc) => void;
onMove: (doc: DocumentNodeDoc) => void;
onExport?: (doc: DocumentNodeDoc, format: string) => void;
onVersionHistory?: (doc: DocumentNodeDoc) => void;
contextMenuOpen?: boolean;
onContextMenuOpenChange?: (open: boolean) => void;
}
@ -74,6 +77,7 @@ export const DocumentNode = React.memo(function DocumentNode({
onDelete,
onMove,
onExport,
onVersionHistory,
contextMenuOpen,
onContextMenuOpenChange,
}: DocumentNodeProps) {
@ -195,12 +199,17 @@ export const DocumentNode = React.memo(function DocumentNode({
<span className="flex-1 min-w-0 truncate">{doc.title}</span>
<span className="shrink-0">
{getDocumentTypeIcon(
doc.document_type as DocumentTypeEnum,
"h-3.5 w-3.5 text-muted-foreground"
)}
</span>
{getDocumentTypeIcon(
doc.document_type as DocumentTypeEnum,
"h-3.5 w-3.5 text-muted-foreground"
) && (
<span className="shrink-0">
{getDocumentTypeIcon(
doc.document_type as DocumentTypeEnum,
"h-3.5 w-3.5 text-muted-foreground"
)}
</span>
)}
<DropdownMenu open={dropdownOpen} onOpenChange={setDropdownOpen}>
<DropdownMenuTrigger asChild>
@ -219,7 +228,7 @@ export const DocumentNode = React.memo(function DocumentNode({
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end" className="w-40" onClick={(e) => e.stopPropagation()}>
<DropdownMenuItem onClick={() => onPreview(doc)}>
<DropdownMenuItem onClick={() => onPreview(doc)} disabled={isProcessing}>
<Eye className="mr-2 h-4 w-4" />
Open
</DropdownMenuItem>
@ -235,7 +244,7 @@ export const DocumentNode = React.memo(function DocumentNode({
</DropdownMenuItem>
{onExport && (
<DropdownMenuSub>
<DropdownMenuSubTrigger>
<DropdownMenuSubTrigger disabled={isProcessing}>
<Download className="mr-2 h-4 w-4" />
Export
</DropdownMenuSubTrigger>
@ -244,6 +253,12 @@ export const DocumentNode = React.memo(function DocumentNode({
</DropdownMenuSubContent>
</DropdownMenuSub>
)}
{onVersionHistory && isVersionableType(doc.document_type) && (
<DropdownMenuItem disabled={isProcessing} onClick={() => onVersionHistory(doc)}>
<History className="mr-2 h-4 w-4" />
Versions
</DropdownMenuItem>
)}
<DropdownMenuItem
className="text-destructive focus:text-destructive"
disabled={isProcessing}
@ -259,7 +274,7 @@ export const DocumentNode = React.memo(function DocumentNode({
{contextMenuOpen && (
<ContextMenuContent className="w-40" onClick={(e) => e.stopPropagation()}>
<ContextMenuItem onClick={() => onPreview(doc)}>
<ContextMenuItem onClick={() => onPreview(doc)} disabled={isProcessing}>
<Eye className="mr-2 h-4 w-4" />
Open
</ContextMenuItem>
@ -275,7 +290,7 @@ export const DocumentNode = React.memo(function DocumentNode({
</ContextMenuItem>
{onExport && (
<ContextMenuSub>
<ContextMenuSubTrigger>
<ContextMenuSubTrigger disabled={isProcessing}>
<Download className="mr-2 h-4 w-4" />
Export
</ContextMenuSubTrigger>
@ -284,6 +299,12 @@ export const DocumentNode = React.memo(function DocumentNode({
</ContextMenuSubContent>
</ContextMenuSub>
)}
{onVersionHistory && isVersionableType(doc.document_type) && (
<ContextMenuItem disabled={isProcessing} onClick={() => onVersionHistory(doc)}>
<History className="mr-2 h-4 w-4" />
Versions
</ContextMenuItem>
)}
<ContextMenuItem
className="text-destructive focus:text-destructive"
disabled={isProcessing}

View file

@ -1,14 +1,18 @@
"use client";
import {
AlertCircle,
ChevronDown,
ChevronRight,
Eye,
EyeOff,
Folder,
FolderOpen,
FolderPlus,
MoreHorizontal,
Move,
PenLine,
RefreshCw,
Trash2,
} from "lucide-react";
import React, { useCallback, useEffect, useRef, useState } from "react";
@ -27,6 +31,8 @@ import {
DropdownMenuItem,
DropdownMenuTrigger,
} from "@/components/ui/dropdown-menu";
import { Spinner } from "@/components/ui/spinner";
import { Tooltip, TooltipContent, TooltipTrigger } from "@/components/ui/tooltip";
import { cn } from "@/lib/utils";
import type { FolderSelectionState } from "./FolderTreeView";
@ -52,6 +58,7 @@ interface FolderNodeProps {
isRenaming: boolean;
childCount: number;
selectionState: FolderSelectionState;
processingState: "idle" | "processing" | "failed";
onToggleSelect: (folderId: number, selectAll: boolean) => void;
onToggleExpand: (folderId: number) => void;
onRename: (folder: FolderDisplay, newName: string) => void;
@ -70,6 +77,9 @@ interface FolderNodeProps {
disabledDropIds?: Set<number>;
contextMenuOpen?: boolean;
onContextMenuOpenChange?: (open: boolean) => void;
isWatched?: boolean;
onRescan?: (folder: FolderDisplay) => void;
onStopWatching?: (folder: FolderDisplay) => void;
}
function getDropZone(
@ -93,6 +103,7 @@ export const FolderNode = React.memo(function FolderNode({
isRenaming,
childCount,
selectionState,
processingState,
onToggleSelect,
onToggleExpand,
onRename,
@ -107,6 +118,9 @@ export const FolderNode = React.memo(function FolderNode({
disabledDropIds,
contextMenuOpen,
onContextMenuOpenChange,
isWatched,
onRescan,
onStopWatching,
}: FolderNodeProps) {
const [renameValue, setRenameValue] = useState(folder.name);
const inputRef = useRef<HTMLInputElement>(null);
@ -242,7 +256,9 @@ export const FolderNode = React.memo(function FolderNode({
isOver && !canDrop && "cursor-not-allowed"
)}
style={{ paddingLeft: `${depth * 16 + 4}px` }}
onClick={() => onToggleExpand(folder.id)}
onClick={() => {
onToggleExpand(folder.id);
}}
onKeyDown={(e) => {
if (e.key === "Enter" || e.key === " ") {
e.preventDefault();
@ -262,14 +278,45 @@ export const FolderNode = React.memo(function FolderNode({
)}
</span>
<Checkbox
checked={
selectionState === "all" ? true : selectionState === "some" ? "indeterminate" : false
}
onCheckedChange={handleCheckChange}
onClick={(e) => e.stopPropagation()}
className="h-3.5 w-3.5 shrink-0"
/>
{processingState !== "idle" && selectionState === "none" ? (
<>
<Tooltip>
<TooltipTrigger asChild>
<span className="flex h-3.5 w-3.5 shrink-0 items-center justify-center group-hover:hidden">
{processingState === "processing" ? (
<Spinner size="xs" className="text-primary" />
) : (
<AlertCircle className="h-3.5 w-3.5 text-destructive" />
)}
</span>
</TooltipTrigger>
<TooltipContent side="top">
{processingState === "processing"
? "Syncing folder contents"
: "Some files failed to process"}
</TooltipContent>
</Tooltip>
<Checkbox
checked={false}
onCheckedChange={handleCheckChange}
onClick={(e) => e.stopPropagation()}
className="h-3.5 w-3.5 shrink-0 hidden group-hover:flex"
/>
</>
) : (
<Checkbox
checked={
selectionState === "all"
? true
: selectionState === "some"
? "indeterminate"
: false
}
onCheckedChange={handleCheckChange}
onClick={(e) => e.stopPropagation()}
className="h-3.5 w-3.5 shrink-0"
/>
)}
<FolderIcon className="h-4 w-4 shrink-0 text-muted-foreground" />
@ -308,6 +355,28 @@ export const FolderNode = React.memo(function FolderNode({
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end" className="w-40">
{isWatched && onRescan && (
<DropdownMenuItem
onClick={(e) => {
e.stopPropagation();
onRescan(folder);
}}
>
<RefreshCw className="mr-2 h-4 w-4" />
Re-scan
</DropdownMenuItem>
)}
{isWatched && onStopWatching && (
<DropdownMenuItem
onClick={(e) => {
e.stopPropagation();
onStopWatching(folder);
}}
>
<EyeOff className="mr-2 h-4 w-4" />
Stop watching
</DropdownMenuItem>
)}
<DropdownMenuItem
onClick={(e) => {
e.stopPropagation();
@ -353,6 +422,18 @@ export const FolderNode = React.memo(function FolderNode({
{!isRenaming && contextMenuOpen && (
<ContextMenuContent className="w-40">
{isWatched && onRescan && (
<ContextMenuItem onClick={() => onRescan(folder)}>
<RefreshCw className="mr-2 h-4 w-4" />
Re-scan
</ContextMenuItem>
)}
{isWatched && onStopWatching && (
<ContextMenuItem onClick={() => onStopWatching(folder)}>
<EyeOff className="mr-2 h-4 w-4" />
Stop watching
</ContextMenuItem>
)}
<ContextMenuItem onClick={() => onCreateSubfolder(folder.id)}>
<FolderPlus className="mr-2 h-4 w-4" />
New subfolder

View file

@ -1,7 +1,7 @@
"use client";
import { useAtom } from "jotai";
import { CirclePlus } from "lucide-react";
import { Search } from "lucide-react";
import { useCallback, useMemo, useState } from "react";
import { DndProvider } from "react-dnd";
import { HTML5Backend } from "react-dnd-html5-backend";
@ -32,6 +32,7 @@ interface FolderTreeViewProps {
onDeleteDocument: (doc: DocumentNodeDoc) => void;
onMoveDocument: (doc: DocumentNodeDoc) => void;
onExportDocument?: (doc: DocumentNodeDoc, format: string) => void;
onVersionHistory?: (doc: DocumentNodeDoc) => void;
activeTypes: DocumentTypeEnum[];
searchQuery?: string;
onDropIntoFolder?: (
@ -40,6 +41,9 @@ interface FolderTreeViewProps {
targetFolderId: number | null
) => void;
onReorderFolder?: (folderId: number, beforePos: string | null, afterPos: string | null) => void;
watchedFolderIds?: Set<number>;
onRescanFolder?: (folder: FolderDisplay) => void;
onStopWatchingFolder?: (folder: FolderDisplay) => void;
}
function groupBy<T>(items: T[], keyFn: (item: T) => string | number): Record<string | number, T[]> {
@ -69,10 +73,14 @@ export function FolderTreeView({
onDeleteDocument,
onMoveDocument,
onExportDocument,
onVersionHistory,
activeTypes,
searchQuery,
onDropIntoFolder,
onReorderFolder,
watchedFolderIds,
onRescanFolder,
onStopWatchingFolder,
}: FolderTreeViewProps) {
const foldersByParent = useMemo(() => groupBy(folders, (f) => f.parentId ?? "root"), [folders]);
@ -158,6 +166,35 @@ export function FolderTreeView({
return states;
}, [folders, docsByFolder, foldersByParent, mentionedDocIds]);
const folderProcessingStates = useMemo(() => {
const states: Record<number, "idle" | "processing" | "failed"> = {};
function compute(folderId: number): { hasProcessing: boolean; hasFailed: boolean } {
const directDocs = docsByFolder[folderId] ?? [];
let hasProcessing = directDocs.some(
(d) => d.status?.state === "pending" || d.status?.state === "processing"
);
let hasFailed = directDocs.some((d) => d.status?.state === "failed");
for (const child of foldersByParent[folderId] ?? []) {
const sub = compute(child.id);
hasProcessing = hasProcessing || sub.hasProcessing;
hasFailed = hasFailed || sub.hasFailed;
}
if (hasProcessing) states[folderId] = "processing";
else if (hasFailed) states[folderId] = "failed";
else states[folderId] = "idle";
return { hasProcessing, hasFailed };
}
for (const f of folders) {
if (states[f.id] === undefined) compute(f.id);
}
return states;
}, [folders, docsByFolder, foldersByParent]);
function renderLevel(parentId: number | null, depth: number): React.ReactNode[] {
const key = parentId ?? "root";
const childFolders = (foldersByParent[key] ?? [])
@ -191,6 +228,7 @@ export function FolderTreeView({
isRenaming={renamingFolderId === f.id}
childCount={folderChildCounts[f.id] ?? 0}
selectionState={folderSelectionStates[f.id] ?? "none"}
processingState={folderProcessingStates[f.id] ?? "idle"}
onToggleSelect={onToggleFolderSelect}
onToggleExpand={onToggleExpand}
onRename={onRenameFolder}
@ -204,6 +242,9 @@ export function FolderTreeView({
siblingPositions={siblingPositions}
contextMenuOpen={openContextMenuId === `folder-${f.id}`}
onContextMenuOpenChange={(open) => setOpenContextMenuId(open ? `folder-${f.id}` : null)}
isWatched={watchedFolderIds?.has(f.id)}
onRescan={onRescanFolder}
onStopWatching={onStopWatchingFolder}
/>
);
@ -225,6 +266,7 @@ export function FolderTreeView({
onDelete={onDeleteDocument}
onMove={onMoveDocument}
onExport={onExportDocument}
onVersionHistory={onVersionHistory}
contextMenuOpen={openContextMenuId === `doc-${d.id}`}
onContextMenuOpenChange={(open) => setOpenContextMenuId(open ? `doc-${d.id}` : null)}
/>
@ -250,8 +292,9 @@ export function FolderTreeView({
if (treeNodes.length === 0 && (activeTypes.length > 0 || searchQuery)) {
return (
<div className="flex flex-1 flex-col items-center justify-center gap-3 px-4 py-12 text-muted-foreground">
<CirclePlus className="h-10 w-10 rotate-45" />
<p className="text-sm">No matching documents</p>
<Search className="h-10 w-10" />
<p className="text-sm text-muted-foreground">No matching documents</p>
<p className="text-xs text-muted-foreground/70 mt-1">Try a different search term</p>
</div>
);
}
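
The new `folderProcessingStates` memo above rolls per-document states up through nested folders, so a parent shows a spinner while anything beneath it is still syncing. A simplified, standalone version of that recursion (types are assumptions, not the component's real props) might be:

```ts
// Illustrative rollup of per-document states into per-folder states,
// mirroring the recursion in the hunk above. Types are simplified assumptions.
type DocState = "pending" | "processing" | "failed" | "ready";
type FolderState = "idle" | "processing" | "failed";

interface Doc {
  folderId: number;
  state: DocState;
}

interface Folder {
  id: number;
  parentId: number | null;
}

function computeFolderStates(folders: Folder[], docs: Doc[]): Record<number, FolderState> {
  const docsByFolder = new Map<number, Doc[]>();
  for (const d of docs) {
    const list = docsByFolder.get(d.folderId) ?? [];
    list.push(d);
    docsByFolder.set(d.folderId, list);
  }
  const childrenByParent = new Map<number, Folder[]>();
  for (const f of folders) {
    if (f.parentId === null) continue;
    const list = childrenByParent.get(f.parentId) ?? [];
    list.push(f);
    childrenByParent.set(f.parentId, list);
  }

  const states: Record<number, FolderState> = {};
  function visit(folderId: number): { processing: boolean; failed: boolean } {
    const direct = docsByFolder.get(folderId) ?? [];
    let processing = direct.some((d) => d.state === "pending" || d.state === "processing");
    let failed = direct.some((d) => d.state === "failed");
    for (const child of childrenByParent.get(folderId) ?? []) {
      const sub = visit(child.id);
      processing = processing || sub.processing;
      failed = failed || sub.failed;
    }
    // "processing" wins while anything below is still in flight;
    // "failed" only once nothing is syncing but something errored.
    states[folderId] = processing ? "processing" : failed ? "failed" : "idle";
    return { processing, failed };
  }
  for (const f of folders) {
    if (states[f.id] === undefined) visit(f.id);
  }
  return states;
}
```

The render code above then reads the memoized map per folder via `folderProcessingStates[f.id] ?? "idle"`.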

View file

@ -0,0 +1,258 @@
"use client";
import { Check, ChevronRight, Clock, Copy, RotateCcw } from "lucide-react";
import { useCallback, useEffect, useState } from "react";
import { toast } from "sonner";
import { Button } from "@/components/ui/button";
import { Dialog, DialogContent, DialogTitle, DialogTrigger } from "@/components/ui/dialog";
import { Separator } from "@/components/ui/separator";
import { Spinner } from "@/components/ui/spinner";
import { documentsApiService } from "@/lib/apis/documents-api.service";
import { cn } from "@/lib/utils";
interface DocumentVersionSummary {
version_number: number;
title: string;
content_hash: string;
created_at: string | null;
}
interface VersionHistoryProps {
documentId: number;
documentType: string;
}
const VERSION_DOCUMENT_TYPES = new Set(["LOCAL_FOLDER_FILE", "OBSIDIAN_CONNECTOR"]);
export function isVersionableType(documentType: string) {
return VERSION_DOCUMENT_TYPES.has(documentType);
}
const DIALOG_CLASSES =
"select-none max-w-[900px] w-[95vw] md:w-[90vw] h-[90vh] md:h-[80vh] max-h-[640px] flex flex-col md:flex-row p-0 gap-0 overflow-hidden [--card:var(--background)] dark:[--card:oklch(0.205_0_0)] dark:[--background:oklch(0.205_0_0)]";
export function VersionHistoryButton({ documentId, documentType }: VersionHistoryProps) {
if (!isVersionableType(documentType)) return null;
return (
<Dialog>
<DialogTrigger asChild>
<Button variant="ghost" size="sm" className="gap-1.5 text-xs">
<Clock className="h-3.5 w-3.5" />
Versions
</Button>
</DialogTrigger>
<DialogContent className={DIALOG_CLASSES}>
<DialogTitle className="sr-only">Version History</DialogTitle>
<VersionHistoryPanel documentId={documentId} />
</DialogContent>
</Dialog>
);
}
export function VersionHistoryDialog({
open,
onOpenChange,
documentId,
}: {
open: boolean;
onOpenChange: (open: boolean) => void;
documentId: number;
}) {
return (
<Dialog open={open} onOpenChange={onOpenChange}>
<DialogContent className={DIALOG_CLASSES}>
<DialogTitle className="sr-only">Version History</DialogTitle>
{open && <VersionHistoryPanel documentId={documentId} />}
</DialogContent>
</Dialog>
);
}
function formatRelativeTime(dateStr: string): string {
const now = Date.now();
const then = new Date(dateStr).getTime();
const diffMs = now - then;
const diffMin = Math.floor(diffMs / 60_000);
if (diffMin < 1) return "Just now";
if (diffMin < 60) return `${diffMin} minute${diffMin !== 1 ? "s" : ""} ago`;
const diffHr = Math.floor(diffMin / 60);
if (diffHr < 24) return `${diffHr} hour${diffHr !== 1 ? "s" : ""} ago`;
return new Date(dateStr).toLocaleDateString(undefined, {
weekday: "short",
month: "short",
day: "numeric",
year: "numeric",
hour: "numeric",
minute: "2-digit",
});
}
function VersionHistoryPanel({ documentId }: { documentId: number }) {
const [versions, setVersions] = useState<DocumentVersionSummary[]>([]);
const [loading, setLoading] = useState(true);
const [selectedVersion, setSelectedVersion] = useState<number | null>(null);
const [versionContent, setVersionContent] = useState<string>("");
const [contentLoading, setContentLoading] = useState(false);
const [restoring, setRestoring] = useState(false);
const [copied, setCopied] = useState(false);
const loadVersions = useCallback(async () => {
setLoading(true);
try {
const data = await documentsApiService.listDocumentVersions(documentId);
setVersions(data as DocumentVersionSummary[]);
} catch {
toast.error("Failed to load version history");
} finally {
setLoading(false);
}
}, [documentId]);
useEffect(() => {
loadVersions();
}, [loadVersions]);
const handleSelectVersion = async (versionNumber: number) => {
if (selectedVersion === versionNumber) return;
setSelectedVersion(versionNumber);
setContentLoading(true);
try {
const data = (await documentsApiService.getDocumentVersion(documentId, versionNumber)) as {
source_markdown: string;
};
setVersionContent(data.source_markdown || "");
} catch {
toast.error("Failed to load version content");
} finally {
setContentLoading(false);
}
};
const handleRestore = async (versionNumber: number) => {
setRestoring(true);
try {
await documentsApiService.restoreDocumentVersion(documentId, versionNumber);
toast.success(`Restored version ${versionNumber}`);
await loadVersions();
} catch {
toast.error("Failed to restore version");
} finally {
setRestoring(false);
}
};
const handleCopy = () => {
navigator.clipboard.writeText(versionContent);
setCopied(true);
setTimeout(() => setCopied(false), 2000);
};
if (loading) {
return (
<div className="flex flex-1 items-center justify-center">
<Spinner size="lg" className="text-muted-foreground" />
</div>
);
}
if (versions.length === 0) {
return (
<div className="flex flex-1 flex-col items-center justify-center text-muted-foreground">
<p className="text-sm">No version history available yet</p>
<p className="text-xs mt-1">Versions are created when file content changes</p>
</div>
);
}
const selectedVersionData = versions.find((v) => v.version_number === selectedVersion);
return (
<>
{/* Left panel — version list */}
<nav className="w-full md:w-[260px] shrink-0 flex flex-col border-b md:border-b-0 md:border-r border-border">
<div className="px-4 pr-12 md:pr-4 pt-5 pb-2">
<h2 className="text-sm font-semibold text-foreground">Version History</h2>
</div>
<div className="flex-1 overflow-y-auto p-2">
<div className="flex flex-col gap-0.5">
{versions.map((v) => (
<button
key={v.version_number}
type="button"
onClick={() => handleSelectVersion(v.version_number)}
className={cn(
"flex items-center gap-2 rounded-lg px-3 py-2.5 text-left transition-colors focus:outline-none focus-visible:outline-none w-full",
selectedVersion === v.version_number
? "bg-accent text-accent-foreground"
: "text-muted-foreground hover:bg-accent/50 hover:text-foreground"
)}
>
<div className="flex-1 min-w-0 space-y-0.5">
<p className="text-sm font-medium truncate">
{v.created_at
? formatRelativeTime(v.created_at)
: `Version ${v.version_number}`}
</p>
{v.title && <p className="text-xs text-muted-foreground truncate">{v.title}</p>}
</div>
<ChevronRight className="h-3.5 w-3.5 shrink-0 opacity-50" />
</button>
))}
</div>
</div>
</nav>
{/* Right panel — content preview */}
<div className="flex flex-1 flex-col overflow-hidden min-w-0">
{selectedVersion !== null && selectedVersionData ? (
<>
<div className="flex items-center justify-between pl-6 pr-14 pt-5 pb-2">
<h2 className="text-sm font-semibold truncate">
{selectedVersionData.title || `Version ${selectedVersion}`}
</h2>
<div className="flex items-center gap-1.5 shrink-0">
<Button
variant="outline"
size="sm"
className="gap-1.5 text-xs"
onClick={handleCopy}
disabled={contentLoading || copied}
>
{copied ? <Check className="h-3 w-3" /> : <Copy className="h-3 w-3" />}
{copied ? "Copied" : "Copy"}
</Button>
<Button
variant="outline"
size="sm"
className="gap-1.5 text-xs"
disabled={restoring || contentLoading}
onClick={() => handleRestore(selectedVersion)}
>
{restoring ? <Spinner size="xs" /> : <RotateCcw className="h-3 w-3" />}
Restore
</Button>
</div>
</div>
<Separator />
<div className="flex-1 overflow-y-auto px-6 py-4">
{contentLoading ? (
<div className="flex items-center justify-center py-12">
<Spinner size="sm" className="text-muted-foreground" />
</div>
) : (
<pre className="text-sm whitespace-pre-wrap font-mono leading-relaxed text-foreground/90">
{versionContent || "(empty)"}
</pre>
)}
</div>
</>
) : (
<div className="flex flex-1 items-center justify-center text-muted-foreground">
<p className="text-sm">Select a version to preview</p>
</div>
)}
</div>
</>
);
}

View file

@ -1,12 +1,14 @@
"use client";
import { useAtomValue, useSetAtom } from "jotai";
import { AlertCircle, XIcon } from "lucide-react";
import { Download, FileQuestionMark, FileText, Loader2, RefreshCw, XIcon } from "lucide-react";
import dynamic from "next/dynamic";
import { useCallback, useEffect, useRef, useState } from "react";
import { toast } from "sonner";
import { closeEditorPanelAtom, editorPanelAtom } from "@/atoms/editor/editor-panel.atom";
import { VersionHistoryButton } from "@/components/documents/version-history";
import { MarkdownViewer } from "@/components/markdown-viewer";
import { Alert, AlertDescription } from "@/components/ui/alert";
import { Button } from "@/components/ui/button";
import { Drawer, DrawerContent, DrawerHandle, DrawerTitle } from "@/components/ui/drawer";
import { Skeleton } from "@/components/ui/skeleton";
@ -18,11 +20,16 @@ const PlateEditor = dynamic(
{ ssr: false, loading: () => <Skeleton className="h-64 w-full" /> }
);
const LARGE_DOCUMENT_THRESHOLD = 2 * 1024 * 1024; // 2MB
interface EditorContent {
document_id: number;
title: string;
document_type?: string;
source_markdown: string;
content_size_bytes?: number;
chunk_count?: number;
truncated?: boolean;
}
const EDITABLE_DOCUMENT_TYPES = new Set(["FILE", "NOTE"]);
@ -62,6 +69,7 @@ export function EditorPanelContent({
const [isLoading, setIsLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
const [saving, setSaving] = useState(false);
const [downloading, setDownloading] = useState(false);
const [editedMarkdown, setEditedMarkdown] = useState<string | null>(null);
const markdownRef = useRef<string>("");
@ -69,8 +77,10 @@ export function EditorPanelContent({
const changeCountRef = useRef(0);
const [displayTitle, setDisplayTitle] = useState(title || "Untitled");
const isLargeDocument = (editorDoc?.content_size_bytes ?? 0) > LARGE_DOCUMENT_THRESHOLD;
useEffect(() => {
let cancelled = false;
const controller = new AbortController();
setIsLoading(true);
setError(null);
setEditorDoc(null);
@ -78,7 +88,7 @@ export function EditorPanelContent({
initialLoadDone.current = false;
changeCountRef.current = 0;
const fetchContent = async () => {
const doFetch = async () => {
const token = getBearerToken();
if (!token) {
redirectToLogin();
@ -88,10 +98,15 @@ export function EditorPanelContent({
try {
const response = await authenticatedFetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/search-spaces/${searchSpaceId}/documents/${documentId}/editor-content`,
{ method: "GET" }
{ method: "GET", signal: controller.signal }
const url = new URL(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/search-spaces/${searchSpaceId}/documents/${documentId}/editor-content`
);
url.searchParams.set("max_length", String(LARGE_DOCUMENT_THRESHOLD));
if (cancelled) return;
const response = await authenticatedFetch(url.toString(), { method: "GET" });
if (controller.signal.aborted) return;
if (!response.ok) {
const errorData = await response
@ -115,18 +130,16 @@ export function EditorPanelContent({
setEditorDoc(data);
initialLoadDone.current = true;
} catch (err) {
if (cancelled) return;
if (controller.signal.aborted) return;
console.error("Error fetching document:", err);
setError(err instanceof Error ? err.message : "Failed to fetch document");
} finally {
if (!cancelled) setIsLoading(false);
if (!controller.signal.aborted) setIsLoading(false);
}
};
fetchContent();
return () => {
cancelled = true;
};
doFetch().catch(() => {});
return () => controller.abort();
}, [documentId, searchSpaceId, title]);
const handleMarkdownChange = useCallback((md: string) => {
@ -175,7 +188,7 @@ export function EditorPanelContent({
}, [documentId, searchSpaceId]);
const isEditableType = editorDoc
? EDITABLE_DOCUMENT_TYPES.has(editorDoc.document_type ?? "")
? EDITABLE_DOCUMENT_TYPES.has(editorDoc.document_type ?? "") && !isLargeDocument
: false;
return (
@ -187,12 +200,17 @@ export function EditorPanelContent({
<p className="text-[10px] text-muted-foreground">Unsaved changes</p>
)}
</div>
{onClose && (
<Button variant="ghost" size="icon" onClick={onClose} className="size-7 shrink-0">
<XIcon className="size-4" />
<span className="sr-only">Close editor panel</span>
</Button>
)}
<div className="flex items-center gap-1 shrink-0">
{editorDoc?.document_type && (
<VersionHistoryButton documentId={documentId} documentType={editorDoc.document_type} />
)}
{onClose && (
<Button variant="ghost" size="icon" onClick={onClose} className="size-7 shrink-0">
<XIcon className="size-4" />
<span className="sr-only">Close editor panel</span>
</Button>
)}
</div>
</div>
<div className="flex-1 overflow-hidden">
@ -200,12 +218,79 @@ export function EditorPanelContent({
<EditorPanelSkeleton />
) : error || !editorDoc ? (
<div className="flex flex-1 flex-col items-center justify-center gap-3 p-6 text-center">
<AlertCircle className="size-8 text-destructive" />
<div>
<p className="font-medium text-foreground">Failed to load document</p>
<p className="text-sm text-red-500 mt-1">{error || "An unknown error occurred"}</p>
{error?.toLowerCase().includes("still being processed") ? (
<div className="rounded-full bg-muted/50 p-3">
<RefreshCw className="size-6 text-muted-foreground animate-spin" />
</div>
) : (
<div className="rounded-full bg-muted/50 p-3">
<FileQuestionMark className="size-6 text-muted-foreground" />
</div>
)}
<div className="space-y-1 max-w-xs">
<p className="font-medium text-foreground">
{error?.toLowerCase().includes("still being processed")
? "Document is processing"
: "Document unavailable"}
</p>
<p className="text-sm text-muted-foreground">
{error || "An unknown error occurred"}
</p>
</div>
</div>
) : isLargeDocument ? (
<div className="h-full overflow-y-auto px-5 py-4">
<Alert className="mb-4">
<FileText className="size-4" />
<AlertDescription className="flex items-center justify-between gap-4">
<span>
This document is too large for the editor (
{Math.round((editorDoc.content_size_bytes ?? 0) / 1024 / 1024)}MB,{" "}
{editorDoc.chunk_count ?? 0} chunks). Showing a preview below.
</span>
<Button
variant="outline"
size="sm"
className="shrink-0 gap-1.5"
disabled={downloading}
onClick={async () => {
setDownloading(true);
try {
const response = await authenticatedFetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/search-spaces/${searchSpaceId}/documents/${documentId}/download-markdown`,
{ method: "GET" }
);
if (!response.ok) throw new Error("Download failed");
const blob = await response.blob();
const url = URL.createObjectURL(blob);
const a = document.createElement("a");
a.href = url;
const disposition = response.headers.get("content-disposition");
const match = disposition?.match(/filename="(.+)"/);
a.download = match?.[1] ?? `${editorDoc.title || "document"}.md`;
document.body.appendChild(a);
a.click();
a.remove();
URL.revokeObjectURL(url);
toast.success("Download started");
} catch {
toast.error("Failed to download document");
} finally {
setDownloading(false);
}
}}
>
{downloading ? (
<Loader2 className="size-3.5 animate-spin" />
) : (
<Download className="size-3.5" />
)}
{downloading ? "Preparing..." : "Download .md"}
</Button>
</AlertDescription>
</Alert>
<MarkdownViewer content={editorDoc.source_markdown} />
</div>
) : isEditableType ? (
<PlateEditor
key={documentId}
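
Both editor surfaces in this commit replace the `let cancelled = false` flag with an `AbortController` whose `abort()` runs in the effect cleanup. A minimal sketch of that pattern in a plain hook, with a hypothetical endpoint and plain `fetch` instead of the app's `authenticatedFetch`, could be:

```ts
// Minimal sketch of the AbortController cleanup pattern used above.
// The endpoint and response shape are assumptions for illustration only.
import { useEffect, useState } from "react";

function useEditorContent(documentId: number) {
  const [content, setContent] = useState<string | null>(null);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    const controller = new AbortController();
    setContent(null);
    setError(null);

    (async () => {
      try {
        const res = await fetch(`/api/documents/${documentId}/editor-content`, {
          signal: controller.signal,
        });
        if (!res.ok) throw new Error(`Request failed with ${res.status}`);
        const data = (await res.json()) as { source_markdown: string };
        if (!controller.signal.aborted) setContent(data.source_markdown);
      } catch (err) {
        // Aborted requests reject; ignore them so a stale effect never writes state.
        if (!controller.signal.aborted) {
          setError(err instanceof Error ? err.message : "Failed to fetch document");
        }
      }
    })();

    // Cleanup aborts the in-flight request when documentId changes or the component unmounts.
    return () => controller.abort();
  }, [documentId]);

  return { content, error };
}
```

Here the signal is also passed to `fetch` so the request itself is cancelled; the hunks above mainly guard state writes by checking `controller.signal.aborted`.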

View file

@ -1,4 +1,5 @@
"use client";
import Image from "next/image";
import { AnimatePresence, motion } from "motion/react";
import { ExpandedGifOverlay, useExpandedGif } from "@/components/ui/expanded-gif-overlay";
@ -81,6 +82,15 @@ function UseCaseCard({
alt={title}
className="w-full rounded-xl object-cover transition-transform duration-500 group-hover:scale-[1.02]"
/>
<div className="relative w-full h-48">
<Image
src={src}
alt={title}
fill
className="rounded-xl object-cover transition-transform duration-500 group-hover:scale-[1.02]"
unoptimized={src.endsWith('.gif')}
/>
</div>
</div>
<div className="px-5 py-4">
<h3 className="text-base font-semibold text-neutral-900 dark:text-white">{title}</h3>

View file

@ -775,7 +775,8 @@ export function LayoutDataProvider({ searchSpaceId, children }: LayoutDataProvid
<AlertDialogHeader>
<AlertDialogTitle>{t("delete_chat")}</AlertDialogTitle>
<AlertDialogDescription>
{t("delete_chat_confirm")} <span className="font-medium">{chatToDelete?.name}</span>?{" "}
{t("delete_chat_confirm")}{" "}
<span className="font-medium break-all">{chatToDelete?.name}</span>?{" "}
{t("action_cannot_undone")}
</AlertDialogDescription>
</AlertDialogHeader>
@ -835,9 +836,7 @@ export function LayoutDataProvider({ searchSpaceId, children }: LayoutDataProvid
<span className={isRenamingChat ? "opacity-0" : ""}>
{tSidebar("rename") || "Rename"}
</span>
{isRenamingChat && (
<span className="absolute h-4 w-4 animate-spin rounded-full border-2 border-current border-t-transparent" />
)}
{isRenamingChat && <Spinner size="sm" className="absolute" />}
</Button>
</DialogFooter>
</DialogContent>
@ -865,9 +864,7 @@ export function LayoutDataProvider({ searchSpaceId, children }: LayoutDataProvid
className="relative bg-destructive text-destructive-foreground hover:bg-destructive/90"
>
<span className={isDeletingSearchSpace ? "opacity-0" : ""}>{tCommon("delete")}</span>
{isDeletingSearchSpace && (
<span className="absolute h-4 w-4 animate-spin rounded-full border-2 border-current border-t-transparent" />
)}
{isDeletingSearchSpace && <Spinner size="sm" className="absolute" />}
</AlertDialogAction>
</AlertDialogFooter>
</AlertDialogContent>
@ -895,9 +892,7 @@ export function LayoutDataProvider({ searchSpaceId, children }: LayoutDataProvid
className="relative bg-destructive text-destructive-foreground hover:bg-destructive/90"
>
<span className={isLeavingSearchSpace ? "opacity-0" : ""}>{t("leave")}</span>
{isLeavingSearchSpace && (
<span className="absolute h-4 w-4 animate-spin rounded-full border-2 border-current border-t-transparent" />
)}
{isLeavingSearchSpace && <Spinner size="sm" className="absolute" />}
</AlertDialogAction>
</AlertDialogFooter>
</AlertDialogContent>

View file

@ -19,7 +19,7 @@ const EditorPanelContent = dynamic(
import("@/components/editor-panel/editor-panel").then((m) => ({
default: m.EditorPanelContent,
})),
{ ssr: false, loading: () => <Skeleton className="h-96 w-full" /> }
{ ssr: false, loading: () => null }
);
const HitlEditPanelContent = dynamic(

View file

@ -109,6 +109,7 @@ export function AllPrivateChatsSidebarContent({
queryKey: ["all-threads", searchSpaceId],
queryFn: () => fetchThreads(Number(searchSpaceId)),
enabled: !!searchSpaceId && !isSearchMode,
placeholderData: () => queryClient.getQueryData(["threads", searchSpaceId, { limit: 40 }]),
});
const {
@ -349,7 +350,7 @@ export function AllPrivateChatsSidebarContent({
<div
key={thread.id}
className={cn(
"group flex items-center gap-2 rounded-md px-2 py-1.5 text-sm",
"sidebar-item-lazy group flex items-center gap-2 rounded-md px-2 py-1.5 text-sm",
"hover:bg-accent hover:text-accent-foreground",
"transition-colors cursor-pointer",
isActive && "bg-accent text-accent-foreground",

View file

@ -349,7 +349,7 @@ export function AllSharedChatsSidebarContent({
<div
key={thread.id}
className={cn(
"group flex items-center gap-2 rounded-md px-2 py-1.5 text-sm",
"sidebar-item-lazy group flex items-center gap-2 rounded-md px-2 py-1.5 text-sm",
"hover:bg-accent hover:text-accent-foreground",
"transition-colors cursor-pointer",
isActive && "bg-accent text-accent-foreground",

View file

@ -21,6 +21,7 @@ import type { DocumentNodeDoc } from "@/components/documents/DocumentNode";
import type { FolderDisplay } from "@/components/documents/FolderNode";
import { FolderPickerDialog } from "@/components/documents/FolderPickerDialog";
import { FolderTreeView } from "@/components/documents/FolderTreeView";
import { VersionHistoryDialog } from "@/components/documents/version-history";
import { EXPORT_FILE_EXTENSIONS } from "@/components/shared/ExportMenuItems";
import {
AlertDialog,
@ -40,6 +41,7 @@ import { getConnectorIcon } from "@/contracts/enums/connectorIcons";
import type { DocumentTypeEnum } from "@/contracts/types/document.types";
import { useDebouncedValue } from "@/hooks/use-debounced-value";
import { useMediaQuery } from "@/hooks/use-media-query";
import { documentsApiService } from "@/lib/apis/documents-api.service";
import { foldersApiService } from "@/lib/apis/folders-api.service";
import { authenticatedFetch } from "@/lib/auth-utils";
import { queries } from "@/zero/queries/index";
@ -92,6 +94,50 @@ export function DocumentsSidebar({
const [search, setSearch] = useState("");
const debouncedSearch = useDebouncedValue(search, 250);
const [activeTypes, setActiveTypes] = useState<DocumentTypeEnum[]>([]);
const [watchedFolderIds, setWatchedFolderIds] = useState<Set<number>>(new Set());
useEffect(() => {
const api = typeof window !== "undefined" ? window.electronAPI : null;
if (!api?.getWatchedFolders) return;
async function loadWatchedIds() {
const folders = await api!.getWatchedFolders();
if (folders.length === 0) {
try {
const backendFolders = await documentsApiService.getWatchedFolders(searchSpaceId);
for (const bf of backendFolders) {
const meta = bf.metadata as Record<string, unknown> | null;
if (!meta?.watched || !meta.folder_path) continue;
await api!.addWatchedFolder({
path: meta.folder_path as string,
name: bf.name,
rootFolderId: bf.id,
searchSpaceId: bf.search_space_id,
excludePatterns: (meta.exclude_patterns as string[]) ?? [],
fileExtensions: (meta.file_extensions as string[] | null) ?? null,
active: true,
});
}
const recovered = await api!.getWatchedFolders();
const ids = new Set(
recovered.filter((f) => f.rootFolderId != null).map((f) => f.rootFolderId as number)
);
setWatchedFolderIds(ids);
return;
} catch (err) {
console.error("[DocumentsSidebar] Recovery from backend failed:", err);
}
}
const ids = new Set(
folders.filter((f) => f.rootFolderId != null).map((f) => f.rootFolderId as number)
);
setWatchedFolderIds(ids);
}
loadWatchedIds();
}, [searchSpaceId]);
const { mutateAsync: deleteDocumentMutation } = useAtomValue(deleteDocumentMutationAtom);
const [sidebarDocs, setSidebarDocs] = useAtom(sidebarSelectedDocumentsAtom);
@ -134,7 +180,12 @@ export function DocumentsSidebar({
const treeDocuments: DocumentNodeDoc[] = useMemo(() => {
const zeroDocs = (zeroAllDocs ?? [])
.filter((d) => d.title && d.title.trim() !== "")
.filter((d) => {
if (!d.title || d.title.trim() === "") return false;
const state = (d.status as { state?: string } | undefined)?.state;
if (state === "deleting") return false;
return true;
})
.map((d) => ({
id: d.id,
title: d.title,
@ -223,6 +274,53 @@ export function DocumentsSidebar({
[createFolderParentId, searchSpaceId, setExpandedFolderMap]
);
const handleRescanFolder = useCallback(
async (folder: FolderDisplay) => {
const api = window.electronAPI;
if (!api) return;
const watchedFolders = await api.getWatchedFolders();
const matched = watchedFolders.find((wf) => wf.rootFolderId === folder.id);
if (!matched) {
toast.error("This folder is not being watched");
return;
}
try {
await documentsApiService.folderIndex(searchSpaceId, {
folder_path: matched.path,
folder_name: matched.name,
search_space_id: searchSpaceId,
root_folder_id: folder.id,
});
toast.success(`Re-scanning folder: ${matched.name}`);
} catch (err) {
toast.error((err as Error)?.message || "Failed to re-scan folder");
}
},
[searchSpaceId]
);
const handleStopWatching = useCallback(async (folder: FolderDisplay) => {
const api = window.electronAPI;
if (!api) return;
const watchedFolders = await api.getWatchedFolders();
const matched = watchedFolders.find((wf) => wf.rootFolderId === folder.id);
if (!matched) {
toast.error("This folder is not being watched");
return;
}
await api.removeWatchedFolder(matched.path);
try {
await foldersApiService.stopWatching(folder.id);
} catch (err) {
console.error("[DocumentsSidebar] Failed to clear watched metadata:", err);
}
toast.success(`Stopped watching: ${matched.name}`);
}, []);
const handleRenameFolder = useCallback(async (folder: FolderDisplay, newName: string) => {
try {
await foldersApiService.updateFolder(folder.id, { name: newName });
@ -235,6 +333,14 @@ export function DocumentsSidebar({
const handleDeleteFolder = useCallback(async (folder: FolderDisplay) => {
if (!confirm(`Delete folder "${folder.name}" and all its contents?`)) return;
try {
const api = window.electronAPI;
if (api) {
const watchedFolders = await api.getWatchedFolders();
const matched = watchedFolders.find((wf) => wf.rootFolderId === folder.id);
if (matched) {
await api.removeWatchedFolder(matched.path);
}
}
await foldersApiService.deleteFolder(folder.id);
toast.success("Folder deleted");
} catch (e: unknown) {
@ -448,6 +554,7 @@ export function DocumentsSidebar({
const [bulkDeleteConfirmOpen, setBulkDeleteConfirmOpen] = useState(false);
const [isBulkDeleting, setIsBulkDeleting] = useState(false);
const [versionDocId, setVersionDocId] = useState<number | null>(null);
const handleBulkDeleteSelected = useCallback(async () => {
if (deletableSelectedIds.length === 0) return;
@ -651,56 +758,72 @@ export function DocumentsSidebar({
/>
</div>
{deletableSelectedIds.length > 0 && (
<div className="shrink-0 flex items-center justify-center px-4 py-1.5 animate-in fade-in duration-150">
<button
type="button"
onClick={() => setBulkDeleteConfirmOpen(true)}
className="flex items-center gap-1.5 px-3 py-1 rounded-md bg-destructive text-destructive-foreground shadow-sm text-xs font-medium hover:bg-destructive/90 transition-colors"
>
<Trash2 size={12} />
Delete {deletableSelectedIds.length}{" "}
{deletableSelectedIds.length === 1 ? "item" : "items"}
</button>
</div>
)}
<div className="relative flex-1 min-h-0 overflow-auto">
{deletableSelectedIds.length > 0 && (
<div className="absolute inset-x-0 top-0 z-10 flex items-center justify-center px-4 py-1.5 animate-in fade-in duration-150 pointer-events-none">
<button
type="button"
onClick={() => setBulkDeleteConfirmOpen(true)}
className="pointer-events-auto flex items-center gap-1.5 px-3 py-1 rounded-md bg-destructive text-destructive-foreground shadow-lg text-xs font-medium hover:bg-destructive/90 transition-colors"
>
<Trash2 size={12} />
Delete {deletableSelectedIds.length}{" "}
{deletableSelectedIds.length === 1 ? "item" : "items"}
</button>
</div>
)}
<FolderTreeView
folders={treeFolders}
documents={searchFilteredDocuments}
expandedIds={expandedIds}
onToggleExpand={toggleFolderExpand}
mentionedDocIds={mentionedDocIds}
onToggleChatMention={handleToggleChatMention}
onToggleFolderSelect={handleToggleFolderSelect}
onRenameFolder={handleRenameFolder}
onDeleteFolder={handleDeleteFolder}
onMoveFolder={handleMoveFolder}
onCreateFolder={handleCreateFolder}
searchQuery={debouncedSearch.trim() || undefined}
onPreviewDocument={(doc) => {
openEditorPanel({
documentId: doc.id,
searchSpaceId,
title: doc.title,
});
}}
onEditDocument={(doc) => {
openEditorPanel({
documentId: doc.id,
searchSpaceId,
title: doc.title,
});
}}
onDeleteDocument={(doc) => handleDeleteDocument(doc.id)}
onMoveDocument={handleMoveDocument}
onExportDocument={handleExportDocument}
activeTypes={activeTypes}
onDropIntoFolder={handleDropIntoFolder}
onReorderFolder={handleReorderFolder}
/>
<FolderTreeView
folders={treeFolders}
documents={searchFilteredDocuments}
expandedIds={expandedIds}
onToggleExpand={toggleFolderExpand}
mentionedDocIds={mentionedDocIds}
onToggleChatMention={handleToggleChatMention}
onToggleFolderSelect={handleToggleFolderSelect}
onRenameFolder={handleRenameFolder}
onDeleteFolder={handleDeleteFolder}
onMoveFolder={handleMoveFolder}
onCreateFolder={handleCreateFolder}
searchQuery={debouncedSearch.trim() || undefined}
onPreviewDocument={(doc) => {
openEditorPanel({
documentId: doc.id,
searchSpaceId,
title: doc.title,
});
}}
onEditDocument={(doc) => {
openEditorPanel({
documentId: doc.id,
searchSpaceId,
title: doc.title,
});
}}
onDeleteDocument={(doc) => handleDeleteDocument(doc.id)}
onMoveDocument={handleMoveDocument}
onExportDocument={handleExportDocument}
onVersionHistory={(doc) => setVersionDocId(doc.id)}
activeTypes={activeTypes}
onDropIntoFolder={handleDropIntoFolder}
onReorderFolder={handleReorderFolder}
watchedFolderIds={watchedFolderIds}
onRescanFolder={handleRescanFolder}
onStopWatchingFolder={handleStopWatching}
/>
</div>
</div>
{versionDocId !== null && (
<VersionHistoryDialog
open
onOpenChange={(open) => {
if (!open) setVersionDocId(null);
}}
documentId={versionDocId}
/>
)}
<FolderPickerDialog
open={folderPickerOpen}
onOpenChange={setFolderPickerOpen}

View file

@ -20,7 +20,7 @@ import {
} from "lucide-react";
import { useParams, useRouter } from "next/navigation";
import { useTranslations } from "next-intl";
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
import { useCallback, useDeferredValue, useEffect, useMemo, useRef, useState } from "react";
import { getDocumentTypeLabel } from "@/app/dashboard/[search_space_id]/documents/(manage)/components/DocumentTypeIcon";
import { setTargetCommentIdAtom } from "@/atoms/chat/current-thread.atom";
import { convertRenderedToDisplay } from "@/components/chat-comments/comment-item/comment-item";
@ -178,12 +178,23 @@ export function InboxSidebarContent({
const [mounted, setMounted] = useState(false);
const [openDropdown, setOpenDropdown] = useState<"filter" | null>(null);
const [connectorScrollPos, setConnectorScrollPos] = useState<"top" | "middle" | "bottom">("top");
const connectorRafRef = useRef<number>();
const handleConnectorScroll = useCallback((e: React.UIEvent<HTMLDivElement>) => {
const el = e.currentTarget;
const atTop = el.scrollTop <= 2;
const atBottom = el.scrollHeight - el.scrollTop - el.clientHeight <= 2;
setConnectorScrollPos(atTop ? "top" : atBottom ? "bottom" : "middle");
if (connectorRafRef.current) return;
connectorRafRef.current = requestAnimationFrame(() => {
const atTop = el.scrollTop <= 2;
const atBottom = el.scrollHeight - el.scrollTop - el.clientHeight <= 2;
setConnectorScrollPos(atTop ? "top" : atBottom ? "bottom" : "middle");
connectorRafRef.current = undefined;
});
}, []);
useEffect(
() => () => {
if (connectorRafRef.current) cancelAnimationFrame(connectorRafRef.current);
},
[]
);
const [filterDrawerOpen, setFilterDrawerOpen] = useState(false);
const [markingAsReadId, setMarkingAsReadId] = useState<number | null>(null);
@ -289,15 +300,14 @@ export function InboxSidebarContent({
[activeFilter]
);
// Defer non-urgent list updates so the search input stays responsive.
// The deferred snapshot lags one render behind the live value intentionally.
const deferredTabItems = useDeferredValue(activeSource.items);
const deferredSearchItems = useDeferredValue(searchResponse?.items ?? []);
// Two data paths: search mode (API) or default (per-tab data source)
const filteredItems = useMemo(() => {
let tabItems: InboxItem[];
if (isSearchMode) {
tabItems = searchResponse?.items ?? [];
} else {
tabItems = activeSource.items;
}
const tabItems: InboxItem[] = isSearchMode ? deferredSearchItems : deferredTabItems;
let result = tabItems;
if (activeFilter !== "all") {
@ -310,8 +320,8 @@ export function InboxSidebarContent({
return result;
}, [
isSearchMode,
searchResponse,
activeSource.items,
deferredSearchItems,
deferredTabItems,
activeTab,
activeFilter,
selectedSource,
@ -920,6 +930,7 @@ export function InboxSidebarContent({
"transition-colors cursor-pointer",
isMarkingAsRead && "opacity-50 pointer-events-none"
)}
style={{ contentVisibility: "auto", containIntrinsicSize: "0 80px" }}
>
{isMobile ? (
<button
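
The scroll handler above now coalesces scroll events through `requestAnimationFrame`, so at most one state update runs per frame and any pending frame is cancelled on unmount. A small, self-contained hook sketching the same throttle (names are illustrative) might be:

```ts
// Sketch of the requestAnimationFrame throttle from the scroll handler above.
// Coalesces a burst of scroll events into at most one state update per frame.
import { useCallback, useEffect, useRef, useState } from "react";
import type { UIEvent } from "react";

type ScrollPos = "top" | "middle" | "bottom";

export function useScrollPosition() {
  const [pos, setPos] = useState<ScrollPos>("top");
  const rafRef = useRef<number | undefined>(undefined);

  const onScroll = useCallback((e: UIEvent<HTMLDivElement>) => {
    const el = e.currentTarget; // capture before the handler returns
    if (rafRef.current) return; // a frame is already scheduled
    rafRef.current = requestAnimationFrame(() => {
      const atTop = el.scrollTop <= 2;
      const atBottom = el.scrollHeight - el.scrollTop - el.clientHeight <= 2;
      setPos(atTop ? "top" : atBottom ? "bottom" : "middle");
      rafRef.current = undefined;
    });
  }, []);

  // Cancel any pending frame on unmount so it never fires after teardown.
  useEffect(
    () => () => {
      if (rafRef.current) cancelAnimationFrame(rafRef.current);
    },
    []
  );

  return { pos, onScroll };
}
```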

View file

@ -1,18 +1,24 @@
"use client";
import { AlertCircle, Pencil } from "lucide-react";
import { Download, FileQuestionMark, FileText, Loader2, PenLine, RefreshCw } from "lucide-react";
import { useCallback, useEffect, useRef, useState } from "react";
import { toast } from "sonner";
import { PlateEditor } from "@/components/editor/plate-editor";
import { MarkdownViewer } from "@/components/markdown-viewer";
import { Alert, AlertDescription } from "@/components/ui/alert";
import { Button } from "@/components/ui/button";
import { authenticatedFetch, getBearerToken, redirectToLogin } from "@/lib/auth-utils";
const LARGE_DOCUMENT_THRESHOLD = 2 * 1024 * 1024; // 2MB
interface DocumentContent {
document_id: number;
title: string;
document_type?: string;
source_markdown: string;
content_size_bytes?: number;
chunk_count?: number;
truncated?: boolean;
}
function DocumentSkeleton() {
@ -49,13 +55,16 @@ export function DocumentTabContent({ documentId, searchSpaceId, title }: Documen
const [error, setError] = useState<string | null>(null);
const [isEditing, setIsEditing] = useState(false);
const [saving, setSaving] = useState(false);
const [downloading, setDownloading] = useState(false);
const [editedMarkdown, setEditedMarkdown] = useState<string | null>(null);
const markdownRef = useRef<string>("");
const initialLoadDone = useRef(false);
const changeCountRef = useRef(0);
const isLargeDocument = (doc?.content_size_bytes ?? 0) > LARGE_DOCUMENT_THRESHOLD;
useEffect(() => {
let cancelled = false;
const controller = new AbortController();
setIsLoading(true);
setError(null);
setDoc(null);
@ -64,7 +73,7 @@ export function DocumentTabContent({ documentId, searchSpaceId, title }: Documen
initialLoadDone.current = false;
changeCountRef.current = 0;
const fetchContent = async () => {
const doFetch = async () => {
const token = getBearerToken();
if (!token) {
redirectToLogin();
@ -74,10 +83,15 @@ export function DocumentTabContent({ documentId, searchSpaceId, title }: Documen
try {
const response = await authenticatedFetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/search-spaces/${searchSpaceId}/documents/${documentId}/editor-content`,
{ method: "GET" }
{ method: "GET", signal: controller.signal }
const url = new URL(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/search-spaces/${searchSpaceId}/documents/${documentId}/editor-content`
);
url.searchParams.set("max_length", String(LARGE_DOCUMENT_THRESHOLD));
if (cancelled) return;
const response = await authenticatedFetch(url.toString(), { method: "GET" });
if (controller.signal.aborted) return;
if (!response.ok) {
const errorData = await response
@ -98,18 +112,16 @@ export function DocumentTabContent({ documentId, searchSpaceId, title }: Documen
setDoc(data);
initialLoadDone.current = true;
} catch (err) {
if (cancelled) return;
if (controller.signal.aborted) return;
console.error("Error fetching document:", err);
setError(err instanceof Error ? err.message : "Failed to fetch document");
} finally {
if (!cancelled) setIsLoading(false);
if (!controller.signal.aborted) setIsLoading(false);
}
};
fetchContent();
return () => {
cancelled = true;
};
doFetch().catch(() => {});
return () => controller.abort();
}, [documentId, searchSpaceId]);
const handleMarkdownChange = useCallback((md: string) => {
@ -160,22 +172,40 @@ export function DocumentTabContent({ documentId, searchSpaceId, title }: Documen
if (isLoading) return <DocumentSkeleton />;
if (error || !doc) {
const isProcessing = error?.toLowerCase().includes("still being processed");
return (
<div className="flex flex-1 flex-col items-center justify-center gap-3 p-6 text-center">
<AlertCircle className="size-10 text-destructive" />
<div>
<p className="font-medium text-foreground text-lg">Failed to load document</p>
<p className="text-sm text-muted-foreground mt-1">
{error || "An unknown error occurred"}
</p>
<div className="flex flex-1 flex-col items-center justify-center gap-4 p-8 text-center">
<div className="rounded-full bg-muted/50 p-4">
{isProcessing ? (
<RefreshCw className="size-8 text-muted-foreground animate-spin" />
) : (
<FileQuestionMark className="size-8 text-muted-foreground" />
)}
</div>
<div className="space-y-1.5 max-w-sm">
<p className="font-semibold text-foreground text-lg">
{isProcessing ? "Document is processing" : "Document unavailable"}
</p>
<p className="text-sm text-muted-foreground">{error || "An unknown error occurred"}</p>
</div>
{!isProcessing && (
<Button
variant="outline"
size="sm"
className="mt-1 gap-1.5"
onClick={() => window.location.reload()}
>
<RefreshCw className="size-3.5" />
Retry
</Button>
)}
</div>
);
}
const isEditable = EDITABLE_DOCUMENT_TYPES.has(doc.document_type ?? "");
const isEditable = EDITABLE_DOCUMENT_TYPES.has(doc.document_type ?? "") && !isLargeDocument;
if (isEditing) {
if (isEditing && !isLargeDocument) {
return (
<div className="flex flex-col h-full overflow-hidden">
<div className="flex items-center justify-between px-6 py-3 border-b shrink-0">
@ -229,14 +259,69 @@ export function DocumentTabContent({ documentId, searchSpaceId, title }: Documen
onClick={() => setIsEditing(true)}
className="gap-1.5"
>
<Pencil className="size-3.5" />
<PenLine className="size-3.5" />
Edit
</Button>
)}
</div>
<div className="flex-1 overflow-auto">
<div className="max-w-4xl mx-auto px-6 py-6">
<MarkdownViewer content={doc.source_markdown} />
{isLargeDocument ? (
<>
<Alert className="mb-4">
<FileText className="size-4" />
<AlertDescription className="flex items-center justify-between gap-4">
<span>
This document is too large for the editor (
{Math.round((doc.content_size_bytes ?? 0) / 1024 / 1024)}MB,{" "}
{doc.chunk_count ?? 0} chunks). Showing a preview below.
</span>
<Button
variant="outline"
size="sm"
className="shrink-0 gap-1.5"
disabled={downloading}
onClick={async () => {
setDownloading(true);
try {
const response = await authenticatedFetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/search-spaces/${searchSpaceId}/documents/${documentId}/download-markdown`,
{ method: "GET" }
);
if (!response.ok) throw new Error("Download failed");
const blob = await response.blob();
const url = URL.createObjectURL(blob);
const a = document.createElement("a");
a.href = url;
const disposition = response.headers.get("content-disposition");
const match = disposition?.match(/filename="(.+)"/);
a.download = match?.[1] ?? `${doc.title || "document"}.md`;
document.body.appendChild(a);
a.click();
a.remove();
URL.revokeObjectURL(url);
toast.success("Download started");
} catch {
toast.error("Failed to download document");
} finally {
setDownloading(false);
}
}}
>
{downloading ? (
<Loader2 className="size-3.5 animate-spin" />
) : (
<Download className="size-3.5" />
)}
{downloading ? "Preparing..." : "Download .md"}
</Button>
</AlertDescription>
</Alert>
<MarkdownViewer content={doc.source_markdown} />
</>
) : (
<MarkdownViewer content={doc.source_markdown} />
)}
</div>
</div>
</div>

View file

@ -72,7 +72,7 @@ export function TabBar({ onTabSwitch, onNewChat, rightActions, className }: TabB
if (tabs.length <= 1) return null;
return (
<div className={cn("mb-2 flex h-9 items-center shrink-0 px-1 gap-0.5", className)}>
<div className={cn("mb-2 flex h-9 items-center shrink-0 px-1 gap-0.5 select-none", className)}>
<div
ref={scrollRef}
className="flex h-full items-center flex-1 gap-0.5 overflow-x-auto overflow-y-hidden scrollbar-hide [scrollbar-width:none] [-ms-overflow-style:none] [&::-webkit-scrollbar]:hidden py-1"

View file

@ -3,6 +3,8 @@ import { createMathPlugin } from "@streamdown/math";
import { Streamdown, type StreamdownProps } from "streamdown";
import "katex/dist/katex.min.css";
import { cn } from "@/lib/utils";
import Image from "next/image";
const code = createCodePlugin({
themes: ["nord", "nord"],
@ -15,6 +17,7 @@ const math = createMathPlugin({
interface MarkdownViewerProps {
content: string;
className?: string;
maxLength?: number;
}
/**
@ -79,8 +82,10 @@ function convertLatexDelimiters(content: string): string {
return content;
}
export function MarkdownViewer({ content, className }: MarkdownViewerProps) {
const processedContent = convertLatexDelimiters(stripOuterMarkdownFence(content));
export function MarkdownViewer({ content, className, maxLength }: MarkdownViewerProps) {
const isTruncated = maxLength != null && content.length > maxLength;
const displayContent = isTruncated ? content.slice(0, maxLength) : content;
const processedContent = convertLatexDelimiters(stripOuterMarkdownFence(displayContent));
const components: StreamdownProps["components"] = {
p: ({ children, ...props }) => (
<p className="my-2" {...props}>
@ -124,16 +129,31 @@ export function MarkdownViewer({ content, className }: MarkdownViewerProps) {
<blockquote className="border-l-4 border-muted pl-4 italic my-2" {...props} />
),
hr: ({ ...props }) => <hr className="my-4 border-muted" {...props} />,
img: ({ src, alt, width: _w, height: _h, ...props }) => (
// eslint-disable-next-line @next/next/no-img-element
<img
className="max-w-full h-auto my-4 rounded"
alt={alt || "markdown image"}
src={typeof src === "string" ? src : ""}
loading="lazy"
{...props}
/>
),
img: ({ src, alt, width: _w, height: _h, ...props }) => {
const isDataOrUnknownUrl = typeof src === "string" && (src.startsWith("data:") || !src.startsWith("http"));
return isDataOrUnknownUrl ? (
// eslint-disable-next-line @next/next/no-img-element
<img
className="max-w-full h-auto my-4 rounded"
alt={alt || "markdown image"}
src={src}
loading="lazy"
{...props}
/>
) : (
<Image
className="max-w-full h-auto my-4 rounded"
alt={alt || "markdown image"}
src={typeof src === "string" ? src : ""}
width={_w || 800}
height={_h || 600}
sizes="(max-width: 768px) 100vw, (max-width: 1200px) 75vw, 60vw"
unoptimized={isDataOrUnknownUrl}
{...props}
/>
);
},
table: ({ ...props }) => (
<div className="overflow-x-auto my-4 rounded-lg border border-border w-full">
<table className="w-full divide-y divide-border" {...props} />
@ -171,6 +191,12 @@ export function MarkdownViewer({ content, className }: MarkdownViewerProps) {
>
{processedContent}
</Streamdown>
{isTruncated && (
<p className="mt-4 text-sm text-muted-foreground italic">
Content truncated ({Math.round(content.length / 1024)}KB total). Showing first{" "}
{Math.round(maxLength / 1024)}KB.
</p>
)}
</div>
);
}
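
With the new `maxLength` prop, a caller can cap very large previews and rely on the viewer's built-in truncation notice. A hypothetical usage, assuming a 512KB cap rather than any value the app actually uses:

```tsx
import { MarkdownViewer } from "@/components/markdown-viewer";

// Hypothetical caller: render at most ~512KB of markdown and let the viewer
// append its "Content truncated" notice for anything beyond that.
export function LargeDocPreview({ markdown }: { markdown: string }) {
  return <MarkdownViewer content={markdown} maxLength={512 * 1024} />;
}
```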

View file

@ -4,6 +4,7 @@ import { keepPreviousData, useQuery } from "@tanstack/react-query";
import {
forwardRef,
useCallback,
useDeferredValue,
useEffect,
useImperativeHandle,
useMemo,
@ -81,6 +82,9 @@ export const DocumentMentionPicker = forwardRef<
// Debounced search value to minimize API calls and prevent race conditions
const search = externalSearch;
const debouncedSearch = useDebounced(search, DEBOUNCE_MS);
// Deferred snapshot of debouncedSearch — client-side filtering uses this so it
// is treated as a non-urgent update, keeping the input responsive.
const deferredSearch = useDeferredValue(debouncedSearch);
const [highlightedIndex, setHighlightedIndex] = useState(0);
const itemRefs = useRef<Map<number, HTMLButtonElement>>(new Map());
const scrollContainerRef = useRef<HTMLDivElement>(null);
@ -245,12 +249,14 @@ export const DocumentMentionPicker = forwardRef<
* Client-side filtering for single character searches.
* Filters cached documents locally for instant feedback without additional API calls.
* Server-side search is reserved for 2+ character queries to leverage database indexing.
* Uses deferredSearch (a deferred snapshot of debouncedSearch) so this memo is treated
   * as non-urgent, so React can interrupt it to keep the input responsive.
*/
const clientFilteredDocs = useMemo(() => {
if (!isSingleCharSearch) return null;
const searchLower = debouncedSearch.trim().toLowerCase();
const searchLower = deferredSearch.trim().toLowerCase();
return accumulatedDocuments.filter((doc) => doc.title.toLowerCase().includes(searchLower));
}, [isSingleCharSearch, debouncedSearch, accumulatedDocuments]);
}, [isSingleCharSearch, deferredSearch, accumulatedDocuments]);
// Select data source based on search length: client-filtered for single char, server results for 2+
const actualDocuments = isSingleCharSearch ? (clientFilteredDocs ?? []) : accumulatedDocuments;
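
The inbox list and the mention picker above both pair debouncing with `useDeferredValue`: the debounce limits how often the search value changes, and the deferred snapshot lets React treat the expensive filtering as a non-urgent update. A compact sketch combining the two (item shape and delay are assumptions):

```ts
// Illustrative combination of debouncing and useDeferredValue for a search box.
// The debounce limits upstream work; the deferred value keeps typing responsive
// while the filtered list lags one render behind.
import { useDeferredValue, useEffect, useMemo, useState } from "react";

interface Item {
  id: number;
  title: string;
}

function useDebounced<T>(value: T, delayMs: number): T {
  const [debounced, setDebounced] = useState(value);
  useEffect(() => {
    const id = setTimeout(() => setDebounced(value), delayMs);
    return () => clearTimeout(id);
  }, [value, delayMs]);
  return debounced;
}

export function useFilteredItems(items: Item[], search: string) {
  const debouncedSearch = useDebounced(search, 250);
  // Deferred snapshot: may lag one render behind the live value, by design.
  const deferredSearch = useDeferredValue(debouncedSearch);

  return useMemo(() => {
    const needle = deferredSearch.trim().toLowerCase();
    if (!needle) return items;
    return items.filter((item) => item.title.toLowerCase().includes(needle));
  }, [items, deferredSearch]);
}
```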

Some files were not shown because too many files have changed in this diff.