mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-04-27 17:56:25 +02:00
chore: updated docs for docling
parent 50f84e1d0a
commit a0aa29eeb0
5 changed files with 39 additions and 14 deletions
@@ -67,7 +67,7 @@ To set up Google OAuth:
 ## File Upload's
 
-SurfSense supports two ETL (Extract, Transform, Load) services for converting files to LLM-friendly formats:
+SurfSense supports three ETL (Extract, Transform, Load) services for converting files to LLM-friendly formats:
 
 ### Option 1: Unstructured
 
 
@@ -85,6 +85,16 @@ Files are converted using [LlamaIndex](https://www.llamaindex.ai/) which offers
 2. Sign up for a LlamaCloud account to access their parsing services
 3. LlamaCloud provides enhanced parsing capabilities for complex documents
 
+### Option 3: Docling (Recommended for Privacy)
+
+Files are processed locally using [Docling](https://github.com/DS4SD/docling) - IBM's open-source document parsing library.
+
+1. **No API key required** - all processing happens locally
+2. **Privacy-focused** - documents never leave your system
+3. **Supported formats**: PDF, Office documents (Word, Excel, PowerPoint), images (PNG, JPEG, TIFF, BMP, WebP), HTML, CSV, AsciiDoc
+4. **Enhanced features**: Advanced table detection, image extraction, and structured document parsing
+5. **GPU acceleration** support for faster processing (when available)
+
 **Note**: You only need to set up one of these services.
 
 ---
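The rule in the diff above (pick exactly one ETL service; only Docling needs no API key) can be illustrated with a small configuration check. This is a minimal sketch and not SurfSense's actual code: the setting name `ETL_SERVICE`, the key names `UNSTRUCTURED_API_KEY` and `LLAMA_CLOUD_API_KEY`, and the helper `check_etl_config` are all assumptions made for illustration.

```python
from typing import Mapping, Optional

# Hypothetical mapping from service name to the API key it would require.
# The names are assumptions for this sketch; None means "no key needed".
REQUIRED_KEY: dict[str, Optional[str]] = {
    "UNSTRUCTURED": "UNSTRUCTURED_API_KEY",
    "LLAMACLOUD": "LLAMA_CLOUD_API_KEY",
    "DOCLING": None,  # processed locally, so no API key
}


def check_etl_config(env: Mapping[str, str]) -> str:
    """Return the selected service name, or raise ValueError if misconfigured."""
    service = env.get("ETL_SERVICE", "").strip().upper()
    if service not in REQUIRED_KEY:
        choices = ", ".join(sorted(REQUIRED_KEY))
        raise ValueError(f"ETL_SERVICE must be one of: {choices}")
    key_name = REQUIRED_KEY[service]
    # Only the hosted services need a non-empty API key.
    if key_name is not None and not env.get(key_name):
        raise ValueError(f"{service} requires {key_name} to be set")
    return service
```

Under these assumptions, `check_etl_config({"ETL_SERVICE": "docling"})` passes with no key present, while selecting a hosted service without its key raises a `ValueError` at startup rather than failing later during an upload.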