diff --git a/README.md b/README.md
index b0de8a38d..0d55ae282 100644
--- a/README.md
+++ b/README.md
@@ -134,7 +134,9 @@ Both installation guides include detailed OS-specific instructions for Windows,
 Before installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.net/docs/) including:
 
 - PGVector setup
-- Unstructured.io API key
+- **File Processing ETL Service** (choose one):
+  - Unstructured.io API key (free tier available, supports 34+ formats)
+  - LlamaIndex API key (enhanced parsing, supports 50+ formats)
 - Other required API keys
 
 ## Screenshots
diff --git a/surfsense_web/content/docs/index.mdx b/surfsense_web/content/docs/index.mdx
index 4845a7312..cb290306c 100644
--- a/surfsense_web/content/docs/index.mdx
+++ b/surfsense_web/content/docs/index.mdx
@@ -67,12 +67,26 @@ To set up Google OAuth:
 
 ## File Upload's
 
-Files are converted to LLM friendly formats using [Unstructured](https://github.com/Unstructured-IO/unstructured)
+SurfSense supports two ETL (Extract, Transform, Load) services for converting files to LLM-friendly formats:
+
+### Option 1: Unstructured
+
+Files are converted using [Unstructured](https://github.com/Unstructured-IO/unstructured)
 
 1. Get an Unstructured.io API key from [Unstructured Platform](https://platform.unstructured.io/)
 2. You should be able to generate API keys once registered
    ![Unstructured Dashboard](/docs/unstructured.png)
 
+### Option 2: LlamaIndex (LlamaCloud)
+
+Files are converted using [LlamaIndex](https://www.llamaindex.ai/) which offers superior file format support (50+ formats vs 34+ with Unstructured)
+
+1. Get a LlamaIndex API key from [LlamaCloud](https://cloud.llamaindex.ai/)
+2. Sign up for a LlamaCloud account to access their parsing services
+3. LlamaCloud provides enhanced parsing capabilities for complex documents
+
+**Note**: You only need to set up one of these services. LlamaIndex offers broader file format support, while Unstructured provides a generous free tier.
+
 ---
 
 ## LLM Observability (Optional)