<div align="center">
<a href="https://discord.gg/ejRNvftDp9">
<img src="https://img.shields.io/discord/1359368468260192417" alt="Discord">
</a>
</div>
# SurfSense
Tools like NotebookLM and Perplexity are effective for researching any topic or query; SurfSense extends that capability by integrating with your personal knowledge base. It is a highly customizable AI research agent connected to external sources such as search engines (Tavily, LinkUp), Slack, Linear, Notion, YouTube, GitHub, and Discord, with more to come.
<div align="center">
<a href="https://trendshift.io/repositories/13606" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13606" alt="MODSetter%2FSurfSense | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>
# Video
## Old video. New one coming soon.
https://github.com/user-attachments/assets/48142909-6391-4084-b7e8-81da388bb1fc
# Podcasts
https://github.com/user-attachments/assets/d516982f-de00-4c41-9e4c-632a7d942f41
## Podcast Sample
https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
## Key Features
### 💡 **Idea**:
Have your own highly customizable private NotebookLM and Perplexity integrated with external sources.
### 📁 **Multiple File Format Uploading Support**
Save content from your personal files *(documents, images, videos, and more; **50+ file extensions** supported)* to your own personal knowledge base.
### 🔍 **Powerful Search**
Quickly research or find anything in your saved content.
### 💬 **Chat with your Saved Content**
Interact in natural language and get cited answers.
### 📄 **Cited Answers**
Get cited answers, just like Perplexity.
### 🔔 **Privacy & Local LLM Support**
Works flawlessly with local LLMs served through Ollama.
### 🏠 **Self Hostable**
Open source and easy to deploy locally.
### 🎙️ Podcasts
- Blazingly fast podcast generation agent. (Creates a 3-minute podcast in under 20 seconds.)
- Convert your chat conversations into engaging audio content
- Support for multiple TTS providers (OpenAI, Azure, Google Vertex AI)
### 📊 **Advanced RAG Techniques**
- Supports 150+ LLMs.
- Supports 6000+ Embedding Models.
- Supports all major Rerankers (Pinecone, Cohere, Flashrank, etc.)
- Uses Hierarchical Indices (two-tiered RAG setup).
- Utilizes Hybrid Search (Semantic + Full Text Search combined with Reciprocal Rank Fusion).
- RAG as a Service API Backend.
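
To illustrate the hybrid search item above, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not SurfSense's actual implementation: the function name `reciprocal_rank_fusion`, the sample document ids, and the constant `k=60` (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. The value k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a semantic search and a full-text search.
semantic = ["doc_a", "doc_b", "doc_c"]
full_text = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic, full_text])
print(fused)
```

Documents that rank well in both lists (`doc_b`, `doc_a`) rise above documents that appear in only one, which is the point of combining the two retrieval modes.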
### ℹ️ **External Sources**
- Search Engines (Tavily, LinkUp)
- Slack
- Linear
- Notion
- YouTube Videos
- GitHub
- Discord
- and more to come.
## 📄 **Supported File Extensions**
> **Note**: File format support depends on your ETL service configuration. LlamaCloud supports 50+ formats, while Unstructured supports 34+ core formats.
### Documents & Text
**LlamaCloud**: `.pdf`, `.doc`, `.docx`, `.docm`, `.dot`, `.dotm`, `.rtf`, `.txt`, `.xml`, `.epub`, `.odt`, `.wpd`, `.pages`, `.key`, `.numbers`, `.602`, `.abw`, `.cgm`, `.cwk`, `.hwp`, `.lwp`, `.mw`, `.mcw`, `.pbd`, `.sda`, `.sdd`, `.sdp`, `.sdw`, `.sgl`, `.sti`, `.sxi`, `.sxw`, `.stw`, `.sxg`, `.uof`, `.uop`, `.uot`, `.vor`, `.wps`, `.zabw`

**Unstructured**: `.doc`, `.docx`, `.odt`, `.rtf`, `.pdf`, `.xml`, `.txt`, `.md`, `.markdown`, `.rst`, `.html`, `.org`, `.epub`
### Presentations
**LlamaCloud**: `.ppt`, `.pptx`, `.pptm`, `.pot`, `.potm`, `.potx`, `.odp`, `.key`

**Unstructured**: `.ppt`, `.pptx`
### Spreadsheets & Data
**LlamaCloud**: `.xlsx`, `.xls`, `.xlsm`, `.xlsb`, `.xlw`, `.csv`, `.tsv`, `.ods`, `.fods`, `.numbers`, `.dbf`, `.123`, `.dif`, `.sylk`, `.slk`, `.prn`, `.et`, `.uos1`, `.uos2`, `.wk1`, `.wk2`, `.wk3`, `.wk4`, `.wks`, `.wq1`, `.wq2`, `.wb1`, `.wb2`, `.wb3`, `.qpw`, `.xlr`, `.eth`

**Unstructured**: `.xls`, `.xlsx`, `.csv`, `.tsv`
### Images
**LlamaCloud**: `.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.svg`, `.tiff`, `.webp`, `.html`, `.htm`, `.web`

**Unstructured**: `.jpg`, `.jpeg`, `.png`, `.bmp`, `.tiff`, `.heic`
### Audio & Video *(Always Supported)*
`.mp3`, `.mpga`, `.m4a`, `.wav`, `.mp4`, `.mpeg`, `.webm`
### Email & Communication
**Unstructured**: `.eml`, `.msg`, `.p7s`
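
Because support depends on the configured ETL service, it can help to check a file before uploading it. The snippet below is a rough sketch of such a check, not code from SurfSense: the helper `is_supported`, the service keys `"LLAMACLOUD"` and `"UNSTRUCTURED"`, and the decision to include only a small subset of the extensions listed above are all assumptions made for illustration.

```python
from pathlib import Path

# Audio and video extensions listed above as always supported.
ALWAYS_SUPPORTED = {".mp3", ".mpga", ".m4a", ".wav", ".mp4", ".mpeg", ".webm"}

# Partial, illustrative subsets of the extension lists above.
# The dictionary keys are hypothetical names for the two ETL services.
ETL_EXTENSIONS = {
    "LLAMACLOUD": {".pdf", ".docx", ".pptx", ".xlsx", ".csv", ".png", ".svg"},
    "UNSTRUCTURED": {".pdf", ".docx", ".pptx", ".xlsx", ".csv", ".png", ".md", ".eml"},
}


def is_supported(filename, etl_service):
    """Return True if the file's extension is handled by the given ETL service."""
    extension = Path(filename).suffix.lower()
    if extension in ALWAYS_SUPPORTED:
        return True
    # Unknown service names fall back to an empty set, so nothing matches.
    return extension in ETL_EXTENSIONS.get(etl_service.upper(), set())


print(is_supported("notes.md", "unstructured"))  # Markdown is in the Unstructured list
print(is_supported("notes.md", "llamacloud"))    # ...but not in the LlamaCloud list
```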
### 🔖 Cross Browser Extension
- The SurfSense extension can be used to save any webpage you like.
- Its main use case is saving webpages that sit behind authentication.
## FEATURE REQUESTS AND FUTURE
**SurfSense is actively being developed.** While it's not yet production-ready, you can help us speed up the process.
Join the [SurfSense Discord](https://discord.gg/ejRNvftDp9) and help shape the future of SurfSense!
## 🚀 Roadmap
Stay up to date with our development progress and upcoming features!
Check out our public roadmap and contribute your ideas or feedback:
**View the Roadmap:** [SurfSense Roadmap on GitHub Projects](https://github.com/users/MODSetter/projects/2)
## How to get started?
### Installation Options
SurfSense provides two installation methods:

1. **[Docker Installation](https://www.surfsense.net/docs/docker-installation)** - The easiest way to get SurfSense up and running with all dependencies containerized.
   - Includes pgAdmin for database management through a web UI
   - Supports environment variable customization via `.env` file
   - Flexible deployment options (full stack or core services only)
   - No need to manually edit configuration files between environments
   - See [Docker Setup Guide](DOCKER_SETUP.md) for detailed instructions
   - For deployment scenarios and options, see [Deployment Guide](DEPLOYMENT_GUIDE.md)

2. **[Manual Installation (Recommended)](https://www.surfsense.net/docs/manual-installation)** - For users who prefer more control over their setup or need to customize their deployment.

Both installation guides include detailed OS-specific instructions for Windows, macOS, and Linux.

Before installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.net/docs/) including:
- PGVector setup
- **File Processing ETL Service** (choose one):
  - Unstructured.io API key (supports 34+ formats)
  - LlamaIndex API key (enhanced parsing, supports 50+ formats)
- Other required API keys
## Screenshots

**Search Spaces**

**Manage Documents**

**Research Agent**

**Podcast Agent**

**Agent Chat**

**Browser Extension**

## Tech Stack
### **BackEnd**
- **FastAPI**: Modern, fast web framework for building APIs with Python
- **PostgreSQL with pgvector**: Database with vector search capabilities for similarity searches
- **SQLAlchemy**: SQL toolkit and ORM (Object-Relational Mapping) for database interactions
- **Alembic**: Database migration tool for SQLAlchemy
- **FastAPI Users**: Authentication and user management with JWT and OAuth support
- **LangGraph**: Framework for developing AI agents
- **LangChain**: Framework for developing AI-powered applications
- **LLM Integration**: Integration with LLM models through LiteLLM
- **Rerankers**: Advanced result ranking for improved search relevance
- **Hybrid Search**: Combines vector similarity and full-text search for optimal results using Reciprocal Rank Fusion (RRF)
- **Vector Embeddings**: Document and text embeddings for semantic search
- **pgvector**: PostgreSQL extension for efficient vector similarity operations
- **Chonkie**: Advanced document chunking and embedding library
  - Uses `AutoEmbeddings` for flexible embedding model selection
  - `LateChunker` for optimized document chunking based on embedding model's max sequence length
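
As a rough illustration of what "vector similarity" means for semantic search, the sketch below ranks a few toy embeddings by cosine similarity in plain Python. In SurfSense this work is delegated to pgvector inside PostgreSQL; the function names, the three-dimensional vectors, and the document ids here are hypothetical and exist only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the ids of the k documents whose embeddings are closest to the query."""
    ranked = sorted(
        documents.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy embeddings; real embedding models produce hundreds of dimensions.
documents = {
    "cats": [1.0, 0.0, 0.0],
    "dogs": [0.8, 0.6, 0.0],
    "cars": [0.0, 0.0, 1.0],
}

nearest = top_k([1.0, 0.1, 0.0], documents)
print(nearest)
```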
---
### **FrontEnd**
- **Next.js 15.2.3**: React framework featuring App Router, server components, automatic code-splitting, and optimized rendering.
- **React 19.0.0**: JavaScript library for building user interfaces.
- **TypeScript**: Static type-checking for JavaScript, enhancing code quality and developer experience.
- **Vercel AI SDK Kit UI Stream Protocol**: To create scalable chat UI.
- **Tailwind CSS 4.x**: Utility-first CSS framework for building custom UI designs.
- **Shadcn**: Headless components library.
- **Lucide React**: Icon set implemented as React components.
- **Framer Motion**: Animation library for React.
- **Sonner**: Toast notification library.
- **Geist**: Font family from Vercel.
- **React Hook Form**: Form state management and validation.
- **Zod**: TypeScript-first schema validation with static type inference.
- **@hookform/resolvers**: Resolvers for using validation libraries with React Hook Form.
- **@tanstack/react-table**: Headless UI for building powerful tables & datagrids.
### **DevOps**
- **Docker**: Container platform for consistent deployment across environments
- **Docker Compose**: Tool for defining and running multi-container Docker applications
- **pgAdmin**: Web-based PostgreSQL administration tool included in Docker setup
### **Extension**
Manifest v3 on Plasmo
## Future Work
- Add More Connectors.
- Patch minor bugs.
- Document Chat **[REIMPLEMENT]**
- Document Podcasts
## Contribute
Contributions are very welcome! A contribution can be as small as a ⭐ or even finding and creating issues.
Fine-tuning the Backend is always desired.
## Star History
<a href="https://www.star-history.com/#MODSetter/SurfSense&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=MODSetter/SurfSense&type=Date&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=MODSetter/SurfSense&type=Date" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=MODSetter/SurfSense&type=Date" />
</picture>
</a>