diff --git a/README.md b/README.md
index 98005b35c..cfcf0c55f 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,13 @@
-
-
-
-
 # SurfSense
+
 While tools like NotebookLM and Perplexity are impressive and highly effective for conducting research on any topic/query, SurfSense elevates this capability by integrating with your personal knowledge base. It is a highly customizable AI research agent, connected to external sources such as search engines (Tavily, LinkUp), Slack, Linear, Notion, YouTube, GitHub and more to come.
-
 # Video
 https://github.com/user-attachments/assets/48142909-6391-4084-b7e8-81da388bb1fc
@@ -24,31 +20,46 @@ https://github.com/user-attachments/assets/d516982f-de00-4c41-9e4c-632a7d942f41
 https://github.com/user-attachments/assets/bf64a6ca-934b-47ac-9e1b-edac5fe972ec
-
-
 ## Key Features
+
 ### 1. Latest
-#### 💡 **Idea**:
+#### 💡 **Idea**:
+
 Have your own highly customizable private NotebookLM and Perplexity integrated with external sources.
+
 #### **Multiple File Format Uploading Support**
-Save content from your own personal files *(Documents, images and supports **27 file extensions**)* to your own personal knowledge base.
+
+Save content from your own personal files _(Documents, images and supports **27 file extensions**)_ to your own personal knowledge base.
+
 #### **Powerful Search**
+
 Quickly research or find anything in your saved content.
+
 #### 💬 **Chat with your Saved Content**
- Interact in Natural Language and get cited answers.
+
+Interact in Natural Language and get cited answers.
+
 #### **Cited Answers**
+
 Get Cited answers just like Perplexity.
+
 #### **Privacy & Local LLM Support**
+
 Works Flawlessly with Ollama local LLMs.
+
 #### **Self Hostable**
+
 Open source and easy to deploy locally.
-#### 🎙️ Podcasts
+
+#### 🎙️ Podcasts
+
 - Blazingly fast podcast generation agent. (Creates a 3-minute podcast in under 20 seconds.)
 - Convert your chat conversations into engaging audio content
 - Support for multiple TTS providers (OpenAI, Azure, Google Vertex AI)
 #### **Advanced RAG Techniques**
+
 - Supports 150+ LLMs
 - Supports 6000+ Embedding Models.
 - Supports all major Rerankers (Pinecone, Cohere, Flashrank etc)
@@ -57,6 +68,7 @@ Open source and easy to deploy locally.
 - RAG as a Service API Backend.
 #### ℹ️ **External Sources**
+
 - Search Engines (Tavily, LinkUp)
 - Slack
 - Linear
@@ -66,19 +78,16 @@ Open source and easy to deploy locally.
 - and more to come.....
 #### Cross Browser Extension
+
 - The SurfSense extension can be used to save any webpage you like.
 - Its main use case is to save webpages protected behind authentication.
-
 ## FEATURE REQUESTS AND FUTURE
-
 **SurfSense is actively being developed.** While it's not yet production-ready, you can help us speed up the process.
 Join the [SurfSense Discord](https://discord.gg/ejRNvftDp9) and help shape the future of SurfSense!
-
-
 ## How to get started?
 ### Installation Options
@@ -86,6 +95,7 @@ Join the [SurfSense Discord](https://discord.gg/ejRNvftDp9) and help shape the f
 SurfSense provides two installation methods:
 1. **[Docker Installation](https://www.surfsense.net/docs/docker-installation)** - The easiest way to get SurfSense up and running with all dependencies containerized.
+
 - Includes pgAdmin for database management through a web UI
 - Supports environment variable customization via `.env` file
 - See [Docker Setup Guide](DOCKER_SETUP.md) for detailed instructions
@@ -95,6 +105,7 @@ SurfSense provides two installation methods:
 Both installation guides include detailed OS-specific instructions for Windows, macOS, and Linux.
 Before installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.net/docs/) including:
+
 - PGVector setup
 - Google OAuth configuration
 - Unstructured.io API key
@@ -102,22 +113,21 @@ Before installation, make sure to complete the [prerequisite setup steps](https:
 ## Screenshots
-**Search Spaces**
+**Search Spaces**
-**Manage Documents**
+**Manage Documents**
-**Research Agent**
+**Research Agent**
-**Podcast Agent**
+**Podcast Agent**
-
-**Agent Chat**
+**Agent Chat**
@@ -127,89 +137,86 @@ Before installation, make sure to complete the [prerequisite setup steps](https:
-
 ## Tech Stack
+### **BackEnd**
- ### **BackEnd**
+- **FastAPI**: Modern, fast web framework for building APIs with Python
-- **FastAPI**: Modern, fast web framework for building APIs with Python
-
-- **PostgreSQL with pgvector**: Database with vector search capabilities for similarity searches
+- **PostgreSQL with pgvector**: Database with vector search capabilities for similarity searches
-- **SQLAlchemy**: SQL toolkit and ORM (Object-Relational Mapping) for database interactions
+- **SQLAlchemy**: SQL toolkit and ORM (Object-Relational Mapping) for database interactions
-- **Alembic**: A database migrations tool for SQLAlchemy.
+- **Alembic**: A database migrations tool for SQLAlchemy.
-- **FastAPI Users**: Authentication and user management with JWT and OAuth support
+- **FastAPI Users**: Authentication and user management with JWT and OAuth support
-- **LangGraph**: Framework for developing AI-agents.
-
-- **LangChain**: Framework for developing AI-powered applications.
+- **LangGraph**: Framework for developing AI-agents.
-- **LLM Integration**: Integration with LLM models through LiteLLM
+- **LangChain**: Framework for developing AI-powered applications.
-- **Rerankers**: Advanced result ranking for improved search relevance
+- **LLM Integration**: Integration with LLM models through LiteLLM
-- **Hybrid Search**: Combines vector similarity and full-text search for optimal results using Reciprocal Rank Fusion (RRF)
+- **Rerankers**: Advanced result ranking for improved search relevance
-- **Vector Embeddings**: Document and text embeddings for semantic search
+- **Hybrid Search**: Combines vector similarity and full-text search for optimal results using Reciprocal Rank Fusion (RRF)
-- **pgvector**: PostgreSQL extension for efficient vector similarity operations
+- **Vector Embeddings**: Document and text embeddings for semantic search
-- **Chonkie**: Advanced document chunking and embedding library
- - Uses `AutoEmbeddings` for flexible embedding model selection
- - `LateChunker` for optimized document chunking based on embedding model's max sequence length
+- **pgvector**: PostgreSQL extension for efficient vector similarity operations
+- **Chonkie**: Advanced document chunking and embedding library
+- Uses `AutoEmbeddings` for flexible embedding model selection
+- `LateChunker` for optimized document chunking based on embedding model's max sequence length
-
 ---
- ### **FrontEnd**
-- **Next.js 15.2.3**: React framework featuring App Router, server components, automatic code-splitting, and optimized rendering.
+### **FrontEnd**
-- **React 19.0.0**: JavaScript library for building user interfaces.
+- **Next.js 15.2.3**: React framework featuring App Router, server components, automatic code-splitting, and optimized rendering.
-- **TypeScript**: Static type-checking for JavaScript, enhancing code quality and developer experience.
+- **React 19.0.0**: JavaScript library for building user interfaces.
+
+- **TypeScript**: Static type-checking for JavaScript, enhancing code quality and developer experience.
 - **Vercel AI SDK Kit UI Stream Protocol**: To create scalable chat UI.
-- **Tailwind CSS 4.x**: Utility-first CSS framework for building custom UI designs.
+- **Tailwind CSS 4.x**: Utility-first CSS framework for building custom UI designs.
-- **Shadcn**: Headless components library.
+- **Shadcn**: Headless components library.
-- **Lucide React**: Icon set implemented as React components.
+- **Lucide React**: Icon set implemented as React components.
-- **Framer Motion**: Animation library for React.
+- **Framer Motion**: Animation library for React.
-- **Sonner**: Toast notification library.
+- **Sonner**: Toast notification library.
-- **Geist**: Font family from Vercel.
+- **Geist**: Font family from Vercel.
-- **React Hook Form**: Form state management and validation.
+- **React Hook Form**: Form state management and validation.
-- **Zod**: TypeScript-first schema validation with static type inference.
+- **Zod**: TypeScript-first schema validation with static type inference.
-- **@hookform/resolvers**: Resolvers for using validation libraries with React Hook Form.
+- **@hookform/resolvers**: Resolvers for using validation libraries with React Hook Form.
-- **@tanstack/react-table**: Headless UI for building powerful tables & datagrids.
+- **@tanstack/react-table**: Headless UI for building powerful tables & datagrids.
+### **DevOps**
- ### **DevOps**
+- **Docker**: Container platform for consistent deployment across environments
-- **Docker**: Container platform for consistent deployment across environments
-
-- **Docker Compose**: Tool for defining and running multi-container Docker applications
+- **Docker Compose**: Tool for defining and running multi-container Docker applications
-- **pgAdmin**: Web-based PostgreSQL administration tool included in Docker setup
+- **pgAdmin**: Web-based PostgreSQL administration tool included in Docker setup
+### **Extension**
-### **Extension**
- Manifest v3 on Plasmo
+Manifest v3 on Plasmo
 ## Future Work
+
 - Add More Connectors.
 - Patch minor bugs.
-- Implement Canvas.
+- Implement Canvas.
 - Complete Hybrid Search. **[Done]**
 - Add support for file uploads QA. **[Done]**
 - Shift to WebSockets for Streaming responses. **[Deprecated in favor of AI SDK Stream Protocol]**
@@ -220,9 +227,7 @@ Before installation, make sure to complete the [prerequisite setup steps](https:
 - Basic keyword search page for saved sessions **[Done]**
 - Multi & Single Document Chat **[Done]**
-
-
-## Contribute
+## Contribute
 Contributions are very welcome! A contribution can be as small as a ⭐ or even finding and creating issues.
 Fine-tuning the Backend is always desired.
@@ -236,12 +241,3 @@ Fine-tuning the Backend is always desired.