diff --git a/.env.example b/.env.example new file mode 100644 index 000000000..cc6ebe313 --- /dev/null +++ b/.env.example @@ -0,0 +1,17 @@ +# Frontend Configuration +FRONTEND_PORT=3000 +NEXT_PUBLIC_API_URL=http://backend:8000 + +# Backend Configuration +BACKEND_PORT=8000 + +# Database Configuration +POSTGRES_USER=postgres +POSTGRES_PASSWORD=postgres +POSTGRES_DB=surfsense +POSTGRES_PORT=5432 + +# pgAdmin Configuration +PGADMIN_PORT=5050 +PGADMIN_DEFAULT_EMAIL=admin@surfsense.com +PGADMIN_DEFAULT_PASSWORD=surfsense diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 000000000..a80f14583 --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,45 @@ + + +## Description + + +## Motivation and Context + + +FIX # + +## Changes Overview + +- + +## Screenshots + + +## API Changes + +- [ ] This PR includes API changes + +## Types of changes + +- [ ] Bug fix (non-breaking change which fixes an issue) +- [ ] New feature (non-breaking change which adds functionality) +- [ ] Performance improvement (non-breaking change which enhances performance) +- [ ] Documentation update +- [ ] Breaking change (fix or feature that would cause existing functionality to change) + +## Testing + +- [ ] I have tested these changes locally +- [ ] I have added/updated unit tests +- [ ] I have added/updated integration tests + +## Checklist: + + +- [ ] My code follows the code style of this project +- [ ] My change requires documentation updates +- [ ] I have updated the documentation accordingly +- [ ] My change requires dependency updates +- [ ] I have updated the dependencies accordingly +- [ ] My code builds clean without any errors or warnings +- [ ] All new and existing tests passed \ No newline at end of file diff --git a/.github/workflows/docker-publish.yml b/.github/workflows/docker-publish.yml new file mode 100644 index 000000000..9b7ecc6a0 --- /dev/null +++ b/.github/workflows/docker-publish.yml @@ -0,0 +1,76 @@ +name: Docker 
Publish + +on: + push: + branches: [ "main" ] + +jobs: + build_and_push_backend: + runs-on: ubuntu-latest + permissions: + contents: read + packages: write + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + - name: Set up QEMU + uses: docker/setup-qemu-action@v3 + + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 + + - name: Log in to GitHub Container Registry + uses: docker/login-action@v3 + with: + registry: ghcr.io + username: ${{ github.actor }} + password: ${{ secrets.GITHUB_TOKEN }} + + - name: Build and push backend image + uses: docker/build-push-action@v5 + with: + context: ./surfsense_backend + file: ./surfsense_backend/Dockerfile + push: true + tags: ghcr.io/${{ github.repository_owner }}/surfsense_backend:${{ github.sha }} + platforms: linux/amd64,linux/arm64 + labels: | + org.opencontainers.image.source=${{ github.server_url }}/${{ github.repository }} + org.opencontainers.image.created=${{ github.event.head_commit.timestamp }} + org.opencontainers.image.revision=${{ github.sha }} + + build_and_push_frontend: + runs-on: ubuntu-latest + permissions: + contents: read + packages: write + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + - name: Set up QEMU + uses: docker/setup-qemu-action@v3 + + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 + + - name: Log in to GitHub Container Registry + uses: docker/login-action@v3 + with: + registry: ghcr.io + username: ${{ github.actor }} + password: ${{ secrets.GITHUB_TOKEN }} + + - name: Build and push frontend image + uses: docker/build-push-action@v5 + with: + context: ./surfsense_web + file: ./surfsense_web/Dockerfile + push: true + tags: ghcr.io/${{ github.repository_owner }}/surfsense_web:${{ github.sha }} + platforms: linux/amd64,linux/arm64 + labels: | + org.opencontainers.image.source=${{ github.server_url }}/${{ github.repository }} + org.opencontainers.image.created=${{ github.event.head_commit.timestamp }} + org.opencontainers.image.revision=${{ github.sha }} diff --git a/.gitignore b/.gitignore index ac1266863..1a7f2267f 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,2 @@ -.flashrank_cache* \ No newline at end of file +.flashrank_cache* +podcasts/ diff --git a/DEPLOYMENT_GUIDE.md b/DEPLOYMENT_GUIDE.md new file mode 100644 index 000000000..e4cc86dec --- /dev/null +++ b/DEPLOYMENT_GUIDE.md @@ -0,0 +1,124 @@ +# SurfSense Deployment Guide + +This guide explains the different deployment options available for SurfSense using Docker Compose. + +## Deployment Options + +SurfSense uses a flexible Docker Compose configuration that allows you to easily switch between deployment modes without manually editing files. Our approach uses Docker's built-in override functionality with two configuration files: + +1. **docker-compose.yml**: Contains essential core services (database and pgAdmin) +2. **docker-compose.override.yml**: Contains application services (frontend and backend) + +This structure provides several advantages: +- No need to comment/uncomment services manually +- Clear separation between core infrastructure and application services +- Easy switching between development and production environments + +## Deployment Modes + +### Full Stack Mode (Development) + +This mode runs everything: frontend, backend, database, and pgAdmin. It's ideal for development environments where you need the complete application stack. + +```bash
# Both files are automatically used (docker-compose.yml + docker-compose.override.yml)
docker compose up -d
``` + +### Core Services Mode (Production) + +This mode runs only the database and pgAdmin services. It's suitable for production environments where you might want to deploy the frontend and backend separately or need to run database migrations.
+ +```bash +# Explicitly use only the main file +docker compose -f docker-compose.yml up -d +``` + +## Custom Deployment Options + +### Running Specific Services + +You can specify which services to start by naming them: + +```bash +# Start only database +docker compose up -d db + +# Start database and pgAdmin +docker compose up -d db pgadmin + +# Start only backend (requires db to be running) +docker compose up -d backend +``` + +### Using Custom Override Files + +You can create and use custom override files for different environments: + +```bash +# Create a staging configuration +docker compose -f docker-compose.yml -f docker-compose.staging.yml up -d +``` + +## Environment Variables + +The deployment can be customized using environment variables: + +```bash +# Change default ports +FRONTEND_PORT=4000 BACKEND_PORT=9000 docker compose up -d + +# Or use a .env file +# Create or modify .env file with your desired values +docker compose up -d +``` + +## Common Deployment Workflows + +### Initial Setup + +```bash +# Clone the repository +git clone https://github.com/MODSetter/SurfSense.git +cd SurfSense + +# Copy example env files +cp .env.example .env +cp surfsense_backend/.env.example surfsense_backend/.env +cp surfsense_web/.env.example surfsense_web/.env + +# Edit the .env files with your configuration + +# Start full stack for development +docker compose up -d +``` + +### Database-Only Mode (for migrations or maintenance) + +```bash +# Start just the database +docker compose -f docker-compose.yml up -d db + +# Run migrations or maintenance tasks +docker compose exec db psql -U postgres -d surfsense +``` + +### Scaling in Production + +For production deployments, you might want to: + +1. Run core services with Docker Compose +2. Deploy frontend/backend with specialized services like Vercel, Netlify, or dedicated application servers + +This separation allows for better scaling and resource utilization in production environments. 
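The `docker-compose.staging.yml` referenced earlier is not shipped with the repository. As an illustrative sketch (the service layout follows the override mechanism described in this guide, but the image names and tags are assumptions), a staging override might point the application services at prebuilt images instead of building them locally:

```yaml
# docker-compose.staging.yml — hypothetical override, not part of the repo
services:
  backend:
    image: ghcr.io/your-org/surfsense_backend:latest  # assumed image name/tag
    env_file: .env
    depends_on:
      - db
  frontend:
    image: ghcr.io/your-org/surfsense_web:latest      # assumed image name/tag
    environment:
      - NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL}
    depends_on:
      - backend
```

Running `docker compose -f docker-compose.yml -f docker-compose.staging.yml up -d` then merges this file over the core services, mirroring how the default override file works.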
+ +## Troubleshooting + +If you encounter issues with the deployment: + +- Check container logs: `docker compose logs -f [service_name]` +- Ensure all required environment variables are set +- Verify network connectivity between containers +- Check that required ports are available and not blocked by firewalls + +For more detailed setup instructions, refer to [DOCKER_SETUP.md](DOCKER_SETUP.md). \ No newline at end of file diff --git a/DOCKER_SETUP.md b/DOCKER_SETUP.md index 44e6a142f..6b7ee4764 100644 --- a/DOCKER_SETUP.md +++ b/DOCKER_SETUP.md @@ -7,73 +7,186 @@ This document explains how to run the SurfSense project using Docker Compose. - Docker and Docker Compose installed on your machine - Git (to clone the repository) +## Environment Variables Configuration + +SurfSense Docker setup supports configuration through environment variables. You can set these variables in two ways: + +1. Create a `.env` file in the project root directory (copy from `.env.example`) +2. Set environment variables directly in your shell before running Docker Compose + +The following environment variables are available: + +``` +# Frontend Configuration +FRONTEND_PORT=3000 +NEXT_PUBLIC_API_URL=http://backend:8000 + +# Backend Configuration +BACKEND_PORT=8000 + +# Database Configuration +POSTGRES_USER=postgres +POSTGRES_PASSWORD=postgres +POSTGRES_DB=surfsense +POSTGRES_PORT=5432 + +# pgAdmin Configuration +PGADMIN_PORT=5050 +PGADMIN_DEFAULT_EMAIL=admin@surfsense.com +PGADMIN_DEFAULT_PASSWORD=surfsense +``` + +## Deployment Options + +SurfSense uses a flexible Docker Compose setup that allows you to choose between different deployment modes: + +### Option 1: Full-Stack Deployment (Development Mode) +Includes frontend, backend, database, and pgAdmin. This is the default when running `docker compose up`. + +### Option 2: Core Services Only (Production Mode) +Includes only database and pgAdmin, suitable for production environments where you might deploy frontend/backend separately. 
+ +Our setup uses two files: +- `docker-compose.yml`: Contains core services (database and pgAdmin) +- `docker-compose.override.yml`: Contains application services (frontend and backend) + ## Setup 1. Make sure you have all the necessary environment variables set up: - Copy `surfsense_backend/.env.example` to `surfsense_backend/.env` and fill in the required values - Copy `surfsense_web/.env.example` to `surfsense_web/.env` and fill in the required values + - Optionally: Copy `.env.example` to `.env` in the project root to customize Docker settings -2. Build and start the containers: +2. Deploy based on your needs: + + **Full Stack (Development Mode)**: ```bash - docker-compose up --build + # Both files are automatically used + docker compose up --build + ``` + + **Core Services Only (Production Mode)**: + ```bash + # Explicitly use only the main file + docker compose -f docker-compose.yml up --build ``` 3. To run in detached mode (in the background): ```bash - docker-compose up -d + # Full stack + docker compose up -d + + # Core services only + docker compose -f docker-compose.yml up -d ``` 4. Access the applications: - - Frontend: http://localhost:3000 - - Backend API: http://localhost:8000 - - API Documentation: http://localhost:8000/docs + - Frontend: http://localhost:3000 (when using full stack) + - Backend API: http://localhost:8000 (when using full stack) + - API Documentation: http://localhost:8000/docs (when using full stack) + - pgAdmin: http://localhost:5050 + +## Customizing the Deployment + +If you need to make temporary changes to either full stack or core services deployment, you can: + +1. **Temporarily disable override file**: + ```bash + docker compose -f docker-compose.yml up -d + ``` + +2. **Use a custom override file**: + ```bash + docker compose -f docker-compose.yml -f custom-override.yml up -d + ``` + +3. 
**Temporarily modify which services start**: + ```bash + docker compose up -d db pgadmin + ``` ## Useful Commands - Stop the containers: ```bash - docker-compose down + docker compose down ``` - View logs: ```bash # All services - docker-compose logs -f + docker compose logs -f # Specific service - docker-compose logs -f backend - docker-compose logs -f frontend - docker-compose logs -f db + docker compose logs -f backend + docker compose logs -f frontend + docker compose logs -f db + docker compose logs -f pgadmin ``` - Restart a specific service: ```bash - docker-compose restart backend + docker compose restart backend ``` - Execute commands in a running container: ```bash # Backend - docker-compose exec backend python -m pytest + docker compose exec backend python -m pytest # Frontend - docker-compose exec frontend pnpm lint + docker compose exec frontend pnpm lint ``` ## Database The PostgreSQL database with pgvector extensions is available at: - Host: localhost -- Port: 5432 -- Username: postgres -- Password: postgres -- Database: surfsense +- Port: 5432 (configurable via POSTGRES_PORT) +- Username: postgres (configurable via POSTGRES_USER) +- Password: postgres (configurable via POSTGRES_PASSWORD) +- Database: surfsense (configurable via POSTGRES_DB) -You can connect to it using any PostgreSQL client. +You can connect to it using any PostgreSQL client or the included pgAdmin. + +## pgAdmin + +pgAdmin is a web-based administration tool for PostgreSQL. It is included in the Docker setup for easier database management. + +- URL: http://localhost:5050 (configurable via PGADMIN_PORT) +- Default Email: admin@surfsense.com (configurable via PGADMIN_DEFAULT_EMAIL) +- Default Password: surfsense (configurable via PGADMIN_DEFAULT_PASSWORD) + +### Connecting to the Database in pgAdmin + +1. Log in to pgAdmin using the credentials above +2. Right-click on "Servers" in the left sidebar and select "Create" > "Server" +3. 
In the "General" tab, give your connection a name (e.g., "SurfSense DB") +4. In the "Connection" tab, enter the following: + - Host: db + - Port: 5432 + - Maintenance database: surfsense + - Username: postgres + - Password: postgres +5. Click "Save" to establish the connection ## Troubleshooting - If you encounter permission errors, you may need to run the docker commands with `sudo`. -- If ports are already in use, modify the port mappings in the `docker-compose.yml` file. +- If ports are already in use, modify the port mappings in the `.env` file or directly in the `docker-compose.yml` file. - For backend dependency issues, you may need to modify the `Dockerfile` in the backend directory. -- For frontend dependency issues, you may need to modify the `Dockerfile` in the frontend directory. +- If you encounter frontend dependency errors, adjust the frontend's `Dockerfile` accordingly. +- If pgAdmin doesn't connect to the database, ensure you're using `db` as the hostname, not `localhost`; `db` is the service name that other containers resolve on the Docker network. +- If you need only specific services, you can explicitly name them: `docker compose up db pgadmin` + +## Understanding Docker Compose File Structure + +The project uses Docker's default override mechanism: + +1. **docker-compose.yml**: Contains essential services (database and pgAdmin) +2. **docker-compose.override.yml**: Contains development services (frontend and backend) + +When you run `docker compose up` without additional flags, Docker automatically merges both files. +When you run `docker compose -f docker-compose.yml up`, only the specified file is used. + +This approach lets you maintain a cleaner codebase without manually commenting/uncommenting services in your configuration files.
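If you prefer not to repeat the pgAdmin connection steps above by hand, the `dpage/pgadmin4` image can preload server definitions from a `servers.json` file mounted into the container. The snippet below is a sketch — the repository does not ship this file, and the names simply mirror the example connection used in this guide:

```json
{
  "Servers": {
    "1": {
      "Name": "SurfSense DB",
      "Group": "Servers",
      "Host": "db",
      "Port": 5432,
      "MaintenanceDB": "surfsense",
      "Username": "postgres",
      "SSLMode": "prefer"
    }
  }
}
```

Mount it into the pgadmin service (for example, a volume entry like `./servers.json:/pgadmin4/servers.json`); pgAdmin imports the definitions on first startup and still prompts for the database password when you connect.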
diff --git a/README.md b/README.md index e412fe2be..7272206a4 100644 --- a/README.md +++ b/README.md @@ -1,39 +1,53 @@ - -  + # SurfSense -While tools like NotebookLM and Perplexity are impressive and highly effective for conducting research on any topic/query, SurfSense elevates this capability by integrating with your personal knowledge base. It is a highly customizable AI research agent, connected to external sources such as search engines (Tavily), Slack, Linear, Notion, YouTube, GitHub and more to come. +While tools like NotebookLM and Perplexity are impressive and highly effective for conducting research on any topic/query, SurfSense elevates this capability by integrating with your personal knowledge base. It is a highly customizable AI research agent, connected to external sources such as search engines (Tavily, LinkUp), Slack, Linear, Notion, YouTube, GitHub and more to come. + +
# Video https://github.com/user-attachments/assets/48142909-6391-4084-b7e8-81da388bb1fc +# Podcasts + +https://github.com/user-attachments/assets/d516982f-de00-4c41-9e4c-632a7d942f41 + +## Podcast Sample + +https://github.com/user-attachments/assets/bf64a6ca-934b-47ac-9e1b-edac5fe972ec ## Key Features -### 1. Latest -#### đĄ **Idea**: +### đĄ **Idea**: Have your own highly customizable private NotebookLM and Perplexity integrated with external sources. -#### đ **Multiple File Format Uploading Support** -Save content from your own personal files *(Documents, images and supports **27 file extensions**)* to your own personal knowledge base . -#### đ **Powerful Search** +### đ **Multiple File Format Uploading Support** +Save content from your own personal files *(Documents, images, videos and supports **50+ file extensions**)* to your own personal knowledge base. +### đ **Powerful Search** Quickly research or find anything in your saved content. -#### đŦ **Chat with your Saved Content** +### đŦ **Chat with your Saved Content** Interact in natural language and get cited answers. -#### đ **Cited Answers** +### đ **Cited Answers** Get cited answers just like Perplexity. -#### đ **Privacy & Local LLM Support** +### đ **Privacy & Local LLM Support** Works flawlessly with Ollama local LLMs. -#### đ **Self Hostable** +### đ **Self Hostable** Open source and easy to deploy locally. -#### đ **Advanced RAG Techniques** +### đī¸ Podcasts +- Blazingly fast podcast generation agent. (Creates a 3-minute podcast in under 20 seconds.) +- Convert your chat conversations into engaging audio content. +- Support for multiple TTS providers (OpenAI, Azure, Google Vertex AI). + +### đ **Advanced RAG Techniques** - Supports 150+ LLMs - Supports 6000+ Embedding Models. - Supports all major Rerankers (Pinecone, Cohere, Flashrank, etc.) @@ -41,8 +55,8 @@ Open source and easy to deploy locally. - Utilizes Hybrid Search (Semantic + Full Text Search combined with Reciprocal Rank Fusion).
- RAG as a Service API Backend. -#### âšī¸ **External Sources** -- Search Engines (Tavily) +### âšī¸ **External Sources** +- Search Engines (Tavily, LinkUp) - Slack - Linear - Notion @@ -50,17 +64,41 @@ Open source and easy to deploy locally. - GitHub - and more to come..... -#### đ Cross Browser Extension +## đ **Supported File Extensions** + +> **Note**: File format support depends on your ETL service configuration. LlamaCloud supports 50+ formats, while Unstructured supports 34+ core formats. + +### Documents & Text +**LlamaCloud**: `.pdf`, `.doc`, `.docx`, `.docm`, `.dot`, `.dotm`, `.rtf`, `.txt`, `.xml`, `.epub`, `.odt`, `.wpd`, `.pages`, `.key`, `.numbers`, `.602`, `.abw`, `.cgm`, `.cwk`, `.hwp`, `.lwp`, `.mw`, `.mcw`, `.pbd`, `.sda`, `.sdd`, `.sdp`, `.sdw`, `.sgl`, `.sti`, `.sxi`, `.sxw`, `.stw`, `.sxg`, `.uof`, `.uop`, `.uot`, `.vor`, `.wps`, `.zabw` + +**Unstructured**: `.doc`, `.docx`, `.odt`, `.rtf`, `.pdf`, `.xml`, `.txt`, `.md`, `.markdown`, `.rst`, `.html`, `.org`, `.epub` + +### Presentations +**LlamaCloud**: `.ppt`, `.pptx`, `.pptm`, `.pot`, `.potm`, `.potx`, `.odp`, `.key` + +**Unstructured**: `.ppt`, `.pptx` + +### Spreadsheets & Data +**LlamaCloud**: `.xlsx`, `.xls`, `.xlsm`, `.xlsb`, `.xlw`, `.csv`, `.tsv`, `.ods`, `.fods`, `.numbers`, `.dbf`, `.123`, `.dif`, `.sylk`, `.slk`, `.prn`, `.et`, `.uos1`, `.uos2`, `.wk1`, `.wk2`, `.wk3`, `.wk4`, `.wks`, `.wq1`, `.wq2`, `.wb1`, `.wb2`, `.wb3`, `.qpw`, `.xlr`, `.eth` + +**Unstructured**: `.xls`, `.xlsx`, `.csv`, `.tsv` + +### Images +**LlamaCloud**: `.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.svg`, `.tiff`, `.webp`, `.html`, `.htm`, `.web` + +**Unstructured**: `.jpg`, `.jpeg`, `.png`, `.bmp`, `.tiff`, `.heic` + +### Audio & Video *(Always Supported)* +`.mp3`, `.mpga`, `.m4a`, `.wav`, `.mp4`, `.mpeg`, `.webm` + +### Email & Communication +**Unstructured**: `.eml`, `.msg`, `.p7s` + +### đ Cross Browser Extension - The SurfSense extension can be used to save any webpage you like.
- Its main use case is to save webpages that are protected behind authentication. -### 2. Temporarily Deprecated - -#### Podcasts -- The SurfSense Podcast feature is currently being reworked for better UI and stability. Expect it soon. - - ## FEATURE REQUESTS AND FUTURE @@ -76,7 +114,13 @@ Join the [SurfSense Discord](https://discord.gg/ejRNvftDp9) and help shape the f SurfSense provides two installation methods: -1. **[Docker Installation](https://www.surfsense.net/docs/docker-installation)** - The easiest way to get SurfSense up and running with all dependencies containerized. Less Customization. +1. **[Docker Installation](https://www.surfsense.net/docs/docker-installation)** - The easiest way to get SurfSense up and running with all dependencies containerized. + - Includes pgAdmin for database management through a web UI + - Supports environment variable customization via `.env` file + - Flexible deployment options (full stack or core services only) + - No need to manually edit configuration files between environments + - See [Docker Setup Guide](DOCKER_SETUP.md) for detailed instructions + - For deployment scenarios and options, see [Deployment Guide](DEPLOYMENT_GUIDE.md) 2. **[Manual Installation (Recommended)](https://www.surfsense.net/docs/manual-installation)** - For users who prefer more control over their setup or need to customize their deployment.
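For the Docker installation path, the `.env` customization mentioned above refers to the root-level `.env` file. A minimal sketch that only moves services off their default ports (the values here are illustrative; anything you omit falls back to the project defaults listed in `.env.example`) might look like:

```
# .env (project root) — illustrative overrides only
FRONTEND_PORT=4000
BACKEND_PORT=9000
PGADMIN_PORT=5151
```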
@@ -84,7 +128,6 @@ Both installation guides include detailed OS-specific instructions for Windows, Before installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.net/docs/) including: - PGVector setup -- Google OAuth configuration - Unstructured.io API key - Other required API keys @@ -101,6 +144,9 @@ Before installation, make sure to complete the [prerequisite setup steps](https:  +**Podcast Agent** + + **Agent Chat** @@ -112,6 +158,7 @@ Before installation, make sure to complete the [prerequisite setup steps](https:  + ## Tech Stack @@ -178,6 +225,14 @@ Before installation, make sure to complete the [prerequisite setup steps](https: - **@tanstack/react-table**: Headless UI for building powerful tables & datagrids. + ### **DevOps** + +- **Docker**: Container platform for consistent deployment across environments + +- **Docker Compose**: Tool for defining and running multi-container Docker applications + +- **pgAdmin**: Web-based PostgreSQL administration tool included in Docker setup + ### **Extension** Manifest v3 on Plasmo @@ -185,16 +240,8 @@ Before installation, make sure to complete the [prerequisite setup steps](https: ## Future Work - Add More Connectors. - Patch minor bugs. -- Implement Canvas. -- Complete Hybrid Search. **[Done]** -- Add support for file uploads QA. **[Done]** -- Shift to WebSockets for Streaming responses. **[Deprecated in favor of AI SDK Stream Protocol]** -- Based on feedback, I will work on making it compatible with local models. **[Done]** -- Cross Browser Extension **[Done]** -- Critical Notifications **[Done | PAUSED]** -- Saving Chats **[Done]** -- Basic keyword search page for saved sessions **[Done]** -- Multi & Single Document Chat **[Done]** +- Document Chat **[REIMPLEMENT]** +- Document Podcasts @@ -203,3 +250,13 @@ Before installation, make sure to complete the [prerequisite setup steps](https: Contributions are very welcome! 
A contribution can be as small as a â or even finding and creating issues. Fine-tuning the Backend is always desired. +## Star History + + +- Manage your connected services and data sources. -
-- You haven't added any connectors yet. Add one to enhance your search capabilities. -
- -+ Manage your connected services and data sources. +
++ You haven't added any connectors yet. Add one to enhance your + search capabilities. +
+ +Listen to generated podcasts.
+Loading podcasts...
+{error}
++ {searchQuery + ? 'Try adjusting your search filters' + : 'Generate podcasts from your chats to get started'} +
+Loading podcast...
+
+
{source.description}
+{source.description}
-+ SurfSense Cloud is currently in development. Check Docs for more information on Self-Hosted version. +
++ Don't have an account?{" "} + + Register here + +
++ Already have an account?{" "} + + Sign in + +
+- A Customizable AI Research Agent just like NotebookLM or Perplexity, but connected to external sources such as search engines (Tavily), Slack, Linear, Notion, YouTube, GitHub and more. + A Customizable AI Research Agent just like NotebookLM or Perplexity, but connected to external sources such as search engines (Tavily, LinkUp), Slack, Linear, Notion, YouTube, GitHub and more.