feat: new docs

DESKTOP-RTLN3BA\$punk 2025-04-24 01:39:56 -07:00
parent 4be3a811e8
commit da2d606a43
6 changed files with 501 additions and 166 deletions

130
README.md

@ -72,97 +72,23 @@ Join the [SurfSense Discord](https://discord.gg/ejRNvftDp9) and help shape the f
## How to get started?
### PRE-START CHECKS
### Installation Options
#### PGVector
Make sure the pgvector extension is installed on your machine. Setup guide: https://github.com/pgvector/pgvector?tab=readme-ov-file#installation
SurfSense provides two installation methods:
#### File Uploading Support
For file uploading you need an Unstructured.io API key. You can get it at http://platform.unstructured.io/
1. **[Docker Installation (Recommended)](https://www.surfsense.net/docs/docker-installation)** - The easiest way to get SurfSense up and running with all dependencies containerized.
#### Auth
SurfSense currently works only with Google OAuth. Set up your OAuth client at https://developers.google.com/identity/protocols/oauth2 . The backend needs the client ID and client secret. Make sure to enable the People API and add the required scopes under data access (openid, userinfo.email, userinfo.profile).
2. **[Manual Installation](https://www.surfsense.net/docs/manual-installation)** - For users who prefer more control over their setup or need to customize their deployment.
![gauth](https://github.com/user-attachments/assets/80d60fe5-889b-48a6-b947-200fdaf544c1)
Both installation guides include detailed OS-specific instructions for Windows, macOS, and Linux.
#### LLM Observability
One easy way to observe the SurfSense Researcher Agent is to use LangSmith. Get its API key from https://smith.langchain.com/
Before installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.net/docs/) including:
- PGVector setup
- Google OAuth configuration
- Unstructured.io API key
- Other required API keys
**Open AI LLMS**
![openai_langraph](https://github.com/user-attachments/assets/b1f4c7a1-0a66-4d21-9053-2e09a5634f95)
**Ollama LLMS**
![ollama_langgraph](https://github.com/user-attachments/assets/5b6c870e-095c-4368-86e6-f7488e0fca28)
#### Crawler Support
SurfSense currently uses [Firecrawl.py](https://www.firecrawl.dev/). Playwright crawler support will be added soon.
## Quick Start
### Preferred Method: Docker Setup
The recommended way to run SurfSense is with Docker, which ensures a consistent environment across different systems.
1. Make sure you have Docker and Docker Compose installed
2. Follow the detailed instructions in our [Docker Setup Guide](DOCKER_SETUP.md)
```bash
# Start all services with one command
docker-compose up --build
```
---
### Alternative: Manual Setup
### Backend (./surfsense_backend)
This is the core of SurfSense. Before we begin, let's look at the `.env` variables that we need to successfully set up SurfSense.
|ENV VARIABLE|DESCRIPTION|
|--|--|
| DATABASE_URL| Your PostgreSQL database connection string. Eg. `postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense`|
| SECRET_KEY| JWT Secret key used for authentication. Should be a secure random string. Eg. `SURFSENSE_SECRET_KEY_123456789`|
| GOOGLE_OAUTH_CLIENT_ID| Google OAuth client ID obtained from Google Cloud Console when setting up OAuth authentication|
| GOOGLE_OAUTH_CLIENT_SECRET| Google OAuth client secret obtained from Google Cloud Console when setting up OAuth authentication|
| NEXT_FRONTEND_URL| URL where your frontend application is hosted. Eg. `http://localhost:3000`|
| EMBEDDING_MODEL| Name of the embedding model to use for vector embeddings. Currently works with Sentence Transformers only. Expect other embeddings soon. Eg. `mixedbread-ai/mxbai-embed-large-v1`|
| RERANKERS_MODEL_NAME| Name of the reranker model for search result reranking. Eg. `ms-marco-MiniLM-L-12-v2`|
| RERANKERS_MODEL_TYPE| Type of reranker model being used. Eg. `flashrank`|
| FAST_LLM| Smaller, faster LLM for quick responses, routed through LiteLLM. Eg. `openai/gpt-4o-mini`, `ollama/deepseek-r1:8b`|
| STRATEGIC_LLM| Advanced LLM for complex reasoning tasks, routed through LiteLLM. Eg. `openai/gpt-4o`, `ollama/gemma3:12b`|
| LONG_CONTEXT_LLM| LLM capable of handling longer context windows, routed through LiteLLM. Eg. `gemini/gemini-2.0-flash`, `ollama/deepseek-r1:8b`|
| UNSTRUCTURED_API_KEY| API key for Unstructured.io service for document parsing|
| FIRECRAWL_API_KEY| API key for Firecrawl service for web crawling and data extraction|
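
As a rough illustration of how the values in the table above might be sanity-checked before starting the backend, the sketch below looks for missing variables and splits the example `DATABASE_URL` into its parts. The variable names come from the table; the helpers `missing_backend_vars` and `describe_database_url`, and the decision to treat every variable as required, are assumptions made for this example and are not part of SurfSense.

```python
import os
from urllib.parse import urlsplit

# Names taken from the table above. Treating all of them as required
# is an assumption made for this sketch.
BACKEND_VARS = [
    "DATABASE_URL",
    "SECRET_KEY",
    "GOOGLE_OAUTH_CLIENT_ID",
    "GOOGLE_OAUTH_CLIENT_SECRET",
    "NEXT_FRONTEND_URL",
    "EMBEDDING_MODEL",
    "RERANKERS_MODEL_NAME",
    "RERANKERS_MODEL_TYPE",
    "FAST_LLM",
    "STRATEGIC_LLM",
    "LONG_CONTEXT_LLM",
    "UNSTRUCTURED_API_KEY",
    "FIRECRAWL_API_KEY",
]


def missing_backend_vars(env):
    """Return the names from BACKEND_VARS that are absent or empty in env."""
    return [name for name in BACKEND_VARS if not env.get(name)]


def describe_database_url(url):
    """Split a connection string into parts for a quick visual check."""
    parts = urlsplit(url)
    return {
        "driver": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Report anything not yet set in the current shell environment.
    print("Missing:", missing_backend_vars(os.environ))
    example = "postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense"
    print(describe_database_url(example))
```

Running the script from a shell where the `.env` values have been exported lists whichever variables are still unset.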
IMPORTANT: Since LLM calls are routed through LiteLLM, make sure to include the API keys for the LLM providers you use. For example, if you use `openai/gpt-4o`, include the OpenAI API key as `OPENAI_API_KEY`; if you use `gemini/gemini-2.0-flash`, include `GEMINI_API_KEY`.
You can also integrate any other LLM by following https://docs.litellm.ai/docs/providers
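
The relationship between a LiteLLM model string and the API key it needs can be sketched as a small lookup. The `openai` and `gemini` entries come from the note above; treating `ollama` as needing no key (it normally runs as a local server) is an assumption, and the helper names `required_key_var` and `missing_llm_keys` are hypothetical.

```python
# Provider prefix -> environment variable holding its API key.
# The openai and gemini entries follow the note above; None for ollama
# is an assumption (a local server that typically needs no key).
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "ollama": None,
}


def required_key_var(model):
    """Return the env var name a `provider/model` string needs, or None."""
    provider = model.split("/", 1)[0]
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(
            f"Unknown provider {provider!r}; see the LiteLLM provider docs"
        )
    return PROVIDER_KEY_VARS[provider]


def missing_llm_keys(models, env):
    """Return the sorted API key variables that the models need but env lacks."""
    missing = set()
    for model in models:
        var = required_key_var(model)
        if var and not env.get(var):
            missing.add(var)
    return sorted(missing)
```

For instance, passing the values of `FAST_LLM`, `STRATEGIC_LLM`, and `LONG_CONTEXT_LLM` together with `os.environ` to `missing_llm_keys` would list the provider keys still to be added to `.env`.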
Now once you have everything let's proceed to run SurfSense.
1. Install `uv`: https://docs.astral.sh/uv/getting-started/installation/
2. Install the dependencies by running `uv sync`.
3. Run the `main.py` file using `uv run main.py`. You can optionally pass `--reload` as an argument to enable hot reloading.
4. If everything worked, you should see a screen like this.
![backend](https://i.ibb.co/542Vhqw/backendrunning.png)
---
### FrontEnd (./surfsense_web)
For local frontend setup, fill out the frontend's `.env` file.
|ENV VARIABLE|DESCRIPTION|
|--|--|
| NEXT_PUBLIC_FASTAPI_BACKEND_URL | URL of the hosted backend. Eg. `http://localhost:8000`|
1. Now install dependencies using `pnpm install`
2. Run it using `pnpm run dev`
You should see your Next.js frontend running at `localhost:3000`
#### Some FrontEnd Screens
## Screenshots
**Search Spaces**
@ -180,43 +106,13 @@ You should see your Next.js frontend running at `localhost:3000`
![chat](https://github.com/user-attachments/assets/bb352d52-1c6d-4020-926b-722d0b98b491)
---
### Extension (./surfsense_browser_extension)
The extension is built with Plasmo, a cross-browser extension framework. Its main use case is saving webpages that sit behind authentication.
To build the extension, fill out the extension's `.env` file.
|ENV VARIABLE|DESCRIPTION|
|--|--|
| PLASMO_PUBLIC_BACKEND_URL| SurfSense Backend URL eg. "http://127.0.0.1:8000" |
Build the extension for your favorite browser using this guide: https://docs.plasmo.com/framework/workflows/build#with-a-specific-target
When you load and start the extension you should see a Apu page like this
**Browser Extension**
![ext1](https://github.com/user-attachments/assets/1f042b7a-6349-422b-94fb-d40d0df16c40)
After filling in your SurfSense API key, you should be able to use the extension.
![ext2](https://github.com/user-attachments/assets/a9b9f1aa-2677-404d-b0a0-c1b2dddf24a7)
|Options|Explanations|
|--|--|
| Search Space | Search Space to save your dynamic bookmarks. |
| Clear Inactive History Sessions | It clears the saved content for Inactive Tab Sessions. |
| Save Current Webpage Snapshot | Stores the current webpage session info in the SurfSense history store. |
| Save to SurfSense | Processes the SurfSense History Store & Initiates a Save Job |
## Tech Stack
### **BackEnd**