---
title: Docker Compose
description: Manual Docker Compose setup for SurfSense
---

## Setup

```bash
git clone https://github.com/MODSetter/SurfSense.git
cd SurfSense/docker
cp .env.example .env
# Edit .env, at minimum set SECRET_KEY
docker compose up -d
```

After starting, access SurfSense at:

- **Frontend**: [http://localhost:3929](http://localhost:3929)
- **Backend API**: [http://localhost:8929](http://localhost:8929)
- **API Docs**: [http://localhost:8929/docs](http://localhost:8929/docs)

---

## Configuration

All configuration lives in a single `docker/.env` file (or `surfsense/.env` if you used the install script). Copy `.env.example` to `.env` and edit the values you need.

### Required

| Variable | Description |
|----------|-------------|
| `SECRET_KEY` | JWT secret key. Generate with: `openssl rand -base64 32`. Auto-generated by the install script. |

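If you are not using the install script, one way to put a generated key into `.env` is sketched below. `set_env_var` is a hypothetical helper written for this example, not something SurfSense ships. It avoids `sed` because base64 output can contain `/`, `+`, and `=`.

```bash
# Sketch: write KEY=VALUE into an env file, replacing any existing KEY line.
# set_env_var is a hypothetical helper, not part of SurfSense.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp)
  # Keep every line that does not define KEY (grep exits 1 if nothing is left).
  grep -v "^${key}=" "$file" > "$tmp" || true
  # Append the new definition and swap the file into place.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Usage, from the directory that holds .env:
#   set_env_var .env SECRET_KEY "$(openssl rand -base64 32)"
```

Restart the stack after changing `.env` so the containers pick up the new value.
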
### Core Settings

| Variable | Description | Default |
|----------|-------------|---------|
| `SURFSENSE_VERSION` | Image tag to deploy. Use `latest`, a clean version (e.g. `0.0.14`), or a specific build (e.g. `0.0.14.1`) | `latest` |
| `AUTH_TYPE` | Authentication method: `LOCAL` (email/password) or `GOOGLE` (OAuth) | `LOCAL` |
| `ETL_SERVICE` | Document parsing: `DOCLING` (local), `UNSTRUCTURED`, or `LLAMACLOUD` | `DOCLING` |
| `EMBEDDING_MODEL` | Embedding model for vector search | `sentence-transformers/all-MiniLM-L6-v2` |
| `TTS_SERVICE` | Text-to-speech provider for podcasts | `local/kokoro` |
| `STT_SERVICE` | Speech-to-text provider for audio files | `local/base` |
| `REGISTRATION_ENABLED` | Allow new user registrations | `TRUE` |

### Ports

| Variable | Description | Default |
|----------|-------------|---------|
| `FRONTEND_PORT` | Frontend service port | `3929` |
| `BACKEND_PORT` | Backend API service port | `8929` |
| `ZERO_CACHE_PORT` | Zero-cache real-time sync port | `5929` |

### Custom Domain / Reverse Proxy

Only set these if serving SurfSense on a real domain via a reverse proxy (Caddy, Nginx, Cloudflare Tunnel, etc.). Leave commented out for standard localhost deployments.

| Variable | Description |
|----------|-------------|
| `NEXT_FRONTEND_URL` | Public frontend URL (e.g. `https://app.yourdomain.com`) |
| `BACKEND_URL` | Public backend URL for OAuth callbacks (e.g. `https://api.yourdomain.com`) |
| `NEXT_PUBLIC_FASTAPI_BACKEND_URL` | Backend URL used by the frontend (e.g. `https://api.yourdomain.com`) |
| `NEXT_PUBLIC_ZERO_CACHE_URL` | Zero-cache URL used by the frontend (e.g. `https://zero.yourdomain.com`) |

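As an illustration, the reverse-proxy block of `.env` might look like the fragment below. The hostnames are the placeholders from the variable descriptions, not real endpoints, and how each hostname is routed to a container depends on your proxy configuration, which this page does not cover.

```bash
# Hypothetical reverse-proxy values for docker/.env.
# The hostnames are placeholders; replace them with your own domains.
NEXT_FRONTEND_URL=https://app.yourdomain.com
BACKEND_URL=https://api.yourdomain.com
NEXT_PUBLIC_FASTAPI_BACKEND_URL=https://api.yourdomain.com
NEXT_PUBLIC_ZERO_CACHE_URL=https://zero.yourdomain.com
```
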
### Zero-cache (Real-Time Sync)

Defaults work out of the box. Change `ZERO_ADMIN_PASSWORD` for security in production.

| Variable | Description | Default |
|----------|-------------|---------|
| `ZERO_ADMIN_PASSWORD` | Password for the zero-cache admin UI and `/statz` endpoint | `surfsense-zero-admin` |
| `ZERO_UPSTREAM_DB` | PostgreSQL connection URL for replication (must be a direct connection, not via pgbouncer) | *(built from DB_* vars)* |
| `ZERO_CVR_DB` | PostgreSQL connection URL for client view records | *(built from DB_* vars)* |
| `ZERO_CHANGE_DB` | PostgreSQL connection URL for replication log entries | *(built from DB_* vars)* |
| `ZERO_APP_PUBLICATIONS` | PostgreSQL publication restricting which tables are replicated (created by migration 116, verified by the `migrations` service before `zero-cache` starts) | `zero_publication` |
| `ZERO_NUM_SYNC_WORKERS` | Number of view-sync worker processes. Must be ≤ connection pool sizes | `4` |
| `ZERO_UPSTREAM_MAX_CONNS` | Max connections to upstream PostgreSQL for mutations | `20` |
| `ZERO_CVR_MAX_CONNS` | Max connections to the CVR database | `30` |

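The worker-count constraint can be checked before starting the stack. The sketch below assumes "connection pool sizes" refers to both `ZERO_UPSTREAM_MAX_CONNS` and `ZERO_CVR_MAX_CONNS`; `check_zero_pool_sizes` is a hypothetical helper, and zero-cache may enforce a stricter rule of its own.

```bash
# Sketch: fail if the sync worker count exceeds either connection pool.
# check_zero_pool_sizes is a hypothetical helper, not part of SurfSense.
# Arguments: WORKERS UPSTREAM_MAX CVR_MAX (all integers).
check_zero_pool_sizes() {
  local workers="$1" upstream_max="$2" cvr_max="$3"
  if [ "$workers" -gt "$upstream_max" ] || [ "$workers" -gt "$cvr_max" ]; then
    echo "ZERO_NUM_SYNC_WORKERS=$workers exceeds a connection pool size" >&2
    return 1
  fi
}

# Usage with the defaults from the table:
#   check_zero_pool_sizes "${ZERO_NUM_SYNC_WORKERS:-4}" \
#     "${ZERO_UPSTREAM_MAX_CONNS:-20}" "${ZERO_CVR_MAX_CONNS:-30}"
```
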
### Database

Defaults work out of the box. Change the credentials before running in production.

| Variable | Description | Default |
|----------|-------------|---------|
| `DB_USER` | PostgreSQL username | `surfsense` |
| `DB_PASSWORD` | PostgreSQL password | `surfsense` |
| `DB_NAME` | PostgreSQL database name | `surfsense` |
| `DB_HOST` | PostgreSQL host | `db` |
| `DB_PORT` | PostgreSQL port | `5432` |
| `DB_SSLMODE` | SSL mode: `disable`, `require`, `verify-ca`, `verify-full` | `disable` |
| `DATABASE_URL` | Full connection URL override. Use for managed databases (RDS, Supabase, etc.) | *(built from above)* |

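To make "built from above" concrete, here is a rough sketch of how a connection URL can be assembled from the `DB_*` variables. The `postgresql://` scheme and the `sslmode` query parameter are assumptions for illustration. The compose files may use a different scheme or driver suffix, so treat `.env.example` as the authority. `build_database_url` is a hypothetical helper.

```bash
# Sketch: compose a PostgreSQL URL from the DB_* variables, using the table's defaults.
# Assumption: a plain postgresql:// scheme; the real compose files may differ.
# Note: special characters in the password would need URL-encoding first.
build_database_url() {
  local user="${DB_USER:-surfsense}" password="${DB_PASSWORD:-surfsense}"
  local name="${DB_NAME:-surfsense}" host="${DB_HOST:-db}"
  local port="${DB_PORT:-5432}" sslmode="${DB_SSLMODE:-disable}"
  printf 'postgresql://%s:%s@%s:%s/%s?sslmode=%s\n' \
    "$user" "$password" "$host" "$port" "$name" "$sslmode"
}

# Usage: print the URL that the current environment would produce.
#   build_database_url
```
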
### Authentication

| Variable | Description |
|----------|-------------|
| `GOOGLE_OAUTH_CLIENT_ID` | Google OAuth client ID (required if `AUTH_TYPE=GOOGLE`) |
| `GOOGLE_OAUTH_CLIENT_SECRET` | Google OAuth client secret (required if `AUTH_TYPE=GOOGLE`) |

Create credentials at the [Google Cloud Console](https://console.cloud.google.com/apis/credentials).

### External API Keys

| Variable | Description |
|----------|-------------|
| `FIRECRAWL_API_KEY` | [Firecrawl](https://www.firecrawl.dev/) API key for web crawling |
| `UNSTRUCTURED_API_KEY` | [Unstructured.io](https://unstructured.io/) API key (required if `ETL_SERVICE=UNSTRUCTURED`) |
| `LLAMA_CLOUD_API_KEY` | [LlamaCloud](https://cloud.llamaindex.ai/) API key (required if `ETL_SERVICE=LLAMACLOUD`) |

### Connector OAuth Keys

Uncomment the connectors you want to use. Redirect URIs follow the pattern `http://localhost:8000/api/v1/auth/<connector>/connector/callback`.

| Connector | Variables |
|-----------|-----------|
| Google Drive / Gmail / Calendar | `GOOGLE_DRIVE_REDIRECT_URI`, `GOOGLE_GMAIL_REDIRECT_URI`, `GOOGLE_CALENDAR_REDIRECT_URI` |
| Notion | `NOTION_CLIENT_ID`, `NOTION_CLIENT_SECRET`, `NOTION_REDIRECT_URI` |
| Slack | `SLACK_CLIENT_ID`, `SLACK_CLIENT_SECRET`, `SLACK_REDIRECT_URI` |
| Discord | `DISCORD_CLIENT_ID`, `DISCORD_CLIENT_SECRET`, `DISCORD_BOT_TOKEN`, `DISCORD_REDIRECT_URI` |
| Atlassian (Jira & Confluence) | `ATLASSIAN_CLIENT_ID`, `ATLASSIAN_CLIENT_SECRET`, `JIRA_REDIRECT_URI`, `CONFLUENCE_REDIRECT_URI` |
| Linear | `LINEAR_CLIENT_ID`, `LINEAR_CLIENT_SECRET`, `LINEAR_REDIRECT_URI` |
| ClickUp | `CLICKUP_CLIENT_ID`, `CLICKUP_CLIENT_SECRET`, `CLICKUP_REDIRECT_URI` |
| Airtable | `AIRTABLE_CLIENT_ID`, `AIRTABLE_CLIENT_SECRET`, `AIRTABLE_REDIRECT_URI` |
| Microsoft (Teams & OneDrive) | `MICROSOFT_CLIENT_ID`, `MICROSOFT_CLIENT_SECRET`, `TEAMS_REDIRECT_URI`, `ONEDRIVE_REDIRECT_URI` |
| Dropbox | `DROPBOX_APP_KEY`, `DROPBOX_APP_SECRET`, `DROPBOX_REDIRECT_URI` |

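The redirect URI pattern can be expanded mechanically, as in the sketch below. `connector_redirect_uri` is a hypothetical helper, and the slug passed as `<connector>` (for example `notion`) is an assumption. This page does not list the slug each connector expects, so compare the result with `.env.example` before registering it with a provider.

```bash
# Sketch: expand the documented redirect URI pattern for one connector.
# connector_redirect_uri is a hypothetical helper, not part of SurfSense.
# Arguments: CONNECTOR_SLUG [BASE_URL]; BASE_URL defaults to the host in the pattern.
connector_redirect_uri() {
  local connector="$1" base_url="${2:-http://localhost:8000}"
  # Strip one trailing slash from the base URL to avoid a double slash.
  printf '%s/api/v1/auth/%s/connector/callback\n' "${base_url%/}" "$connector"
}

# Usage (the slug "notion" is an assumed example):
#   connector_redirect_uri notion
#   connector_redirect_uri notion https://api.yourdomain.com
```
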
### Observability (optional)

| Variable | Description |
|----------|-------------|
| `LANGSMITH_TRACING` | Enable LangSmith tracing (`true` / `false`) |
| `LANGSMITH_ENDPOINT` | LangSmith API endpoint |
| `LANGSMITH_API_KEY` | LangSmith API key |
| `LANGSMITH_PROJECT` | LangSmith project name |

### Advanced (optional)

| Variable | Description | Default |
|----------|-------------|---------|
| `SCHEDULE_CHECKER_INTERVAL` | How often to check for scheduled connector tasks (e.g. `5m`, `1h`) | `5m` |
| `RERANKERS_ENABLED` | Enable document reranking for improved search | `FALSE` |
| `RERANKERS_MODEL_NAME` | Reranker model name (e.g. `ms-marco-MiniLM-L-12-v2`) | |
| `RERANKERS_MODEL_TYPE` | Reranker model type (e.g. `flashrank`) | |
| `PAGES_LIMIT` | Max pages per user for ETL services | unlimited |

---

## Docker Services

| Service | Description |
|---------|-------------|
| `db` | PostgreSQL with pgvector extension |
| `migrations` | Short-lived: runs `alembic upgrade head` and verifies `zero_publication`, then exits |
| `redis` | Message broker for Celery |
| `searxng` | Local privacy-respecting search backend |
| `backend` | FastAPI application server |
| `celery_worker` | Background task processing (document indexing, etc.) |
| `celery_beat` | Periodic task scheduler (connector sync) |
| `zero-cache` | Rocicorp Zero real-time sync (replicates Postgres to clients) |
| `frontend` | Next.js web application |

All services start automatically with `docker compose up -d`.

### How startup ordering works

Schema migrations run as a dedicated `migrations` service that exits 0 on
success and non-zero on failure. Every other backend-image service gates on
it via `condition: service_completed_successfully`:

```text
db (healthy) ──▶ migrations (alembic upgrade head + verify zero_publication)
                 │
                 ├── exit 0 ─▶ backend ──▶ frontend
                 │             celery_worker
                 │             celery_beat
                 │             zero-cache ──▶ frontend
                 │
                 └── exit ≠ 0 ─▶ compose halts the rest of the stack
```

This guarantees `zero-cache` only starts after `zero_publication` exists in
Postgres. Before this design, a silent migration failure would leave
`zero-cache` crash-looping with `Unknown or invalid publications. Specified:
[zero_publication]. Found: []`.

### Readiness vs liveness

The backend exposes two endpoints:

- `GET /health`: lightweight liveness probe (always returns 200 if the
  process is up).
- `GET /ready`: readiness probe that confirms `zero_publication` exists.
  Returns 503 if not. The compose `backend.healthcheck` uses `/ready` so the
  container only reports `healthy` once the schema is actually usable by
  zero-cache.

You can also monitor startup progress with `docker compose ps` (look for
`(health: starting)` → `(healthy)`). The install script polls these states
automatically and times out after 5 minutes if the stack does not converge.

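If you start the stack by hand and want a similar wait, polling `/ready` with a timeout is one option. The sketch below is not the install script's implementation. It assumes `curl` is installed and that the backend is reachable on `localhost` at `BACKEND_PORT`; `probe_ready` and `wait_until` are hypothetical helper names.

```bash
# Sketch: poll the readiness endpoint until it succeeds or a timeout elapses.
# probe_ready and wait_until are hypothetical helpers, not part of SurfSense.

# Succeeds only when GET /ready returns a non-error status (curl -f fails on 503).
probe_ready() {
  curl -fsS -o /dev/null "http://localhost:${BACKEND_PORT:-8929}/ready"
}

# wait_until TIMEOUT_S INTERVAL_S CMD...
# Reruns CMD every INTERVAL_S seconds; returns 1 once TIMEOUT_S seconds have been waited.
wait_until() {
  local timeout="$1" interval="$2" waited=0
  shift 2
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Usage: wait up to 5 minutes, checking every 5 seconds.
#   wait_until 300 5 probe_ready && echo "backend is ready"
```

Probing `/ready` rather than `/health` matters here: `/health` returns 200 as soon as the process is up, which can happen before the schema is usable.
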
---

## Useful Commands

```bash
# View logs (all services)
docker compose logs -f

# View logs for a specific service
docker compose logs -f backend

# Stop all services
docker compose down

# Restart a specific service
docker compose restart backend

# Stop and remove all containers + volumes (destructive!)
docker compose down -v
```

---

## Troubleshooting

- **Ports already in use**: Change the relevant `*_PORT` variable in `.env` and restart.
- **Permission errors on Linux**: You may need to prefix `docker` commands with `sudo`.
- **Real-time updates not working**: Open DevTools → Console and check for WebSocket errors. Verify `NEXT_PUBLIC_ZERO_CACHE_URL` matches the running zero-cache address.
- **Line ending issues on Windows**: Run `git config --global core.autocrlf true` before cloning.

### Migration service exited non-zero

The `migrations` service exits non-zero in two cases:

1. `alembic upgrade head` failed (timeout or SQL error).
2. `alembic` succeeded but `zero_publication` is still missing from
   `pg_publication`.

Inspect the logs and the alembic state:

```bash
docker compose logs migrations
docker compose exec db psql -U surfsense -d surfsense \
  -c 'SELECT * FROM alembic_version;'
docker compose exec db psql -U surfsense -d surfsense \
  -c 'SELECT pubname FROM pg_publication;'
```

The default migration timeout is 900 seconds. Slow disks (Windows / WSL2)
may need more; set `MIGRATION_TIMEOUT` in `.env` to increase it.

### Zero-cache stuck on `Unknown or invalid publications`

Symptom (in `docker compose logs zero-cache`):

```text
Error: Unknown or invalid publications. Specified: [zero_publication]. Found: []
```

This means `zero-cache` started before `zero_publication` was created. With
the current compose files this should be impossible, because the `migrations`
service blocks `zero-cache` from starting. If you see it, your stack
predates the fix or you brought up `zero-cache` manually with `docker
compose up zero-cache` before the migrations service ran.

Recovery:

```bash
docker compose down
docker volume rm surfsense-zero-cache  # wipe half-built SQLite replica
docker compose up -d                   # migrations runs first, then zero-cache
```

The install script (`install.ps1` / `install.sh`) detects this case
automatically: if it finds a `surfsense-zero-cache` volume from a previous
install with no matching `surfsense-zero-init` volume, it removes the stale
volume before bringing the stack up.

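The stale-volume rule can be expressed as a small check. The sketch below illustrates the rule as described here and is not the install script's actual code; `needs_zero_cache_wipe` is a hypothetical helper that takes a newline-separated list of volume names.

```bash
# Sketch: decide whether the zero-cache volume is stale.
# needs_zero_cache_wipe is a hypothetical helper, not part of SurfSense.
# Argument: newline-separated volume names.
# Succeeds (exit 0) when surfsense-zero-cache exists without surfsense-zero-init.
needs_zero_cache_wipe() {
  local volumes="$1"
  # No zero-cache volume at all: nothing to wipe.
  printf '%s\n' "$volumes" | grep -qx 'surfsense-zero-cache' || return 1
  # A matching init volume means the replica is not considered stale.
  if printf '%s\n' "$volumes" | grep -qx 'surfsense-zero-init'; then
    return 1
  fi
  return 0
}

# Usage (run while the stack is down, since Docker refuses to remove a volume in use):
#   if needs_zero_cache_wipe "$(docker volume ls --format '{{.Name}}')"; then
#     docker volume rm surfsense-zero-cache
#   fi
```
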
### Zero-cache crashes with `_zero.tableMetadata` errors

This indicates a half-initialized SQLite replica left behind by a previous
crash. The `migrations` service writes a marker file on a shared volume
(`surfsense-zero-init`) when the publication oid changes; zero-cache wipes
its replica and re-syncs on next start. If the marker mechanism did not
trigger, run the recovery commands above.

### Ensuring `wal_level = logical`

Logical replication is required by zero-cache. The bundled
`docker/postgresql.conf` sets `wal_level = logical` automatically. If you
swap in your own config or use a managed Postgres, confirm with:

```bash
docker compose exec db psql -U surfsense -d surfsense \
  -c "SHOW wal_level;"
```

### Using `docker-compose.deps-only.yml`

`docker-compose.deps-only.yml` runs only the dependencies (Postgres, Redis,
SearXNG, zero-cache) on Docker while the backend and frontend run on the
host. Because there is no backend container in this stack, there is no
`migrations` service either, and you must run alembic on the host **before**
bringing the stack up:

```bash
cd surfsense_backend
uv run alembic upgrade head
cd ../docker
docker compose -f docker-compose.deps-only.yml up -d
```

If you skip the alembic step, `zero-cache` will crash-loop with `Unknown or
invalid publications. Specified: [zero_publication]`.