dograh/docs/deployment/scaling.mdx
Abhishek 59619e9eaa
feat: an option to setup remote server with docker compose build (#280)
* feat: remote setup with docker build option

* chore: update documentation

* chore: make script run in non tty

* chore: add warning about slow build

* chore: add more documentation

* feat: add FASTAPI_WORKERS parameter

* feat: add scaling docs

* feat: add update script

* fix: fix semver options in update_remote.sh
2026-05-13 17:22:14 +05:30


---
title: "Scaling"
description: "Run multiple FastAPI worker processes behind nginx for higher throughput"
---
By default, the Dograh API container runs a single uvicorn worker. For production traffic — especially with many concurrent voice calls (long-lived WebSockets) — you'll want multiple workers. Dograh ships with built-in support for this: nginx load-balances across N independent uvicorn processes using a `least_conn` strategy.
This page covers how the multi-worker setup works, how to choose a worker count at install time, and how to change it on a running stack.
<Warning>
Multi-worker support requires **Dograh v1.29.0 or newer**. Earlier releases used `uvicorn --workers` and shipped a different `setup_remote.sh` / `start_services_docker.sh` / `nginx.conf` layout, so the steps below will not work on them. If your stack is older, [update first](/deployment/update) and then come back to this page.
</Warning>
## How it works
The API container starts `FASTAPI_WORKERS` separate uvicorn processes, each bound to its own port (`8000`, `8001`, `8002`, …). nginx exposes a single upstream `dograh_api` that includes all worker ports and routes new requests to whichever worker currently has the **fewest active connections**.
```
                                ┌────────────────────────────┐
                                │ api container              │
                                │  uvicorn worker 0 → :8000  │
browser ──► nginx  ───────────► │  uvicorn worker 1 → :8001  │
            (443)  (least_conn) │  uvicorn worker 2 → :8002  │
                                │  uvicorn worker 3 → :8003  │
                                └────────────────────────────┘
```
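To make the one-port-per-worker layout concrete, here is a minimal shell sketch of how such processes could be launched. It is a hypothetical illustration, not the contents of `start_services_docker.sh`: the `worker_commands` function and the `app.main:app` import path are assumptions, and the function only prints the commands, so it is safe to run anywhere.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT Dograh's start_services_docker.sh.
# APP_IMPORT_PATH is an assumed placeholder for the real ASGI app path.
APP_IMPORT_PATH="${APP_IMPORT_PATH:-app.main:app}"

# worker_commands N: print one uvicorn command per worker.
# Worker i listens on port 8000 + i, matching the nginx upstream entries.
worker_commands() {
  local workers="${1:-1}" i
  for ((i = 0; i < workers; i++)); do
    echo "uvicorn ${APP_IMPORT_PATH} --host 0.0.0.0 --port $((8000 + i))"
  done
}

# A real launcher would start each process in the background and wait, e.g.:
#   for ((i = 0; i < FASTAPI_WORKERS; i++)); do
#     uvicorn "$APP_IMPORT_PATH" --host 0.0.0.0 --port $((8000 + i)) &
#   done
#   wait
```

`worker_commands 4` prints four commands for ports `8000` through `8003`. Because every process owns its own port, nginx sees each worker as a separate upstream server and can count its connections individually, which is what `least_conn` relies on.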
<Note>
This is intentionally **not** `uvicorn --workers N` (uvicorn's built-in pre-fork mode). With pre-fork, all workers share one listening socket and the Linux kernel decides which worker's `accept()` picks up each new TCP connection. That is fine for short HTTP requests, but a long-lived WebSocket stays on whichever worker accepted it, so a handful of unlucky workers can end up handling most of the streaming traffic while the others sit idle. nginx's `least_conn` balancer tracks the number of active connections to each worker and sends every new connection to the least-loaded one, which keeps WebSockets evenly distributed.
</Note>
The `ari_manager` and `campaign_orchestrator` processes inside the API container stay as **singletons** regardless of `FASTAPI_WORKERS` — they coordinate global state (Asterisk channels, campaign scheduling) and should not be duplicated. ARQ background workers are controlled separately via `ARQ_WORKERS`.
## Choosing a worker count
A safe starting point is **one worker per available vCPU**, capped at 8 unless you've profiled your workload. The [Remote Server Deployment prerequisites](/deployment/docker#prerequisites) ask for a minimum of 4 vCPUs, so:
| vCPUs | Suggested `FASTAPI_WORKERS` |
|-------|-----------------------------|
| 4 | 4 |
| 8 | 6-8 |
| 16+ | profile first |
Each worker is a separate Python process with its own memory, so budget roughly **300-500 MB of RAM per worker** on top of the Postgres/Redis/MinIO overhead. If you're near the 8 GB RAM minimum and see out-of-memory (OOM) kills, lower the worker count first.
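The rule of thumb above is simple arithmetic, sketched below as two shell helpers. The function names are hypothetical, and the numbers are this page's heuristics (one worker per vCPU capped at 8, 300-500 MB per worker), not measured limits. For 8 vCPUs the helper returns the top of the suggested 6-8 range.

```bash
#!/usr/bin/env bash
# Rule-of-thumb helpers based on this page's guidance (hypothetical names):
# one worker per vCPU capped at 8, and about 300-500 MB of RAM per worker.

# suggest_workers [VCPUS]: print a suggested FASTAPI_WORKERS value.
# Uses the host CPU count from `nproc` when VCPUS is omitted.
suggest_workers() {
  local vcpus="${1:-$(nproc)}" cap=8
  if (( vcpus > cap )); then
    echo "note: ${vcpus} vCPUs exceeds the cap of ${cap}; profile before going higher" >&2
    echo "$cap"
  else
    echo "$vcpus"
  fi
}

# worker_ram_mb WORKERS: print the low-high RAM budget (MB) for the API
# workers alone; Postgres, Redis and MinIO need memory on top of this.
worker_ram_mb() {
  local workers="$1"
  echo "$((workers * 300))-$((workers * 500))"
}
```

On a 4 vCPU host, `suggest_workers` prints `4` and `worker_ram_mb 4` prints `1200-2000`: about 1.2 to 2 GB for the API workers alone, before Postgres, Redis, and MinIO.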
## Setting the worker count at install time
`setup_remote.sh` prompts for the worker count alongside the other configuration:
```
Number of FastAPI workers (uvicorn processes nginx will load-balance):
[4]:
```
Press Enter for the default (`4`) or enter a different positive integer. Non-interactive callers (cloud-init, CI, Terraform) can set the value via environment variable instead:
```bash
SERVER_IP=... TURN_SECRET=... FASTAPI_WORKERS=8 ./setup_remote.sh
```
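The prompt-or-environment behavior can be sketched as follows. This is a hypothetical reconstruction for illustration, not the code in `setup_remote.sh`: an explicit `FASTAPI_WORKERS` wins, an interactive terminal gets the prompt, and a non-TTY run with nothing set falls back to the default of `4` (that last fallback is an assumption).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "prompt with a default, or take it from the
# environment". Not the actual code in setup_remote.sh.
DEFAULT_FASTAPI_WORKERS=4

resolve_workers() {
  local value="${FASTAPI_WORKERS:-}"
  # Prompt only when nothing was preset and stdin is a terminal.
  if [[ -z "$value" && -t 0 ]]; then
    echo "Number of FastAPI workers (uvicorn processes nginx will load-balance):" >&2
    read -r -p "[${DEFAULT_FASTAPI_WORKERS}]: " value
  fi
  # Empty answer, or non-TTY run with nothing set: use the default (assumption).
  value="${value:-$DEFAULT_FASTAPI_WORKERS}"
  # Accept positive integers only: no zero, no sign, no leading zeros.
  if [[ ! "$value" =~ ^[1-9][0-9]*$ ]]; then
    echo "error: FASTAPI_WORKERS must be a positive integer, got '${value}'" >&2
    return 1
  fi
  echo "$value"
}
```

For example, `FASTAPI_WORKERS=8 resolve_workers` prints `8` without prompting, while `FASTAPI_WORKERS=0 resolve_workers` prints an error and returns a non-zero status.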
The script wires the value into two places:
- **`.env`** — sets `FASTAPI_WORKERS=N`, which `docker-compose.yaml` substitutes into the API container's environment.
- **`nginx.conf`** — generates an `upstream dograh_api` block with one `server api:800X` entry per worker.
Both must agree, which is why the script generates them together.
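Deriving the upstream block from a single number is what keeps the two files consistent. A minimal sketch, assuming the server options used in the `nginx.conf` example on this page (`max_fails=3 fail_timeout=10s`, `keepalive 32`); the `gen_upstream` name is hypothetical, and `setup_remote.sh` may build the block differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print an nginx upstream block with one server entry
# per worker (ports 8000, 8001, ...). Options mirror the example on this page.
gen_upstream() {
  local workers="$1" i
  echo "upstream dograh_api {"
  echo "    least_conn;"
  for ((i = 0; i < workers; i++)); do
    echo "    server api:$((8000 + i)) max_fails=3 fail_timeout=10s;"
  done
  echo "    keepalive 32;"
  echo "}"
}
```

`gen_upstream 4` prints a four-server block; writing it alongside a `FASTAPI_WORKERS=4` line in `.env` makes both files come from the same number.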
## Changing the worker count on a running stack
Once Dograh is running, increasing or decreasing the worker count is a two-file edit plus a restart. You'll touch:
1. **`.env`** — controls how many uvicorn processes the API container spawns.
2. **`nginx.conf`** — controls which worker ports nginx forwards to.
<Warning>
Both files must stay in sync. If `.env` says `FASTAPI_WORKERS=8` but `nginx.conf` lists only 4 upstream servers, half your workers will sit idle. If `nginx.conf` lists more upstream servers than there are workers, nginx gets connection errors on the extra ports and has to fall back to another server via `proxy_next_upstream`.
</Warning>
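A quick consistency check before restarting can catch a mismatch. The sketch below is a hypothetical helper, not part of Dograh: it reads the last `FASTAPI_WORKERS=` line from `.env`, counts `server api:` lines inside the `upstream dograh_api` block of `nginx.conf`, and fails if the two differ. It assumes each server entry sits on its own line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Dograh): check that .env and nginx.conf
# agree on the number of API workers.
check_worker_sync() {
  local env_file="${1:-.env}" nginx_conf="${2:-nginx.conf}" workers upstreams
  # Last FASTAPI_WORKERS= line; strip double quotes, spaces and CR characters.
  workers="$(grep -E '^FASTAPI_WORKERS=' "$env_file" | tail -n 1 | cut -d= -f2- | tr -d '"\r ')"
  # Count `server api:` lines inside the `upstream dograh_api { ... }` block.
  upstreams="$(awk '
    /upstream[ \t]+dograh_api/ { inside = 1; next }
    inside && /^[ \t]*[}]/ { inside = 0 }
    inside && /^[ \t]*server[ \t]+api:/ { n++ }
    END { print n + 0 }
  ' "$nginx_conf")"
  if [[ -n "$workers" && "$workers" == "$upstreams" ]]; then
    echo "ok: ${workers} workers, ${upstreams} upstream servers"
  else
    echo "mismatch: FASTAPI_WORKERS=${workers:-unset} in ${env_file}, ${upstreams} servers in ${nginx_conf}" >&2
    return 1
  fi
}
```

Run `check_worker_sync .env nginx.conf` from the `dograh/` directory after editing both files; a non-zero exit status means they disagree.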
### Steps
All commands run from your `dograh/` directory (the one with `docker-compose.yaml`).
**1. Edit `.env`** and change the `FASTAPI_WORKERS` line:
```bash
# Before
FASTAPI_WORKERS=4
# After
FASTAPI_WORKERS=8
```
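If you'd rather not edit `.env` by hand (for example, from an automation script), a small helper can rewrite the line. This is a hypothetical sketch: it replaces an existing `FASTAPI_WORKERS=` line or appends one, and goes through a temporary file instead of `sed -i`, whose flags differ between GNU and BSD sed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set FASTAPI_WORKERS in a .env file.
# Replaces an existing FASTAPI_WORKERS= line, or appends one if missing.
set_env_workers() {
  local workers="$1" env_file="${2:-.env}" tmp status=0
  if [[ ! "$workers" =~ ^[1-9][0-9]*$ ]]; then
    echo "error: expected a positive integer, got '${workers}'" >&2
    return 1
  fi
  if grep -qE '^FASTAPI_WORKERS=' "$env_file" 2>/dev/null; then
    tmp="$(mktemp)" || return 1
    if sed -E "s/^FASTAPI_WORKERS=.*/FASTAPI_WORKERS=${workers}/" "$env_file" > "$tmp"; then
      cat "$tmp" > "$env_file"   # write back in place; keeps the file's permissions
    else
      status=1
    fi
    rm -f "$tmp"
    return "$status"
  fi
  # Make sure the file ends with a newline before appending a new line.
  if [[ -s "$env_file" && -n "$(tail -c 1 "$env_file")" ]]; then
    echo >> "$env_file"
  fi
  echo "FASTAPI_WORKERS=${workers}" >> "$env_file"
}
```

`set_env_workers 8 .env` turns `FASTAPI_WORKERS=4` into `FASTAPI_WORKERS=8` and leaves every other line alone. Writing back through `cat` rather than replacing the file keeps the restrictive permissions a secrets-bearing `.env` usually has.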
**2. Edit `nginx.conf`** and update the `upstream dograh_api` block so it has exactly one `server api:800X` line per worker, with ports starting at `8000`:
```nginx
upstream dograh_api {
    least_conn;
    server api:8000 max_fails=3 fail_timeout=10s;
    server api:8001 max_fails=3 fail_timeout=10s;
    server api:8002 max_fails=3 fail_timeout=10s;
    server api:8003 max_fails=3 fail_timeout=10s;
    server api:8004 max_fails=3 fail_timeout=10s;  # ← new
    server api:8005 max_fails=3 fail_timeout=10s;  # ← new
    server api:8006 max_fails=3 fail_timeout=10s;  # ← new
    server api:8007 max_fails=3 fail_timeout=10s;  # ← new
    keepalive 32;
}
```
To **scale down**, remove the trailing `server` lines so the list matches the new `FASTAPI_WORKERS` value.
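Scaling up or down can also be scripted. The hypothetical `rewrite_upstream` helper below replaces only the `server api:` lines inside `upstream dograh_api`, emitting fresh ones right after `least_conn;` and leaving the rest of `nginx.conf` untouched. It assumes every server uses the same options as the example above, and it leaves the file unchanged if the block has no `least_conn;` line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: regenerate the `server api:...` lines of the
# `upstream dograh_api` block for a new worker count. Other lines are kept.
rewrite_upstream() {
  local workers="$1" conf="${2:-nginx.conf}" tmp
  tmp="$(mktemp)" || return 1
  if awk -v n="$workers" '
      /upstream[ \t]+dograh_api/ { inside = 1 }
      # Drop the old server lines inside the block.
      inside && /^[ \t]*server[ \t]+api:/ { next }
      { print }
      # Emit one fresh server line per worker right after least_conn.
      inside && /least_conn;/ {
        for (i = 0; i < n; i++)
          printf "    server api:%d max_fails=3 fail_timeout=10s;\n", 8000 + i
        placed = 1
      }
      inside && /^[ \t]*[}]/ { inside = 0 }
      END { if (!placed) exit 1 }
    ' "$conf" > "$tmp"; then
    cat "$tmp" > "$conf"   # write back in place
  else
    echo "error: no least_conn; line found in upstream dograh_api; ${conf} unchanged" >&2
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"
}
```

`rewrite_upstream 8 nginx.conf` produces the eight-server block shown above, and `rewrite_upstream 4 nginx.conf` scales it back down.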
**3. Recreate the affected containers.** The simplest path — brief downtime, no surprises:
```bash
sudo docker compose --profile remote down
sudo docker compose --profile remote up -d
```
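If the rest of your stack is healthy, you can recreate only the `api` and `nginx` containers instead. Postgres, Redis, and the other services keep running, but calls in progress are still dropped while `api` and `nginx` restart: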
```bash
sudo docker compose --profile remote up -d --force-recreate api nginx
```
`--force-recreate` ensures the api container picks up the new `FASTAPI_WORKERS` value and nginx re-reads the updated `nginx.conf` (which is mounted read-only from disk).
**4. Verify.** Confirm the right number of uvicorn processes are running. The API image is slim and doesn't include `ps`, so use Docker's host-side view instead:
```bash
sudo docker compose --profile remote top api | grep uvicorn
```
You should see one line per worker. To confirm the bound ports, check the startup logs. Each worker logs a `Uvicorn running on http://0.0.0.0:800X` line on boot:
```bash
sudo docker compose --profile remote logs api | grep "Uvicorn running"
```
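To turn the log check into a pass/fail test, count the distinct `Uvicorn running on ...` addresses. This is a hypothetical sketch that reads the logs on stdin; `sort -u` means a worker that restarted and logged the same address twice is counted once.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: read `docker compose logs api` output on stdin.

# count_bound_workers: number of distinct "Uvicorn running on <address>" lines.
count_bound_workers() {
  grep -oE 'Uvicorn running on [^ ]+' | sort -u | wc -l | tr -d ' '
}

# expect_bound_workers N: succeed only if exactly N distinct addresses appear.
expect_bound_workers() {
  local expected="$1" actual
  actual="$(count_bound_workers)"
  if [[ "$actual" -eq "$expected" ]]; then
    echo "ok: ${actual} workers bound"
  else
    echo "expected ${expected} bound workers, found ${actual}" >&2
    return 1
  fi
}
```

For example, `sudo docker compose --profile remote logs api | expect_bound_workers 8` prints `ok: 8 workers bound` when all eight ports came up, and exits non-zero otherwise.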
Then hit the API through nginx to confirm requests still flow:
```bash
curl -k https://YOUR_SERVER_IP/api/v1/health
```
### Why not just re-run `setup_remote.sh`?
`setup_remote.sh` refuses to overwrite an existing install by design — re-running it would regenerate `OSS_JWT_SECRET` (logging everyone out), reset the TURN shared secret (breaking WebRTC auth on connected clients), and regenerate SSL certificates. The two-file edit above is the supported way to change worker count after install.
If you genuinely want a clean reinstall, see the `DOGRAH_FORCE_OVERWRITE=1` escape hatch documented in the script.
## What this does not scale
Multi-worker mode scales the HTTP/WebSocket API surface. It does **not** scale:
- **ARQ background workers** — controlled by `ARQ_WORKERS` (defaults to 1). Increase this in the API container's environment if your background job queue backs up.
- **`ari_manager` / `campaign_orchestrator`** — singletons by design; they don't benefit from extra processes.
- **Postgres, Redis, MinIO** — each runs as a single container in the stack. For production-scale Postgres you'd run a managed service and point `DATABASE_URL` at it; the same applies to Redis and S3-compatible storage.
For multi-machine horizontal scaling (separate API containers across hosts), see the [Custom Domain](/deployment/custom-domain) guide for the load-balancer-in-front-of-multiple-hosts pattern — it's the same idea as the in-container `least_conn` upstream, just one layer higher.