Deployment Guide
Deployment Options
NOMYO Router can be deployed in various environments depending on your requirements.
1. Bare Metal / VM Deployment
Prerequisites
- Python 3.10+
- pip
- Virtual environment (recommended)
Installation
```bash
# Clone the repository
git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
cd nomyo-router

# Create virtual environment
python3 -m venv .venv/router
source .venv/router/bin/activate

# Install dependencies
pip3 install -r requirements.txt

# Configure endpoints
nano config.yaml
```
Running the Router
```bash
# Basic startup
uvicorn router:app --host 0.0.0.0 --port 12434

# With custom configuration path
export NOMYO_ROUTER_CONFIG_PATH=/etc/nomyo-router/config.yaml
uvicorn router:app --host 0.0.0.0 --port 12434

# With custom database path
export NOMYO_ROUTER_DB_PATH=/var/lib/nomyo-router/token_counts.db
uvicorn router:app --host 0.0.0.0 --port 12434
```
Systemd Service
Create /etc/systemd/system/nomyo-router.service:
```ini
[Unit]
Description=NOMYO Router - Ollama Proxy
After=network.target

[Service]
User=nomyo
Group=nomyo
WorkingDirectory=/opt/nomyo-router
Environment="NOMYO_ROUTER_CONFIG_PATH=/etc/nomyo-router/config.yaml"
Environment="NOMYO_ROUTER_DB_PATH=/var/lib/nomyo-router/token_counts.db"
ExecStart=/opt/nomyo-router/.venv/router/bin/uvicorn router:app --host 0.0.0.0 --port 12434
Restart=always
RestartSec=10
# "syslog" is deprecated in current systemd; "journal" is the supported equivalent
StandardOutput=journal
StandardError=journal
SyslogIdentifier=nomyo-router

[Install]
WantedBy=multi-user.target
```
Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable nomyo-router
sudo systemctl start nomyo-router
sudo systemctl status nomyo-router
```
2. Docker Deployment
Image variants
| Tag | Semantic cache | Image size |
|---|---|---|
| latest | ❌ exact match only | ~300 MB |
| latest-semantic | ✅ sentence-transformers + all-MiniLM-L6-v2 pre-baked | ~800 MB |
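The semantic variant (latest-semantic, or nomyo-router:semantic when built locally as shown below) enables cache_similarity values below 1.0 in config.yaml. If semantic mode is configured on the lean image, the router falls back to exact-match caching and emits a warning.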
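The following toy Python sketch shows the difference between the two modes. It pairs an exact-match cache with an optional similarity lookup. It is an illustration, not the router's cache implementation. Three details are assumptions:

- The PromptCache class.
- The embed callable, which stands in for all-MiniLM-L6-v2.
- The "most similar entry at or above cache_similarity" rule.

```python
import math
import warnings
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PromptCache:
    """Toy prompt -> response cache (hypothetical, for illustration only)."""

    def __init__(self, cache_similarity: float = 1.0,
                 embed: Optional[Callable[[str], Vector]] = None):
        self.threshold = cache_similarity
        self.embed = embed  # stand-in for a sentence-embedding model
        self.entries: dict[str, tuple[Optional[Vector], str]] = {}
        if self.threshold < 1.0 and self.embed is None:
            # Like the lean image: no embedding model, so warn and use exact match.
            warnings.warn("semantic cache unavailable; falling back to exact match")
            self.threshold = 1.0

    def put(self, prompt: str, response: str) -> None:
        # Only compute an embedding when similarity lookups are enabled.
        vec = self.embed(prompt) if self.threshold < 1.0 else None
        self.entries[prompt] = (vec, response)

    def get(self, prompt: str) -> Optional[str]:
        if prompt in self.entries:  # an identical prompt always hits
            return self.entries[prompt][1]
        if self.threshold >= 1.0:   # exact-match mode: nothing else can hit
            return None
        query = self.embed(prompt)
        best, best_score = None, self.threshold
        for vec, response in self.entries.values():
            score = cosine(query, vec)
            if score >= best_score:  # keep the most similar entry above the threshold
                best, best_score = response, score
        return best
```

With cache_similarity: 1.0, only identical prompts hit. Lowering it lets paraphrased prompts hit too, at the cost of sometimes returning the answer to a slightly different question.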
Build the Image
```bash
# Lean build (exact match cache, default)
docker build -t nomyo-router .

# Semantic build (~500 MB larger, all-MiniLM-L6-v2 model baked in at build time)
docker build --build-arg SEMANTIC_CACHE=true -t nomyo-router:semantic .
```
Run the Container
```bash
docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /absolute/path/to/config_folder:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  nomyo-router
```
Advanced Docker Configuration
Custom Port
To expose the router on a different host port, map that port to the container's port 12434:
```bash
docker run -d \
  --name nomyo-router \
  -p 9000:12434 \
  -v /path/to/config:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  nomyo-router
```
Custom Host
```bash
docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /path/to/config:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  -e UVICORN_HOST=0.0.0.0 \
  nomyo-router
```
Persistent Database
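Mount a host directory rather than a single file. If the host file does not exist, Docker creates a directory at that path. SQLite also needs to create its journal files next to the database. Point NOMYO_ROUTER_DB_PATH into the mounted directory:
```bash
docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /path/to/config:/app/config/ \
  -v /path/to/data:/app/data \
  -e CONFIG_PATH=/app/config/config.yaml \
  -e NOMYO_ROUTER_DB_PATH=/app/data/token_counts.db \
  nomyo-router
```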
Docker Compose Example
See examples/docker-compose.yml for a complete Docker Compose example.
3. Kubernetes Deployment
Prerequisites
- Kubernetes cluster
- kubectl configured
- Helm (optional)
Basic Deployment
Create nomyo-router-deployment.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nomyo-router
  labels:
    app: nomyo-router
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nomyo-router
  template:
    metadata:
      labels:
        app: nomyo-router
    spec:
      containers:
        - name: nomyo-router
          image: nomyo-router:latest
          ports:
            - containerPort: 12434
          env:
            - name: CONFIG_PATH
              value: "/app/config/config.yaml"
            - name: NOMYO_ROUTER_DB_PATH
              value: "/app/data/token_counts.db"
          volumeMounts:
            - name: config-volume
              mountPath: /app/config
            # Mount the claim as a directory so SQLite can create its journal files
            - name: db-volume
              mountPath: /app/data
      volumes:
        - name: config-volume
          configMap:
            name: nomyo-router-config
        - name: db-volume
          persistentVolumeClaim:
            claimName: nomyo-router-db-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: nomyo-router
spec:
  selector:
    app: nomyo-router
  ports:
    - protocol: TCP
      port: 80
      targetPort: 12434
  type: LoadBalancer
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nomyo-router-config
data:
  config.yaml: |
    endpoints:
      - http://ollama-service:11434
    max_concurrent_connections: 2
    nomyo-router-api-key: ""
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nomyo-router-db-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```
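A ReadWriteOnce claim can be mounted read-write by pods on only one node. SQLite is also not designed for concurrent writers in separate pods. With replicas: 2 (or an autoscaler), consider running a single replica while using SQLite; see Database Considerations.
Apply the deployment: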
```bash
kubectl apply -f nomyo-router-deployment.yaml
```
Horizontal Pod Autoscaler
Create nomyo-router-hpa.yaml:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nomyo-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nomyo-router
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
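The Utilization target is measured against each container's CPU request. Set resources.requests.cpu on the nomyo-router container in the Deployment, otherwise the HPA cannot compute utilization. Then apply the HPA: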
```bash
kubectl apply -f nomyo-router-hpa.yaml
```
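The Kubernetes HPA documentation gives the core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that rule using:

- The minReplicas, maxReplicas and averageUtilization values from the manifest above.
- The controller's default tolerance of 0.1.

It ignores stabilization windows, pod readiness and scaling policies. Treat it as a back-of-the-envelope estimate, not a model of the controller. The function name desired_replicas is illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float = 70.0, min_replicas: int = 2,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Estimate the HPA's replica recommendation for one CPU Utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA does not rescale.
        proposed = current_replicas
    else:
        proposed = math.ceil(current_replicas * ratio)
    # The result is always clamped to [minReplicas, maxReplicas].
    return max(min_replicas, min(max_replicas, proposed))
```

For example, two pods averaging 140% of their CPU request give ceil(2 × 140 / 70) = 4 replicas. Eight pods at 210% would give 24, which is capped at maxReplicas = 10.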
4. Production Deployment
High Availability Setup
For production environments with multiple Ollama instances:
```yaml
# config.yaml
endpoints:
  - http://ollama-worker1:11434
  - http://ollama-worker2:11434
  - http://ollama-worker3:11434
  - https://api.openai.com/v1

max_concurrent_connections: 4

# Optional router-level API key to secure the router and dashboard (leave blank to disable)
nomyo-router-api-key: ""

api_keys:
  "https://api.openai.com/v1": "${OPENAI_KEY}"
```
Load Balancing
Deploy multiple router instances behind a load balancer:
```
┌───────────────────────────────────────────────────────────┐
│              Load Balancer (NGINX, Traefik)               │
└───────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Router Instance │  │ Router Instance │  │ Router Instance │
│     (Pod 1)     │  │     (Pod 2)     │  │     (Pod 3)     │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              ▼
┌───────────────────────────────────────────────────────────┐
│                      Ollama Cluster                       │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │    Ollama     │  │    Ollama     │  │  OpenAI API   │  │
│  │   Worker 1    │  │   Worker 2    │  │  (Fallback)   │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  │
└───────────────────────────────────────────────────────────┘
```
Monitoring and Logging
Prometheus Monitoring
Create a Prometheus scrape configuration:
```yaml
scrape_configs:
  - job_name: 'nomyo-router'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['nomyo-router:12434']
```
Logging
Configure log aggregation:
```bash
# In Docker
docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /path/to/config:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  --log-driver=fluentd \
  --log-opt fluentd-address=fluentd:24224 \
  nomyo-router
```
Deployment Checklist
Pre-Deployment
- Configure all Ollama endpoints
- Set an appropriate max_concurrent_connections
- Configure API keys for remote endpoints
- Test configuration locally
- Set up monitoring and alerting
- Configure logging
- Set up backup for token counts database
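For the "Test configuration locally" step, a small pre-flight check can catch obvious mistakes before the router starts. The rules below are assumptions inferred from the sample configurations in this guide, not the router's own schema:

- Endpoints must be a non-empty list of http(s) URLs.
- max_concurrent_connections must be a positive integer.
- Every api_keys entry must match a configured endpoint.

validate_config and load_and_validate are hypothetical helpers. Loading YAML requires PyYAML.

```python
from typing import Any
from urllib.parse import urlparse


def validate_config(cfg: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed config (empty list = OK)."""
    problems: list[str] = []

    endpoints = cfg.get("endpoints")
    if not isinstance(endpoints, list) or not endpoints:
        problems.append("endpoints must be a non-empty list")
        endpoints = []
    for ep in endpoints:
        parsed = urlparse(str(ep))
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"endpoint {ep!r} is not an http(s) URL")

    # Assumed rule: if present, the concurrency limit is a positive integer.
    mcc = cfg.get("max_concurrent_connections")
    if mcc is not None and (isinstance(mcc, bool) or not isinstance(mcc, int) or mcc < 1):
        problems.append("max_concurrent_connections must be a positive integer")

    # Assumed rule: every api_keys entry should name a configured endpoint.
    keys = cfg.get("api_keys") or {}
    if not isinstance(keys, dict):
        problems.append("api_keys must map endpoint URLs to keys")
    else:
        for url in keys:
            if url not in endpoints:
                problems.append(f"api_keys entry {url!r} matches no endpoint")
    return problems


def load_and_validate(path: str) -> list[str]:
    """Parse config.yaml and validate it (requires PyYAML)."""
    import yaml  # imported lazily so validate_config itself needs no third-party package

    with open(path, encoding="utf-8") as fh:
        data = yaml.safe_load(fh) or {}
    if not isinstance(data, dict):
        return ["top level of config.yaml must be a mapping"]
    return validate_config(data)
```

A YAML syntax error surfaces as an exception from yaml.safe_load before any rule runs. That covers the "validate config.yaml syntax" advice in the troubleshooting section.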
Post-Deployment
- Verify health endpoint: curl http://<router>/health
- Check endpoint status: curl http://<router>/api/config
- Monitor connection counts: curl http://<router>/api/usage
- Set up regular backups
- Configure auto-restart on failure
- Monitor performance metrics
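The health check can also be scripted, for example as a post-deploy gate in CI. This sketch polls the /health endpoint listed above using only the Python standard library. It assumes a healthy router answers with HTTP 200. wait_for_health and its timeout defaults are illustrative.

```python
import time
import urllib.request


def wait_for_health(base_url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll GET <base_url>/health until it answers 200 or `timeout` seconds pass."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (4xx/5xx), refused connections and timeouts
            # are all OSError subclasses: the router is not healthy yet.
            pass
        time.sleep(interval)
    return False
```

For example, wait_for_health("http://localhost:12434") returns True as soon as the router responds. It returns False after 30 seconds without a 200.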
Scaling Guidelines
Vertical Scaling
- Increase max_concurrent_connections for more parallel requests
- Add more CPU/memory to the router instance
- Monitor memory usage (token buffer grows with usage)
Horizontal Scaling
- Deploy multiple router instances
- Use a load balancer to distribute traffic
- Each instance maintains its own connection tracking
- Database can be shared or per-instance
Database Considerations
- SQLite is sufficient for single-instance deployments
- For multi-instance deployments, consider PostgreSQL
- Regular backups are recommended
- Database size grows with token usage history
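For SQLite, take backups of a running router through SQLite's online backup API rather than a plain file copy. A plain copy can capture a half-finished write. Python's sqlite3 module exposes that API as Connection.backup. backup_sqlite below is a hypothetical wrapper, and the paths are placeholders.

```python
import sqlite3


def backup_sqlite(src_path: str, dest_path: str) -> None:
    """Copy a (possibly in-use) SQLite database with the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup produces a consistent snapshot even while
        # other connections are using the source database.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

Run it from cron or a systemd timer, for example backup_sqlite("/var/lib/nomyo-router/token_counts.db", "/backups/token_counts.db").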
Security Best Practices
Network Security
- Use TLS for all external connections
- Restrict access to router port (12434)
- Use firewall rules to limit access
- Consider using a VPN for internal communications
Configuration Security
- Store API keys in environment variables
- Restrict access to config.yaml
- Use secrets management for production deployments
- Rotate API keys regularly
Runtime Security
- Run as non-root user
- Set appropriate file permissions
- Monitor for suspicious activity
- Keep dependencies updated
Troubleshooting Deployment Issues
Common Issues
Problem: Router not starting
- Check: Logs for configuration errors
- Solution: Validate config.yaml syntax
Problem: Endpoints showing as unavailable
- Check: Network connectivity from router to endpoints
- Solution: Verify firewall rules and DNS resolution
Problem: High latency
- Check: Endpoint health and connection counts
- Solution: Add more endpoints or increase concurrency limits
Problem: Database errors
- Check: Database file permissions
- Solution: Ensure write permissions for the database path
Problem: Connection limits being hit
- Check: /api/usage endpoint
- Solution: Increase max_concurrent_connections or add endpoints
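For the "Endpoints showing as unavailable" case, this standard-library sketch tells DNS failures apart from TCP failures. Run it from the router's host or container. It only checks that the port accepts connections, not that an Ollama or OpenAI-compatible API is answering. endpoint_reachable is a hypothetical helper.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url: str, timeout: float = 3.0) -> tuple[bool, str]:
    """Resolve the endpoint's host and attempt a TCP connection to its port."""
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return False, "no hostname in URL"
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"connected to {host}:{port}"
    except socket.gaierror as exc:  # name resolution failed: check DNS
        return False, f"DNS lookup failed: {exc}"
    except OSError as exc:  # refused or timed out: check firewall rules
        return False, f"TCP connect failed: {exc}"
```

For example, endpoint_reachable("http://ollama-worker1:11434") returns a (bool, message) pair. The message points at DNS or at the network path.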
Examples
See the examples directory for ready-to-use deployment examples.