Deployment Guide
Deployment Options
NOMYO Router can be deployed in various environments depending on your requirements.
1. Bare Metal / VM Deployment
Prerequisites
- Python 3.10+
- pip
- Virtual environment (recommended)
Installation
# Clone the repository
git clone https://github.com/nomyo-ai/nomyo-router.git
cd nomyo-router
# Create virtual environment
python3 -m venv .venv/router
source .venv/router/bin/activate
# Install dependencies
pip3 install -r requirements.txt
# Configure endpoints
nano config.yaml
Running the Router
# Basic startup
uvicorn router:app --host 0.0.0.0 --port 12434
# With custom configuration path
export NOMYO_ROUTER_CONFIG_PATH=/etc/nomyo-router/config.yaml
uvicorn router:app --host 0.0.0.0 --port 12434
# With custom database path
export NOMYO_ROUTER_DB_PATH=/var/lib/nomyo-router/token_counts.db
uvicorn router:app --host 0.0.0.0 --port 12434
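Each of the two variables above overrides a default location. The following is a minimal sketch of that precedence rule. The resolve_paths helper and the fallback values config.yaml and token_counts.db (relative to the working directory) are assumptions made for illustration; check router.py for the defaults the router actually uses.

```python
import os
from typing import Mapping

# Hypothetical defaults, used only when a variable is unset or empty.
DEFAULT_CONFIG_PATH = "config.yaml"
DEFAULT_DB_PATH = "token_counts.db"


def resolve_paths(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (config_path, db_path), preferring the environment variables."""
    config_path = env.get("NOMYO_ROUTER_CONFIG_PATH") or DEFAULT_CONFIG_PATH
    db_path = env.get("NOMYO_ROUTER_DB_PATH") or DEFAULT_DB_PATH
    return config_path, db_path


if __name__ == "__main__":
    config_path, db_path = resolve_paths()
    print(f"config:   {config_path}")
    print(f"database: {db_path}")
```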
Systemd Service
Create /etc/systemd/system/nomyo-router.service:
[Unit]
Description=NOMYO Router - Ollama Proxy
After=network.target
[Service]
User=nomyo
Group=nomyo
WorkingDirectory=/opt/nomyo-router
Environment="NOMYO_ROUTER_CONFIG_PATH=/etc/nomyo-router/config.yaml"
Environment="NOMYO_ROUTER_DB_PATH=/var/lib/nomyo-router/token_counts.db"
ExecStart=/opt/nomyo-router/.venv/router/bin/uvicorn router:app --host 0.0.0.0 --port 12434
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=nomyo-router
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable nomyo-router
sudo systemctl start nomyo-router
sudo systemctl status nomyo-router
2. Docker Deployment
Build the Image
docker build -t nomyo-router .
Run the Container
docker run -d \
--name nomyo-router \
-p 12434:12434 \
-v /absolute/path/to/config_folder:/app/config/ \
-e CONFIG_PATH=/app/config/config.yaml \
nomyo-router
Advanced Docker Configuration
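Custom Port
The -p flag maps HOST:CONTAINER. To publish the router on a different host port, change only the host side (for example -p 9000:12434). If you also pass --port to change the port the router listens on inside the container, map that same container port:
docker run -d \
--name nomyo-router \
-p 9000:9000 \
-v /path/to/config:/app/config/ \
-e CONFIG_PATH=/app/config/config.yaml \
nomyo-router \
-- --port 9000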
Custom Host
docker run -d \
--name nomyo-router \
-p 12434:12434 \
-v /path/to/config:/app/config/ \
-e CONFIG_PATH=/app/config/config.yaml \
-e UVICORN_HOST=0.0.0.0 \
nomyo-router
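Persistent Database
Mount a host directory rather than a single file. The database and any journal files SQLite creates beside it then survive when the container is removed and recreated. Also, if the host path of a single-file mount does not exist, Docker creates it as a directory.
docker run -d \
--name nomyo-router \
-p 12434:12434 \
-v /path/to/config:/app/config/ \
-v /path/to/data:/app/data/ \
-e CONFIG_PATH=/app/config/config.yaml \
-e NOMYO_ROUTER_DB_PATH=/app/data/token_counts.db \
nomyo-router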
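SQLite keeps temporary files next to the database file. In rollback mode this is a -journal file; in WAL mode it is a pair of -wal and -shm files. When only the database file is bind-mounted, those files land in the container's writable layer instead of on the host. Whether the router enables WAL mode is an assumption here. The sketch below uses Python's built-in sqlite3 module to show which files appear beside a WAL-mode database while a connection is open. The counts table is hypothetical.

```python
import os
import sqlite3
import tempfile


def files_next_to_wal_db(directory: str) -> list[str]:
    """Create a WAL-mode database in `directory` and list the directory's
    contents while the connection is still open."""
    db_path = os.path.join(directory, "token_counts.db")
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        # Hypothetical table; the router's real schema may differ.
        conn.execute("CREATE TABLE IF NOT EXISTS counts (model TEXT, tokens INTEGER)")
        conn.execute("INSERT INTO counts VALUES (?, ?)", ("llama3", 42))
        conn.commit()
        # Committed pages sit in the -wal file until a checkpoint runs.
        return sorted(os.listdir(directory))
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(files_next_to_wal_db(tmp))
```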
Docker Compose Example
See examples/docker-compose.yml for a complete Docker Compose example.
3. Kubernetes Deployment
Prerequisites
- Kubernetes cluster
- kubectl configured
- Helm (optional)
Basic Deployment
Create nomyo-router-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nomyo-router
  labels:
    app: nomyo-router
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nomyo-router
  template:
    metadata:
      labels:
        app: nomyo-router
    spec:
      containers:
        - name: nomyo-router
          image: nomyo-router:latest
          ports:
            - containerPort: 12434
          env:
            - name: CONFIG_PATH
              value: "/app/config/config.yaml"
            - name: NOMYO_ROUTER_DB_PATH
              value: "/app/data/token_counts.db"
          resources:
            requests:
              cpu: "250m"      # placeholder; a CPU request is required for the CPU-based HPA
              memory: "256Mi"  # placeholder; tune for your workload
          volumeMounts:
            - name: config-volume
              mountPath: /app/config
            # Mount a directory: a subPath file mount becomes a directory if the file does not exist yet.
            - name: db-volume
              mountPath: /app/data
      volumes:
        - name: config-volume
          configMap:
            name: nomyo-router-config
        - name: db-volume
          persistentVolumeClaim:
            claimName: nomyo-router-db-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: nomyo-router
spec:
  selector:
    app: nomyo-router
  ports:
    - protocol: TCP
      port: 80
      targetPort: 12434
  type: LoadBalancer
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nomyo-router-config
data:
  config.yaml: |
    endpoints:
      - http://ollama-service:11434
    max_concurrent_connections: 2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nomyo-router-db-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
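A ReadWriteOnce volume can be attached to only one node at a time. On a multi-node cluster, the second replica may therefore fail to mount the database volume. If you run more than one replica, consider giving each its own database, for example with a StatefulSet and volumeClaimTemplates.
Apply the deployment: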
kubectl apply -f nomyo-router-deployment.yaml
Horizontal Pod Autoscaler
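Create nomyo-router-hpa.yaml. The HPA computes CPU utilization relative to each container's CPU request, so the Deployment must set resources.requests.cpu. The cluster also needs a resource metrics source such as metrics-server.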
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nomyo-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nomyo-router
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Apply the HPA:
kubectl apply -f nomyo-router-hpa.yaml
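Kubernetes documents the HPA scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The controller skips scaling when the ratio is within a tolerance (0.1 by default), and the result is clamped to minReplicas/maxReplicas. The sketch below reproduces that arithmetic for the manifest above (target 70%, 2 to 10 replicas). It ignores stabilization windows and pod readiness handling, so treat its output as an estimate rather than a model of the controller.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 70.0,
    min_replicas: int = 2,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Estimate the HPA's replica count for a CPU utilization target."""
    ratio = current_utilization / target_utilization
    if abs(1.0 - ratio) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))
```

For example, two replicas averaging 105% CPU give ceil(2 × 105 / 70) = 3 replicas.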
4. Production Deployment
High Availability Setup
For production environments with multiple Ollama instances:
# config.yaml
endpoints:
  - http://ollama-worker1:11434
  - http://ollama-worker2:11434
  - http://ollama-worker3:11434
  - https://api.openai.com/v1
max_concurrent_connections: 4
api_keys:
  "https://api.openai.com/v1": "${OPENAI_KEY}"
Load Balancing
Deploy multiple router instances behind a load balancer:
┌───────────────────────────────────────────────────────────────┐
│                Load Balancer (NGINX, Traefik)                 │
└───────────────────────────────────────────────────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Router Instance │    │ Router Instance │    │ Router Instance │
│     (Pod 1)     │    │     (Pod 2)     │    │     (Pod 3)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌───────────────────────────────────────────────────────────────┐
│                        Ollama Cluster                         │
│  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐  │
│  │     Ollama      │ │     Ollama      │ │   OpenAI API    │  │
│  │    Worker 1     │ │    Worker 2     │ │   (Fallback)    │  │
│  └─────────────────┘ └─────────────────┘ └─────────────────┘  │
└───────────────────────────────────────────────────────────────┘
Monitoring and Logging
Prometheus Monitoring
Create a Prometheus scrape configuration:
scrape_configs:
  - job_name: 'nomyo-router'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['nomyo-router:12434']
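Before wiring up Prometheus, you can confirm that the scrape target returns the Prometheus text format. The parser below is a simplified sketch. It handles name{labels} value [timestamp] lines and skips comments, but it does not cover every corner of the exposition format. The metric names in the test data are hypothetical, and the assumption that the router serves /metrics comes from the scrape configuration above.

```python
def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse simple Prometheus text-format samples into {series: value}.

    Handles `name value [timestamp]` and `name{labels} value [timestamp]`
    lines and skips blank and comment (# HELP / # TYPE) lines. Simplified
    sketch: label values containing '}' are not supported.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Keep the whole label set (which may contain spaces) as the key.
            end = line.index("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        # The first field after the series is the value; float() also
        # accepts the format's NaN, +Inf and -Inf spellings.
        samples[series] = float(rest[0])
    return samples
```

To try it against a running instance, pass it the body returned by urllib.request.urlopen("http://nomyo-router:12434/metrics").read().decode().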
Logging
Configure log aggregation:
# In Docker
docker run -d \
--name nomyo-router \
-p 12434:12434 \
-v /path/to/config:/app/config/ \
-e CONFIG_PATH=/app/config/config.yaml \
--log-driver=fluentd \
--log-opt fluentd-address=fluentd:24224 \
nomyo-router
Deployment Checklist
Pre-Deployment
- Configure all Ollama endpoints
- Set appropriate max_concurrent_connections
- Configure API keys for remote endpoints
- Test configuration locally
- Set up monitoring and alerting
- Configure logging
- Set up backup for token counts database
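For the "Test configuration locally" step, a quick structural check catches many mistakes before deployment. The sketch below validates an already-parsed configuration dictionary, for example the result of PyYAML's yaml.safe_load (a third-party package). It checks that endpoints is a non-empty list of http/https URLs, that max_concurrent_connections is a positive integer when present, and that every api_keys entry matches a listed endpoint. These rules are inferred from the examples in this guide, not taken from the router's own validation code. validate_config is a hypothetical helper.

```python
from urllib.parse import urlparse


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed config; empty means OK."""
    problems: list[str] = []

    endpoints = config.get("endpoints")
    if not isinstance(endpoints, list) or not endpoints:
        problems.append("endpoints must be a non-empty list")
        endpoints = []
    for url in endpoints:
        parsed = urlparse(str(url))
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"invalid endpoint URL: {url!r}")

    # Optional here; only checked when present. bool is an int subclass.
    limit = config.get("max_concurrent_connections")
    if limit is not None and (
        not isinstance(limit, int) or isinstance(limit, bool) or limit < 1
    ):
        problems.append("max_concurrent_connections must be a positive integer")

    # Every api_keys entry should name one of the configured endpoints.
    for endpoint in config.get("api_keys") or {}:
        if endpoint not in endpoints:
            problems.append(f"api_keys entry {endpoint!r} matches no endpoint")

    return problems
```

An empty list means the structure looks sane. Each returned string describes one problem to fix before deploying.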
Post-Deployment
- Verify health endpoint: curl http://<router>/health
- Check endpoint status: curl http://<router>/api/config
- Monitor connection counts: curl http://<router>/api/usage
- Set up regular backups
- Configure auto-restart on failure
- Monitor performance metrics
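The curl checks above can be scripted to run after each deploy, for example from a CI job. The sketch below uses only the Python standard library. It assumes /health returns HTTP 200 when the router is ready and treats any other status or connection error as not ready. It makes no assumptions about the response body. is_healthy and wait_until_healthy are hypothetical helper names.

```python
import http.client
import time
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections, DNS failures and timeouts
        # are OSError subclasses; HTTPException covers malformed replies.
        return False


def wait_until_healthy(
    base_url: str, deadline_s: float = 60.0, interval_s: float = 2.0
) -> bool:
    """Poll /health until it succeeds or deadline_s seconds have passed."""
    stop_at = time.monotonic() + deadline_s
    while True:
        if is_healthy(base_url):
            return True
        if time.monotonic() >= stop_at:
            return False
        time.sleep(interval_s)
```

For example, wait_until_healthy("http://localhost:12434") returns False if the router does not report healthy within 60 seconds.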
Scaling Guidelines
Vertical Scaling
- Increase max_concurrent_connections for more parallel requests
- Add more CPU/memory to the router instance
- Monitor memory usage (token buffer grows with usage)
Horizontal Scaling
- Deploy multiple router instances
- Use a load balancer to distribute traffic
- Each instance maintains its own connection tracking
- Database can be shared or per-instance
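Because each instance maintains its own connection tracking, max_concurrent_connections acts as a per-instance limit. N router instances sharing an endpoint can together send it up to N × max_concurrent_connections concurrent requests. The sketch below illustrates that arithmetic with a hypothetical EndpointLimiter built on asyncio.Semaphore. It is not the router's own implementation, and the sleep stands in for a real upstream call.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class EndpointLimiter:
    """Hypothetical per-instance limiter: at most `limit` in-flight requests
    per endpoint, counted only in this process's memory."""

    def __init__(self, endpoints: list[str], limit: int) -> None:
        self._semaphores = {url: asyncio.Semaphore(limit) for url in endpoints}

    async def run(self, endpoint: str, work: Callable[[], Awaitable[T]]) -> T:
        # Wait for a free slot on this endpoint, then perform the request.
        async with self._semaphores[endpoint]:
            return await work()


async def peak_concurrency(instances: int, limit: int, requests: int) -> int:
    """Spread `requests` calls round-robin over `instances` independent
    limiters that all target one endpoint; return the peak in flight."""
    endpoint = "http://ollama-worker1:11434"  # example endpoint from this guide
    limiters = [EndpointLimiter([endpoint], limit) for _ in range(instances)]
    active = 0
    peak = 0

    async def fake_upstream_call() -> None:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stand-in for the real upstream request
        active -= 1

    await asyncio.gather(
        *(limiters[i % instances].run(endpoint, fake_upstream_call) for i in range(requests))
    )
    return peak
```

With three instances and a limit of 4, asyncio.run(peak_concurrency(3, 4, 30)) returns 12. Plan endpoint capacity for the whole fleet rather than for a single instance.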
Database Considerations
- SQLite is sufficient for single-instance deployments
- For multi-instance deployments, consider PostgreSQL
- Regular backups are recommended
- Database size grows with token usage history
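For the recommended regular backups, copying the SQLite file while the router is writing can capture a half-written state. Python's built-in sqlite3.Connection.backup() uses SQLite's online backup API to take a consistent copy instead. The following is a minimal sketch. backup_token_db and its timestamped file name are illustrative choices, not part of the router.

```python
import sqlite3
import time
from pathlib import Path


def backup_token_db(db_path: str, backup_dir: str) -> Path:
    """Write a consistent online copy of the SQLite database into backup_dir."""
    if not Path(db_path).is_file():
        # sqlite3.connect() would silently create an empty database instead.
        raise FileNotFoundError(db_path)
    target = Path(backup_dir) / time.strftime("token_counts-%Y%m%d-%H%M%S.db")
    target.parent.mkdir(parents=True, exist_ok=True)
    source = sqlite3.connect(db_path)
    dest = sqlite3.connect(target)
    try:
        # The online backup API copies a consistent snapshot even while
        # another process is writing to the source database.
        source.backup(dest)
    finally:
        dest.close()
        source.close()
    return target
```

Run it from cron or a systemd timer. For example, backup_token_db("/var/lib/nomyo-router/token_counts.db", "/var/backups/nomyo-router"), where the backup directory is a hypothetical location.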
Security Best Practices
Network Security
- Use TLS for all external connections
- Restrict access to router port (12434)
- Use firewall rules to limit access
- Consider using VPN for internal communications
Configuration Security
- Store API keys in environment variables
- Restrict access to config.yaml
- Use secrets management for production deployments
- Rotate API keys regularly
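The ${OPENAI_KEY} placeholder in the production config.yaml keeps the secret itself in an environment variable. This sketch assumes the router substitutes such placeholders from the environment when it loads the file; check router.py for how it actually resolves them. It shows the idea with the standard library's string.Template and fails loudly when a referenced variable is unset, instead of sending an empty key upstream. expand_api_keys is a hypothetical helper.

```python
import os
from string import Template
from typing import Mapping


def expand_api_keys(
    api_keys: Mapping[str, str], env: Mapping[str, str] = os.environ
) -> dict[str, str]:
    """Replace ${VAR} placeholders in api_keys values with environment values.

    Raises KeyError naming the endpoint if a referenced variable is not set.
    """
    expanded = {}
    for endpoint, raw_value in api_keys.items():
        try:
            expanded[endpoint] = Template(raw_value).substitute(env)
        except KeyError as missing:
            raise KeyError(
                f"environment variable {missing} required by {endpoint} is not set"
            ) from None
    return expanded
```

A literal $ inside a key would have to be written as $$ for this approach to work.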
Runtime Security
- Run as non-root user
- Set appropriate file permissions
- Monitor for suspicious activity
- Keep dependencies updated
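To check the "restrict access to config.yaml" and "set appropriate file permissions" items, a small script can flag files that grant any access to group or others. The following sketch uses os.stat and the stat module (POSIX only). Which paths to check is up to your deployment. Treating any group access as a finding is a deliberately strict assumption: if the config is shared through a group, for example mode 640 with group nomyo, relax that check.

```python
import os
import stat


def permission_findings(path: str) -> list[str]:
    """Return warnings if `path` grants any permission to group or others."""
    mode = os.stat(path).st_mode
    findings = []
    if mode & stat.S_IRWXG:
        findings.append(f"{path}: group has access ({stat.filemode(mode)})")
    if mode & stat.S_IRWXO:
        findings.append(f"{path}: others have access ({stat.filemode(mode)})")
    return findings
```

After chmod 600 on a file owned by the service user, the function returns an empty list for that file.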
Troubleshooting Deployment Issues
Common Issues
Problem: Router not starting
- Check: Logs for configuration errors
- Solution: Validate config.yaml syntax
Problem: Endpoints showing as unavailable
- Check: Network connectivity from router to endpoints
- Solution: Verify firewall rules and DNS resolution
Problem: High latency
- Check: Endpoint health and connection counts
- Solution: Add more endpoints or increase concurrency limits
Problem: Database errors
- Check: Database file permissions
- Solution: Ensure write permissions for the database path
Problem: Connection limits being hit
- Check: /api/usage endpoint
- Solution: Increase max_concurrent_connections or add endpoints
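When endpoints show as unavailable, it helps to separate DNS failures from blocked ports by testing from the router host itself. The sketch below resolves each endpoint's host name and opens a TCP connection to its port, using only the standard library. It checks reachability only, not whether Ollama or the remote API answers correctly. When a URL has no explicit port, it falls back to 443 for https and 80 otherwise. check_endpoint is a hypothetical helper.

```python
import socket
from urllib.parse import urlparse


def check_endpoint(url: str, timeout: float = 3.0) -> str:
    """Return 'ok' or a short description of the DNS or TCP failure."""
    parsed = urlparse(url)
    host = parsed.hostname
    if not host:
        return "invalid URL"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS resolution failed: {exc}"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        # Refused connections, timeouts and unreachable networks end up here.
        return f"TCP connection to {host}:{port} failed: {exc}"
```

Loop over the endpoints list from config.yaml and print each result. A DNS failure points at name resolution, while a TCP failure points at firewall rules or a stopped service.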
Examples
See the examples directory for ready-to-use deployment examples.