nomyo-router/doc/deployment.md

12 KiB

Deployment Guide

Deployment Options

NOMYO Router can be deployed in various environments depending on your requirements.

1. Bare Metal / VM Deployment

Prerequisites

  • Python 3.10+
  • pip
  • Virtual environment (recommended)

Installation

# Clone the repository
git clone https://github.com/nomyo-ai/nomyo-router.git
cd nomyo-router

# Create virtual environment
python3 -m venv .venv/router
source .venv/router/bin/activate

# Install dependencies
pip3 install -r requirements.txt

# Configure endpoints
nano config.yaml

Running the Router

# Basic startup
uvicorn router:app --host 0.0.0.0 --port 12434

# With custom configuration path
export NOMYO_ROUTER_CONFIG_PATH=/etc/nomyo-router/config.yaml
uvicorn router:app --host 0.0.0.0 --port 12434

# With custom database path
export NOMYO_ROUTER_DB_PATH=/var/lib/nomyo-router/token_counts.db
uvicorn router:app --host 0.0.0.0 --port 12434

Systemd Service

Create /etc/systemd/system/nomyo-router.service:

[Unit]
Description=NOMYO Router - Ollama Proxy
After=network.target

[Service]
User=nomyo
Group=nomyo
WorkingDirectory=/opt/nomyo-router
Environment="NOMYO_ROUTER_CONFIG_PATH=/etc/nomyo-router/config.yaml"
Environment="NOMYO_ROUTER_DB_PATH=/var/lib/nomyo-router/token_counts.db"
ExecStart=/opt/nomyo-router/.venv/router/bin/uvicorn router:app --host 0.0.0.0 --port 12434
Restart=always
RestartSec=10
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=nomyo-router

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable nomyo-router
sudo systemctl start nomyo-router
sudo systemctl status nomyo-router

2. Docker Deployment

Image variants

Tag Semantic cache Image size
latest exact match only ~300 MB
latest-semantic sentence-transformers + all-MiniLM-L6-v2 pre-baked ~800 MB

The :semantic variant enables cache_similarity < 1.0 in config.yaml. The lean image falls back to exact-match caching with a warning if semantic mode is configured.

Build the Image

# Lean build (exact match cache, default)
docker build -t nomyo-router .

# Semantic build (~500 MB larger, all-MiniLM-L6-v2 model baked in at build time)
docker build --build-arg SEMANTIC_CACHE=true -t nomyo-router:semantic .

Run the Container

docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /absolute/path/to/config_folder:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  nomyo-router

Advanced Docker Configuration

Custom Port

docker run -d \
  --name nomyo-router \
  -p 9000:12434 \
  -v /path/to/config:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  nomyo-router \
  -- --port 9000

Custom Host

docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /path/to/config:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  -e UVICORN_HOST=0.0.0.0 \
  nomyo-router

Persistent Database

docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /path/to/config:/app/config/ \
  -v /path/to/db:/app/token_counts.db \
  -e CONFIG_PATH=/app/config/config.yaml \
  -e NOMYO_ROUTER_DB_PATH=/app/token_counts.db \
  nomyo-router

Docker Compose Example

See examples/docker-compose.yml for a complete Docker Compose example.

3. Kubernetes Deployment

Prerequisites

  • Kubernetes cluster
  • kubectl configured
  • Helm (optional)

Basic Deployment

Create nomyo-router-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nomyo-router
  labels:
    app: nomyo-router
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nomyo-router
  template:
    metadata:
      labels:
        app: nomyo-router
    spec:
      containers:
      - name: nomyo-router
        image: nomyo-router:latest
        ports:
        - containerPort: 12434
        env:
        - name: CONFIG_PATH
          value: "/app/config/config.yaml"
        - name: NOMYO_ROUTER_DB_PATH
          value: "/app/token_counts.db"
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
        - name: db-volume
          mountPath: /app/token_counts.db
          subPath: token_counts.db
      volumes:
      - name: config-volume
        configMap:
          name: nomyo-router-config
      - name: db-volume
        persistentVolumeClaim:
          claimName: nomyo-router-db-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: nomyo-router
spec:
  selector:
    app: nomyo-router
  ports:
    - protocol: TCP
      port: 80
      targetPort: 12434
  type: LoadBalancer
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nomyo-router-config
data:
  config.yaml: |
    endpoints:
      - http://ollama-service:11434
    max_concurrent_connections: 2
    nomyo-router-api-key: ""
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nomyo-router-db-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Apply the deployment:

kubectl apply -f nomyo-router-deployment.yaml

Horizontal Pod Autoscaler

Create nomyo-router-hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nomyo-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nomyo-router
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Apply the HPA:

kubectl apply -f nomyo-router-hpa.yaml

4. Production Deployment

High Availability Setup

For production environments with multiple Ollama instances:

# config.yaml
endpoints:
  - http://ollama-worker1:11434
  - http://ollama-worker2:11434
  - http://ollama-worker3:11434
  - https://api.openai.com/v1

max_concurrent_connections: 4

# Optional router-level API key to secure the router and dashboard (leave blank to disable)
nomyo-router-api-key: ""

api_keys:
  "https://api.openai.com/v1": "${OPENAI_KEY}"

Load Balancing

Deploy multiple router instances behind a load balancer:

┌───────────────────────────────────────────────────────────────┐
│                     Load Balancer (NGINX, Traefik)             │
└───────────────────────────────────────────────────────────────┘
                        │
                        ├─┬───────────────────────────────────────┐
                        │   │                                   │
                        ▼   ▼                                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐
│  Router Instance │ │  Router Instance │ │  Router Instance        │
│  (Pod 1)        │ │  (Pod 2)        │ │  (Pod 3)               │
└─────────────────┘ └─────────────────┘ └─────────────────────────┘
                        │
                        ▼
┌───────────────────────────────────────────────────────────────┐
│                     Ollama Cluster                              │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────────┐  │
│  │ Ollama      │ │ Ollama      │ │ OpenAI API                 │  │
│  │ Worker 1    │ │ Worker 2    │ │ (Fallback)                 │  │
│  └─────────────┘ └─────────────┘ └─────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘

Monitoring and Logging

Prometheus Monitoring

Create a Prometheus scrape configuration:

scrape_configs:
  - job_name: 'nomyo-router'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['nomyo-router:12434']

Logging

Configure log aggregation:

# In Docker
docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /path/to/config:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  --log-driver=fluentd \
  --log-opt fluentd-address=fluentd:24224 \
  nomyo-router

Deployment Checklist

Pre-Deployment

  • Configure all Ollama endpoints
  • Set appropriate max_concurrent_connections
  • Configure API keys for remote endpoints
  • Test configuration locally
  • Set up monitoring and alerting
  • Configure logging
  • Set up backup for token counts database

Post-Deployment

  • Verify health endpoint: curl http://<router>/health
  • Check endpoint status: curl http://<router>/api/config
  • Monitor connection counts: curl http://<router>/api/usage
  • Set up regular backups
  • Configure auto-restart on failure
  • Monitor performance metrics

Scaling Guidelines

Vertical Scaling

  • Increase max_concurrent_connections for more parallel requests
  • Add more CPU/memory to the router instance
  • Monitor memory usage (token buffer grows with usage)

Horizontal Scaling

  • Deploy multiple router instances
  • Use a load balancer to distribute traffic
  • Each instance maintains its own connection tracking
  • Database can be shared or per-instance

Database Considerations

  • SQLite is sufficient for single-instance deployments
  • For multi-instance deployments, consider PostgreSQL
  • Regular backups are recommended
  • Database size grows with token usage history

Security Best Practices

Network Security

  • Use TLS for all external connections
  • Restrict access to router port (12434)
  • Use firewall rules to limit access
  • Consider using VPN for internal communications

Configuration Security

  • Store API keys in environment variables
  • Restrict access to config.yaml
  • Use secrets management for production deployments
  • Rotate API keys regularly

Runtime Security

  • Run as non-root user
  • Set appropriate file permissions
  • Monitor for suspicious activity
  • Keep dependencies updated

Troubleshooting Deployment Issues

Common Issues

Problem: Router not starting

  • Check: Logs for configuration errors
  • Solution: Validate config.yaml syntax

Problem: Endpoints showing as unavailable

  • Check: Network connectivity from router to endpoints
  • Solution: Verify firewall rules and DNS resolution

Problem: High latency

  • Check: Endpoint health and connection counts
  • Solution: Add more endpoints or increase concurrency limits

Problem: Database errors

  • Check: Database file permissions
  • Solution: Ensure write permissions for the database path

Problem: Connection limits being hit

  • Check: /api/usage endpoint
  • Solution: Increase max_concurrent_connections or add endpoints

Examples

See the examples directory for ready-to-use deployment examples.