# Deployment Guide

## Deployment Options

NOMYO Router can be deployed in various environments depending on your requirements.

## 1. Bare Metal / VM Deployment

### Prerequisites

- Python 3.10+
- pip
- Virtual environment (recommended)

### Installation

```bash
# Clone the repository
git clone https://github.com/nomyo-ai/nomyo-router.git
cd nomyo-router

# Create virtual environment
python3 -m venv .venv/router
source .venv/router/bin/activate

# Install dependencies
pip3 install -r requirements.txt

# Configure endpoints
nano config.yaml
```

### Running the Router

```bash
# Basic startup
uvicorn router:app --host 0.0.0.0 --port 12434

# With custom configuration path
export NOMYO_ROUTER_CONFIG_PATH=/etc/nomyo-router/config.yaml
uvicorn router:app --host 0.0.0.0 --port 12434

# With custom database path
export NOMYO_ROUTER_DB_PATH=/var/lib/nomyo-router/token_counts.db
uvicorn router:app --host 0.0.0.0 --port 12434
```

### Systemd Service

Create `/etc/systemd/system/nomyo-router.service`:

```ini
[Unit]
Description=NOMYO Router - Ollama Proxy
After=network.target

[Service]
User=nomyo
Group=nomyo
WorkingDirectory=/opt/nomyo-router
Environment="NOMYO_ROUTER_CONFIG_PATH=/etc/nomyo-router/config.yaml"
Environment="NOMYO_ROUTER_DB_PATH=/var/lib/nomyo-router/token_counts.db"
ExecStart=/opt/nomyo-router/.venv/router/bin/uvicorn router:app --host 0.0.0.0 --port 12434
Restart=always
RestartSec=10
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=nomyo-router

[Install]
WantedBy=multi-user.target
```

Enable and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable nomyo-router
sudo systemctl start nomyo-router
sudo systemctl status nomyo-router
```

## 2. Docker Deployment

### Image variants

| Tag | Semantic cache | Image size |
|---|---|---|
| `latest` | ❌ exact match only | ~300 MB |
| `latest-semantic` | ✅ sentence-transformers + `all-MiniLM-L6-v2` pre-baked | ~800 MB |

The `:semantic` variant enables `cache_similarity < 1.0` in `config.yaml`. The lean image falls back to exact-match caching with a warning if semantic mode is configured.

### Build the Image

```bash
# Lean build (exact match cache, default)
docker build -t nomyo-router .

# Semantic build (~500 MB larger, all-MiniLM-L6-v2 model baked in at build time)
docker build --build-arg SEMANTIC_CACHE=true -t nomyo-router:semantic .
```

### Run the Container

```bash
docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /absolute/path/to/config_folder:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  nomyo-router
```

### Advanced Docker Configuration

#### Custom Port

```bash
docker run -d \
  --name nomyo-router \
  -p 9000:12434 \
  -v /path/to/config:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  nomyo-router \
  -- --port 9000
```

#### Custom Host

```bash
docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /path/to/config:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  -e UVICORN_HOST=0.0.0.0 \
  nomyo-router
```

#### Persistent Database

```bash
docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /path/to/config:/app/config/ \
  -v /path/to/db:/app/token_counts.db \
  -e CONFIG_PATH=/app/config/config.yaml \
  -e NOMYO_ROUTER_DB_PATH=/app/token_counts.db \
  nomyo-router
```

### Docker Compose Example

See [examples/docker-compose.yml](examples/docker-compose.yml) for a complete Docker Compose example.

## 3. Kubernetes Deployment

### Prerequisites

- Kubernetes cluster
- kubectl configured
- Helm (optional)

### Basic Deployment

Create `nomyo-router-deployment.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nomyo-router
  labels:
    app: nomyo-router
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nomyo-router
  template:
    metadata:
      labels:
        app: nomyo-router
    spec:
      containers:
      - name: nomyo-router
        image: nomyo-router:latest
        ports:
        - containerPort: 12434
        env:
        - name: CONFIG_PATH
          value: "/app/config/config.yaml"
        - name: NOMYO_ROUTER_DB_PATH
          value: "/app/token_counts.db"
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
        - name: db-volume
          mountPath: /app/token_counts.db
          subPath: token_counts.db
      volumes:
      - name: config-volume
        configMap:
          name: nomyo-router-config
      - name: db-volume
        persistentVolumeClaim:
          claimName: nomyo-router-db-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: nomyo-router
spec:
  selector:
    app: nomyo-router
  ports:
    - protocol: TCP
      port: 80
      targetPort: 12434
  type: LoadBalancer
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nomyo-router-config
data:
  config.yaml: |
    endpoints:
      - http://ollama-service:11434
    max_concurrent_connections: 2
    nomyo-router-api-key: ""
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nomyo-router-db-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Apply the deployment:

```bash
kubectl apply -f nomyo-router-deployment.yaml
```

### Horizontal Pod Autoscaler

Create `nomyo-router-hpa.yaml`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nomyo-router-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nomyo-router
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Apply the HPA:

```bash
kubectl apply -f nomyo-router-hpa.yaml
```

## 4. Production Deployment

### High Availability Setup

For production environments with multiple Ollama instances:

```yaml
# config.yaml
endpoints:
  - http://ollama-worker1:11434
  - http://ollama-worker2:11434
  - http://ollama-worker3:11434
  - https://api.openai.com/v1

max_concurrent_connections: 4

# Optional router-level API key to secure the router and dashboard (leave blank to disable)
nomyo-router-api-key: ""

api_keys:
  "https://api.openai.com/v1": "${OPENAI_KEY}"
```

### Load Balancing

Deploy multiple router instances behind a load balancer:

```
┌───────────────────────────────────────────────────────────────┐
│                     Load Balancer (NGINX, Traefik)             │
└───────────────────────────────────────────────────────────────┘
                        │
                        ├─┬───────────────────────────────────────┐
                        │   │                                   │
                        ▼   ▼                                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐
│  Router Instance │ │  Router Instance │ │  Router Instance        │
│  (Pod 1)        │ │  (Pod 2)        │ │  (Pod 3)               │
└─────────────────┘ └─────────────────┘ └─────────────────────────┘
                        │
                        ▼
┌───────────────────────────────────────────────────────────────┐
│                     Ollama Cluster                              │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────────┐  │
│  │ Ollama      │ │ Ollama      │ │ OpenAI API                 │  │
│  │ Worker 1    │ │ Worker 2    │ │ (Fallback)                 │  │
│  └─────────────┘ └─────────────┘ └─────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
```

### Monitoring and Logging

#### Prometheus Monitoring

Create a Prometheus scrape configuration:

```yaml
scrape_configs:
  - job_name: 'nomyo-router'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['nomyo-router:12434']
```

#### Logging

Configure log aggregation:

```bash
# In Docker
docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /path/to/config:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  --log-driver=fluentd \
  --log-opt fluentd-address=fluentd:24224 \
  nomyo-router
```

## Deployment Checklist

### Pre-Deployment

- [ ] Configure all Ollama endpoints
- [ ] Set appropriate `max_concurrent_connections`
- [ ] Configure API keys for remote endpoints
- [ ] Test configuration locally
- [ ] Set up monitoring and alerting
- [ ] Configure logging
- [ ] Set up backup for token counts database

### Post-Deployment

- [ ] Verify health endpoint: `curl http://<router>/health`
- [ ] Check endpoint status: `curl http://<router>/api/config`
- [ ] Monitor connection counts: `curl http://<router>/api/usage`
- [ ] Set up regular backups
- [ ] Configure auto-restart on failure
- [ ] Monitor performance metrics

## Scaling Guidelines

### Vertical Scaling

- Increase `max_concurrent_connections` for more parallel requests
- Add more CPU/memory to the router instance
- Monitor memory usage (token buffer grows with usage)

### Horizontal Scaling

- Deploy multiple router instances
- Use a load balancer to distribute traffic
- Each instance maintains its own connection tracking
- Database can be shared or per-instance

### Database Considerations

- SQLite is sufficient for single-instance deployments
- For multi-instance deployments, consider PostgreSQL
- Regular backups are recommended
- Database size grows with token usage history

## Security Best Practices

### Network Security

- Use TLS for all external connections
- Restrict access to router port (12434)
- Use firewall rules to limit access
- Consider using VPN for internal communications

### Configuration Security

- Store API keys in environment variables
- Restrict access to config.yaml
- Use secrets management for production deployments
- Rotate API keys regularly

### Runtime Security

- Run as non-root user
- Set appropriate file permissions
- Monitor for suspicious activity
- Keep dependencies updated

## Troubleshooting Deployment Issues

### Common Issues

**Problem**: Router not starting

- **Check**: Logs for configuration errors
- **Solution**: Validate config.yaml syntax

**Problem**: Endpoints showing as unavailable

- **Check**: Network connectivity from router to endpoints
- **Solution**: Verify firewall rules and DNS resolution

**Problem**: High latency

- **Check**: Endpoint health and connection counts
- **Solution**: Add more endpoints or increase concurrency limits

**Problem**: Database errors

- **Check**: Database file permissions
- **Solution**: Ensure write permissions for the database path

**Problem**: Connection limits being hit

- **Check**: `/api/usage` endpoint
- **Solution**: Increase `max_concurrent_connections` or add endpoints

## Examples

See the [examples](examples/) directory for ready-to-use deployment examples.