198 lines
5 KiB
Markdown
198 lines
5 KiB
Markdown
|
|
# Configuration Guide
|
|||
|
|
|
|||
|
|
## Configuration File
|
|||
|
|
|
|||
|
|
The NOMYO Router is configured via a YAML file (default: `config.yaml`). This file defines the Ollama endpoints, connection limits, and API keys.
|
|||
|
|
|
|||
|
|
### Basic Configuration
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
# config.yaml
|
|||
|
|
endpoints:
|
|||
|
|
- http://localhost:11434
|
|||
|
|
- http://ollama-server:11434
|
|||
|
|
|
|||
|
|
# Maximum concurrent connections *per endpoint‑model pair*
|
|||
|
|
max_concurrent_connections: 2
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Complete Example
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
# config.yaml
|
|||
|
|
endpoints:
|
|||
|
|
- http://192.168.0.50:11434
|
|||
|
|
- http://192.168.0.51:11434
|
|||
|
|
- http://192.168.0.52:11434
|
|||
|
|
- https://api.openai.com/v1
|
|||
|
|
|
|||
|
|
# Maximum concurrent connections *per endpoint‑model pair* (equals to OLLAMA_NUM_PARALLEL)
|
|||
|
|
max_concurrent_connections: 2
|
|||
|
|
|
|||
|
|
# API keys for remote endpoints
|
|||
|
|
# Set an environment variable like OPENAI_KEY
|
|||
|
|
# Confirm endpoints are exactly as in endpoints block
|
|||
|
|
api_keys:
|
|||
|
|
"http://192.168.0.50:11434": "ollama"
|
|||
|
|
"http://192.168.0.51:11434": "ollama"
|
|||
|
|
"http://192.168.0.52:11434": "ollama"
|
|||
|
|
"https://api.openai.com/v1": "${OPENAI_KEY}"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## Configuration Options
|
|||
|
|
|
|||
|
|
### `endpoints`
|
|||
|
|
|
|||
|
|
**Type**: `list[str]`
|
|||
|
|
|
|||
|
|
**Description**: List of Ollama endpoint URLs. Can include both Ollama endpoints (`http://host:11434`) and OpenAI-compatible endpoints (`https://api.openai.com/v1`).
|
|||
|
|
|
|||
|
|
**Examples**:
|
|||
|
|
```yaml
|
|||
|
|
endpoints:
|
|||
|
|
- http://localhost:11434
|
|||
|
|
- http://ollama1:11434
|
|||
|
|
- http://ollama2:11434
|
|||
|
|
- https://api.openai.com/v1
|
|||
|
|
- https://api.anthropic.com/v1
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Notes**:
|
|||
|
|
- Ollama endpoints use the standard `/api/` prefix
|
|||
|
|
- OpenAI-compatible endpoints use `/v1` prefix
|
|||
|
|
- The router automatically detects endpoint type based on URL pattern
|
|||
|
|
|
|||
|
|
### `max_concurrent_connections`
|
|||
|
|
|
|||
|
|
**Type**: `int`
|
|||
|
|
|
|||
|
|
**Default**: `1`
|
|||
|
|
|
|||
|
|
**Description**: Maximum number of concurrent connections allowed per endpoint-model pair. This corresponds to Ollama's `OLLAMA_NUM_PARALLEL` setting.
|
|||
|
|
|
|||
|
|
**Example**:
|
|||
|
|
```yaml
|
|||
|
|
max_concurrent_connections: 4
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Notes**:
|
|||
|
|
- This setting controls how many requests can be processed simultaneously for a specific model on a specific endpoint
|
|||
|
|
- When this limit is reached, the router will route requests to other endpoints with available capacity
|
|||
|
|
- Higher values allow more parallel requests but may increase memory usage
|
|||
|
|
|
|||
|
|
### `api_keys`
|
|||
|
|
|
|||
|
|
**Type**: `dict[str, str]`
|
|||
|
|
|
|||
|
|
**Description**: Mapping of endpoint URLs to API keys. Used for authenticating with remote endpoints.
|
|||
|
|
|
|||
|
|
**Example**:
|
|||
|
|
```yaml
|
|||
|
|
api_keys:
|
|||
|
|
"http://192.168.0.50:11434": "ollama"
|
|||
|
|
"https://api.openai.com/v1": "${OPENAI_KEY}"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Environment Variables**:
|
|||
|
|
- API keys can reference environment variables using `${VAR_NAME}` syntax
|
|||
|
|
- The router will expand these references at startup
|
|||
|
|
- Example: `${OPENAI_KEY}` will be replaced with the value of the `OPENAI_KEY` environment variable
|
|||
|
|
|
|||
|
|
## Environment Variables
|
|||
|
|
|
|||
|
|
### `NOMYO_ROUTER_CONFIG_PATH`
|
|||
|
|
|
|||
|
|
**Description**: Path to the configuration file. If not set, defaults to `config.yaml` in the current working directory.
|
|||
|
|
|
|||
|
|
**Example**:
|
|||
|
|
```bash
|
|||
|
|
export NOMYO_ROUTER_CONFIG_PATH=/etc/nomyo-router/config.yaml
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### `NOMYO_ROUTER_DB_PATH`
|
|||
|
|
|
|||
|
|
**Description**: Path to the SQLite database file for storing token counts. If not set, defaults to `token_counts.db` in the current working directory.
|
|||
|
|
|
|||
|
|
**Example**:
|
|||
|
|
```bash
|
|||
|
|
export NOMYO_ROUTER_DB_PATH=/var/lib/nomyo-router/token_counts.db
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### API-Specific Keys
|
|||
|
|
|
|||
|
|
You can set API keys directly as environment variables:
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
export OPENAI_KEY=your_openai_api_key
|
|||
|
|
export ANTHROPIC_KEY=your_anthropic_api_key
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## Configuration Best Practices
|
|||
|
|
|
|||
|
|
### Multiple Ollama Instances
|
|||
|
|
|
|||
|
|
For a cluster of Ollama instances:
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
endpoints:
|
|||
|
|
- http://ollama-worker1:11434
|
|||
|
|
- http://ollama-worker2:11434
|
|||
|
|
- http://ollama-worker3:11434
|
|||
|
|
|
|||
|
|
max_concurrent_connections: 2
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Recommendation**: Set `max_concurrent_connections` to match your Ollama instances' `OLLAMA_NUM_PARALLEL` setting.
|
|||
|
|
|
|||
|
|
### Mixed Endpoints
|
|||
|
|
|
|||
|
|
Combining Ollama and OpenAI endpoints:
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
endpoints:
|
|||
|
|
- http://localhost:11434
|
|||
|
|
- https://api.openai.com/v1
|
|||
|
|
|
|||
|
|
api_keys:
|
|||
|
|
"https://api.openai.com/v1": "${OPENAI_KEY}"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Note**: The router will automatically route requests based on model availability across all endpoints.
|
|||
|
|
|
|||
|
|
### High Availability
|
|||
|
|
|
|||
|
|
For production deployments:
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
endpoints:
|
|||
|
|
- http://ollama-primary:11434
|
|||
|
|
- http://ollama-secondary:11434
|
|||
|
|
- http://ollama-tertiary:11434
|
|||
|
|
|
|||
|
|
max_concurrent_connections: 3
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Recommendation**: Use multiple endpoints for redundancy and load distribution.
|
|||
|
|
|
|||
|
|
## Configuration Validation
|
|||
|
|
|
|||
|
|
The router validates the configuration at startup:
|
|||
|
|
|
|||
|
|
1. **Endpoint URLs**: Must be valid URLs
|
|||
|
|
2. **API Keys**: Must be strings (can reference environment variables)
|
|||
|
|
3. **Connection Limits**: Must be positive integers
|
|||
|
|
|
|||
|
|
If the configuration is invalid, the router will exit with an error message.
|
|||
|
|
|
|||
|
|
## Dynamic Configuration
|
|||
|
|
|
|||
|
|
The configuration is loaded at startup and cannot be changed without restarting the router. For production deployments, consider:
|
|||
|
|
|
|||
|
|
1. Using a configuration management system
|
|||
|
|
2. Implementing a rolling restart strategy
|
|||
|
|
3. Using environment variables for sensitive data
|
|||
|
|
|
|||
|
|
## Example Configurations
|
|||
|
|
|
|||
|
|
See the [examples](examples/) directory for ready-to-use configuration examples.
|