nomyo-router/doc/configuration.md
Commit 20a016269d by alpha-nerd-nomyo, 2026-01-05 17:16:31 +01:00 — feat: added buffer_lock to prevent race condition in high concurrency scenarios; added documentation

# Configuration Guide
## Configuration File
The NOMYO Router is configured via a YAML file (default: `config.yaml`). This file defines the Ollama endpoints, connection limits, and API keys.
### Basic Configuration
```yaml
# config.yaml
endpoints:
  - http://localhost:11434
  - http://ollama-server:11434

# Maximum concurrent connections *per endpoint-model pair*
max_concurrent_connections: 2
```
### Complete Example
```yaml
# config.yaml
endpoints:
  - http://192.168.0.50:11434
  - http://192.168.0.51:11434
  - http://192.168.0.52:11434
  - https://api.openai.com/v1

# Maximum concurrent connections *per endpoint-model pair*
# (should match Ollama's OLLAMA_NUM_PARALLEL)
max_concurrent_connections: 2

# API keys for remote endpoints.
# Values may reference environment variables, e.g. ${OPENAI_KEY}.
# Keys must match the URLs in the endpoints block exactly.
api_keys:
  "http://192.168.0.50:11434": "ollama"
  "http://192.168.0.51:11434": "ollama"
  "http://192.168.0.52:11434": "ollama"
  "https://api.openai.com/v1": "${OPENAI_KEY}"
```
## Configuration Options
### `endpoints`
**Type**: `list[str]`
**Description**: List of backend endpoint URLs. The list can include both native Ollama endpoints (`http://host:11434`) and OpenAI-compatible endpoints (`https://api.openai.com/v1`).
**Examples**:
```yaml
endpoints:
  - http://localhost:11434
  - http://ollama1:11434
  - http://ollama2:11434
  - https://api.openai.com/v1
  - https://api.anthropic.com/v1
```
**Notes**:
- Ollama endpoints use the standard `/api/` prefix
- OpenAI-compatible endpoints use the `/v1` prefix
- The router automatically detects endpoint type based on URL pattern
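As an illustration of this detection, the sketch below infers the endpoint type from the URL path. The rule (paths ending in `/v1` are OpenAI-compatible, everything else is native Ollama) is an assumption based on the patterns above, not the router's actual code:

```python
from urllib.parse import urlparse

def detect_endpoint_type(url: str) -> str:
    """Guess the endpoint type from its URL path.

    Assumption: paths ending in /v1 are OpenAI-compatible;
    everything else is treated as a native Ollama endpoint.
    """
    path = urlparse(url).path.rstrip("/")
    return "openai" if path.endswith("/v1") else "ollama"
```

For example, `detect_endpoint_type("https://api.openai.com/v1")` yields `"openai"`, while `detect_endpoint_type("http://localhost:11434")` yields `"ollama"`.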
### `max_concurrent_connections`
**Type**: `int`
**Default**: `1`
**Description**: Maximum number of concurrent connections allowed per endpoint-model pair. This corresponds to Ollama's `OLLAMA_NUM_PARALLEL` setting.
**Example**:
```yaml
max_concurrent_connections: 4
```
**Notes**:
- This setting controls how many requests can be processed simultaneously for a specific model on a specific endpoint
- When this limit is reached, the router will route requests to other endpoints with available capacity
- Higher values allow more parallel requests but may increase memory usage
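The routing behavior described above can be sketched as a counter per (endpoint, model) pair. This is an illustrative model of the capacity check, not the router's implementation (class and method names here are hypothetical):

```python
from collections import defaultdict

class CapacityRouter:
    """Illustrative sketch: pick the first endpoint with spare capacity
    for a given model, tracked per (endpoint, model) pair."""

    def __init__(self, endpoints, max_concurrent_connections=1):
        self.endpoints = endpoints
        self.limit = max_concurrent_connections
        self.in_flight = defaultdict(int)  # (endpoint, model) -> active requests

    def acquire(self, model):
        # First endpoint whose per-model counter is below the limit wins.
        for ep in self.endpoints:
            if self.in_flight[(ep, model)] < self.limit:
                self.in_flight[(ep, model)] += 1
                return ep
        return None  # every endpoint is saturated for this model

    def release(self, endpoint, model):
        # Called when a request finishes, freeing one slot.
        self.in_flight[(endpoint, model)] -= 1
```

With two endpoints and a limit of 1, two concurrent requests for the same model land on different endpoints, and a third is refused until a slot frees up.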
### `api_keys`
**Type**: `dict[str, str]`
**Description**: Mapping of endpoint URLs to API keys. Used for authenticating with remote endpoints.
**Example**:
```yaml
api_keys:
  "http://192.168.0.50:11434": "ollama"
  "https://api.openai.com/v1": "${OPENAI_KEY}"
```
**Environment Variables**:
- API keys can reference environment variables using `${VAR_NAME}` syntax
- The router will expand these references at startup
- Example: `${OPENAI_KEY}` will be replaced with the value of the `OPENAI_KEY` environment variable
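The expansion step can be sketched as a regex substitution over each key value. This is an assumption about the mechanics (the router's handling of unset variables, in particular, may differ):

```python
import os
import re

def expand_env_refs(value: str) -> str:
    """Replace ${VAR_NAME} references with environment variable values.

    Illustrative sketch; raises if a referenced variable is unset.
    """
    def repl(match):
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]

    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", repl, value)
```

Literal values such as `"ollama"` pass through unchanged; only `${...}` references are rewritten.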
## Environment Variables
### `NOMYO_ROUTER_CONFIG_PATH`
**Description**: Path to the configuration file. If not set, defaults to `config.yaml` in the current working directory.
**Example**:
```bash
export NOMYO_ROUTER_CONFIG_PATH=/etc/nomyo-router/config.yaml
```
### `NOMYO_ROUTER_DB_PATH`
**Description**: Path to the SQLite database file for storing token counts. If not set, defaults to `token_counts.db` in the current working directory.
**Example**:
```bash
export NOMYO_ROUTER_DB_PATH=/var/lib/nomyo-router/token_counts.db
```
### API-Specific Keys
You can set API keys directly as environment variables:
```bash
export OPENAI_KEY=your_openai_api_key
export ANTHROPIC_KEY=your_anthropic_api_key
```
## Configuration Best Practices
### Multiple Ollama Instances
For a cluster of Ollama instances:
```yaml
endpoints:
  - http://ollama-worker1:11434
  - http://ollama-worker2:11434
  - http://ollama-worker3:11434
max_concurrent_connections: 2
```
**Recommendation**: Set `max_concurrent_connections` to match your Ollama instances' `OLLAMA_NUM_PARALLEL` setting.
### Mixed Endpoints
Combining Ollama and OpenAI endpoints:
```yaml
endpoints:
  - http://localhost:11434
  - https://api.openai.com/v1
api_keys:
  "https://api.openai.com/v1": "${OPENAI_KEY}"
```
**Note**: The router will automatically route requests based on model availability across all endpoints.
### High Availability
For production deployments:
```yaml
endpoints:
  - http://ollama-primary:11434
  - http://ollama-secondary:11434
  - http://ollama-tertiary:11434
max_concurrent_connections: 3
```
**Recommendation**: Use multiple endpoints for redundancy and load distribution.
## Configuration Validation
The router validates the configuration at startup:
1. **Endpoint URLs**: Must be valid URLs
2. **API Keys**: Must be strings (can reference environment variables)
3. **Connection Limits**: Must be positive integers
If the configuration is invalid, the router will exit with an error message.
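The three checks above can be sketched as follows. This mirrors the described behavior under stated assumptions (function name and error format are hypothetical, not the router's actual validator):

```python
from urllib.parse import urlparse

def validate_config(config: dict) -> list[str]:
    """Collect configuration errors; an empty list means the config is valid."""
    errors = []

    # 1. Endpoint URLs must be valid http(s) URLs.
    for url in config.get("endpoints", []):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"invalid endpoint URL: {url!r}")

    # 2. API keys must be strings (env-var references are still strings).
    for url, key in config.get("api_keys", {}).items():
        if not isinstance(key, str):
            errors.append(f"API key for {url} must be a string")

    # 3. Connection limit must be a positive integer.
    limit = config.get("max_concurrent_connections", 1)
    if not isinstance(limit, int) or limit < 1:
        errors.append("max_concurrent_connections must be a positive integer")

    return errors
```

A config with a malformed endpoint URL or a zero connection limit would produce one error per violated check.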
## Dynamic Configuration
The configuration is loaded at startup and cannot be changed without restarting the router. For production deployments, consider:
1. Using a configuration management system
2. Implementing a rolling restart strategy
3. Using environment variables for sensitive data
## Example Configurations
See the [examples](examples/) directory for ready-to-use configuration examples.