# config.yaml

endpoints:
  - http://192.168.0.50:11434
  - http://192.168.0.51:11434
  - http://192.168.0.52:11434
  - https://api.openai.com/v1

llama_server_endpoints:
  - http://192.168.0.50:8889/v1

# Maximum concurrent connections *per endpoint-model pair* (equivalent to OLLAMA_NUM_PARALLEL)
max_concurrent_connections: 2

# Optional router-level API key that gates router/API/web UI access (leave empty to disable)
nomyo-router-api-key: ""

# API keys for remote endpoints.
# Set an environment variable such as OPENAI_KEY and reference it with ${...}.
# Endpoint URLs must match the entries in the endpoints block exactly.
api_keys:
  "http://192.168.0.50:11434": "ollama"
  "http://192.168.0.51:11434": "ollama"
  "http://192.168.0.52:11434": "ollama"
  "https://api.openai.com/v1": "${OPENAI_KEY}"
  "http://192.168.0.50:8889/v1": "llama"

# -------------------------------------------------------------
# Semantic LLM Cache (optional — disabled by default)
# Caches LLM responses to cut costs and latency on repeated or
# semantically similar prompts.
# Cached routes: /api/chat /api/generate /v1/chat/completions /v1/completions
# MOE requests (moe-* model prefix) always bypass the cache.
# -------------------------------------------------------------
# cache_enabled: true

# Backend — where cached responses are stored:
#   memory → in-process LRU (lost on restart, not shared across replicas) [default]
#   sqlite → persistent file-based (single instance, survives restart)
#   redis  → distributed (shared across replicas, requires Redis)
# cache_backend: memory

# Cosine similarity threshold for a cache hit:
#   1.0  → exact match only (works on any image variant)
#   <1.0 → semantic matching (requires the :semantic Docker image tag)
# cache_similarity: 0.9

# Response TTL in seconds. Remove the key or set it to null to cache forever.
# cache_ttl: 3600

# SQLite backend: path to the cache database file
# cache_db_path: llm_cache.db

# Redis backend: connection URL
# cache_redis_url: redis://localhost:6379/0

# Weight of the BM25-weighted chat-history embedding vs the last-user-message embedding.
# 0.3 = 30% history-context signal, 70% question signal.
# Only relevant when cache_similarity < 1.0.
# cache_history_weight: 0.3