
NOMYO Router

is a transparent proxy for Ollama with model deployment aware routing that auto-manages multiple Ollama instances in a given network.

It runs between your frontend application and your Ollama backends and is transparent to both the frontend and the backend.

(architecture diagram)

Installation

Copy or clone the repository and edit config.yaml: add your Ollama backend servers and set max_concurrent_connections per endpoint. This value corresponds to your OLLAMA_NUM_PARALLEL setting.

# config.yaml
# Ollama or OpenAI API V1 endpoints
endpoints:
  - http://ollama0:11434
  - http://ollama1:11434
  - http://ollama2:11434
  - https://api.openai.com/v1

# llama.cpp server endpoints
llama_server_endpoints:
  - http://192.168.0.33:8889/v1

# Maximum concurrent connections *per endpoint/model pair*
max_concurrent_connections: 2

# Optional router-level API key to lock down router + dashboard (leave empty to disable)
nomyo-router-api-key: ""

# API keys for remote endpoints
# Set an environment variable like OPENAI_KEY
# The URLs must match the entries in the endpoints block exactly
api_keys:
  "http://192.168.0.50:11434": "ollama"
  "http://192.168.0.51:11434": "ollama"
  "http://192.168.0.52:11434": "ollama"
  "https://api.openai.com/v1": "${OPENAI_KEY}"
  "http://192.168.0.33:8889/v1": "llama"

Run NOMYO Router in a dedicated virtual environment: create it, install the requirements, and start the app with uvicorn:

python3 -m venv .venv/router
source .venv/router/bin/activate
pip3 install -r requirements.txt

Optionally, export API keys in your shell:

export OPENAI_KEY=YOUR_SECRET_API_KEY
# Optional: router-level key (clients must send Authorization: Bearer)
# export NOMYO_ROUTER_API_KEY=YOUR_ROUTER_KEY

Finally, start the router:

uvicorn router:app --host 127.0.0.1 --port 12434
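
Once the router is up, point any Ollama client at port 12434 as if it were a regular Ollama server. A quick smoke test against the aggregated model list:

curl http://localhost:12434/api/tags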

In highly concurrent scenarios (more than 500 simultaneous requests) you can also run with uvloop:

uvicorn router:app --host 127.0.0.1 --port 12434 --loop uvloop

Docker Deployment

Pre-built image (GitHub Container Registry)

Pre-built multi-arch images (linux/amd64, linux/arm64) are published automatically on every release:

docker pull ghcr.io/nomyo-ai/nomyo-router:latest

Specific version:

docker pull ghcr.io/nomyo-ai/nomyo-router:0.7.0

Build the container image locally:

docker build -t nomyo-router .

Run the router in Docker with your own configuration file mounted from the host. The entrypoint script accepts a --config-path argument so you can point to a file anywhere inside the container:

docker run -d \
  --name nomyo-router \
  -p 12434:12434 \
  -v /absolute/path/to/config_folder:/app/config/ \
  -e CONFIG_PATH=/app/config/config.yaml \
  nomyo-router
Notes:

  • -e CONFIG_PATH sets the NOMYO_ROUTER_CONFIG_PATH environment variable under the hood; you can export it directly instead if you prefer.
  • To override the bind address or port, export UVICORN_HOST or UVICORN_PORT, or pass the corresponding uvicorn flags after --, e.g. nomyo-router --config-path /config/config.yaml -- --port 9000.
  • Use docker logs nomyo-router to confirm the loaded endpoints and concurrency settings at startup.
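
For long-running deployments the same setup can be captured in a Compose file. A minimal sketch mirroring the docker run example above (the host config path and service name are illustrative):

# docker-compose.yaml -- minimal sketch mirroring the docker run example
services:
  nomyo-router:
    image: ghcr.io/nomyo-ai/nomyo-router:latest
    ports:
      - "12434:12434"
    volumes:
      - /absolute/path/to/config_folder:/app/config/
    environment:
      - CONFIG_PATH=/app/config/config.yaml
    restart: unless-stopped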

Routing

NOMYO Router accepts any Ollama request on the configured port from your frontend application and checks the available backends for each request. When the request is an embed(dings), chat, or generate call, it is forwarded to a single Ollama server, answered there, and sent back to the router, which forwards the response to the frontend.
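
For example, a chat request is sent to the router exactly as it would be to Ollama itself (the model name is illustrative):

curl http://localhost:12434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "stream": false
}'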

When another request for the same model configuration arrives, NOMYO Router knows which model runs on which Ollama server and routes the request to a server where that model is already deployed.

If that server already has more concurrent connections than the configured maximum, NOMYO Router routes the request to another Ollama server that serves the requested model and has the fewest active connections, for fastest completion.

This way the Ollama backend servers are utilized more efficiently than with a simple weighted, round-robin, or least-connections approach.
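
The selection logic can be sketched as follows; this is an illustration of the behaviour described above, not the actual implementation in router.py (all names and data structures are hypothetical):

# Illustrative sketch of deployment-aware routing, NOT the actual router.py code.
# Assumptions: `deployments` maps endpoint URL -> set of models currently loaded
# there; `active` maps endpoint URL -> number of in-flight requests.
def pick_endpoint(model: str,
                  deployments: dict[str, set[str]],
                  active: dict[str, int],
                  max_concurrent: int) -> str:
    # Prefer endpoints where the model is already deployed (no load latency).
    warm = [ep for ep, models in deployments.items() if model in models]
    # Among those, keep endpoints that still have free connection slots.
    free = [ep for ep in warm if active.get(ep, 0) < max_concurrent]
    if free:
        # Least connections among warm endpoints for fastest completion.
        return min(free, key=lambda ep: active.get(ep, 0))
    if warm:
        # All warm endpoints are saturated: queue on the least loaded one.
        return min(warm, key=lambda ep: active.get(ep, 0))
    # Model deployed nowhere yet: pick the least busy endpoint and load it there.
    return min(deployments, key=lambda ep: active.get(ep, 0))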

(routing diagram)

NOMYO Router also supports OpenAI API compatible v1 backend servers.

Supplying the router API key

If you set nomyo-router-api-key in config.yaml (or NOMYO_ROUTER_API_KEY env), every request to NOMYO Router must include the key:

  • HTTP header (recommended): Authorization: Bearer <router_key>
  • Query param (fallback): ?api_key=<router_key>

Examples:

curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags
curl "http://localhost:12434/api/tags?api_key=$NOMYO_ROUTER_API_KEY"