A transparent (O)llama proxy with model-deployment-aware routing that auto-manages multiple (O)llama instances in a given network.

NOMYO Router

is a transparent proxy for Ollama with model-deployment-aware routing.

(Screenshot: NOMYO Router dashboard)

It runs between your frontend application and your Ollama backends and is transparent to both the frontend and the backend.

(Architecture diagram)

Installation

Copy or clone the repository, then edit config.yaml: add your Ollama backend servers and set max_concurrent_connections per endpoint. This value corresponds to your OLLAMA_NUM_PARALLEL setting.

```yaml
# config.yaml
endpoints:
  - http://ollama0:11434
  - http://ollama1:11434
  - http://ollama2:11434

# Maximum concurrent connections *per endpoint/model pair*
max_concurrent_connections: 2
```
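At startup the router reads this file; a minimal sketch of such a loader using PyYAML is shown below (the field names match config.yaml, but the `load_config` function itself is illustrative, not the router's actual code):

```python
import yaml  # PyYAML, pulled in via requirements.txt

def load_config(path: str = "config.yaml") -> dict:
    """Read the backend endpoints and the per endpoint/model connection cap."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    return {
        "endpoints": cfg["endpoints"],
        # Default to 1 if the cap is not set, matching Ollama's
        # OLLAMA_NUM_PARALLEL default behavior of one request per model.
        "max_concurrent_connections": cfg.get("max_concurrent_connections", 1),
    }
```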

Run NOMYO Router in a dedicated virtual environment: create the environment, install the requirements, and launch with uvicorn:

```shell
python3 -m venv .venv/router
source .venv/router/bin/activate
pip3 install -r requirements.txt
```

Finally, start the router:

```shell
uvicorn router:app --host 127.0.0.1 --port 12434
```

Routing

NOMYO Router accepts any Ollama request on the configured port from your frontend application. It then checks the available backends for the specific request. When the request is an embed(dings), chat, or generate request, it is forwarded to a single Ollama server; the response travels back through the router, which forwards it to the frontend.

If another request for the same model configuration is made, NOMYO Router knows which model runs on which Ollama server and routes the request to a server where that model is already deployed.

If the number of concurrent connections for that endpoint/model pair exceeds the configured maximum, NOMYO Router routes the request to another Ollama server instead.

This way the Ollama backend servers are utilized more efficiently than with a simple weighted, round-robin, or least-connections approach.
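Putting the rules above together (prefer a server already running the model, respect the per endpoint/model connection cap, otherwise spill over to another server), the selection logic might look like the following sketch. The class, its state tracking, and the method names are assumptions for illustration, not router.py's actual implementation:

```python
from collections import defaultdict

class DeploymentAwareBalancer:
    """Pick an Ollama endpoint for a model, following the rules above."""

    def __init__(self, endpoints: list[str], max_concurrent: int = 2):
        self.endpoints = endpoints
        self.max_concurrent = max_concurrent
        # deployed[endpoint] = models currently loaded on that server
        self.deployed: dict[str, set[str]] = defaultdict(set)
        # active[(endpoint, model)] = in-flight requests for that pair
        self.active: dict[tuple[str, str], int] = defaultdict(int)

    def acquire(self, model: str) -> str:
        # 1) Prefer an endpoint where the model is already deployed
        #    and the endpoint/model pair is below the connection cap.
        for ep in self.endpoints:
            if model in self.deployed[ep] and self.active[(ep, model)] < self.max_concurrent:
                self.active[(ep, model)] += 1
                return ep
        # 2) Otherwise spill over to another endpoint with spare
        #    capacity, triggering a fresh deployment of the model there.
        for ep in self.endpoints:
            if self.active[(ep, model)] < self.max_concurrent:
                self.deployed[ep].add(model)
                self.active[(ep, model)] += 1
                return ep
        raise RuntimeError(f"all endpoints saturated for model {model}")

    def release(self, endpoint: str, model: str) -> None:
        """Call when a forwarded request completes."""
        self.active[(endpoint, model)] -= 1
```

With two backends and a cap of 2, the first two requests for a model stick to the server that already deployed it, and the third spills over to the second server.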

(Routing diagram)

NOMYO Router also supports OpenAI API compatible v1 backend servers.