NOMYO Router is a transparent proxy for Ollama with model-deployment-aware routing.
It sits between your frontend application and your Ollama backends and is transparent to both the frontend and the backend.
Copy or clone the repository, then edit `config.yaml`: add your Ollama backend servers and set `max_concurrent_connections` for each endpoint. This value should match the `OLLAMA_NUM_PARALLEL` setting of the corresponding backend.
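A minimal `config.yaml` might look like the sketch below. Only `max_concurrent_connections` is named above; the `endpoints` key, the `url` field, and the hostnames are assumptions, so adjust them to the actual schema in the repository:

```yaml
# Hypothetical layout - check the repository's config.yaml for the real key names.
endpoints:
  - url: http://ollama-1.local:11434    # example hostname (assumption)
    max_concurrent_connections: 4       # match OLLAMA_NUM_PARALLEL on this backend
  - url: http://ollama-2.local:11434    # example hostname (assumption)
    max_concurrent_connections: 2
```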
Run NOMYO Router in a dedicated virtual environment: install the requirements, then start it with uvicorn:
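The steps above might look like this (the module path `main:app` and the port are assumptions; check the repository for the actual entry point and any documented run command):

```shell
# Create and activate a dedicated virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install the dependencies
pip install -r requirements.txt

# Start the router with uvicorn (entry point "main:app" is an assumption)
uvicorn main:app --host 0.0.0.0 --port 8000
```

Once running, point your frontend application at the router's address instead of the Ollama backend directly; because the proxy is transparent, no other client-side changes should be needed.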