From 9d135459328deb1a6a0a454d25a9816ccf2c44af Mon Sep 17 00:00:00 2001
From: alpha-nerd-nomyo
Date: Tue, 26 Aug 2025 19:51:36 +0200
Subject: [PATCH] Update README.md

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index 9dbb58c..ef86831 100644
--- a/README.md
+++ b/README.md
@@ -25,4 +25,13 @@
 uvicorn router:app --host 127.0.0.1 --port 12434
 
 # Routing
+NOMYO Router accepts any Ollama request, for any Ollama endpoint, from your frontend application on the configured port. It then checks the available backends for that specific request.
+When the request is an embed(dings), chat or generate request, it is forwarded to a single Ollama server, answered, and sent back to the router, which forwards the response to the frontend.
+
+If another request for the same model config then arrives, NOMYO Router knows which model runs on which Ollama server and routes the request to an Ollama server where that model is already deployed.
+
+If the number of concurrent connections at that moment exceeds the configured maximum, NOMYO Router routes the request to another Ollama server for completion.
+
+This way the Ollama backend servers are utilized more efficiently than with a simple weighted, round-robin or least-connection approach.
+
 ![routing](https://github.com/user-attachments/assets/ed05dfbb-fcc8-4ff2-b8ca-3cdce2660c9f)
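
The routing behaviour described in this README addition can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not NOMYO Router's actual code: `Backend`, `pick_backend`, `release` and `max_concurrent` are hypothetical names, and both the tie-break by fewest in-flight requests and the error raised when every server is full are assumptions the README does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical view of one Ollama server as a router might track it."""
    url: str
    loaded_models: set[str] = field(default_factory=set)
    active: int = 0  # requests currently in flight on this server


def pick_backend(backends: list[Backend], model: str, max_concurrent: int) -> Backend:
    """Choose a backend for `model` (illustrative policy only)."""
    # Only servers below the configured connection limit are candidates.
    free = [b for b in backends if b.active < max_concurrent]
    if not free:
        # Assumption: the README does not say what happens when all are full.
        raise RuntimeError("all backends are at max concurrent connections")
    # Prefer a server that already has the model deployed.
    warm = [b for b in free if model in b.loaded_models]
    # Assumption: break ties by fewest in-flight requests.
    chosen = min(warm or free, key=lambda b: b.active)
    chosen.active += 1
    chosen.loaded_models.add(model)
    return chosen


def release(backend: Backend) -> None:
    """Free the connection slot once a request has been answered."""
    backend.active -= 1


backends = [Backend("http://a:11434"), Backend("http://b:11434")]
first = pick_backend(backends, "llama3", max_concurrent=2)
second = pick_backend(backends, "llama3", max_concurrent=2)
third = pick_backend(backends, "llama3", max_concurrent=2)
```

In this sketch the first two requests land on the same server because the model is already deployed there after the first call. The third request spills over to the other server once the limit of two concurrent connections is reached.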