openai compatible endpoints not used #128

Closed
opened 2026-06-25 10:08:20 +02:00 by JTHesse · 3 comments

Hi alpha-nerd, I am not entirely sure I have this right, but I think OpenAI-compatible endpoints are not used when they are combined with Ollama endpoints that advertise the same model.
I am currently adding a vLLM server to the nomyo config. If that endpoint is the only one configured, everything works fine.
However, when nomyo checks whether a model is already loaded in `_fetch_loaded_models_internal`, OpenAI-compatible endpoints are filtered out. I guess an exception for those should fix this, so that models on OpenAI-compatible endpoints are always treated as loaded.
Thank you for considering this ;)
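
For illustration, the proposed exception could look roughly like the sketch below. This is not the actual nomyo-router code: the `Endpoint` class, its `kind` field, and the `endpoints_with_model_loaded` helper are hypothetical names made up for this example. It only shows the idea that an OpenAI-compatible endpoint counts as "loaded" for every model it advertises, while an Ollama endpoint must report the model as loaded.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical endpoint record, not the real nomyo-router data model."""
    url: str
    kind: str  # "ollama" or "openai" (assumed labels)
    advertised: set[str] = field(default_factory=set)  # models the endpoint lists
    loaded: set[str] = field(default_factory=set)      # models reported as loaded


def endpoints_with_model_loaded(endpoints: list[Endpoint], model: str) -> list[Endpoint]:
    """Return the endpoints that should count as having `model` loaded."""
    result = []
    for ep in endpoints:
        if ep.kind == "openai":
            # Assumption: OpenAI-compatible servers expose no "loaded" state,
            # so every advertised model is treated as loaded.
            if model in ep.advertised:
                result.append(ep)
        elif model in ep.loaded:
            # Ollama endpoints must actually report the model as loaded.
            result.append(ep)
    return result
```

With one Ollama endpoint that advertises a model but has not loaded it, and one vLLM endpoint that advertises the same model, only the vLLM endpoint is returned instead of being filtered out.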

Author

In addition to this, nomyo also sometimes tries to use models that the endpoint does not advertise:
[generate_proxy] upstream error from (http://192.168.0.1:8000/v1, qwen2.5-coder:1.5b-base) status=404 type=NotFoundError: Error code: 404 - {'error': {'message': 'The model qwen2.5-coder:1.5b-base does not exist.', 'type': 'NotFoundError', 'param': 'model', 'code': 404}}
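
A guard against this kind of 404 could be sketched as follows. Again, this is only an assumption about how such a check might look, not the router's real logic: `candidates_for` and the `advertised_by_endpoint` mapping are hypothetical, and the model names in the usage note are placeholders.

```python
def candidates_for(model: str, advertised_by_endpoint: dict[str, set[str]]) -> list[str]:
    """Return only the endpoint URLs that advertise `model`.

    `advertised_by_endpoint` maps an endpoint URL to the set of model names
    it lists (hypothetical structure for this sketch).
    """
    return [
        url
        for url, models in advertised_by_endpoint.items()
        if model in models
    ]
```

If an endpoint's set does not contain `qwen2.5-coder:1.5b-base`, that endpoint is never selected for it, so the request is not sent to a server that would answer with `NotFoundError`.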

alpha-nerd added the bug label 2026-06-25 10:55:39 +02:00
alpha-nerd self-assigned this 2026-06-25 10:56:15 +02:00
Owner

@JTHesse wrote in #128 (comment):

Hi alpha nerd, I am not entirely sure if I get this right, but I think that openai compatible endpoints won't be used in combination with ollama endpoints that promote the same model. I am currently adding a vllm server to the nomyo config. If the endpoint is used solely, everything works just fine. However, if nomyo is trying to check whether a model is already loaded in _fetch_loaded_models_internal openai compatible endpoints will be filtered out. I guess an exception for those should fix this, so for openai endpoints models are always loaded. Thank you for considering this ;)

This is a clear bug in the routing engine and might have been introduced during refactoring.
Thank you for pointing this out. A fix is on the way.

Owner

`docker pull bitfreedom.net/nomyo-ai/nomyo-router:latest` contains this fix.
