On-premise Ray Serve I/O handling

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.


We have several high-end GPU machines (Windows) in our on-premise render farm, and we would like to serve AI models for image processing via Ray Serve on a Ray cluster. I managed to set up a cluster and deploy models, but at the moment all REST POST requests are sent to the head node and then distributed to the worker nodes. From my understanding, this causes a massive I/O bottleneck on the head node.
We have the following workflow:

  1. Deploy a Python PyTorch inference script to the cluster via `serve deploy` with a config file and a fixed IP.
  2. Send serialized image data as a REST POST request from other software packages written in C++ or Python (for example, Nuke from The Foundry) to the defined endpoint at that IP address.
  3. Get the results back in the POST response.
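For reference, step 2 can be sketched in Python with only the standard library. The endpoint path, port 8000 (Ray Serve's default HTTP port), and the base64-in-JSON payload shape are assumptions for illustration; the actual schema depends on how your deployment parses the request:

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes) -> bytes:
    """Serialize raw image bytes into a JSON request body (base64-encoded)."""
    body = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return json.dumps(body).encode("utf-8")


def send_inference_request(endpoint: str, image_bytes: bytes) -> bytes:
    """POST the serialized image to a Serve HTTP endpoint; return the raw response body."""
    req = urllib.request.Request(
        endpoint,
        data=build_payload(image_bytes),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


# Usage (endpoint IP, port, and route are placeholders):
# result = send_inference_request(
#     "http://10.0.0.5:8000/imageprocess",
#     open("frame.exr", "rb").read(),
# )
```

A C++ client would do the same thing with any HTTP library (e.g. libcurl): build the JSON body, POST it, read the response.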

My question:
Since we are building a node for Nuke, for example, this has to work in C++ as well. What options are there to avoid sending every request to the head node first? Is there a way to send a request directly to a free worker node? How would I acquire the IP address of that worker node? Or would you suggest another approach, like using the Ray API from C++?

I’m happy for any ideas!


You can add your own load balancer on top of the cluster so that requests can be sent to either a worker node or the head node. See Architecture — Ray 2.3.1.
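On the "how do I get a worker's IP" part of the question: `ray.nodes()` returns one metadata dict per node, including `NodeManagerAddress` and `Alive`, and by default Ray Serve starts an HTTP proxy on every node that hosts a replica, so those addresses can accept requests on the Serve port directly. Below is a minimal sketch of filtering that list; the node list is mocked with the same dict structure so the snippet runs without a cluster (on a real cluster you would call `ray.init(address="auto")` and then `ray.nodes()`):

```python
# Mocked stand-in for ray.nodes() output; on a real cluster use:
#   import ray; ray.init(address="auto"); nodes = ray.nodes()
nodes = [
    {"NodeManagerAddress": "10.0.0.1", "Alive": True, "Resources": {"CPU": 16}},          # head (no GPU, assumed)
    {"NodeManagerAddress": "10.0.0.2", "Alive": True, "Resources": {"CPU": 32, "GPU": 2}},
    {"NodeManagerAddress": "10.0.0.3", "Alive": False, "Resources": {"CPU": 32, "GPU": 2}},  # down
]


def alive_gpu_worker_ips(nodes: list) -> list:
    """Return the addresses of alive nodes that expose at least one GPU."""
    return [
        n["NodeManagerAddress"]
        for n in nodes
        if n["Alive"] and n.get("Resources", {}).get("GPU", 0) > 0
    ]


worker_ips = alive_gpu_worker_ips(nodes)
# Feed these IPs to your load balancer (or round-robin them in the
# C++ client) on the Serve HTTP port, 8000 by default.
```

The load-balancer approach is usually simpler than having clients pick nodes themselves, since the balancer can handle health checks and node churn in one place.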