On-premise Ray Serve I/O handling

How severely does this issue affect your experience of using Ray?

  • Medium: It causes significant difficulty in completing my task, but I can work around it.

Hi,

We have several high-end GPU machines (Windows) in our on-premise render farm, and we would like to serve AI models for image processing via Ray Serve on a Ray cluster. I managed to set up a cluster and deploy models, but at the moment all REST POST requests are sent to the head node and then distributed to the worker nodes. From my understanding, this creates a massive I/O bottleneck on the head node.
We have the following workflow:

  1. Deploy a Python PyTorch inference script to the cluster via “serve deploy”, using a config file and a fixed IP.
  2. Send serialized image data as a REST POST request from other software packages in C++ or Python (for example, Nuke from The Foundry) to the defined endpoint at that IP address.
  3. Receive the results in the response to that POST request (a minimal sketch of the whole round trip follows below).
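
For concreteness, here is a minimal sketch of that round trip. Everything in it is an illustrative placeholder (the class name, the module path, the IP address, the port, and the file name), not our actual code; the client half is a Python stand-in for the C++/Nuke caller.

```python
# Server side: a minimal Serve deployment that reads the serialized image
# from the POST body, would run the PyTorch model on it, and returns the
# processed bytes as the HTTP response.
from starlette.requests import Request

from ray import serve


@serve.deployment
class ImageProcessor:
    async def __call__(self, request: Request) -> bytes:
        img_bytes = await request.body()  # serialized image from step 2
        # ... run the PyTorch inference on img_bytes here ...
        return img_bytes                  # placeholder: echo the input back


# Referenced from the `serve deploy` config file via an import path such
# as "my_module:app" (hypothetical module name).
app = ImageProcessor.bind()
```

The matching client call (steps 2 and 3) then looks like this, with 10.0.0.1 standing in for the fixed IP and 8000 being Serve's default HTTP port:

```python
import requests

with open("frame.png", "rb") as f:  # hypothetical input file
    reply = requests.post("http://10.0.0.1:8000/", data=f.read())
processed = reply.content  # step 3: the result comes back in the response
```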

My Question:
As we are building a node for Nuke, for example, this has to work from C++ as well. What options are there to avoid sending every request to the head node first? Is there a way to send a request directly to a free worker node? How would I obtain the IP address of such a worker node? Or would you suggest another approach, such as using the Ray API from C++?

I’m happy for any ideas!

Thanks,
Oliver

You can add your own load balancer on top of the cluster, so that requests can be sent to a worker node as well as the head node. See Architecture — Ray 2.3.1 in the docs.
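
To make that concrete with a hedged sketch: by default, Serve starts an HTTP proxy on every node that hosts at least one replica (not only on the head node), and each proxy can route to replicas anywhere in the cluster. So one lightweight alternative to a dedicated load balancer is to round-robin on the client side over the alive node IPs that ray.nodes() reports. The address and port below are assumptions (8000 is Serve's default HTTP port), and the script assumes it runs on a machine that is part of the cluster.

```python
# Hedged sketch: round-robin requests across the cluster's nodes instead
# of always hitting the head node. Assumes Serve's default behavior of
# running an HTTP proxy on every node (port 8000 unless reconfigured).
import itertools

import ray
import requests

# Connect to the running cluster; "auto" works when this script runs on
# a machine that is already part of the cluster (an assumption here).
ray.init(address="auto")

# NodeManagerAddress is the node's IP; skip nodes that are not alive.
node_ips = [n["NodeManagerAddress"] for n in ray.nodes() if n["Alive"]]
targets = itertools.cycle(node_ips)


def send_image(img_bytes: bytes) -> bytes:
    """POST the image to the next node's Serve proxy and return the result."""
    ip = next(targets)
    reply = requests.post(f"http://{ip}:8000/", data=img_bytes)
    reply.raise_for_status()
    return reply.content
```

Since the interface stays plain HTTP, the C++ Nuke node can do the same thing with any HTTP client such as libcurl; no C++ Ray API is needed for this path.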