Hi all, I am currently optimizing the execution speed of a machine learning project. The main thread has two parts: one runs inference on the GPU using the PyTorch Lightning framework, and the other does image processing on the batch output. To make the best use of both the CPU and GPU, I use a queue to cache the GPU batch output and a background worker thread to handle the image processing tasks from the queue, so the GPU can run independently. However, this design runs into the Python GIL issue: the image processing thread sometimes has to wait on the main thread, which slows down the overall inference speed. Is there a better design using Ray actors or some other mechanism? Many thanks in advance!
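For context, here is a minimal sketch of the single-process design described above (the worker function, the sentinel-based shutdown, and the fake batches are illustrative stand-ins, not my actual code):

```python
import queue
import threading

import numpy as np

output_queue = queue.Queue()

def postprocess_worker():
    """Background thread: drain GPU batch outputs and post-process them on CPU."""
    while True:
        batch_output = output_queue.get()
        if batch_output is None:  # sentinel to stop the worker
            break
        _ = np.clip(batch_output, 0.0, 1.0)  # stand-in for the real image processing

worker = threading.Thread(target=postprocess_worker, daemon=True)
worker.start()

# Main thread: run GPU inference and hand each batch output to the worker.
for _ in range(8):
    fake_batch_output = np.random.rand(32, 4)  # stand-in for a Lightning predict step
    output_queue.put(fake_batch_output)

output_queue.put(None)
worker.join()
```

Because both the inference loop and the post-processing run inside one Python process, they contend for the same GIL whenever the processing code is pure Python.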
For this use case you can use two Ray actors, one actor for image processing and the other for inference.
@simon-mo Thanks so much. One problem I'm concerned about is where to initialize the second actor. Do you mean I could initialize an actor inside the PyTorch Lightning predict step?
Ideally you would put image processing in one actor and wrap the PyTorch Lightning model in another actor. Then your code can ship the data around:
single_out_ref = processor.process.remote(predictor.predict.remote(input_batch))
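A minimal sketch of that two-actor pattern is below; the class names, the stand-in linear model (in place of the real Lightning module), and the dummy batches are assumptions for illustration, not code from this thread. Because each actor runs in its own process, the image processing no longer shares a GIL with the inference loop:

```python
import numpy as np
import ray
import torch

ray.init()

@ray.remote(num_gpus=1)
class Predictor:
    """GPU actor: load the model once, then serve batch inference calls."""
    def __init__(self):
        # Stand-in for loading the real PyTorch Lightning module,
        # e.g. MyLightningModule.load_from_checkpoint(...) (hypothetical).
        self.model = torch.nn.Linear(16, 4).cuda().eval()

    def predict(self, batch: np.ndarray) -> np.ndarray:
        with torch.no_grad():
            x = torch.from_numpy(batch).float().cuda()
            return self.model(x).cpu().numpy()

@ray.remote
class ImageProcessor:
    """CPU actor: post-process each batch of model outputs."""
    def process(self, batch_output: np.ndarray) -> np.ndarray:
        # Stand-in for the real image-processing step.
        return np.clip(batch_output, 0.0, 1.0)

predictor = Predictor.remote()
processor = ImageProcessor.remote()

batches = [np.random.rand(32, 16).astype(np.float32) for _ in range(8)]

# Submit everything up front; Ray overlaps GPU inference on batch i+1 with
# CPU post-processing of batch i, so the GPU is not blocked by the CPU work.
result_refs = [
    processor.process.remote(predictor.predict.remote(batch))
    for batch in batches
]
results = ray.get(result_refs)
print(len(results), results[0].shape)
```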
Thanks so much, I am going to implement it.
Hi @Eddie-kindergarden, I stumbled upon this question and was wondering if you had tried out Ray Datasets? It supports this use case pretty well; we actually have an example for pipelined parallel batch inference in our docs!
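In case it helps later readers, here is a rough sketch of what that looks like with Ray Data's map_batches over an actor pool; the stand-in linear model, column names, and keyword arguments are assumptions, and the exact API (compute vs. concurrency, ActorPoolStrategy arguments) varies across Ray versions, so check the docs for your release:

```python
import numpy as np
import ray
import torch

ray.init()

class BatchPredictor:
    """Callable class: each worker actor loads the model once in __init__."""
    def __init__(self):
        # Stand-in for the real Lightning module (hypothetical).
        self.model = torch.nn.Linear(16, 4).eval()

    def __call__(self, batch: dict) -> dict:
        with torch.no_grad():
            out = self.model(torch.from_numpy(batch["data"]).float()).numpy()
        # Post-processing can happen right here, or in a second map_batches stage.
        return {"output": np.clip(out, 0.0, 1.0)}

ds = ray.data.from_numpy(np.random.rand(1024, 16).astype(np.float32))

# Ray Data streams batches through a small pool of actors; pass num_gpus=1
# here to pin each actor to a GPU. Keyword names may differ by Ray version.
preds = ds.map_batches(
    BatchPredictor,
    batch_size=64,
    batch_format="numpy",
    compute=ray.data.ActorPoolStrategy(size=2),
)
preds.show(1)
```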