Creating multiple actors on a single GPU in Ray loads multiple CUDA contexts, causing increased memory usage and slower execution

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Dear maintainers of the Ray open-source project,

We recently discovered that when multiple actors are created on a single GPU, Ray implements them as separate processes, so the CUDA context is loaded once per process. This leads to a significant increase in GPU memory usage, slower CUDA kernel launches, and more time spent on communication because of additional memcpy operations. We also found a similar issue reported by other users in the thread "Does ray load the CUDA context multiple times?".

We would like to inquire if there are any possible solutions to mitigate this issue, or if there are any plans to address this problem in the future. We appreciate your help and advice with resolving this issue.

Thank you for your attention to our issue.

Unfortunately, as stated in the post you mentioned, Ray doesn’t share the GPU context between processes. I think this is for isolation purposes.

One thing you could try is to have a single actor create multiple processes itself and share the GPU context at the application layer.
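
Here is a minimal sketch of one way to read that suggestion. Note that it swaps the "multiple processes" for threads: it assumes the logical workers can run as threads inside a single actor process, since threads in one process share one CUDA context. `SharedGpuHost`, `step`, `run_locally`, and `launch_with_ray` are hypothetical names, and the arithmetic in `step` is only a placeholder for real GPU work. The Ray part uses a threaded actor (setting `max_concurrency` through `.options`) and needs Ray plus a GPU; the local helper runs without either.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedGpuHost:
    """Hosts several logical workers inside ONE process.

    Assumption: because all workers live in the same process, a GPU
    library would create a single CUDA context for all of them.
    """

    def __init__(self, num_workers: int):
        self.num_workers = num_workers
        # Hypothetical per-worker state; a real application might keep
        # models or CUDA streams here instead of plain counters.
        self._calls = [0] * num_workers
        self._locks = [threading.Lock() for _ in range(num_workers)]

    def step(self, worker_id: int, x: float) -> dict:
        # Placeholder for GPU work (for example a model forward pass).
        with self._locks[worker_id]:
            self._calls[worker_id] += 1
            calls = self._calls[worker_id]
        return {"worker": worker_id, "pid": os.getpid(), "calls": calls, "y": 2 * x}


def run_locally(num_workers: int = 4) -> list:
    """Exercise the host with plain threads; no Ray or GPU needed."""
    host = SharedGpuHost(num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda i: host.step(i, float(i)), range(num_workers)))


def launch_with_ray(num_workers: int = 4) -> list:
    """Run the same host as a single threaded Ray actor (needs Ray and a GPU)."""
    import ray  # imported lazily so the code above works without Ray

    ray.init(ignore_reinit_error=True)
    # One actor owns the whole GPU; max_concurrency > 1 lets its method
    # calls run concurrently on threads inside that one worker process.
    host_cls = ray.remote(num_gpus=1)(SharedGpuHost)
    host = host_cls.options(max_concurrency=num_workers).remote(num_workers)
    refs = [host.step.remote(i, float(i)) for i in range(num_workers)]
    return ray.get(refs)
```

Every result from `launch_with_ray()` should report the same `pid`, which is the point: one process, so one context. Python threads still contend for the GIL, so this probably only pays off when most of the time is spent in GPU calls that release it, and you give up the per-actor isolation Ray normally provides.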