Dear maintainers of the Ray open-source project,
We have recently discovered that when multiple Actors are created on a single GPU in Ray, each Actor runs in its own worker process, so a separate CUDA context is loaded for every Actor. This leads to a significant increase in GPU memory usage, slower CUDA kernel launches, and longer communication times due to additional memcpy operations. We have also found a similar issue reported by other users in the discussion "Does ray load the CUDA context multiple times?".
We would like to ask whether there are any possible solutions to mitigate this issue, or any plans to address it in the future. We would appreciate any help or advice on resolving it.
Thank you for your attention to our issue.