Hi,
I see here that it seems possible to allocate a fraction of a GPU to a Ray actor with something like `@ray.remote(num_gpus=0.25)`.
If N actors are on the same GPU, how do they run?
- A) They run in parallel on different CUDA cores?
- B) The GPU is time-sliced: the CUDA kernels requested by each actor run in sequence (what I believe is NVIDIA's default behavior: if two processes talk to the GPU, their kernels execute one after the other; see "Running more than one CUDA applications on one GPU" on Stack Overflow).
If it's A, that would be pretty revolutionary, as I think only MPS, CUDA streams, or MIG enable true concurrency on NVIDIA GPUs. If it's B, then I'd encourage putting (A) on the roadmap to make Ray even more appealing.