I am using Ray to parallelize inference. For this, I store the model in the object store (via ray.put). Would the model be placed on the GPU automatically if I initialize Ray with GPUs, or would it make sense to move the model to the GPU myself (via model.to(device_gpu))?
My goal is to minimize the time required to load the model. I also have multiple GPUs, so I wonder whether this would mean copying the model to every GPU.
I’d appreciate any help, cheers!