How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am trying to train on a given dataset using distributed training in Ray. I ran the sample code from the example on the ray.train.torch.TorchTrainer — Ray 2.8.0 documentation page (a trimmed version of the script I ran is at the end of this post). I am running it on a large dataset, and during the run I see the following log:
Running: 0.0/48.0 CPU, 0.0/2.0 GPU, 11.84 GiB/18.0 GiB object_store_memory
along with a progress bar on the right. Since the log reports 0.0/2.0 GPU, does that mean I am not using the available GPUs? nvidia-smi shows two processes, named Worker__execute.get_next, using the GPU, but I am not sure whether the GPUs are actually being used for training or not.
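For reference, the script I ran is essentially the TorchTrainer example from that docs page, trimmed down; the tiny linear model and the from_items dataset below are just placeholders standing in for my actual model and (much larger) dataset:

```python
import torch
import torch.nn as nn

import ray
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_model


def train_loop_per_worker(config):
    # Each worker reads its own shard of the dataset passed to the trainer.
    dataset_shard = train.get_dataset_shard("train")

    model = nn.Linear(30, 1)      # placeholder for my real model
    model = prepare_model(model)  # moves the model to the worker's device and wraps it in DDP

    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(config["num_epochs"]):
        for batch in dataset_shard.iter_torch_batches(batch_size=32, dtypes=torch.float):
            optimizer.zero_grad()
            loss = loss_fn(model(batch["x"]), batch["y"].unsqueeze(1))
            loss.backward()
            optimizer.step()
        train.report({"epoch": epoch, "loss": loss.item()})


# Placeholder dataset; my real dataset is much larger.
train_dataset = ray.data.from_items(
    [{"x": [float(i)] * 30, "y": float(i)} for i in range(1000)]
)

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"num_epochs": 2},
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
    datasets={"train": train_dataset},
)
result = trainer.fit()
```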
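Also, would calling something like the helper below from inside train_loop_per_worker be a reasonable way to confirm that each worker actually got a CUDA device, or is nvidia-smi the better signal here? (log_worker_device is just a name I made up for this sketch.)

```python
import torch
from ray import train
from ray.train.torch import get_device


def log_worker_device():
    # Call this from inside train_loop_per_worker on each worker.
    ctx = train.get_context()
    device = get_device()  # the device Ray Train assigned to this worker
    print(
        f"rank={ctx.get_world_rank()} device={device} "
        f"cuda_available={torch.cuda.is_available()}"
    )
```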