Hi,
In our cluster setup with custom Docker images, the Ray dashboard does not show GPU usage data. The tooltip says:
Usage of each GPU device. If no GPU usage is detected, here are the potential root causes:
1. non-GPU Ray image is used on this node. Switch to a GPU Ray image and try again.
2. Non Nvidia GPUs are being used. Non Nvidia GPUs' utilizations are not currently supported.
3. pynvml module raises an exception.
Option 1 may not be a good choice for us. For 2, we have nvidia-smi available inside the cluster node containers.
So I'm wondering what 3 means.
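My guess is that cause 3 means one of the NVML calls made through pynvml fails inside the container. Below is a minimal sketch of how that could be checked by hand from a Python shell on the node. It assumes the dashboard uses the standard pynvml calls (nvmlInit, nvmlDeviceGetCount, nvmlDeviceGetHandleByIndex, nvmlDeviceGetUtilizationRates) and that a module named pynvml is importable in the same Python environment as Ray. I have not confirmed either point against the Ray source, and probe_nvml is just a hypothetical helper name.

```python
import importlib


def probe_nvml(nvml=None):
    """Try the basic NVML calls and report where the first failure occurs.

    Returns (ok, details). `nvml` can be any module-like object exposing the
    pynvml function names; by default the installed `pynvml` is imported
    (assumption: this is the module the dashboard relies on).
    """
    if nvml is None:
        try:
            nvml = importlib.import_module("pynvml")
        except ImportError as exc:
            return False, [f"import failed: {exc!r}"]

    try:
        nvml.nvmlInit()
    except Exception as exc:  # e.g. driver library not visible in the container
        return False, [f"nvmlInit failed: {exc!r}"]

    details = []
    try:
        count = nvml.nvmlDeviceGetCount()
        details.append(f"device count: {count}")
        for index in range(count):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            util = nvml.nvmlDeviceGetUtilizationRates(handle)
            details.append(f"GPU {index}: util {util.gpu}%")
        return True, details
    except Exception as exc:
        details.append(f"query failed: {exc!r}")
        return False, details
    finally:
        # Always release NVML, ignoring errors during cleanup.
        try:
            nvml.nvmlShutdown()
        except Exception:
            pass


if __name__ == "__main__":
    ok, details = probe_nvml()
    print("NVML OK" if ok else "NVML problem")
    for line in details:
        print(" ", line)
```

If this reports an import or nvmlInit failure while nvidia-smi works, the problem is probably in the Python environment or in how the driver libraries are exposed to the container, rather than in the GPU itself.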
We can also SSH into our cluster node containers. Is there a way to root-cause directly why the GPU usage chart is not available in the dashboard? Thanks