GPU usage data not available in dashboard

Hi,

In our cluster setup with custom Docker images, the Ray dashboard does not show GPU usage data. The tooltip says:

Usage of each GPU device. If no GPU usage is detected, here are the potential root causes:
1. non-GPU Ray image is used on this node. Switch to a GPU Ray image and try again.
2. Non Nvidia GPUs are being used. Non Nvidia GPUs' utilizations are not currently supported.
3. pynvml module raises an exception.

Option 1 may not be a good choice for us. For 2, we have nvidia-smi available inside the cluster node container.
So I'm wondering: what does 3 mean?

We can also ssh into our cluster node containers. Is there a way to root-cause directly why the GPU usage chart is not available in the dashboard? Thanks.
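For reference, this is the kind of minimal pynvml check we can run inside the node container (just a sketch based on the tooltip's mention of pynvml, not Ray's actual dashboard code); if cause 3 applies, it should raise the same NVML exception:

```python
# Minimal NVML smoke test (a sketch; not Ray's actual dashboard code).
import pynvml

try:
    pynvml.nvmlInit()
except pynvml.NVMLError as e:
    # Typical failures: the NVIDIA driver is not visible inside the
    # container, or the Python NVML bindings don't match the driver.
    raise SystemExit(f"NVML init failed: {e}")

drv = pynvml.nvmlSystemGetDriverVersion()
print("driver:", drv.decode() if isinstance(drv, bytes) else drv)

for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):  # older pynvml versions return bytes
        name = name.decode()
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i} ({name}): util={util.gpu}% "
          f"mem={mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB")

pynvml.nvmlShutdown()
```

If nvmlInit() fails here even though nvidia-smi works, the problem is most likely on the Python side (the NVML bindings in the Ray environment) rather than the driver itself.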

So you are on Nvidia GPUs, correct?

Correct. We are using A10 GPUs.

@pcpLiu: Could it be a side effect of, or related to, this issue: Ray 2.10: CPU / RAM / GPU usage not correctly displayed on Windows 11 - #3 by PhilippWillms? This was recently fixed, so you could try with the nightly build.


Any luck with the above, @pcpLiu?

Hey, yeah. Upgrading to the new package fixed this!

Which package? Ray? We're facing the same issue, even after upgrading to the rayproject/ray:2.38.0-py310-gpu Docker image.
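To narrow it down, this is the quick check we're running inside the 2.38.0 container (a sketch; the distribution names nvidia-ml-py / pynvml are assumptions on my part, only the pynvml module itself is named by the tooltip):

```python
# Quick environment check inside the rayproject/ray:2.38.0-py310-gpu container
# (a sketch; the distribution names below are assumptions, not confirmed by Ray docs).
from importlib import metadata

import pynvml

for dist in ("ray", "nvidia-ml-py", "pynvml"):
    try:
        print(f"{dist}: {metadata.version(dist)}")
    except metadata.PackageNotFoundError:
        print(f"{dist}: not installed")

# Per the tooltip, the dashboard can only report GPU usage if NVML
# is usable from Python without raising an exception.
pynvml.nvmlInit()
print("GPU count:", pynvml.nvmlDeviceGetCount())
pynvml.nvmlShutdown()
```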