Extract GPU statistics per process in Ray Tune

Hi, I was wondering how I can extract per-process GPU statistics inside a Ray Tune training function.
Right now I can call nvsmi.get_gpu_processes(), match the current pid against each GPU process's pid, and read used_memory from the matching entry. But I'm not sure how to do the same for GPU utilization. We can assume that no two processes share the same GPU.
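For readers following along, the pid-matching step described above might look roughly like the sketch below. It assumes the objects returned by nvsmi.get_gpu_processes() expose pid, gpu_id and used_memory attributes; the helper name used_memory_for_pid is hypothetical and not part of nvsmi or Ray.

```python
import os


def used_memory_for_pid(gpu_processes, pid=None):
    """Map gpu_id -> used_memory for the entries that belong to `pid`.

    `gpu_processes` is any iterable of objects with `pid`, `gpu_id` and
    `used_memory` attributes (assumed to match what nvsmi returns).
    Defaults to the pid of the calling process.
    """
    pid = os.getpid() if pid is None else int(pid)
    return {
        proc.gpu_id: proc.used_memory
        for proc in gpu_processes
        if int(proc.pid) == pid
    }


def current_process_gpu_memory():
    """Query nvsmi for the calling process; needs nvsmi and nvidia-smi."""
    import nvsmi  # imported lazily so the helper above works without it

    return used_memory_for_pid(nvsmi.get_gpu_processes())
```

Calling current_process_gpu_memory() from inside the training function would return an empty dict until the process has actually allocated GPU memory.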
Also, the Ray dashboard shows GPU utilization. Is there a way to extract those statistics programmatically?

I think you will see them in your logs if you do:

tune.run(config={"log_sys_usage": True})

For a single configuration/process, the reported utilization covers every GPU on the machine, including GPUs that this process does not use. How can I get the utilization of only the GPU that a particular process is using?

Maybe you can instead access the particular GPU via ray.get_gpu_ids() in your training code and use GPUtil to get utilization stats for the GPU.
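A minimal sketch of that suggestion, under some assumptions: the IDs returned by ray.get_gpu_ids() are integer device indices that line up with the index nvidia-smi (and therefore GPUtil) reports, and GPUtil's GPU objects expose id, load (a fraction between 0 and 1) and memoryUsed. The helper names select_gpu_stats and current_trial_gpu_stats are hypothetical.

```python
def select_gpu_stats(gpus, assigned_ids):
    """Keep only the GPUs whose index is in `assigned_ids`.

    `gpus` is an iterable of GPUtil-style objects with `id`, `load`
    (0.0 to 1.0) and `memoryUsed` (MB). Returns {gpu_id: stats dict}.
    """
    # Ray may return the IDs as ints or as numeric strings.
    wanted = {int(gpu_id) for gpu_id in assigned_ids}
    return {
        gpu.id: {
            "util_percent": gpu.load * 100.0,
            "memory_used_mb": gpu.memoryUsed,
        }
        for gpu in gpus
        if gpu.id in wanted
    }


def current_trial_gpu_stats():
    """Call from inside the training function; needs ray and GPUtil."""
    import GPUtil
    import ray

    return select_gpu_stats(GPUtil.getGPUs(), ray.get_gpu_ids())
```

Note that this reports utilization for the whole device, not for one process, so it only answers the per-process question when no other process shares the GPU, as assumed in the original post.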
