Yes, I have found a hack to work around this issue. Here is an example:
import torch
from ray import tune

def training_function(config):
    # fail fast if this trial did not get a working GPU
    assert torch.cuda.is_available()
    # do your training here

tune.run(
    training_function,
    max_failures=100,  # set this to a large value, 100 works in my case
    # more parameters for your problem
)
For trials that do not initialize the GPU correctly, the assertion fails and the trial errors out. By setting max_failures to a very large value, Ray Tune will keep relaunching the trial until it runs correctly.
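For completeness, here is a slightly fuller sketch of the same workaround. The resources_per_trial setting and the device-name print are my own additions for illustration, not part of the original hack; they just make it explicit that each trial requests a GPU and let you see which device it actually received.

import torch
from ray import tune

def training_function(config):
    # fail fast so a bad trial is retried instead of silently running on CPU
    assert torch.cuda.is_available(), "trial started without a usable GPU"
    # optional: log which GPU this trial ended up on
    print("Trial running on:", torch.cuda.get_device_name(0))
    # ... your training loop here ...

tune.run(
    training_function,
    resources_per_trial={"cpu": 1, "gpu": 1},  # ask Ray to schedule one GPU per trial
    max_failures=100,  # retry trials that hit the assertion
)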