How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Hi,
I am using Tune to train machine learning models through the Trainable API, and I am having issues with Tune not making CUDA devices available within the Trainable. The Trainable documentation
mentions that we should use the default_resource_request method to provide a static resource requirement, but when I specify {"gpu": 1}, CUDA remains unavailable within the Trainable.step method during training. I also found this short guide,
https://docs.ray.io/en/latest/tune/tutorials/tune-resources.html
which suggests wrapping my Trainable callable with tune.with_resources:

trainable_with_gpu = tune.with_resources(trainable, {"gpu": 1})
tuner = tune.Tuner(
    trainable_with_gpu,
    tune_config=tune.TuneConfig(num_samples=10),
)
results = tuner.fit()
Neither of these suggestions makes CUDA devices available within the Trainable.step() method for me. I should mention that torch correctly reports my devices as available outside the tuner.fit() call, and I have tried reinstalling torch and CUDA to no avail. Am I doing something wrong?
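My understanding is that Ray restricts GPU visibility per trial by setting the CUDA_VISIBLE_DEVICES environment variable in each worker process, so a check like the following (a minimal stdlib-only sketch; the helper name is mine, not Ray API) should show whether the trial process was actually granted a GPU:

```python
import os

def cuda_visibility():
    """Return the CUDA_VISIBLE_DEVICES value seen by the current process.

    Ray is expected to set this per trial when a GPU is allocated; an
    unset or empty value would explain torch reporting no CUDA devices
    inside Trainable.step().
    """
    value = os.environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "<unset>"
    if value == "":
        return "<empty: no GPUs assigned to this worker>"
    return value
```

Calling this at the top of step() and logging the result should distinguish a scheduling problem (no GPU assigned to the trial) from a torch/driver problem inside the worker.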
I am publicly hosting my project on my GitHub: EricPfleiderer/timeseries-forecasting/
You can check out commit 99ac544 (CUDA issue, temp removing wandb) specifically.
Thanks,
Eric