Allocating different GPUs to different instances of Ray (Tune) in Python

Dear Ray community,

I have a Python script that runs hyperparameter tuning for machine learning models with Ray Tune. The models are implemented in PyTorch, and the script takes an argument device which specifies the GPU that PyTorch uses. In my case, it is either device='cuda:0' (first GPU) or device='cuda:1' (second GPU). The script also has a resources argument which specifies the resources that Ray can allocate for trials, such as resources={'cpu': 4, 'gpu': 1}.

I would like to run two instances of this Python script in parallel (in different terminal windows), so that the first instance runs the script with device='cuda:0', and the second instance runs the script with device='cuda:1'. However, how can I tell Ray to use the first GPU for the first instance, and the second GPU for the second? I have tried setting CUDA_VISIBLE_DEVICES inside the Python script, e.g. by

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # "1" in the second instance

If I set this to "0" for the first instance and "1" for the second, then the first instance runs but the second just hangs.

Do you have any recommendations on how to fix this issue? Apologies that I don’t have a reproducible example, but I can create a script if needed. I am using a Windows machine, ray[tune] 2.2.0, and Python 3.9.

Many thanks,


@tammandres Thanks for filing this issue. You can’t pin a particular device to a Ray task; however, you can create placement groups.

cc: @xwjiang2010 Any suggestions here?

I am wondering why you want to run two copies of the script separately. Can I know more about your use case?

Hi @Jules_Damji , @xwjiang2010, thank you for your quick reply!

About my use case: I have been running hyperparameter tuning for a few different neural network architectures – for example, one is a simple feedforward neural network (MLP), another is a neural additive model (NAM). When I run the tuning script for any of these models, I am running it for 40 trials, and I am not parallelising the trials, because I am using the hyperopt algorithm which can benefit from sequential execution (as it suggests a new parameter combination based on the performance of a previous one). However, as I have two GPUs, I could tune the models faster and make better use of my resources, by tuning one model (e.g. MLP) on one GPU, and at the same time tuning the other model (e.g. NAM) on the other GPU. So trials for one model would run sequentially on one GPU, and trials for another model on the other.

On the other hand, instead of tuning each model on a separate GPU, I could also make better use of my resources by running some tuning trials in parallel, so that some trials use one GPU and other trials the other GPU, even within a model. But perhaps this is not as good as the previous scenario, because hyperopt might benefit from sequential execution of trials. (Although the first 20 trials are actually random draws before hyperopt kicks in, and these could be parallelised without loss of performance. But I have not considered this at the moment, because it seemed easier not to do separate parallelisation for earlier and later trials, and just run the two models on separate GPUs.)

Would this use case make sense to you as well, or would you suggest that I organise the tuning differently?

Thank you again,

@tammandres Just sharing my experience: I am running two copies of the same script as separate files in two terminals, each with its own output folder and one on each GPU using CUDA_VISIBLE_DEVICES, and it works smoothly, but I am using TensorFlow.

Could it be a PyTorch-specific thing?
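In case it helps, here is a minimal sketch of the two-terminal setup as a single launcher. It uses only the standard library and sets CUDA_VISIBLE_DEVICES in each child's environment before the child starts, so the mask is in place before any CUDA library loads. The helper name launch_on_gpu is hypothetical, and I am assuming each child script starts its own local Ray instance; I have not tested this on Windows.

```python
import os
import subprocess
import sys


def launch_on_gpu(command, gpu_index, **popen_kwargs):
    """Start `command` as a child process that can only see one physical GPU.

    `launch_on_gpu` is a hypothetical helper, not part of Ray or PyTorch.
    """
    # Copy the parent's environment so PATH etc. are preserved.
    env = os.environ.copy()
    # The mask is set before the child starts, i.e. before it can
    # initialise CUDA, which is when the variable is read.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return subprocess.Popen(command, env=env, **popen_kwargs)
```

Usage would look something like launch_on_gpu([sys.executable, "tune_model.py", "--model", "mlp"], 0) and the same with "nam" and 1, followed by wait() on both returned processes. The script name and the --model flag are placeholders for whatever your script actually takes.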

@Teenforever Thanks, this is really helpful to know! I’ll see if I can create a minimal reproducible script that illustrates the issue I have with this method. I don’t imagine it’s PyTorch; maybe it’s something to do with the virtual machine I am using.
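One thing that may be worth checking in the minimal script: once CUDA_VISIBLE_DEVICES is set to a single GPU, CUDA renumbers the visible devices from 0, so inside that process the GPU is addressed as 'cuda:0' even when it is the second physical card. Passing device='cuda:1' together with CUDA_VISIBLE_DEVICES="1" would then point at a device the process cannot see. As far as I know, Ray also sets CUDA_VISIBLE_DEVICES itself for trials that request a GPU, so the same renumbering should apply inside a trial. This is a sketch under those assumptions, not a diagnosis of the hang; pin_process_to_gpu is a hypothetical helper.

```python
import os


def pin_process_to_gpu(physical_index):
    """Restrict this process to one physical GPU and return the device string to use.

    Hypothetical helper; call it before importing torch or ray.
    """
    # The mask is read when CUDA is initialised, so set it first.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    # With exactly one visible GPU, that GPU is index 0 inside this
    # process, whichever physical card it is.
    return "cuda:0"


device = pin_process_to_gpu(1)  # second physical GPU
# import torch            # import only after the mask is set
# model = model.to(device)
```

With this, both instances would pass 'cuda:0' to PyTorch and differ only in the physical index given to the helper.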