Pytorch dataloader num_workers with ray tune

How severe does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

I started using Ray Tune for hyperparameter tuning and have a prototype working. Now I would like to understand a bit better how to allocate resources.

I have a PyTorch model which is trained on CPU (it is a small model with recursion, which does not benefit from a GPU). Currently, I assign 1 CPU to each trial. The data is supplied to the model via a PyTorch dataloader, which can also be parallelized by setting num_workers (i.e., one trial can use num_workers processes to load the data). Without Ray Tune, I would set num_workers = <n_cpus>, but with Ray Tune already distributing trials across CPUs, I do not know what the best value for num_workers is.

Do you have any suggestions? Should I set num_workers = 1, or would it also be okay to set num_workers = <n_cpus>?

Hi jack!
You're correct that the right num_workers depends entirely on how many CPUs you have available. You can definitely increase it, but since Ray Tune is already assigning CPUs to concurrently running trials, I would make sure the total CPU usage across all trials (including their dataloader workers) does not exceed n_cpus, so you don't oversubscribe the existing resources.

There are a few docs on this that I'm going to link here that might be of interest to you!