PyTorch stops updating weights when I use Ray to parallelize the code

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hey,
I’m trying to use Ray to train a PyTorch NN model in parallel with different hyperparameters. My algorithm is very similar to this tutorial: Simple Parallel Model Selection — Ray 2.3.1,
but with a different dataset and different hyperparameter choices.
The tutorial code trains without any issue.
However, with my own code, the weights of my NN model simply do not update, even though gradients are present. If I run the code sequentially by deleting the @ray.remote decorator and the .remote calls, the weights start updating again.
Do you have any idea where I could look?
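
For context, this is roughly the pattern I’m following (a minimal sketch with made-up layer sizes, learning rates, and synthetic data, not my actual code):

```python
import ray
import torch
import torch.nn as nn

ray.init()

@ray.remote
def train_model(lr, hidden_size, data):
    # Each task runs in its own worker process, so the model and
    # optimizer are created inside the task.
    X, y = data
    model = nn.Sequential(
        nn.Linear(X.shape[1], hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, 1),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    for _ in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()  # updates only this worker's copy of the weights

    # Return the trained weights explicitly; updates made inside the task
    # do not propagate back to any model object held by the driver.
    return loss.item(), model.state_dict()

# Synthetic data, shared with all tasks via the object store.
X = torch.randn(256, 10)
y = torch.randn(256, 1)
data_ref = ray.put((X, y))

lrs = [1e-1, 1e-2, 1e-3]
futures = [train_model.remote(lr, 32, data_ref) for lr in lrs]
for lr, (loss, state_dict) in zip(lrs, ray.get(futures)):
    print(f"lr={lr}: final loss {loss:.4f}")
```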

@wang_gaoyuan can you share more details about your code? It’s hard to help without them.