PyTorch stops updating weights when I use Ray to parallelize the code

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hey,
I’m trying to use Ray to train a PyTorch NN model in parallel with different hyperparameters. My algorithm is very similar to this tutorial: Simple Parallel Model Selection — Ray 2.3.1,
but with a different dataset and different hyperparameter choices.
The tutorial code trains without any issue.
However, with my own code, the weights of my NN model simply do not update, even though gradients are present. If I run the code sequentially by deleting the @ray.remote decorator and the .remote calls, the weights start updating again.
Do you have any idea where I could look?
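
For context, this is roughly the pattern I’m following (a minimal sketch with made-up layer sizes, learning rates, and synthetic data, not my actual code):

```python
import ray
import torch
import torch.nn as nn

ray.init()

@ray.remote
def train_model(lr, hidden_size, data):
    # Each task runs in its own worker process, so the model and
    # optimizer are created inside the task.
    X, y = data
    model = nn.Sequential(
        nn.Linear(X.shape[1], hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, 1),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    for _ in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()  # updates only this worker's copy of the weights

    # Return the trained weights explicitly; updates made inside the task
    # do not propagate back to any model object held by the driver.
    return loss.item(), model.state_dict()

# Synthetic data, shared with all tasks via the object store.
X = torch.randn(256, 10)
y = torch.randn(256, 1)
data_ref = ray.put((X, y))

lrs = [1e-1, 1e-2, 1e-3]
futures = [train_model.remote(lr, 32, data_ref) for lr in lrs]
for lr, (loss, state_dict) in zip(lrs, ray.get(futures)):
    print(f"lr={lr}: final loss {loss:.4f}")
```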

@wang_gaoyuan can you share more details about your code? It’s hard to help without them.