DDPPO on CPU vs GPU: NaN values during training

I am using DDPPO to train an agent, with SGD accelerated across multiple CPUs and GPUs.
AWS instance: p2.xlarge
AMI: Ubuntu 18 Deep Learning AMI with PyTorch, Python 3.7
Training the agent without GPUs works. Training with workers on the GPU, however, fails with the following error:

 File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/distributions/distribution.py", line 53, in __init__
(pid=6229)     raise ValueError("The parameter {} has invalid values".format(param))
(pid=6229) ValueError: The parameter logits has invalid values

I see NaN values in the forward pass when the model computes the action logits and the value.
I have tried adjusting the learning rate, but that doesn't seem to help.
Any help debugging this would be appreciated!
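
To see exactly where the NaNs first appear (and on which device), one option is to check the logits yourself before they reach torch.distributions, which is what raises the ValueError above. A minimal sketch, assuming a PyTorch model that outputs categorical action logits; check_finite and make_action_dist are hypothetical helpers, not RLlib APIs:

```python
import torch
from torch.distributions import Categorical


def check_finite(name: str, tensor: torch.Tensor) -> None:
    """Raise a descriptive error if `tensor` contains NaN or +/-inf values."""
    if not torch.isfinite(tensor).all():
        n_nan = int(torch.isnan(tensor).sum())
        n_inf = int(torch.isinf(tensor).sum())
        raise FloatingPointError(
            f"{name}: {n_nan} NaN and {n_inf} inf values "
            f"out of {tensor.numel()} on device {tensor.device}"
        )


def make_action_dist(logits: torch.Tensor) -> Categorical:
    # Hypothetical helper: validate the logits first so the error reports
    # how many entries are bad and on which device, instead of only
    # "The parameter logits has invalid values".
    check_finite("logits", logits)
    return Categorical(logits=logits)


if __name__ == "__main__":
    dist = make_action_dist(torch.zeros(2, 3))  # finite logits: fine
    try:
        make_action_dist(torch.tensor([[0.0, float("nan"), 1.0]]))
    except FloatingPointError as err:
        print(err)
```

Calling check_finite on the observations and on intermediate layer outputs as well helps tell whether the NaNs come from the input data or from the network weights themselves.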

Happy to help debug! Could you visualize the loss values in TensorBoard (just run tensorboard --logdir ~/ray_results/) so we can figure out which component went wrong? It would also be nice to see the gradient norm.
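
If the gradient norm is not already among the logged metrics, it can be computed by hand after loss.backward(). A minimal sketch, assuming direct access to the PyTorch model; global_grad_norm is a hypothetical helper. It returns the same quantity that torch.nn.utils.clip_grad_norm_ reports as the total norm, i.e. the L2 norm of all gradients treated as one flat vector:

```python
import torch
import torch.nn as nn


def global_grad_norm(model: nn.Module) -> float:
    """L2 norm over all parameter gradients, as if concatenated into one vector."""
    per_param = [p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None]
    if not per_param:
        return 0.0
    # sqrt(sum_i ||g_i||^2) equals the L2 norm of the vector of per-tensor norms.
    return float(torch.norm(torch.stack(per_param), 2))


if __name__ == "__main__":
    model = nn.Linear(4, 2)
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    print(f"grad norm: {global_grad_norm(model):.4f}")
```

A norm that grows sharply in the iterations before the NaNs appear would point to exploding gradients rather than bad input data.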

Hi @michaelzhiluo, thanks for the help!

Here is a snapshot of the policy loss. It seems something is going wrong… The loss values