I am using DDPPO to train an agent, with the goal of accelerating SGD across multiple CPUs and GPUs.
AWS instance: p2.xlarge
AMI: Ubuntu 18 Deep Learning AMI with PyTorch, Python 3.7
Training the agent without GPUs succeeds, but training with workers on GPU produces the following error:
File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/distributions/distribution.py", line 53, in __init__
(pid=6229) raise ValueError("The parameter {} has invalid values".format(param))
(pid=6229) ValueError: The parameter logits has invalid values
I observe NaN values in the forward pass when the model generates the action and value outputs.
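To narrow down where the NaNs first appear, one option is to attach forward hooks that fail as soon as any submodule produces a non-finite output. The snippet below is only a minimal sketch: `add_nan_hooks` is a hypothetical helper name, and `policy` is a small stand-in network, since the actual model is not shown here.

```python
import torch
import torch.nn as nn


def add_nan_hooks(model: nn.Module):
    """Register forward hooks that raise as soon as a submodule outputs NaN/Inf."""
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Only plain tensor outputs are checked; tuples and dicts are skipped.
            if torch.is_tensor(output) and not torch.isfinite(output).all():
                raise RuntimeError(f"non-finite output in submodule '{name}'")
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    return handles  # call remove() on each handle to detach the hooks


# Hypothetical stand-in for the policy network (assumption, not the real model).
policy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
handles = add_nan_hooks(policy)

obs = torch.randn(3, 4)
logits = policy(obs)  # passes silently while all activations are finite
```

If the first failing submodule is the very first layer, the observations (or the weights after an optimizer step) are likely already non-finite, so it may also be worth checking the inputs and parameters with `torch.isfinite`. For problems that originate in the backward pass, `torch.autograd.set_detect_anomaly(True)` can help, at the cost of slower training.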
I have tried adjusting the learning rate, but that doesn't seem to help.
Any help debugging this would be appreciated!