DDPPO on CPU vs GPU: NaN values during training

I am training an agent with DDPPO, which accelerates SGD across multiple CPUs and GPUs.
AWS instance: p2.xlarge
AMI: Ubuntu 18 Deep Learning AMI with PyTorch, Python 3.7
Training the agent without GPUs succeeds, but training with workers on the GPU produces the following error:


 File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/distributions/distribution.py", line 53, in __init__
(pid=6229)     raise ValueError("The parameter {} has invalid values".format(param))
(pid=6229) ValueError: The parameter logits has invalid values

I observe NaN values in the forward pass while generating the action and value outputs.
I have tried adjusting the learning rate, but that doesn't seem to help.
Looking for a helping hand to debug this!
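For reference, this is roughly how I'm detecting where the NaNs first appear. The `check_finite` helper and the tensors below are illustrative placeholders, not my actual model code:

```python
import torch

def check_finite(name, tensor):
    # Illustrative helper: raise as soon as a tensor goes non-finite,
    # so the failing component is named instead of surfacing later
    # inside torch.distributions validation.
    if not torch.isfinite(tensor).all():
        raise RuntimeError(f"{name} contains NaN/Inf values")
    return tensor

# A logits tensor containing a NaN trips the check before it would
# reach Categorical(logits=...) and its "invalid values" error.
logits = torch.tensor([0.1, float("nan"), 0.3])
try:
    check_finite("logits", logits)
except RuntimeError as e:
    print("caught:", e)
```

`torch.autograd.set_detect_anomaly(True)` can also help locate the op that first produces a NaN in the backward pass, at some runtime cost.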

Happy to help debug! Could you visualize the loss values on TensorBoard (just run tensorboard --logdir ~/ray_results/), so we can figure out which component went wrong? It would also be nice to see the gradient norm.
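If you want to log the gradient norm yourself, something along these lines works in plain PyTorch. The toy linear model here is just a stand-in for your policy network:

```python
import torch
import torch.nn as nn

# Placeholder model and loss standing in for the policy network.
model = nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

# Global L2 gradient norm, the quantity worth logging alongside the loss:
# a sudden spike here usually precedes the NaNs in the logits.
total_norm = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters() if p.grad is not None])
)
print(f"grad norm: {total_norm.item():.4f}")
```

If the norm does blow up, clipping it (e.g. with `torch.nn.utils.clip_grad_norm_`) is the usual first mitigation.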

Hi @michaelzhiluo Thanks for the help!

Here is a snapshot of the policy loss. It seems something is going wrong… The loss values:
[image: policy loss plot]