I’m training a PyTorch-based RL agent, and after some time the gradients become NaN.
I narrowed it down to an element of train_batch[SampleBatch.ACTION_LOGP] becoming NaN, i.e.:
```
ipdb> train_batch[SampleBatch.ACTION_LOGP]
tensor([-2.2935, -2.1049, -6.6760, -2.9978, -3.4345, -2.3617, -7.0113, -2.2205,
        -2.7104, -2.4619, -2.6365, -4.2575, -1.9757, -2.1059, -3.3431, -4.4665,
        -2.4221, -2.0094, -2.5488, -3.4173, -3.6056, -3.2375, -2.1103, -1.9414,
        -5.7886, -2.8885, -2.7877, -1.9022, -5.3155, -4.7475, -1.7011, -2.7406,
        -3.8260, -5.1334, -3.9586, -1.7644, -3.9647, -3.0025, -2.3528, -2.3174,
        -2.4714, -2.8532, -2.3275, -2.1873, -2.3088, -2.7994, -8.2306, -3.3107,
        -2.0088, -3.3923, -2.5899, -2.0405, -1.8622, -2.3435, -1.9985, -2.9948,
        -1.4785, -3.2709, -4.4852, -2.0569, -6.1308, -3.3117, -1.9325, -2.4115,
        -2.0660, -1.9637, -4.5762, -2.5048, -1.8225, -1.6604, -2.2738, -1.9422,
        -3.6144, -2.9782, -1.8503, -2.2936, -2.9227, -2.6610, -4.4914, -2.4797,
        -4.8016, -3.0306, -2.6312, -3.3183, -1.8548, -2.0372, -1.9313, -1.9252,
        -3.2634, -2.9477, -3.6409, -1.7080, -2.6747, -2.1803, -2.4812, -2.2985,
        -2.1640, -2.9551, -2.7904, -2.7588, -2.4371, -2.8985, -2.0045,     nan,
        -5.1289, -2.1477, -3.5149, -1.9273, -2.8681, -2.8987, -4.2561, -2.6884,
        -2.7404, -2.1192, -6.0505, -3.0273, -3.1711, -3.4646, -2.6108, -3.6579,
        -4.1734, -2.0386, -3.0869, -3.0038, -2.3043, -1.8855, -1.9970, -2.1687],
       device='cuda:0')
```
which causes the NaN value to propagate from the ppo_surrogate_loss (here).
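To illustrate why a single bad entry is enough, here is a simplified sketch of (roughly) what the ratio step in a PPO surrogate loss does, not RLlib's actual implementation: the NaN in the stored old log-probs poisons the mean and hence the gradients.

```python
import torch

# Simplified sketch of the PPO ratio step (illustrative, not RLlib's code).
old_logp = torch.tensor([-2.29, float("nan"), -2.10])  # like train_batch[SampleBatch.ACTION_LOGP]
curr_logp = torch.tensor([-2.30, -2.15, -2.05])        # same actions re-evaluated under the current policy
advantages = torch.tensor([0.5, -0.1, 0.2])

logp_ratio = torch.exp(curr_logp - old_logp)           # nan stays nan
surrogate = torch.min(advantages * logp_ratio,
                      advantages * torch.clamp(logp_ratio, 1 - 0.3, 1 + 0.3))
print(surrogate.mean())                                # tensor(nan) -> NaN loss -> NaN gradients
```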
Any suggestions on how to proceed from here? It’s not clear to me where the NaN is being introduced (e.g. maybe here?). The observations and the model weights all look normal right before this happens, and the gradients are neither vanishing nor exploding, so that’s not the issue.
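In case it helps frame an answer, this is the kind of check I’m considering adding to localize the first non-finite tensor; the helper name and call sites here are just illustrative, not RLlib internals:

```python
import torch

# Hypothetical helper to drop into the loss function or a callback: raises as
# soon as a tensor contains nan/inf, reporting the offending indices.
def assert_finite(name, t):
    if not torch.isfinite(t).all():
        idx = (~torch.isfinite(t)).nonzero(as_tuple=True)[0]
        raise ValueError(f"{name} has non-finite values at indices {idx.tolist()}")

# e.g. assert_finite("ACTION_LOGP", train_batch[SampleBatch.ACTION_LOGP])
# combined with torch.autograd.set_detect_anomaly(True) to trace any NaN
# produced during the backward pass.
```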