I looked into it more, and it seems the NaNs are introduced via the gradients. The gradients become NaN on the backward call in ray.rllib.policy.torch_policy_v2._multi_gpu_parallel_grad_calc, specifically the loss_out[opt_idx].backward(retain_graph=True) call, which ultimately calls Variable._execution_engine.run_backward (which in turn calls out to the torch C extension). The NaN values are then propagated to the model parameters by the torch.optim.Adam.step method.
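To confirm where the non-finite values first show up, something like the following generic PyTorch check could be dropped in right after the backward call while debugging. This is just a sketch, not RLlib-specific: check_grads and model are illustrative names for whatever torch module the policy wraps.

```python
import torch

# Optional: make autograd raise on the op that first produces NaN/Inf
# during backward(). This slows training, so enable only while debugging.
torch.autograd.set_detect_anomaly(True)

def check_grads(model: torch.nn.Module) -> None:
    """Report any parameters whose gradients are non-finite after backward()."""
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite gradient in {name}")
```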
I think this means that I have runaway gradients. I am not entirely sure, but from everything I have seen it would make sense. I am going to look more into how to solve the runaway-gradient problem, maybe with gradient clipping, a smaller learning rate, etc., and see if I can keep the NaNs from coming back (see the config sketch below).
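A minimal sketch of what I plan to try, assuming PPO on a recent Ray 2.x release where the training config exposes lr and grad_clip (the environment and values are placeholders, not my actual setup):

```python
from ray.rllib.algorithms.ppo import PPOConfig  # assuming PPO; swap in your algorithm's config

config = (
    PPOConfig()
    .environment("CartPole-v1")  # placeholder environment
    .framework("torch")
    .training(
        lr=1e-4,         # smaller learning rate than before
        grad_clip=40.0,  # clip the global gradient norm before the optimizer step
    )
)
algo = config.build()
```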