Policy returning NaN weights and biases; in addition, the policy observation space is different from what is expected

@arturn this might be related to the issue here (not because of the algorithm itself, but because there may be NaNs in the model; I had a similar issue in my PPO)

@MrDracoG From what you describe, the NaNs in your weights might stem from very high losses/gradients. Did you observe any spikes in your losses?
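One quick way to narrow this down is to scan the policy's weight arrays for non-finite values after each training iteration. A minimal sketch, assuming the weights come as a dict of name -> NumPy array (which is what `policy.get_weights()` returns in RLlib):

```python
import numpy as np

def find_nonfinite_weights(weights):
    """Return the names of weight arrays that contain NaNs or infs.

    `weights` is assumed to be a dict mapping layer/parameter names to
    np.ndarray, e.g. the dict returned by RLlib's `policy.get_weights()`.
    """
    return [name for name, w in weights.items() if not np.isfinite(w).all()]
```

Calling this once per iteration tells you which layer goes bad first, which usually hints at whether the blow-up starts in the value head or the policy head.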
The fact that you were able to mitigate the problem by decreasing lambda and lowering the clipping threshold in the loss might point to very high advantages. Are you also training with very long episodes (possibly with "complete_episodes" as the batch_mode hyperparameter)?
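For reference, these are the knobs I mean, shown here as an old-style RLlib PPO config dict. The values are purely illustrative assumptions, not recommendations; tune them to your problem:

```python
# Sketch of the relevant PPO config knobs (old-style RLlib config dict).
# All values below are illustrative, not tuned recommendations.
ppo_config = {
    "lambda": 0.9,        # lower GAE lambda shortens the advantage horizon
    "clip_param": 0.1,    # tighter PPO surrogate clipping
    "grad_clip": 40.0,    # clip the global gradient norm to tame spikes
    # "truncate_episodes" instead of "complete_episodes" avoids huge
    # train batches when episodes are very long:
    "batch_mode": "truncate_episodes",
}
```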

Regarding the squashed observation space: do you normalize your observations with RLlib’s MeanStdFilter?
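If so, that would explain the mismatch: the filter standardizes each observation with running statistics, so what the policy sees no longer lies in the original observation space. A minimal sketch of the idea (Welford's online algorithm; this is an illustration of the behavior, not RLlib's actual implementation):

```python
import numpy as np

class RunningMeanStd:
    """Sketch of the running normalization behind a mean/std observation
    filter: standardize each observation with statistics accumulated so far."""

    def __init__(self, shape=()):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # sum of squared deviations (Welford)

    def __call__(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        std = np.sqrt(self.m2 / max(self.n - 1, 1))
        # Small epsilon keeps the division finite before std is meaningful.
        return (x - self.mean) / (std + 1e-8)
```

So if your raw observations live in, say, [0, 255], the filtered values end up roughly zero-mean/unit-variance, which can look like a "squashed" observation space downstream.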
