How to handle non-finite gradients in RLlib?

I am constantly running into this issue, which is preventing me from training my network effectively. I know that the parameters, including the learning rate, can cause this, but I need a way to get past the error, potentially by replacing non-finite gradient values with finite ones.

I am using the same data and parameters with Stable Baselines3 without running into this issue.

RuntimeError: The total norm of order 2.0 for gradients from parameters is non-finite, so it cannot be clipped.

from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig()

config = config.training(
    lr=0.003,
    grad_clip=1.0,
    clip_param=0.2,
    num_sgd_iter=10,
    gamma=0.99,
    lambda_=0.95,
    entropy_coeff=0.0,
)
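
To make it concrete, this is roughly what I mean by replacing non-finite values. In a plain PyTorch training loop I could sanitize the gradients before clipping with something like the sketch below, but I don't know where the equivalent hook is in RLlib (the model/optimizer here are placeholders, not RLlib objects):

import torch

def sanitize_gradients(model):
    # Replace NaN with 0 and +/-inf with large finite values in every gradient
    # so that clip_grad_norm_ no longer sees a non-finite total norm.
    for param in model.parameters():
        if param.grad is not None:
            param.grad.nan_to_num_(nan=0.0, posinf=1e6, neginf=-1e6)

# After loss.backward() and before the optimizer step:
# sanitize_gradients(model)
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# optimizer.step()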

Hi @Pier-Olivier_Marquis,

The first step I would take in this situation would be to figure out whether the problematic gradients are coming from the action policy network or the value network.
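
A quick way to check is to dump which parameters end up with non-finite gradients right after a backward pass. Rough sketch below; how you get hold of the underlying torch model depends on which RLlib API stack you are on, and the "vf"/"value" substrings are just a guess at how your model names its value branch:

import torch

def report_nonfinite_grads(model):
    # Print every parameter whose gradient contains NaN/inf, and guess from the
    # parameter name which head it belongs to.
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        if not torch.isfinite(param.grad).all():
            head = "value branch" if ("vf" in name or "value" in name) else "policy branch"
            print(f"non-finite gradient in {name} ({head})")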

Also, there are two big differences in PPO between SB3 and RLlib:

  1. SB3 does not use a KL term in the loss; it only uses KL as an early-stopping heuristic.
  2. They do value function clipping very differently. SB3 treats it like the actor clipping, where the clip is relative to the value predictions from before the update, while RLlib treats it as an absolute clip value (see the config sketch after this list).
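
To rule those two differences out, you can push RLlib's PPO closer to SB3's behavior from the config, e.g. drop the KL term and loosen the value-function clip (the values below are just a starting point to experiment with, not recommendations):

config = config.training(
    kl_coeff=0.0,        # no KL penalty in the loss, similar to SB3
    vf_clip_param=10.0,  # RLlib's absolute clip on the value loss; raise it so it rarely bites
)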

Do you have a continuous action space? If so, you can run into a problem where the variance of the actions becomes infinitesimally small and blows up the log probability.
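
You can reproduce that failure mode in isolation: a Gaussian log-probability has a -log(std) term and a 1/std^2 term, so a near-zero standard deviation makes the log-prob and its gradient explode (standalone torch sketch, the numbers are made up purely for illustration):

import math
import torch

action = torch.tensor(0.5)
for std in (1.0, 1e-2, 1e-20):
    # Learnable log-std, as a stand-in for what the policy head outputs.
    log_std = torch.tensor(math.log(std), requires_grad=True)
    dist = torch.distributions.Normal(torch.tensor(0.0), log_std.exp())
    logp = dist.log_prob(action)
    logp.backward()
    print(f"std={std:g}  log_prob={logp.item():.3e}  dlogp/dlog_std={log_std.grad.item():.3e}")

If that turns out to be the culprit, the usual mitigation is to bound the log-std coming out of the policy model (SB3, as far as I know, sidesteps some of this by using a state-independent log-std parameter for continuous actions by default).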