I keep running into this error, and it is preventing me from training my network effectively. I know that hyperparameters such as the learning rate can produce non-finite gradients, but I need a way to get past this error, potentially by replacing non-finite gradient values with finite ones.
I am using the same data and parameters with Stable Baselines3 without running into this issue.
RuntimeError: The total norm of order 2.0 for gradients from parameters is non-finite, so it cannot be clipped.
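For context, this message matches the one PyTorch's torch.nn.utils.clip_grad_norm_ raises when it is called with error_if_nonfinite=True; as far as I can tell, Stable Baselines3 calls the same function with the default error_if_nonfinite=False, which would explain why the identical settings don't raise there. A minimal sketch reproducing the error in plain PyTorch (no RLlib involved):

import torch

net = torch.nn.Linear(4, 2)
net(torch.randn(8, 4)).sum().backward()

# Simulate an exploding gradient by injecting a non-finite value.
net.weight.grad[0, 0] = float("inf")

# Raises: RuntimeError: The total norm of order 2.0 for gradients
# from `parameters` is non-finite, so it cannot be clipped.
torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=1.0,
                               error_if_nonfinite=True)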
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig()
config = config.training(
    lr=0.003,           # learning rate
    grad_clip=1.0,      # gradient clipping threshold
    clip_param=0.2,     # PPO surrogate objective clip range
    num_sgd_iter=10,    # SGD passes per training batch
    gamma=0.99,         # discount factor
    lambda_=0.95,       # GAE lambda
    entropy_coeff=0.0,  # entropy bonus disabled
)
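For the workaround I have in mind (replacing non-finite gradient entries with finite ones before clipping), here is a sketch in plain PyTorch using torch.nan_to_num_; the posinf/neginf thresholds are arbitrary placeholders, and I don't know where the equivalent hook would live inside RLlib's PPO update:

import torch

def sanitize_gradients(parameters, posinf=1e6, neginf=-1e6):
    # Replace NaN with 0 and +/-inf with large finite values, in place.
    for p in parameters:
        if p.grad is not None:
            torch.nan_to_num_(p.grad, nan=0.0, posinf=posinf, neginf=neginf)

# After loss.backward() and before clipping, something like:
#   sanitize_gradients(model.parameters())
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

Is there a supported way to inject this kind of gradient sanitization into RLlib's PPO, or a config option that achieves the same thing?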