Overflow encountered in reduce

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am training an agent for a game using the PPO algorithm. At each step, a different set of legal actions is available to the agent, so I am using action masking. Everything works fine; however, after many iterations I get a runtime warning from the _mean() function in NumPy:

RuntimeWarning: overflow encountered in reduce
 ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)

My configuration is:

from ray import air, tune
from ray.rllib.algorithms import ppo

# MyEnv and ActionMaskModel are my own environment and custom
# action-masking model (not shown here).
config = (
    ppo.PPOConfig()
    .environment(MyEnv)
    .training(
        model={
            "custom_model": ActionMaskModel,
        },
        gamma=0.9,
        lr=0.0001,
        kl_coeff=0.3,
        grad_clip=4,
    )
    .framework("tf2")
    .resources(num_gpus=0)
    .rollouts(num_rollout_workers=1)
)

stop = {
    "training_iteration": 1,
    "episodes_total": 1,
    "timesteps_total": 100,
}

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(
        stop=stop,
        verbose=2,
        failure_config=air.FailureConfig(fail_fast=True),
    ),
)
tuner.fit()
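For context, my ActionMaskModel follows the usual RLlib pattern: an inner fully connected network produces the logits, and the log of the 0/1 action mask is added so illegal actions end up with near-zero probability. A simplified sketch of that pattern (it assumes a Dict observation space with "observations" and "action_mask" keys; my real model has more to it):

import tensorflow as tf
from ray.rllib.models.tf.fcnet import FullyConnectedNetwork
from ray.rllib.models.tf.tf_modelv2 import TFModelV2


class ActionMaskModel(TFModelV2):
    """Masks out illegal actions by pushing their logits toward -inf."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name, **kwargs):
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        orig_space = getattr(obs_space, "original_space", obs_space)
        # Inner FC net operates on the raw observations only (not the mask).
        self.internal_model = FullyConnectedNetwork(
            orig_space["observations"],
            action_space,
            num_outputs,
            model_config,
            name + "_internal",
        )

    def forward(self, input_dict, state, seq_lens):
        # Unmasked logits from the actual observations.
        logits, _ = self.internal_model({"obs": input_dict["obs"]["observations"]})
        # Turn the 0/1 mask into -inf/0 and add it to the logits, so
        # illegal actions get ~zero probability after the softmax.
        action_mask = input_dict["obs"]["action_mask"]
        inf_mask = tf.maximum(tf.math.log(action_mask), tf.float32.min)
        return logits + inf_mask, state

    def value_function(self):
        return self.internal_model.value_function()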

I suspect it is due to gradient explosion; however, I am not completely sure. What could be the reason for this warning?
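In case it helps with debugging: one thing I can do is promote NumPy floating-point overflows to exceptions, so I get a full traceback instead of just the warning. A minimal sketch (note it has to run inside the worker process, e.g. in the environment or model code, to catch overflows there):

import numpy as np

# Raise FloatingPointError on overflow instead of printing a
# RuntimeWarning, so the traceback shows the offending computation.
np.seterr(over="raise")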

Hi, I am facing the same issue. Can anyone suggest a solution?

Very likely your gradients are exploding. Are your rewards NaN values as well?
PPO clips its surrogate objective, and RLlib's grad_clip setting additionally clips the gradients, which stabilizes the learning procedure. Try a lower value for grad_clip. Cheers
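For example, something like this (0.5 is just an illustrative value, not a recommendation; tune it for your setup), plus a quick sanity check of the reported returns:

import math

# Tighter gradient clipping than the original grad_clip=4.
config = config.training(grad_clip=0.5)

results = tuner.fit()
# Check the reported mean episode reward for NaN/inf:
for result in results:
    reward = result.metrics.get("episode_reward_mean")
    if reward is not None and not math.isfinite(reward):
        print("Non-finite episode reward detected:", reward)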