Overflow encountered in reduce

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am training an agent for a game using the PPO algorithm. At each step there are different legal actions available to the agent, so I am using action masking. Everything works fine; however, after many iterations I get a warning from NumPy's _mean() function:

    RuntimeWarning: overflow encountered in reduce
      ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)

My configuration is:

    from ray import air, tune
    from ray.rllib.algorithms import ppo

    config = (
        ppo.PPOConfig()
        .environment(
            MyEnv,
        )
        .training(
            model={
                "custom_model": ActionMaskModel,
            },
            gamma=0.9,
            lr=0.0001,
            kl_coeff=0.3,
            grad_clip=4,
        )
        .framework("tf2")
        .resources(num_gpus=0)
        .rollouts(num_rollout_workers=1)
    )

    # Stopping criteria for the Tune run.
    stop = {
        "training_iteration": 1,
        "episodes_total": 1,
        "timesteps_total": 100,
    }

    tuner = tune.Tuner(
        "PPO",
        param_space=config.to_dict(),
        run_config=air.RunConfig(
            stop=stop,
            verbose=2,
            failure_config=air.FailureConfig(fail_fast=True),
        ),
    )
    tuner.fit()

I suspect it is due to gradient explosion, but I am not completely sure. What could be the reason for this warning?

Hi, I am facing the same issue. Can anyone suggest a solution?

Very likely your gradients are exploding. Are your rewards NaN values as well?
In RLlib's PPO you can also clip the gradients (the grad_clip setting), which stabilizes the learning procedure. Try using a lower value for it; a rough sketch is below. Cheers
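
A sketch of what I mean, reusing MyEnv from the original post; the 0.5 is just an illustrative value, not a recommendation:

    from ray.rllib.algorithms import ppo

    config = (
        ppo.PPOConfig()
        .environment(MyEnv)  # MyEnv as in the original post
        .training(
            gamma=0.9,
            lr=0.0001,
            kl_coeff=0.3,
            # Tighter clipping than the original grad_clip=4; the exact
            # value is something you will have to tune.
            grad_clip=0.5,
        )
    )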

In my case (which I suspect is the same as yours, since we’re both using action masks and you probably copied RLlib’s example too), the cause of the warning was ray.rllib.utils.debug.summary.summarize, which takes the mean of the action logits of the whole training batch. The sum inside the mean was overflowing, because it was adding together many copies of ray.rllib.utils.torch_utils.FLOAT_MIN, which, well, is the minimum float.
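
You can reproduce the warning outside of RLlib in a couple of lines (the -3.4e38 below is my assumption of roughly what FLOAT_MIN is, i.e. close to the most negative float32):

    import numpy as np

    FLOAT_MIN = -3.4e38  # assumed value of ray.rllib.utils.torch_utils.FLOAT_MIN

    # A batch of masked action logits: every invalid action's logit is FLOAT_MIN.
    logits = np.full((256, 10), FLOAT_MIN, dtype=np.float32)

    # Summing many FLOAT_MIN values in float32 overflows to -inf, which is
    # exactly the "RuntimeWarning: overflow encountered in reduce" above.
    print(np.mean(logits))  # -inf, plus the overflow warning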

Tip: add np.seterr(all="raise") at the beginning of your code so you get a traceback and can see exactly when/where it’s happening (if you’re using a debugger, it also helps to set num_rollout_workers=0 in the rollouts config so everything runs in the same process).
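
A minimal sketch of that debugging setup, reusing MyEnv and ActionMaskModel from the original post:

    import numpy as np
    from ray.rllib.algorithms import ppo

    # Turn NumPy warnings into hard errors so the overflow produces a full
    # traceback instead of a one-line warning.
    np.seterr(all="raise")

    config = (
        ppo.PPOConfig()
        .environment(MyEnv)  # MyEnv / ActionMaskModel as in the original post
        .training(model={"custom_model": ActionMaskModel})
        .framework("tf2")
        # 0 rollout workers keeps sampling in the driver process, so a
        # debugger attached to this script sees the exception directly.
        .rollouts(num_rollout_workers=0)
    )
    algo = config.build()
    algo.train()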
