Overflow encountered in reduce

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am training an agent for a game using the PPO algorithm. At each step there are different legal actions available to the agent, so I am using action masking. Everything works fine; however, after many iterations I get a warning from NumPy's _mean() function:

    RuntimeWarning: overflow encountered in reduce
      ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)

My configuration is:

    from ray import air, tune
    from ray.rllib.algorithms import ppo

    config = (
        ppo.PPOConfig()
        .environment(
            MyEnv,
        )
        .training(
            model={
                "custom_model": ActionMaskModel,
            },
            gamma=0.9,
            lr=0.0001,
            kl_coeff=0.3,
            grad_clip=4,
        )
        .framework("tf2")
        .resources(num_gpus=0)
        .rollouts(num_rollout_workers=1)
    )

    # Stopping criteria for the Tune run.
    stop = {
        "training_iteration": 1,
        "episodes_total": 1,
        "timesteps_total": 100,
    }

    tuner = tune.Tuner(
        "PPO",
        param_space=config.to_dict(),
        run_config=air.RunConfig(
            stop=stop,
            verbose=2,
            failure_config=air.FailureConfig(fail_fast=True),
        ),
    )
    tuner.fit()

I suspect it is due to gradient explosion, but I am not completely sure. What could be the reason for this warning?

Hi, I am facing the same issue. Can anyone suggest a solution?

Very likely your gradients are exploding. Are your rewards NaN values as well?
In RLlib's PPO you can also clip the gradients (the grad_clip setting), which stabilizes the learning procedure. Try using a lower value for it; a rough sketch is below. Cheers
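
A sketch of what I mean, reusing MyEnv from the original post; the 0.5 is just an illustrative value, not a recommendation:

    from ray.rllib.algorithms import ppo

    config = (
        ppo.PPOConfig()
        .environment(MyEnv)  # MyEnv as in the original post
        .training(
            gamma=0.9,
            lr=0.0001,
            kl_coeff=0.3,
            # Tighter clipping than the original grad_clip=4; the exact
            # value is something you will have to tune.
            grad_clip=0.5,
        )
    )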

In my case (which I suspect is the same as yours, since we’re both using action masks and you probably copied RLlib’s example too), the cause of the warning was ray.rllib.utils.debug.summary.summarize, which takes the mean of the action logits of the whole training batch. The sum inside the mean was overflowing, because it was adding together many copies of ray.rllib.utils.torch_utils.FLOAT_MIN, which, well, is the minimum float.
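
You can reproduce the warning outside of RLlib in a couple of lines (the -3.4e38 below is my assumption of roughly what FLOAT_MIN is, i.e. close to the most negative float32):

    import numpy as np

    FLOAT_MIN = -3.4e38  # assumed value of ray.rllib.utils.torch_utils.FLOAT_MIN

    # A batch of masked action logits: every invalid action's logit is FLOAT_MIN.
    logits = np.full((256, 10), FLOAT_MIN, dtype=np.float32)

    # Summing many FLOAT_MIN values in float32 overflows to -inf, which is
    # exactly the "RuntimeWarning: overflow encountered in reduce" above.
    print(np.mean(logits))  # -inf, plus the overflow warning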

Tip: add np.seterr(all="raise") at the beginning of your code so you get a traceback and can see exactly when/where it’s happening (if you’re using a debugger, it also helps to set num_rollout_workers=0 in the rollouts config so everything runs in the same process).
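
A minimal sketch of that debugging setup, reusing MyEnv and ActionMaskModel from the original post:

    import numpy as np
    from ray.rllib.algorithms import ppo

    # Turn NumPy warnings into hard errors so the overflow produces a full
    # traceback instead of a one-line warning.
    np.seterr(all="raise")

    config = (
        ppo.PPOConfig()
        .environment(MyEnv)  # MyEnv / ActionMaskModel as in the original post
        .training(model={"custom_model": ActionMaskModel})
        .framework("tf2")
        # 0 rollout workers keeps sampling in the driver process, so a
        # debugger attached to this script sees the exception directly.
        .rollouts(num_rollout_workers=0)
    )
    algo = config.build()
    algo.train()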
